Content addressed
- Unison language – content addressed on the single closure level
- Holochain DHT – content addressed on the per file level?
- IPFS – content addressed on the per file level
- Git – content addressed on the per object level (blobs, trees, commits)
- Nix – content addressed on the per package level?
- ...
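The shared idea behind all of the above can be sketched in a few lines: the address of a piece of data is simply a cryptographic hash of its content, so identical content always gets the identical address. The toy store below is illustrative only; the function names and the choice of SHA-256 are assumptions, not the API of any particular system.

```python
import hashlib

def content_address(data: bytes) -> str:
    # The address is just the SHA-256 hash of the content itself.
    return hashlib.sha256(data).hexdigest()

store = {}  # a toy in-memory content-addressed store

def put(data: bytes) -> str:
    addr = content_address(data)
    store[addr] = data  # storing identical content twice is a no-op
    return addr

def get(addr: str) -> bytes:
    data = store[addr]
    # Integrity checking comes for free: re-hashing verifies the lookup.
    assert content_address(data) == addr
    return data

addr = put(b"hello world")
assert get(addr) == b"hello world"
# The same content always maps to the same address:
assert put(b"hello world") == addr
```

Note that a location-addressed link (a path, a URL) can silently start pointing at different data; a content address cannot, by construction.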
While access channels to data can still break and
while data can still get deleted or destroyed,
links to data are no longer fundamentally fragile
(think: no more silently breaking symlinks in file systems).
Related: Content addressed objects do not suddenly switch to a new, incompatible version of
- some library or
- some function or
- some (vector)graphic that no longer fits the transclusion context
No, a content address always and forever points to the same data.
It's immutable.
If the word immutability raises fears of running out of storage in no time, then that worry is not unreasonable.
But as it stands now (2021), every installed program comes with a redundant copy of nearly a whole operating system (exaggerating a bit).
Reason: Developers are increasingly giving up on code sharing, given the rampant dependency hell caused by fragile non-content-addressed links.
Purely functional data structures
Dealing with immutable data structures is not easy and was
(especially in the early days of computer science)
considered by some a potential ultimate roadblock.
But by now a lot of new insights have been gained and
libraries have been written.
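The core trick those libraries rely on is structural sharing: "updating" an immutable structure builds a small new piece and reuses everything untouched. A minimal sketch with an immutable singly linked list (a toy illustration, not any production library):

```python
# Immutable cons list: None is the empty list, a cell is the pair (head, tail).
def cons(head, tail):
    return (head, tail)

def prepend(x, lst):
    # "Update" by building one new cell; the old list is untouched and shared.
    return cons(x, lst)

def to_list(lst):
    out = []
    while lst is not None:
        head, lst = lst
        out.append(head)
    return out

shared = cons(2, cons(3, None))
a = prepend(1, shared)
b = prepend(0, shared)
assert to_list(a) == [1, 2, 3]
assert to_list(b) == [0, 2, 3]
# Both versions physically share the same tail – nothing was copied:
assert a[1] is b[1]
```

Each "update" costs one cell here, not a full copy; persistent trees generalize this so that updates cost only a path from root to leaf.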
Saving data by storing diffs (and "keyframes")
What is needed are smart ways of storing data by diffing.
Interestingly, there is a deep formalism (discovered rather than invented) that can eventually be used for this.
From every (possibly infinite) data structure (e.g. lists, trees, ...)
a zipper data structure can be derived by "differentiation" of that data structure.
"Differentiation" here really means analytic differentiation, as in d/dx(x^2) = 2x.
Storing data in the form of what needs to be inserted into the zipper to reconstruct the data corresponds to storing by diffing, it seems
(well, a rigid subset of diffing that is – no "key-frames" here).
When one first learns that data structures can be differentiated, that may sound wild, but it really works.
First one needs the datatype that ought to be differentiated in the form of an algebraic data type (ADT).
An ADT is made up of product types and sum types:
- an n-tuple is a product-type
- an enum is a sum-type
For datatypes like lists and trees, one needs nested recursive definitions.
These definitions can be rewritten literally in terms of sums and products (as the names imply).
With some simple algebra one ends up with a formula representing the datatype (also called a generating function).
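For example, a list over element type a has the nested recursive definition List(a) = 1 + a × List(a): either the empty list (the "1" summand) or the product of a head element and a rest list. A direct transcription (illustrative Python, not from the original text):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

A = TypeVar("A")

@dataclass(frozen=True)
class Nil:
    # The "1" summand: exactly one value, the empty list.
    pass

@dataclass(frozen=True)
class Cons(Generic[A]):
    # The "a × List(a)" summand: a product of head and tail.
    head: A
    tail: "List[A]"

# The sum type itself: List(a) = 1 + a × List(a)
List = Union[Nil, Cons[A]]

xs: List[int] = Cons(1, Cons(2, Nil()))
assert xs.head == 1
assert xs.tail == Cons(2, Nil())
```

Reading Union as "+" and the dataclass fields as "×" makes the correspondence between the type definition and the formula literal.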
Now the normal mathematical derivative can be taken.
The resulting formula can be converted back into a datatype and voilà, one has the zipper.
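For lists this works out concretely: the derivative of the list type is a pair of lists, i.e. the context of a "hole" is (elements to the left, elements to the right). A zipper is such a context plus a focus position, and recording what gets inserted at the focus is exactly the diff-like storage described above. A toy sketch (function names are illustrative, not any particular library's API):

```python
# A list zipper: (reversed left context, focus-and-right context).
def from_list(lst):
    return ([], lst)  # focus starts at the first element

def right(z):
    left, rest = z
    # Move the focus one step to the right.
    return ([rest[0]] + left, rest[1:])

def insert(z, x):
    left, rest = z
    # Insert x at the focus – this is the recorded "diff" operation.
    return (left, [x] + rest)

def zipper_to_list(z):
    left, rest = z
    return list(reversed(left)) + rest

z = from_list([1, 2, 4])
z = right(right(z))   # focus on the 4
z = insert(z, 3)      # record the edit at the focus
assert zipper_to_list(z) == [1, 2, 3, 4]
```

Moving the focus and inserting are both O(1), which is what makes storing a long edit history as a sequence of such operations cheap.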
(wiki-TODO: add example math)
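As a worked instance of the math hinted at above (standard textbook material, not from the original text), the generating function of lists satisfies a simple equation, and its derivative is exactly the list-zipper context:

```latex
% A list over a is either empty or a head times a shorter list:
L(a) = 1 + a \cdot L(a)
  \quad\Rightarrow\quad
L(a) = \frac{1}{1 - a}

% Differentiating the closed form:
\frac{d}{da} L(a) = \frac{1}{(1 - a)^2} = L(a) \cdot L(a)

% Interpretation: the context of a hole in a list is a pair of lists --
% the elements to the left of the hole and the elements to the right.
```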
A practical data-storage system would maybe need key-frames interspersed in some fractal pattern. (??)
It sounds like a lot of thinking will still be needed here.
Some interesting trivia:
Actually, one can take higher-order derivatives of data structures
(the second derivative corresponds to the diffs of the diffs)
and build a whole Taylor series.
- This is category theory – algebraic datatypes form Cartesian closed categories, it seems – function types correspond to powers
- This is part of the Curry-Howard-Lambek correspondence – propositions as types as algebraic expressions
- Generating functions occur in physics in the context of conserved quantities and Noether's theorem
Related
- Programming languages
- Relations of APM to purely functional programming
- Reversible computing
- Garbage collection (Software)
- "data hoarding"
- "space leak"
External links
- Unison – https://www.unisonweb.org/
- Git – https://git-scm.com/
- InterPlanetary File System – https://ipfs.io/
- Nix (package manager) – https://nixos.org/
- Holochain – https://holochain.org/
- BitTorrent – https://www.bittorrent.com/
- Purely functional data structure
- "Purely Functional Data Structures" by Chris Okasaki 1996 – (pdf)