Diffing

From apm
Jump to: navigation, search
This article is a stub. It needs to be expanded.

This page is about various form of generalized differentiation.

Application cases

3D modeling - finding points on surfaces

Usage of: Generalized differentiation (bottom up)

Beautiful differentiation:
This is likely useful for 3D modeling constructive solid geometry
in functional representation (algebraic varieties).
As multidimensional gradients and curvatures are
to be used to find points on surfaces.

Storing content addressed codebases with discovered representation of history

Usage of: Differentiating datastructures

Desired things:

  • compact storage of data
  • fine-grained undoability revertability
  • mix-and-meshability with older versions

Put the abstract Syntax tree (AST) of code into algebraic datatypes (ADTs) Put an edit calculus into the same ADT.

Try to use insights from differentiating datastructures to
arrive at an elegant minimal incidental complexity solution

Question that poses itself: How to represent version history?
As versions can merge and split undo history lists and trees are likely a bad choice.
It's a directed acyclic graph (DAG) structure.
(TODO: Investigate if zippers can be done for DAG datastructures too.)

Enable collaboration in image-based programming via types and structure editing

– Paper: "Typed Image-based Programming with Structure Editing"
– Authors: Jonathan Edwards (independent), Tomas Petricek (University of Kent)

This paper presents a way to ...
merge edit conflicts in a very fine-grained way
merge edit conflicts in a very natural way

It does so by ...

  • introducing an edit calculus of allowed operations on a data-schema to version-manage
  • identifying an uniquely defined "youngest" common ancestor (youngest in terms of number of edits not in terms of time)
  • depending on the nature of new edits they are either appended or migrated back to the common ancestor over it and to the other version.

Side-note: Not all edit calculus operations need to be. Some conflicts should manually be resolved.
Side-note: Always seeking the "youngest" common ancestor there seems somewhat analogy to seeking the most universal element in universal construction in category theory.

As there is edit history and it needs to be kept consistent with the current version structures,
this is not really possible in programming with plaintext files. One needs:

Thus in the abstract: "Version Control for Structure Editing".

Applying it to the whole AST of a programming language not juts its ADTs

As this core idea can be applied to any and all data it could also be applied to
a languages complete abstract syntax tree (AST).
That is an edit calculus not just for edits on algebraic data type (ADTs)
but an edit calculus for efitts on the whole abstract syntax tree (AST) subsuming ADTs.
This more general edit calculus would then need to be capable of representing all valid code transformations
Including both edits like like code substitutions (beta conversions) and edits on the ADTs of the language.

Things that are not so quite clear yet:
Edits in this model form a linear chains.
But generally the topology of versions forms a DAG (directed acyclic graph).
As this model is relational (that is: any two schema versions can youngest common ancestor combined)
It should be compatible with DAG structure. A DAG network of chains?

Also Abstract syntax trees are proper trees.
Thus known insights from differentiating datastructures (zippers for trees) should be usable for a natural choice of edit calculus
Though the restriction of zippers to only allow for local steps (many steps in walking to completely elsewhere in the syntax tree) might be undesirable property.

Question: Would the zipper diffing type edits be somehow nested within the rest of the diffing calculus?

Driving the meta madness to the max: One thing is quite clear. Diffing the list of edits/diffs (using edit calculus on the edit calculus) will make little sense are the successive edits will have virtually no correlation beside happening in proximity in the AST perhaps.

Where to store the list of edit calculus transformations?
As the edit calculus is formal it can be part of the same language as the language it edits.
One would end up storing edits of ADTs (and other code transformations) within ADTs.
A Homoiconic language design would mean that these ADTs are from the very same language.

Kinds of generalized diffing

Differentiating datastructures

See: Curry-Howard-Lambeck isomorphism With lists and trees as algebrais data types (ADTs) One can interpret them as algebraic structures and literally differentiate. These differentiation correspond to so called zipper datastructures usable to represent diffs on these original ADTs.

Eventually an analogy to Taylor series might be found.

Generalized differentiation (top down)

See 2020 link below.
Needs investigating.

Generalized differentiation (bottom up)

See: Beautiful differentiation (Conal Elliott)

Edit calculus

A formal calculus to make edits on structures.

  • Either edits on data scemas like data records or algrebraid catatypes (ADTs)
  • Or (more generally) edits on formal languages abstract syntax tree (AST).

Depending on it's design an edit calculus may seems more invented rather than discovered.
So maybe one can glean a bit from the other cases to arrive at an elegant (low incidental complexity) solution.

Schema diffing – version control for structural-projectional editing

– Paper: "Typed Image-based Programming with Structure Editing"
See discussion of possible application above


  • This seems quite strongly related to homoiconicity
  • There seems to be some analogy to universal construction in category theory

Related

External links