Lab Notes #024 Merkle DAGs and retrofitting into relational

Lab Notes #024 Merkle DAGs and retrofitting into relational

I been reading various papers on CRDTs to deepen my understanding of them. That's when I happened upon a paper that constructed Merkle-CRDTs from Merkle-DAGs.

Op-based CRDTs keep a log of ops in causal order to reconcile concurrent state changes consistently amongst all replicas. Merkle-DAGs can be used to do the same job! This changed my perspective of what CRDTs could be.

It also hinted at how to compose CRDTs, mostly by delegation. CRDTs have internal bookkeeping that represents the value when taken in aggregate. By storing other CRDTs in this bookkeeping and delegating the merge to them, you can compose CRDTs.

I found a second Merkle-CRDT paper, this time using Merkle Search Trees (MST). If constructing CRDTs from Merkle-DAGs is akin to op-based CRDTs, constructing CRDTs from Merkle Search Trees is akin to state-based CRDTs. It's a modified Merkle Tree, where values are ordered and distributed uniformly across all layers of a tree by the content hash.

Interestingly, a merkle search tree can be made into different CRDTs by choice of in the payload of the MST. To construct a Grow-only Set, the payload has a member of the set { ⏉, ⏊ }. If the value is top, the member exists in the set, otherwise, the member is not.

Delta state-based OR-Set CRDTs implemented with Dots grows with the number of replicas. Ops-based OR-Set CRDTs grows with a causal history grows with the number of events. Causal Length Set CRDT doesn't seem to have this limitation, which makes it a great candidate for use in a merkle search tree.

OR-Maps can then be implemented with values tagged with the keys as a tuple. With that, we should have all we need to retrofit all of this into a relational paradigm. There is previous work on doing this with Conflict-free Replicated Relations (CRR), but the state isn't immutable. The content hash nature of Merkle Search Trees gives us a way out, as it can refer to different chunks of immutable values that can be reused.

Looking into the extension ecosystems of both SQLite and Postgres, I originally was going to go about this by writing a custom datatype with functions to augment the query, like the JSON data type. But with further investigation, virtual tables and foreign data wrappers should allow me to leverage synable tables constructed from MSTs.