Lab note #039 Semantics of Composed CRDTs
After the implementation of the Prolly Tree, I set about writing an immutable map. It was pretty straightforward as most of the heavy lifting is done by the Prolly Tree implementation. The immutable map just keeps a simple list of roots after every change. There are a bunch of improvements I can make upon this (but will refrain):
- There is currently no way to make concurrent changes, because there's no branching with a list of roots. Probably some kind of map data structure with tracking of the latest root will work.
- There is currently no way to find an ancestor root without scanning the entire history. I think this can probably be mitigated by some kind of skiplist-like index.
There are other improvements I can make for the time being, but I wanted to move on to implementing basic CRDTs and composing them.
When I was planning the CRDTs, I decided to use on the Causal Length Set and the Merkle CRDTs. When implementing the Causal Length Set and a Grow-only Counter with the Immutable map, it was pretty smooth. However, when I started thinking about compositionality with a Causal Length Map that can contain one of the other CRDTs, it started to break down.
Why did I think this would work in the first place? I started going over my notes, re-reading papers, and going through resources to load it all back into my head. Maybe I didn't know it that well in the first place until I had to implement it.
I think while reading the two papers, they had just stated that if the payloads of the Causal Length Set and the Merkle CRDT were also CRDTs, then the whole thing would still be a CRDT. While technically true, it's not useful. I was able to find cases where the value of the composite CRDT would converge, but it wouldn't make semantic sense for an application I'm thinking about.
This is often glossed over in many of the posts and literature on CRDTs. Compositionality is not straightforward, and need some careful stacking to make sure it all works. As chance would have it, someone asked the same question on the Ink and Switch Discord, and my take-away is just that there's no generalized way. Everyone just muddles through it for their specific application, since the same data structure might have different semantics for different applications. Hence you need to find the one that is tailored to your specific application.
I think this is probably a rich vein to explore. Category theorists should be all over this to find composable structures that are also CRDTs. That said, recently I noticed the CUE lang types and values are defined on a lattice, which means their operations like validation can also be applied in any order.
Could we make a lattice of CRDT types and their values, so we have a way of validating whether a composite type has the semantics that we want? I haven't thought deeply about this, and this is probably another research-level rabbit hole.
For now, the plan is side-step all these nerd-sniping traps to reconcile the typical implementations of CRDTs with the Merkle chain model in my head. Once I can do that, I think I should be able to implement a CRDT Set with the immutable map that I've built. From there, I can implement Maps and Registers. That should be enough to model a relational model. Wish me luck.
Reading through various (but not limited to) things that have to do with composition.