Lab note #040 Hitting a wall and going back to basics with CRDTs

Lab note #040 Hitting a wall and going back to basics with CRDTs

This continues my exploration into CRDTs, beyond the typical tutorials. The more I looked, the more threads there were to pull on. I currently have a couple of lines of questioning to answer:

  • How do we compose CRDTs so they both converge to a single state without coordination but with a semantically coherent state?
  • What are the downsides of using Merkle clocks instead of vector clocks as logical clocks in CRDTs? (I won't address this question in this note)

Concerning the composition of CRDTs, you can't just put one inside of the other and call it a day. I started thinking about the simplest case: Last-write-wins Registers (LWW Regs) inside of an Add-wins Observed Removed Set (AWOR Set). Each is a CRDT, but combined, it's unclear what the merge semantics should be.

A set assumes its elements are immutable values, but it's unclear whether the immutable value should be the LWW Reg as a container or the value inside the LWW Reg.

In the case where the set is holding LWW Reg as a container, the LWW Reg is in charge of determining the merge semantics of its value, while the AWOR Set is in charge of the membership of each container. However, this dynamic results in sets that can have repeated values. The set is only aware we have different containers, not different values inside of a LWW Reg. This violates expectations of a set and is unlikely what a user would want.

But the AWOR Set considers the value inside of a LWW Reg as the immutable value to determine membership, it seems redundant to have a LWW Reg at all. You'd no longer use a LWW Reg and change the semantics of the set to a Last-write wins Observed-remove Set.

This type of oddity goes away with AWOR Maps, since keys are unique, and the values are independent. Hence, AWOR Maps can be in charge of membership based on keys and the value can be any CRDT in charge of the merge semantics of its own internal value.

However, that's not to say there aren't oddities with Maps. If you compose a Map with itself to get nested maps, what happens when you add to a parent that gets deleted on another replica? If we were trying to maintain a tree with nested maps, it'd be possible to get a cycle if two replicas are reparenting.

From How Figma’s multiplayer technology works

This has been addressed with some posts and papers I haven't fully gone through.

But looking through any materials I can get my hands on, it doesn't seem like there's a coherent or general set of guarantees about how to compose CRDTs without these weird oddities between the operations of CRDTs or semantics that would come up.

So right now, people stack CRDTs very carefully to try and avoid these kinds of problems. And if they can't, they write a paper or introduce a centralized server.

I wanted to implement the Merkle CRDT, but I had problems making the leap. I realized that I didn't quite understand what the vector clock represented. While I understand it mechanically, I don't understand how it maps to a Merkle-DAG. If we remove the replicaIds in a clock, I suppose different branches in the Merkle-DAG will represent concurrent changes. While that seems to work with sets, because sets are only concerned with content, it doesn't seem to work with counters, which need to retain the per-replica counts in order to get the right total count.

So I'm admitting that I need to go back to basics and implement the Op-based CRDT to get a good intuitive understanding of the vector clock. Then move to Pure Op-based. And finally, move to the Merkle-based version.