Lab notes #021 CRDTs in depth and AI explaining code

While building the demo local-first todo-list app, I've found that different CRDT libraries make different assumptions about your data access pattern. Automerge and Yjs prioritize text editing, but Automerge assumes you interact with a JSON-like document, whereas Yjs assumes composable data structures. CR-sqlite and Evolu both presume relational access. ElectricSQL allows both SQL and GraphQL-like queries. Right now, I'm not sure what to make of this. It could be that there's no one-size-fits-all CRDT library, and we'll see libraries proliferate for specific use cases. Or CRDTs will become more like design patterns: subsumed and unnamed in an architecture, but recognized by developers familiar with the pattern.

I'd had a high-level understanding of CRDTs for a long time, but last week I decided to dive much deeper into the details, mostly to figure out how CRDTs compose. Bartosz Sypytkowski's series on CRDTs was incredibly helpful. I also read Marc Shapiro's overview papers on CRDTs. All this reading left me with a question: why use op-based CRDTs instead of delta-state CRDTs? What are the tradeoffs? And what are the tradeoffs between the two types of delta-state CRDTs?

After all that, I came away with a slightly different perspective than I had before. CRDTs use monotonically increasing data, with operations restricted by algebraic properties (such as commutativity, associativity, and idempotence) so that state converges without coordinating with other replicas. In practice, though, shipping entire states between replicas is impractical, so CRDT variations make tradeoffs to find the smallest piece of data they can send over the wire while retaining the properties and behavior established by state-based CRDTs.
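To make that tradeoff concrete, here's a minimal sketch in Python of a grow-only counter (G-Counter), the textbook CRDT example rather than code from any of the libraries above. The class and method names (`GCounter`, `OpCounter`, `increment`, `merge`, `apply`) are my own, hypothetical ones. The same `merge` (pointwise max) accepts either a full state or a delta containing only the changed entry, and because max is idempotent, a duplicated delta is harmless. The op-based counter ships the operation itself, but since addition isn't idempotent, it assumes exactly-once delivery, which hints at one axis of the op-based vs delta-state tradeoff.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def value(self) -> int:
        return sum(self.counts.values())

    def increment(self, n: int = 1) -> dict[str, int]:
        """Increment locally and return a delta holding only the changed slot."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n
        return {self.replica_id: self.counts[self.replica_id]}

    def merge(self, incoming: dict[str, int]) -> None:
        """Join with a full state or a delta by taking the pointwise max.

        max is commutative, associative, and idempotent, so reordered or
        duplicated messages cannot make replicas diverge.
        """
        for rid, count in incoming.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


@dataclass
class OpCounter:
    """Op-based counter: replicas ship the operation "add n", not state."""
    total: int = 0

    def apply(self, n: int) -> None:
        # Addition commutes but is not idempotent: a redelivered op is
        # counted twice, so this design relies on exactly-once delivery.
        self.total += n


a, b = GCounter("a"), GCounter("b")
delta_a = a.increment(2)
delta_b = b.increment(3)

# Replicas exchange deltas, and one delta is delivered twice.
a.merge(delta_b)
b.merge(delta_a)
b.merge(delta_a)

# A fresh replica can still catch up from a full state.
c = GCounter("c")
c.merge(a.counts)

print(a.value(), b.value(), c.value())  # 5 5 5

op = OpCounter()
for n in (2, 3, 3):  # the op "add 3" is delivered twice
    op.apply(n)
print(op.total)  # 8: the duplicate was double-counted
```

Because `merge` is the same function for deltas and full states in this sketch, a replica that missed some deltas can always fall back to a full-state sync.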

This perspective led me to see an equivalence with incremental computing. Both have a conceptual transformation in mind, but want a substitute transformation that produces the same result at lower cost. The difference, though, is that with CRDTs the user manually sequences a fixed palette of operations or deltas to send over the wire, whereas incremental computing wants these operations or deltas to be generated from the change in input. I've always had a hard time understanding explanations of how Differential Dataflow works, but I understand it to be based on counting. I wonder if CRDTs' use of vector clocks can help inform that understanding of Differential Dataflow. If anyone's seen any papers equating the two, I'd love to read them.
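Here's a minimal sketch of the "substitute transformation" idea in Python. It assumes the "counting" in Differential Dataflow refers to representing changes as records paired with signed multiplicities (+1 for an insertion, -1 for a deletion); real Differential Dataflow updates also carry a timestamp, which this sketch leaves out. `full_count` and `apply_changes` are hypothetical names for illustration, not any library's API. The conceptual transformation recomputes per-record counts from scratch, while the substitute derives the new output from the change in input alone and arrives at the same answer.

```python
from collections import Counter


def full_count(records: list[str]) -> Counter:
    """The conceptual transformation: recompute every count from scratch."""
    return Counter(records)


def apply_changes(counts: Counter, changes: list[tuple[str, int]]) -> Counter:
    """The substitute: update counts using only (record, diff) pairs.

    A diff of +1 is an insertion and -1 a deletion (a simplified,
    timestamp-free take on signed multiplicities).
    """
    out = Counter(counts)
    for record, diff in changes:
        out[record] += diff
        if out[record] == 0:
            del out[record]  # keep only records that are actually present
    return out


before = ["todo", "done", "todo"]
after = ["todo", "done", "done", "archived"]
# The change in input, expressed as signed multiplicities.
changes = [("todo", -1), ("done", +1), ("archived", +1)]

incremental = apply_changes(full_count(before), changes)
print(incremental == full_count(after))  # True
```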

Another new thing I'm trying is using Cursor.sh to help me understand parts of the Automerge, Yjs, and Actual codebases by chatting with its AI. I've found it to be very particular about what it needs in order to be helpful. You can't really just ask it to explain the core idea of a codebase. You have to deliberately provide the right context and ask it very specific questions. Then you piece its answers together yourself into a cohesive picture. In fact, I sometimes found plugins that let you jump to definitions or implementations more helpful than any answer the AI could provide.