Lab note #019 Local-first Data Model

Lab note #019 Local-first Data Model

I'm making progress with the todo list. While I can only add tasks right now, it persists the data without a server and can sync between different open tabs (within the same browser). I've built my share of web apps before, yet I've never built a local-first app. It does require a little bit of rethinking on how to do the data model. Two things came out of that.

First, I found that GPT-4 was actually a good partner for thinking about how to design data models. Usually, it wasn't very good for design tasks that I set it upon, but it did pretty well here. I was able to describe to it what I was trying to build a database table for, throw out a couple sample columns, and then ask it to suggest others I might have missed.

Second, it proved to be a valuable exercise to re-examine the requirements of an in-browser database. Prior to this, my approach to addressing the state management needs of a front-end was based on my experience in writing them. But here, I had concrete needs in front of me.

I wanted to see if I could get something workable with just Automerge and vanilla SQLite WASM. After implementing it, a glaring problem is that we now have two states to sync with each other: the application state for rendering and SQLite for querying and persistence. When the CRDT is over the entire application state, this is not a problem. No matter the change that comes in, it just gets merged and persisted. But I opted for more granularity by making every task a CRDT. This is for a couple reasons:

  • Clients don't need the entire application state all at once. Having a top-level CRDT means shipping the entire application state over from the archive server, even though you don't need most of the data in it.
  • A top-level CRDT makes the data inside of it un-queryable if stored as is. By making every task a CRDT, I can store the data the CRDT represents in a json column separate from the CRDT itself. This doubles the storage requirement, but we can now do queries for a partial fetch of the entire dataset.

However, that now means the top level list of tasks is not managed by CRDTs and hence need to be manually inserted into the application state. There's currently no way to do partial CRDTs that's readily available as an open source library. I think the only viable way is for CRDT changes not to directly change the application state, but for it to find and update the correct task CRDT based on the task id.

Ideally, the query should be reactive, but for now, we can re-query the database and have that fill out the application state whenever there's any new state change. However, this might be too slow, so for now, I optimistically update the application state directly first, and then have the re-query come in with the data later.

To me, this still points to a need for an in-browser database that acts as state management beyond useState/useReducer and their ilk, without a lot of the manual juggling of CRDTs. So far, it's promising, and the goal is to start using this as the task tracker for the todo list itself soon.