Lab notes #005 Moving on from the low level

Reading more papers, mostly focused on Relaxed Radix Balanced Trees, since other trees, such as B-trees, Finger Trees, and HAMT trees seem to predate the research on this one. The premise is generally pretty simple, but the devil is in the details. The papers on it are fairly clear.

I was debating whether to implement it as a solid foundation, but I think it's the wrong way to go initially. The fundamental thesis is up in userland, not down here, so I'd need to find a simpler solution to the commodity.

The hack is that I'll have to fork im crate, and add in an insert method that returns a transient RRB-tree that I can keep inserting, and when it's done, it returns the leaf node itself. But as it turns out, I can just expose a node on lookup instead. that was much easier to do. That way I can persist the node upon updates, and move on to the rest of the database architecture.

As for the database architecture, it was a little difficult to piece together the details of the sequences that happen, but it appears there are two main processes: processing transactions and indexing. For transations:

  • Receive incoming transaction into the transactor
  • Expand functional transactions into more assertions and retractions
  • Apply assertions and retractions into the transactor's live index functionally
  • If all succeeds, log transactions to the durable log
  • Then broadcast assertions and retractions to peers for their own insertions into the live index

For Indexing:

  • Walk the live index to find segments that needs replacing in the storage-backed index. There are some details about what happens when new transactions come in while indexing that are a bit unclear.
  • Broadcast index completion to peers.
  • Peers drop their live index, and can now get the same data from the storage-backed index.

There are so many questions in the details. Are assertions and retractions recorded in the index? Presumably, logging transactions to the durable log occur after applying to the live index because insertion into the live index can fail. But how come insertion into the live index can fail? Shouldn't the Peer cache their live index, so they don't have to make another round trip to storage to get the data that was in their live index?

In addition, I'm reviewing the chapters on functors and making sure that I didn't miss anything in my first read. Back on Bifunctors now. Memorizing all the definitions, help build upon the reading, more than anything else I've studied. Still not clear what a Kleisli Category is, though I understand the example.