Lab notes #003 Database as Value in Depth
I spent a bunch of time mapping out the architecture of Datomic in detail. Diving into all the details revealed all the different places my knowledge was spotty, or where Rich Hickey glossed over the details.
- Are the internal nodes of the storage-backed index only in memory, or are they also written to the KV store?
- What are the reasons why processing incoming transactions to the live index would fail? Why are the transactions inserted into the live index before they're written to the transaction log?
- What are the seven endpoints of the cluster protocol, and what are the pitfalls when coming to that design?
- How do you merge two B+ trees quickly between the live index and the storage-backed index?
- When transactions are being processed into the live index, more transactions can come in. What does it mean to put the index back into the transaction processing queue to pull in the rest of the transactions?
- How would a B+ tree be persistent if there's a doubly linked list at the leaves of the tree? Every node with a pointer to the new node will need to be replaced. It works in a tree because it goes back the height of the tree to the root. But for a doubly linked list, you'd end up replacing the whole tree.
- Are versions encoded in the key of a node?
And many more. I think I'd need to look at how datascript answers some of these questions. Through the course of the investigation, I looked a bunch of different trees that I never dove too deep into before: B+trees, Skip Lists, Treaps, and Zip Trees.
This is a video on responsive compilers. It's a surprise, but also a confirmation of the disparate things I've been investigating are all related. I think there should be a database embedded in programming language environments.