An incomplete Unicit Merkle Tree

An incomplete Unicit Merkle Tree
When I had mistakenly thought that Prolly Trees weren't unict, I thought maybe an interesting way forward was to leverage the same mechanism CRDTs use to converge.

Below is a mathematical description of what I called a Lattice Merkle Tree, still incomplete. I had time-boxed this, so I got as far as I could in a limited amount of time.

I'm putting it out there as a pin, to come back to it at a later time. If anyone wants to carry the torch from here on out, they're welcome to.

We want a data structure that holds an ordered set of keys, as a way to build composite CRDTs. The keys must have a total order.

This data structure should be like a B+tree, Prolly Tree, and Merkle Tree. It should have the following properties:

  • High-fanout on each node.
  • The tree is balanced and low in height to keep query complexity around O(log n)
  • Balancing the tree will change as little of the tree as possible upon insertion of keys.
  • The tree is unicit, where the order of insertion doesn't alter the state of the tree. The same ordered set of keys will yield the same tree.

The construction of the tree is as follows:

  • A leaf node contains entries that are an ordered subsequence of keys in the set.
  • A branch node contains entries of the content hash and the range of keys of every child node.
  • Regardless of type, the ideal node size is \(N\).
  • It's a Merkle tree, so each node is identified with a hash of its contents. A leaf node is a hash of its keys. A branch node is a hash of the concatenations of the child node hashes and ranges.

Balance with history independence

As with any tree, we want ours to be balanced, where the number of entries in each node should be about the same. Balanced trees are desirable for efficient queries. At the same time, we want unicity, where the same set of ordered keys must result in the same tree, regardless of insertion order. Unicit trees are desirable for Merkle Trees for fast equality comparisons by simply comparing root hashes.

In B+trees. nodes are balanced on insert, but the structure of the tree is not history-independent. Two B+trees could be different, even if they contain the same set of keys.

In Merkle Search Trees and Prolly Trees, nodes are probabilistically balanced. While the expected value of nodes is \(N\), the variance of node sizes is large. There is a variant of Prolly Trees with a tighter variance of node sizes, but it is not history-independent.

It turns out unicity is in conflict with balancing nodes. Could we use some of the same math in CRDTs to get a unique balanced tree regardless of the insertion order of keys?

To do that, we'll need:

  • An element type that describes how the ordered set of keys is divided between leaf nodes.
  • A partially ordered set of elements defined with a binary relation that orders two elements, \(A \leq B\)
  • A merge function that combines two elements into a third element that is closed over the element type.

The binary relation of a partially ordered set needs to have the following properties: Reflexive, Antisymmetric, and Transitive.

The merge function needs to have the following properties: Commutative, Associative, and Idempotent.

We'll define these properties later.

Multi-Split Description

How do we define the elements in the partially ordered set that can describe how keys are split between leaf nodes? We define Multi-split descriptions (MSD) that describe the splits.

  • A multi-split description can be represented as a sequence of key groups: \(A = [K_0^A, K_1^A, \ldots, K_i^A]\), where each key group \(K_j^A\) is an ordered subset of keys.
  • The key groups are ordered in the sense that if \(j<k\), then every key in \(K_j\)​ is less than every key in \(K_k\)​ according to the total order of keys.
  • Each key group \(K_j\)​ represents a set of keys that would reside together in a node if the tree were split at this point.
Enjoying what you're reading? Subscribe for more

Partially Ordered Set

To get a partially ordered set (poset) of MSDs, we'd have to define a binary relation \(\leq\) that orders two MSDs. Keep in mind that the keys in a key group of an MSD have a total order, but the MSDs themselves have a partial order.

The binary relation has two criteria: a primary criterion based on closeness/variance and a secondary criterion based on subsequences.

Primary Criterion: Closeness/Variance

Each MSD \(A = [K_0^A, K_1^A, \ldots, K_i^A]\) is evaluated based on how closely each key group \(K_i^A\)​ adheres to a target node size \(N\) and the variance in sizes across the key groups. The metric for closeness/variance is

$$M_A = \sum_{i=0}^{n} (\text{length}(K_i^A) - N)^2$$

where \(n\) is the total number of key groups in the multi-split description \(A\), \(K_i^A\) is the ith key group of MSD \(A\), and \(N\) is the target node size.

Secondary Criterion: Subsequences

There can be multiple MSDs with the same primary metric \(M\), so the secondary criterion serves as a tie-breaker and indication of partial order based on subsequences.  For two multi-split descriptions \(A\) and \(B\), \(A\) is considered less than or equal to \(B\) under this criterion if each key group in \(A\) (\(K_i^A\)), is a subsequence of the corresponding key group in \(B\), (\(K_i^B\)).

Binary Relation Definition

For two multi-split descriptions \(A\) and \(B\), we define \(A \leq B\) in the poset if:

  1. Primary: The closeness/variance measure \(M_A​ \geq M_B\). Notice the direction of the inequality is reversed. The closer \(M\) is to zero, the better balanced it is.
  2. Secondary: If two descriptions \(A\) and \(B\) have the same closeness/variance measure \(M_A = M_B\)​, then \(A \leq B\) if each key group \(K_i^A\) in \(A\) is a subsequence of the corresponding key group \(K_i^B\) in \(B\).

Required Properties of a Partially Ordered Set

  • Reflexivity: \(A \leq A\)
    Every multi-split description \(A\) is reflexive with itself since \(M_A = M_A\)​ and each key group \(K_i^A\)​ is trivially a subsequence of itself. Therefore, \(A \leq A\).
  • Antisymmetry: If \(A \leq B\) and \(B \leq A\), then \(A = B\).
    Assume \(A \leq B\) and \(B \leq A\). Hence, \(M_A \geq M_B\) and \(\text{for each } i, K_i^A \text{ is a subsequence of } K_i^B\). Also, \(M_B \geq M_A\) and \(\text{for each } i, K_i^B \text{ is a subsequence of } K_i^A\). From \(M_A \geq M_B\) and \(M_B \geq M_A\), we deduce that \(M_A = M_B\). And since each \(K_i^A\)​ is a subsequence of \(K_i^B\) and each \(K_i^B\) is a subsequence of \(K_i^A\), each pair of corresponding key groups must be identical. That is, for each \(i\), \(K_i^A = K_i^B\).
  • Transitivity: If \(A \leq B\) and \(B \leq C\), then \(A = C\)
    If \(A \leq B\) (meaning \(M_A \geq M_B\)​ and subsequences align) and \(B \leq C\) (meaning \(M_B \geq M_C\)​ and subsequences align), then \(M_A \geq M_C\)​ and the subsequence condition from \(A\) to \(C\) will also hold, thus \(A \leq C\). Therefore, If \(A \leq B\) and \(B \leq C\), then \(A \leq C\).

Merge function

Now that we have a partially ordered set (poset) of MSDs, we need to define a merge function. The merge function combines two MSDs and returns a third MSD that represents the Least Upper Bound of the two MSDs in the join-semilattice.

$$A \sqcup B = C$$

where \(A\) and \(B\) are MSDs that combine into the merged MSD \(C\).

An outline of this merge function is as follows:

  • For every key group \(K_i^A\) in \(A\), merge it with its corresponding key group \(K_i^B\) in \(B\). Recall each key group is a totally ordered set, and each key is unique, so they can be merged according to a set merge.
  • For the key group \(K_c^{A \sqcup B}\) that changed at index \(c\), due to the insertion or deletion of a key, we rebalance the splits between the key group and its neighbouring siblings. This has to be a function that only relies upon the changed key group \(K_{c}^{A \sqcup B}\), and its previous sibling \(K_{c-1}^{A \sqcup B}\) and next sibling \(K_{c+1}^{A \sqcup B}\).
  • After this rebalancing, we change the content hash references to the child node on the parent node recursively back up to the root node.

The rebalancing function must be a pure function of the three key groups under consideration. There is a list of possible "moves" to rebalance the three key groups:

  • Do nothing (leave \(K_{c}^{A \sqcup B}\) as is)
  • Move the leftmost key of \(K_{c}^{A \sqcup B}\) to \(K_{c-1}^{A \sqcup B}\)
  • Move the rightmost key of \(K_{c}^{A \sqcup B}\) to \(K_{c+1}^{A \sqcup B}\)
  • Borrow the rightmost key of \(K_{c-1}^{A \sqcup B}\) to \(K_{c}^{A \sqcup B}\)
  • Borrow the leftmost key of \(K_{c+1}^{A \sqcup B}\) to \(K_{c}^{A \sqcup B}\)
  • Split \(K_{c}^{A \sqcup B}\) into two nodes (at the middle index or median key value).

For each move, calculate the closeness metric.

$$M_{A \sqcup B} = \sum_{i=c-1}^{c+1} (\text{length}(K_i^{A \sqcup B}) - N)^2$$

where \(K_i^{A \sqcup B}\) denotes the original state of the key groups. Because the rest of the tree is unchanged, we don't need to take into account the contribution of the unchanged nodes to the closeness metric computation.

We also need to calculate the closeness metric of both A and B at the point of insertion before the merge.

$$M_{A} = \sum_{i=c-1}^{c+1} (\text{length}(K_i^{A}) - N)^2$$

$$M_{B} = \sum_{i=c-1}^{c+1} (\text{length}(K_i^{B}) - N)^2$$

Given the closeness metrics, \(M_{A}\), \(M_{B}\), and all moves of \(M_{A \sqcup B}\), we pick the move with the max metric that's less than the metric of both A and B. This should give us the least upper bound of the merge.

$$\max { m \in M_{A \sqcup B} \mid m < M_{A} \text{ and } m < M_{B} }$$

Remember, the smaller the metric, the greater the binary relation. So we want the maximum metric (least) with a metric less than (upper bound) both A and B.

In the case of a split, we have a special case. We need to rebalance the parent node and repeat this calculation, recursively up to the root node, or until we get to an ancestor branch node that doesn't need to rebalance.

In the case of a tie, we choose the move where every key group \(K_i^{A \sqcup B}\) is a supersequence of the key group \(K_j^{A \sqcup B}\) from the other move.

We want to pick the move that gives us an MSD that's greater than the MSD of the other move per the binary relation of the poset as described above.

Required Properties of a Merge Function

This merge function has to satisfy three algebraic properties: commutativity, associativity, and idempotence. Objects that satisfy all three properties are called semilattices.

  • Commutativity: \(A \sqcup B = B \sqcup A\)
  • Associativity: \((A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)\)
  • Idempotence: \((A \sqcup B) \sqcup B = A \sqcup B\)

For all of these, it relies on properties of sets and a property of pure functions.

  1. Merging sets have all of the algebraic properties above.
  2. Pure functions return the same output given the same inputs.

The keys have a total order and are unique. Each key group is an ordered set. Therefore, a merged key group \(K_{i}^{A \sqcup B}\) is the identical regardless of merge order \(K_{i}^{A \sqcup B} = K_{i}^{B \sqcup A}\). The rebalance function is pure.

Therefore, given the same set of input key groups, the resulting key groups are the same. Hence, it does not matter the order the merge function is applied for commutivity and associativity. Idempotence relies on the idempotence of the set merge function–it doesn't matter how many times an element is inserted, the merged set will be the same.

Join semi-lattice

With both a partially ordered set of multi-split descriptions and an algebraic merge function, we can combine the two into a join semi-lattice that can maintain the same splits regardless of the insertion order.

This should give us a unicit tree where the same data gives us the same tree, making it easy to compare hashes as a Merkle tree.

Addendum: monotonically increasing state

So far, the formulation for this only addresses insertions. However, it should also work for deletions, but we'd need a different representation of a key group to maintain a monotonically increasing state. It should be enough to add tombstones: two sets for every key group–a set for addendums, and a set for deletions.

CRDTs often have to keep this monotonically increasing state around to retroactively sync any replicas in the future. We don't have that requirement. For single-threaded use, I think we can collapse the state after every reversal from insertion to deletion or vice versa. For multi-threaded use, we can keep the state around while there are concurrent edits, but after all changes are committed, we can probably compress that state.

I designed this with the help of GPT. It probably warrants a blog post at some point. But in the meantime, you can see the transcripts here:

And the resulting incomplete Isabelle code.