Lab notes #009: Concatenation and Append
Took a bit of a break, but have been back at it. The most involved part of implementing the relaxed radix balanced trie is the concatenation. I had to consult three different papers to piece together the details, but in the end, I found L'Orange's paper to be the most helpful.
Some details that stuck out during implementation:
- The size tables are cumulative sizes of each of a node's children. That implies a size table only counts elements in the node's own sub-tries, not across sibling sub-tries.
- The RRBT can have a variable branch factor, usually set to 5 for 2^5 (32) children per node for optimal performance. During testing, however, it helped to set the branch factor to 2, so the children are more easily enumerated.
- Most RRBT concatenation descriptions don't account for concatenating tries of different heights. The L'Orange paper does address this: if the left trie is too tall, shorten it by traversing down its rightmost child; if the right trie is too tall, shorten it by traversing down its leftmost child.
- It's apparently more computationally efficient to make a concatenation plan before moving nodes around.
- Executing on the concatenation plan also required writing iterators to traverse all the grandchildren of a node. Because Rust discourages doubly linked lists (so there are no parent back-pointers), iterating across the grandchildren means simply moving to the next sibling node and iterating over its children.
- When calculating the concatenation plan, we don't try to fully pack every node. There's some leeway to keep some nodes partially vacant, hence the "relaxed" in "Relaxed Radix Balanced Tries". It was a little confusing how that leeway is calculated.

The optimal number of children is `ceil(total_num_elems / branch_factor)`. But we don't need to hit the optimal length every time. We can have some leeway, a number of nodes off from the optimal length. This is what the Bagwell paper calls `e_max`.
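As a sketch of the grandchild traversal, the sibling-to-sibling walk can be expressed with `flat_map`: visit each child in turn and yield that child's children, so no parent back-pointers are needed. The toy `Node` type and `grandchildren` function here are my own stand-ins, not the real implementation.

```rust
use std::rc::Rc;

// Hypothetical minimal node: the real RRBT node also carries a size table.
struct Node {
    children: Vec<Rc<Node>>,
    value: Option<u32>, // leaves carry a value in this toy sketch
}

// Iterate all grandchildren of `node` by walking each child in order and
// yielding that child's own children -- moving to the sibling node rather
// than following any back-pointer.
fn grandchildren(node: &Node) -> impl Iterator<Item = &Rc<Node>> + '_ {
    node.children.iter().flat_map(|child| child.children.iter())
}
```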
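The leeway check can be sketched like this, assuming a branch factor of 32 and an `e_max` of 2 (a value I'm assuming for illustration, not one taken from my implementation):

```rust
const M: usize = 32;     // branch factor (2^5)
const E_MAX: usize = 2;  // assumed slack; Bagwell's e_max

// Minimal number of nodes needed to hold `total_elems` slots:
// ceil(total_elems / M), computed with integer arithmetic.
fn optimal_len(total_elems: usize) -> usize {
    (total_elems + M - 1) / M
}

// A level holding `total_elems` slots across `num_nodes` nodes satisfies
// the relaxed invariant as long as it uses at most e_max nodes more than
// the optimal length; only then does the plan need to repack nodes.
fn needs_rebalance(total_elems: usize, num_nodes: usize) -> bool {
    num_nodes > optimal_len(total_elems) + E_MAX
}
```

So with 65 elements, the optimal length is 3 nodes, and up to 5 nodes are tolerated before the plan repacks anything.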
The reason concatenation is important is to implement `insertAt`, so that we can insert into the RRBT with effectively O(1) cost (it's actually some O(log n) cost, with a ceiling on the height of the trie). Once we have `insertAt`, we should be able to implement a persistent map.
However, there's been a setback. After finishing concat, I found that the implementation of append that I got from the Stucki paper wasn't correct: I had leaf nodes that weren't all at the same height, and the relaxation rules weren't built into it. So I decided to reimplement it with the L'Orange paper's description.
- Despite having the advantage of clarity, there was a typo in the L'Orange implementation. The method `copy_first_k()` takes a `k` parameter but never uses it in the code. From the implementation of `rrb-push`, I gathered it's supposed to be the number of full nodes below the deepest vacant internal node. Hence, the iteration in `copy_first_k()` should be `for h <- h(R) downto (h(R) - k + 1)`.
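A tiny illustration of those corrected loop bounds in Rust (the names `h_r` and `k` are mine, and this assumes `k <= h_r`):

```rust
// Heights visited by `for h <- h(R) downto (h(R) - k + 1)`:
// k levels, starting at the height of the right trie and descending.
fn copy_first_k_heights(h_r: usize, k: usize) -> Vec<usize> {
    ((h_r - k + 1)..=h_r).rev().collect()
}
```

For a right trie of height 4 and `k = 2`, this visits heights 4 and 3, i.e. exactly `k` levels rather than all of them.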
I spent much of my time just trying to understand and find the correct implementation description that makes sense, and fighting with the Rust type system and borrow checker. I'm less of a fan of the monadic types now, since when you have nested monadic types (such as my use of `Vec<Rc<Rrbt<V, B>>>`) it can be hard to keep track of what the type is after a `map` or an `and_then`.
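The bookkeeping looks something like this (toy `Node` type standing in for the real `Rrbt<V, B>`; the type at each step is noted in a comment):

```rust
use std::rc::Rc;

// Hypothetical stand-in for the trie node type.
struct Node(u32);

fn doubled_first(children: &[Rc<Node>]) -> Option<u32> {
    children
        .first()                        // Option<&Rc<Node>>
        .map(|rc| rc.0)                 // Option<u32>: map rewraps automatically
        .and_then(|v| v.checked_mul(2)) // Option<u32>: and_then flattens what
                                        // would otherwise be Option<Option<u32>>
}
```

Each combinator changes the wrapped type, and with several layers of `Vec`, `Rc`, and `Option` it's easy to lose track of which layer a `map` is operating on.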
The rest of the time will be spent on getting append working again, and then working on `splitAt`. Once I have that, I should be able to implement `insertAt`. But I imagine there will be pitfalls there, since `splitAt` only has a paltry description.
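To pin down what `insertAt` should compute once `splitAt` exists, here's a semantics-only sketch using plain slices. The trie versions of `splitAt` and concat would replace these O(n) copies with O(log n) node sharing; this just fixes the expected behavior.

```rust
// insertAt(i, x) as splitAt + singleton + concat, on a plain slice.
fn insert_at(v: &[u32], i: usize, x: u32) -> Vec<u32> {
    let (left, right) = v.split_at(i); // splitAt
    let mut out = left.to_vec();
    out.push(x);                       // the singleton element
    out.extend_from_slice(right);      // concat the right half back on
    out
}
```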