Arc Forumnew | comments | leaders | submitlogin
2 points by Pauan 4508 days ago | link | parent

I would first like to give a very big thank you to waterhouse for their post on AVL trees[1]. I had been looking at a variety of trees (AA, finger, scapegoat, and splay, mostly) but had run into some difficulty.

You see, most information on trees assumes that they're going to be mutable. It seems good information on immutable trees is a bit hard to come by, at least in my experience.

So, even though I hadn't liked AVL trees at first, I really warmed up to them. Waterhouse's post demonstrates a very simple way to create persistent immutable AVL trees, which is exactly the kind of thing I was looking for.

I was able to easily copy the algorithm mostly word-for-word into Python, and here's the results. It supports:

* Dense lists (sorted by index, can't have empty spots in the middle of the list[2])

* Dictionaries which map keys to values

* Sets which are dictionaries with only the keys, no values

If you want a sparse list, you can just use a dictionary with numbers as the keys.

---

All three data structures have:

* O(1) time to retrieve the number of elements (the length)

* O(n) time to iterate through all the elements, which is the same as a linked list (though there's additional overhead which makes it slower)

* O(n) time to iterate through all the elements in reverse order[3]

* O(log2(n)) lookup and (immutable) insertion

* Not sure about the performance of deletion... I'd expect it to be roughly O(log2(n)) but it doesn't matter because deleting should be rare

* Ability to get/insert/delete items by index (this may change, though)

---

This is part of my desire to get rid of cons and use some form of binary tree for everything in Nulan, and it's largely thanks to waterhouse that it became a reality.

I have tested waterhouse's algorithm to verify that it works: it seems solid enough to me, but I have no clue how fast it is or how good the rebalancing is. Still, it's a very solid start, which I can improve upon over time. That's what's most important at this stage: just doing the simplest thing that can possibly work.

---

* [1]: http://www.arclanguage.org/item?id=14181

* [2]: Linked lists can't have empty spots in the middle either.

* [3]: Just commited: https://github.com/Pauan/nulan/blob/master/nu_tree.py



1 point by akkartik 4508 days ago | link

AVL trees are just binary trees. Did you really mean http://en.wikipedia.org/wiki/B-tree?

-----

2 points by Pauan 4508 days ago | link

My mistake, I said B-tree when I meant "binary tree"[1]. It has been corrected.

AVL is a particular kind of self-balancing binary search tree[2], which is what I want. Which kind of tree isn't so important, just so long as it's simple and fast enough.

---

* [1]: B-trees are a self-balancing n-ary search tree, and judging by the Wikipedia article, I probably won't use them in Nulan, at least not for a while.

* [2]: Not every binary tree is a search tree, and not every search tree is self-balancing.

I want all three properties because of the good worst-case performance and because I believe that it will be easier to parallelize if the trees are roughly balanced.

As for why I chose binary over n-ary, that's simply for the sake of simplicity. Since it'll be a core datatype, I don't want the programmer API being too complicated.

-----