Archive for September, 2020

I think I have invented a new data structure, the stree.

Saturday, September 5th, 2020

It’s hard to imagine there is a data structure that hasn’t been invented yet. wikipedia around and you’ll find dozens of oddly named trees and graphs and heaps and so on. All the great ones were invented or discovered (however you see it) in the 60s, and here we are 50-60 years later, you think that’d all be done by now.

 

So here, on 9/5/2020 I offer up the Stree.

It’s rather simple which is why I’m guessing somebody’s already done it at least in some form.

It combines a binary search tree with J. W. J. Williams’ binary heap.

The binary heap is a really neat data structure because it is tightly packed (‘complete binary tree’) and sorts pretty quickly, and you can find the child or parent of a node with simple math, if it is stored in a linear array of memory. Genius and brilliant at the same time.

As with all old things people have improved on it (Floyd came up with a faster build-the-initial-tree scheme) and I hope somebody can improve upon my idea, if in fact there isn’t one like it already.

I was looking for a tightly packed binary search tree and I couldn’t find one. So I thought about it for a bit, kinda wondering why it didn’t exist and cameĀ  up with this:

The binary heap makes the tightly packed tree by always only adding a new node at the end of the array of memory, and swapping nodes around until the tree is ‘correct’. In the case of the binary heap, correct is the parent key is greater than the child key. Doing a postorderĀ  traversal of the tree will yield a sorted list (for a max heap tree).

The problem with post order traversal is that you can’t binary search it, or at least I can’t because the tree is sorta turned 90 degrees, and you can’t search sideways.

So how do we make it a binary search tree?

Kinda the same thing the binary heap does, but instead of swapping nodes on insert until the tree is max-heap correct, you swap nodes until it is binary search tree correct.

This is not as simple as making a max heap correct tree, but it’s not that much more complicated.

Since a binary heap is lopsided to favor bigger numbers as you go up the tree, you only have to go up once to find the final position of whatever number you are inserting.

A binary tree has one of its middle values at the top, so when inserting a new value into an stree, you have to swap values up to the top of the tree and then swap values back down (sometimes reswapping something you’ve already swapped) until you find the correct final resting place for the newly inserted value.

There are some other problems, and there are optimizations to be had, I haven’t worked out every single case yet, but I think it’s workable.

What happens, is like a binary heap, you insert by adding a new value at the next available space at the bottom/end leaf of the tree, or make a new row if the bottom row of the tree is full, and then start swapping nodes up until you get to the top. If you’ve found your final resting place (by comparing to the root node, based on what side you started on) you’re done, if not, you start working your way down the tree again until you’ve basically binary searched your way to where the new node is supposed to be.

It turns out that along the way, some of those swaps cause localized parts of the tree to become invalid, and then you have to go back and fix those too, but that only happens in the area of nodes that you’ve swapped, and you just keep track of things that need to be fixed as you’re doing all your initial swapping.

Worst case insert ends up being 2 * O(log n) plus a little bit more for some possible fixups.

I haven’t worked out node removal yet, but the concept is basically the same but backwards, you know which node you want to delete, and you know the spot in the tree that has to be freed up at the end, so you just have to swap up and down the tree until you get there.

I’m still working on this, but this is my basic idea, apparently data structures are not patentable, so I’m marking my stone in the sand here, as having come up with a usefully neat compact, binary search tree data structure.

Gotta go, my kid just woke up.