Tuesday, April 1, 2008

Safe lists with iterators - useless musings about the data structures

While thinking about making the cubes persistent, and keeping in mind my bug within the list code, I came to a conclusion that to have any sort of iterators, the pointer/malloc based lists will be more a nuissance than help, so I have to come up with something that would encourage a more productive programming pattern.

I liked the feeling I got while coding the allocator for the timer structures, so decided to reuse this "generation+index" model once more - this time for lists.

How ? Each list essentially is managed as a resizable array of element structures. Each element is either allocated or is free. If it is free - then it belongs to the free list, and further discussion about working with it is quite boring - it is a simple unidirectional list, with the new elements grabbed from its head, and the newly freed elements being put onto its tail. But it is much more interesting to think about the allocated entries.

Each entry has a lock count - which greatly simplifies the iterator creation - simply increment the lock counter, and return the index (and element generation, of course). Then the "get_next" iterator will retrieve the "next" index from this list item, decrement the lock count, and it if is zero - then return the element to the free list.

There's only one "but" in all this: the explicit deletion of the element.

Suppose we have the list:

A->B->C->D

we start walking the list, and end up at B - whose index is stored elsewhere - so the picture will look like this: (the number of brackets is the number of locks):

(A)->((B))->(C)->(D)

Now we need to "delete" the item B explicitly and requeue it to the end of the list... So the structure now looks like:

(A)->(C)->(D)->(B')
(B)->(C)


Theoretically all is great - however, now assume the worst-case scenario that we also need to delete and requeue "C"... so we end up with this situation:

(A)->(D)->(B')->(C')
(B)->C

The "C" is now freed - so technically speaking it is undefined - so the get_next iterator will fail...

The way out of this might be to have the locks done only for the iterators use, and check the "previous" item's lock in case of deletion. Then the whole exercise will be done as follows:

1) after the first yield:

A->(B)->C->D

2) after the deletion of B:

A ->C->D->B'
(B)->C

3) now when we attempt to delete and requeue C, on the picture it is all nice... we need to check if there exists an "iterator-locked something" - like B in this case. Except the only small problem - C only has one "prev" index - so the only thing it knows about is A...

Ok, take 2 - we lock "this" and "next" items for the purposes of iterator, so the picture is:

1) after the first yield:

A->(B)->(C)->D

2) after the deletion of B:

A ->(C)->D->B'
(B)->(C)

3) after the deletion of C:

A->D->B'->C'
(B)->(C)->D

All looks great, except now we also would like to delete "D" - which brings us to the problem in the previous example!

However, I think there is still a way out.

Upon the request to delete the "iterator-locked" node, the next node also needs to be "iterator-locked":

1) after the first yield:

A->(B)->C->D

2) delete the B:

A->(C)->D->B'
(B)->(C)->D

3) delete the C:

A->(D)->B'->C'
(B)->(C)->(D)


Intuitively, this approach should prevent the need to find back "the nodes who refer to the node being deleted". Although, of course if has a big drawback - the iterator will need to walk all of the deleted nodes - with allegedly much more recent versions available at the end of the list - so these should be skipped. Which, in case of a large interval between the executions of the iterator and the massive updates to the list, will cause an excessive CPU cycles while skipping the empty elements. Which could probably be ok - I just might need to add a special case - "tried to skip the dead elements, but too much junk in the way - try again later".

No comments: