In-Memory Multiversion Data Structures

Today's post is a continuation of sorts of my recent In-Memory Optimistic Concurrency Control post.

The core idea for multiversioning is that you're managing multiple versions of data, usually at the row/record/object level, and different concurrent transactions/operations get to see a version that gives them a consistent view. This lets you reduce the blocking needed to maintain consistency and thus increase concurrency in many practical cases.
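
A minimal sketch of that idea in Python (all names invented for illustration): each key keeps a small history of versions stamped by a counter, and a reader only sees the newest version at or below the snapshot it started with.

```python
# Minimal sketch of per-key version visibility (illustrative names only).
# Each write appends a (version, value) pair; a reader opened at snapshot S
# sees the newest value whose version is <= S.

class VersionedMap:
    def __init__(self):
        self._versions = {}   # key -> list of (version, value), oldest first
        self._clock = 0       # monotonically increasing version counter

    def begin_snapshot(self):
        """Return the snapshot a new reader should use."""
        return self._clock

    def write(self, key, value):
        """Install a new version of `key`."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))

    def read(self, key, snapshot):
        """Return the value of `key` as of `snapshot`, or None if not visible."""
        for version, value in reversed(self._versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

m = VersionedMap()
m.write("x", "old")
snap = m.begin_snapshot()                        # a reader starts here
m.write("x", "new")                              # a concurrent writer moves on
assert m.read("x", snap) == "old"                # the reader keeps its consistent view
assert m.read("x", m.begin_snapshot()) == "new"
```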

Multiversioning in Databases

A not-so-very technical discussion of Multi Version Concurrency Control for Firebird is a good practical explanation of the concept. I was first introduced to this on InterBase, somewhere around version 4 or 5 I think, and it was quite mind-opening to think about data this way, especially because I was used to the more classical approach that Microsoft SQL Server took.

Speaking of which, Inside Microsoft SQL Server 7 was an excellent resource for learning how a lot of the subsystems, algorithms and data structures were implemented in that more classical design. But that's really a subject for a future post - on with how we make things work in memory!

In-Memory Data Structures

There are a handful of things you have to consider and implement when you're using this sort of scheme.

Data Layout

For an in-memory implementation, the link between older and newer data is usually made by embedding pointers (or possibly by moving data around if you want to pack things more tightly). You can keep the oldest data in-line and link forward to newer changes, or keep the most recent version in-line and link back to older versions.
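
Here is a rough sketch of the second layout in Python - the newest version stays in-line and links back to the versions it replaced; the types and names are just illustrative.

```python
# Sketch of a "newest in-line" layout: the record slot always holds the latest
# version, and each version points back to the one it replaced.

class Version:
    __slots__ = ("value", "created_at", "older")

    def __init__(self, value, created_at, older=None):
        self.value = value
        self.created_at = created_at   # version/transaction stamp that wrote it
        self.older = older             # link to the previous version, if any

class Record:
    def __init__(self, value, created_at):
        self.latest = Version(value, created_at)

    def update(self, value, created_at):
        # Push the current latest behind the new version.
        self.latest = Version(value, created_at, older=self.latest)

    def visible(self, snapshot):
        # Walk from newest to oldest until a version the snapshot can see.
        v = self.latest
        while v is not None and v.created_at > snapshot:
            v = v.older
        return v.value if v is not None else None

rec = Record("v1", created_at=1)
rec.update("v2", created_at=5)
assert rec.visible(3) == "v1"   # a snapshot taken before the update
assert rec.visible(5) == "v2"
```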

Data API

Traversing this data by hand everywhere is usually somewhat clumsy, so if you have a layer that provides data access for you, it's straightforward to simply pass your concurrency token / active transaction identifier into the access methods and have that layer take care of presenting the correct data for you. Alternatively, associate the token with a cursor/iterator, or with a 'multiple query'/'cursor factory'-type object that generates simpler objects wrapping the correct token to pass along.
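
For example, a cursor-factory style sketch (Python, made-up names) that captures the snapshot once so callers never have to thread the token through each read:

```python
# Sketch of an access layer that captures the transaction token once.
# The store keeps versioned values; a cursor wraps a snapshot so callers
# don't have to pass the token into every read.

class Cursor:
    def __init__(self, store, snapshot):
        self._store, self._snapshot = store, snapshot

    def get(self, key):
        return self._store.read(key, self._snapshot)

class Store:
    def __init__(self):
        self._versions = {}   # key -> list of (version, value), oldest first
        self._clock = 0

    def write(self, key, value):
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))

    def read(self, key, snapshot):
        for version, value in reversed(self._versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

    def open_cursor(self):
        # 'Cursor factory' style: wrap the current snapshot for the caller.
        return Cursor(self, self._clock)

store = Store()
store.write("balance", 100)
cur = store.open_cursor()     # token captured here, once
store.write("balance", 250)   # later writes don't disturb the cursor's view
assert cur.get("balance") == 100
```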

Garbage

One of the things to solve in this scheme, in addition to the data layout question and the API style question, is how to keep garbage from accumulating.

The first thing you need is a way to identify whether something is unreachable. In the database case, the system tracks the oldest active transaction; new transactions always start from the latest state, so that oldest transaction eventually moves forward and acts as a watermark. For an in-memory structure, you would similarly have readers and writers cooperate on maintaining the correct book-keeping.
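
A sketch of that book-keeping under the same illustrative shapes as above: readers register their snapshot when they start and deregister when they finish, and the watermark is simply the oldest snapshot still active.

```python
# Sketch of watermark book-keeping: the watermark is the oldest snapshot still
# in use, and versions superseded at or before it can no longer be reached.

class SnapshotTracker:
    def __init__(self):
        self._active = {}    # reader id -> snapshot it is reading at
        self._next_id = 0

    def begin(self, snapshot):
        self._next_id += 1
        self._active[self._next_id] = snapshot
        return self._next_id

    def end(self, reader_id):
        del self._active[reader_id]

    def watermark(self, current_version):
        # If nobody is reading, everything up to the current version is safe.
        return min(self._active.values(), default=current_version)

tracker = SnapshotTracker()
r1 = tracker.begin(snapshot=4)
r2 = tracker.begin(snapshot=9)
assert tracker.watermark(current_version=12) == 4   # oldest reader pins cleanup
tracker.end(r1)
assert tracker.watermark(current_version=12) == 9
```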

Once you have that, there are a couple of approaches to when you do this work. One is to do it in a batch or sweep, where you pick the 'right moment' to go scan for older information and clean it up. The other is to do it 'on the go', by having readers and/or writers clean up older versions when they find a record or object that isn't reachable.

The former might make the reading/writing more predictable because they do less work, but if you've taken the trouble to pack your data densely, it's kind of nice to leverage the locality and clean up as you go. But of course for an in-memory implementation, your locality costs will be different from those of a storage-based system.

You can also combine both approaches. If I'm not mistaken, that's what InterBase/Firebird does - collect as it goes, and schedule a sweep every so many transactions (done via readers that simply scan every table).
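
Both styles might look something like this over the same key-to-version-list shape; the pruning rule is that a version is garbage once a newer version also sits at or below the watermark, since no active snapshot can still reach it. (A sketch with invented names, not a drop-in implementation.)

```python
# Sketch of the two cleanup styles for a {key: [(version, value), ...]} store
# (oldest first). prune_chain keeps the newest version at or below the
# watermark plus everything newer; older versions are unreachable.

def prune_chain(versions, watermark):
    keep_from = 0
    for i, (stamp, _) in enumerate(versions):
        if stamp <= watermark:
            keep_from = i          # newest version still visible at the watermark
    return versions[keep_from:]

def sweep(store, watermark):
    """Batch style: pick the 'right moment' and clean everything up."""
    for key, versions in store.items():
        store[key] = prune_chain(versions, watermark)

def read(store, key, snapshot, watermark=None):
    """'On the go' style: a reader prunes the chain it is already touching."""
    versions = store.get(key, [])
    if versions and watermark is not None:
        versions = store[key] = prune_chain(versions, watermark)
    for stamp, value in reversed(versions):
        if stamp <= snapshot:
            return value
    return None

store = {"x": [(1, "a"), (3, "b"), (7, "c")]}
sweep(store, watermark=5)
assert store["x"] == [(3, "b"), (7, "c")]   # version 1 was no longer reachable
```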

Conflicts

Another thing to consider is whether you have a strict split between readers and writers - which is easier to do with specific use cases where you know your workloads - or whether you don't know at the start what kind of transaction you have - as would happen in a database that's just working through SQL statements.

In the latter case, you might need to 'upgrade' your readers into writers, at which point you might need to deal with all the usual writer-writer conflicts. The former case is, of course, much simpler, especially if you have a single writer and you can avoid the problem altogether.
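
A rough sketch of that conflict check (writes are applied immediately for brevity; a real system would buffer them until commit): a transaction that finds a version newer than its snapshot on the record it wants to write has hit the writer-writer case and has to abort or retry.

```python
# Sketch of first-writer-wins conflict detection when a reader upgrades to a
# writer. Names are illustrative; writes commit immediately for brevity.

class ConflictError(Exception):
    pass

class Store:
    def __init__(self):
        self.versions = {}   # key -> list of (version, value), oldest first
        self.clock = 0

class Txn:
    def __init__(self, store):
        self.store = store
        self.snapshot = store.clock   # the consistent view this transaction sees

    def read(self, key):
        for stamp, value in reversed(self.store.versions.get(key, [])):
            if stamp <= self.snapshot:
                return value
        return None

    def write(self, key, value):
        chain = self.store.versions.get(key, [])
        if chain and chain[-1][0] > self.snapshot:
            # Someone installed a newer version after our snapshot: the usual
            # writer-writer conflict; abort or retry with a fresh snapshot.
            raise ConflictError(f"conflict on {key!r}")
        self.store.clock += 1
        chain.append((self.store.clock, value))
        self.store.versions[key] = chain

store = Store()
t1, t2 = Txn(store), Txn(store)
t1.write("x", 1)            # fine: nothing newer than t1's snapshot
try:
    t2.write("x", 2)        # t2's snapshot predates t1's write -> conflict
except ConflictError:
    pass
```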

Use Cases

Well, databases are of course the classic case - you want to run some longer reporting queries without blocking writers - but those are not necessarily in-memory.

In a text or code editor, you can have a single writer respond to keystrokes and update spans without blocking other threads that could be doing rendering, parsing or spellchecking.

In a game or simulation, you can update the next tick or frame without affecting workloads like rendering or game AI that might be running concurrently or on a more frequent cadence. I've seen designs with two arenas for this as well, but if there are relatively few updates compared to all the data you'd be moving around, the extra cost of multiversioning your access can be lower than copying everything every frame.

Immutable Alternatives

An alternative design I've seen pop up more recently is having immutable data structures that get re-used/re-referenced on update.

One example of this design is .NET LINQ expression nodes, where tree rewrites that occur during transformations - say during an analysis or transformation pass on your way to producing a SQL string - end up creating a new tree by reusing parts of the old one and rewriting bottom-up. This is straightforward because nodes carry no parent references, so you can reuse subtrees easily.
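
The shape of the idea, sketched in Python rather than the actual LINQ/OData code: a bottom-up rewrite returns new nodes only where something changed, so untouched subtrees end up shared between the old and new trees.

```python
# Sketch of an immutable-tree rewrite (illustrative, not the real LINQ/OData
# visitors): nodes are never mutated; a pass rebuilds bottom-up and reuses
# any subtree it didn't change.

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    kind: str
    children: Tuple["Node", ...] = ()

def rewrite(node, transform):
    """Apply `transform` bottom-up, sharing unchanged subtrees."""
    new_children = tuple(rewrite(c, transform) for c in node.children)
    if new_children != node.children:
        node = Node(node.kind, new_children)
    return transform(node)

def normalize(node):
    # Example pass: turn 'neq' into 'not(eq)' and leave everything else alone.
    if node.kind == "neq":
        return Node("not", (Node("eq", node.children),))
    return node

tree = Node("and", (
    Node("neq", (Node("x"), Node("y"))),
    Node("gt", (Node("a"), Node("b"))),
))
new_tree = rewrite(tree, normalize)
assert new_tree.children[0].kind == "not"          # rewritten subtree
assert new_tree.children[1] is tree.children[1]    # untouched subtree is shared
```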

To look at some concrete examples, ExpressionNormalizer and ProjectionRewriter are examples of rewriting visitors in the OData library I worked on a very long time ago. They visit a LINQ expression tree and touch it up by rebuilding some subtrees and passing others through.

Another similar example, with some additional machinery, is the Roslyn syntax and semantic trees. Here are some good sources covering their approach and uses.

Finally, a really long time ago, I implemented a system that used immutable data with relational storage: updates came in weekly batches, and data access could simply apply deltas in layers by querying the most recent data set first and "tunneling through" to older data sets.
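
That lookup is easy to sketch (the shapes here are invented; the real system was relational): each batch becomes a new read-only layer, and a read walks from the newest layer back until it finds the key.

```python
# Sketch of layered immutable data sets: each weekly batch becomes a new
# read-only layer; lookups start at the most recent layer and "tunnel through"
# to older ones. Deletions would be recorded as tombstones.

TOMBSTONE = object()

class LayeredStore:
    def __init__(self):
        self._layers = []    # newest last; each layer is treated as immutable

    def apply_batch(self, changes):
        """Add a weekly batch (a dict of key -> value or TOMBSTONE) as a layer."""
        self._layers.append(dict(changes))

    def get(self, key, default=None):
        for layer in reversed(self._layers):        # newest first
            if key in layer:
                value = layer[key]
                return default if value is TOMBSTONE else value
        return default

store = LayeredStore()
store.apply_batch({"alice": 10, "bob": 20})
store.apply_batch({"bob": 25, "carol": 30})         # next week's delta
assert store.get("bob") == 25       # found in the newest layer
assert store.get("alice") == 10     # tunneled through to the older layer
```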

Happy data versioning!

Tags: design, performance
