In-Memory Optimistic Concurrency Control

Optimistic Concurrency Control

I wrote about optimistic and pessimistic concurrency control a long time ago; the examples below are how I still remember which is which.

In databases, this can be supported on a table through a dedicated column, e.g. ROWVERSION in MS SQL Server, through a hidden behind-the-scenes version column, or by hashing all of the row's values.

In HTTP, this manifests as entity tags (ETags), which can be used in RESTful architectures.

In-Memory Data Structures

Can we use this even outside of storage systems? You better believe we can!

Let's call the value we use for optimistic locking the concurrency token to avoid conflating it with other things.

For an in-memory data structure, this could be as simple as keeping a sequence number for an object, and then having other systems keep that number around as the concurrency token.
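As a minimal sketch (the names here are illustrative, and I'm ignoring the synchronization a multi-threaded caller would still need around the structure itself), that could look something like this in C++:

    #include <cstdint>
    #include <string>
    #include <utility>

    // A value guarded by a sequence number that acts as the concurrency token.
    // Readers capture the token alongside the value; writers present the token
    // they read, and the write is rejected if the object changed in the meantime.
    struct VersionedText {
        std::string text;
        std::uint64_t version = 0;

        // Read the current value together with its concurrency token.
        std::pair<std::string, std::uint64_t> read() const {
            return {text, version};
        }

        // Attempt a write; succeeds only if the caller's token is still current.
        bool try_write(std::string new_text, std::uint64_t expected_version) {
            if (expected_version != version) {
                return false;  // someone else updated the object first
            }
            text = std::move(new_text);
            ++version;
            return true;
        }
    };

A caller that gets false back re-reads, reapplies its change on top of the fresh value, and retries (or surfaces the conflict).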

If a single value is too coarse, then you can keep multiple ones to cover different subsets. For example, you might have a value that covers the text content in a text editor, and another one that covers the selection range. This way, updating the selection doesn't invalidate the underlying text. In practice, text selection maintenance is trickier, mind you.
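A sketch of that split, again with made-up names:

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <utility>

    // Separate tokens for independent parts of the same object, so changing
    // the selection does not invalidate an edit pending against the text.
    struct EditorState {
        std::string text;
        std::uint64_t text_version = 0;

        std::size_t selection_begin = 0, selection_end = 0;
        std::uint64_t selection_version = 0;

        void set_selection(std::size_t begin, std::size_t end) {
            selection_begin = begin;
            selection_end = end;
            ++selection_version;  // text_version is untouched
        }

        bool try_set_text(std::string new_text, std::uint64_t expected_text_version) {
            if (expected_text_version != text_version) return false;
            text = std::move(new_text);
            ++text_version;
            // A real editor would also remap or clamp the selection here.
            return true;
        }
    };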

Another option is to use actual values with semantic meaning rather than simple counters, if that's appropriate for your use case. Sometimes the concurrency token ends up being the primary key or identifier for the object, like the file name for some piece of content, and all you'll really need to test is existence.

Yet another one is to combine the version information with another value when building the concurrency token.
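For example (purely illustrative), the token can carry both the object's identity and its version, so a token can never be accidentally valid against a different object:

    #include <cstdint>
    #include <string>

    // Build the concurrency token from the object's identity plus its version.
    std::string make_token(std::uint64_t object_id, std::uint64_t version) {
        return std::to_string(object_id) + ":" + std::to_string(version);
    }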

Finally, consider that in hierarchies or networks, you can propagate the concurrency tokens to implement things like a 'table lock' or a 'range lock'.
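One hedged sketch of that propagation: bump a parent token whenever any child changes, so holding the parent's token lets a caller validate that nothing under it moved (the coarse, table-lock-like check), while each child still carries its own finer-grained token.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Row {
        std::uint64_t version = 0;  // fine-grained token, like a row lock
    };

    struct Table {
        std::uint64_t version = 0;  // coarse token, like a table lock
        std::vector<Row> rows;

        void touch_row(std::size_t i) {
            ++rows[i].version;  // invalidates tokens held against this row
            ++version;          // invalidates tokens held against the whole table
        }
    };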

Scenarios

One scenario where I've used this in practice was connection management / pairing with a consumer device over a UDP (unreliable) channel. We kept an in-memory table of connections, and each incoming request, outgoing request, or in-memory API call would include a peer id that was essentially an index into that table, with extra bits to version what that index was representing at the time.

This allowed immediate access to any information related to the peer (via the index), let messages and other software components run asynchronously, and made sure we could detect and handle stale requests for devices that had disconnected.
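I don't have that code to show, but the shape of it was roughly the following; the names, bit layout, and linear slot search are all made up for illustration:

    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    // A connection table addressed by a peer id that packs a slot index with a
    // generation counter. Reusing a slot bumps the generation, so ids that
    // referred to the old connection stop resolving instead of silently
    // pointing at the new peer.
    struct PeerSlot {
        std::uint32_t generation = 0;
        bool in_use = false;
        // ... per-connection state would live here ...
    };

    struct PeerTable {
        std::vector<PeerSlot> slots;

        explicit PeerTable(std::size_t capacity) : slots(capacity) {}

        static std::uint64_t make_id(std::uint32_t index, std::uint32_t generation) {
            return (static_cast<std::uint64_t>(generation) << 32) | index;
        }

        // Claim a free slot for a new connection and hand back its packed id.
        std::optional<std::uint64_t> connect() {
            for (std::uint32_t i = 0; i < slots.size(); ++i) {
                if (!slots[i].in_use) {
                    slots[i].in_use = true;
                    return make_id(i, slots[i].generation);
                }
            }
            return std::nullopt;  // table full
        }

        // Resolve an id to a slot only if it still refers to the same connection.
        PeerSlot* resolve(std::uint64_t peer_id) {
            const auto index = static_cast<std::uint32_t>(peer_id);
            const auto generation = static_cast<std::uint32_t>(peer_id >> 32);
            if (index >= slots.size()) return nullptr;
            PeerSlot& slot = slots[index];
            if (!slot.in_use || slot.generation != generation) return nullptr;  // stale or unknown id
            return &slot;
        }

        // Free the slot and bump its generation so outstanding ids go stale.
        void disconnect(std::uint64_t peer_id) {
            if (PeerSlot* slot = resolve(peer_id)) {
                slot->in_use = false;
                ++slot->generation;
            }
        }
    };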

As an extra note - at trust boundaries like the network or inter-process communication, this scheme required a cryptographic check so the identifiers couldn't be spoofed in network traffic, and a process/identifier visibility check for IPC.

In this case, we're using a 'row access' approach to manage a state machine. This pattern also shows up in the 'entity' management portion of an entity component system, where an entity ID acts as a handle when it encodes this kind of version information.

Speaking of state machines, if you have competing processes (rather than the simpler queued approach I've shown before), optimistic concurrency control again lets them work independently without losing data (and by "losing data", think more broadly about maintaining ACID-style properties).
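In C++ terms, for competing threads sharing memory, the in-memory version of "check the token, then write" is essentially a compare-and-swap; a minimal sketch, assuming the state fits in an atomic:

    #include <atomic>
    #include <cstdint>

    enum class State : std::uint8_t { Idle, Running, Done };

    // Competing writers use compare_exchange as the optimistic check: the
    // transition commits only if nobody else moved the state machine in the
    // meantime; otherwise the caller re-reads and decides whether to retry.
    bool try_advance(std::atomic<State>& state, State from, State to) {
        State expected = from;
        return state.compare_exchange_strong(expected, to);
    }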

Alternatives

Well, you have pessimistic locking, which in an in-memory context would most likely translate to a mutex somewhere (a simple embedded one, or a more sophisticated scheme to cover ranges or hierarchies as I mentioned before).

A different approach altogether would be to use something like mergeable data structures or even conflict-free replicated data types (CRDTs), but I have yet to find a need for that in my experience - I have always gotten by with something simpler.

Happy concurrency control!

Tags: design, performance
