In-Memory ISAM Data Structures

Indexed Sequential Access Method or ISAM is a design for data management well suited for a mix of access patterns including multiple criteria for ordered access, lookups and updates.

Today's post was originally going to be quite long, but eventually I remembered the excellent Data-Oriented Design Book, which covers many of the ideas and approaches.

Databases

I learned a lot of the basics working with old ISAM databases like Paradox and Access.

Paradox was very easy to understand and program with, and the Borland Database Engine and its integration with Delphi allowed me to be quite ridiculously productive at data-oriented applications back in the late nineties.

Much later, I'd become familiar with Microsoft's Extensible Storage Engine, which powered things like Exchange Server as well being available on various Microsoft products. This is a much more feature-rich, but it's still a query-less, record-oriented data layer.

ISAM-style data organization is useful as a storage layer for databases such as SQL Server, which can be thought of as a higher-level access/query engine, doing things like query plans, procedures and triggers, and a lower-level storage engine that manages I/O. I'm going to plug in Inside Microsoft SQL Server 7 just like last time, because again it's a great resource for this sort of architecture.

Note that these things co-evolve in practice, however, and things like locking or resource management often end up somewhat coupled. If you look at it closely, however, you can see the lines in something like OLE DB, where accessing resources by names, knowing about indexes and selectivity allow you to kind-of build higher level things like query engines on top. That's how SQL Server can do things like linked server that are OLE DB data sources after all.

To get a sense of the operators that you might want to see in the higher and lower level layers, you can take a look at the Operator Descriptions for logical and physical operators used in queries.

In-Memory Data Structures

There are two good reasons to borrow design and to some degree implementation ideas from these old classic techonologies.

For design and code health, we have many decades of experience in terms of organizing data, separating data and algorithms, ability to reason about system behavior and difficulty in different patterns of data transformation, etc.
For performance, the care with which these classic engines take in accessing memory vs. accessing disk is useful to apply to other tiered access mechanisms, like cache lines vs. main memory or main memory vs. networked access.

In fact, I've developed client applications that embed sqlite simply for the convenience of the discipline of structuring data, having thought-out access methods, and of course the quite excellent performance and ability to support transactions and evolve things with indexes.

If you're going to be creating your own data structures and access methods - and truly this is a perfectly fine thing to do for scoped components - other considerations for multiple access patterns are borrowing ideas from index design, space management and cases like large row support, and harking back to last month's post, how to build row versioning if your workloads warrant the additional complexity.

More References

The data oriented book is very thorough, but for a shorter treatment, Data Locality in Game Programming Patterns is also very approachable.

Happy in-memory data management!

Tags: design perf

Home