Hashing Streams

Cryptographic code can be CPU-intensive, so it pays to be mindful of how we use it. I've talked about hashing before; here is a quick reminder.

Hashing works on a stream of bytes, at least conceptually. In practice, most APIs let you take advantage of that streaming nature as well. Typical examples are hashing data as it arrives from the network or as it is read from disk.

Even though there are simple APIs that accept a single buffer to hash, these require you to concatenate everything into that buffer first. The code you write is often shorter, but the extra copying makes things less efficient. If you look around, you'll usually find other functions that let you feed the data in one buffer at a time, each of which might be a disk block or a network message.
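To make the trade-off concrete, here is a minimal sketch in Python's hashlib (picked only because it is compact; it is not one of the APIs discussed in this post). The function names and the sample blocks are made up for illustration. It shows that feeding buffers one at a time yields the same digest as concatenating them first, without the extra copy.

```python
import hashlib

def hash_all_at_once(buffers):
    # Concatenates every buffer first (an extra copy), then hashes once.
    return hashlib.sha256(b"".join(buffers)).hexdigest()

def hash_streaming(buffers):
    # Feeds each buffer to the hash object as it arrives; nothing is concatenated.
    h = hashlib.sha256()
    for buf in buffers:
        h.update(buf)
    return h.hexdigest()

# Hypothetical data standing in for disk blocks or network messages.
blocks = [b"first block, ", b"second block, ", b"last block"]
assert hash_all_at_once(blocks) == hash_streaming(blocks)
```

The streaming version only ever holds one block in memory, which is what makes it a good fit for large files and long-lived connections.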

In .NET, ComputeHash is the all-at-once function. It has a few overloads: one taking a whole byte array, one that selects a range within a byte array (so you can pool byte arrays and slice them via offset and count, for example), and one taking a Stream (which, of course, consumes the stream in the process). The streaming functions are TransformBlock and TransformFinalBlock; once the final block has been processed, the result is available from the Hash property.
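The same two shapes, hashing a slice of a pooled buffer and hashing a stream until it runs dry, can be sketched in Python as a rough analogue. This is not the .NET API; the helper names, the buffer contents, and the chunk size are assumptions for illustration only.

```python
import hashlib
import io

def hash_slice(buffer, offset, count):
    # memoryview slices the underlying array without copying the bytes.
    return hashlib.sha256(memoryview(buffer)[offset:offset + count]).hexdigest()

def hash_stream(stream, chunk_size=4096):
    # Reads until the stream is exhausted, so the caller cannot re-read it
    # without seeking back (if the stream supports seeking at all).
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()

pooled = bytearray(b"....payload....")  # hypothetical pooled buffer with padding
stream = io.BytesIO(b"payload")
assert hash_slice(pooled, 4, 7) == hash_stream(stream)
assert stream.read() == b""  # the stream has been consumed
```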

The documentation comments on the Transform... methods are a bit confusing because they say that values are 'copied' to the output byte array, but that's an accidental byproduct of one of the interfaces the hash object implements (ICryptoTransform). Just use the same byte array for input and output. See this StackOverflow page for a discussion.

In Windows C++, BCryptHash is the all-at-once function, and BCryptHashData and BCryptFinishHash are the streaming ones. BCryptHash also wraps creation and destruction of the hash object, which you otherwise manage yourself with BCryptCreateHash and BCryptDestroyHash.

Happy hashing!

Tags: coding, cpp, dotnet