Post-mortem logging

See this post for a quick introduction to post-mortem debugging, and this other post for ideas on how to use global variables to help debug issues.

First off, there are a handful of things that we can consider under the notion of logging. I'm going to cover a couple of them here then wrap up with some of the common principles. I'm using C++ and Windows APIs in my examples, but the principles apply regardless of language or operating system.

Free-form logging

Let's start with the most common one, the venerable free-form text logging, basis of printf debugging.

A common way to do this is to keep a buffer at a high-level object if possible, and direct printf-like statement to it.

The buffer can be a circular buffer that is kept in memory, so it gets dumped out with the rest of the process if a fault occurs and a minidump is generated. When the time comes to investigate, you'll have a buffer to go look at and see all your logs traced out.

This is often complemented by having more than one target for your logs, for example logging the same messages to a file or a remote location. The in-memory log has the advantage that it will have messages that might not have been flushed, and you also don't have to go looking around for the log files (or you may not have them available at all depending on how the minidump and other files are managed).

For string formatting, I'm usually a big fan of the strsafe.h functions. I particular, StringCchPrintfA can be used to format safely, and if you have a wrapper function that does extra work like add thread ids or check log levels, StringCchVPrintfA will take your va_list.

Actually, the header itself is a good reference if you need to look up things like SAL annotations for strings and format strings (which the compiler can use to give you warnings if you pass in wrong arguments for the format string specified), as well as using variable arguments and forwarding things around.

If you are using STL or STL-like constructs, boost has a nice circular buffer.

One of the things to look out for is not serializing on access. If you don't mind allocating (or have a fixed pool you allocate from), you can chain your messages with something like a singly linked list. These are pretty useful in general, and will likely get their own post at some point.

Structured logging

Free-form logging is nice in that you don't need to put too much thought into adding a new trace - you just write to it, including whatever you need, and you're done. At most, you might want to think about some sort of logging level (error / warning / informational / debug).

Structured logging on the other hand typically means logging some small structured payload, for example 'connection start/stop' flag with a connection identifier and maybe a network address. This keeps the trace short, easier to analyze with tools, and might help you think about completeness (log connections canceled if that's a new event flag, for example).

At the extreme case, you can use this to simply keep a history of state changes in a fixed array of enums. For example, state transitions for PnP devices are pretty complicated, and if you get in a weird situation it's handy to see how you got there or what happened recently. Thankfully, devices keep a history of transitions, and you get a handy debugger command as well to go look at it.

Common principles

There are a few ideas that are common to all of the above.

We're keeping around snapshots of state in memory for analysis.
We have to have bounds on how much state we keep. This typically means overwriting the older states, removing the uninteresting ones aggresively or when needed (things that cancel out like pause/resume or things that get subsumed like details for something that ends up succeeding), or summarizing them (spilling over details into counters for example).
We want to make sure the state is reachable.
We have to be careful that logging doesn't change the state of the system, both in terms of resource usage (so consider preallocating) and in terms of scheduling (so try to avoid blocking threads).

Happy logging!

Tags: debugging design

Home