BuildId on Android

2023-12-03 update: another great read on symbol management is Separating Debug Symbols From Executables.

A few years ago I wrote a post on chasing symbols which is due for an update. When I originally wrote that, I was mostly focused on Windows and Linux development.

Perhaps this coming holiday season would be a good opportunity to revisit that. Today, I wanted to include some notes on Android.

Android follows the pattern of generating symbols in the shared object ELF files. When assembling an APK, however, it's undesirable to keep the symbols around, so they normally get stripped out.

The process of stripping out symbols leaves you with a new shared object file that simply doesn't have the debug sections.

How exactly you manage this new pair of files is not standardized. But more on that later. First, let's introduce BuildId.

What is BuildId?

If you start your journey on the Android side of things, you might run into a reference to build-id in the native crash page.

"The second thing to note is that executables and shared libraries files will show the BuildId (if present) in Android 6.0 and higher, so you can see exactly which version of your code crashed. Platform binaries include a BuildId by default since Android 6.0; NDK r12 and higher automatically pass -Wl,--build-id to the linker too."

So building with the Android toolchain is setting you up with the --build-id flag for the linker. As with a number of things, GCC has shown the way here and the clang/llvm toolkit follows suit.

The man page for ld tells us that this will "Request creation of '.note.gnu.build-id' ELF note section." Furthermore: "The contents of the note are unique bits identifying this linked file. style can be 'uuid' to use 128 random bits, 'sha1' to use a 160-bit SHA1 hash on the normative parts of the output contents, 'md5' to use a 128-bit MD5 hash on the normative parts of the output contents, or '0xhexstring' to use a chosen bit string specified as an even number of hexadecimal digits ('-' and ':' characters between digit pairs are ignored). If style is omitted, 'sha1' is used."

So we want to avoid uuid for our purposes, since random bits would differ from one link to the next even when nothing changed, and instead use hashing, which is the default. The man page continues: "The 'md5' and 'sha1' styles produces an identifier that is always the same in an identical output file, but will be unique among all nonidentical output files. It is not intended to be compared as a checksum for the file's contents. A linked file may be changed later by other tools, but the build ID bit string identifying the original linked file does not change."
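If your build doesn't go through the NDK defaults, the flag can be spelled out explicitly. Here's a minimal sketch, assuming a clang-compatible compiler driver; the function name, the CC override and the file names are mine for illustration, not something the toolchain provides.

```shell
# Sketch: link a shared object with an explicit, content-derived build ID.
# CC defaults to clang; the first argument is the output, the rest are inputs.
link_with_build_id() {
    target="$1"
    shift
    # -Wl, forwards --build-id=sha1 from the compiler driver to the linker.
    "${CC:-clang}" -shared -fPIC -g -o "$target" "$@" -Wl,--build-id=sha1
}
# Usage (hypothetical file names):
#   link_with_build_id libfoo.so foo.c
```

Spelling out sha1 rather than relying on the default just makes the intent visible in the build script.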

OK, so in practice we can simply dump this from two shared object files and see if one is the stripped version of the other. And, in fact, that is how tools expect this to be used.

This SO post states: "Its primary use is to make sure that the core file matches the binary which produced it (build-id is located very near the start of the binary, and is included in the core for this purpose). Build-id is also used to locate correct debug info -- the best practice is to build with -O2 -g, save this 'full debug' binary, then run strip -g exe exe.stripped and use the exe.stripped in production. When you get a core dump from production, use the original exe to debug it."
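The workflow in that quote can be sketched as a small shell function. This is only a sketch: the function name and file names are hypothetical, and it assumes a GNU-compatible strip (binutils strip or llvm-strip) that you can override through a STRIP variable. Note that strip writes to the file named with -o; without -o it modifies its input in place.

```shell
# Sketch: keep the full-debug binary, produce a stripped copy to ship.
#   $1 = binary built with -g (kept around for debugging)
#   $2 = stripped copy to write
strip_for_release() {
    # -g removes only the debug sections; -o writes the result to a new file,
    # leaving the original (and its build ID note) untouched.
    "${STRIP:-strip}" -g -o "$2" "$1"
}
# Usage (hypothetical file names):
#   strip_for_release libfoo.so libfoo.stripped.so
```

Because the build ID note is not a debug section, both files should report the same build ID afterwards.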

Android's own tools use this information too. For example, simpleperf's declaration lives in simpleperf/build_id.h.

How do I use it?

You might want to look at a shared object's build-id to verify manually that it matches what you'd expect by comparing it to a different file.

readelf -n FOO.so
# or, filtered:
readelf -n FOO.so | grep Build
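To make the manual comparison repeatable, here's a small sketch that pulls the build ID out of the readelf output and compares two files. The function names and file names are made up for illustration, and it assumes readelf prints the note as a "Build ID: <hex>" line, which is what GNU readelf does.

```shell
# Print just the hex build ID from `readelf -n` output read on stdin.
extract_build_id() {
    sed -n 's/^.*Build ID: *\([0-9a-fA-F]*\).*$/\1/p'
}

# Succeed only if both ELF files carry the same, non-empty build ID.
same_build_id() {
    id_a=$(readelf -n "$1" | extract_build_id)
    id_b=$(readelf -n "$2" | extract_build_id)
    [ -n "$id_a" ] && [ "$id_a" = "$id_b" ]
}
# Usage (hypothetical file names):
#   same_build_id libfoo.so libfoo.stripped.so && echo "same build"
```

The non-empty check matters: two files with no build ID at all would otherwise compare as equal.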

To actually see whether a file has debug information, you'd run the following and look for sections such as .debug_loc or .debug_info.

readelf --sections FOO.so
# or, filtered (empty output means no debug info):
readelf --sections FOO.so | grep debug
# or, far more extensive (empty output means no debug info):
readelf --debug-dump FOO.so

If you don't have readelf on your path, you should be able to find it under the Android NDK toolchain directory.
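As a sketch of that lookup: the function below prefers a readelf already on the path and otherwise looks inside the NDK. It rests on two assumptions of mine that you should check against your setup: that ANDROID_NDK_HOME points at the NDK install, and that the NDK ships llvm-readelf under toolchains/llvm/prebuilt/<host>/bin, which is the layout I'd expect from recent NDK releases.

```shell
# Sketch: print the path of a usable readelf, or fail if none is found.
find_readelf() {
    # Prefer whatever is already on PATH.
    if command -v readelf >/dev/null 2>&1; then
        command -v readelf
        return 0
    fi
    # Assumed NDK layout; the * stands in for the host tag (e.g. linux-x86_64).
    for candidate in "${ANDROID_NDK_HOME:-}"/toolchains/llvm/prebuilt/*/bin/llvm-readelf; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
# Usage:
#   "$(find_readelf)" -n FOO.so
```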

Not all symbols are alike

When talking about symbols, however, note that we have the classic DWARF debug-section symbols that we know and love when using a debugger or similar tools, and then there is the rather different format of the Breakpad symbol files.

Quoting the documentation (a theme in today's post): "The platform-specific symbol dumping tools parse the debugging information the compiler provides (whether as DWARF or STABS sections in an ELF file or as stand-alone PDB files), and write that information back out in the Breakpad symbol file format. This format is much simpler and less detailed than compiler debugging information, and values legibility over compactness."

The forward-looking statements are certainly a good aspiration, but as far as I know there hasn't been substantial traction to get there.
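For a feel of the Breakpad side, here's a rough sketch. Breakpad's dump_syms writes a text file whose first line is a MODULE record of the form "MODULE <os> <arch> <id> <name>", and minidump_stackwalk conventionally looks for symbols under <root>/<name>/<id>/<name>.sym. The function below (the name is mine, and the layout is that convention rather than anything enforced) works out where a given .sym file should go.

```shell
# Sketch: given a Breakpad .sym file and a symbol root, print the path the
# file should be stored at: <root>/<module name>/<module id>/<module name>.sym
breakpad_sym_path() {
    tag= os= arch= id= name=
    # read consumes only the first line; it reports failure if that line has
    # no trailing newline, so the MODULE check below is what decides.
    read -r tag os arch id name < "$1" || true
    [ "$tag" = "MODULE" ] || return 1
    printf '%s/%s/%s/%s.sym\n' "$2" "$name" "$id" "$name"
}
# Usage (hypothetical file names):
#   dump_syms libfoo.so > libfoo.so.sym
#   dest=$(breakpad_sym_path libfoo.so.sym symbols)
#   mkdir -p "$(dirname "$dest")" && mv libfoo.so.sym "$dest"
```

The id in that record is Breakpad's own module identifier, so it is not guaranteed to be textually identical to the build ID that readelf prints.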

Happy symbol lookups!

Tags: android, debugging
