Episode Details

223: Compile once, debug twice

Published 8 years, 3 months ago

Description

Picking a compiler for debuggability, how to port Rust apps to FreeBSD, what the point of Docker is on FreeBSD/Solaris, another EuroBSDcon recap, and network manager control in OpenBSD

This episode was brought to you by

Headlines

Compile once, Debug twice: Picking a compiler for debuggability, part 1 of 3

An interesting look into why when you try to debug a crash, you can often find all of the useful information has been optimized out

Have you ever had an assert get triggered only to result in a useless core dump with missing variable information or an invalid callstack?
Common factors that go into selecting a C or C++ compiler are: availability, correctness, compilation speed and application performance. A factor that is often neglected is debug information quality, which symbolic debuggers use to reconcile application executable state to the source-code form that is familiar to most software engineers.
When production builds of an application fail, the level of access to program state directly impacts the ability for a software engineer to investigate and fix a bug. If a compiler has optimized out a variable or is unable to express to a symbolic debugger how to reconstruct the value of a variable, the engineers investigation process is significantly impacted. Either the engineer has to attempt to recreate the problem, iterate through speculative fixes or attempt to perform prohibitively expensive debugging, such as reconstructing program state through executable code analysis.
Debug information quality is in fact not proportionally related to the quality of the generated executable code and wildly varies from compiler to compiler.
Different compilers emit debug information at varying levels of quality and accuracy. However, certain optimizations will certainly impact any debuggers ability to generate accurate stack traces or extract variable values.
In the above program, the value of argv is extracted and then the program is paused. The ck_pr_load_ptr function performs a read from the region of memory pointed to by argv, in a manner that prevents the compiler from performing optimization on it. This ensures that the memory access occurs and for this reason, the value of argv must be accessible by the time ck_pr_load_ptr is executed.
When compiled with gcc, the debugger fails to find the value of the variable. The compiler determines that the value of argv is no longer needed after the ck_pr_load_ptr operation and so doesnt bother paying the cost of saving the value.
Some optimizations generate executable code whose call stack cannot be sufficiently disambiguated to reconcile a call stack that mirrors that of the source program. Two common culprits for this are tail call optimization and basic block commoning.

In another example

If the program receives a first argument of 1, then function is called with the argument of "a". If the program receives a first argument of 2, then function is called with the argument of "b". However, if we compile this program with clang, the stack traces in both cases are identical! clang informs the debugger that the function f invoked the function("b") branch where x = 2 even if x = 1.
Though some optimizations will certainly impact the accuracy of a symbolic