DWARF support in GHC (part 1)

Ben Gamari - 2020-04-03

This post is the first of a series examining GHC’s support for DWARF debug information and the tooling that this support enables:

Part 1 introduces DWARF debugging information and explains how its generation can be enabled in GHC.
Part 2 looks at a DWARF-enabled program in gdb and examines some of the limitations of this style of debug information.
Part 3 looks at the backtrace support of GHC’s runtime system and how it can be used from Haskell.
Part 4 examines how the Linux perf utility can be used on GHC-compiled programs.
Part 5 concludes the series by describing future work, related projects, and ways in which you can help.

DWARF debugging information

For several years now, GHC has been able to produce DWARF debugging information. DWARF is a widely used format (used by Linux and several BSDs) for representing debug information, typically embedded in an executable, for consumption by runtime systems, profilers, and debuggers. It can represent several kinds of information:

line information mapping instructions back to their location in the source program (e.g. the instruction at address x originated from myprogram.c line 42).
unwind information allowing call chains to be reconstructed from the runtime state of the execution stack (e.g. the program is currently executing f, which was called from g, which was called from h, …)
type information, allowing debugging tools to reconstruct the structure and identity of values from the runtime state of the program (e.g. when the program is executing the instruction at address x, the value sitting in the $rax register is a pointer to a Foobar object).

Collectively, this information is what allows debuggers (e.g. gdb) and profiling tools (e.g. perf) to do what they do.

The effort to add DWARF support to GHC started with Peter Wortmann’s dissertation work, which introduced the ability for GHC to emit basic line and unwind information in its executables. This support has matured considerably over the past few years and should finally be ready for use with GHC 8.10.

There are a few potential use-cases for DWARF information:

Use in native debugging tools (e.g. gdb)
Dumping runtime call stacks to the console using the SIGQUIT signal; this is particularly useful in production
Computing runtime call stacks from within the program (using the GHC.ExecutionStack interface in base)
Statistical profiling using tools like perf.
Capturing call-stacks in exceptions for reporting to the user

We will discuss all of these in this series of blog posts. The rest of this first post will examine how to compile a DWARF-enabled binary.

First steps

As of GHC 8.10.2, GHC HQ will provide DWARF-enabled binary distributions for Debian 9, Debian 10, and Fedora 27 (as of 8.10.1 only Debian 9 is provided). These binary distributions differ in two respects from the non-DWARF distributions:

all provided libraries (e.g. base, filepath, unix, etc.) are built with debug information.
the runtime system is built with a dependency on the libdw library (provided by the elfutils package).

As with other compilers, debug information is enabled in GHC with the -g flag. The flag accepts an optional numeric “debug level”, which determines the detail (and, consequently, the size) of the debug information that is produced. These levels are described in the GHC user guide.
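For a single module compiled directly with ghc, passing the flag might look like the following sketch. The Hello.hs program and the hello-g2 output name are hypothetical and exist only for illustration; the build step is skipped if ghc is not on the PATH.

```shell
# Hypothetical one-module program, written out only so the example is
# self-contained.
cat > Hello.hs <<'EOF'
main :: IO ()
main = putStrLn "hello"
EOF

# -g2 requests debug level 2, the level used in the rest of this post.
if command -v ghc >/dev/null 2>&1; then
  ghc -g2 Hello.hs -o hello-g2
else
  echo "ghc not found on PATH; skipping build" >&2
fi
```

Note that this only annotates the code in Hello.hs itself; the libraries it links against carry debug information only if they were built with it.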

When using native debug information we must keep in mind that all code linked into an executable (e.g. native libraries, Haskell libraries, and the code of the executable itself) must be built with debug information. Failure to ensure this will result in truncated backtraces.

To build a package with native debug information we can use cabal-install’s --enable-debug-info flag or, as we do below, the equivalent debug-info field in the project configuration. Here, we will use the vector testsuite as a non-trivial example:

$ git clone https://github.com/haskell/vector
$ cd vector
$ cat >>cabal.project.local <<EOF
allow-newer: base

package vector
  tests: True

package *
  debug-info: 2
EOF
$ cabal new-build vector-tests-O0

For the sake of demonstration we built the vector-tests-O0 testsuite (which builds vector’s tests without optimisation) since this provides slightly more interesting stacktraces. We chose debug level 2 as we will not be using the GHC-specific debug information emitted by debug level 3.

At this point we have a DWARF-annotated binary. This binary is functionally identical to a non-annotated build, apart from being considerably larger: it weighs in at over 150 megabytes. Most importantly, no optimisations were inhibited by enabling debug information.
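One way to confirm that a binary carries DWARF data is to look for .debug_* sections in its ELF section headers. The sketch below assumes readelf (from binutils) is installed. The has_dwarf helper and the default path are hypothetical; the real location of the test executable is whatever cabal reports for your build.

```shell
# has_dwarf BINARY: succeeds if readelf lists at least one .debug_* section.
# Returns failure if the file is missing, is not an ELF file, or readelf is
# not installed.
has_dwarf() {
  readelf --section-headers "$1" 2>/dev/null | grep -q '\.debug_'
}

# Hypothetical default path; pass the real executable as the first argument.
binary=${1:-./vector-tests-O0}
if has_dwarf "$binary"; then
  echo "$binary: DWARF sections present"
else
  echo "$binary: no DWARF sections found"
fi
```

This check only reports whether any debug sections exist; it does not tell you whether every linked library contributed debug information.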

In the next post we will begin to see what this extra 100 megabytes of debug information gives us.