
Building a Debugger
Tim Misiak interviews Sy Brand about "Building a Debugger," diving into the deep systems insights gained from debugger development, OS differences, and the future of debugging technologies.
Transcript
How Understanding Systems Unlocks Better Software
Tim Misiak: Hello and welcome to GOTO Book Club. My name is Tim Misiak and I'm an engineer at Datadog, formerly working on debuggers at Microsoft. Today, I'm thrilled to be talking to Sy Brand, author of the excellent book "Building a Debugger." Sy Brand, in addition to being an author, you're also involved in C++ standards and toolchains through work at Microsoft, right?
Sy Brand: Yes. I work as Microsoft’s C++ developer advocate. I wear a lot of hats: sometimes C++ standards work, sometimes conferences, sometimes work on the Microsoft C++ standard library. My background is in debuggers and compilers for GPUs.
Tim Misiak: I knew I was going to love this book when I opened it. I have a background in debuggers, and the title page says "Building a debugger: Deepen your operating systems and system programming knowledge," which really describes my own experience of learning about debuggers. There were so many things I understood better by implementing them in a debugger than I could have any other way. This book isn't just for people who want to build a debugger, though, right? What other kinds of people will benefit from reading it?
Sy Brand: In building a debugger, you learn so much about how computers actually work, which you can apply in pretty much any area of development. If you're doing systems programming, understanding what's going on under the hood can be invaluable. It helps you understand hardware, your operating system, how they interact, what your operating system provides to your programs, and how to make the most of your debugger. If you're not getting the value of a variable in your debugger or stepping isn't working, you'll understand why and how these things actually work.
Tim Misiak: Understanding how hardware and operating systems work helps when implementing low-level software, but it also helps you more effectively use a debugger. I didn't know about memory access breakpoints before I started working on debuggers.
Sy Brand: They're super useful. I had never really used them until I implemented them. Then I thought, "Why? These things are so cool! I should use these more."
Tim Misiak: When people joined our team at Microsoft to work on debuggers, I would teach them debugging features as well. I'd tell them, "You can break into the debugger when this memory changes," and that's a game changer.
Sy Brand: For people who aren't familiar with this, besides regular breakpoints that you set on a line of code, you can set breakpoints on reading from or writing to a memory location. If some memory is being corrupted and you want to know what's causing it, you can set a memory breakpoint, and every time it's updated, your program stops and you can find out what's happening.
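To make that concrete, here is a minimal sketch of how a Linux debugger might arm an x86-64 hardware watchpoint through ptrace. The set_write_watchpoint helper is hypothetical (not code from the book), the DR7 bit values shown cover only the simplest case, and error handling is minimal.

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: arm a hardware watchpoint in debug register 0 of a
// traced process so it stops whenever `addr` is written. The tracee must
// already be stopped under ptrace.
bool set_write_watchpoint(pid_t pid, std::uintptr_t addr) {
    // Point DR0 at the address to watch. The debug registers live in the
    // u_debugreg array of struct user, accessed via PTRACE_POKEUSER.
    if (ptrace(PTRACE_POKEUSER, pid,
               offsetof(user, u_debugreg) + 0 * sizeof(long), addr) == -1)
        return false;

    // DR7 controls the watchpoint: bit 0 enables DR0 locally, bits 16-17
    // (0b01) select "break on write", bits 18-19 (0b11) a 4-byte region.
    std::uint64_t dr7 = 0b1 | (0b01u << 16) | (0b11u << 18);
    return ptrace(PTRACE_POKEUSER, pid,
                  offsetof(user, u_debugreg) + 7 * sizeof(long), dr7) != -1;
}
```

Once armed, the traced program stops with a trap the next time anything writes to that address, and the debugger can show where the write came from.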
Tim Misiak: You mentioned you learned about hardware breakpoints as you were implementing debuggers. Was there anything else that stuck out, things you only learned once you started working on debuggers?
Sy Brand: Stack unwinding was a big one. Before this book, I wrote a series of blog posts on writing a simple Linux debugger. I never really touched stack unwinding much then - I just wrote a basic unwinder that assumed you're compiling with frame pointers, which simplifies things because they let you walk up the stack by following the saved frame pointers stored at a known offset from the current function's frame pointer.
But if you don't have frame pointers, you need to read the stack unwinding tables, and they're horrific. I never realized how much complexity is built into stack unwinding. There's an entire Turing-complete language built into the stack unwind information on Linux. It's horrifying but fascinating.
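For readers who haven't seen the frame-pointer approach, here is a rough sketch of how simple that easy case is. It assumes x86-64 code built with frame pointers (-fno-omit-frame-pointer), and read_memory is a placeholder for however the debugger reads the inferior's memory (for example via PTRACE_PEEKDATA or process_vm_readv).

```cpp
#include <cstdint>
#include <vector>

// Placeholder: read one 64-bit word from the traced process's memory.
std::uint64_t read_memory(std::uint64_t address);

// Minimal frame-pointer walk: each frame saves the caller's rbp at [rbp]
// and the return address at [rbp + 8], forming a linked list of frames.
std::vector<std::uint64_t> backtrace_with_frame_pointers(
        std::uint64_t rip, std::uint64_t rbp) {
    std::vector<std::uint64_t> return_addresses{rip};
    while (rbp != 0) {
        return_addresses.push_back(read_memory(rbp + 8)); // saved return address
        rbp = read_memory(rbp);                           // caller's frame pointer
    }
    return return_addresses;
}
```

Everything Sy describes about DWARF unwind tables exists because this neat linked list disappears the moment the compiler omits frame pointers or does anything clever with the stack.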
Tim Misiak: "Horrifying but fascinating" describes a lot of this work. I found the same with stack unwinding on the Windows side - it's just as horrific. There are so many crazy edge cases. The version you might learn in an operating systems class with frame pointers makes it look easy, but the real world has inline frames, interop handlers, edge cases on different CPUs, and that Turing-complete unwinding language in the frame descriptions.
I loved reading that chapter in your book because I have the Windows perspective, and I wondered if it might be better on Linux, but no - it's equally hard with many optimizations and things to deal with.
Sy Brand: I expected to find documentation in the DWARF standard or the System V ABI, but there are 3-4 different documents that are slightly inconsistent and sometimes contradict each other. Then you read the source code for GCC, and that contradicts the specifications and has extensions that you have to implement because everyone follows what GCC does on Linux as the de facto standard. It's horrible but fascinating.
Tim Misiak: You seem familiar with both Windows and Linux debugging. Did you start with Linux first?
Sy Brand: Yes, I'm definitely more familiar with the Linux side. My entry into debugging was getting a job as a compiler developer, but my first task was to output debug information from the compiler and ensure it went through the data pipelines for our debugger. All my hands-on experience was with Linux, and I looked into Windows out of interest. I have a cursory understanding of Windows debugging but have witnessed all the horrors in Linux.
Tim Misiak: With what you've seen from Windows, do the main concepts seem mostly the same between Windows and Linux, or is anything notably different?
Sy Brand: Many major concepts translate directly. Some things take different approaches - the main system call interface for Linux is a single function, which is horrible API design but works, whereas Windows has a more ergonomic system call interface for low-level debugging features. It's interesting to note these differences in approaches, as these systems have accumulated features and complexity over years but developed in slightly different ways.
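As a point of contrast, here is a bare-bones sketch of what the Windows side looks like: rather than funnelling everything through one multiplexed call, the Win32 debug API has dedicated entry points such as WaitForDebugEvent and ContinueDebugEvent. This assumes the debugger has already attached to the target (for example with DebugActiveProcess), and passing DBG_CONTINUE for every event is a simplification.

```cpp
#include <windows.h>

// Minimal Win32 debug-event loop (sketch only).
void debug_loop() {
    DEBUG_EVENT event{};
    while (WaitForDebugEvent(&event, INFINITE)) {
        if (event.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
        // Breakpoints, module loads, thread creation, and so on each arrive
        // as a distinct, well-typed event in event.u.
        ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE);
    }
}
```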
Tim Misiak: I found it interesting seeing how Linux and Windows have different philosophies around threading that are reflected in their debugging APIs. I was surprised you could discuss register contexts without touching on threads, since in Linux you can just have a process with a register context. That was different for me because in Windows, threads are assumed - when you get a register context, it's for a thread, not a process. It's interesting how APIs reflect underlying OS design philosophies.
Sy Brand: Absolutely. In Linux, the delineation between a process and a thread is very thin - they're both implemented as tasks in the kernel. Most differences are in how they inherit things like memory mappings and file handles. They're treated very similarly in the scheduler and ptrace interface. You can initially forget threads exist and just work with a process, then later acknowledge "okay, threads exist and we should handle these."
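One way to see how thin that delineation is: both processes and threads come from the same clone() system call, and the flags decide what the new task shares with its creator. The sketch below is illustrative only; real code would use fork() or pthread_create, and it glosses over error handling, stack cleanup, and the extra requirements that come with CLONE_THREAD.

```cpp
#include <sched.h>
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <sys/types.h>

// Entry point for the new task (does nothing in this sketch).
int task_main(void*) { return 0; }

// Create a new task that is either thread-like (shares memory, files,
// signal handlers, and thread group) or process-like (shares nothing).
pid_t spawn_task(bool as_thread) {
    constexpr std::size_t stack_size = 64 * 1024;
    char* stack = static_cast<char*>(std::malloc(stack_size));
    int flags = as_thread
        ? CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD
        : SIGCHLD;  // fork-like child: copy-on-write copies, own thread group
    // The stack grows downwards, so pass the top of the allocation.
    return clone(task_main, stack + stack_size, flags, nullptr);
}
```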
Recommended talk: How to Stop Testing & Break Your Code Base • Clare Sudbery • GOTO 2022
Rethinking Debugging: Fixing ptrace, DWARF, and Adapting to Modern Languages
Tim Misiak: These debugging interfaces have evolved over a long time. On both Windows and Linux, they're some of the earliest things and don't change often. Sometimes formats don't change because there's no incentive to break compatibility. You see a bit of archaeology in the formats - fields that aren't used anymore but remain. If you had the opportunity to design things from scratch, is there anything particularly gnarly that would be nice to start fresh with?
Sy Brand: Two main things. First is ptrace, which is a single function call used to implement low-level debugger features - reading memory, writing memory, stepping over instructions, and maybe 20-30 different things in one function. Depending on the request, it treats arguments differently and might even report errors differently. Some requests require checking the return status, others require resetting errno before calling and checking it afterward. It's a nightmare as they add more requests over time.
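A short sketch of the inconsistency Sy is describing: both of these operations go through the same ptrace() entry point, but one reports errors through errno and the other through its return value. The read_word_and_regs wrapper is just for illustration.

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <cerrno>
#include <cstdint>

bool read_word_and_regs(pid_t pid, std::uintptr_t addr,
                        std::uint64_t& word, user_regs_struct& regs) {
    // PTRACE_PEEKDATA returns the data itself, so -1 can be a legitimate
    // value; you must clear errno first and check it afterwards.
    errno = 0;
    long data = ptrace(PTRACE_PEEKDATA, pid, addr, nullptr);
    if (data == -1 && errno != 0) return false;
    word = static_cast<std::uint64_t>(data);

    // PTRACE_GETREGS fills a caller-supplied struct and uses the
    // conventional "-1 on error" return value instead.
    if (ptrace(PTRACE_GETREGS, pid, nullptr, &regs) == -1) return false;
    return true;
}
```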
The second is DWARF information, the debug information format for Linux, which has three different custom bytecode interpreters you have to write to parse it. DWARF was designed to have the lowest possible memory footprint. For example, line information that maps machine code addresses back to source code lines would be enormous for large projects, so DWARF stores a program that generates that table rather than the table itself. Your interpreter has to run this program.
This matters less today, but it makes writing a DWARF consumer very difficult, especially when dealing with multiple versions that aren't backwards compatible. If you want to support DWARF 2 through 5, you have to do a ton of work.
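To give a flavour of what "the table is stored as a program" means, here is a toy interpreter for a DWARF-style line-number program. It is not a conforming DWARF consumer: the header constants are hard-coded placeholders, only three opcodes are handled, LEB128 operands are faked as single bytes, and minimum_instruction_length is ignored.

```cpp
#include <cstdint>
#include <vector>

// One row of the reconstructed line table: "address X maps to source line Y".
struct LineEntry { std::uint64_t address; std::int64_t line; };

std::vector<LineEntry> run_line_program(const std::vector<std::uint8_t>& program) {
    constexpr std::uint8_t opcode_base = 13;           // placeholder values; the
    constexpr int line_base = -5, line_range = 14;     // real ones come from the header

    std::uint64_t address = 0;   // the interpreter's "registers"
    std::int64_t line = 1;
    std::vector<LineEntry> table;

    for (std::size_t i = 0; i < program.size(); ++i) {
        std::uint8_t op = program[i];
        if (op >= opcode_base) {
            // Special opcode: advance address and line, then emit a row.
            std::uint8_t adjusted = op - opcode_base;
            address += adjusted / line_range;
            line += line_base + adjusted % line_range;
            table.push_back({address, line});
        } else if (op == 2) {           // DW_LNS_advance_pc
            address += program[++i];    // real code decodes a ULEB128 operand here
        } else if (op == 1) {           // DW_LNS_copy: emit a row
            table.push_back({address, line});
        }
        // Extended opcodes and the remaining standard opcodes are omitted.
    }
    return table;
}
```

Even this toy version shows the shape of the problem: the debugger cannot simply index into a table; it must execute a little bytecode program to materialise the table first.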
Tim Misiak: Backwards compatibility is always hard. With Windows debuggers, we'd want to drop support for old debugging formats, but someone would say a company needs it for a line-of-business app. You end up with mutually incompatible versions.
I understand the desire with DWARF to make it compact when space was at a premium, but now we have much larger binaries with symbol files over 4 gigabytes. Maybe we need different compression approaches that still allow fast access rather than linear searches. I've dreamed about designing something from scratch, but reality is often "good enough" with incremental tweaks.
Sy Brand: Especially because languages keep adding high-level features that need to be represented in debug information. When a language adds a C++20 feature, we need to visualize it properly in the debugger, so we add something to DWARF to express that concept. Redoing all of that from scratch would be a huge undertaking.
Tim Misiak: Language features always create complexity for debugging. You mentioned mangled names from overloading, but then you get anonymous functions, captured variables, closures, and languages like Rust with new concepts. Mapping these to existing symbol formats versus creating new ones is difficult.
We always had to think about whether new language features would be debuggable. That's not even touching really difficult things like async in C# and other newer languages designed in ways that don't fit traditional debugging concepts from the C days.
Sy Brand: That's the kind of stuff that comes into professional work like the Visual Studio debugger - making C++20 coroutines debuggable. If you suspend a coroutine, how does that work? Do you go back to your caller? End up at the next part of the coroutine? Or in some runtime function you don't care about? These problems need creative solutions that touch debuggers, debug information, linkers - everything needs coordination to get information through the entire toolchain pipeline.
Recommended talk: Debugging Under Fire: Keep your Head when Systems have Lost their Mind • Bryan Cantrill • GOTO 2017
Stepping Through Code: Unexpected Complexity
Tim Misiak: I want to change gears slightly. In Chapter 14 on stepping through code, I found stepping to be unexpectedly complex - the deeper you get, the more complex it becomes. Do you think aspects of language design and compilation impact how we support code stepping?
Sy Brand: You're right that it feels like it should be simpler. Stack unwinding sounds complicated because you need to encode where variables are located and how to restore registers. But stepping seems like it should just be "go to the next line of code." Then you ask, "What does 'next line' mean?" and it becomes very difficult.
You might think about stepping through machine code instructions until the line information changes, but what about stepping into versus over functions? What if you hit a branch and end up in another function? What about inline functions, which are a nightmare because they're just a copy-paste of code rather than a function call?
You're logically within the same caller function but also inside the callee. How do you visualize that to the user? What if you're stepping to the next line but jump over a function with a breakpoint? Do you disable that breakpoint or hit it? Then high-level languages add more complexity when you have statements broken across multiple lines and need to figure out how that's encoded in the line table.
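The naive starting point Sy alludes to looks something like the sketch below: keep single-stepping instructions until the line-table entry for the program counter changes. The get_pc and line_for_address helpers are hypothetical, and as the discussion makes clear, a real stepper also has to handle calls, inlined frames, breakpoints in skipped code, and addresses with no line information at all.

```cpp
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cstdint>
#include <optional>

// Hypothetical helpers: read the inferior's program counter, and look an
// address up in an already-parsed line table.
std::uint64_t get_pc(pid_t pid);
std::optional<std::uint64_t> line_for_address(std::uint64_t pc);

// Naive "step to the next source line".
void step_source_line(pid_t pid) {
    auto start_line = line_for_address(get_pc(pid));
    int status = 0;
    do {
        ptrace(PTRACE_SINGLESTEP, pid, nullptr, nullptr);
        waitpid(pid, &status, 0);
    } while (WIFSTOPPED(status) &&
             line_for_address(get_pc(pid)) == start_line);
}
```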
Tim Misiak: How would you approach implementing stepping in a production debugger to manage this complexity?
Sy Brand: I used a simplified approach in the book. I like how LLDB handles this with "thread plans" - a stack of plans saying "I'm trying to step over this range of program counter values because the user requested this step operation." When the program stops, the debugger asks each plan, "Do you understand why this happened?" until one responds, "Yes, this is my stop." This maintains separation of concerns rather than having all breakpoint handling in your stepping code.
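Here is a rough sketch of that idea, loosely modelled on LLDB's design rather than its actual API. The ThreadPlan interface, StepOverRangePlan, and handle_stop are illustrative names invented for this example.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

enum class StopReason { breakpoint, single_step, signal };

// Each plan encapsulates one in-flight operation, e.g. a single step request.
struct ThreadPlan {
    virtual ~ThreadPlan() = default;
    virtual bool explains_stop(StopReason reason, std::uint64_t pc) = 0;
    virtual bool is_complete(std::uint64_t pc) = 0;
};

// "Keep going until the program counter leaves [begin, end)", which is one
// way to express a source-level step over a range of addresses.
struct StepOverRangePlan final : ThreadPlan {
    std::uint64_t begin, end;
    StepOverRangePlan(std::uint64_t b, std::uint64_t e) : begin(b), end(e) {}
    bool explains_stop(StopReason reason, std::uint64_t) override {
        return reason == StopReason::single_step;   // our own single-step trap
    }
    bool is_complete(std::uint64_t pc) override {
        return pc < begin || pc >= end;
    }
};

// On each stop, ask the plan on top of the stack whether it understands the
// stop; a plan that has finished its work is popped off. A stop nobody
// claims (e.g. a user breakpoint) is surfaced to the user instead.
bool handle_stop(std::vector<std::unique_ptr<ThreadPlan>>& plans,
                 StopReason reason, std::uint64_t pc) {
    if (!plans.empty() && plans.back()->explains_stop(reason, pc)) {
        if (plans.back()->is_complete(pc)) plans.pop_back();
        return true;    // internal to a stepping operation; keep going
    }
    return false;       // report this stop to the user
}
```

The benefit is exactly the separation of concerns Sy mentions: the stepping logic stays inside its plan, and breakpoint handling stays in the code that owns breakpoints.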
Tim Misiak: It's almost like a meta-language describing interactions or a state machine modeling what's happening in the other process. These meta-languages appear throughout debugging - in expression evaluation, type visualization like natvis - they all describe interactions between symbols, debugging targets, and users. Was there anything about stepping that you learned while writing the book that wasn't obvious?
Sy Brand: I'd already explored some of the icky parts of stepping in my blog posts, so there weren't major surprises. As I was writing the early parts of the book, I thought "stepping is going to be a nightmare." Stack unwinding was actually more surprising - I thought it would be fine, but it wasn't. Expression evaluation was another challenging area where I wondered how to write a chapter without saying "first, go write a compiler."
Tim Misiak: I think there's a distinct way you have to approach teaching these topics because of prerequisite chains. Sometimes there's a web of concepts where to explain stepping, you need to assume knowledge of assembly language and disassemblers. It's hard because all these concepts are interrelated.
Sy Brand: I mapped out those dependencies early when writing the book. I knew I'd need an entire chapter upfront on compilation, computer hardware, and operating systems as knowledge needed throughout the book. Then I mapped out dependencies like "to implement stepping, we need breakpoints first." I did a mostly good job, though I had to shuffle chapters when I realized I was reaching for unexplained concepts that would require lengthy asides.
Tim Misiak: I think you structured the book very well. Everything flows logically despite the many interrelated concepts. It's very understandable, and I loved how you used language effectively. I didn't feel lost by any of the C++ code. If I were on a debugger team, I'd make it required reading.
The Future of Debugging
Tim Misiak: Looking forward, is there anything in the debugging realm you're excited about or looking forward to, perhaps in hardware or AI?
Sy Brand: Time travel debugging is very interesting and has developed a lot over the past 5-6 years with tools like rr and UndoDB. It's valuable because when debugging multi-threaded or non-deterministic programs, you often think "I finally reproduced the bug, but it already happened and I can't go back." Time travel debugging lets you go back and understand why things happened.
Another area is debugging optimized code, which I didn't address much in the book. When writing or using a debugger with optimized code, the experience is usually poor - variables are optimized away, stepping jumps around, and there's more inlining. There's a big tooling opportunity in optimized debugging.
Tim Misiak: With time travel debugging, I've found that when people see what's possible, they're amazed. Many don't know these tools exist and haven't tried rr, WinDbg time travel, or UndoDB. People say "it's hard to reproduce my bug," but you only need to reproduce it once if you can record it. It makes some bugs much simpler when you can see stack corruption and reverse-execute to find its cause.
Sy Brand: And use memory breakpoints!
Tim Misiak: Exactly! That's a tool more people should be aware of. And with optimized code, there's always tension between making compilers more efficient by eliminating instructions while keeping code understandable when debugging.
Sy Brand: So many trade-offs.
Tim Misiak: It's been great talking with you, Sy Brand. You've written a wonderful book that I highly recommend to anyone interested in debuggers, systems programming, or anything similar. It's in-depth yet approachable - a great resource for anyone interested in the topic. Thanks very much!
About the speakers

Sy Brand (author)
Expert in Debugger Technologies and Microsoft’s C++ Developer Advocate

Tim Misiak (expert)