Behavioral Code Analysis: Why Is It So Hard to Write Good Code?
There’s a link between how organizations write code and how teams work together. Adam Tornhill can make this link visible to help improve your team’s code and your organization's work. This is the first part of a two-part interview.
This episode of the GOTO Book Club was made possible thanks to the support of CodeScene.
CodeScene is a subscription based SaaS and On-prem platform for identifying, prioritizing and managing technical debt. CodeScene is fully automated, and presents innovative visual insights that are accessible to the whole organization. That way, CodeScene serves as a communication tool bridging the gap between the development organization and the non-technical stakeholders so that everyone can have a shared understanding of the current technical debt and other business risks.
There’s a link between how organizations write code and how teams work together. Adam Tornhill can make this link visible to help improve your team’s code and your organization's work. This is the first part of a two part interview. You can find the second part here.
Sven Johann: Hello, everyone. Welcome to the GOTO Book Club. I am Sven Johann. I work for INNOQ. I like improving systems and I like to talk about it, to teach it, to write about it. And today I have, as a guest, Adam Tornhill. Adam, would you like to introduce yourself?
Adam Tornhill: Thanks, Sven, and thanks for having me here. I'm Adam Tornhill. I'm a programmer and a psychologist. I’ve been working in the software industry for over 25 years, and I spent the past 5 years building a platform called CodeScene working with code analysis. I'm also the author of a couple of books and I like to think that the best-known books are "Your Code as a Crime Scene" and "Software Design X-rays."
Sven Johann: Actually, quite an interesting combination. How did it come that you studied computer science and psychology? It's a rare combination.
Adam Tornhill: It might be. I started to work in the software industry in 1997 and after a couple of years, I kind of noticed a pattern that succeeding with software at scale is incredibly hard. I saw one software disaster after another. At some point, I just wanted to try to understand why it is so hard to write good code. And I kind of decided to seek the answers within psychology because psychology has a lot to offer to us in our traditional technical field. You know, psychology is about the way we think, how we reason, how we solve problems and also how we collaborate with others.
Originally, I just signed up for one semester but psychology is really, really fun. So, after that semester, I thought, "Okay, let's dive into one more topic and then one more topic." I ended up spending six years at the university and I took my degree by accident, more or less.
Behavioral code analysis
Sven Johann: Oh, okay. I once did a Coursera course, an introduction to psychology. Basically, it was only the most famous psychology cases, and it was very fascinating. But I stopped after one semester. And I think this rare combination brought, to me at least, a totally new thinking about improving systems. Before I read your first book, "Your Code as a Crime Scene" I used tools like Sonar. I don't think that Sonar is a bad tool. I really like it. It helps you see problems with your code, but it makes it really hard to prioritize your work. If you run a Sonar analysis and it tells you you have 3 million findings and your technical debt is 2,000 years, it's just not helpful. I have to invest a lot of time to understand what's really important and also, even if I figure out what's important, the tool cannot really help me.
That's how we worked in the past, but then your book came along accompanied by tooling and I think it made it really easy to prioritize. Maybe we should talk about the way of prioritization. You called the type of prioritization, if I say that correctly, a behavioral code analysis. Is that correct?
Adam Tornhill: Yes. That's correct, and it's pretty much about taking my psychology background and putting it on top of the technical work that I do. I agree with you, to me, the whole idea of static analysis is also a valuable practice. It's something that works very well as a low-level feedback loop when writing code. But static analysis, like you point out, has this property that it treats all code as equal because there is no such thing as a business priority when you just look at the source code and that leads to having thousands of findings.
What I do instead is that I look at how we as a development organization interact with the code we are building. In practice, that means looking at things like version control data. You could then combine that with Jira data because then you can calculate the impact of each one of the findings. That gives you a window into all those quality issues and tells you that this is what's important and that this might be problematic long term but we can most likely live with it for now.
So that's one of the things. The other thing that I noticed with static analysis is that once an organization grows beyond, I would say, just a handful of people, then organizational factors like coordination between different teams, team coupling, key personnel dependencies, knowledge distribution, all that kind of stuff tend to become maybe even more important than any technical properties of the code. And static analysis was simply not designed to help us with those aspects, right.
Sven Johann: Exactly.
Adam Tornhill: That was pretty much why I decided to call it behavioral code analysis. It's more about the behavior of the organization rather than the code.
Sven Johann: When I tried to prioritize, I also thought about our customers. You know, a lot of code is often used or executed because of customer use. Would that also be a part of an idea to prioritize by how customers are interacting with the system?
Adam Tornhill: I mean, it could definitely be an aspect. You know, software is interesting because it has power laws everywhere. You look at code and like 80 percent of your code is never touched. You look at your feature set and it's most likely the same thing. Most customers use a smaller subset of your features. So clearly there is a priority aspect there as well.
Sven Johann: Yes. For me it was easy to convince, for example, a product owner to improve code when I could say this is part of the 80 percent usage pattern or something like that.
You wrote the book "Your Code as a Crime Scene" and now you published a new book, "Software Design X-rays". What was the motivation and the new learning, that you had to write a follow-up book?
Adam Tornhill: "Your Code as a Crime Scene" is a book I wrote in 2014, and what I did back then was to capture a number of techniques that I had used myself in my previous life as a software consultant. Techniques for prioritizing technical debt to communicate with non-technical stakeholders. It's stuff that I found useful myself. I published this book and it got quite a lot of attention. I realized that if I want these techniques to become mainstream, then there has to be some good tooling around them. So shortly after publishing that book, I founded CodeScene, the company to build those tools.
As part of those early years with CodeScene, I was fortunate to work with so many organizations across the globe on using different technologies of all kinds of scale. I learned so much from that experience. Basically, in "Software Design X-rays" I try to reflect on those learnings: how well do these techniques work in practice, and even more important, how do you act upon them? Because information is only good when actionable. So that was the main motivation.
Hotspot analysis and technical debt
Sven Johann: I liked a lot about the book, that it usually has recommendations on what to do with those findings, but we can dive into that a bit later.
For me still, the most fascinating part of your work is the hotspots and X-rays. So, what is a hotspot?
Adam Tornhill: A hotspot is complicated code that we also have to work with often. It's a combination of technical factors like code quality issues and design problems with frequent development activity in that part of the code. That's a hotspot. And the whole hotspot concept came out of my experience of using static analysis to try to prioritize findings. We talked about it earlier, I simply lacked the priorities from the business side, and I also faced these challenges of communicating with non-technical stakeholders, because very often I try to explain that, "Okay. We have the business pressure for this feature but maybe we should take a step back and improve what's already there so that we can implement it safely and on time."
Hotspot Analysis in CodeScene
I found that those conversations were extremely hard to have because software simply lacks visibility. There's no way I can take my software system, pick it up and show it to my non-technical stakeholders. At the same time I was wrestling with these problems, I was also in the middle of my psychology studies and I took a bunch of courses in forensic psychology. And that's where the hotspot concept actually came from.
I was inspired by a technique called geographical offender profiling that is used in forensics. What you do is basically you look at the geographical distribution of different crime scenes in a city or in an area and then you mathematically calculate a probability surface that can kind of help you predict what's the home base of the offender so you know which area to patrol and supervise.
When I learned about that, I think something just clicked. I thought, "Wow. What if we can do the same for software? What if we can take this large software system, project a probability surface onto it and say, hey, these are the parts of the code that are very likely to change in the future too and these are the most stable parts of the code?" Then we can suddenly get some real priorities on what needs to improve and where code quality is most important. That was a very lengthy answer to a simple question.
Sven Johann: You do that by combining static code analysis and mining software repositories. So, code that has a lot of commits and that's kind of problematic that we should work on. For me, it's so easy now to explain, let's say, to business stakeholders, why this is important because of technical debt. I know the metaphor technical debt lags a little bit with financial debt but still, you can say where we have an interest and we have a principal we have to pay back. And what is actually the interest rate? And you answered the question. The interest rate on code means I have to spend more time thinking about pretty bad code when I add a new feature because the code is not ideal for implementing the new feature. And of course, if I have code that is really bad and I have to change it quite often, the interest rate is very high. So, to me, it really sticks to my mind.
What is your experience with your customers? I am only one single person who is doing it rarely but you do it all the time. Does that speak to, let's say, non-technical stakeholders all the time or rarely. How does it work for you?
Adam Tornhill: The short answer is that I have a good experience which is great. Otherwise, I wouldn't have food on the table and I'm happy about that. Because of these techniques I originally targeted engineering organizations and occasionally, as a bonus, non-technical stakeholders. My experience is that non-technical stakeholders like it because the hotspot analysis sends a positive message. It basically tells you that, "No, you don't have to fix all technical debt. You have to fix this debt. That's really critical. But you can safely live with this technical debt. You need to be aware of it, of course, but you don't need to pay it down now."
It's a positive message for them because they are also wrestling with these long- and short-term priorities, like you have all these features that the business pushes for and at the same time, there is some awareness that we need to improve things. I like to think that's the main contribution.
Sven Johann: Exactly. There is always something to improve. If someone asks me when we have a software system to improve, there are a gazillion things to do. The question is what brings the most value to your work. You always have, of course, opportunity costs. And, you know, if I improve this, I cannot improve the other thing. I think the hotspot analysis makes our not-to-do lists longer and it shortens the to-do list.
Trends analysis: bad code is more or less bad from the beginning
Sven Johann: What you also have is a trends analysis, and I like trends. If you have a negative trend it tells you you have to improve something. It's also the other way around. How do you analyze those trends? How are you doing it with your tooling?
Adam Tornhill: To me, trends are what make all the data truly actionable. I very often meet organizations that might be struggling with something like a legacy code migration or they have an existing system that they're earning money on. They need to continue to be able to maintain it and you have this constant pressure for new features. So, I always tell them that to manage technical debt, step zero, the absolutely first thing to do is to avoid taking on more debt. Simply put a quality bar on what's already there and make sure it doesn't get worse no matter where you start from.
I think that's again something that resonates with managers and technical leaders because I'm yet to meet anyone that tells me, "We want our code to get worse." You never hear that, right? It's actionable, that's the first thing. The other reason I like trends is because they carry so much more information than any absolute values. For example, in CodeScene, we have this concept of code health. It's a metric that goes from 10, healthy code, that's easy to understand and low risk, all the way down to one which is code with severe maintenance issues. And let me say that I will point you to a hotspot with a code health of five. Is that good or bad? Well, if that hotspot had a code health of eight the week before, then it's disastrous because it degrades rapidly. It's something you need to act upon now. But on the other hand, if that hotspot had a code health of two the week before, then it saw a dramatic improvement that needs to be celebrated. So, trends carry this information whereas absolute values don't.
Sven Johann: When I used CodeScene and I saw my first analysis, I was like, "Okay, now I see the red parts." We also have to say we have some screenshots on that one and you have those parts which are bad code change often. And that triggers me. All the red things, those are the ones which should be on my to-do list. But how do I see, let's say, parts which are good but they are getting worse and worse but they don't show up red on CodeScene? So how do I figure out negative trends which are not horrible yet? Do you have those? Can I look at something there in CodeScene?
Code Health Analysis in CodeScene
Adam Tornhill: Yes, you can. If I remember correctly, I had a chapter discussing this in the "Software Design X-rays". I think it's the last chapter where I talk a little bit about how you can use the trends as an early warning system. I think this is really, really important. It's so important that we spend a lot of effort building it into CodeScene as well. We integrated things like total requests and build pipelines, so you can get an alert when things like that happen. I think it's vitally important because one thing I learned that actually surprised me was that, like so many others in the software industry, we have this idea that code starts out fine and then they would add the pressure of the business and the code kind of degrades over time. But that's not what's happening in general. What's happening, in general, is that if you find a module with quality problems, low code health, most likely those problems have been introduced very, very early on, often in the first version of that module. And once a module degrades in code, that tends to be a self-amplifying loop because the hurdle to refactoring is large. So instead, we kind of squeeze in more complexity and more complexity and it gets worse and worse.
Using this kind of information you can actually set different quality gates and say for new code, this is the minimum level of quality or health that we accept. And then you can use it as an early warning system which allows you to prevent a ton of future problems.
Sven Johann: For me that was a surprising insight from the book that bad code is more or less bad from the beginning. But you also wrote that bad code usually stays bad. So, it's bad, you fix it but then it's coming back. Why do you think that's the case?
Adam Tornhill: I think there are several reasons. It's one of the things I noticed early on, shortly after writing "Your Code as a Crime Scene" when I started to work with this in organizations. I kind of figured out that, we found these really, really problematic hotspots. Then you start to look at the trends and you see that these hotspots have been problems for years. And at first, I couldn't explain that. So why hasn't it been refactored? There can be multiple reasons but one of the reasons that I think is very common is that in those really problematic hotspots, there are some code quality issues that tend to lead to organizational problems as well.
If you look at a different behavioral code analysis, you look at the main contributors and you look at how fragmented the contribution activities to that file are. So you look at a purely human perspective, and you will most likely see that that code is not written by one person. In a large organization, it might be written by 50 or 60 contributors and that most likely means that no, you don't have 50 people that know that code. You have no one that actually knows it because everyone has their small piece of information and the original authors might long be gone. It might often be that we don't really understand the problem domain well enough in that code, we might not understand the solution well enough and that kind of, again, raises the hurdle of refactoring it. And in particular, we don't feel any kind of ownership for us as a team. So that makes us much less likely to invest the risk into refactoring it. That's my hypothesis.
Sven Johann: What does it mean for an organization if I have many people working on the code? You said, the problem is not well understood. What are you recommending companies to fix that bad code which is not going away? What can you do to make it happen that it doesn't come back?
Adam Tornhill: I like to think that the single most important thing you can do is to get situational awareness so that everyone in the organization, engineering, business, everyone knows where the problems are, how severe they are and what they actually mean to the business. Because once you have that awareness, the next steps tend to be not easy but doable at least. And then it depends on what kind of issues you have.
Here’s another behavioral code analysis technique that looks at something called system mastery for example. System mastery is not a technical property. It's about how much of the current code is written by the current team. Because I often see this as well, that social and organizational factors like system mastery tend to influence how we perceive the complexity of code. We tend to overestimate the complexity of unfamiliar code. And that means you can hear complaints about, "Hey, this piece of code is overly complicated. It's a mess." Whereas in reality, it might simply be that the team never had a chance to get onboarded properly and understand the domain and the solution.
If you have that situational awareness and know what your actual problem is, then you can also address it and put the right measure upon it.
Sven Johann: One interesting finding of yours for me was, when you analyzed all sorts of systems, you had systems that have code which is, let's say, rarely touched and then you have code which is often touched but only by a handful of people like one or two or three people. And then you have the code which is changed often by many people but also it's not immediately changed. It's changed from time to time and that puts another burden on it. If we have a mental model on the code because we work very often with it, we usually don't introduce parts. But if we rarely touch code, we are not 100% familiar with, then we introduce new problems. That was also something I found quite interesting.
Adam Tornhill: Yes, I definitely think that it's a very common problem. I see that you get lots of contributors on a particular piece of code and if that code is also a hotspot, then the situation isn't particularly good because what it basically means is that even if I write the piece of code myself today, then look at it three days later it looks completely different because five other people have been working on the code, so it's virtually impossible to maintain a stable mental model.
And what's so fascinating about this is that very, very often it's that technical properties of the code possibly lead to organizational issues. So, what you will often see is that when you have that kind of code that attracts many different contributors, it's probably because it has good reasons to do so. It's typically a module that's very low on cohesion, meaning it has tons of different responsibilities. So consequently, different teams working on different features, all end up in the same part of the code because it ties everything together. It's a problem that you simply cannot solve with a reorganization because then you will introduce other bottlenecks.
The proper solution there would actually be a technical refactoring or redesign which would improve the organizational fit. And this is one of the things I found fascinating myself.
Software design X-ray
Sven Johann: Yes. Let’s move the topic to the X-ray. So, what is a software design X-ray?
Adam Tornhill: An X-ray is basically a hotspot analysis at the function level. The reason I started to develop the X-ray analysis was that after writing "Your Code as a Crime Scene", when I analyzed all the systems I noticed that the hotspots that I tend to find that are most problematic are typically really, really large files. It could be thousands of lines of code, occasionally, tens of thousands of lines of code. It can be extreme cases like that.
If you take that large system and you narrow it down to a single hotspot and can say, "Hey, this is your biggest problem." But if I go to a client and say, "Rewrite this code. It's only 20,000 lines of code," they are not going to be happy. It's not actionable. An X-ray analysis takes that complex hotspot, parses it into separate functions and then it looks at the git log. Where do each committed over time? We sum that up and you get hotspots at a method level. And that's what I've been using successfully as a starting point to prioritize refactorings in large, complex processes and modules.
The more metrics the better?
Sven Johann: Yes, if you tell someone that 20,000 lines of code class are a problem, they would say, "Well, we figured that out by ourselves." I think it's also used in many places. For example, Simon Brown has the C4 architectural model and there you also have this kind of zooming or Gernot Starke has arc42. There, we start from a very abstract view and then we dive always deeper. It's like Google Maps when you look at the map and then you can always go deeper and deeper. So, you're not lost in details. You can navigate quite nicely. And I think that's also quite a good idea.
What I was wondering you use very simple metrics. When I look at a typical tool like Sonar, there are lots of metrics which tell me what is a problem. I think you only use lines of code of a class or a method or intention from lines. So, no dependency analysis and things like that. This is based on some research. Why do you focus on very few metrics and why is that still super powerful?
Adam Tornhill: I like to think we evolved that a lot over the past years but basically, the thinking process is that if you look at the research, you will see that a simple metric like the number of lines of code is a horrible metric. It's really, really bad. The thing is that most other metrics are just as bad. And lines of code is intuitive, it's simple to calculate, simple to reason about. So that's basically where I started out. We need a complexity dimension for hotspots. Let's use lines of code. And it surprised me how far that actually gets us. It takes us really, really far.
One thing I learned after writing "Your Code as a Crime Scene" is also that lines of code works well, it correlates with most other metrics. But I also learned that at some point, you want to dig a little bit deeper and provide some actionable advice around what type of refactorings do you need to do and stuff like that. What I've been working on a lot over the past year is to add different static analysis techniques into a behavioral code analysis via CodeScene. What we tried to do is to look more at the design level like identifying brain methods, drive violations, you know, the stuff that really, really matters for maintainable code.
So, these days I might still use the number of lines of code as an initial visualization, to get situational awareness but when I dig deeper, I typically use this more elaborate set of metrics. I hope that makes sense.
Sven Johann: Yes. So, when I look at the hotspots, I look at lines of code and let's say a cyclomatic complexity represented by intention. If that's the correct word. But clone detection, violating the DRY principle, I was not aware that this is also part of the hotspots.
Okay, thank you, Adam. I think we have a couple of follow-up talks so we will have a second part about more organizational problems you can detect. But for now, I want to thank you for the discussion on all the findings and prioritization work you did on the code level. I thank you for your time.
Adam Tornhill: Thank you very much for having me. A pleasure, as always and I hope to talk to you again soon.
Part 2 of this interview can be found here
About the AuthorAdam Tornhill is a programmer who combines degrees in engineering and psychology. He's the founder of CodeScene where he designs tools for software analysis. He's also the author of Software Design X-Rays, the best-selling Your Code as a Crime Scene, Lisp for the Web and Patterns in C. Adam's other interests include modern history, music and martial arts.
Check out some of Adam’s past talks
GOTO Copenhagen 2019 - Prioritizing Technical Debt as if Time and Money Matters
GOTO Chicago 2016 - Treat Your Code as a Crime Scene
GOTO Amsterdam 2016 - Embrace the Past: How SW Evolution Lets You Understand Large Codebases