This episode of the GOTO Book Club was made possible thanks to the support of CodeScene.
CodeScene is a subscription based SaaS and On-prem platform for identifying, prioritizing and managing technical debt. CodeScene is fully automated, and presents innovative visual insights that are accessible to the whole organization. That way, CodeScene serves as a communication tool bridging the gap between the development organization and the non-technical stakeholders so that everyone can have a shared understanding of the current technical debt and other business risks.
There’s a link between how organizations write code and how teams work together. Adam Tornhill can make this link visible to help improve your team’s code and your organization's work. This is the second part of a two-part interview. You can find the first part here.
Sven Johann: Hello. Welcome to a new episode of the GOTO Book Club. I am Sven Johann. I am a Senior Consultant at INNOQ. I spend my time improving software systems. And today I have my second conversation with Adam Tornhill on his book, "Software Design X-rays." Hi, Adam.
Adam Tornhill: Hi, and thanks for having me here again.
Sven Johann: So, for those who didn't read the first part, which we highly recommend reading, please introduce yourself.
Adam Tornhill: I'm Adam Tornhill. I'm the founder of a company called CodeScene where I tackle the challenge of technical debt. And my background is maybe a little bit different because actually, I have two backgrounds. One in engineering. I've been a software developer for a long, long time. But also, in psychology which is one of my big interests. And what I try to do with "Software Design X-rays" and what I work on these days is to take my psychological background and put it on top of my technical experience. So that's what I do. I'm the author of "Software Design X-rays" and a bunch of other books.
Sven Johann: Thank you. In the first episode, we talked about behavioral code analysis, hotspots, X-rays and now in the second episode, we want to start with change coupling and then move on to what change coupling actually means, not on the code level but also on the team level. What is change coupling?
Adam Tornhill: Change coupling is a very interesting software analysis technique. It's interesting because it's something you cannot do just based on the source code. As developers, we typically refer to coupling as some kind of dependency. We have some piece of code over here and it uses some piece of code over there. They depend upon each other. Change coupling is different because it's more about logical dependencies. In a behavioral code analysis, you measure how the organization interacts with the code they are building. In change coupling, we pick up patterns in that history, and it could be anything from the git log to, in more advanced analyses, patterns in Jira data, and figure out that each time I need to modify a piece over here, I have a predictable modification in the module over there. So, there is some kind of logical dependency between them. And using change coupling we can get a lot of interesting feedback on how well our architecture supports the way we work with it. So that's the short description.
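As a rough illustration of the idea described above, file-level change coupling can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the book or in CodeScene: the function name `change_coupling`, the `min_shared` parameter, and the degree formula (shared commits divided by the average number of revisions of the two files) are all choices made for this example.

```python
from collections import Counter
from itertools import combinations

def change_coupling(commits, min_shared=2):
    """Return {(file_a, file_b): degree} for files that change together.

    commits: iterable of sets of file paths, one set per commit.
    degree: shared commits / average revisions of the two files
    (an assumed definition, chosen for this sketch).
    """
    revisions = Counter()  # how often each file changed
    shared = Counter()     # how often each pair changed in the same commit
    for files in commits:
        files = sorted(set(files))
        revisions.update(files)
        shared.update(combinations(files, 2))
    result = {}
    for (a, b), n in shared.items():
        if n >= min_shared:  # ignore one-off coincidences
            average = (revisions[a] + revisions[b]) / 2
            result[(a, b)] = n / average
    return result

# Hypothetical history: order.py and invoice.py keep changing together.
commits = [
    {"order.py", "invoice.py"},
    {"order.py", "invoice.py", "README.md"},
    {"order.py"},
    {"invoice.py", "order.py"},
]
coupling = change_coupling(commits)
```

In a real repository, the commit sets could be built by parsing the output of `git log --name-only`; that parsing step is left out here.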
Change coupling explained
Sven Johann: We all know the situation where we change, let's say, a piece of code and then we have to change five completely different unit tests because of some copy-paste coding. How does the change coupling relate to clone detection? Because clone detection is the thing I know if I change something and then I have all those subsequent changes.
Adam Tornhill: Yeah, so that's an area where I've done a lot of work. The original challenge I had was that this was way before CodeScene, way before I wrote the book. I was working as a software consultant with a team that was very heavy on test automation. The problem they had was that they got a lot of flaky tests. This was automation at the system level, at the application level. The problem was of course that, as a developer, you pushed some changes, you updated some tests and then you had three other tests failing because they also depended upon some logic.
Now using change coupling, what we could do was, first of all, visualize the changes. So, we could figure out that you make this tweak in the application code and now we have to modify this whole cluster of files, right, so that we kind of highlight the problem. But to make it actionable, we wanted to drill deeper and figure out if the reason that we have this change coupling is DRY violations, violations of the Don't Repeat Yourself principle. And adding copy-paste metrics on top of this proved to be really, really useful.
Change coupling with clone detection, Linux example
To me, change coupling is an excellent way of benefiting from clone detection tools and copy-paste detection tools because the main problem with copy-paste is that there's simply so much of it. So, you look at the research and you see that somewhere between 5% and 20% of all code out there is duplicated to some extent. And not all of that duplication is bad per se. So, change coupling gives us a window into those massive amounts of duplicated code and we can figure out that, "Okay. These are the software clones that are actually expensive to maintain, right, because we have these predictable modifications."
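One way to picture this combination is to take the clone pairs reported by a clone detector and keep only those whose copies actually change together. The sketch below assumes the detector's output has already been reduced to pairs of file names; `expensive_clones`, `min_co_changes`, and the sample data are hypothetical names invented for the example.

```python
from collections import Counter

def expensive_clones(clone_pairs, commits, min_co_changes=2):
    """Rank clone pairs by how often both copies changed in the same commit.

    clone_pairs: list of (file_a, file_b) tuples from a clone detector (assumed format).
    commits: iterable of sets of file paths, one set per commit.
    """
    co_changes = Counter()
    for files in commits:
        for a, b in clone_pairs:
            if a in files and b in files:
                co_changes[(a, b)] += 1
    # Clones that never or rarely co-change are filtered out as cheap to live with.
    return [(pair, n) for pair, n in co_changes.most_common() if n >= min_co_changes]

# Hypothetical data: two duplicated tests co-change, two legacy clones do not.
clone_pairs = [("test_login.py", "test_signup.py"), ("legacy_a.py", "legacy_b.py")]
commits = [
    {"test_login.py", "test_signup.py"},
    {"test_login.py", "test_signup.py", "app.py"},
    {"legacy_a.py"},
    {"app.py"},
]
ranked = expensive_clones(clone_pairs, commits)
```

Here only the test clones are reported, because the legacy pair never changes together in this history.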
The copy-paste dilemma
Sven Johann: The clone is not an exact clone, right? Usually, you copy something and then you paste it and then you change this tiny little bit. But of course, if one thing changes here, you have to change all of the rest there. Clone detection tools also find those half-baked copy, let's say, copy-paste-edit pieces of code, right?
Adam Tornhill: Yeah, that's right. I think that clone detection software is a pretty old technique. It's something that's been in industrial use for almost three decades. But again, I think that simply the sheer amount of duplicated code makes it very hard to act upon it because to me, I think one of the main problems with copy-paste is that we, as developers, myself included, are kind of conditioned to despise copy-paste. We have learned over and over again that it's a bad thing to do, right? The problem is that it's simply not true as a general statement because let's say that you have a piece of code here. You copy-paste it and now there are these two copies. They evolve in completely different directions. I could very well argue that copy-paste was a good starting point.
Change coupling by file
Or maybe you duplicate some code and you never need to touch it again. It might not be ideal, but is it really a big problem? So, the problem starts when copy-paste drives our behavior in a negative direction. We have to make those predictable modifications. It's easy to miss updating one of those clones. And that's why I think this combination of change coupling and clone detection really, really adds value to our daily work.
Sven Johann: I agree that it's not always bad. I tend to do first-time copy-paste-edit but when I do it the second time, then it's time to say, "Okay, now I need an abstraction". Daniel Terhorst-North once said "DRY is the enemy of decoupled." So that's something I'm wrangling with but in terms of microservices for example or just different, completely independent modules. Whenever I change something far over here, I have to change something far over there even in a totally different repository. Then things become a bit problematic. I wonder how do I find out if I have a problem because maybe I only have this onetime thing as you said or it's a real problem. How can I find that out?
Adam Tornhill: That's a very interesting question because I think the incredible hype behind microservices over the past five to six years kind of flipped the whole architecture and design thing around a little bit because there are definitely scenarios where you want to violate the DRY principle because loose coupling is simply more important. It's a more important capability than minimizing the amount of duplication. At the same time, what you want to avoid is to have exactly the same behavior in two different places if that's a behavior that's been frequently modified. Because if you have that, it could very well indicate that maybe you missed a microservice to take on that shared responsibility. Maybe it's a separate business capability, right?
There are two aspects to it. One is getting the data. How do we figure out change coupling across git repositories? The second challenge is how do we act upon it. And I wrote about this in the book. So, change coupling in its simplest form is based on finding patterns in the commits, files that change together over and over again as part of the same commit set. And that clearly doesn't work across repositories. What we do there is that you simply look at a ticket system instead. Things like Jira tickets, Azure DevOps tickets, whatever, and see that if the commits reference the same ticket, then we know that they're kind of depending upon each other. And if it happens often enough, then that's a very strong signal. So that way you can detect change coupling down to the function level, even across git repositories.
This is something that's very powerful and you can, of course, add clone detection on top of that as well and then see whether this is a DRY violation that's desirable or an actual problem, where we lack some level of encapsulation or abstraction.
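The ticket-based approach for multiple repositories can be sketched as follows. This is an illustration at the file level only, under several assumptions: commit messages contain Jira-style keys such as `PAY-42` (the regular expression is a guess at that format), commits are given as `(repo, message, files)` tuples, and `cross_repo_coupling` is a name made up for this example.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumed ticket format: uppercase project key, dash, number (e.g. PAY-42).
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def cross_repo_coupling(commits):
    """Count how many tickets touched each pair of files in different repositories.

    commits: iterable of (repo, message, files) tuples.
    """
    touched = defaultdict(set)  # ticket -> {(repo, file), ...}
    for repo, message, files in commits:
        for ticket in TICKET.findall(message):
            touched[ticket].update((repo, f) for f in files)
    pairs = Counter()
    for entities in touched.values():
        for a, b in combinations(sorted(entities), 2):
            if a[0] != b[0]:  # only count pairs that cross a repository boundary
                pairs[(a, b)] += 1
    return pairs

# Hypothetical commits in two repositories that reference the same tickets.
commits = [
    ("billing", "PAY-1 add VAT field", ["invoice.py"]),
    ("shop", "PAY-1 show VAT in cart", ["cart.py"]),
    ("billing", "PAY-2 round VAT", ["invoice.py"]),
    ("shop", "PAY-2 round VAT in cart", ["cart.py", "util.py"]),
    ("shop", "fix typo", ["README.md"]),
]
pairs = cross_repo_coupling(commits)
```

Commits without a ticket reference, like the typo fix, are simply ignored, so the quality of the result depends on how consistently tickets are referenced.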
To me, that's a hard engineering problem but the more challenging aspect is to make that decision, right. Do we live with this or do we ignore it? What I tend to recommend is I use a general heuristic that I call the heuristic of surprise. I simply visualize the change coupling and then I take everything I know about the architecture and the problem domain and any surprising coupling that I find is usually bad because a surprise is simply one of the most expensive things you can ever have in software architecture.
Sven Johann: Yeah. Rarely people are positively surprised.
Adam Tornhill: Yeah.
Sven Johann: It's always "You know, don't let your manager be surprised because usually, it's negative." But I mean, what would be an example of a surprise? Let’s move that question a bit to the back. You said you wanna show the coupling. So how do you show the coupling? From when I looked at CodeScene, and from what I remember of the book, you show two files, right? And then there is the amount of coupling of those files and how often they change together, so to speak. So, what's the thinking about this approach? You know, showing the files and the amount of coupling.
Adam Tornhill: The interesting thing with change coupling and a lot of those other behavioral code analysis techniques that we talked about in the previous episode like hotspots is that they scale at different levels. You can use change coupling at the function level. You can use it at the file level and you can use it at an architecture level. And that's where I usually start when I pick up a system. I look at the different architectural elements. They could be either different layers, different components or they could be microservices. And I simply figure out the change coupling between, let's say, the different services. So, I typically visualize it. You have all your services like a wheel and then you see the connections between those sections. And that's usually my starting point.
I've been fortunate to work with so many organizations across the globe at all kinds of scale and I see several organizations are doing a really good job with it. But occasionally there is a surprise. And the surprise is very often that even though you put all these efforts into defining proper service boundaries, and we know about bounded contexts, domain-driven design and all that stuff, that is incredibly hard in practice because we basically have to be domain experts to do that. So, the surprise tends to be that, "Hey, we're actually starting to build our distributed monolith." It's very easy to highlight that with change coupling. When you change one service, you have a dependency on five other services, and then you look at the details: what kind of functions are coupled? You often see that you implement a new business capability here, and to pull that off, you need to query two different services and update the state in a third one. So that's a typical warning sign that I tend to find.
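The same counting idea can be lifted to the architectural level by mapping each file to its service before looking for co-changes. The sketch below assumes that the first path segment names the service and measures coupling as the share of changes touching either service that touch both; both the mapping and that measure are assumptions for the example, as are the names `service_of` and `service_coupling`.

```python
from collections import Counter
from itertools import combinations

def service_of(path):
    """Assumption: the first path segment names the service, e.g. 'orders/api.py'."""
    return path.split("/", 1)[0]

def service_coupling(changes):
    """Return {(service_a, service_b): share of changes to either that touch both}.

    changes: iterable of sets of file paths, one set per commit or ticket.
    """
    touched = Counter()   # number of changes per service
    together = Counter()  # number of changes per service pair
    for files in changes:
        services = sorted({service_of(f) for f in files})
        touched.update(services)
        together.update(combinations(services, 2))
    return {
        (a, b): n / (touched[a] + touched[b] - n)
        for (a, b), n in together.items()
    }

# Hypothetical history across three services.
changes = [
    {"orders/api.py", "payments/client.py"},
    {"orders/api.py", "payments/client.py", "stock/db.py"},
    {"orders/model.py"},
    {"stock/db.py"},
]
degrees = service_coupling(changes)
```

A service that shows up in many high-degree pairs is a candidate for the "distributed monolith" warning sign mentioned above, though deciding that still takes domain knowledge.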
Sven Johann: As we said, maybe we have two code clones in two different services. And then we change it once and that's fine. It just happened once and that's probably not a problem. If we change it the second time, the third time, the tenth time, 100% of the time, we change those files together in different services. Is there any recommendation when I should start worrying about those temporal dependencies?
Adam Tornhill: I find it very hard to give general recommendations in this area because with hotspots it's usually quite easy. If you work on a part of a code a lot and that part of a code has a high degree of technical debt, then it's very easy to say that, "Okay, we need to fix this." But with change coupling, it's so much harder because it depends on the lifecycle of the code base. If it's something you just got started on, there's probably going to be a lot of experimentation and learning going on. So, I do expect a higher degree of change coupling and more complex change coupling.
In a more stable system, once the basic architecture is set, I would be worried about any change coupling that's over 20%. That would worry me.
Sven Johann: Okay.
Change coupling gets more expensive with distance
Adam Tornhill: I just wanted to add that the other heuristic I use is basically that change coupling gets more expensive with distance. And what I mean by distance is, first of all, architectural distance: are these two parts completely unrelated in theory but tied together in practice? Then that might very well be one of those surprises.
The other one is dependencies that cross team boundaries. That very quickly becomes expensive because the teams end up in all these coordination meetings and they might have conflicting changes in their API and stuff like that.
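The two heuristics from this part of the conversation, the 20% threshold for a stable system and the extra cost of coupling that crosses team boundaries, can be combined into a small filter. This is only a sketch: the function name `worrying_couplings`, the `owner` mapping from component to team, and the choice to list cross-team couplings first are assumptions made for the example.

```python
def worrying_couplings(coupling, owner, threshold=0.20):
    """Return couplings above the threshold, cross-team ones first.

    coupling: {(component_a, component_b): degree between 0 and 1}
    owner: {component: team}, a hypothetical ownership mapping.
    """
    flagged = [
        (pair, degree, owner[pair[0]] != owner[pair[1]])
        for pair, degree in coupling.items()
        if degree > threshold
    ]
    # Cross-team couplings sort first, then by descending degree.
    return sorted(flagged, key=lambda item: (not item[2], -item[1]))

# Hypothetical input data.
coupling = {
    ("orders", "payments"): 0.45,
    ("orders", "stock"): 0.10,
    ("search", "catalog"): 0.60,
}
owner = {
    "orders": "team-a",
    "payments": "team-b",
    "stock": "team-a",
    "search": "team-c",
    "catalog": "team-c",
}
flagged = worrying_couplings(coupling, owner)
```

In this example the weaker coupling between orders and payments is listed ahead of the stronger one between search and catalog, because only the former crosses a team boundary.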
Sven Johann: Yeah. I can sing a song about those kinds of problems. I think if you have different repositories and one team is responsible for that, okay. It's not nice. But it's still let's say manageable and it's also easy to fix because it's one team. Just a couple of people own different repositories and can think about reducing or getting rid of that coupling. But across teams that's going to be tricky. So, what are some interesting examples from your experience on that problem, you know?
Adam Tornhill: I have so many. I’m not even sure where I should start. I think one of the most common issues I've seen, and this used to be more common maybe a few years back, like, two, three years ago, was that several organizations that I worked with asked me, "How should we organize our development teams?" So, we started out with component-based teams. Then we noticed that we had very, very long lead times for new features because you had to do these handovers all the time between change-coupled components. So, we changed to feature teams and now we kind of noticed that our whole quality aspect just went south because suddenly we find that we have 10 different teams all working on the same parts of the code.
It's a much harder answer because the organizational aspect always goes hand in hand with the software architecture. You really, really need to balance these two. And if you want to have component teams, then I think a much better approach is what the "Team Topologies" people would recommend: stream-aligned teams. That's the only way I found that actually scales. You have the teams separated based on business capabilities and those business capabilities are reflected in the architecture. I really haven't seen anything else that works at scale.
Team structure visualised in the code
Sven Johann: Yeah, I was about to say. In the "Team Topologies" book they have this quote from, I believe, Ruth Malan. There is the software architecture but there is also the communication architecture, basically based on Conway's Law. And if you, as an architect, don't care about team setup and communication structure between teams, you give the architecture task to someone who cares, like a non-technical manager. And then you end up in those problems you described.
Is there a way to use those insights on change coupling, based on different teams working on the same code base, to, let's say, refactor teams towards those stream-aligned teams?
Adam Tornhill: Yes, there are. And I like to think this is one of the most important contributions by behavioral code analysis and "Software Design X-rays" and what we do with CodeScene as well of course. It brings visibility to the people’s side of code. I always talk about this like the grand tragedy of software design, that the organization that builds the code is invisible in the code itself.
Microservices, change coupling across teams
Using behavioral code analysis, you can actually visualize that. You can show which team works where in the source code. And you can overlay that with change coupling information so that you can easily show those bottlenecks that when we modify this service here, that team has to coordinate with four other teams. You can visualize that. And from there, it's mostly about domain expertise to figure it out. Quite often, it's not enough to just shift the teams around. Quite often, you have to do a more fundamental change and maybe those two different services or components are actually the same component, so you need to merge them together. And occasionally, you find that an organization simply lacks a team to take on a new responsibility.
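The overlay described here, which team works where combined with change coupling, can be sketched like this. The sketch assumes that commit authors can be mapped to teams and files to components; `teams_per_component`, `coordination_needs`, and the sample names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def teams_per_component(commits, team_of, component_of):
    """Map each component to the set of teams that committed to it.

    commits: iterable of (author, files) tuples.
    team_of: {author: team}, an assumed mapping maintained by the organization.
    component_of: function from file path to component name.
    """
    teams = defaultdict(set)
    for author, files in commits:
        for path in files:
            teams[component_of(path)].add(team_of[author])
    return dict(teams)

def coordination_needs(coupled_pairs, teams):
    """For each coupled component pair, list the teams involved if more than one."""
    needs = {}
    for a, b in coupled_pairs:
        involved = teams.get(a, set()) | teams.get(b, set())
        if len(involved) > 1:
            needs[(a, b)] = sorted(involved)
    return needs

# Hypothetical authors, teams, and commits.
team_of = {"ann": "checkout", "bob": "payments", "cy": "search"}
commits = [
    ("ann", ["orders/api.py"]),
    ("bob", ["payments/client.py"]),
    ("bob", ["orders/api.py"]),
    ("cy", ["search/index.py"]),
]
teams = teams_per_component(commits, team_of, lambda p: p.split("/", 1)[0])
needs = coordination_needs([("orders", "payments")], teams)
```

The output only shows where teams overlap on coupled components; whether to move people, merge components, or create a new team is the domain-expertise question raised above.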
Sven Johann: So it's the same pattern we discussed last time, where the hotspots actually create a not-to-do list. You have so many problems but with the hotspots, you get a focus on the problem. And here we would have the same. If you need to think about your team setup or your "Team Topologies", to use the word, you can look at those parts where many teams own a lot of the same code and also where teams are actually quite distinct from each other in the codebase. And I can use the problematic parts to identify where we need to create better boundaries, team boundaries.
Adam Tornhill: Yeah, and I think that's important because the moment you're able to visualize that, then you can start to have a meaningful conversation between engineering and the business. In my experience, even with organizations that have a fairly high degree of team autonomy, a team can usually not decide what they want to do. They can decide what they want to do differently but they cannot decide for the whole organization. We need some kind of buy-in and being able to show that data really, really helps in my experience.
Sven Johann: Yeah. If you just say you have a problem or you have a feeling, usually people don't react to it. And in your book, you mentioned that you can test your architecture. Let's say architecture in terms of dependencies on teams. If you implement a new feature, how many people do you need to bring into one room to find out how good or bad your real dependencies are, either your hard dependencies or your soft dependencies? If I want to implement a new feature and I have to call a big staff event, I obviously have a problem. What's your experience when you do those tests with your clients?
Adam Tornhill: My experience is that it's very easy to end up doing local optimizations. Organizations always know themselves that, "Hey, we have a lot of coordination meetings and we have a lot of sync meetings." And what typically happens is that many organizations tend to stuff additional ceremonies on top of that. We have a problem with sharing information for example. So, let's do additional information-sharing meetings. Let's do additional status meetings so that everyone involved is up to date and stuff like that. And in my book, that's basically just the way of covering up the symptoms, not the real root cause. Because you really do want to keep coordination and meetings to a minimum. And the only way of doing that is to make sure that each team can operate autonomously, so that they can have this fit between an organizational unit and the actual architecture. So that's where I come from.
Sven Johann: Yeah. I feel the pain already. I've been in meetings with around 60 coordinators and at least I was thinking, "This cannot possibly be true." How do you visualize those dependencies between teams? To me right now it feels like it's something you should use in each and every project to really reduce your team dependencies.
Adam Tornhill: Yeah, and I find it fascinating in particular if we talk about stuff like microservices where we're very heavy on measuring all kinds of things. We are measuring performance, we're measuring scalability and all that stuff. We have an alert system. But then we have these other architectural properties that are so important like loose coupling, independent deployability, autonomous teams and we are not really measuring that. So that's where I think behavioral code analysis can kinda fill that gap. And it can work as a monitoring system, as an alert system for architecture. And that's the way I've been using it over the past couple of years and I have a very good experience with that.
Sven Johann: Yeah. And when an alert triggers, what are the typical steps to fix the problem?
Adam Tornhill: So, what actually surprised me the most because it was so unexpected for me was that a lot of those problematic dependencies between different teams or coordination bottlenecks, very often their fix turned out to be technical. It's something we got wrong either in the high-level design or in the architecture. We stuffed too many responsibilities into one module and now multiple teams have a reason to touch that module, right, or maybe we do have a very modular architecture but it's the wrong granularity on it. Maybe it's the wrong modeling concepts that we use. And very often it turns out that, as software architects, it tends to be very common that we identify technical building blocks and we kind of build our architecture around that and that's, like, more or less asking for heavy team coupling.
Sven Johann: Yeah. I like that. Little code changes to fix big problems. I am currently in a project with, let's say, 300 people. Without the data visualization on the coupling it's incredibly hard to have a meaningful conversation about what the problems are.
Adam Tornhill: Yeah. And I think that even minor improvements at that scale have a big payoff because imagine just the staff cost of 300 people. If we can save 1%, that can be quite some party.
Sven Johann: Yeah. Exactly. I mean, coordination meetings with 50 people every once in a while, why not analyze the code base instead?
Adam Tornhill: You could almost buy a private jet for that money you save, right?
Sven Johann: Exactly. From my perspective, I could have many more questions but let's close it here. Thank you, Adam, for the conversation. I can really recommend not only reading your book but also check out the tool. Thank you.
Adam Tornhill: Thanks a lot for having me. A pleasure.
Part 1 of this interview can be found here
About the Author
Adam Tornhill is a programmer who combines degrees in engineering and psychology. He's the founder of CodeScene where he designs tools for software analysis. He's also the author of Software Design X-Rays, the best-selling Your Code as a Crime Scene, Lisp for the Web and Patterns in C. Adam's other interests include modern history, music and martial arts.
Check out some of Adam’s past talks
GOTO Copenhagen 2019 - Prioritizing Technical Debt as if Time and Money Matters
GOTO Chicago 2016 - Treat Your Code as a Crime Scene
GOTO Amsterdam 2016 - Embrace the Past: How SW Evolution Lets You Understand Large Codebases