
Better Tests at GitHub & Commodore 64 Music

The engineering culture, core functionalities, and its monolithic architecture are just some of the factors behind GitHub’s success. Ole Friis Østergaard talks about the special division for analyzing tests that are not behaving as expected. In such a complex environment, their work has a big impact on the entire system. Discover how their engineering culture, approach to software overall, and some Commodore 64 love have inspired all this.



Hannes Lowette: Hi. I'm Hannes Lowette from a company called Axxes in Belgium and I'm sitting here at GOTO Aarhus with Ole Friis Østergaard. Ole, can you introduce yourself to the audience?

Ole Friis Østergaard: Yes. So I'm Ole Friis Østergaard, and I work at a company called GitHub, where the world collaborates on software. I've been there for a couple of years, also doing npm work; GitHub acquired npm a couple of years ago, so I've been doing work there. Previously I worked at Microsoft, at Trifork, the company running the conference we are at right now, GOTO Aarhus, and at a company called Xamarin, where we did UI testing of apps on actual devices. A very nice project. My professional work has mostly centered around automated testing and making the experience nice for developers, but I've also done lots of other stuff: DevOps and actual feature development as well.

How GitHub treats flaky tests

Hannes Lowette: It's good that you mentioned testing. I saw that your first talk at the conference is about making tests better. Can you tell me what that is all about?

Ole Friis Østergaard: It's kind of a big problem nowadays, I would think. There used to be two kinds of developers, those who write tests and those who don't. Lately, I've only seen one of them, the developers who write tests, which is kind of... Testing has won, I guess. Maybe it's not like that everywhere, but I only seem to talk to people who actually write tests. And that also means your test suites are getting bigger and bigger as you write more production code.

As with everything in life, when you get more and more of it, you see different problems as well. That's only natural. So, at GitHub, we have looked into fixing the tests that are flaky. I'd say a flaky test is a little hard to define, actually, when you really think hard about it, but on the surface, a flaky test is one that, for no apparent reason, is mostly successful but once in a while turns into a failure.

Hannes Lowette: And then it messes up a test run and you have to rerun it and it takes time.

Ole Friis Østergaard: Yes, it takes time. If you have a continuous integration build, then...

Hannes Lowette: ...it stops.

Recommended talk: Continuous Delivery Pipelines: How to Build Better Software Faster • Dave Farley • GOTO 2021

Ole Friis Østergaard: ...you might have lost 15 minutes of your life and you have to rerun it. This is what most people do, myself included because then it might turn green, and then you just wasted 15 minutes of your life and you just want to deploy this new feature. So, yeah, weeding out these hardly broken tests.

Hannes Lowette: Do you do that for GitHub's own tests or are you also using our customer data in your data set?

Ole Friis Østergaard: No, it's only for GitHub. And actually, so far, only part of GitHub's code base. We have microservices and different projects, but we've been focusing on what we call the monolith, which is a big Ruby on Rails application. We are planning on expanding into other areas, but we got a lot of experience doing this, which is why we thought it was worth talking about. So, basically, taking the tests, looking at them, asking why they are flaky, and kind of putting them into various categories.

Hannes Lowette: Right. What kind of categories would we have that make tests flaky? What can we take away as developers from your findings here?

Ole Friis Østergaard: Yes. So, we have a few categories which... The one we've seen the most is...I think it might be about timing. So, most developers don't consider the fact that a test takes time. So sometimes you might read the system time in one line and later compare that to the output of a function on some date that's being printed in a report or whatever. And sometimes in between that, just a few milliseconds have passed, then the minutes might have changed to the next minute, and then it's failing. While if you run this locally, there's a very low probability of this test failing. But if you have a really huge code base with a lot of tests, then it's gonna happen once...
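
The timing problem described here can be illustrated with a minimal sketch. The function names (`report_header`, `flaky_check`, `stable_check`) and the report format are hypothetical, not taken from GitHub's code base; the point is only that reading the clock twice leaves a small window in which the minute can roll over, while capturing one timestamp and passing it in removes that window.

```python
from datetime import datetime, timedelta


def report_header(now: datetime) -> str:
    """Format a report header from an explicitly supplied timestamp."""
    return f"Report generated at {now:%Y-%m-%d %H:%M}"


def flaky_check() -> bool:
    """Flaky pattern: the clock is read twice, so a minute rollover
    between the two reads makes the comparison fail."""
    expected = datetime.now().strftime("%Y-%m-%d %H:%M")
    header = report_header(datetime.now())  # second, slightly later clock read
    return header.endswith(expected)


def stable_check(now: datetime) -> bool:
    """Stable pattern: one timestamp is captured and used on both sides."""
    expected = now.strftime("%Y-%m-%d %H:%M")
    return report_header(now).endswith(expected)


# Reproduce the rollover deterministically instead of waiting for it.
before = datetime(2022, 6, 15, 10, 59, 59, 999000)
after = before + timedelta(milliseconds=2)  # two milliseconds later, next minute
rollover_mismatch = report_header(after) != report_header(before)
```

Passing the timestamp in as a parameter is one possible fix among several; freezing the clock in the test framework would be another.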

Hannes Lowette: If you have a couple of hundred tests like that then...

Ole Friis Østergaard: Exactly.

Hannes Lowette: ...the probability of having that happen once a week is going up, right?
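
The scaling argument in this exchange can be made concrete with a small calculation. It is a sketch under a simplifying assumption that every flaky test fails independently with the same small probability; the function name and the example numbers are illustrative, not GitHub data.

```python
def chance_of_red_build(per_test_failure_rate: float, flaky_tests: int, runs: int) -> float:
    """Probability that at least one flaky test fails in at least one of
    `runs` CI runs, assuming independent failures with equal probability."""
    all_pass_in_one_run = (1.0 - per_test_failure_rate) ** flaky_tests
    return 1.0 - all_pass_in_one_run ** runs


# Illustrative numbers only: a 1-in-10,000 failure rate per test,
# 200 such tests, and 500 CI runs in a week.
weekly_risk = chance_of_red_build(0.0001, flaky_tests=200, runs=500)
single_test_single_run = chance_of_red_build(0.0001, flaky_tests=1, runs=1)
```

With these assumed numbers a single test on a single run almost never fails, while the suite as a whole is nearly certain to produce at least one red build during the week.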

Ole Friis Østergaard: Yes, exactly. And we have other interesting things like tests leaking state into other tests, which means if you run the problematic test, that's not the one failing, it's just leaking stuff. So other tests fail, which can be really hard to identify.

Hannes Lowette: Like state being left behind and files are in a database or...

Ole Friis Østergaard: Both. Whatever. In memory too: for example, changing the time zone, because you might want to test that one part of your code is robust to time zones.

Hannes Lowette: There is no proper test cleanup or the test cleanup might have failed and that leaks into other tests.

Ole Friis Østergaard: Yes. And you just add more stuff that you can change externally, but you forget to add the cleanup.
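
A minimal sketch of the leaking-state problem, assuming a hypothetical process-wide `SETTINGS` dictionary that stands in for whatever global state a real suite shares (time zone, environment variables, caches). The leaky test passes on its own and breaks a different test that runs after it; the context manager shows one way to make the cleanup impossible to forget.

```python
from contextlib import contextmanager

# Hypothetical process-wide setting shared by every test in the same run.
SETTINGS = {"time_zone": "UTC"}


def leaky_test() -> None:
    """Changes global state and never restores it, so later tests inherit it."""
    SETTINGS["time_zone"] = "Pacific/Auckland"
    assert SETTINGS["time_zone"] == "Pacific/Auckland"


@contextmanager
def override_setting(key: str, value: str):
    """Temporarily override a setting and always restore the previous value,
    even if the body raises."""
    previous = SETTINGS[key]
    SETTINGS[key] = value
    try:
        yield
    finally:
        SETTINGS[key] = previous


def well_behaved_test() -> None:
    """Same check as leaky_test, but the override ends with the block."""
    with override_setting("time_zone", "Pacific/Auckland"):
        assert SETTINGS["time_zone"] == "Pacific/Auckland"


def unrelated_test() -> bool:
    """Passes only if nobody left a different time zone behind."""
    return SETTINGS["time_zone"] == "UTC"
```

Running `well_behaved_test()` before `unrelated_test()` is harmless, while running `leaky_test()` first makes the unrelated test fail, which is why the failure shows up far from its cause.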

Hannes Lowette: But that is also the type of problem that takes a long time to find, right?

Ole Friis Østergaard: Yes, it is. And you have to be wary about that. So, my manager told me about a thing called the completion principle, which is that stuff that's not finished causes tension, and drains energy, while if you finish a task, it gives you energy. And we started out with several hundred flaky tests. And the original idea was basically just to delete all these flaky tests. That means we have a cleaner code base, it's not flaky, so people won't have to rerun the tests once in a while. But we started looking into some of these flakes, and once you start scratching the surface it's like opening a murder mystery book or something like that. Suddenly you want to know exactly why this flakes and...

Hannes Lowette: Crime scene investigation.

Ole Friis Østergaard: Exactly. And I mean, all technical tasks can be like that, but now we suddenly had hundreds of them. So, we did spend some time investigating and that's why we could draw this learning and make the categories. But nowadays, once we think we know mostly why our tests are failing, we try to have a more hands-off approach to it. Currently, we automatically generate pull requests just deleting this test that's suddenly flaky. Which seems kind of rough, but on the other hand, I mean...

Hannes Lowette: Does the original developer get notified like...

Ole Friis Østergaard: Yes. They do.

Hannes Lowette: ...you wrote a flaky test, we're deleting it, but if you think there's still value here, go write a better test somewhere?

Ole Friis Østergaard: Exactly. So, that's our approach. We create the pull request, the code owners are notified, and we let it stay for a couple of weeks or days. And if nothing happens, we're gonna delete that test. But the hope is, of course, that the code owners will see, okay, this... Either they agree with us, this is not worth keeping, or sometimes you actually have duplicate tests, you already test this code elsewhere.

Hannes Lowette: Especially if you write tests on different levels, you have unit tests and integration tests and so on. And certain scenarios are also kind of duplicated anyway.

Recommended talk: Learning Test-Driven Development • Saleem Siddiqui & Dave Farley • GOTO 2022

Ole Friis Østergaard: Exactly. But amazingly often, the owning team says, "Oh, yeah, that's right, I noticed this, and I'll just make a fix." And they have much more context to actually fix the test. But it's been amazing just watching people who get into action because somebody is interested in this nagging test they've noticed themselves before, but there's another team now, which is our team kind of pushing them to do it. And I think mentally that means a lot for them to actually go in and fix it.

Hannes Lowette: And where do you see this project going? Is it something that might get integrated into a GitHub action that we can run on our own test suites in the future or...?

Ole Friis Østergaard: Yes. So, right now we have an internal tool in GitHub, but there are third-party tools out there. There's a service called BuildPulse, and Datadog also has a solution. So if you're already a Datadog customer, you could implement that.

Hannes Lowette: Implement that.

Ole Friis Østergaard: So, basically, the rough idea is you send a JUnit XML file to this service and it analyzes tests and, yeah, it finds stats about which tests are flaky and has this gone better in the last period or worse. You can see the worst offenders and that kind of stuff, so you can take action that way. So, there are tools out there as well, apart from our built-in tools.
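
The rough idea of analyzing JUnit XML reports can be sketched as follows. This is not how BuildPulse, Datadog, or GitHub's internal tool work; it is a simplified illustration that assumes every report comes from the same code revision, and it treats a test as flaky when it has both passed and failed across those reports. The sample reports and test names are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flake_rates(junit_reports: list[str]) -> dict[str, float]:
    """Return the failure rate of every test that both passed and failed
    across the given JUnit XML reports."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for xml_text in junit_reports:
        root = ET.fromstring(xml_text)
        for case in root.iter("testcase"):
            name = f"{case.get('classname')}.{case.get('name')}"
            runs[name] += 1
            # A <failure> or <error> child marks an unsuccessful test case.
            if case.find("failure") is not None or case.find("error") is not None:
                failures[name] += 1
    return {
        name: failures[name] / runs[name]
        for name in runs
        if 0 < failures[name] < runs[name]
    }


# Made-up reports: the same two tests, run four times on the same revision.
PASSING = (
    '<testsuite>'
    '<testcase classname="ReportTest" name="test_header"/>'
    '<testcase classname="UserTest" name="test_login"/>'
    '</testsuite>'
)
FAILING = (
    '<testsuite>'
    '<testcase classname="ReportTest" name="test_header">'
    '<failure message="minute changed"/></testcase>'
    '<testcase classname="UserTest" name="test_login"/>'
    '</testsuite>'
)
rates = flake_rates([PASSING, PASSING, PASSING, FAILING])
```

Sorting the resulting dictionary by rate gives a simple "worst offenders" list; tracking it per week would show whether things are getting better or worse.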

Hannes Lowette: That's nice.

Ole Friis Østergaard: Yes.

Engineering culture at GitHub

Hannes Lowette: What has the overall sentiment been with the developers whose tests get removed, right? Because you're writing a tool that tells them whether their tests are good or not and takes pretty aggressive action. Is everybody on board with the mission or do you also run into some resistance there?

Ole Friis Østergaard: At least I personally was super afraid of doing this because we're a team of four people, and in GitHub, I think there are around 1,500 engineers. So, I felt like it was these four people pushing this strategy on everybody else. And I was sure people would be super angry that we would just delete some of their code. But it's been the opposite. I mean, people have reacted like, "Oh, great idea and nice strategy." And we haven't had any negative feedback about that.

The only thing is that people say, "Oh, that's a good idea but can't we fix it by doing this?" And then you get kind of dragged in. Normally, it's fine because the people with more context, if they take over and fix the test, that's awesome. And we can all draw learnings from that. But, yes, that's the only issue we've seen. People have been super positive, but also they don't feel good about deleting the tests. And I guess that's natural. I mean...

Hannes Lowette: But that's also the result that you want, right? It's like either they fix them or the tests get removed, but the end result is a better test suite.

Ole Friis Østergaard: Yes, exactly. Either way.

Hannes Lowette: But that speaks volumes about the engineering culture, I think then, at GitHub because actually, it's a very personal attack on your code quality. I can imagine that there are definitely engineering teams where that would not go over well.

Ole Friis Østergaard: Yes, it could be but, I mean, as a clever person once said, test code is also code so you have to watch out for that as well and refactor it and make sure it's readable. Yes. And maintain it, basically.

Hannes Lowette: Maintain it.

Ole Friis Østergaard: So that's what we do.

For the love of Commodore 64 music

Hannes Lowette: Interesting stuff. I saw that you also have another talk that I'm really looking forward to seeing tonight, which is about Commodore 64 music.

Ole Friis Østergaard: Yes.

Hannes Lowette: How did that happen?

Ole Friis Østergaard: Very good question. So, I just grew up in the times when, at least in Denmark, the Commodore 64 was the home computer people had. So, I was a paperboy back then and earned my own money to get this computer and...

Hannes Lowette: I think it's probably one of the most sold systems ever, right?

Ole Friis Østergaard: Probably, yes. But I guess people in other parts of the world had other systems like the Amstrad and...what did we talk about? The Sinclair...

Hannes Lowette: Yes, the ZX Spectrum is from...

Ole Friis Østergaard: Yes, exactly. But for me and all my friends, where I come from, it was the Commodore 64. And as time passes, you just get nostalgic. I heard a podcast the other day. It’s called Nostalgia at Rock. And I think that's very true, we just like to dabble in old stuff. But when I go back to the old games, the graphics don't meet my expectations from memory. It's like...

Hannes Lowette: You remember the experience...

Ole Friis Østergaard: Yes, exactly.

Hannes Lowette: ...not the way it looked, right?

Ole Friis Østergaard: And gradually, we've been accustomed to better and better graphics. But the music is still amazing, I think. I really like the 8-bit sound, which has also been coming into fashion again.

Hannes Lowette: That was one of the cool things, I think, about the Commodore 64. We were an IBM PC household at that time. Sound cards did not come by default on those, so it could beep a little bit from the internal speaker on the mainboard, but that was it. And then these Commodores, I had a friend who had a Commodore, and they came with a proper sound card in them, at least for the time.

Ole Friis Østergaard: Yes. The Amiga also had a sound card.

Hannes Lowette: Yes. And you had people doing very amazing soundtracks within the limitations of what those things could do.

Ole Friis Østergaard: I think it's especially impressive on the Commodore 64. I mean, it's 64 kilobytes of total addressable memory. That's not a lot. And in my research for this, I looked up the original documentation for the SID chip. They have a nice table of the different frequencies you can set and which of the notes we all know they correspond to. But they have this little note in the paper saying, "You shouldn't do this. This requires 192 bytes of memory. That's far too much. You can't spend all that memory on this table. So, do something smarter." And I'm thinking, nowadays how many people get praise from their managers for saving 192 bytes?
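
The 192 bytes are consistent with a table of 96 notes (8 octaves of 12 semitones) at 2 bytes per note. One common way to be "smarter", sketched below, is to store only the top octave and halve the value once per octave below it. Whether this is the trick the documentation had in mind is an assumption, as are the equal-tempered tuning and the NTSC clock value used here; the relation between the 16-bit register value and the output frequency (register value times clock, divided by 2^24) is the one given in the SID datasheet.

```python
SID_CLOCK_HZ = 1_022_727  # assumed NTSC system clock; PAL machines run at about 985,248 Hz
A4_HZ = 440.0


def register_value(frequency_hz: float, clock_hz: int = SID_CLOCK_HZ) -> int:
    """16-bit SID frequency register value for a desired output frequency."""
    return round(frequency_hz * 2**24 / clock_hz)


def note_frequency(octave: int, semitone: int) -> float:
    """Equal-tempered frequency; octave 4, semitone 9 is A4 = 440 Hz."""
    midi_note = 12 * (octave + 1) + semitone
    return A4_HZ * 2 ** ((midi_note - 69) / 12)


# Only the top octave is stored: 12 entries of 2 bytes each.
TOP_OCTAVE = 7
TOP_OCTAVE_TABLE = [register_value(note_frequency(TOP_OCTAVE, s)) for s in range(12)]


def lookup(octave: int, semitone: int) -> int:
    """Derive any note by halving the top-octave value once per octave below it."""
    return TOP_OCTAVE_TABLE[semitone] >> (TOP_OCTAVE - octave)


FULL_TABLE_BYTES = 8 * 12 * 2                    # 96 notes, 2 bytes each
COMPACT_TABLE_BYTES = len(TOP_OCTAVE_TABLE) * 2  # 12 notes, 2 bytes each
```

The shift truncates, so a derived value can be off by one from a directly computed one, a pitch error far too small to hear; on a 6502 the halving is a pair of shift and rotate instructions per octave.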

Hannes Lowette: I've only had one project in my professional life where I had to deal with, like, really small amounts of memory, which was an embedded chip that had eight kilobytes of memory. If you try to get that chip to do something and make an HTTP request fast, that's already pushing the boundaries of what it can do.

Ole Friis Østergaard: Doing SSL...

Hannes Lowette: But it also makes you appreciate the efficiency of people who work with those kinds of devices because they go about writing code and managing memory completely differently. I come from a .NET world, I just assume that memory and disk are ubiquitous and endless.

Recommended talk: Expert Talk: What’s Next For .NET? • Hannes Lowette & Martin Thwaites • GOTO 2022

Ole Friis Østergaard: Yeah. I mean, my laptop has literally a million times the memory that a Commodore 64...

Hannes Lowette: Sixty-four gigs?

Ole Friis Østergaard: Yes, yes.

Hannes Lowette: A million? Yes, a million times.

Ole Friis Østergaard: Yes, even more, because it's base 2.
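
The "even more, because it's base 2" remark works out as follows, assuming the 64 gigabytes and 64 kilobytes are both counted in binary units.

```python
C64_MEMORY_BYTES = 64 * 2**10     # 64 KiB of addressable memory
LAPTOP_MEMORY_BYTES = 64 * 2**30  # 64 GiB of RAM

# The factor of 64 cancels, leaving 2**30 / 2**10 = 2**20.
ratio = LAPTOP_MEMORY_BYTES // C64_MEMORY_BYTES
```

The ratio is 2^20 = 1,048,576, slightly more than a decimal million.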

Hannes Lowette: That's something we stopped optimizing for at some point along the way.

Ole Friis Østergaard: And it made sense. I mean, I wouldn't...

Hannes Lowette: It does, especially if you're running a lot of containers on a laptop or whatever. Optimizing that will cost a lot of time, whereas getting more RAM is really cheap compared to developer salaries, which is why we don't do it efficiently anymore.

Ole Friis Østergaard: Yes. But I think it's amazing thinking back, that back then the Commodore 64 was the new awesomeness. It was, "Oh, all this memory and..."

Hannes Lowette: So, what other stuff have you run into, like, diving into that Commodore 64 world? Because you're looking at it from a different angle now than you were looking at it as a kid playing the games, right?

Ole Friis Østergaard: It turns out it's quite interesting when... So, there's this online library of Commodore 64 music called the High Voltage SID Collection. The SID was the name of the sound chip back then. And you can download a big archive, and it covers basically all the games you've ever had; their music is part of this library. And you can play that in various players like VLC and various online players, and I don't know what.

Then I thought, "I want to be able to play this and I want to export it into another format," MIDI format, for example, because while I like the 8-bit sound, I would also like to get kind of the information of the music like in MIDI format so I could... Nowadays, people practice the Mario tune on the keyboard and other game tunes, I would like to play some of these games, and the tunes from them.

So I thought, "How hard can it be? It's in one format and I want to convert it to another format." But it turns out this SID format includes executable code. So, what you do is you actually run a Commodore 64 emulator, which pokes the addresses in the SID chip to make it play the sounds just like on a real Commodore 64. So, you can take these SID files and actually run them on a Commodore 64.

Hannes Lowette: Does that mean that the SID files also have control structures in them?

Ole Friis Østergaard: Yes. I mean, it's just machine code. So that was the biggest hurdle, getting through the...

Hannes Lowette: Like, getting that to MIDI, which is just like a timeline space...

Ole Friis Østergaard: Yes, exactly.

Hannes Lowette: ...a sequence of notes and sounds, right? And then...

Ole Friis Østergaard: Yes. But it makes sense because back then musicians were also programmers. When you had to write a soundtrack for a game, you also had to supply the machine code for the game. So, on every frame...so 50 times a second in Europe and 60 times a second in the United States and Japan, it would get a callback and do something.
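
The per-frame callback can be sketched as a loop that calls the tune's play routine once per video frame and logs every write to the SID registers, which yields the kind of timeline a MIDI export needs. In a real converter the play routine would be 6502 machine code running inside an emulator; here `fake_play` is a hypothetical stand-in that always writes the same two bytes. The SID registers are memory-mapped from address $D400 on the Commodore 64, and 50 and 60 frames per second are the nominal rates mentioned in the conversation.

```python
from typing import Callable, NamedTuple

SID_BASE = 0xD400  # first SID register address on the Commodore 64

Poke = Callable[[int, int], None]


class RegisterWrite(NamedTuple):
    time_s: float   # seconds since the tune started
    register: int   # offset from SID_BASE
    value: int      # byte written


def record_writes(play: Callable[[Poke], None], frames: int,
                  frames_per_second: float) -> list[RegisterWrite]:
    """Call the play routine once per frame and log every SID register write."""
    timeline: list[RegisterWrite] = []
    for frame in range(frames):
        now = frame / frames_per_second

        def poke(address: int, value: int) -> None:
            timeline.append(RegisterWrite(now, address - SID_BASE, value & 0xFF))

        play(poke)
    return timeline


def fake_play(poke: Poke) -> None:
    """Hypothetical stand-in for a tune's machine-code play routine."""
    poke(SID_BASE + 0, 0x31)  # voice 1 frequency, low byte
    poke(SID_BASE + 1, 0x1C)  # voice 1 frequency, high byte


pal_timeline = record_writes(fake_play, frames=50, frames_per_second=50.0)
ntsc_timeline = record_writes(fake_play, frames=50, frames_per_second=60.0)
```

The same 50 frames of music span less wall-clock time at 60 calls per second than at 50, which is the tempo difference between regions described in the conversation.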

Hannes Lowette: So that kept the whole thing in sync with the refresh rates on the screens.

Ole Friis Østergaard: This also means if you played the same game in the United States, the music would be a tiny bit faster.

Hannes Lowette: All right. 

Ole Friis Østergaard: And also, the pitch would change a little bit because the frequency generator would be changed.

Hannes Lowette: Is that something you can now just solve in the emulators as well? Can you emulate it in EU and in the U.S.?

Ole Friis Østergaard: Yes, that's not a big problem. That's basically changing the frequency and calling it more or less often. So that's been quite fun, kind of doing parts of the Commodore 64 emulator just to get out these sounds from the old games. It's been super fun. A little more digging than I would have expected at first, but super fun.
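
The pitch difference between regions follows from the SID deriving its output frequency from the system clock. A minimal sketch, assuming the commonly cited clock values of 985,248 Hz (PAL) and 1,022,727 Hz (NTSC) and the datasheet relation between register value and output frequency; the register value 7493 is just an example chosen to land near 440 Hz on a PAL machine. Mapping the frequency to the nearest MIDI note is one step a MIDI export would need.

```python
import math

PAL_CLOCK_HZ = 985_248     # assumed PAL system clock
NTSC_CLOCK_HZ = 1_022_727  # assumed NTSC system clock


def output_hz(register_value: int, clock_hz: int) -> float:
    """Oscillator frequency produced by a 16-bit SID frequency register value."""
    return register_value * clock_hz / 2**24


def nearest_midi_note(frequency_hz: float) -> int:
    """Nearest MIDI note number, with note 69 defined as A4 = 440 Hz."""
    return round(69 + 12 * math.log2(frequency_hz / 440.0))


def semitone_shift(from_clock_hz: int, to_clock_hz: int) -> float:
    """How far the pitch moves, in semitones, when the same register
    values are played on a machine with a different clock."""
    return 12 * math.log2(to_clock_hz / from_clock_hz)


pal_pitch = output_hz(7493, PAL_CLOCK_HZ)
ntsc_pitch = output_hz(7493, NTSC_CLOCK_HZ)
shift = semitone_shift(PAL_CLOCK_HZ, NTSC_CLOCK_HZ)
```

Under these assumptions the same register values sound a little more than half a semitone higher on an NTSC machine, so an emulator that supports both regions only needs to swap the clock constant and the call rate.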

Hannes Lowette: A little bit of technological archaeology.

Ole Friis Østergaard: Yes, exactly.

Simplicity in software now, then, and in the future

Hannes Lowette: Looking at what people were doing 30, 40 years ago, and how they...the problems that they had to deal with and look at maybe the simplicity of what they were doing, right?

Ole Friis Østergaard: Yes.

Hannes Lowette: What do you think in the future is gonna happen if we look at our...if people did the same for the code that we write today?

Ole Friis Østergaard: Yes, I don't know. Yeah. People looking back at 2022 like, "Oh, the simple times, when we only used seven different programming languages for this solution and only three different cloud databases," and I don't know what. I hope we'll keep it simple in the next 20 years. But yeah, it's quite amazing looking back at what used to be the state of the art...

Hannes Lowette: That goes against the common trend at the moment, keeping things simple, right?

Ole Friis Østergaard: Yes. True.

Hannes Lowette: We are over-complicating our lives with a lot of things, in a lot of ways. I mainly blame conference speakers for that and...maybe myself included. If you tell a story about a certain technology, it solves a certain problem, and people will want to use your solution, but they might not have the same problem. So, they might look at Netflix and Spotify and then how they do things, but they don't have the problems at scale.

Ole Friis Østergaard: Yes, exactly. Simon Brown addressed this very succinctly yesterday just saying, "Don't."

Recommended talk: Five Things Every Developer Should Know about Software Architecture • Simon Brown • GOTO 2020

Hannes Lowette: Yes. Keep it as simple as possible. I think...I still... I think my Twitter, like, bio says something that I'm a monolith advocate. And I still stand by it...

Ole Friis Østergaard: That you're a what?

Hannes Lowette: A monolith advocate.

Ole Friis Østergaard: Yes.

Hannes Lowette: And so I would say, like, if you're building something new and you don't even know where it's going to go, like build something simple that you can push out quick. That doesn't mean writing crap code, I mean, that's a completely different thing. But, like, construct the code well so that you already implement the dotted lines, like we might be able to cut here in the future, but we're still gonna deploy it as one unit of deployment and ship it and see what users do with it.

Ole Friis Østergaard: Exactly. Yeah, I'm also a big fan of that. I like monoliths as well. It's like...for some reason a lot of people like microservices, but I've never really understood why. I mean, if you have a mature system and you have something that's really separate from everything else, well, sure, split it out. But people who do that from the beginning they're just gonna have a lot of issues refactoring across the network, basically.

Hannes Lowette: Like moving something around becomes a lot harder, right, if you have separate services. Whereas, if it's one code base, you just move.

Ole Friis Østergaard: And another point people forget sometimes is that you can deploy a monolith as microservices. You can just enable certain endpoints in one deployment. You can scale that up and down as you want to, depending on the traffic but you...

Hannes Lowette: That's what Stack Overflow did for a long time, right?

Ole Friis Østergaard: Nice. I don't know.

Hannes Lowette: Yes. They called it a scalable monolith or something like that, where basically every service was the same code base, but, like, in orchestration they changed what all the units were doing. It was very interesting. They have a very nice engineering blog about how they did things. And they basically were on ASP.NET with SQL Server for a very, very, very long time, like, really boring technology, pushing that to the absolute limits of what's possible. I think we should take inspiration from all sides of those stories and look at what our problems are before we start applying these things.
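
The idea of shipping one code base everywhere and enabling only certain endpoints per deployment can be sketched like this. The route groups, paths, and the `ROUTE_GROUPS` environment variable are all hypothetical; this is not how GitHub or Stack Overflow configure their systems, only an illustration of the principle.

```python
import os
from typing import Callable

Handler = Callable[[], str]

# Every deployment ships all handlers; configuration decides which are switched on.
HANDLERS: dict[str, dict[str, Handler]] = {
    "web": {"/": lambda: "home page"},
    "api": {"/api/repos": lambda: "[]"},
    "webhooks": {"/hooks/push": lambda: "accepted"},
}


def enabled_routes(route_groups: str) -> dict[str, Handler]:
    """Build the routing table for a comma-separated list of group names."""
    routes: dict[str, Handler] = {}
    for group in (g.strip() for g in route_groups.split(",")):
        if group:
            routes.update(HANDLERS[group])
    return routes


def routes_from_environment() -> dict[str, Handler]:
    """Read the enabled groups from a hypothetical ROUTE_GROUPS variable;
    with the variable unset, the process behaves as the full monolith."""
    return enabled_routes(os.environ.get("ROUTE_GROUPS", "web,api,webhooks"))


def handle(path: str, routes: dict[str, Handler]) -> tuple[int, str]:
    """Serve a request if its endpoint is enabled in this deployment."""
    handler = routes.get(path)
    if handler is None:
        return 404, "not enabled in this deployment"
    return 200, handler()


api_only = enabled_routes("api")
everything = enabled_routes("web,api,webhooks")
```

A deployment started with only the `api` group can be scaled independently of the others, while moving code between groups remains an ordinary refactoring inside one code base.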

Ole Friis Østergaard: Yes, of course, you can separate concerns within a monolith. I mean, programming languages have had those facilities for many, many years.

Hannes Lowette: Of course. Like patterns, like SOLID, and maybe mediator patterns where you separate some of the services inside the same code base.

GitHub's monolith & open-source

Hannes Lowette: You said that you were working on the monolith at GitHub, what does that look like? Is that something you can talk about?

Ole Friis Østergaard: Yes. In some parts, at least. It's basically a Rails application. So, we do have microservices as well in GitHub, but mostly for big issues. It's not like every time we do a new functionality it has to go to a microservice, but sometimes it makes sense. The core functionality of GitHub is in this monolith, which is Rails and we keep on the edge of Rails, which means that we can contribute back to Rails. I mean, the monolith in GitHub is not open source, but we still contribute to a lot of open-source projects. We have the core contributors for both Ruby and Rails in GitHub.

Hannes Lowette: Oh, that's nice.

Ole Friis Østergaard: Yes. So, we have the people we can consult with if we have issues with Rails, but mostly people are super happy that we are ahead of the curve there. I think some years ago, and it's been discussed publicly as well, we were very much behind, on an unsupported version of Rails, but then some people said this is not the way to do it, and it got upgraded, and now we are continually...

Hannes Lowette: And after that, those people moved to the GitHub team or they were already there?

Ole Friis Østergaard: I think they were already there. That's before my time so I don't know any specifics, but there are people who continuously upgrade Rails and keep us...

Hannes Lowette: Is that something you think GitHub is doing well, like contributing back not only to Rails but to a lot of open source projects?

Ole Friis Østergaard: Yes, definitely. I think so.

Hannes Lowette: Because that's, I think, one of the things that we see in the open source world: there are not a lot of contributions coming back, both in terms of funding and in terms of code, to a lot of open source maintainers. They just get a lot of complaints when stuff doesn't do what people expect it to do, right?

Ole Friis Østergaard: There have been recent incidents about what can an open-source maintainer do with his or her own projects, and what can enterprises expect when they just grab some free software. So, there are a lot of good discussions there, and I don't have a good response to that.

Hannes Lowette: No. But if we wanna keep open source maintainable I think as an industry we're gonna have to rethink how we work with that. I think a lot of companies could do more instead of just being the consumer but also giving back. So, I'm happy to hear that at GitHub at least there is some of that going on, which is great. It makes open source a little bit more maintainable.

Ole Friis Østergaard: Yeah. Yeah, it's interesting because I think when a company just grabs a free product, there are limits to what they can expect in support, of course. On the other hand, if you're an open-source maintainer and you put stuff out there for free, there are limits to what you can expect from companies using your product. So, it's...

Hannes Lowette: Both of them.

Ole Friis Østergaard: Yes. In both ways, it's complex because we all want to do good, but we have limited resources and limited time.

Hannes Lowette: And limited time. Yeah. And if most of your experiences maintaining something are toxic because people are complaining, but they're not paying you for it...

Ole Friis Østergaard: Yes, exactly.

Hannes Lowette: ...those projects might get orphaned, right?

Ole Friis Østergaard: Often you think, "Oh, this open source project, a lot of companies are using it, it must be well supported," but it turns out it's a random...

Hannes Lowette: It's like one person...

Ole Friis Østergaard: Yes.

Hannes Lowette: ...and they might drop off the grid.

Ole Friis Østergaard: Yes, exactly. And what to do then?

Hannes Lowette: That's true. I think that pretty much brings us back to where we started, at GitHub. I wanna thank you so much for this talk today.

Ole Friis Østergaard: It's been a pleasure.

Hannes Lowette: A lot of interesting stuff there. And I look forward to the Commodore 64 talk.

Ole Friis Østergaard: So am I. I mean, normally, I'm super nervous before a talk, but this, I'm just really looking forward to it. It's going to be nice and lots of music and... You won't learn a single useful thing, I promise you that.

Hannes Lowette: That is okay. I always love those talks where we are taken out of our comfort zone. I've seen talks that had nothing to do with software engineering, and that had zero applicable knowledge, and I enjoyed every minute of them just because the speaker was passionate about what they were speaking about. I think that's a very great way to end GOTO Aarhus tonight...