Erlang: Solving Problems at Scale for 30+ Years

#erlang #elixir #rabbitmq #GOTO Unscripted

There is an entire language ecosystem behind Erlang programming, and Francesco Cesarini, founder and technical director at Erlang Solutions, has been using it to solve problems at scale for more than 30 years. Find out how you can leverage Erlang to your own benefit.


This episode was made possible with the support of Erlang Solutions, the world-leading expert in creating massively scalable distributed systems. Erlang Solutions works with many of the world's leading companies to ensure their digital architecture has minimal downtime, maximum security, and the ability to change live in-production code at rapid speed. Erlang Solutions are the proud organizers of Code Sync conferences including ElixirConf EU, Code BEAM Europe, and Code BEAM America. Code Sync conferences are a fantastic way to learn about how Erlang, Elixir, or other BEAM-based technologies can help you.

Preben Thorø: Maybe even before I introduce you, we're using Riverside here. Is Riverside running on Erlang?

Francesco Cesarini: I'm not aware of [that]. But I think there are quite a few streaming platforms and frameworks written in Erlang and Elixir which can be used... I think we were working with ooVoo, which was number two after Skype many years ago. All of the connections of the video streams were set up by Erlang. Cisco and Ericsson's video systems are all Erlang-based. There is Membrane, a video streaming framework written in Elixir, which can be integrated. There are many, many, many others out there as well. So, at the end of the day, if you think about video streaming, all you're doing is connecting video streams instead of phone calls. So, you know, the business logic is very much the same.

Preben Thorø: That makes sense. Before we get too far here, may I ask you to introduce yourself?

Francesco Cesarini: So I'm Francesco Cesarini, the founder and technical director at Erlang Solutions. I've been working with Erlang since the '90s. And I'm very fortunate to have seen a programming language become an ecosystem of languages. If you had asked me back in '95 whether I would still be working with Erlang in 2022, I would have said probably not, but I still am, and we're still solving problems that were relevant then and are probably even more relevant today.

Erlang solving problems since 1995

Preben Thorø: Yes. Well, welcome to our little Unscripted series. You said '95, the mid-'90s. Erlang is way older than that, isn't it?

Francesco Cesarini: Well, they started working on Erlang, the language itself, in the late '80s. What the computer science laboratory was trying to figure out was how to program the next generation of telecom switches. It took them a few years. I think the first really fast virtual machine was ready in '91. Then in 1992 they started developing the first product, which was released in '94. So I'd say '94, '95 is when it was ready to be used outside of the lab, and it started becoming mainstream and used within some of the major projects within Ericsson.

Preben Thorø: OK. I thought it started in the '80s, but I was wrong. Erlang, I suppose, has something to do with "Ericsson language"... but is it a pure coincidence that there was a Danish professor, I think his name was Agner Krarup Erlang or something like that, who invented some queueing theory? Is there a connection there?

Francesco Cesarini: There's a connection. Erlang was named after Agner Krarup Erlang, the Danish mathematician. For those of you who don't know him, he was a founder of teletraffic theory, the theory of telephone traffic. He created the Erlang formula, which is used to figure out how likely it is that all of the lines within a particular call center are busy at any point in time. But as Ericsson management was paying for the development of Erlang, the team let management believe it was named after Ericsson. So EriLang, Erlang, you know. So, management thinks it was named after Ericsson. Those on the inside know it was named after the mathematician.
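Several formulas carry Erlang's name. The one closest to the description above is usually called Erlang B: given offered traffic E (measured in erlangs) and N lines, it gives the probability that a new call finds every line busy and is lost. (For call centers where callers wait in a queue instead of being rejected, the related Erlang C formula is normally used.) The sketch below is a minimal illustration, not anything taken from Erlang Solutions; the function name `erlang_b` is hypothetical, and it uses the standard recurrence B(E, 0) = 1, B(E, n) = E·B(E, n−1) / (n + E·B(E, n−1)), which avoids computing large factorials directly.

```python
def erlang_b(traffic_erlangs: float, lines: int) -> float:
    """Probability that an arriving call finds all `lines` busy (Erlang B)."""
    if traffic_erlangs < 0 or lines < 0:
        raise ValueError("traffic and line count must be non-negative")
    blocking = 1.0  # B(E, 0): with zero lines, every call is blocked
    for n in range(1, lines + 1):
        # Recurrence: B(E, n) = E * B(E, n-1) / (n + E * B(E, n-1))
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


if __name__ == "__main__":
    # Two erlangs of traffic offered to two lines.
    print(round(erlang_b(2.0, 2), 3))
```

With 2 erlangs offered to 2 lines, the closed form gives (2²/2!) / (1 + 2 + 2²/2!) = 2/5, so about 40% of calls would be blocked; adding lines drives that probability down.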

The deep secrets of the Erlang language

Preben Thorø: Interesting. Interesting. So Erlang, Ericsson language, that's more like marketing. Interesting. How does it work? Actually, what are the deep secrets of the language?

Francesco Cesarini: Well, Erlang itself is just a programming language. I think there are three things which, when put together, give you the secret sauce. One of them is the BEAM virtual machine.

It's a virtual machine that is highly optimized for large-scale concurrency. It's been optimized to scale on multiple architectures. And recently they've added a just-in-time (JIT) compiler. So that's one-third of the power. The other third is something we call OTP. OTP is middleware that abstracts away the concurrency models, which increases programmer productivity. But on top of increasing programmer productivity, it hides all of the tricky parts of dealing with fault tolerance and concurrency. So by using OTP and following its programming principles, your systems will scale and, by default, be resilient.

Then the third is, I would not even say Erlang itself, but the semantics of the Erlang programming language. These are semantics which most languages running on the BEAM today, including Elixir, inherit by default.

Put the three together, and that's when you get the real power of the ecosystem. To quote Joe Armstrong, you can copy the libraries, which is what's happened with OTP on the JVM, .NET, and many other ecosystems, but if it doesn't run on the BEAM, you cannot emulate the semantics. It's the three put together which give you the full power. The semantics of the language have a very tight one-to-one mapping with the operations of the virtual machine, and OTP is built on top of that to facilitate development and hide complexity from the programmer.

The BEAM Languages

Preben Thorø: So the idea that Elixir is the new generation of Erlang, that's not true. It's another language running on the same VM.

Francesco Cesarini: That is correct. Well, Elixir compiles to Erlang. That was, I think, a conscious choice José Valim made so he could use all of the tooling and libraries which already existed in the Erlang ecosystem when he created Elixir. So I would almost call Elixir a new version of Erlang with a slightly different syntax, different tooling, and a different development approach from what we're used to in the Erlang world. And by doing this, by improving the tooling and by providing a framework that was specific to certain types of problems, he opened up the power of Erlang to a wide range of developers to whom it wouldn't have been accessible otherwise.

Preben Thorø: Yes, that's true, because I have a feeling that Elixir, as you say, is addressing a completely new audience compared to, let's call it, the original Erlang.

Francesco Cesarini: Correct. Absolutely. You're perfectly right there. He did a fantastic job. I always ask programming language inventors, why did you invent language X, Y, or Z? When asked that question, his answer was that he wanted to open up the power of Erlang and the Erlang virtual machine (so the BEAM) to a much larger, wider group of programmers. More specifically, I think the first time I asked him that question, his focus was on web developers. So he was asking himself: how do I bring the power of Erlang to the web development world?

Web developers and Erlang developers, it's telecom versus web: two completely different problems we were solving. These two problems require completely different approaches. They require different tooling, different libraries, different frameworks. That also explains why our attempts to bring Erlang to the web failed back in the mid-2000s. You know, there were a lot of web frameworks written in Erlang, but none of them addressed the requirements of web developers at the time. Instead, they addressed the requirements of those developing telco infrastructure.

Fault tolerance in OTP

Preben Thorø: How does fault tolerance work in OTP?

Francesco Cesarini: Yes, so more than OTP, I think fault tolerance comes from a very simple notion: you've got processes, and processes do not share state. They do not share memory. What that means is you can have many processes running at the same time. If something goes wrong in a process, say there's a bug in the code the process is running or the data gets corrupted, you just terminate that particular process. When you terminate that process, all the other processes around it which are not dependent on it are not affected.

So imagine that you've got thousands of phone calls going through your system, each phone call is a process. And if something goes wrong with one particular phone call, you lose that phone call, you lose that connection, the other phone calls aren't affected. So that's a core principle of processes and processes not sharing state.

We then take these processes and group them into what we call supervision trees. In a supervision tree, a supervisor is a process whose only task is to supervise other processes. If one of those processes fails, the supervisor is immediately notified and can react. It can decide how to deal with that failure. Should we try to restart that process and reconnect that phone call, do we just ignore it, or are all of the other processes somehow... I mean, maybe it was a group call.

If it was the host process that terminated, the supervisor might decide to terminate all of the other connections and then restart them. By doing that, you're removing failure and error handling from the hands of the programmer, and you're generalizing it.
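To make the restart decision concrete, here is a toy, single-threaded Python simulation of a supervisor. It only illustrates the idea: each child keeps its own private state (nothing is shared), and when one child crashes, the supervisor applies a restart strategy. The strategy names `one_for_one` and `one_for_all` mirror real OTP supervisor strategies, but the `Supervisor` and `Child` classes, the `deliver` method, and the call-handling functions are hypothetical and are not OTP's API. The group-call scenario above corresponds to `one_for_all`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Child:
    name: str
    start: Callable[[], dict]  # builds a fresh, private state for this child
    state: dict = field(default_factory=dict)
    restarts: int = 0


class Supervisor:
    """Toy supervisor that restarts crashed children according to a strategy."""

    def __init__(self, strategy: str, children: List[Child]) -> None:
        if strategy not in ("one_for_one", "one_for_all"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self.children: Dict[str, Child] = {c.name: c for c in children}
        for child in self.children.values():
            child.state = child.start()

    def deliver(self, name: str, handler: Callable[[dict], None]) -> None:
        """Run `handler` on one child's state; a crash triggers a restart."""
        child = self.children[name]  # an unknown name is a caller bug, not a crash
        try:
            handler(child.state)
        except Exception:
            # The worker code has no error handling of its own: it just crashes.
            self._restart_after_crash(name)

    def _restart_after_crash(self, name: str) -> None:
        if self.strategy == "one_for_one":
            affected = [name]  # only the crashed child
        else:
            affected = list(self.children)  # e.g. the host of a group call died
        for child_name in affected:
            child = self.children[child_name]
            child.state = child.start()  # fresh state, nothing carried over
            child.restarts += 1


def new_call() -> dict:
    return {"seconds_connected": 0}


def tick(state: dict) -> None:
    state["seconds_connected"] += 1


def corrupt(state: dict) -> None:
    raise RuntimeError("corrupted call data")


if __name__ == "__main__":
    sup = Supervisor("one_for_one", [Child("call_a", new_call), Child("call_b", new_call)])
    sup.deliver("call_b", tick)
    sup.deliver("call_a", corrupt)  # only call_a is restarted
    print({n: c.state for n, c in sup.children.items()})
```

Note that `tick` and `corrupt` contain no recovery code at all; the decision about what to do after a failure lives in one place, the supervisor.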

So, you might have heard of the whole "let it crash" approach. That's what we refer to in the Erlang world. When we let processes crash, we don't ignore failure, nor do we encourage it. We just handle these errors in a slightly different way. And by handling them more generically, that's how we create this fault tolerance. We isolate failure and then escalate it only when necessary, controlling it centrally in a generic way. This greatly simplifies the codebase compared with, for example, C++ programs. There are research projects where they implemented the same solution in both Erlang and C++ and analyzed what each line of code did. In the C++ codebase, about 25 percent of the code was error handling and fault tolerance. The equivalent in the Erlang codebase was about 1 percent. So, there's a huge difference. Just by going down the Erlang route, your system not only becomes fault tolerant, but you'll also reduce your codebase by around 25 percent. I don't know if that makes sense but...
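Escalation can be sketched in the same spirit. OTP supervisors take `intensity` and `period` flags: if more than `intensity` restarts happen within `period` seconds, the supervisor stops its children and terminates itself, pushing the failure up to its own supervisor. The Python class below is a hypothetical illustration of that sliding-window rule only; the names `RestartIntensity` and `Escalate` are invented here, and the clock is passed in explicitly so the behavior is deterministic. It is not how OTP implements the check.

```python
from collections import deque
from typing import Deque


class Escalate(Exception):
    """Signals the parent supervisor that this subtree has given up."""


class RestartIntensity:
    """Allow at most `intensity` restarts in any window of `period` seconds."""

    def __init__(self, intensity: int, period: float) -> None:
        if intensity < 0 or period < 0:
            raise ValueError("intensity and period must be non-negative")
        self.intensity = intensity
        self.period = period
        self._times: Deque[float] = deque()

    def record_restart(self, now: float) -> None:
        self._times.append(now)
        # Forget restarts that have fallen out of the sliding window.
        while now - self._times[0] > self.period:
            self._times.popleft()
        if len(self._times) > self.intensity:
            # Too many local restarts: stop handling it here and escalate.
            raise Escalate(f"{len(self._times)} restarts within {self.period}s")


if __name__ == "__main__":
    limit = RestartIntensity(intensity=1, period=5.0)
    limit.record_restart(now=0.0)   # first crash: restart locally
    limit.record_restart(now=10.0)  # the earlier restart has aged out
    try:
        limit.record_restart(now=12.0)  # second restart within 5 seconds
    except Escalate as err:
        print("escalating:", err)
```

Isolated, occasional crashes are absorbed locally; only a burst of failures, which suggests the problem is not confined to one process, travels up the tree.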

Preben Thorø: Well, it does, if exception handling is just being propagated out to the bookkeeper.

Francesco Cesarini: That is correct. Exactly. So we just push all of the exception handling to the supervisor. The supervisor handles it in a standardized way, instead of letting the programmer deal with exceptions, because, again, if you get an exception, you don't know why you got it, so how do you deal with it? You don't know how to deal with it, because if you knew, it wouldn't be there in the first place. So by generalizing how exception handling is managed, you get rid of exceptions, or they become a very, very rare occurrence.

Preben Thorø: Akka, like Akka.NET, and other frameworks out there...

Francesco Cesarini: Yes.

Preben Thorø: They're very much inspired by this, right? That is like coming directly out of the Erlang world even.

Francesco Cesarini: That is correct. I mean, Jonas Bonér...

Preben Thorø: So that is replicating... Go ahead.

Francesco Cesarini: That is correct. So Jonas Bonér started implementing Akka when he was working as a consultant on a customer project, and the customer wouldn't allow him to use Erlang. He got so frustrated that he took OTP and the whole error handling in OTP and started porting it to the JVM. I think he did an amazing job at bringing it to the JVM. It's not for the faint of heart, because the JVM wasn't built for fault tolerance. The JVM was built for parallelism. What he did is bring in lightweight concurrency, similar to the green threads which used to exist in Java but got removed early on.

When I was reading the original Java white paper, I had a sense of déjà vu: a virtual machine with a concurrency model, built-in memory management, and a garbage collector. That was the JVM, and I was working with the Erlang virtual machine at the time. But I think there's still a big difference between the Java Virtual Machine and the BEAM today, because to bring Akka to the JVM, you have to emulate a lot of the semantics and functionality which exist in the BEAM, functionality the BEAM is highly optimized for and which doesn't exist in the JVM.

Preben Thorø: Yes. All the protection spaces around processes need to be replicated in a thread model instead.

Francesco Cesarini: Exactly. So he had to create an abstraction layer. And even there, I think he wasn't able to fully recreate and emulate the semantics, because Akka actors have to yield in the code. You're putting that in the hands of the programmers, versus Erlang, where your processes are given a certain number of operations they're allowed to execute, after which they automatically get suspended and the next process gets to execute. So in Akka you run the risk of one actor starving all of the other actors. That's not a risk you have on the BEAM, because it's been removed from the hands of the programmer. The programmer doesn't even know how the processes are being scheduled and managed, and they shouldn't need to. They should just program thinking concurrently, and the rest is all abstracted away from them.
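The scheduling difference can be illustrated with a small Python simulation. Each process is a generator, and every `yield` stands in for a point where the VM counts one unit of work, a "reduction"; in Erlang those counting points are inserted by the runtime (roughly at each function call), so the programmer never writes them. The scheduler, not the process, decides when to switch: once a process has used its budget, it is suspended and moved to the back of the run queue. The real BEAM budget is on the order of a few thousand reductions and the accounting happens inside the VM; the budget of 2 and the names `worker` and `run` are purely illustrative.

```python
from collections import deque
from typing import Deque, Generator, List

Process = Generator[None, None, None]


def worker(name: str, steps: int, trace: List[str]) -> Process:
    """A process doing `steps` units of work; each yield marks one reduction."""
    for _ in range(steps):
        trace.append(name)
        yield


def run(processes: List[Process], budget: int) -> None:
    """Round-robin scheduler that preempts a process after `budget` reductions."""
    run_queue: Deque[Process] = deque(processes)
    while run_queue:
        proc = run_queue.popleft()
        for _ in range(budget):
            try:
                next(proc)
            except StopIteration:
                break  # the process finished; drop it
        else:
            run_queue.append(proc)  # budget used up: suspend and requeue


if __name__ == "__main__":
    trace: List[str] = []
    # A long-running process cannot starve a short one.
    run([worker("long", 5, trace), worker("short", 2, trace)], budget=2)
    print(trace)
```

Here the short process gets to run after only two steps of the long one, even though the long process never voluntarily gives up control.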

Preben Thorø: So now it becomes really nerdy and really interesting, because these processes in the BEAM, are they processes supported by the hardware, by the CPU, or is it some middle thing between, look, a real process and a thread?

Francesco Cesarini: Yes. So what happens is, when you start the BEAM, it will start a scheduler in a separate thread for every core. So assuming you're running on a quad-core machine, the BEAM will start four threads. Each thread will run a scheduler, and each scheduler will have its fair share of processes. So assuming you've got 400 processes, each scheduler will have about 100. Then there's migration logic, which ensures that the different schedulers remain fairly balanced. Your processes might be migrated from one scheduler to another, from one thread to another, if most of the processes on a particular scheduler terminate.
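The one-scheduler-per-core layout and the rebalancing described above can be mimicked with plain lists. In this hypothetical sketch, each list is one scheduler's run queue; new processes are spread round-robin, and `balance` repeatedly moves a process from the longest queue to the shortest until no two queues differ by more than one. The function names are invented for illustration, and the BEAM's real migration logic (which also includes work stealing by idle schedulers) is considerably more involved; this only shows the idea.

```python
import os
from typing import List

RunQueue = List[int]  # ids of processes waiting on one scheduler


def spawn_round_robin(n_processes: int, n_schedulers: int) -> List[RunQueue]:
    """Distribute new processes evenly across one run queue per scheduler."""
    queues: List[RunQueue] = [[] for _ in range(n_schedulers)]
    for pid in range(n_processes):
        queues[pid % n_schedulers].append(pid)
    return queues


def balance(queues: List[RunQueue]) -> int:
    """Migrate processes until queue lengths differ by at most one.

    Returns how many processes were migrated.
    """
    migrated = 0
    while True:
        longest = max(queues, key=len)
        shortest = min(queues, key=len)
        if len(longest) - len(shortest) <= 1:
            return migrated
        shortest.append(longest.pop())  # move one process to the idler scheduler
        migrated += 1


if __name__ == "__main__":
    schedulers = os.cpu_count() or 1  # one scheduler per logical core by default
    queues = spawn_round_robin(400, schedulers)
    del queues[0][: len(queues[0]) * 9 // 10]  # most processes on scheduler 0 exit
    balance(queues)
    print([len(q) for q in queues])
```

With four schedulers, 400 processes start as 100 per queue; if 90 processes on the first scheduler terminate, rebalancing spreads the remaining 310 so every queue holds 77 or 78.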

Erlang on iOS

Preben Thorø: I don't know if it still exists, but there used to be a sub-project, some library or framework called Lua. Does that still exist? This is something for the audience here: back then, at least, it allowed you to run Erlang on an iPhone. Does that still exist?

Francesco Cesarini: So yes, it did exist. It never really made it into production. I'm not aware of any Erlang or Elixir development on iPhones [today]. I think you're right; that's a problem which was handled very well by other technologies. It's the same reason we failed to get Erlang into the web development space: historically, at least for us, we were all about server-side back-end systems. Those are the types of problems we solved. So, even though you could develop in Erlang on phones, the tooling and frameworks you need are very, very different.

Preben Thorø: I guess it goes very much against the idea of having a phone but that's another discussion. It's like a client and another server. But anyway, that's great.

Francesco Cesarini: We're seeing a lot happening now with Elixir making its way into the embedded space through Nerves; they've got embedded graphical packages which can run on handheld computers. We were even running Erlang and controlling the CAN bus in cars probably 15 years ago, yeah, almost 15 years ago. But it's only now going mainstream, now that we're collecting more and more data in the cars themselves, as well as in IoT devices. It's no longer feasible to push all of this data to the edge network and the cloud because of the large volumes. So you start analyzing it in the devices themselves, or in some cases in the edge network, where feasible.

So we're seeing Erlang and Elixir being used in those spaces more and more these days. And with the work which has been done on the JIT compiler, which has brought huge performance increases, and the work which has happened with Nx, Numerical Elixir, which is enabling the whole Axon framework, which is very similar to PyTorch, I suspect you'll be seeing machine learning move onto the devices, close to the data: in cars, in IoT devices, and to a certain degree also in radio base stations and edge networks. So I think it's still early days, but there are a lot of exciting things happening in that space, and all the components are being put in place for it to become a viable alternative technology for machine learning.

Erlang’s recent evolution

Preben Thorø: Yes, how much is the Erlang universe evolving right now? If we could isolate plain Erlang from Elixir, how much does the original plain thing evolve for the time being?

Francesco Cesarini: Very little. Very, very little. So, in Erlang itself, at least the programming language, there are very few changes happening. Most of the work I think is done around the libraries, the frameworks, but also on the BEAM virtual machine. That's where a lot of the effort is going today.

Preben Thorø: Yes

Francesco Cesarini: But as for the language itself, when Erlang was released as open source, Ericsson became the benevolent dictator. They've always been very conservative about introducing changes, for two reasons. A, they've got millions and millions of lines of code in production, so any backward-incompatible change would have a huge impact on all of that code. And B, if they push out new features, they need to support and maintain them. So yes, they're very, very careful about what gets released.

But a lot of the work and a lot of the focus is on making the BEAM scale on multi-core architectures, making it fast, making it lock-free. You're seeing it with every release: all you need to do is recompile your code and upgrade the VM. In some cases, you don't even need to recompile your code; you just need to restart it. Back in the day, we used to joke that if your program wasn't fast enough, you should wait 15 months, and the new computer you bought would run it twice as fast. Nowadays, it runs faster the more cores you throw at the problem, and all you need to do is wait for a new version of the BEAM, recompile your Erlang code, and it's gonna run faster.

Preben Thorø: Thank you. It's been fascinating talking about this and I think we could go on all day.

Recommended talks

Computer Science - A Guide for the Perplexed • Joe Armstrong • GOTO 2018

The Soul of Erlang and Elixir • Saša Jurić • GOTO 2019

The Do's and Don'ts of Error Handling • Joe Armstrong • GOTO 2018

Reaping the Benefits of Elixir: How to Get Started • Saša Jurić & Erik Schön • GOTO 2020

Building a Blockchain in Erlang • Ulf Wiger • GOTO 2019

Erlang, Elixir, Blockchain & Serverless… What?! • Ulf Wiger, Saša Jurić & Eric Johnson • GOTO 2019
