Genetic Algorithms in Elixir

Sean Moriarity | Gotopia Bookclub Episode • September 2023

Delve into the realm of genetic algorithms and their versatility across diverse domains, from finance to artificial intelligence. In the latest episode of GOTO Book Club, author Sean Moriarity spoke to Bruce Tate about the fundamentals and intricacies of genetic algorithms and their problem-solving prowess. The book's approach is both pragmatic and innovative, harnessing Elixir's features to craft efficient and idiomatic genetic algorithms. What sets this resource apart is its accessibility, inviting readers into the world of genetic algorithms using a familiar language. Master the complete lifecycle of problem-solving with genetic algorithms and learn how Elixir's simple syntax and functional style make for idiomatic solutions to optimization problems. This is an eye-opening opportunity to explore a potent field without the need to grapple with unfamiliar languages or frameworks.

Transcript

Is Elixir a good language for machine learning?

Bruce Tate: Hi, everybody. This is Bruce Tate for GOTO, and I'm with Groxio Learning, but I have a special guest for you today. I've got Sean Moriarity, and he's the creator of the Axon project and the co-creator of the Nx project as well. This is all things machine learning in Elixir. You wanna tell the audience about yourself, Sean Moriarity?

Sean Moriarity: Sure. So, my name is Sean Moriarity. I'm originally from Philadelphia, Pennsylvania. I am the author of "Genetic Algorithms in Elixir," and I also have a new publication out called "Machine Learning in Elixir." I've worked in the Elixir machine learning ecosystem for quite some time now, and I'm excited to talk a little bit about my new project.

Bruce Tate: This is an interesting story because, as many of you might know, Elixir has not always been known as a great language for machine learning. I think that you might have knocked José, the creator of Elixir, out of his chair when you wrote the first book about genetic algorithms and machine learning. Could you tell us a little bit about that?

Sean Moriarity: I was interested in genetic algorithms and evolutionary algorithms in college, mainly for applications in, like, sports betting in particular, because sports betting is very similar to, like, financial theory: investing, portfolio theory. There are a lot of tie-ins with using genetic algorithms and evolutionary algorithms for optimizing a portfolio for risk and reward. I was interested in genetic algorithms and evolutionary algorithms, and I was also really interested in Elixir, but Elixir was not a good language for doing any of this stuff.

The BEAM, the virtual machine that Elixir runs on, the Erlang virtual machine, is not good for numerical computations. It's well-suited for concurrency and building fault-tolerant applications, but numerical applications were not something that it was very good for. But I decided I was just gonna write genetic algorithms in Elixir anyway. And so, I created a project for creating genetic algorithms and solving some toy problems, basically, with genetic algorithms. I thought it was something that other people might also be interested in, so I kind of threw a Hail Mary over to the Pragmatic Bookshelf, the publisher behind "The Pragmatic Programmer," with a book pitch, basically, for genetic algorithms in Elixir, and miraculously, they ended up accepting it.
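
For readers who haven't seen one, a single genetic-algorithm generation in plain Elixir might look something like this. It's a toy sketch in the spirit of the book's toy problems, not code from it: it evolves bitstrings toward all 1s and assumes an even-sized population.

```elixir
defmodule OneMax do
  # One generation: score, select the best half, then refill the
  # population with crossed-over and mutated children.
  def evolve(population) do
    parents =
      population
      |> Enum.sort_by(&Enum.sum/1, :desc)        # fitness = number of 1s
      |> Enum.take(div(length(population), 2))   # selection: keep the best half

    children =
      parents
      |> Enum.chunk_every(2, 2, :discard)
      |> Enum.flat_map(fn [p1, p2] -> crossover(p1, p2) end)
      |> Enum.map(&mutate/1)

    parents ++ children
  end

  # Single-point crossover: swap tails at a random cut point.
  defp crossover(p1, p2) do
    point = :rand.uniform(length(p1))
    {h1, t1} = Enum.split(p1, point)
    {h2, t2} = Enum.split(p2, point)
    [h1 ++ t2, h2 ++ t1]
  end

  # Flip each bit with 5% probability.
  defp mutate(chromosome) do
    Enum.map(chromosome, fn gene ->
      if :rand.uniform() < 0.05, do: 1 - gene, else: gene
    end)
  end
end
```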

The book came out in, I think, February of 2020, or maybe it might've been October of 2020. And after that, I got in touch with José Valim, who's the creator of Elixir, and he was like, "Hey, so, I thought this was pretty interesting. Do you wanna start working on machine learning projects in Elixir?" And I thought that would be pretty awesome. So we started, and kicked off that project around the same time, in October 2020. And three years on now, we've got everything from deep learning to traditional machine learning. You can use pre-trained transformers and a lot of other pre-trained models directly from Elixir itself. And our performance is pretty competitive with the Python ecosystem. And we have some pretty good abstractions for deploying machine-learning models as well.

Bruce Tate: I wanna slow you down a little bit because we've gone over some pretty interesting things there. So, I've gotta tell you, I've gotta make an admission for the first time on this podcast. I saw that initial book proposal come through, and I said, "This is the wrong thing for Elixir." I was the editor, or kind of the line editor of the Elixir line of books at the time. And I said, "This is really interesting stuff, but this is more interesting from an academic perspective." So I said, "You know, I don't wanna kill this, but I also don't wanna kind of give people false hope," right? So, what I did is I said, "Okay, this is interesting academically, so does anybody else wanna take a shot at this?" And another editor stepped up and, you know, the rest is history.

Numerical Elixir (Nx)

Bruce Tate: But one interesting thing is that there's a pretty blinding moment in time that went very quickly, from when Elixir was not a good language for these kinds of applications to the time that Elixir became that. And that's what Nx is about. Could you tell us a little bit about what Nx does, before we get into Axon and the machine learning stack?

Sean Moriarity: Nx stands for Numerical Elixir, which is essentially the foundation for the Elixir machine learning ecosystem. The Nx project started... So, early on, we were trying to decide on how we were gonna design the libraries. Elixir and Erlang have this concept of native implemented functions, NIFs. They're essentially just a way you can write C bindings to some native library, and then get it to work with the Erlang virtual machine. One of the paths we could have taken was saying, "Okay, well these libraries, like TensorFlow and PyTorch, offer C and C++ libraries. We can just essentially build directly on top of that." And it would've been a pretty quick win. We could've gotten something up and running very quickly.

The problem with marrying yourself to another ecosystem is you are essentially blocked anytime they have an issue. And as a smaller consumer of their native libraries, you might not necessarily have the biggest influence over the bugs they're gonna fix and the things they're gonna prioritize. So we made a very intentional decision to not build the library dependent on other ecosystems upstream like that. So, Nx, for those that are familiar with the Python ecosystem, is very, very similar to NumPy, but Nx has the additional abstraction of, like, pluggable compilers and backends, which means that Nx itself just implements a behavior, which is, I guess, similar to, like, an abstract class, for those coming from object-oriented programming. Essentially, it's just a contract for people to provide their own implementations of some of the numerical routines that we have in Nx.

For example, Nx has something like Nx.cosine, or Nx.cos in this case, and library backend and compiler implementers can implement their versions of cosine for targeted hardware, or specialized routines that are accelerated in some way. The first compiler we implemented was XLA, which is Google's accelerated linear algebra. It's a machine learning compiler for taking these numerical programs and compiling them to the CPU, the GPU, and Google's TPUs, these accelerators. So, Nx serves as the foundation for our entire ecosystem. It also implements automatic differentiation, which is important for implementing some of the optimization routines used in Axon, which is a deep-learning library, but it sounds like we're gonna get to that a little later. So, that's pretty much it about Nx.
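
As a rough sketch of that pluggable-backend idea (this assumes the nx and exla packages are installed; EXLA.Backend is EXLA's XLA-powered tensor backend):

```elixir
t = Nx.tensor([0.0, 1.0, 2.0])

# Same operation, running on Nx's default pure-Elixir binary backend:
Nx.cos(t)

# Dispatched instead to the XLA-backed implementation:
t
|> Nx.backend_transfer(EXLA.Backend)
|> Nx.cos()
```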

Bruce Tate: I love the idea that since you're snapping out this whole numerical model, and then, kind of, the whole way that you think about storing tensors, right? These are concepts that are just kind of ripped out of the language and snapped in. And to make that surfaceable, you did something fairly brilliant in there, and that's taking a traditional function definition, and then offering an alternative implementation. Can you tell us just a little bit about defn?

Sean Moriarity: Nx introduces this idea of numerical definitions. So, in Elixir, functions are declared with the def keyword. In Nx, you get something called defn. So, it's just literally, like, "def" and then an n at the end. That's a numerical definition. These numerical definitions support a tiny subset of the Elixir programming language, so, it's a little more strict. One of the things that we noticed... we tried to model a lot of the library off of JAX, which, you know, is a library that offers the NumPy API, jax.numpy, but supports just-in-time compilation.

As a part of that just-in-time compilation, you get some interesting behavior. So, JAX functions, JAX jitted functions, they have to be completely pure. So, any side effects that happen, only really happen once in these programs. And, you know, they have to have some interesting, like, static shape and type constraints. And it was difficult for people transitioning to JAX to get around that, because, like, hey, Python has these flexible abstractions. Like, people like writing Python because it's flexible, and JAX JIT was not flexible. So we made kind of an intentional decision to have a completely separate abstraction from traditional Elixir functions because we wanted people to understand that when you write a numerical definition, it's gonna get JIT compiled. You know, anything that's, like, side-effecting is not gonna work very well with what you wanna do. And so, we wanted to keep that abstraction completely separate from the core language.

When you write a numerical definition, it essentially gets immediately compiled and targeted to whichever compiler you choose. So, in this case, if you're using something like EXLA, which uses Google's XLA, your numerical definition will get compiled to the CPU or the GPU, depending on the client that you choose, for the inputs that you give it. And it's a very interesting abstraction because it's something that just extends the language. Like, it didn't require any changes to Elixir upstream. Elixir itself is just a flexible language, with meta-programming and some of the other things you can do. So we didn't have to make any changes to Elixir upstream. It's just something that we were able to natively support, given what Elixir has.
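
To make that concrete, here is a minimal numerical definition. It's a sketch: `softplus` is just an arbitrary example function, and EXLA as the configured compiler is an assumption about your setup.

```elixir
defmodule MyMath do
  import Nx.Defn

  # Inside defn, only the numerical subset of Elixir is allowed, and
  # operators like + are rewritten to their tensor-aware versions.
  defn softplus(x) do
    Nx.log(1 + Nx.exp(x))
  end
end

# JIT-compiled by the configured compiler (e.g. EXLA) when invoked:
MyMath.softplus(Nx.tensor([0.0, 1.0, 2.0]))
```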

Bruce Tate: Which is beautiful, right? It's like you can snap these guardrails right onto the system, and everything just works, right? I hope that the listeners are starting to get a sense that rather than building something quick and dirty and making a library-type decision, this is something a little bit more related to the infrastructure, in building up layer by layer, slowly. 

Recommended talk: The Soul of Erlang and Elixir • Sasa Juric • GOTO 2019

The potential of Elixir and machine learning together

Bruce Tate: Let's talk a little bit about the next layer, the Axon layer. So, what is Axon?

Sean Moriarity: After the Nx project started to show some initial successes, we wanted to get, like, real applications of what we were building. The first, I would say, real concrete application was neural networks because deep learning and neural networks today are almost synonymous with machine learning. People say machine learning, and, like, 90% of the time, people are talking about deep learning, just because of the popularity of large language models and some of these other pre-trained models for computer vision and natural language processing.

While there is traditional machine learning, and we do support traditional machine learning, we targeted neural networks, because, at the time, they were very, very popular, and that was the first thing. We wanted to prove we were able to do that, because if we were able to do that, then we were essentially able to do anything we wanted to. So, Axon is a library for creating and training neural networks in Elixir. It has a very similar API to Keras, TensorFlow Keras, as well as some other ideas stolen from the PyTorch ecosystem. So, I am a, I would say, machine learning framework junkie and I spend a lot of time just reading about different approaches to solving machine learning problems, and different, like, library design decisions that the creators of PyTorch and Keras and some of the projects in those ecosystems have made.

I look at the complaints of people trying out different things and see what works. And so, Axon borrows a lot of ideas from these other ecosystems, to make it easy to create composable neural networks and then also train these neural networks. And it's a fundamentally different approach than what you see in the Python ecosystem, just completely out of necessity. So, Python supports those object-oriented abstractions, and Axon, being built on top of a functional programming language, has to build on functional constructs. So, it's a little difficult, in terms of, like, comparing apples to apples, you know, something that's implemented in Keras and something that's implemented in Axon, but it is very similar. It will feel very similar to someone coming from another ecosystem directly into the Elixir ecosystem.

Bruce Tate: This is cool, right? So, one of the things that I've noticed is that by slowing down, we're hitting this point where everything seems to be happening at once. It seems like having the Elixir infrastructure underneath, by slowing down and getting the abstractions right, all those things can be brought to bear on the overall project. So, could you talk about the impact of Axon and Nx on the Elixir ecosystem?

Sean Moriarity: First I wanna hit on the point that, like, starting slowly, how fundamentally important that was to what we wanted to do because I think this is happening a lot in the Python ecosystem now. PyTorch, PyTorch 2.0, has made a huge, like, effort to rewrite a lot of its internals in Python, and there are a lot of reasons for that decision. But, you know, one of them is just, like, from a maintainability perspective. Python, for better or worse, is much more approachable than C++ as, like, a, you know, language for writing compilers. And for someone who's, you know, just coming into the PyTorch ecosystem, it's a lot easier for them to just look at some Python code that maybe implements, like, some of their backends for writing these numerical programs than trying to, like, decipher C++.

We had that same kind of inclination from the beginning, that we should keep as much of what we were building in Elixir as possible because it's more maintainable, it's more approachable, and it was gonna just be easier for us to work with, and to work fast and build on top of, because neither myself nor José nor any of the maintainers who have come on since the original project, like Paulo and Jonathan, none of us are C++ people. We all are Elixir programmers.

And so, keeping everything in Elixir allows us to work a lot faster than we traditionally would have because we're a very small team of people writing this. It's four or five people who are the core maintainers of the Nx project. We're able to implement features significantly faster, just because we're working in Elixir, and, you know, only reach into C++ and C and Rust and whatever when it's necessary. Now, the overall impact that Nx and Axon have had on the Elixir ecosystem, I would say it's been pretty large, especially in the amount of time that the projects have been out there. So, we are only around three years into these projects, and there are already some successful applications of these libraries being used in production.

People, I think, are excited about the prospect of using machine learning in Elixir, especially for companies that are using Elixir for their actual, like, deployment environment, their backend services, and stuff. It's a lot, I would say, easier for a team like that to maybe get thrown a model from the Python ecosystem by their data science or machine learning team, and then to implement, you know, essentially an inference pipeline directly in Elixir, without having to call out to another service or build on top of some complex, like, microservices stack. So, it has had a pretty large impact, and I'm excited to see where the ecosystem grows.

Bruce Tate: We're starting to see all these little pop-up projects, right? And that's always an indication that you're doing something well on the abstractions end. Right? So, we've talked a little bit about the impact of machine learning on Elixir, and the idea that this is unexpected, and pretty exciting, and has hit this critical mass, where everything is rolling now. But we haven't talked about the impact that we might see of introducing Elixir to machine learning. Can you talk about why that might be interesting to us?

Sean Moriarity: Anytime you try to, I guess, like, penetrate an additional market from a programming language perspective, you have to, I would say, like, do it carefully, and think hard about why someone would choose to use your language for whatever it is that they're doing, over what they're used to. And particularly, like, in machine learning, Python is so entrenched, and it's for good reason. There are a lot of really great abstractions and great libraries in the Python ecosystem. It's friendly for, you know, beginner programmers, who might have, like, you know, an academic background, and they're interested in some aspect of, like, numerical computing or machine learning. It's very easy to, you know, pick up Python and just run with it.

So, when we first started these projects, I think a lot of people thought we were kind of crazy, because, you know, trying to target something so entrenched, like Python is in the machine learning ecosystem, you know, other languages have tried to do this, and it doesn't always have the best results. And so, we were trying to, I guess, tread carefully from the very beginning, that, you know, we don't necessarily see these projects as overtaking Python as, like, the primary language for machine learning. But we wanted to give people who were using Elixir and who were interested also in Elixir kind of an alternative to some of the original workflows. As the projects have matured, we're kind of identifying areas where our projects could have a significant advantage over some of the same projects in the Python ecosystem.

I think one of those is in our serving abstraction. Servings, in the world of machine learning, are just, like, inference. It's essentially just a fancy way to say that we're gonna get inferences from the model in production. And in the Python ecosystem, there are, like, five or six serving projects, and they're all separate services, like TorchServe, TensorFlow Serving, KServe, which is, like, a Kubernetes thing. There are all these abstractions for essentially overcoming, I think, some of the shortcomings that Python has as a language for deploying machine learning infrastructure. Whereas in the Elixir ecosystem, we don't necessarily have some of the same shortcomings. So, we have this abstraction, which is Nx.Serving. It is essentially a data structure or behavior that wraps up what you would see in a production inference pipeline. So, it encapsulates pre-processing, actual inference of the model, and then post-processing. And the Nx.Serving abstraction is very, very simple, but it supports some pretty insane things. Because of the way Elixir is built on top of the Erlang virtual machine, Nx.Serving supports distribution just natively.

If you have a cluster, you can spin up multiple servings, and they're load-balanced automatically between the nodes in your cluster, or, if you have, let's say, like, multiple GPUs, you can partition inferences between multiple GPUs. It's a very scalable abstraction. You also get all of the goodies that you would get from building on top of the Erlang virtual machine to begin with, like, you know, fault tolerance, and good concurrency, and, you know, the ability to build robust machine learning applications on a battle-tested and production-ready virtual machine.
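
A minimal sketch of that flow, close to the example in the Nx docs; `Demo.double/1` is just a stand-in for a real model's predict function:

```elixir
defmodule Demo do
  import Nx.Defn
  defn double(x), do: x * 2
end

# Wrap a JIT-compiled function in a serving, batch an input, and run it:
serving = Nx.Serving.new(fn opts -> Nx.Defn.jit(&Demo.double/1, opts) end)
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
Nx.Serving.run(serving, batch)

# In an application, you would instead start the serving under a supervisor,
# named and shared (and load-balanced) across the cluster, e.g.:
#   {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 8}
```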

Bruce Tate: So, it sounds kind of like there are reasons to use Elixir, and there are reasons to do machine learning. And so, the reasons to use Elixir don't go away just because you're entering this other space, right? So, in some ways, all the things that Elixir does are becoming table stakes, right? And all the things that machine learning does are becoming table stakes. Once we can bring those two things together, some pretty exciting things happen.

Sean Moriarity: Exactly. We try as best as we possibly can to make the ecosystem, you know, interop well with the Python ecosystem. So, we have this library called Bumblebee, which supports a lot of pre-trained machine-learning models from the Python ecosystem. Essentially, Bumblebee is very similar to the Hugging Face Transformers library, which has a ton of pre-trained transformer models, and some other, like, computer vision-based models, like ResNets and whatnot. And we built Bumblebee as kind of, like, an intermediary between ourselves and the Python ecosystem. So, we're able to take pre-trained, like, PyTorch models, convert them to what you would need to use in Elixir, and then use them directly in your Elixir applications. We also support a lot of the same tasks as you would see in the Python ecosystem. So, Hugging Face has, like, pipelines is what they call them. We call them servings.

These pipelines support anything from named entity recognition, to text classification, image classification, to text generation. We support all those as well. So, if you are working with a data science team, you know, they're not gonna wanna switch right away from using Python to Elixir. So they can still work in Python, they train their models in Python, and then as long as you have, like, trained or saved weights, you can kind of throw them over to your backend team, and they can write the inference pipeline in Elixir.

So, we're very intentionally, I think, you know, friendly, or try to be friendly and supportive of the Python ecosystem, because in some ways it's a necessity. There's no reason to completely disregard all of the incredible work that's been done in the Python ecosystem because it's just unrealistic to think that we would be able to catch up with a 30-plus-year head start in this space, right? So, we try to support interop. We also support ONNX, Open Neural Network Exchange, so you can essentially take ONNX models that you've trained, you know, in the Python ecosystem and run them with some of the Nx abstractions. So, we have what's called, like, a storage-only backend, where you use the Nx.Serving abstraction as a way to implement an inference pipeline, and it's backed by the ONNX runtime. So, there are a lot of reasons to use Elixir without actually having to switch from using Python.

Recommended talk: Elixir in Action • Sasa Juric & Erik Schoen • GOTO 2020

Use cases for the Bumblebee library

Bruce Tate: Can we talk about some of the use cases that you might have seen in Bumblebee? What can this thing do, and where... Well, first let's talk about what it can do.

Sean Moriarity: Bumblebee can do a lot of the same things that the Hugging Face Transformers library can do. So, relative to the size of the Transformers library, we have pretty large coverage of the pre-trained models, at least the ones that are the most popular. You could take something like a pre-trained BERT and do text classification with that, and you can fine-tune the models and, you know, pre-trained models from Bumblebee for downstream applications. We also support, like I said, these servings. So, one example that I've seen used is in entity recognition, essentially extracting, like, proper nouns out of some structured or unstructured data, like text, and identifying, like, hey, this is a person, this is a place, this is, you know, an organization. And then we also support, like, text generation. So, we do support some of the latest and greatest chat models out there.

So, LLaMA, for example, is one of the ones that we do support. You can build on large language models in Elixir without having to actually, you know, shell out to Python or something else. So, there are a lot of very powerful applications. And one of the strengths of Bumblebee is that it's a very low-code, I would say, library. Getting up and running with a text generation pipeline is probably, like, four lines of code and you're ready to go. That's pretty powerful, especially for us, because, in the Elixir ecosystem, we don't have a ton of people with machine learning experience. So, Bumblebee can give them access to, like, a quick win.
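
The low-code path he describes looks roughly like this with Bumblebee's documented API. GPT-2 is chosen here only as a small example model, and exact signatures vary a bit across Bumblebee versions:

```elixir
# Load the pre-trained model, tokenizer, and generation settings from
# the Hugging Face Hub:
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})

# Build a text-generation serving and run it on a prompt:
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)
Nx.Serving.run(serving, "Elixir is")
```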

Bruce Tate: I've talked to several people who have started to make some Bumblebee contributions. They said it's remarkably easy. So, it seems like, once again, the abstractions are good.

Sean Moriarity: Yes. And Bumblebee builds... it's a 100% Elixir library, so really, the only libraries we have that touch any sort of native code are the compilers for Nx. Nx itself is a 100% Elixir library, and then it's the compilers that Nx touches that are written in C, C++, Rust, and some of the other native languages. But everything, from Nx to Axon to Bumblebee, is 100% Elixir. So it's a very approachable library. The abstractions are very, I think, easy to understand once you kind of peel back the layers, so it's very powerful.

Why is Elixir a good fit for machine learning?

Bruce Tate: So, we've spent a little bit of time embracing Python, and I'm not gonna say take your shot now, but what are some of the things that Elixir does that may make it maybe even better for machine learning than some of the other machine learning languages?

Sean Moriarity: I think the obvious one here is concurrency. Python, with the GIL, it's kind of difficult to achieve the same, I would say, level of concurrency that you can achieve in an Elixir application. Elixir is good for building, like, robust, fault-tolerant, highly concurrent applications. One of the things that originally drew me to trying to do machine learning in Elixir was a book from the Pragmatic Bookshelf as well, "Concurrent Data Processing in Elixir." It's about building these robust, highly concurrent data pipelines. That's something that you see a lot in machine learning workloads, and achieving in the Elixir ecosystem some of the same, I guess, throughput that you would get in the Python ecosystem is just trivial. There are a lot of, like, abstractions in the Python ecosystem that are essentially just, like, wrappers around C and C++ implementations, whereas, you know, you don't have to do the same thing in the Elixir ecosystem.

For example, like, tf.data is a data input pipeline for TensorFlow. And the same things you can do in tf.data, you can just do natively with Elixir, because it just supports this, you know, concurrent data processing out of the box. Then, a lot of, like, the, you know, OTP abstractions in Elixir are, I would say, very, very well-suited for building these, like, robust machine learning applications. Getting into the language, I liked Elixir aesthetically, but I didn't necessarily appreciate the OTP abstractions as much. And now recently, I'm starting to get really into the OTP abstractions, and building more on what the language is designed to do, and seeing how it, you know, connects well with some of the things you wanna do in, like, the MLOps lifecycle, which is, you know, the lifecycle for deploying machine learning models, and trying to identify, I guess, use cases where, oh, this is, like, you know, really powerful, and this is how this benefits the machine learning ecosystem. So, there are a lot of things that I think Elixir does better than Python, just out of, you know, the circumstances of the language. The language is designed for telecom platforms, right? Or built on top of a language designed for telecom platforms. It turns out those abstractions are also really good for building robust web applications. Those are some of the things I think Elixir does just better than Python as a consequence of how it's built.
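
On the tf.data point: a concurrent input pipeline in Elixir needs nothing beyond the standard library. A sketch, where `preprocess/1` is a hypothetical per-example transform:

```elixir
# Read files in parallel across scheduler threads, lazily, as a stream:
"data/*.csv"
|> Path.wildcard()
|> Task.async_stream(&preprocess/1,
  max_concurrency: System.schedulers_online(),
  ordered: false
)
|> Stream.map(fn {:ok, example} -> example end)
|> Enum.take(100)
```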

Bruce Tate: What about immutability? Does that play a role, or does the lack of immutability play a role in the way that you've had to build Axon in layers, versus the way you might have done it with something like Python?

Sean Moriarity: I think immutability is an interesting one, because for, like, mathematicians and people who are coming from, like, an academic machine learning background, the immutability and, like, the functional style of writing things in Elixir kind of fits better with what, you know, you would be used to seeing, like, mathematically. Immutability, I think, helps a lot in reasoning about some of these more complex, highly concurrent data pipelines. But then from, like, just the, I guess, aesthetic perspective, writing a program, a numerical program functionally, I think makes a lot more sense than some of the things you would do in the Python ecosystem.

For example, TensorFlow and PyTorch both support what are called in-place operations, where essentially you have a tensor that's backed by some buffer, and you can perform an in-place sort, where that data is completely changed, completely overwritten. And I've had, you know, experiences in the Python ecosystem where I do something in place, and then you get some pretty wonky results because you don't realize that you are mutating some data, like, four or five lines up, or, you know, somewhere at the beginning of the program. You don't necessarily have that same problem in the Elixir ecosystem because everything is immutable by default.

From a performance perspective, it's something that kind of hindered Elixir from the beginning, because immutability, like, kind of implies some additional copies. But, with this compiler, this JIT compilation concept that we introduced with Nx, we kind of completely bypass any of the issues we have with immutability, because Nx works on, like, a multi-stage programming model. So, when you write a numerical definition, it gets lowered to an Nx expression, and then that gets compiled into a program. So, it's not eager by default. It's a very, you know, I would say, static workflow. You don't necessarily have the same performance hits with immutability that you would if you were just, you know, working natively in Elixir.
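
In other words, a pipeline like the sketch below is traced into a single Nx expression and compiled as one program, so the apparent intermediate tensors don't have to materialize as copies. This assumes EXLA is available and configured as the defn compiler:

```elixir
defmodule Pipeline do
  import Nx.Defn

  # Each step looks like it builds a new tensor, but under JIT compilation
  # the whole expression compiles into one fused program.
  defn standardize(x) do
    (x - Nx.mean(x)) / Nx.standard_deviation(x)
  end
end

Nx.Defn.global_default_options(compiler: EXLA)
Pipeline.standardize(Nx.tensor([1.0, 2.0, 3.0, 4.0]))
```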

Recommended talk: Machine Learning Made Easy With PyCaret • Moez Ali • GOTO 2022

Behind-the-scenes stories

Bruce Tate: You just kind of carved down what your primitive operations are, right? You expand those a little bit, and contract them in other places, right? And that's pretty cool. So, I have a couple of more questions for you. Do you have some favorite moments, you know, of where this whole rollercoaster ride has taken you? Are there some favorite moments?

Sean Moriarity: Yeah, there are a lot of, like, stories related to these projects. Early on, there were a lot of, I would say, initial roadblocks to success. And, like, we have come a long way, but in the beginning, there was not, like, a guarantee that the projects were gonna work out. It was kind of just more of an experiment. So, I can distinctly remember some of the initial, like, trials and tribulations with these projects. One, for example, was the first time we got Nx, like, to compile a program to the GPU, which was a little rough. I go into a deep dive on Twitter about this, but the Erlang virtual machine does some interesting things, like, intentionally, and when you're dealing with subprocesses and external programs, you can run into some problems.

So, I have a deep dive on Twitter about that, but I do distinctly remember it took, like, a few days to track down some issues we were having with, you know, why could we not compile a program to the GPU? And then we had kind of a breakthrough moment, we were able to... And I think the program we were compiling was just, like, one plus one, or something simple. Like, it was nothing crazy. But that was pretty awesome when that first happened. And then I remember we had some initial difficulties with autograd. So, José wrote a lot of the autograd, like, infrastructure. I think he honestly has probably rewritten it, like, six or seven times in, you know, the life of the project. I remember just how frustrating it was at times to get some of the things to work, because autograd is not necessarily... well, automatic differentiation is not necessarily something that's, like, straightforward to implement, and straightforward to implement efficiently.

I remember the first neural network we trained, when we finally got the automatic differentiation system working, and we had written a pure Nx neural network. It was just trained on MNIST, and I remember, I think that was, like, maybe six or seven months into the project when that first worked, and that was pretty awesome. Then, some of the first benchmarks we had, you know, showed that the GPU-compiled program with Nx and EXLA was, like, 4,000 times faster than anything you could do natively in Elixir, and sharing some of those benchmarks, people were like, "This is crazy. I can't believe this is happening." That was a lot of fun too.

Bruce Tate: Yes. With the coy messaging, you know, this thing that we're working on is X% faster, that was kind of a lot of fun to watch too.

Sean Moriarity: Yes.

Bruce Tate: And what about some moments that were particularly frustrating for you? What have been some of the hard ones to break through? You mentioned the earlier one, kind of establishing that initial compilation, but what were some of the other ones that were pretty difficult?

Sean Moriarity: The GPU one, in particular, I just remember being incredibly frustrating. So, I guess, like, high-level, essentially, what was going on there is that, to compile a program to the GPU, specifically NVIDIA GPUs, TensorFlow and XLA were using something called PTX, which is, like, an NVIDIA assembler, essentially. They were calling out to it from a subprocess, using, I think, like, waitpid, or something particular. And there's, like, something where, on Linux systems, the Erlang virtual machine sets SIGCHLD to, I think, SIG_IGN or something specific, which essentially just results in the process calling waitpid completely, like, ignoring the result of the program and returning negative one or something insane. And, like, tracking that down took forever. And that's pretty obscure about the Erlang, you know, virtual machine, something that, like, only a few people would know off the bat, right?

There was another very frustrating segmentation fault we were getting with convolutions on NVIDIA GPUs. And I remember just how frustrating it was, kind of, working with NVIDIA and trying to figure out what the deal was. It turned out the reason for the segmentation fault was the default stack size for the Dirty NIFs that we were using, which is kind of, it's an Erlang VM-specific thing. The Dirty NIFs we were using, the default stack size was too small. And I had talked to some of the core team for Erlang a few times about this, and they kept saying, "Well, if you just set the stack size a little bigger, does it work?" And I was like, "Yeah, I've tried that. It doesn't work," but I was setting the flag wrong or something. So, it took me, like, four or five months to realize that I was setting the flag wrong. I think NVIDIA had been telling me the same thing too. They were like, "Well, what's the stack size? Like, is the stack size too small?" So that was frustrating, but that was probably more frustrating for me, to realize that I was making a silly mistake. It was, like, a typo or something, and you know how frustrating it can be when you realize that you have a typo in your code that's been contributing to a bug for, like, four or five months. That was frustrating. And there have been some, I guess, difficult moments in implementing Axon.

I think we have the benefit of being able to test against correct implementations in PyTorch and TensorFlow, but when you're implementing numerical algorithms, getting exact, like, numerical correctness is very, very difficult. There can be very subtle bugs that just pop up, and they have a drastic impact on the stability of, like, training different models and the predictions you get with different models. And so, that can be very frustrating as well. So, one of the things you can do with Bumblebee is Stable Diffusion. You can do image generation. I just remember working for, like, two weeks straight, trying to get the outputs we were getting from Stable Diffusion to match, within a very small precision, what you get in Python. I also remember, like, thinking how insane, or, you know, how much admiration I have for people that are implementing these algorithms from scratch, without any reference implementation, and without any reference tests or anything, to say, like, you know, this is the correct implementation of whatever this algorithm is. So, it is always very frustrating to try to track down those small bugs in numerical correctness.

Pitching Elixir for machine learning

Bruce Tate: So, I have two more questions. They're wrap-up questions. One of them is to make your last pitch for machine learning with Elixir.

Sean Moriarity: For, I think, those that are deploying... Well, first, I honestly think that for machine learning startups, Elixir is probably the best language that you can have because you can do everything in Elixir. And not just everything from a machine learning perspective, but also from an application development perspective. So, with LiveView, you can write your front end in Elixir. With Phoenix, you can write scalable backends with Elixir. You can have your entire, you know, inference pipeline written in Elixir, and it's fault-tolerant and scalable. You can train your models in Elixir. You can deploy your models in Elixir. I think, for a startup that's, you know, undermanned, or, you know, doesn't necessarily have large teams, you can build at a very high velocity. I think that's something that you just don't necessarily get from another ecosystem. So, if I was a small team, I think that would be the first language I would reach for: you know, building an application in Elixir, because you can punch above your weight.

Recommended talk: Crafting Robust Architectures for a Resilient Future • Eleanor Saitta & Jez Humble • GOTO 2023

What’s next?

Bruce Tate: I like that. So, you take JavaScript off the table, you take Python off the table, and you centralize everything on one language. That's wonderful. And I have one last question. Is there anything else that you want your listeners to know? What's coming up? What's happening?

Sean Moriarity: I guess we've kind of ranted a little bit about machine learning in Elixir, but I want the listeners to know, if you're not familiar with machine learning in the Elixir ecosystem, we have, I think, the perfect treat for you. So, I just recently released a book, "Machine Learning in Elixir." It's out in beta now, and in it you can learn the fundamentals of the machine learning ecosystem in Elixir from the ground up.

So, if you're not familiar with Elixir, it's a great way to learn the language. And then, if you are familiar with Elixir and you're not familiar with machine learning, it's a great way to learn machine learning. "Machine Learning in Elixir" is, I would say, designed to be the authoritative source on everything you can do with machine learning in Elixir. And it's got a lot of inspiration from some of my favorite machine learning books, such as, you know, François Chollet's "Deep Learning with Python," which was kind of the first book I ever read in machine learning. And then some of the other popular machine learning textbooks out there, like "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and the other co-authors they have there.

So, I would highly recommend any listeners to go check out that book. Send me any errata you find, because it is in beta, so obviously, there's gonna be some issues as we upgrade the libraries and things change. But yeah, one of my key focuses now is just building out some of the educational material for people in the ecosystem to kind of, you know, cut their teeth on.

Bruce Tate: Wonderful. That's a beautiful conversation going from genetic algorithms to machine learning in Elixir. And so, for Bruce Tate and Sean Moriarity, we've been talking for the GOTO Book Club, and we're signing off. Thank you.

Sean Moriarity: Thank you, everyone.

Bruce Tate: It was a lot of fun.

About the speakers

Sean Moriarity

Author of Genetic Algorithms in Elixir and Machine Learning in Elixir, co-creator of Nx.