In this interview, Phil breaks down what differentiates reinforcement learning from machine learning and how reinforcement learning avoids constraints found in traditional machine learning algorithms.
Top takeaways from our chat:
Reinforcement learning is a very specific part of data science: why did you restrict your book to the topic of reinforcement learning?
Phil: Data science is a large field; you could write a book about any of the 10-20 fundamental concepts underpinning data science. Reinforcement learning is a newer tool, a newer technique that is emerging to solve a distinct set of problems.
Machine learning is good at automating decisions, but reinforcement learning takes it to the next level: it’s really good at automating strategies. There are lots of problems in real life that need not just single, one-shot decisions, but rather an overarching strategy for making those decisions.
That’s why I think reinforcement learning is important. It can solve that problem and therefore warrants a full book. There is a lot to talk about within the field of optimal strategies and how to find and derive them, and that’s what the book’s trying to do.
People tend to make the assumption that machine learning, neural networks and reinforcement learning are the same thing. Can you use reinforcement learning in other areas outside of neural networks?
Phil: It gets quite complicated because there is a lot of overlap. Within reinforcement learning — inside the act of learning by reinforcement — you can use models to define those policies and strategies, and basically to define what the agent should do in a particular situation. We use models to define that, and those models are often neural networks.
Part of the prerequisites for the book is some exposure to machine learning, as reinforcement learning leverages a lot of the tools that were developed for machine learning itself. Generally, though, the goal is different. Machine learning makes one-shot, single-goal decisions, whereas reinforcement learning operates over a much longer timescale and involves sequential decision making. So there is overlap, and both fields borrow things from each other.
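Phil’s distinction between one-shot decisions and sequential strategies can be sketched with a small tabular Q-learning agent. Everything below (the corridor environment, the hyperparameters) is a hypothetical illustration, not code from the book:

```python
import random

# Hypothetical toy example: a five-state corridor. The agent starts at
# state 0 and is rewarded only on reaching state 4, so no single decision
# is "correct" in isolation; the agent must learn an overarching strategy
# (always move right) from a delayed reward.

N_STATES = 5
ACTIONS = (0, 1)                  # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Environment dynamics: move along the corridor, reward at the end."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def greedy(q, state):
    """Pick the highest-valued action, breaking ties randomly."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):              # training episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(q, state)
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the value of (state, action) toward
        # the reward plus the discounted best value of the next state.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# The learned strategy for the non-terminal states.
policy = [greedy(q, s) for s in range(N_STATES - 1)]
print(policy)
```

After training, every non-terminal state should prefer action 1 (move right): the agent has learned a strategy over a sequence of decisions, not a single one-shot prediction.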
An example of things going the other way: there’s a technique that you can use to help you design and craft your neural networks called neural architecture search, and it’s an automated way of trying to find the right neural architecture for a particular problem. This is typically done in a machine learning context where you’re trying to build some kind of decision making algorithm using a neural network, but actually, that process uses reinforcement learning to find the optimal network, so it kind of goes both ways — they both borrow useful elements from each other.
Reinforcement learning is in fact what nature does — it is what teaches us to repeat one action but never to do another again.
Phil: Exactly! The classic experiment people have most likely heard of is the Pavlovian experiment, where Pavlov attempted to train animals to perform certain tasks. The idea is that you can provide either positive or negative reinforcement — a reward or a penalty — for performing a task. For example, you can train chickens to pick out things that you’re interested in. This applies to humans as well: as parents we try to reward our children when they show good behavior to promote that behavior; likewise, if they do something bad, we have to try to stop it, and punishment is a potential avenue.
Grab Phil’s brand new book to dive into everything from basic building blocks to state-of-the-art practices around reinforcement learning.
Dr. Phil Winder is a multidisciplinary engineer who creates data-driven software products. His work incorporates data science, cloud-native development, and traditional software development using a range of languages and tools.