Reinforcement Learning - ChatGPT, Playing Games, and More

Dean Wampler | GOTO Chicago 2023

Reinforcement Learning (RL) trains an agent to maximize a cumulative reward in an environment. It rocketed to fame as the technique behind expert-level performance in Atari games and the game of Go. It is also used for robotics, autonomous vehicles, process automation, and, more recently, making ChatGPT more effective.

I will begin with why RL is important and how it supports the applications listed above, including "Reinforcement Learning from Human Feedback" (RLHF), an essential tool used to develop ChatGPT. Then I will discuss how RL requires a variety of computational patterns: data management and processing, large-scale simulations and model training, and even model serving.

Finally, I will show how Ray RLlib supports RL seamlessly and efficiently, providing an ideal platform for building Python-based RL applications with an intuitive, flexible API.
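
To make "an agent maximizing a cumulative reward in an environment" concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the talk: the `Corridor` environment, the reward values, the hyperparameters, and the choice of tabular Q-learning are all assumptions made for this example, and it does not use Ray RLlib.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, start at 0, goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, +1 on reaching the goal.
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def greedy_action(q_row, rng):
    # Pick the higher-valued action; break exact ties at random.
    if q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _episode in range(episodes):
        state = env.reset()
        for _step in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = greedy_action(q[state], rng)  # exploit
            next_state, reward, done = env.step(action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_return(q, max_steps=100):
    """Total (undiscounted) reward collected by following the learned policy."""
    env = Corridor()
    rng = random.Random(0)
    state, total = env.reset(), 0.0
    for _step in range(max_steps):
        state, reward, done = env.step(greedy_action(q[state], rng))
        total += reward
        if done:
            break
    return total
```

Calling `q = train()` and then `greedy_return(q)` shows the loop end to end: the agent acts, the environment returns a reward, and the value table is nudged toward actions that lead to a higher cumulative reward. Real applications replace the table with a neural network and the toy environment with a simulator, which is where the large-scale computational patterns mentioned above come in.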


About the speakers

Dean Wampler

Product Engineering Director for Accelerated Discovery, IBM Research.