# Understanding Algorithms for Reinforcement Learning

##### Janani Ravi

###### Course info

###### Description

Traditional machine learning algorithms are used for prediction and classification. Reinforcement learning is about training agents to make decisions that maximize cumulative rewards. In this course, Understanding Algorithms for Reinforcement Learning, you'll learn basic principles of reinforcement learning algorithms, RL taxonomy, and specific policy search techniques such as Q-learning and SARSA. First, you'll discover the objective of reinforcement learning: to find an optimal policy which allows agents to make the right decisions to maximize long-term rewards. You'll study how to model the environment so that RL algorithms are computationally tractable. Next, you'll explore dynamic programming, an important technique used to cache intermediate results, which simplifies the computation of complex problems. You'll understand and implement policy search techniques such as temporal difference learning (Q-learning) and SARSA, which help converge to an optimal policy for your RL algorithm. Finally, you'll use reinforcement learning platforms, which allow study, prototyping, and development of policies, and work with both Q-learning and SARSA techniques on OpenAI Gym. By the end of this course, you should have a solid understanding of reinforcement learning techniques, Q-learning and SARSA, and be able to implement basic RL algorithms.

###### Section Introduction Transcripts

Course Overview

Hi. My name is Janani Ravi, and welcome to this course on Understanding Algorithms for Reinforcement Learning. A little about myself: I have a master's degree in Electrical Engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. In this course, you will learn basic principles of reinforcement learning algorithms, RL taxonomy, and specific policy search techniques such as Q-learning and SARSA. We'll start off by understanding the objective of reinforcement learning: to find an optimal policy, which allows agents to make the right decisions to maximize long-term rewards. RL has a wide variety of use cases, such as optimizing trucking routes to conserve fuel and finding the best moves to beat an opponent in chess. We'll study how to model the environment using Markov decision processes so that RL algorithms are computationally tractable. We'll then study dynamic programming, an important technique used to memoize intermediate results, which simplifies the computation of complex problems. We'll understand and implement policy search techniques such as temporal difference learning, also called Q-learning, and SARSA, which help converge to an optimal policy for our RL algorithm. We'll then study reinforcement learning platforms, which allow us to study, prototype, and develop our policies. We'll work with both Q-learning and SARSA techniques on OpenAI Gym. At the end of this course, you should have a solid understanding of reinforcement learning techniques, Q-learning and SARSA, and be able to implement basic RL algorithms.

Understanding the Reinforcement Learning Problem

Hi, and welcome to this course on Understanding Algorithms for Reinforcement Learning. When a student gets started with machine learning techniques, they typically work on supervised and unsupervised techniques, techniques such as classification, regression, clustering, and dimensionality reduction. Reinforcement learning differs from supervised and unsupervised learning. In fact, reinforcement learning is used for creating agents that know how to explore an uncertain environment. Often in the real world, there is no training data for your machine learning algorithm to work with, either labeled or unlabeled. This is where reinforcement learning comes in. You'll train a model to make decisions where you have no idea initially how those decisions will turn out. The job of your model, or agent in this case, is to learn about the uncertain environment until it's known. Your agent in an RL technique is the decision maker, and the decision maker needs to choose its actions appropriately so that rewards are maximized. Your model learns by getting positive reinforcement in the form of a reward or negative reinforcement for every action in every state. The objective of a reinforcement learning algorithm, or the model as we call it, is to determine a policy for such decision making. This policy is what will drive the decisions or the actions of our algorithm, and the output of reinforcement learning is basically a series of actions that have been created using this policy. The output of a classification problem might be labels. The output of a regression problem is predictions. The output of a reinforcement learning algorithm is a series of actions.
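
To make the agent, policy, and reward vocabulary concrete, here is a minimal sketch of the interaction loop. Everything in it is an assumption made for illustration and is not taken from the course: the five-cell corridor, the reward values, and the names `step`, `rollout`, `always_right`, and `random_policy` are all hypothetical.

```python
import random

# Hypothetical toy environment (not from the course): a corridor of 5 cells.
# The agent starts in cell 0 and the episode ends when it reaches cell 4.
N_CELLS = 5
GOAL = N_CELLS - 1
ACTIONS = ("left", "right")


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    move = 1 if action == "right" else -1
    next_state = min(max(state + move, 0), GOAL)  # walls at both ends
    done = next_state == GOAL
    # Assumed reward scheme: positive reinforcement at the goal,
    # a small negative reinforcement for every other step.
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def rollout(policy, max_steps=50):
    """Run one episode; return the series of actions and the cumulative reward."""
    state, actions, total = 0, [], 0.0
    for _ in range(max_steps):
        action = policy(state)  # the policy drives every decision
        state, reward, done = step(state, action)
        actions.append(action)
        total += reward
        if done:
            break
    return actions, total


def always_right(state):
    """A hand-written policy that happens to be optimal for this corridor."""
    return "right"


_rng = random.Random(0)  # seeded so the example is repeatable


def random_policy(state):
    """A policy that knows nothing about the environment yet."""
    return _rng.choice(ACTIONS)


good_actions, good_return = rollout(always_right)
rand_actions, rand_return = rollout(random_policy)
print(good_actions, round(good_return, 2))
print(len(rand_actions), round(rand_return, 2))
```

The policy is just a function from state to action, and what comes out of a rollout is the series of actions plus the cumulative reward. A learning algorithm's job is to move from something like `random_policy` toward something like `always_right` without being told the answer.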

Implementing Reinforcement Learning Algorithms

Hi, and welcome to this module where we'll see how we can go about implementing reinforcement learning algorithms. We'll start off by studying the basic concepts of dynamic programming. Dynamic programming is a key technique that is used to implement reinforcement learning. Dynamic programming allows us to cache information at every state, thus making exploring the environment tractable. In this module, we'll then move on to studying Q-learning, which is a technique used in reinforcement learning to find the best policy to use to make decisions within a certain environment. In order to use Q-learning techniques, we'll first need to model our environment as a Markov decision process where all the information about a particular state is embedded in that state itself. How we got to that state is irrelevant. There are a bunch of different techniques that can be used to implement Q-learning. We'll focus on understanding two specific techniques, the temporal difference method and SARSA. We'll also get hands-on with Q-learning here. We'll use the temporal difference method to find the shortest path from a source to a destination node on a graph.
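
The shortest-path exercise mentioned above can be sketched roughly as follows. This is not the course's implementation: the graph, the reward scheme, the hyperparameters, and the helper names `train` and `greedy_path` are all assumptions chosen for illustration. The update is the standard Q-learning rule, which moves `Q[state][action]` toward `reward + gamma * max(Q[next_state])`. The Q-table plays the role of the cached per-state information, and the Markov property shows up in the fact that the update only looks at the current node, never at how the agent got there.

```python
import random

# Hypothetical undirected graph (an assumption, not the course's graph):
# node -> list of neighbours. We want the shortest route from node 0 to node 5.
GRAPH = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 4],
    3: [1, 4, 5],
    4: [2, 3],
    5: [3],
}
SOURCE, GOAL = 0, 5
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.3    # assumed hyperparameters
STEP_REWARD, GOAL_REWARD = -1.0, 100.0   # assumed reward scheme


def train(episodes=2000, max_steps=50, seed=0):
    """Learn Q[state][action], where an action means 'move to this neighbour'."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in GRAPH.items()}
    starts = [s for s in GRAPH if s != GOAL]
    for _ in range(episodes):
        state = rng.choice(starts)  # random starts so every node gets visited
        for _step in range(max_steps):
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(GRAPH[state])
            else:
                action = max(q[state], key=q[state].get)
            next_state = action
            done = next_state == GOAL
            reward = GOAL_REWARD if done else STEP_REWARD
            # Q-learning target: best value available from the next state,
            # regardless of which action the agent will actually take there.
            best_next = 0.0 if done else max(q[next_state].values())
            q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_path(q, start=SOURCE, max_steps=20):
    """Follow the highest-valued action from each node until the goal."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        current = path[-1]
        path.append(max(q[current], key=q[current].get))
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

On this particular graph the shortest route is 0, 1, 3, 5 (three hops), while the alternative through nodes 2 and 4 takes four, so the greedy path should settle on the former once the values have converged. The per-step penalty is what makes shorter routes score higher.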

Using Reinforcement Learning Platforms

Hi, and welcome to this module where we'll get really hands-on with reinforcement learning algorithms by using platforms. Now why are platforms needed in reinforcement learning techniques? If you think about the applications of reinforcement learning in the real world, it will become pretty obvious to you that these are better prototyped using platforms, and they are hard to test in the real world. Let's say you're teaching a robot to walk. If your robot keeps walking around in the real world, it might walk off a cliff or damage itself in some way. The OpenAI Gym is a very popular reinforcement learning platform for testing and prototyping your models. This is an open-source platform, which means it's freely available for anyone to use. We'll implement several reinforcement learning techniques. We'll first implement reinforcement learning using the SARSA methodology in the FrozenLake environment on OpenAI Gym. We'll then implement reinforcement learning using the temporal difference method, or Q-learning, in the CartPole environment in OpenAI Gym.
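
As a rough illustration of SARSA on a FrozenLake-style task, the sketch below uses a small pure-Python stand-in rather than OpenAI Gym itself, so it runs without any installation and does not depend on a particular Gym version. The stand-in is an assumption: the 4x4 map and the action numbering are meant to mirror Gym's default FrozenLake layout with slipperiness turned off, and the functions `step`, `choose`, `train`, and `greedy_run` and the hyperparameters are hypothetical names and values, not Gym's API or the course's code.

```python
import random

# Pure-Python stand-in for a non-slippery 4x4 FrozenLake (an assumption, not
# Gym's API). S = start, F = frozen, H = hole (episode ends), G = goal.
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
SIZE = 4
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1                   # assumed hyperparameters


def step(state, action):
    """Move on the grid and return (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    tile = LAKE[row][col]
    # Reward 1 only for reaching the goal; holes and the goal end the episode.
    return row * SIZE + col, float(tile == "G"), tile in "HG"


def choose(q_row, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=5000, max_steps=100, seed=0):
    """SARSA: update toward the value of the action actually taken next."""
    rng = random.Random(seed)
    q = [[0.0] * len(MOVES) for _ in range(SIZE * SIZE)]
    for _ in range(episodes):
        state = 0
        action = choose(q[state], rng)
        for _step in range(max_steps):
            next_state, reward, done = step(state, action)
            next_action = choose(q[next_state], rng)
            # The name comes from the tuple (S, A, R, S', A') used here.
            target = reward if done else reward + GAMMA * q[next_state][next_action]
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return q


def greedy_run(q, max_steps=100):
    """Follow the learned values without exploring; return (states, final reward)."""
    state, reward, visited = 0, 0.0, [0]
    for _ in range(max_steps):
        action = q[state].index(max(q[state]))
        state, reward, done = step(state, action)
        visited.append(state)
        if done:
            break
    return visited, reward


q_table = train()
visited, final_reward = greedy_run(q_table)
print(visited, final_reward)
```

The only difference from a Q-learning update is the target: SARSA uses the value of the action the agent actually picks in the next state, whereas Q-learning uses the best available value there. Porting this to a real Gym environment would mean replacing `step` with the environment's own `reset` and `step` calls, whose return values differ between Gym releases.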