What is reinforcement learning and how does it work?

Explained with simple examples.

Today we will talk about how machines learn through trial and error and what algorithms they use to outperform humans.

Reinforcement learning is a machine learning method in which a system (the agent) learns by trial and error. The agent interacts with an environment, receives a reward for the actions it performs, and improves its behaviour as it goes.

How does it work?

Reinforcement learning assigns positive rewards to desirable actions and negative rewards to undesirable ones. The agent is trained to maximize the total reward it collects in the long run, not just the reward of its next step, so it keeps looking for a better solution instead of settling for the first one that pays off. Over time, the agent learns to avoid actions that lead to negative rewards and to favour those that lead to positive ones.
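To make "long-term total reward" concrete, here is a minimal sketch of how a sequence of rewards is commonly summed up with a discount. The function name, the discount factor gamma = 0.9 and the two reward sequences are assumptions made up for illustration; they do not come from any particular library.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Walk backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episodes: a quick small win vs. an early loss followed by a bigger win.
quick_win = [1, 0, 0, 0]
patient_win = [-1, 0, 0, 5]

print("quick win:  ", discounted_return(quick_win))
print("patient win:", discounted_return(patient_win))
```

With these made-up numbers the second episode scores higher (about 2.645 against 1.0), even though it starts with a negative reward. This is why an agent that maximizes the total reward can accept a loss now for a larger gain later.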

The agent learns by interacting with the environment through trial and error.

Imagine we have a ping pong table and two paddles. Our goal is a system in which neither paddle misses the ball. Each time a paddle returns the ball, the agent receives a positive reward (+1); each time the ball is missed, the agent receives a negative reward (-1).
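The reward scheme from the ping pong example can be sketched in a few lines. This is a toy simulation under stated assumptions: the paddle returns the ball with a fixed probability, a rally ends on the first miss, and the names step_reward and play_rally are invented for this illustration.

```python
import random


def step_reward(ball_returned: bool) -> int:
    """+1 when the paddle returns the ball, -1 when it misses."""
    return 1 if ball_returned else -1


def play_rally(hit_probability: float, max_hits: int, rng: random.Random) -> int:
    """Simulate one rally for a single paddle and return its total reward."""
    total = 0
    for _ in range(max_hits):
        returned = rng.random() < hit_probability
        total += step_reward(returned)
        if not returned:
            break  # assumption: the rally ends on the first miss
    return total


rng = random.Random(0)  # fixed seed so repeated runs give the same rallies
for skill in (0.5, 0.9):
    rewards = [play_rally(skill, max_hits=20, rng=rng) for _ in range(1000)]
    print(f"hit probability {skill}: average reward {sum(rewards) / len(rewards):.2f}")
```

A paddle that misses less often collects more reward per rally, and that total is exactly the signal the agent tries to increase while it learns.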

Essential terms in Reinforcement Learning

Agent: System that performs actions in an environment in order to earn some reward.
Environment (e): The world the agent acts in and has to respond to.
Reward (R): An immediate return given to the agent after it performs a certain action or task. It can be positive or negative, as mentioned above.
State (s): The current situation of the agent, as reported by the environment.
Policy (π): The strategy that an agent applies to decide on the next action based on the current state.
Value (V): The total reward expected in the long run from a given state. Unlike the short-term reward, it takes a discount on future rewards into account.
Value function: A function that maps each state to its value, that is, to the expected total reward.
Model of the environment: A simulation of how the environment behaves. It helps the agent predict which state and reward will follow a given action.
Q value or action value (Q): The Q value is very similar to value (V). The main difference is that it takes the current action as an additional parameter: Q(s, a) is the long-run reward expected after taking action a in state s.
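Several of these terms can be tied together in a small sketch. The Q-table below is hypothetical (state names, action names and numbers are made up), and the epsilon-greedy rule is a common way to mix exploration into a policy that this article does not describe; treat it as an assumption added for illustration.

```python
import random

# Hypothetical Q-table: Q[state][action] = estimated long-run reward.
Q = {
    "ball_left": {"move_left": 0.8, "stay": 0.1, "move_right": -0.5},
    "ball_right": {"move_left": -0.4, "stay": 0.0, "move_right": 0.9},
}


def greedy_action(q_table, state):
    """A simple policy: pick the action with the highest Q value in this state."""
    actions = q_table[state]
    return max(actions, key=actions.get)


def state_value(q_table, state):
    """V(s) under the greedy policy: the best Q value available in this state."""
    return max(q_table[state].values())


def epsilon_greedy_action(q_table, state, epsilon, rng):
    """With probability epsilon try a random action, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(q_table[state]))
    return greedy_action(q_table, state)


rng = random.Random(0)
for state in Q:
    print(state, "->", greedy_action(Q, state), "| V =", state_value(Q, state))
print("exploring:", epsilon_greedy_action(Q, "ball_left", epsilon=0.2, rng=rng))
```

Here the policy reads its decision straight from the Q values, and V is simply the Q value of the action the policy would pick.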

Where to use Reinforcement Learning?

Reinforcement learning is used in the following areas:

Robotics for industrial automation (e.g. conveyor assembly).
Business strategy planning.
Automation within machine learning itself.
Advanced recommendation systems, e.g. suggesting additional learning resources to university students.
Robotics motion control, autopilot.

Keep in mind that reinforcement learning is computationally expensive and time-consuming, especially when the action space of the model is large.
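A quick back-of-the-envelope calculation shows why a large action space hurts. The sketch assumes a hypothetical robot whose joints can each be set to one of 5 positions independently, and a tabular method that stores one value per state-action pair; all numbers are illustrative.

```python
def joint_action_count(num_joints: int, settings_per_joint: int) -> int:
    """Number of distinct combined actions when every joint is set independently."""
    return settings_per_joint ** num_joints


def q_table_entries(num_states: int, num_actions: int) -> int:
    """A tabular method stores one value per (state, action) pair."""
    return num_states * num_actions


for joints in (2, 4, 6):
    actions = joint_action_count(joints, settings_per_joint=5)
    entries = q_table_entries(num_states=1000, num_actions=actions)
    print(f"{joints} joints: {actions} actions, {entries} table entries")
```

Every extra joint multiplies the number of actions, and each of those actions has to be tried many times before its value estimate becomes reliable.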

Which algorithms to use?

The field of reinforcement learning includes several algorithms that take different approaches. They differ mainly in how they interact with the environment and learn from it.

State-Action-Reward-State-Action (SARSA). This is an on-policy algorithm: the agent learns the value of the policy it is actually following. The policy in this case gives the probability of choosing each action in a given state. After every step, the agent updates its estimate using the action it really takes next, so its exploratory moves are reflected in what it learns.
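The name SARSA spells out the five values used in one update: state, action, reward, next state, next action. Below is a minimal sketch of that update for a table of Q values. The learning rate alpha = 0.1, the discount gamma = 0.9 and the state and action names are assumptions chosen for illustration.

```python
from collections import defaultdict


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One SARSA step: move Q(s, a) towards r + gamma * Q(s', a').

    a' is the action the agent actually takes next under its current policy.
    """
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Q values default to 0.0 for pairs we have not seen yet.
Q = defaultdict(float)
Q[("s1", "right")] = 2.0  # hypothetical estimate learned earlier

new_value = sarsa_update(Q, "s0", "right", reward=1.0,
                         next_state="s1", next_action="right")
print("Q(s0, right) after one update:", round(new_value, 3))
```

Because the target uses the next action the agent really picks, a risky exploratory move in the next state lowers the estimate for the current one.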

Q-Learning. This algorithm takes the opposite approach: it is off-policy. What the agent learns is not tied to the policy it follows while acting, so its exploration of the environment is more independent. The update places no constraints on which action the agent chooses next. It assumes that all subsequent action choices will be optimal, so it uses the maximum Q value available in the next state.
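A minimal sketch of the Q-learning update for a table of Q values follows. As before, alpha = 0.1, gamma = 0.9 and the state and action names are assumptions picked for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) towards r + gamma * max over a' of Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


actions = ["left", "right"]
Q = defaultdict(float)       # unseen pairs start at 0.0
Q[("s1", "left")] = 0.5      # hypothetical estimates learned earlier
Q[("s1", "right")] = 2.0

new_value = q_learning_update(Q, "s0", "right", reward=1.0,
                              next_state="s1", actions=actions)
print("Q(s0, right) after one update:", round(new_value, 3))
```

The target always uses the best value in the next state (2.0 here), even if the agent then explores and goes left. SARSA would use the value of whichever action was actually taken.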

Deep Q-Networks (DQN). This algorithm combines Q-learning with neural networks. Instead of keeping a Q value for every state-action pair in a table, a neural network estimates the Q values from the state, and the agent selects the action with the best estimate. The network is trained on samples of past experience that the agent has stored, not only on its most recent step.
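The "sample of past experience" part can be sketched without any neural network library. Below is a minimal replay memory and the training target for one stored step. The class and function names are assumptions, and the lambda at the end is a stand-in for the network, which in a real DQN would be built with a deep learning framework.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size memory of past transitions; the oldest ones are dropped first."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a random batch so that consecutive steps are not learned in order."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(transition, q_values, gamma=0.99):
    """Target for one transition; q_values(state) stands in for the network."""
    if transition.done:
        return transition.reward  # nothing follows a terminal step
    return transition.reward + gamma * max(q_values(transition.next_state))


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=(step == 4))

fake_network = lambda state: [0.0, 1.0]  # placeholder: always predicts the same Q values
for t in buffer.sample(2):
    print(t.state, "->", round(td_target(t, fake_network), 3))
```

During training, the network's weights would be adjusted so that its prediction for each sampled state and action moves closer to this target.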

How is it different from classical Deep Learning?

Reinforcement learning is similar to deep learning, with one key difference: in reinforcement learning, the machine learns by trial and error, using data from its own experience.
A reinforcement learning algorithm is an independent, self-learning system. To get the best results, the machine learns through constant practice, which is where the concept of learning by trial and error comes from.

Humans also learn by reinforcement to some extent. Learning to ride a bike or to swim, for example, is a process made up of right and wrong moves.

Deep learning, as it is typically used, involves learning from an existing dataset and then applying what was learned to a new set of data.
Reinforcement learning, on the other hand, is dynamic (independent) learning: the agent generates its own data by acting and uses trial and error to make better decisions.

Main challenges of Reinforcement Learning

Reinforcement learning has high potential, but it can be difficult to deploy and its applications remain limited. One obstacle to deployment is its reliance on exploring the environment.

For instance, suppose you deploy a robot that uses reinforcement learning to navigate its environment. It will look for new states and take different actions as it moves. However, it is difficult to take the best action consistently when the environment changes frequently. If the robot's environment is your home, then after you rearrange objects or furniture, the robot will have to adapt to the new environment all over again.

The time and computational resources required for training can also limit the usefulness of reinforcement learning. As learning environments become more complex, the demands on both increase. These are challenges that reinforcement learning specialists will have to address in the near future.

Conclusion

Reinforcement learning is a computational approach to learning based on interactions within an environment.

Moreover, reinforcement learning is an advanced technology that will change our world sooner rather than later. It is what makes machine learning a creative process, as the machine’s independent search for new, innovative ways to solve problems is already creative.

Reinforcement learning is already being put into practice. A famous example is DeepMind's AlphaGo, a program for playing the popular Asian board game Go. It used moves that were initially thought to be mistakes, yet it went on to beat Lee Sedol, one of the strongest Go players. Its successor, AlphaGo Zero, surpassed the earlier versions of AlphaGo after just 40 days of learning from self-play.

Thus, reinforcement learning is already a revolutionary technology and will undoubtedly be the next step in the development of the artificial intelligence industry.