13.11 Review

The following are the main points you should have learned from this chapter:

• A Markov decision process is an appropriate formalism for reinforcement learning. A common method is to learn an estimate of the value of doing each action in a state, as represented by the $Q(S,A)$ function.

• In reinforcement learning, an agent should trade off exploiting its knowledge and exploring to improve its knowledge.

• Off-policy learning, such as Q-learning, learns the value of the optimal policy. On-policy learning, such as SARSA, learns the value of the policy the agent is actually carrying out (which includes the exploration).

• Model-based reinforcement learning separates learning the dynamics and reward models from the decision-theoretic planning of what to do given the models.

• For large state or action spaces, reinforcement learning algorithms can be designed to use generalizing learners (such as neural networks) to represent the value function, the $Q$-function, and/or the policy.
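The contrast between off-policy and on-policy learning, and the exploration and exploitation trade-off, can be illustrated with a minimal tabular sketch. This is not the chapter's own code. The function names, the dictionary-based $Q$ table, and the parameters `alpha` (step size), `gamma` (discount factor), and `epsilon` (exploration probability) are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit the current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: pick a uniformly random action
    return max(actions, key=lambda a: Q[(state, a)])  # exploit: pick a greedy action


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def demo():
    """Apply both updates to the same hypothetical experience and return the results."""
    actions = ["left", "right"]
    results = {}
    for name in ("q_learning", "sarsa"):
        Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
        Q[("s1", "left")] = 1.0
        Q[("s1", "right")] = 3.0
        if name == "q_learning":
            q_learning_update(Q, "s0", "right", 0.0, "s1", actions, alpha=0.5, gamma=0.9)
        else:
            # Assume the agent explored and chose the worse action "left" in s1.
            sarsa_update(Q, "s0", "right", 0.0, "s1", "left", alpha=0.5, gamma=0.9)
        results[name] = Q[("s0", "right")]
    return results


if __name__ == "__main__":
    print(demo())
    rng = random.Random(0)  # seeded generator so runs are repeatable
    print(epsilon_greedy(defaultdict(float), "s0", ["left", "right"], 0.1, rng))
```

In the demo, Q-learning bootstraps from the best value in the next state (3.0), while SARSA bootstraps from the value of the exploratory action actually taken (1.0). The resulting estimates for the same experience are therefore about 1.35 and 0.45, which reflects that SARSA's values include the cost of exploration.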

Artificial Intelligence 3E