foundations of computational agents
The following are the main points you should have learned from this chapter:
A Markov decision process is an appropriate formalism for reinforcement learning. A common method is to learn an estimate of the value of doing each action in a state, as represented by the Q(s, a) function.
In reinforcement learning, an agent should trade off exploiting its knowledge and exploring to improve its knowledge.
Off-policy learning, such as Q-learning, learns the value of the optimal policy. On-policy learning, such as SARSA, learns the value of the policy the agent is actually carrying out (which includes the exploration).
Model-based reinforcement learning separates learning the dynamics and reward models from the decision-theoretic planning of what to do given the models.
For large state or action spaces, reinforcement learning algorithms can be designed to use generalizing learners (such as neural networks) to represent the value function, the Q-function and/or the policy.
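As a concrete reminder of the off-policy versus on-policy distinction above, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the chapter's own code: the function names (`epsilon_greedy`, `q_learning_update`, `sarsa_update`), the dictionary-based Q table, and the toy states and actions are all hypothetical, and terminal states are not handled. The only difference between the two updates is which next-state value is backed up: the best available action for Q-learning, or the action the agent actually chose (exploration included) for SARSA.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon; otherwise exploit the current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy: back up the value of the best action in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: back up the value of the action actually taken in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Toy illustration (hypothetical states and actions, not from the chapter).
actions = ["left", "right"]


def fresh_q():
    """A Q table where, in state s1, 'right' currently looks better than 'left'."""
    Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    Q[("s1", "left")] = 1.0
    Q[("s1", "right")] = 3.0
    return Q


q_off = fresh_q()
q_learning_update(q_off, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
# Target uses max over actions in s1: 1.0 + 0.9 * 3.0, so Q(s0, right) is about 1.85.

q_on = fresh_q()
# Suppose exploration made the agent pick "left" in s1.
sarsa_update(q_on, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.9)
# Target uses the action taken: 1.0 + 0.9 * 1.0, so Q(s0, right) is about 0.95.
```

With the same experience, the SARSA estimate is lower here because it accounts for the exploratory step, whereas the Q-learning estimate assumes the agent acts greedily from s1 onward.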