Reinforcement Learning is Supervised Learning on Optimized Data

The whole article is interesting. It is worth rereading regularly.

The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming. Methods that compute gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped under the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods.
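To make the contrast concrete, here is a minimal sketch of one update from each perspective. It is not taken from the article: the two-armed bandit, the softmax policy, and the names `reinforce_gradient` and `q_update` are assumptions chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn policy logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, rewards, n_samples, rng):
    """Optimization view: score-function (REINFORCE) estimate of
    the gradient of expected reward with respect to the logits."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        a = rng.choices(range(len(logits)), weights=probs)[0]
        r = rewards[a]
        for i in range(len(logits)):
            # d log pi(a) / d logit_i = 1[i == a] - pi(i) for a softmax policy
            grad[i] += r * ((1.0 if i == a else 0.0) - probs[i])
    return [g / n_samples for g in grad]


def q_update(q, action, reward, alpha):
    """Dynamic programming view: one tabular TD update. In a one-step
    bandit (an assumption of this toy) the TD target is just the reward."""
    new_q = list(q)
    new_q[action] += alpha * (reward - q[action])
    return new_q


# Hypothetical toy problem: arm 0 pays 0, arm 1 pays 1.
rng = random.Random(0)
rewards = [0.0, 1.0]
grad = reinforce_gradient([0.0, 0.0], rewards, 5000, rng)
q = q_update([0.0, 0.0], action=1, reward=1.0, alpha=0.5)
```

The REINFORCE estimate never differentiates the reward itself; it only weights the gradient of the log-probability of sampled actions by the reward received. The Q-learning style update instead moves a value estimate toward a target and would act greedily with respect to it.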

FF's Roam Notes
