Introduction

Often, a very simple pattern of extra rewards suffices to render an otherwise completely intractable problem straightforward.

Current Problems

Consider the following example of a bug that can arise from naively designed rewards:

Consider a system that learns to ride a simulated bicycle to a particular location. To speed up learning, the designers provided positive rewards whenever the agent made progress towards the goal, but no penalty was incurred for riding away from it.

![Figure 1: The original problem of riding a bicycle to the goal.](/ox-hugo/2021-05-15_15-08-24_screenshot.png)

![Figure 2: Speeding up learning by adding positive rewards whenever the agent makes progress towards the goal.](/ox-hugo/2021-05-15_15-10-46_screenshot.png)

Hence it is now better for the agent to ride in circles, repeatedly collecting the progress reward, than to actually go to the goal.
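
This failure mode is easy to reproduce in a toy setting. The sketch below is our own construction (not the original bicycle simulator): it pays +1 for any step that decreases the distance to the goal and charges nothing otherwise, so an agent that simply circles near its starting point collects unbounded reward.

```python
import math

GOAL = (0.0, 0.0)

def progress_reward(prev_pos, pos):
    """+1 whenever the distance to the goal decreases; no penalty otherwise."""
    return 1.0 if math.dist(pos, GOAL) < math.dist(prev_pos, GOAL) else 0.0

# An agent circling a point offset from the goal alternately approaches and
# recedes, so it collects unbounded reward without ever reaching the goal.
center, radius = (20.0, 0.0), 10.0
total = 0.0
prev = (center[0] + radius, center[1])
for step in range(1, 1001):
    theta = 0.1 * step
    pos = (center[0] + radius * math.cos(theta),
           center[1] + radius * math.sin(theta))
    total += progress_reward(prev, pos)
    prev = pos

print(total)  # roughly half the steps pay out: ~500 reward for going in circles
```

Adding a penalty symmetric to the progress bonus removes the loop; that is essentially the structure that potential-based shaping, introduced below, formalizes.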

Preliminaries

A finite-state Markov decision process (MDP) is a tuple \(M = (S, A, T, \gamma, R)\), where:

  • \(S\) is a finite set of states;
  • \(A\) is a set of actions;
  • \(T = \{P_{sa}(\cdot)\}\) gives the next-state transition probabilities, with \(P_{sa}(s')\) the probability of moving to state \(s'\) after taking action \(a\) in state \(s\);
  • \(\gamma \in [0, 1)\) is the discount factor;
  • \(R : S \times A \times S \to \mathbb{R}\) is a bounded real-valued function called the reward function.

A policy over a set of states \(S\) is a function \(\pi : S \to A\).
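
To make these objects concrete, here is one way to transcribe the tuple into Python (a minimal sketch; the container layout and names are our own, not from the paper):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = str
Action = str

@dataclass
class MDP:
    """A finite MDP (S, A, T, gamma, R) as plain Python containers."""
    states: List[State]                                # S
    actions: List[Action]                              # A
    T: Dict[State, Dict[Action, Dict[State, float]]]  # T[s][a][s'] = P(s'|s,a)
    gamma: float                                       # discount factor, 0 <= gamma < 1
    R: Callable[[State, Action, State], float]         # bounded reward R(s, a, s')

# A (deterministic) policy maps each state to an action.
Policy = Dict[State, Action]
```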

Thus,

  • the value function is \(V^{\pi}(s) = \mathbb{E}\left[r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \mid s_0 = s; \pi\right]\), the expected discounted return from starting in \(s\) and following \(\pi\);
  • the action-value (Q) function is \(Q^{\pi}(s, a) = \mathbb{E}_{s' \sim P_{sa}}\left[R(s, a, s') + \gamma V^{\pi}(s')\right]\).

Hence, the optimal value function is \(V^{*}(s) = \sup_{\pi} V^{\pi}(s)\).

The optimal Q-function is \(Q^{*}(s, a) = \sup_{\pi} Q^{\pi}(s, a)\), which satisfies the Bellman optimality equation \(Q^{*}(s, a) = \mathbb{E}_{s' \sim P_{sa}}\left[R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a')\right]\).

The optimal policy is \(\pi^{*}(s) \in \arg\max_{a} Q^{*}(s, a)\).
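
For a small finite MDP, \(V^{*}\), \(Q^{*}\), and \(\pi^{*}\) can be computed by standard value iteration. The sketch below is our own illustrative code over the `MDP` class above (it assumes every action is available in every state); we reuse it later.

```python
def value_iteration(mdp: MDP, tol: float = 1e-8):
    """Return (V*, Q*, greedy policy) computed by value iteration."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        # One Bellman optimality backup: Q(s, a) = E[R + gamma * V(s')].
        Q = {s: {a: sum(p * (mdp.R(s, a, s2) + mdp.gamma * V[s2])
                        for s2, p in mdp.T[s][a].items())
                 for a in mdp.actions}
             for s in mdp.states}
        V_new = {s: max(Q[s].values()) for s in mdp.states}
        if max(abs(V_new[s] - V[s]) for s in mdp.states) < tol:
            pi = {s: max(Q[s], key=Q[s].get) for s in mdp.states}
            return V_new, Q, pi
        V = V_new
```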

Views

Rather than running our learning algorithm on \(M\), we will run it on a transformed MDP \(M' = (S, A, T, \gamma, R')\), where \(R' = R + F\) and \(F : S \times A \times S \to \mathbb{R}\) is also a bounded real-valued function, called the shaping reward function.

That is, we are trying to learn a policy for some MDP \(M\), and we wish to help our learning algorithm by giving it additional shaping rewards that will hopefully guide it towards learning a good policy faster.
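
Concretely, constructing \(M'\) only requires wrapping the reward function; a minimal sketch on top of the `MDP` class above:

```python
def shaped(mdp: MDP, F) -> MDP:
    """M' = (S, A, T, gamma, R + F): same dynamics, reward shifted by F."""
    return MDP(mdp.states, mdp.actions, mdp.T, mdp.gamma,
               lambda s, a, s2: mdp.R(s, a, s2) + F(s, a, s2))
```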

But for what forms of shaping reward \(F\) can we guarantee that \(\pi^{*}_{M'}\), the optimal policy in \(M'\), will also be optimal in \(M\)?

Definition. A shaping reward function \(F\) is potential-based if there exists a function \(\Phi : S \to \mathbb{R}\) such that for all \(s, a, s'\),

\[ F(s, a, s') = \gamma \Phi(s') - \Phi(s). \]
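
In code, a potential-based shaping function is fully determined by its potential \(\Phi\) (a sketch in the same style as above):

```python
def potential_based(Phi, gamma: float):
    """Build F(s, a, s') = gamma * Phi(s') - Phi(s) from a potential Phi: S -> R."""
    return lambda s, a, s2: gamma * Phi(s2) - Phi(s)
```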

Theorem. If \(F\) is a potential-based shaping function, then every optimal policy in \(M'\) will also be an optimal policy in \(M\).

Proof omitted; the key step is that for any policy \(\pi\), the shaping rewards telescope, giving \(Q^{\pi}_{M'}(s, a) = Q^{\pi}_{M}(s, a) - \Phi(s)\). Subtracting a function of the state alone does not change which action maximizes \(Q\), so in particular \(Q^{*}_{M'}(s, a) = Q^{*}_{M}(s, a) - \Phi(s)\) and the optimal (greedy) policies of \(M\) and \(M'\) coincide.
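
While the proof is omitted here, the claim is easy to sanity-check numerically. The snippet below (our own three-state toy chain, built with the sketches above) solves \(M\) and a shaped \(M'\) for an arbitrary potential and confirms that the optimal policies coincide:

```python
# Toy chain: s0 -> s1 -> s2 (absorbing goal); reward 1 for entering s2 from s1.
states, actions = ["s0", "s1", "s2"], ["go", "stay"]
T = {
    "s0": {"go": {"s1": 1.0}, "stay": {"s0": 1.0}},
    "s1": {"go": {"s2": 1.0}, "stay": {"s1": 1.0}},
    "s2": {"go": {"s2": 1.0}, "stay": {"s2": 1.0}},
}
M = MDP(states, actions, T, gamma=0.9,
        R=lambda s, a, s2: 1.0 if (s, s2) == ("s1", "s2") else 0.0)

# An arbitrary potential: any choice of Phi should preserve the optimal policy.
Phi = {"s0": 5.0, "s1": -2.0, "s2": 7.0}
M_prime = shaped(M, potential_based(lambda s: Phi[s], M.gamma))

_, _, pi = value_iteration(M)
_, _, pi_prime = value_iteration(M_prime)
assert pi == pi_prime  # optimal policies coincide, as the theorem predicts
```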

Conclusion

This suggests that a way to define a good potential function might be to try to approximate \(V^{*}_{M}\). With \(\Phi = V^{*}_{M}\), the identity above gives \(Q^{*}_{M'}(s, a) = Q^{*}_{M}(s, a) - V^{*}_{M}(s)\): the shaped Q-values are exactly the advantages, which are zero for optimal actions and negative otherwise, so even a very short-sighted learner is pointed towards optimal behavior.
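
Continuing the toy example above, this is directly checkable:

```python
# Use the exact optimal values of M as the potential.
V_star, Q_star, _ = value_iteration(M)
M_v = shaped(M, potential_based(lambda s: V_star[s], M.gamma))
_, Q_v, _ = value_iteration(M_v)

# Q*_{M'}(s, a) = Q*_M(s, a) - Phi(s) holds (up to value-iteration tolerance).
for s in M.states:
    for a in M.actions:
        assert abs(Q_v[s][a] - (Q_star[s][a] - V_star[s])) < 1e-6
```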