In traditional goal-conditioned RL, the agent is provided with the exact goal it intends to reach. However, it is often unrealistic to know the goal configuration before performing a task.
This paper proposes a new representation learning algorithm, usable in goal-conditioned RL (and in standard RL), that uses the bisimulation relation to replace an unseen state with the representation of a seen state-goal pair.
Two states are bisimilar if they share both the same immediate reward and equivalent distributions over the next bisimilar states.
Let $\phi(s, g)$ be the state-goal encoder and $\psi(s)$ the state encoder.
- directly optimize: the bisimulation relation can be described as a distance between two state-goal representations:
\begin{equation*} \lVert \phi(s_i, g_i) - \phi(s_j, g_j) \rVert = \lvert R_i - R_j \rvert + \gamma \, W\big(P(\cdot \mid s_i, g_i),\, P(\cdot \mid s_j, g_j)\big) \end{equation*}
where $W$ is a distance between the next-state distributions (e.g., the Wasserstein distance) and $\gamma$ is the discount factor.
The smaller this distance, the stronger the bisimulation relation.
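Below is a minimal sketch of how this distance could be turned into a training loss for the encoder. It is an illustration rather than the paper's exact implementation: the diagonal-Gaussian latent dynamics model (which gives the 2-Wasserstein term a closed form) and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def bisim_loss(phi_i, phi_j, r_i, r_j,
               next_mu_i, next_std_i, next_mu_j, next_std_j, gamma=0.99):
    """phi_*: encoded (state, goal) pairs, shape (batch, dim).
    r_*: immediate rewards, shape (batch,).
    next_mu_*, next_std_*: predicted diagonal-Gaussian latent next-state stats."""
    # Distance between the two state-goal representations.
    z_dist = torch.norm(phi_i - phi_j, p=1, dim=-1)
    # Reward-difference term of the bisimulation target.
    r_dist = (r_i - r_j).abs()
    # Closed-form 2-Wasserstein distance between diagonal Gaussians.
    w2 = torch.sqrt(((next_mu_i - next_mu_j) ** 2).sum(-1)
                    + ((next_std_i - next_std_j) ** 2).sum(-1) + 1e-8)
    # Regress the representation distance onto the bisimulation target.
    target = (r_dist + gamma * w2).detach()
    return F.mse_loss(z_dist, target)
```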
- state abstraction: the representation only needs to contain the difference between the goal state and the current state, i.e., $\phi(s, g) = \psi(g) - \psi(s)$. For any state-goal pair $(s_j, g_j)$ that has a strong bisimulation relation with $(s_i, g_i)$, $\phi(s_j, g_j) \approx \phi(s_i, g_i)$, so the seen representation can stand in for the unseen one.
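A small sketch of this abstraction, assuming the difference form above; the network sizes and class name are made up for illustration.

```python
import torch
import torch.nn as nn

class DifferenceEncoder(nn.Module):
    """State-goal encoder phi(s, g) = psi(g) - psi(s): the representation keeps
    only the difference between the goal state and the current state."""

    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        # psi: shared state encoder applied to both the current state and the goal.
        self.psi = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.psi(goal) - self.psi(state)
```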
- reinforcement learning algorithm update: the learned representation $\phi(s, g)$ is then used as the input to the goal-conditioned RL algorithm, whose usual update runs on top of it.
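As a sketch of this step, the snippet below runs a DQN-style TD update on top of $\phi(s, g)$; the base algorithm, the discrete-action assumption, and all names (`critic_update`, `q_net`, ...) are placeholders for illustration rather than the paper's setup.

```python
import torch
import torch.nn.functional as F

def critic_update(encoder, q_net, target_q_net, batch, gamma=0.99):
    """One TD update of a goal-conditioned Q-function on top of phi(s, g).
    `batch` holds tensors (s, g, a, r, s_next, done) from a replay buffer."""
    s, g, a, r, s_next, done = batch
    z = encoder(s, g)  # phi(s, g) is the critic input
    with torch.no_grad():
        z_next = encoder(s_next, g)
        # Bootstrapped target from the target network (discrete actions assumed).
        target = r + gamma * (1.0 - done) * target_q_net(z_next).max(dim=-1).values
    q = q_net(z).gather(-1, a.long().unsqueeze(-1)).squeeze(-1)
    return F.mse_loss(q, target)
```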
So at test time the task goal $g$ is unknown; instead, the task is specified by a separate, previously seen state-goal pair that achieves an analogous outcome with respect to another state.
But how do we find a bisimilar state-goal pair?
- Use the value function to choose it.
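The section only states that the value function does the choosing; the sketch below illustrates one possible selection rule (not necessarily the paper's): pick the stored pair whose value most closely matches that of the current test state. All names (`value_fn`, `encoder`, `seen_states`, ...) are hypothetical.

```python
import torch

def pick_analogous_pair(value_fn, encoder, seen_states, seen_goals, test_state):
    """Pick the seen (state, goal) pair whose value best matches the test state,
    treating it as the bisimilar analogy to act towards."""
    # Value of each stored state-goal pair under the learned value function.
    seen_values = value_fn(encoder(seen_states, seen_goals)).squeeze(-1)
    # Value of the test state paired with each stored goal.
    test_batch = test_state.unsqueeze(0).expand(seen_states.shape[0], -1)
    test_values = value_fn(encoder(test_batch, seen_goals)).squeeze(-1)
    # A matching value is taken as evidence of (approximate) bisimilarity.
    idx = torch.argmin((seen_values - test_values).abs())
    return seen_states[idx], seen_goals[idx]
```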