Which Mutual Information Representation Learning Objectives are Sufficient for Control
This paper formalize the sufficiency of a state representation for learning and representing the optimal policy, and study several popular mutual-information(MI) based objectives through this lens.
A useful state representation should be sufficient to learn and represent the optimal policy or the optimal value function, while discarding irrelevant and redundant information.
Mutual information
In information theory, the mutual information(MI) between two random variables, $X$ and $Y$, is defined as:
\[ I(X;Y) = \mathbb{E}_{p(x,y)}\log\frac{p(x,y)}{p(x)p(y)}=H(X)-H(X|Y) \]
MI measures the reduction in the uncertainty of one random variable from observing the value of the other. So, the goal of the representation learning for RL is to maximize the MI between the original state and the compact state representation.
Sufficient Representations for RL
Definition 1 A stochastic representation $\phi_{Z}(s)$ is a mapping from states $s\in S$ to a probability distribution $p(Z|S=s)$ over elements of a new representation space $z \in Z$.
Definition 2 A representation $\phi_{Z}$ is $\pi^{\star}$-sufficient with respect to a set of reward functions $R$ if $\forall r \in R, \phi_{Z}(s_1)=\phi_{Z}(s_2) \rightarrow \pi_r^{\star}(A|s_1)=\pi_r^{\star}(A|s_2)$.
Definition 2 A representation $\phi_{Z}$ is $Q^{\star}$-sufficient with respect to a set of reward functions $R$ if $\forall r \in R, \phi_{Z}(s_1)=\phi_{Z}(s_2) \rightarrow Q_r^{\star}(A|s_1)=Q_r^{\star}(A|s_2)$.
Mutual Information for Representation Learning in RL
Several MI objectives:
Forward information \[ J = I(Z_{t+1};Z_t,A_t) \]
State-only transition information \[ J = I(Z_{t+k};Z_t) \]
Inverse information \[ J = I(A_t;Z_{t+k}|Z_t) \]
Experiments
They first optimize each representation learning objective on a dataset which is collected from a uniform random policy, then freeze the weights of the state encoder learned in the first phase and train RL agent with the representation as state input.