Stochastic Policies

Nov 20, 2025

A multivariate Gaussian distribution is described by a mean vector, $\mu$, and a covariance matrix, $\Sigma$. A diagonal Gaussian distribution is a special case where the covariance matrix only has entries on the diagonal, so it can be represented by a vector.
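
To make the special case concrete, here is the diagonal covariance written out for a $k$-dimensional action. The symbol $\sigma_i$ for the standard deviation of the $i$-th component is notation assumed for this sketch:

\[ \Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2). \]

With no off-diagonal entries the components are independent, so the density factorizes into $k$ one-dimensional Gaussians: $\mathcal{N}(a \mid \mu, \Sigma) = \prod_{i=1}^k \mathcal{N}(a_i \mid \mu_i, \sigma_i^2)$.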

Representation of the covariance matrix

Single vector

There is a single vector of log standard deviations, $\log \sigma$, which is not a function of state: the entries of $\log \sigma$ are standalone parameters.

Neural Network

There is a neural network that maps from states to log standard deviations, $\log \sigma_{\theta}(s)$. It may optionally share some layers with the network that outputs the mean, $\mu_{\theta}(s)$.

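Note that in both cases we output log standard deviations instead of standard deviations directly. This is because log stds are free to take on any value in $(-\infty, \infty)$, while stds must be nonnegative. The stds are recovered by exponentiating the log stds, so nothing is lost by this parameterization, and the parameters can be trained without enforcing a constraint.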
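
The two representations can be sketched side by side in plain Python. This is a minimal illustration, not a reference implementation: the dimensions, the initial value of $-0.5$, and the single linear layer standing in for the neural network are all assumptions made for the example.

```python
import math
import random

ACTION_DIM = 2  # hypothetical action dimensionality
STATE_DIM = 3   # hypothetical state dimensionality

# Option 1: a standalone vector of log stds, independent of the state.
log_std_params = [-0.5] * ACTION_DIM

def std_from_vector(state):
    """Return the stds; the state argument is ignored by construction."""
    return [math.exp(v) for v in log_std_params]

# Option 2: a state-dependent head. A single linear layer stands in for
# the neural network here (hypothetical weights, randomly initialized).
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(ACTION_DIM)]
b = [0.0] * ACTION_DIM

def std_from_network(state):
    """Map a state to stds by exponentiating the predicted log stds."""
    log_std = [sum(w * s for w, s in zip(row, state)) + bias
               for row, bias in zip(W, b)]
    return [math.exp(v) for v in log_std]
```

In both functions the unconstrained quantity is the log std, and the exponential guarantees a positive std whatever values the parameters take.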

Sampling

Given the mean action $\mu_{\theta}(s)$ and standard deviation $\sigma_{\theta}(s)$, and a vector $z$ of noise from a spherical Gaussian ($z \sim \mathcal{N}(0, I)$), an action sample can be computed with

\[ a = \mu_{\theta}(s) + \sigma_{\theta}(s) \odot z, \]

where $\odot$ denotes the elementwise product of two vectors.
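
The sampling rule translates almost directly into code. The sketch below uses only the Python standard library; the function name `sample_action` and the example mean and std values are hypothetical, and in practice the mean and std would come from the policy rather than being fixed lists.

```python
import random

def sample_action(mean, std, rng=random):
    """Draw a = mean + std * z elementwise, with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + s * zi for m, s, zi in zip(mean, std, z)]

# Example with made-up values for a 2-dimensional action.
rng = random.Random(0)
mean = [1.0, -2.0]
std = [0.5, 0.1]
samples = [sample_action(mean, std, rng) for _ in range(10000)]
```

Averaged over many draws, the samples should center on `mean` with a per-dimension spread close to `std`, since all of the randomness enters through $z$.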

Log-Likelihood

The log-likelihood of a $k$-dimensional action $a$, for a diagonal Gaussian with mean $\mu = \mu_{\theta}(s)$ and standard deviation $\sigma = \sigma_{\theta}(s)$, is given by

\[ \log \pi_{\theta}(a|s) = -\frac{1}{2}\left(\sum_{i=1}^k \left(\frac{(a_i - \mu_i)^2}{\sigma_i^2} + 2 \log \sigma_i \right) + k \log 2\pi \right). \]
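
As a rough sketch, the formula above can be evaluated term by term in plain Python. The function name `diag_gaussian_log_prob` is hypothetical, and taking the log stds as input (rather than the stds) is an assumption chosen to match the parameterization described earlier.

```python
import math

def diag_gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian, one summand per action dimension."""
    k = len(action)
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        sigma = math.exp(ls)
        # (a_i - mu_i)^2 / sigma_i^2 + 2 log sigma_i
        total += ((a - mu) / sigma) ** 2 + 2.0 * ls
    return -0.5 * (total + k * math.log(2.0 * math.pi))
```

Working from the log stds means the $2 \log \sigma_i$ term needs no logarithm at all. Because the sum runs over independent dimensions, the log-likelihood of a $k$-dimensional action equals the sum of the one-dimensional log-likelihoods of its components.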