Stochastic Closed-Loop Case

Oct 31, 2020

rl mbrl

Description

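Even when the current state and action are given, the next state is not determined; it is drawn from the transition distribution. The agent can use the model to derive a policy, that is, it produces one action at each step based on the state it observes at that step.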

\begin{array}{c}
p\left(\mathbf{s}_1, \mathbf{a}_1, \ldots, \mathbf{s}_T, \mathbf{a}_T\right) = p\left(\mathbf{s}_1\right) \prod_{t=1}^{T} \pi\left(\mathbf{a}_t \mid \mathbf{s}_t\right) p\left(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t\right) \\
\pi = \arg\max_{\pi} E_{\tau \sim p(\tau)}\left[\sum_t r\left(\mathbf{s}_t, \mathbf{a}_t\right)\right]
\end{array}
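
The trajectory distribution and objective above can be illustrated with a small sampling sketch. Everything concrete here is an assumption made for illustration only: the 2-state, 2-action MDP (`P0`, `P`, `R`), the helper names, and the brute-force search over deterministic policies, which stands in for the arg max and is not a method described in this note.

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration.
P0 = [0.5, 0.5]                        # p(s_1)
P = [                                  # P[s][a] = p(s_{t+1} | s_t = s, a_t = a)
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [[0.0, 0.0], [1.0, 0.5]]           # R[s][a] = r(s, a)


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def rollout(policy, horizon, rng):
    """Sample one trajectory; each action is chosen only after s_t is observed."""
    s = sample(P0, rng)
    trajectory, total = [], 0.0
    for _ in range(horizon):
        a = sample(policy[s], rng)     # a_t ~ pi(a_t | s_t)
        trajectory.append((s, a))
        total += R[s][a]
        s = sample(P[s][a], rng)       # s_{t+1} ~ p(s_{t+1} | s_t, a_t)
    return trajectory, total


def expected_return(policy, horizon=10, episodes=2000, seed=0):
    """Monte Carlo estimate of E_{tau ~ p(tau)}[sum_t r(s_t, a_t)]."""
    rng = random.Random(seed)
    return sum(rollout(policy, horizon, rng)[1] for _ in range(episodes)) / episodes


def best_deterministic_policy(horizon=10):
    """Brute-force arg max over the four deterministic policies."""
    candidates = [
        [[1.0 - a0, float(a0)], [1.0 - a1, float(a1)]]
        for a0 in (0, 1) for a1 in (0, 1)
    ]
    return max(candidates, key=lambda pi: expected_return(pi, horizon))
```

Here `policy[s]` holds the action probabilities for state `s`, so `rollout` samples exactly the factors in the product: first `p(s_1)`, then alternately the policy and the transition model. Because the action is drawn after the state is observed, the loop is closed, in contrast to committing to a whole action sequence in advance.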