Categorical Policies
A categorical policy is like a classifier over discrete actions.
The input is the observation, followed by some number of layers, and then have one final linear layer that gives logits for each action, followed by a # Softmax to convert the logits into probabilities.