#rl
63 notes
- Guided Policy Search (GPS)
- $\pi_{0.6}$: A VLA That Learns From Experience
- Synchronous Advantage Actor-Critic (A2C)
- Categorical Policies
- Reinforcement Learning
- Reinforcement Learning for Humanoid Robots
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning
- Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance
- Vision-Language Models Provide Promptable Representations for Reinforcement Learning
- Notes on Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
- Efficient Online Reinforcement Learning with Offline Data
- Hindsight Experience Replay
- Notes on Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
- Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
- Solving Compositional Reinforcement Learning Problems via Task Reduction
- Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
- Transporter Networks: Rearranging the Visual World for Robotic Manipulation
- Multi-Task Learning with Sequence-Conditioned Transporter Networks
- Multi-Task Reinforcement Learning with Soft Modularization
- Modular Multitask Reinforcement Learning with Policy Sketches
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies
- Hierarchical Reinforcement Learning for Zero-Shot Generalization with Subtask Dependencies
- Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning
- Rapid Exploration for Open-World Navigation with Latent Goal Models
- Why Generalization in RL is Difficult
- Which Mutual Information Representation Learning Objectives are Sufficient for Control?
- DQN
- Deep Deterministic Policy Gradient
- Deterministic Policy Gradient
- Asynchronous Advantage Actor-Critic
- Gradient Temporal-Difference
- Off-Policy Actor-Critic
- Actor-Critic
- Reward Shaping
- REINFORCE
- Policy Gradient
- Trust Region Policy Optimization
- Proximal Policy Optimization
- Model-Free RL
- SOLAR
- Bootstrap Ensembles
- Guided Search Method V3
- Guided Search Method V2
- Guided Search Method V1
- Model-Free with Model
- Backpropagate Gradient
- Latent Model
- With Model Uncertainty
- Model-Based Method 1.5
- Model-Based Method 1.0
- Model-Based Method 0.5
- Collocation Method
- Shooting Method
- Linear Quadratic Regulator (LQR)
- Monte Carlo Tree Search (MCTS)
- Cross-Entropy Method (CEM)
- Random Shooting Methods
- Stochastic Closed-Loop Case
- Stochastic Open-Loop Case
- Deterministic Case
- Model-Based RL