TL;DR: Residual RL is a powerful strategy for adapting a pretrained base policy to new environments, but standard Residual RL struggles with uncontrolled exploration and stochastic base policies. We propose two improvements to Residual RL: we leverage uncertainty estimation to contain exploration, and we introduce an asymmetric actor-critic algorithm for off-policy learning.
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other Residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
Our first idea is to control exploration around the base policy using uncertainty estimation:
✅ Let the base policy act autonomously when it is confident
⚡ Activate the residual policy only when needed
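The gating above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `uncertainty` stands in for whichever scalar estimate is used (e.g. ensemble variance or distance-to-data), and `threshold` is a hypothetical tuning parameter.

```python
import numpy as np

def gated_action(base_action, residual_action, uncertainty, threshold=0.1):
    """Apply the residual correction only where the base policy is uncertain.

    When the uncertainty estimate is below `threshold`, the base policy
    acts autonomously; otherwise the residual is added and the result is
    clipped to the action bounds (here assumed to be [-1, 1]).
    """
    if uncertainty > threshold:
        return np.clip(base_action + residual_action, -1.0, 1.0)
    return base_action
```

Gating this way keeps exploration local to states the base policy handles poorly, instead of perturbing every action.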
Our second idea is to adapt the Residual RL algorithm to stochastic base policies using an asymmetric actor-critic approach: we learn the critic on the combined action (i.e., base action + residual action) and the actor on only the residual action, while letting the actor observe the sampled base action.
Our findings reveal that this asymmetric architecture yields significant performance improvements for stochastic base policies, while both architectures perform comparably for deterministic base policies.
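The asymmetry can be made concrete with a toy sketch. The linear maps below are hypothetical stand-ins for neural networks; the point is only the input/output structure: the residual actor conditions on the sampled base action, and the critic scores the combined action that is actually executed.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, act_dim = 4, 2

# Hypothetical linear parameterisations standing in for actor/critic networks.
W_actor = rng.normal(size=(act_dim, state_dim + act_dim)) * 0.01
W_critic = rng.normal(size=(1, state_dim + act_dim)) * 0.01

def residual_actor(state, base_action):
    # Asymmetric input: the residual actor sees the sampled base action,
    # so it can correct each draw from a stochastic base policy.
    return W_actor @ np.concatenate([state, base_action])

def critic(state, combined_action):
    # The critic evaluates the combined action sent to the environment.
    return float(W_critic @ np.concatenate([state, combined_action]))

state = rng.normal(size=state_dim)
base_action = rng.normal(size=act_dim)  # sample from the stochastic base policy
combined = base_action + residual_actor(state, base_action)
q_value = critic(state, combined)
```

Because the critic is defined over the combined action, the standard off-policy Bellman backup applies unchanged, while the actor only ever outputs the (typically small) residual.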
We conduct experiments with two kinds of uncertainty metrics (distance-to-data and ensemble variance) and with both Gaussian-mixture-model (GMM) and diffusion-based base policies across multiple tasks and environments. Our method consistently outperforms direct finetuning methods and other Residual RL methods.
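The two uncertainty metrics admit simple proxies. The functions below are illustrative sketches under our own simplifying assumptions (Euclidean nearest-neighbour distance, variance averaged over action dimensions), not the exact estimators from the paper.

```python
import numpy as np

def distance_to_data(state, dataset_states):
    """Nearest-neighbour distance from the current state to the training
    dataset: far-from-data states are treated as uncertain."""
    return float(np.min(np.linalg.norm(dataset_states - state, axis=1)))

def ensemble_variance(actions):
    """Disagreement across actions proposed by an ensemble of base
    policies for the same state, averaged over action dimensions."""
    return float(np.mean(np.var(actions, axis=0)))
```

Either scalar can drive the residual-activation gate described above; distance-to-data needs no extra training, while ensemble variance requires training several base policies.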
Diffusion base policy
GMM base policy
Sim-to-Real Transfer
We also deploy the learned policies in the real world via zero-shot sim-to-real transfer.
@ARTICLE{11267054,
author={Dodeja, Lakshita and Schmeckpeper, Karl and Vats, Shivam and Weng, Thomas and Jia, Mingxi and Konidaris, George and Tellex, Stefanie},
journal={IEEE Robotics and Automation Letters},
title={Accelerating Residual Reinforcement Learning With Uncertainty Estimation},
year={2026},
volume={11},
number={1},
pages={970-977},
keywords={Uncertainty;Stochastic processes;Reinforcement learning;Imitation learning;Training;Robustness;Tuning;Transforms;Training data;Robot control;Reinforcement learning (RL);deep learning methods;machine learning for robot control},
doi={10.1109/LRA.2025.3636808}}