Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-based Sampling with Differential Equations

Method overview
Offline training minimizes a diffusion loss together with a contraction loss, obtained by computing the score Jacobian across denoising steps and penalizing its largest eigenvalues. At deployment, the frozen policy generates actions from observations using the now-contractive ODE sampler.
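
To make the recipe concrete, below is a minimal PyTorch sketch of such a training objective: a denoising score-matching term plus a penalty on the largest eigenvalue of the score Jacobian, estimated with a few power-iteration steps on vector-Jacobian products. The network interface `score_net(a_noisy, t, obs)`, the noise schedule, and the weight `lambda_contract` are illustrative placeholders, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def cdp_training_loss(score_net, obs, actions, lambda_contract=0.1, power_iters=5):
    """Denoising score-matching loss plus a contraction penalty (illustrative sketch)."""
    B = actions.shape[0]
    t = torch.rand(B, 1)                              # diffusion time in [0, 1]
    sigma = 0.01 + 0.99 * t                           # toy noise schedule (illustrative)
    noise = torch.randn_like(actions)
    a_noisy = (actions + sigma * noise).requires_grad_(True)

    score_pred = score_net(a_noisy, t, obs)           # predicted score of the noisy action
    diffusion_loss = ((score_pred + noise / sigma) ** 2).mean()   # DSM target is -noise/sigma

    # Estimate the dominant eigenvalue of d(score_pred)/d(a_noisy) with power
    # iteration on vector-Jacobian products (the true score Jacobian is a
    # log-density Hessian, so we treat it as approximately symmetric).
    v = F.normalize(torch.randn_like(a_noisy), dim=-1)
    for _ in range(power_iters):
        Jv = torch.autograd.grad(score_pred, a_noisy, grad_outputs=v,
                                 retain_graph=True)[0]
        v = F.normalize(Jv, dim=-1)
    Jv = torch.autograd.grad(score_pred, a_noisy, grad_outputs=v,
                             create_graph=True)[0]
    top_eig = (v * Jv).sum(dim=-1)                    # Rayleigh-quotient estimate
    contraction_loss = torch.relu(top_eig).mean()     # penalize expansive directions

    return diffusion_loss + lambda_contract * contraction_loss
```

Only the dominant eigenvalue is estimated, so the extra cost per training step is a handful of additional backward passes, in line with the negligible computational overhead claimed above.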

Abstract

Diffusion policies have emerged as powerful generative models for offline policy learning, where their sampling process can be rigorously characterized by a score function guiding a stochastic differential equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs discretization errors, large data requirements, and inconsistencies in action generation. While not critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the sampling flows of diffusion SDEs. Contraction pulls nearby flows closer together to enhance robustness against solver and score errors while mitigating unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and negligible computational cost. Empirically, we evaluate CDPs for offline learning by conducting extensive experiments in simulation and the real world. Across benchmarks, CDPs often outperform base policies, with pronounced benefits under data scarcity.

Contractive Diffusion Policies

Contractive Diffusion Policies (CDPs) enhance robustness by pulling noisy action trajectories closer together, mitigating solver and score errors while stabilizing learning.
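
As a point of reference, the standard contraction condition from nonlinear systems theory (our framing here, not necessarily the exact criterion used in the paper) states that a deterministic flow $\dot{a} = f(a, t)$ is contracting whenever the symmetric part of its Jacobian is uniformly negative definite:

$$\tfrac{1}{2}\Big(\tfrac{\partial f}{\partial a} + \big(\tfrac{\partial f}{\partial a}\big)^{\top}\Big) \preceq -\beta I \quad \text{for some } \beta > 0,$$

in which case any two trajectories of the flow approach each other exponentially at rate at least $\beta$. CDPs shape the learned sampling dynamics toward this regime.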

Contractive Diffusion Policies concept (cone)
The figure illustrates how contractive diffusion sampling behaves in comparison to vanilla diffusion, highlighting its potential to mitigate integration and score-matching errors while also rendering the entire process less sensitive to the initial action seed.

Intuition

To take a closer look, we compare CDP against vanilla diffusion during training by sampling from the learned policy at each epoch. This illustrates how contraction reshapes the sampling process, concentrating actions near meaningful modes for improved accuracy.

Training intuition across epochs
Training CDP vs. vanilla diffusion on a 2D dataset. As training progresses, both methods produce increasingly accurate samples. However, CDP samples tend to concentrate near the mean of distinct action modes, effectively mitigating inaccuracies from both the solver and score matching.

The following figure shows how contraction affects the generated flows in a toy ODE. Comparing these flows to those in the first figure makes clear why contraction is useful in diffusion sampling.

Contraction effects on diffusion ODE flows in a 2D action space (fixed state)
ODE flows in a toy 2D action space with a fixed state, illustrating how contraction reshapes the sampling dynamics.
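
To reproduce this qualitative effect, the short sketch below integrates a toy 2D flow with and without an added contractive term (the dynamics are our own illustrative choice, not the actual diffusion ODE from the figure): adding a term that makes the symmetric part of the drift Jacobian negative definite pulls nearby action seeds together as they evolve.

```python
import numpy as np

def toy_drift(a):
    """A mildly expansive 2D vector field (arbitrary illustrative choice)."""
    return np.stack([-a[..., 1] + 0.5 * a[..., 0],
                      a[..., 0] + 0.5 * a[..., 1]], axis=-1)

def rollout(a0, contractive=False, gamma=2.0, dt=0.01, steps=300):
    """Euler-integrate a batch of trajectories from initial action seeds a0."""
    a = a0.copy()
    for _ in range(steps):
        da = toy_drift(a)
        if contractive:
            # Adding -gamma * a shifts the symmetric part of the drift Jacobian
            # to (0.5 - gamma) * I, negative definite for gamma > 0.5, so nearby
            # trajectories converge toward each other exponentially.
            da = da - gamma * a
        a = a + dt * da
    return a

seeds = np.random.randn(16, 2)                  # 16 different initial action seeds
print("plain spread:      ", rollout(seeds, contractive=False).std(axis=0))
print("contractive spread:", rollout(seeds, contractive=True).std(axis=0))
```

The contractive variant ends with a much smaller spread across seeds, mirroring how contractive sampling becomes less sensitive to the initial action seed.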

Rollouts

Video clips of real-world rollouts and simulated rollouts (to be added soon). CDP is benchmarked mainly in simulation, but we also deploy it on a real Franka robot arm to showcase its reliability in the real world.

Results

Simulation

Simulation results in low-data regime
Experiments on reduced datasets. When training on only 10% of the original dataset, CDP decisively outperforms the baselines. This improvement stems from contraction's ability to dampen score-matching errors, which are amplified in low-data regimes.

Real World

Real-world experiments on the Franka arm
Experiments with the physical Franka arm. We execute each policy 20 times per task and report the average success rate and execution time. CDP often outperforms standard DBC, particularly on the harder tasks: Slide and Peg.