Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-based Sampling with Differential Equations

Method overview
Offline training minimizes a diffusion loss together with a contraction loss, obtained by computing the score Jacobian across denoising steps and penalizing its largest eigenvalues. At deployment, the frozen policy generates actions from observations using the now-contractive ODE sampler.
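
To make the recipe concrete, below is a minimal PyTorch sketch of such a training objective: a denoising score-matching term plus a penalty on the largest eigenvalue of the score Jacobian, estimated with a few power-iteration steps on vector-Jacobian products. The network interface `score_net(a_noisy, t, obs)`, the noise schedule, and the weight `lambda_contract` are illustrative placeholders, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def cdp_training_loss(score_net, obs, actions, lambda_contract=0.1, power_iters=5):
    """Denoising score-matching loss plus a contraction penalty (illustrative sketch)."""
    B = actions.shape[0]
    t = torch.rand(B, 1)                              # diffusion time in [0, 1]
    sigma = 0.01 + 0.99 * t                           # toy noise schedule (illustrative)
    noise = torch.randn_like(actions)
    a_noisy = (actions + sigma * noise).requires_grad_(True)

    score_pred = score_net(a_noisy, t, obs)           # predicted score of the noisy action
    diffusion_loss = ((score_pred + noise / sigma) ** 2).mean()   # DSM target is -noise/sigma

    # Estimate the dominant eigenvalue of d(score_pred)/d(a_noisy) with power
    # iteration on vector-Jacobian products (the true score Jacobian is a
    # log-density Hessian, so we treat it as approximately symmetric).
    v = F.normalize(torch.randn_like(a_noisy), dim=-1)
    for _ in range(power_iters):
        Jv = torch.autograd.grad(score_pred, a_noisy, grad_outputs=v,
                                 retain_graph=True)[0]
        v = F.normalize(Jv, dim=-1)
    Jv = torch.autograd.grad(score_pred, a_noisy, grad_outputs=v,
                             create_graph=True)[0]
    top_eig = (v * Jv).sum(dim=-1)                    # Rayleigh-quotient estimate
    contraction_loss = torch.relu(top_eig).mean()     # penalize expansive directions

    return diffusion_loss + lambda_contract * contraction_loss
```

Only the dominant eigenvalue is estimated, so the extra cost per training step is a handful of additional backward passes, in line with the negligible computational overhead claimed above.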

Abstract

Diffusion policies have emerged as powerful generative models for offline policy learning, where their sampling process can be rigorously characterized by a score function guiding a stochastic differential equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs discretization errors, large data requirements, and inconsistencies in action generation. While not critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the sampling flows of diffusion SDEs. Contraction pulls nearby flows closer together to enhance robustness against solver and score errors while mitigating unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and negligible computational cost. Empirically, we evaluate CDPs for offline learning by conducting extensive experiments in simulation and the real world. Across benchmarks, CDPs often outperform base policies, with pronounced benefits under data scarcity.

Contractive Diffusion Policies

Contractive Diffusion Policies (CDPs) enhance robustness by pulling noisy action trajectories closer together, mitigating solver and score errors while stabilizing learning.
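
As a point of reference, the standard contraction condition from nonlinear systems theory (our framing here, not necessarily the exact criterion used in the paper) states that a deterministic flow $\dot{a} = f(a, t)$ is contracting whenever the symmetric part of its Jacobian is uniformly negative definite:

$$\tfrac{1}{2}\Big(\tfrac{\partial f}{\partial a} + \big(\tfrac{\partial f}{\partial a}\big)^{\top}\Big) \preceq -\beta I \quad \text{for some } \beta > 0,$$

in which case any two trajectories of the flow approach each other exponentially at rate at least $\beta$. CDPs shape the learned sampling dynamics toward this regime.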

Contractive Diffusion Policies concept (cone)
The figure illustrates how contractive diffusion sampling behaves in comparison to vanilla diffusion, highlighting its potential to mitigate integration and score-matching errors while also rendering the entire process less sensitive to the initial action seed.

Intuition

To take a closer look, we compare CDP against vanilla diffusion during training by sampling from the learned policy at each epoch. This illustrates how contraction reshapes the sampling process, concentrating actions near meaningful modes for improved accuracy.

Training intuition across epochs
Training CDP vs. vanilla diffusion on a 2D dataset. As training progresses, both methods produce increasingly accurate samples. However, CDP samples tend to concentrate near the mean of distinct action modes, effectively mitigating inaccuracies from both the solver and score matching.

The following figure shows how contraction affects the generated flows in a toy ODE. Comparing these flows to those in the first figure makes clear why contraction is useful in diffusion sampling.

Contraction effects on diffusion ODE flows in a 2D action space (fixed state)
ODE flows in a toy 2D action space with a fixed state, illustrating how contraction reshapes the sampling dynamics.
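
To reproduce this qualitative effect, the short sketch below integrates a toy 2D flow with and without an added contractive term (the dynamics are our own illustrative choice, not the actual diffusion ODE from the figure): adding a term that makes the symmetric part of the drift Jacobian negative definite pulls nearby action seeds together as they evolve.

```python
import numpy as np

def toy_drift(a):
    """A mildly expansive 2D vector field (arbitrary illustrative choice)."""
    return np.stack([-a[..., 1] + 0.5 * a[..., 0],
                      a[..., 0] + 0.5 * a[..., 1]], axis=-1)

def rollout(a0, contractive=False, gamma=2.0, dt=0.01, steps=300):
    """Euler-integrate a batch of trajectories from initial action seeds a0."""
    a = a0.copy()
    for _ in range(steps):
        da = toy_drift(a)
        if contractive:
            # Adding -gamma * a shifts the symmetric part of the drift Jacobian
            # to (0.5 - gamma) * I, negative definite for gamma > 0.5, so nearby
            # trajectories converge toward each other exponentially.
            da = da - gamma * a
        a = a + dt * da
    return a

seeds = np.random.randn(16, 2)                  # 16 different initial action seeds
print("plain spread:      ", rollout(seeds, contractive=False).std(axis=0))
print("contractive spread:", rollout(seeds, contractive=True).std(axis=0))
```

The contractive variant ends with a much smaller spread across seeds, mirroring how contractive sampling becomes less sensitive to the initial action seed.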

Rollouts

Video clips of real-world rollouts and simulated rollouts (to be added soon). CDP is benchmarked mainly in simulation, but we also deploy it on a real Franka robot arm to showcase its reliability in the real world.

Results

Simulation

Simulation results in low-data regime
Experiments on reduced datasets. When training on only 10% of the original dataset, CDP decisively outperforms the baselines. This improvement stems from contraction's ability to dampen score-matching errors, which are amplified in low-data regimes.

Real World

Real-world experiments on the Franka arm
Experiments with the physical Franka arm. We execute each policy 20 times per task and report the average success rate and execution time. CDP often outperforms standard DBC, particularly on the harder tasks: Slide and Peg.