Contractive Diffusion Policies

Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations

ICLR 2026 Conference Paper
¹Department of Electrical and Computer Engineering, McGill University · ²Center for Intelligent Machines, McGill University · ³Department of Computer Science, McGill University · ⁴Mila–Quebec AI Institute
Method overview
Methodology overview of Contractive Diffusion Policies. Offline training minimizes diffusion and contraction losses by computing the score Jacobian across denoising steps and penalizing its largest eigenvalues.
At deployment, the frozen policy uses the now-contractive ODE sampler to generate actions from observations.

Overview

Diffusion policies have emerged as powerful generative models for offline policy learning, where their sampling process can be rigorously characterized by a score function guiding a stochastic differential equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs discretization errors, large data requirements, and inconsistencies in action generation. While not critical in image generation, these inaccuracies compound and lead to failure in continuous control settings.

We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the sampling flows of diffusion SDEs. Contraction pulls nearby flows closer together, enhancing robustness against solver and score errors while mitigating unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and negligible computational cost. Empirically, we evaluate CDPs for offline learning through extensive experiments in simulation and the real world.

While much of today's robotics research is driven by ever larger models and empirical scaling laws, we show substantial and theoretically grounded improvements to the action diffusion process itself by encouraging contractive flows in diffusion sampling. Across various benchmarks, contractive policies yield:

higher average reward and success rates in offline learning settings, and
minimal computational overhead with efficient eigenvalue computation.

Contractive Diffusion Sampling

CDPs enhance robustness by pulling noisy action trajectories closer together through a contraction loss, which mitigates solver and score errors while stabilizing learning.
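As a rough illustration of how such a contraction loss can be computed, the sketch below uses a toy analytic score (a Gaussian, standing in for a learned score network) rather than the paper's actual architecture; the function names (`score`, `score_jacobian`, `contraction_penalty`) and the hinge-style penalty on the largest eigenvalue of the Jacobian's symmetric part are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def score(x, mu=np.zeros(2), sigma2=1.0):
    """Toy stand-in for a learned score network: score of N(mu, sigma2 * I)."""
    return -(x - mu) / sigma2

def score_jacobian(x, eps=1e-5):
    """Central-difference Jacobian of the score at x (shape: dim x dim)."""
    dim = x.shape[0]
    J = np.zeros((dim, dim))
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = eps
        J[:, i] = (score(x + e) - score(x - e)) / (2.0 * eps)
    return J

def largest_eig_sym(J):
    """Largest eigenvalue of the symmetric part of J.

    Contraction of a flow field is governed by the symmetric part of its
    Jacobian: if its largest eigenvalue is negative, nearby trajectories
    are pulled together.
    """
    S = 0.5 * (J + J.T)
    return float(np.linalg.eigvalsh(S)[-1])  # ascending order -> take last

def contraction_penalty(x, margin=0.0):
    """Hinge penalty: nonzero only where the field is not contractive enough."""
    lam = largest_eig_sym(score_jacobian(x))
    return max(0.0, lam + margin)
```

For the Gaussian toy score the Jacobian is simply -I, so the penalty vanishes everywhere; with a learned network the penalty would be averaged over sampled denoising steps and added to the standard diffusion loss.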

Contractive Diffusion Policies concept (cone)
The figure illustrates how contractive diffusion sampling behaves in comparison to vanilla diffusion, highlighting its potential to mitigate integration and score-matching errors while also rendering the entire process less sensitive to the initial action seed.

Contraction with a Toy Example

To take a closer look, we compare CDP against vanilla diffusion during training by sampling from the learned policy at each epoch. This illustrates how contraction reshapes the sampling process, concentrating actions near meaningful modes for improved accuracy.

Training intuition across epochs
CDP vs. vanilla diffusion on a 2D dataset. As training progresses, both methods produce increasingly accurate samples. However, CDP samples tend to concentrate near the mean of distinct action modes, effectively mitigating inaccuracies from both the solver and score matching.

Prior work shows that contraction in diffusion sampling reduces score-matching and discretization errors; CDP builds on this by promoting contraction along reverse ODE trajectories.
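The effect on the toy 2D flows can be sketched as follows. This is a minimal, assumed illustration (not the paper's sampler): a simplified reverse ODE `dx/dt = score(x)` is integrated with Euler steps under an analytically contractive Gaussian score, so two nearby action seeds converge toward the same mode; `ode_flow` and the fixed mode `mu` are hypothetical names for this sketch.

```python
import numpy as np

def score(x, mu=np.array([1.0, -1.0]), sigma2=1.0):
    """Toy Gaussian score field; contractive because its Jacobian is -I."""
    return -(x - mu) / sigma2

def ode_flow(x0, steps=100, dt=0.05):
    """Euler integration of a simplified reverse ODE dx/dt = score(x)."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * score(x)
    return x

# Two nearby initial action seeds: a contractive field pulls them together,
# so the final actions are far less sensitive to the seed than the inputs.
a = ode_flow(np.array([2.0, 2.0]))
b = ode_flow(np.array([2.5, 1.5]))
```

Under this linear field the gap between the two trajectories shrinks by a factor of (1 - dt)^steps, mirroring how contraction damps both seed sensitivity and accumulated solver error in the figure above.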

Contraction effects on diffusion ODE flows in a 2D action space (fixed state)
ODE flows for a toy 2D action space with a fixed state, showing how contraction affects the dynamics of diffusion ODE flows.

Real-World Rollouts

CDP is benchmarked extensively on the D4RL and Robomimic benchmarks in simulation, but we also deploy it on a real Franka robot arm to showcase its reliability in the real world. To make learning more challenging, we do not use a wrist view; only agent-view observations are available to the model. This makes consistency of action generation even more critical, especially in the Slide and Peg tasks.

Highlights of CDP Results

Simulation

Simulation results in low-data regime
Experiments on reduced datasets. When trained on only 10% of the original dataset, CDP decisively outperforms the baselines. This improvement stems from the ability of contraction to dampen score-matching errors, which are amplified in low-data regimes.

Real World

Real-world experiments on the Franka arm
Experiments with the physical Franka arm. We execute each policy 20 times per task and report the average success rate and execution time. CDP often outperforms standard diffusion behavior cloning (DBC), particularly on the harder tasks: Slide and Peg.

BibTeX

@inproceedings{abyaneh2026contractive,
  title={Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations},
  author={Amin Abyaneh and Charlotte Morissette and Mohamad H. Danesh and Anas Houssaini and David Meger and Gregory Dudek and Hsiu-Chin Lin},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=iKJbmx1iuQ}
}