Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Dynamic Optimal Transport

Optimal transport becomes especially powerful once distances between measures are seen as actions of moving mass. This chapter first develops the dynamic language: continuity equations describe admissible measure evolutions, while the Benamou--Brenier formula identifies W2\Wass_2 with a least-action principle. These ideas prepare the gradient-flow and generative-model chapters that follow.

Evolutions Over the Space of Measures

We start with the continuity equation because it is the common language for particles, densities and weak measure evolutions. It also makes precise which velocity fields actually move mass.

Lagrangian and Eulerian Descriptions

Consider an evolution tαtP(Rd)t\mapsto\alpha_t\in\mathcal P(\RR^d). It can be described in a Lagrangian way as the advection of particles along a time-dependent vector field vt(x)v_t(x):

dx(t)dt=vt(x(t)).\frac{\d x(t)}{\d t}=v_t(x(t)).

Writing TtT_t for the associated flow map, so that Tt(x(0))=x(t)T_t(x(0))=x(t), the advected measure is

αt=(Tt)α0.\alpha_t=(T_t)_\sharp\alpha_0.

For empirical measures, αt=n1i=1nδxi(t)\alpha_t=n^{-1}\sum_{i=1}^n\delta_{x_i(t)}, each particle solves (1).

In the Eulerian description, the same motion is written directly on the evolving measure:

αtt+div(vtαt)=0.\frac{\partial\alpha_t}{\partial t} +\operatorname{div}(v_t\alpha_t)=0.

This PDE is often called the advection equation, the continuity equation, or Liouville’s equation when it acts on phase space. It is a classical PDE only when αt\alpha_t has a smooth density. For general measures, and in particular for empirical measures, it is understood weakly: for any smooth test function (t,x)φ(t,x)(t,x)\mapsto\varphi(t,x) compactly supported in time,

01 ⁣Rd(tφ(t,x)+vt(x),xφ(t,x))dαt(x)dt=0.\int_0^1\!\int_{\RR^d} \left( \partial_t\varphi(t,x) +\dotp{v_t(x)}{\nabla_x\varphi(t,x)} \right) \d\alpha_t(x)\d t =0.

This weak equation is obtained from (3) by integration by parts. For smooth positive densities, the classical and weak formulations are equivalent; for particle clouds, the weak form remains meaningful.

Proof

Let φ(t,x)\varphi(t,x) be a smooth test function vanishing at t=0t=0 and t=1t=1. Since αt=(Tt)α0\alpha_t=(T_t)_\sharp\alpha_0,

ddtφ(t,x)dαt(x)=ddtφ(t,Tt(y))dα0(y).\frac{\d}{\d t}\int \varphi(t,x)\d\alpha_t(x) = \frac{\d}{\d t}\int \varphi(t,T_t(y))\d\alpha_0(y).

The chain rule gives

ddtφ(t,Tt(y))dα0(y)=(tφ(t,Tt(y))+xφ(t,Tt(y)),tTt(y))dα0(y).\frac{\d}{\d t}\int \varphi(t,T_t(y))\d\alpha_0(y) = \int \left( \partial_t\varphi(t,T_t(y)) +\dotp{\nabla_x\varphi(t,T_t(y))}{\partial_t T_t(y)} \right) \d\alpha_0(y).

Using the definition of vtv_t and the push-forward relation, this is

(tφ(t,x)+xφ(t,x),vt(x))dαt(x).\int \left( \partial_t\varphi(t,x)+\dotp{\nabla_x\varphi(t,x)}{v_t(x)} \right) \d\alpha_t(x).

Integrating in time and using the boundary values of φ\varphi gives (4).

From Measure Evolutions to Vector Fields

For a given evolution (αt)t(\alpha_t)_t, there are typically infinitely many velocity fields vtv_t satisfying

tαt+div(αtvt)=0.\partial_t\alpha_t+\operatorname{div}(\alpha_t v_t)=0.

This non-uniqueness comes from the kernel of the weighted divergence. The linear space of vector fields that leave a measure α\alpha invariant is

Hα={v:div(αv)=0}.\mathcal H_\alpha = \{v:\operatorname{div}(\alpha v)=0\}.

It is usually non-trivial: if α\alpha is an isotropic Gaussian, Hα\mathcal H_\alpha contains rotational vector fields generated by anti-symmetric matrices.

Dacorogna--Moser Inversion

Reconstructing particles from an observed density evolution is therefore ill-posed. A simple choice, introduced by Dacorogna and Moser Dacorogna & Moser, 1990, imposes that the flux αtvt\alpha_t v_t is a gradient field. Formally,

vt=1αtΔ1(tαt),v_t = -\frac{1}{\alpha_t} \nabla\Delta^{-1}(\partial_t\alpha_t),

with suitable boundary conditions, for instance vanishing at infinity. This formula is useful conceptually but delicate when αt\alpha_t vanishes, and it does not generally produce a gradient velocity field.

Least-Square Inversion and Gradient Structure

A more robust choice, used implicitly in flow matching, optimal transport and Wasserstein gradient flows, is to select among all admissible velocities the one with smallest kinetic energy:

minv1201 ⁣Rdvt(x)2dαt(x)dtsubject totαt+div(αtvt)=0.\min_v \frac12\int_0^1\!\int_{\RR^d}\norm{v_t(x)}^2\,\d\alpha_t(x)\d t \quad \text{subject to} \quad \partial_t\alpha_t+\operatorname{div}(\alpha_t v_t)=0.
Proof

Introduce a Lagrange multiplier ϕt\phi_t for the continuity equation. The constrained problem has the formal saddle formulation

minvmaxϕ01[12Rdvt(x)2dαt(x)+Rdϕt(x)(div(αtvt)(x)+tαt(x))dx]dt.\min_v\max_\phi \int_0^1 \left[ \frac12\int_{\RR^d}\norm{v_t(x)}^2\,\d\alpha_t(x) + \int_{\RR^d}\phi_t(x) \left(\operatorname{div}(\alpha_t v_t)(x)+\partial_t\alpha_t(x)\right) \d x \right]\d t.

Integrating by parts in the divergence term gives, for each tt,

(12vt2ϕt,vt)dαt+ϕttαt.\int \left( \frac12\norm{v_t}^2-\dotp{\nabla\phi_t}{v_t} \right) \d\alpha_t + \int\phi_t\,\partial_t\alpha_t.

The pointwise minimizer in vtv_t is therefore vt=ϕtv_t=\nabla\phi_t. Substituting this into tρt+div(ρtvt)=0\partial_t\rho_t+\operatorname{div}(\rho_t v_t)=0 gives the weighted Poisson equation in (14). The inverse notation is a shorthand for solving this equation on zero-mean right-hand sides, modulo additive constants.

In general this inversion is still computationally demanding, but special choices of (αt)t(\alpha_t)_t lead to simpler formulas; this is the mechanism exploited later by flow matching.

Benamou--Brenier Dynamic Formulation of OT

The dynamic formulation identifies W2\Wass_2 with the kinetic energy of the cheapest continuity-equation path. It is the point where OT becomes a least-action principle.

Instead of assuming that a whole curve (αt)t[0,1](\alpha_t)_{t\in[0,1]} is prescribed, one fixes only the endpoints α0\alpha_0 and α1\alpha_1 and minimizes the least-square energy (12). The theorem of Benamou and Brenier states that this geodesic energy is exactly the squared Wasserstein distance Benamou & Brenier, 2000.

Proof

For the inequality “dynamic \leq static”, assume first that a Monge map TT exists and define (αt,vt)(\alpha_t,v_t) by (18). Since the Lagrangian velocity T(x)xT(x)-x is independent of tt,

01 ⁣vt2dαtdt=T(x)x2dα0(x),\int_0^1\!\int\norm{v_t}^2\,\d\alpha_t\d t = \int\norm{T(x)-x}^2\,\d\alpha_0(x),

so the dynamic cost is no larger than the static Monge cost. Without a Monge map, the same construction uses an optimal coupling π\pi: sample (X,Y)π(X,Y)\sim\pi and move along the straight path γX,Y(t)=(1t)X+tY\gamma_{X,Y}(t)=(1-t)X+tY. This path measure has action xy2dπ(x,y)\int\norm{x-y}^2\d\pi(x,y); projecting path velocities onto their conditional mean at time tt gives an admissible Eulerian velocity with no larger action, so the dynamic value is no larger than the Kantorovich value.

Conversely, for a smooth deterministic path, take the flow TtT_t defined by T˙t=vtTt\dot T_t=v_t\circ T_t and T0=IdT_0=\Id. Then αt=(Tt)α0\alpha_t=(T_t)_\sharp\alpha_0 and (T1)α0=α1(T_1)_\sharp\alpha_0=\alpha_1. Jensen’s inequality gives

T1(x)x201vt(Tt(x))2dt.\norm{T_1(x)-x}^2 \leq \int_0^1\norm{v_t(T_t(x))}^2\d t.

After integration with respect to α0\alpha_0, the Monge cost is bounded above by the dynamic action. For general finite-energy solutions of the continuity equation, the superposition principle lifts the curve to a probability measure on absolutely continuous paths; applying Jensen’s inequality pathwise gives a coupling of the endpoints whose quadratic cost is no larger than the action. Thus the Kantorovich value is bounded above by the dynamic value.

Although (17) is not jointly convex in (αt,vt)(\alpha_t,v_t), it becomes convex after replacing velocities by the momentum measure mt=vtαtm_t=v_t\alpha_t and using the perspective action. In the absolutely continuous case αt=ρtdx\alpha_t=\rho_t\,\d x and mt(x)=ρt(x)vt(x)m_t(x)=\rho_t(x)v_t(x),

W22(α0,α1)=inftρt+divmt=0ρt=0dx=α0, ρt=1dx=α101 ⁣Rdmt(x)2ρt(x)dxdt,\Wass_2^2(\alpha_0,\alpha_1) = \inf_{\substack{\partial_t\rho_t+\operatorname{div}m_t=0\\ \rho_{t=0}\d x=\alpha_0,\ \rho_{t=1}\d x=\alpha_1}} \int_0^1\!\int_{\RR^d} \frac{\norm{m_t(x)}^2}{\rho_t(x)} \d x\,\d t,

with the usual convention that the integrand is 0 when (ρt,mt)=(0,0)(\rho_t,m_t)=(0,0) and ++\infty when ρt=0\rho_t=0 but mt0m_t\neq0. For singular endpoints or curves, the same statement is interpreted with vector-valued momentum measures and the corresponding recession convention. This convex reformulation enables geodesic interpolation by convex optimization once the domain is discretized.

<IPython.core.display.Image object>

Benamou--Brenier geodesic between two sampled silhouettes. A discrete quadratic OT plan between finely subsampled cat and two-disks point clouds induces the McCann interpolation Zt=(1t)X+tYZ_t=(1-t)X+tY, which is the Lagrangian realization of the least-action solution. The left panel renders local color images of the smaller-bandwidth kernel-smoothed densities with enough padding to include the full silhouettes. The right panel overlays shortened velocity arrows centered at evenly subsampled midpoint particles Z1/2Z_{1/2}; each displayed arrow runs in data coordinates from a source-side tail to a target-side head along the matched characteristic direction YXY-X, but is not drawn as the full endpoint segment from XX to YY.

The interactive demo keeps the same Lagrangian picture: particles are matched once, then move along straight characteristics. The time and velocity scale controls separate the path αt\alpha_t from the underlying displacement field.

Interactive panel. Use the time and velocity-scale controls to follow the Benamou-Brenier geodesic as a moving density with an Eulerian velocity field.

Extensions of the Dynamic Formulation

The same variational grammar extends beyond the quadratic Wasserstein distance. One changes either the kinetic exponent, the mobility or the balance equation, while keeping a continuity-type constraint and a convex perspective action.

Dynamic Unbalanced OT

Unbalanced dynamic transport is obtained by allowing mass to be created and destroyed along the path. The continuity equation is replaced by a balance equation, and the action penalizes both spatial motion and growth. This dynamic formulation underlies the Hellinger--Kantorovich and Wasserstein--Fisher--Rao metrics Liero et al., 2016Chizat et al., 2018; its equivalence with static entropy-transport and cone formulations is developed in Liero et al., 2018Chizat et al., 2018.

A representative quadratic action is

tρt+ ⁣mt=st,01 ⁣(mt2ρt+κ2st2ρt)dxdt,\partial_t\rho_t+\nabla\!\cdot m_t=s_t, \qquad \int_0^1\!\int \left( \frac{|m_t|^2}{\rho_t} +\kappa^2\frac{s_t^2}{\rho_t} \right)\d x\,\d t,

with the same perspective convention as above. Equivalently, writing mt=ρtvtm_t=\rho_t v_t and st=ρtgts_t=\rho_t g_t, one minimizes 01(vt2+κ2gt2)dρtdt\int_0^1\int(|v_t|^2+\kappa^2g_t^2)\d\rho_t\d t under

tρt+ ⁣(ρtvt)=gtρt.\partial_t\rho_t+\nabla\!\cdot(\rho_t v_t)=g_t\rho_t.

The parameter κ\kappa fixes the relative cost of reaction and transport: changing it rescales the radial/angular balance in the associated cone metric. For measure-valued triples, the action is understood in the lower-semicontinuous perspective sense

Aκ(ρ,m,s):=(m˙2ρ˙+κ2s˙2ρ˙)dλ,(ρ˙,m˙,s˙)=(dρdλ,dmdλ,dsdλ),\mathcal A_\kappa(\rho,m,s) \eqdef \int \left( \frac{\norm{\dot m}^2}{\dot\rho} + \kappa^2\frac{\dot s^2}{\dot\rho} \right)\d\lambda, \qquad (\dot\rho,\dot m,\dot s) = \left( \frac{\d\rho}{\d\lambda}, \frac{\d m}{\d\lambda}, \frac{\d s}{\d\lambda} \right),

where λ\lambda dominates ρ\rho and the total variations of mm and ss. The value is independent of this choice. The convention is 0/0=00/0=0 and a/0=+a/0=+\infty for a>0a>0, so finite action forces both the flux and the source to be absolutely continuous with respect to the transported mass.

Proof

The cone construction turns variation of mass into radial motion and spatial transport into angular motion on C[Rd]\mathfrak C[\RR^d]. Applying the Benamou--Brenier theorem on the cone to the lifted endpoint measures gives a dynamic least-action problem on C[Rd]\mathfrak C[\RR^d] whose static value is the cone value. This is the standard static/dynamic identification for the Hellinger--Kantorovich and Wasserstein--Fisher--Rao metrics Liero et al., 2016Liero et al., 2018Chizat et al., 2018Chizat et al., 2018.

Projecting a cone curve back to the base space with weight r2r^2 produces a measure curve ρt\rho_t, a spatial flux mtm_t and a source term sts_t satisfying the balance equation. With the matching normalization of the cone metric, the cone kinetic energy decomposes exactly into the perspective action Aκ\mathcal A_\kappa in (30). Conversely, any finite-action triple (ρt,mt,st)(\rho_t,m_t,s_t) can be lifted to a cone curve whose radial velocity realizes the growth term and whose spatial velocity realizes the transport term, with the same action after relaxation. The two infima are therefore equal; lower semicontinuity gives the general finite-measure statement from the smooth positive case.

<IPython.core.display.Image object>

Balanced and unbalanced Sinkhorn-barycenter interpolations between two one-dimensional Gaussian mixtures with swapped modal masses. The balanced row conserves total mass, so excess mass from the dominant left mode must move along the line toward the dominant right target mode, producing transient mass in the middle. The unbalanced row uses KL-relaxed marginal constraints; mass can be attenuated near overrepresented modes and recreated near underrepresented modes, giving a reaction--transport interpolation closer to the Wasserstein--Fisher--Rao intuition.

The interactive demo below exposes this balance directly. A high reaction weight keeps more mass local by fading and recreating modes, while the balanced path must carry mass through space.

Interactive panel. Use the growth and time controls to compare motion with source terms in dynamic unbalanced transport.

References
  1. Dacorogna, B., & Moser, J. (1990). On a Partial Differential Equation Involving the Jacobian Determinant. Annales de l’Institut Henri Poincaré C, Analyse Non Linéaire, 7(1), 1–26.
  2. Benamou, J.-D., & Brenier, Y. (2000). A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3), 375–393.
  3. Dolbeault, J., Nazaret, B., & Savaré, G. (2009). A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2), 193–231.
  4. Maas, J. (2011). Gradient flows of the entropy for finite Markov chains. Journal of Functional Analysis, 261(8), 2250–2292.
  5. Mielke, A. (2013). Geodesic convexity of the relative entropy in reversible Markov chains. Calculus of Variations and Partial Differential Equations, 48(1–2), 1–31.
  6. Liero, M., Mielke, A., & Savaré, G. (2016). Optimal transport in competition with reaction: the Hellinger–Kantorovich distance and geodesic curves. SIAM Journal on Mathematical Analysis, 48(4), 2869–2911.
  7. Chizat, L., Schmitzer, B., Peyré, G., & Vialard, F.-X. (2018). An interpolating distance between optimal transport and Fisher–Rao metrics. Foundations of Computational Mathematics, 18(1), 1–44.
  8. Liero, M., Mielke, A., & Savaré, G. (2018). Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures. Inventiones Mathematicae, 211(3), 969–1117.
  9. Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F.-X. (2018). Unbalanced optimal transport: dynamic and Kantorovich formulation. Journal of Functional Analysis, 274(11), 3090–3123.