Optimal transport becomes especially powerful once distances between measures
are seen as actions of moving mass. This chapter first develops the dynamic
language: continuity equations describe admissible measure evolutions, while
the Benamou--Brenier formula identifies W2 with a least-action
principle. These ideas prepare the gradient-flow and generative-model
chapters that follow.
from pathlib import Path
import sys

from IPython.display import Image as DisplayImage
from IPython.display import display

# Locate the directory that contains ot4ml_web.py and add it to the import path.
here = Path.cwd()
myst_dir = None
for candidate in [here, here.parent, here / "myst", here.parent / "myst", here.parent.parent / "myst"]:
    if (candidate / "ot4ml_web.py").exists():
        myst_dir = candidate.resolve()
        sys.path.insert(0, str(myst_dir))
        break
if myst_dir is None:
    raise RuntimeError("Could not locate myst/ot4ml_web.py")

repo_root = myst_dir.parent
thumbnails = repo_root / "notebooks-figures" / "thumbnails"


def show_book_figure(name, width=760):
    # Display the pre-rendered thumbnail <name>.png from notebooks-figures/thumbnails.
    display(DisplayImage(filename=str(thumbnails / f"{name}.png"), width=width))
We start with the continuity equation because it is the common language for
particles, densities and weak measure evolutions. It also makes precise which
velocity fields actually move mass.
The PDE ∂tαt+div(αtvt)=0 is often called the advection equation, the
continuity equation, or Liouville’s equation when it acts on phase space. It
is a classical PDE only when αt has a smooth density. For general measures,
and in particular for empirical measures, it is understood weakly: for any
smooth test function (t,x)↦φ(t,x) compactly supported in time,

$$
\int_0^1\!\!\int \big(\partial_t \varphi(t,x) + \langle \nabla_x \varphi(t,x), v_t(x)\rangle\big)\,\mathrm{d}\alpha_t(x)\,\mathrm{d}t = 0.
$$
This weak equation is obtained from (3) by integration
by parts. For smooth positive densities, the classical and weak formulations
are equivalent; for particle clouds, the weak form remains meaningful.
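A quick numerical check makes the weak formulation concrete for a particle cloud. The sketch below assumes a hypothetical 1D cloud in which each particle moves with its own constant velocity, and a hand-picked test function that vanishes at t=0 and t=1; none of these choices come from the text. It evaluates the time integral of the empirical average of ∂tφ+⟨∇φ,vt⟩ with the trapezoid rule, which should vanish up to quadrature error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)   # initial particle positions (hypothetical data)
u = rng.normal(size=n)    # one constant velocity per particle (hypothetical data)


def dphi_dt(t, x):
    # Time derivative of the test function phi(t, x) = sin(pi t)^2 * cos(x).
    return np.pi * np.sin(2 * np.pi * t) * np.cos(x)


def dphi_dx(t, x):
    # Space derivative of the same test function.
    return -np.sin(np.pi * t) ** 2 * np.sin(x)


ts = np.linspace(0.0, 1.0, 2001)
integrand = np.empty_like(ts)
for k, t in enumerate(ts):
    xt = x0 + t * u  # particle positions at time t
    # Empirical average of (d_t phi + <grad phi, v_t>) against alpha_t.
    integrand[k] = np.mean(dphi_dt(t, xt) + dphi_dx(t, xt) * u)

# Trapezoid rule in time.
dt = ts[1] - ts[0]
residual = dt * (integrand[0] / 2 + integrand[1:-1].sum() + integrand[-1] / 2)
print(f"weak-form residual: {residual:.2e}")
```

The residual is not exactly zero only because of the time quadrature. No density is ever formed, which is the sense in which the weak form remains meaningful for empirical measures.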
Proof
Let φ(t,x) be a smooth test function vanishing at t=0 and t=1.
Since αt=(Tt)♯α0,

$$
\frac{\mathrm{d}}{\mathrm{d}t}\int \varphi(t,x)\,\mathrm{d}\alpha_t(x)
= \frac{\mathrm{d}}{\mathrm{d}t}\int \varphi(t,T_t(x))\,\mathrm{d}\alpha_0(x)
= \int \big(\partial_t\varphi + \langle \nabla\varphi, v_t\rangle\big)\,\mathrm{d}\alpha_t ,
$$

where the last equality uses the flow equation ∂tTt=vt∘Tt and, once more,
αt=(Tt)♯α0. Integrating over t∈[0,1], the left-hand side vanishes because φ
is zero at t=0 and t=1, which gives the weak form of the continuity equation.
Reconstructing particles from an observed density evolution is therefore
ill-posed. A simple choice, introduced by Dacorogna and Moser
(Dacorogna & Moser, 1990), imposes that the flux αtvt is a
gradient field. Formally,

$$
v_t = -\frac{\nabla \Delta^{-1}(\partial_t \alpha_t)}{\alpha_t},
$$
with suitable boundary conditions, for instance vanishing at infinity. This
formula is useful conceptually but delicate when αt vanishes, and it
does not generally produce a gradient velocity field.
A more robust choice, used implicitly in flow matching, optimal transport and
Wasserstein gradient flows, is to select among all admissible velocities the
one with smallest kinetic energy:

$$
\min_{v_t}\ \Big\{ \int \|v_t(x)\|^2\,\mathrm{d}\alpha_t(x) \;:\; \operatorname{div}(\alpha_t v_t) = -\partial_t\alpha_t \Big\}.
$$

For a density ρt, introducing a multiplier ϕt for the constraint and
integrating by parts gives the Lagrangian (with the energy normalized by a
factor 1/2, which does not change the minimizer)

$$
\int \Big(\tfrac12\|v_t\|^2 - \langle\nabla\phi_t, v_t\rangle\Big)\rho_t\,\mathrm{d}x + \int \phi_t\,\partial_t\rho_t\,\mathrm{d}x .
$$
The pointwise minimizer in vt is therefore vt=∇ϕt.
Substituting this into
∂tρt+div(ρtvt)=0 gives the weighted
Poisson equation in (14). The inverse notation
is a shorthand for solving this equation on zero-mean right-hand sides,
modulo additive constants.
In general this inversion is still computationally demanding, but special
choices of (αt)t lead to simpler formulas; this is the mechanism
exploited later by flow matching.
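In one dimension the inversion is explicit, because every velocity field is a gradient and the flux is obtained by integrating −∂tρt in space. The following sketch assumes a hypothetical density path, a unit-variance Gaussian translating at unit speed, and a flux that vanishes at the left end of the grid. It only illustrates the mechanism and is not a general solver for the weighted Poisson equation.

```python
import numpy as np


def gaussian(x, mean, std=1.0):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


x = np.linspace(-8.0, 9.0, 3401)  # spatial grid with step 0.005
dx = x[1] - x[0]
t, h = 0.5, 1e-4

# Hypothetical density path: rho_t(x) = gaussian(x - t), so the true velocity is 1.
rho = gaussian(x, mean=t)
drho_dt = (gaussian(x, mean=t + h) - gaussian(x, mean=t - h)) / (2 * h)

# Flux m(x) = -int_{-inf}^{x} d_t rho, computed with a cumulative trapezoid rule.
increments = 0.5 * (drho_dt[1:] + drho_dt[:-1]) * dx
flux = -np.concatenate([[0.0], np.cumsum(increments)])

# Velocity v = m / rho, evaluated only where the density is not negligible.
mask = rho > 1e-3
v = flux[mask] / rho[mask]
print(v.min(), v.max())
```

Both printed values should be close to 1, the translation speed. The mask reflects the caveat that dividing by the density is delicate where it vanishes.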
The dynamic formulation identifies W2 with the kinetic energy of the
cheapest continuity-equation path. It is the point where OT becomes a
least-action principle.
Instead of assuming that a whole curve (αt)t∈[0,1] is prescribed,
one fixes only the endpoints α0 and α1 and minimizes the
least-squares energy (12). The theorem of Benamou and
Brenier states that this geodesic energy is exactly the squared Wasserstein
distance (Benamou & Brenier, 2000).
Proof
For the inequality “dynamic ≤ static”, assume first that a Monge map
T exists and define (αt,vt) by (18). Since
the Lagrangian velocity T(x)−x is independent of t,

$$
\int_0^1\!\!\int \|v_t\|^2\,\mathrm{d}\alpha_t\,\mathrm{d}t
= \int_0^1\!\!\int \|T(x)-x\|^2\,\mathrm{d}\alpha_0(x)\,\mathrm{d}t
= \int \|T(x)-x\|^2\,\mathrm{d}\alpha_0(x),
$$
so the dynamic cost is no larger than the static Monge cost. Without a Monge
map, the same construction uses an optimal coupling π: sample
(X,Y)∼π and move along the straight path
γX,Y(t)=(1−t)X+tY. This path measure has action
∫∥x−y∥² dπ(x,y); projecting path velocities onto their
conditional mean at time t gives an admissible Eulerian velocity with no
larger action, so the dynamic value is no larger than the Kantorovich value.
Conversely, for a smooth deterministic path, take the flow Tt defined by
T˙t=vt∘Tt and T0=Id. Then
αt=(Tt)♯α0 and (T1)♯α0=α1.
Jensen’s inequality gives

$$
\|T_1(x)-x\|^2 = \Big\|\int_0^1 v_t(T_t(x))\,\mathrm{d}t\Big\|^2 \le \int_0^1 \|v_t(T_t(x))\|^2\,\mathrm{d}t .
$$
After integration with respect to α0, the transport cost of the map T1, and
therefore the optimal Monge cost, is bounded above by the dynamic action. For
general finite-energy solutions of the continuity
equation, the superposition principle lifts the curve to a probability
measure on absolutely continuous paths; applying Jensen’s inequality pathwise
gives a coupling of the endpoints whose quadratic cost is no larger than the
action. Thus the Kantorovich value is bounded above by the dynamic value.
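Both inequalities can be observed on a small particle example. The sketch below assumes two hypothetical 1D samples of equal size and uses the fact that, in one dimension with equal weights, the sorted matching is optimal for the quadratic cost. It compares the action of constant-speed straight paths with a time reparametrization of the same paths and with straight paths along a non-optimal matching.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = np.sort(rng.normal(loc=-1.0, scale=0.5, size=n))  # samples of alpha_0 (hypothetical)
y = np.sort(rng.normal(loc=2.0, scale=1.5, size=n))   # samples of alpha_1 (hypothetical)

# In 1D with equal weights, matching sorted samples is optimal.
w2_squared = np.mean((y - x) ** 2)


def action(speed_profile, n_steps=2000):
    """Action of x_i(t) = x_i + s(t) (y_i - x_i), where speed_profile(t) = s'(t).

    Midpoint rule in time, empirical average over particles.
    """
    t_mid = (np.arange(n_steps) + 0.5) / n_steps
    velocities = speed_profile(t_mid)[:, None] * (y - x)[None, :]
    return np.mean(velocities ** 2)


straight = action(lambda t: np.ones_like(t))  # s(t) = t: constant speed
reparam = action(lambda t: 2.0 * t)           # s(t) = t^2: same endpoints, varying speed
crossed = np.mean((y[::-1] - x) ** 2)         # straight paths, reversed (non-optimal) matching

print(straight / w2_squared, reparam / w2_squared, crossed / w2_squared)
```

The first ratio is 1, the second is close to 4/3 (Jensen's inequality in time is strict when the speed varies), and the third exceeds 1 because the reversed matching is not optimal.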
Although (17) is not jointly convex in (αt,vt),
it becomes convex after replacing velocities by the momentum measure
mt=vtαt and using the perspective action. In the absolutely
continuous case αt=ρtdx and
mt(x)=ρt(x)vt(x),

$$
\int_0^1\!\!\int \|v_t(x)\|^2\,\rho_t(x)\,\mathrm{d}x\,\mathrm{d}t
= \int_0^1\!\!\int \frac{\|m_t(x)\|^2}{\rho_t(x)}\,\mathrm{d}x\,\mathrm{d}t,
$$
with the usual convention that the integrand is 0 when
(ρt,mt)=(0,0) and +∞ when ρt=0 but mt≠0. For
singular endpoints or curves, the same statement is interpreted with
vector-valued momentum measures and the corresponding recession convention.
This convex reformulation enables geodesic interpolation by convex
optimization once the domain is discretized.
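The role of the change of variables can be checked pointwise. The sketch below is a minimal illustration with scalar momentum, where `perspective_action` is a hypothetical helper name. It implements the integrand with the conventions stated above, tests midpoint convexity on random pairs, and shows one pair of points where the same energy written in the variables (ρ,v) fails midpoint convexity.

```python
import numpy as np


def perspective_action(rho, m):
    """Pointwise m^2 / rho, with 0 at (0, 0) and +inf when rho = 0 but m != 0."""
    rho, m = np.broadcast_arrays(np.asarray(rho, float), np.asarray(m, float))
    safe_rho = np.where(rho > 0, rho, 1.0)  # placeholder value, avoids dividing by zero
    value = np.where(rho > 0, m ** 2 / safe_rho, np.inf)
    return np.where((rho == 0) & (m == 0), 0.0, value)


rng = np.random.default_rng(0)
rho_a, rho_b = rng.uniform(0.1, 2.0, size=(2, 1000))
m_a, m_b = rng.normal(size=(2, 1000))

# Midpoint convexity of (rho, m) -> m^2 / rho on random pairs.
mid = perspective_action((rho_a + rho_b) / 2, (m_a + m_b) / 2)
avg = (perspective_action(rho_a, m_a) + perspective_action(rho_b, m_b)) / 2
convex_ok = bool(np.all(mid <= avg + 1e-9))


def kinetic(rho, v):
    # The same energy in the variables (rho, v).
    return rho * v ** 2


# Midpoint of (rho, v) = (0, 2) and (2, 0) is (1, 1).
mid_kin = kinetic(1.0, 1.0)
avg_kin = (kinetic(0.0, 2.0) + kinetic(2.0, 0.0)) / 2
print(convex_ok, mid_kin, avg_kin)
```

Here `mid_kin` is 1 while `avg_kin` is 0, so (ρ,v)↦ρv² violates the convexity inequality at this pair, whereas the perspective form satisfies it on every sampled pair.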
Benamou--Brenier geodesic between two sampled silhouettes. A discrete
quadratic OT plan between finely subsampled cat and two-disks point clouds
induces the McCann interpolation Zt=(1−t)X+tY, which is the Lagrangian
realization of the least-action solution. The left panel shows the
smaller-bandwidth kernel-smoothed densities as color images, padded so that
the full silhouettes are visible. The right panel overlays shortened velocity
arrows centered at evenly subsampled midpoint particles Zt at t=1/2. Each
arrow points along the matched displacement Y−X, from the source side to the
target side, but is shorter than the full segment from X to Y.
The interactive demo keeps the same Lagrangian picture: particles are matched once,
then move along straight characteristics. The time and velocity scale controls
separate the path αt from the underlying displacement field.
Interactive panel. Use the time and velocity-scale controls to follow the Benamou-Brenier geodesic as a moving density with an Eulerian velocity field.
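The Lagrangian picture described above can be sketched on a toy example. The code below assumes two hypothetical 2D clouds of six points with equal weights and finds the optimal matching by brute force over all permutations, which is only viable for very small clouds; a dedicated assignment or OT solver would be used in practice. The arrow construction and the `scale` parameter are illustrative assumptions, not the book's plotting code.

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)
n = 6                                        # tiny on purpose: n! matchings are enumerated
X = rng.normal(size=(n, 2))                  # source cloud (hypothetical data)
Y = rng.normal(size=(n, 2)) + [3.0, 0.0]     # target cloud (hypothetical data)

# Squared Euclidean cost between every source and target point.
cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# With equal weights and equal sizes, an optimal plan can be taken as a permutation.
best = min(itertools.permutations(range(n)),
           key=lambda p: cost[np.arange(n), list(p)].sum())
sigma = np.array(best)


def mccann(t):
    """Particle positions at time t: Z_t = (1 - t) X + t Y[sigma]."""
    return (1.0 - t) * X + t * Y[sigma]


displacement = Y[sigma] - X                  # constant Lagrangian velocity of each particle
midpoints = mccann(0.5)
scale = 0.3                                  # display-only shortening of the arrows
tails = midpoints - 0.5 * scale * displacement
heads = midpoints + 0.5 * scale * displacement
```

Changing `t` moves the particles along fixed straight lines, while changing `scale` only alters how the displacement field is drawn.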
The same variational grammar extends beyond the quadratic Wasserstein
distance. One changes either the kinetic exponent, the mobility or the
balance equation, while keeping a continuity-type constraint and a convex
perspective action.
Unbalanced dynamic transport is obtained by allowing mass to be created and
destroyed along the path. The continuity equation is replaced by a balance
equation, and the action penalizes both spatial motion and growth. This
dynamic formulation underlies the Hellinger--Kantorovich and
Wasserstein--Fisher--Rao metrics
(Liero et al., 2016; Chizat et al., 2018); its equivalence with
static entropy-transport and cone formulations is developed in
Liero et al. (2018) and Chizat et al. (2018).
The parameter κ fixes the relative cost of reaction and transport:
changing it rescales the radial/angular balance in the associated cone
metric. For measure-valued triples, the action is understood in the
lower-semicontinuous perspective sense
where λ is any reference measure dominating ρ and the total variations of m
and s; the value does not depend on the choice of λ. The convention is 0/0=0 and
a/0=+∞ for a>0, so finite action forces both the flux and the source
to be absolutely continuous with respect to the transported mass.
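The growth part of the action can be isolated on a single point mass that changes weight without moving. The sketch below assumes the normalization κ² ∫ st²/ρt dt for this term, which may differ from the text's convention by a constant factor, and hypothetical endpoint masses a=1 and b=4. It compares a linear interpolation of the mass with an interpolation that is linear in the square root of the mass.

```python
import numpy as np

a, b = 1.0, 4.0                          # endpoint masses (hypothetical values)
t = (np.arange(4000) + 0.5) / 4000       # midpoints of a uniform time grid


def action(rho, source, kappa=1.0):
    # Assumed normalization: kappa^2 * int_0^1 s_t^2 / rho_t dt, midpoint rule.
    return kappa ** 2 * np.mean(source ** 2 / rho)


# Linear interpolation of the mass: rho_t = (1 - t) a + t b, s_t = b - a.
rho_lin = (1 - t) * a + t * b
s_lin = np.full_like(t, b - a)

# Interpolation that is linear in sqrt(rho): rho_t = ((1 - t) sqrt(a) + t sqrt(b))^2.
root = (1 - t) * np.sqrt(a) + t * np.sqrt(b)
rho_sqrt = root ** 2
s_sqrt = 2 * root * (np.sqrt(b) - np.sqrt(a))

print(action(rho_lin, s_lin))    # close to (b - a) * log(b / a)
print(action(rho_sqrt, s_sqrt))  # equals 4 * (sqrt(b) - sqrt(a))**2 up to rounding
```

The square-root interpolation has the smaller action, since ∫ st²/ρt dt equals 4 ∫ (∂t√ρt)² dt and is therefore minimized by a straight line in √ρ. This is the Hellinger, or Fisher--Rao, behavior of the reaction term.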
Proof
The cone construction turns variation of mass into radial motion and spatial
transport into angular motion on C[ℝᵈ]. Applying the
Benamou--Brenier theorem on the cone to the lifted endpoint measures gives a
dynamic least-action problem on C[ℝᵈ] whose static value is the
cone value. This is the standard static/dynamic identification for the
Hellinger--Kantorovich and Wasserstein--Fisher--Rao metrics
(Liero et al., 2016; Liero et al., 2018; Chizat et al., 2018).
Projecting a cone curve back to the base space with weight r² produces a
measure curve ρt, a spatial flux mt and a source term st
satisfying the balance equation. With the matching normalization of the cone
metric, the cone kinetic energy decomposes exactly into the perspective action
Aκ in (30). Conversely, any
finite-action triple (ρt,mt,st) can be lifted to a cone curve whose
radial velocity realizes the growth term and whose spatial velocity realizes
the transport term, with the same action after relaxation. The two infima are
therefore equal; lower semicontinuity gives the general finite-measure
statement from the smooth positive case.
show_book_figure("dynamic-unbalanced-geodesic")
Balanced and unbalanced Sinkhorn-barycenter interpolations between two
one-dimensional Gaussian mixtures with swapped modal masses. The balanced row
conserves total mass, so excess mass from the dominant left mode must move
along the line toward the dominant right target mode, producing transient mass
in the middle. The unbalanced row uses KL-relaxed marginal constraints; mass
can be attenuated near overrepresented modes and recreated near
underrepresented modes, giving a reaction--transport interpolation closer to
the Wasserstein--Fisher--Rao intuition.
The interactive demo below exposes this balance directly. A high reaction weight
keeps more mass local by fading and recreating modes, while the balanced path
must carry mass through space.
Interactive panel. Use the growth and time controls to compare motion with source terms in dynamic unbalanced transport.
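A simpler proxy than the barycenter interpolation of the figure is to compare transport plans directly. The sketch below assumes a hypothetical 1D grid, two Gaussian mixtures with swapped modal masses, and illustrative values of the entropic parameter `eps` and of the KL weight `rho`. It uses scaling iterations in which the exponent `rho / (rho + eps)` relaxes the marginal constraints, and exponent 1 recovers balanced Sinkhorn. The quantity reported is the mass that each plan sends from the left half of the domain to the right half.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)


def mixture(w_left, w_right, std=0.05):
    """Two Gaussian bumps centered at 0.25 and 0.75 with the given masses."""
    left = np.exp(-0.5 * ((x - 0.25) / std) ** 2)
    right = np.exp(-0.5 * ((x - 0.75) / std) ** 2)
    return w_left * left / left.sum() + w_right * right / right.sum()


a = mixture(0.7, 0.3)   # source: dominant left mode
b = mixture(0.3, 0.7)   # target: dominant right mode

eps = 0.02              # entropic regularization (illustrative value)
rho = 0.05              # weight of the KL marginal penalties (illustrative value)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / eps)


def scaling_plan(a, b, K, tau, n_iter=2000):
    """Scaling iterations; tau = 1 is balanced, tau < 1 relaxes the marginals."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** tau
        v = (b / (K.T @ u)) ** tau
    return u[:, None] * K * v[None, :]


P_bal = scaling_plan(a, b, K, tau=1.0)
P_unb = scaling_plan(a, b, K, tau=rho / (rho + eps))

left_half = x < 0.5
moved_bal = P_bal[np.ix_(left_half, ~left_half)].sum()
moved_unb = P_unb[np.ix_(left_half, ~left_half)].sum()
print(moved_bal, moved_unb)
```

The balanced plan has to carry the excess mass of the left mode across the domain, while the relaxed plan moves far less and adjusts the mass of each mode locally instead.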
Dacorogna, B., & Moser, J. (1990). On a Partial Differential Equation Involving the Jacobian Determinant. Annales de l’Institut Henri Poincaré C, Analyse Non Linéaire, 7(1), 1–26.
Benamou, J.-D., & Brenier, Y. (2000). A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3), 375–393.
Dolbeault, J., Nazaret, B., & Savaré, G. (2009). A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2), 193–231.
Maas, J. (2011). Gradient flows of the entropy for finite Markov chains. Journal of Functional Analysis, 261(8), 2250–2292.
Mielke, A. (2013). Geodesic convexity of the relative entropy in reversible Markov chains. Calculus of Variations and Partial Differential Equations, 48(1–2), 1–31.
Liero, M., Mielke, A., & Savaré, G. (2016). Optimal transport in competition with reaction: the Hellinger–Kantorovich distance and geodesic curves. SIAM Journal on Mathematical Analysis, 48(4), 2869–2911.
Chizat, L., Schmitzer, B., Peyré, G., & Vialard, F.-X. (2018). An interpolating distance between optimal transport and Fisher–Rao metrics. Foundations of Computational Mathematics, 18(1), 1–44.
Liero, M., Mielke, A., & Savaré, G. (2018). Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures. Inventiones Mathematicae, 211(3), 969–1117.