Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Beyond Comparing Measures

The last group leaves the setting of scalar measures on a common ambient space. Vector- and matrix-valued OT transports mass with internal degrees of freedom, Gromov--Wasserstein compares metric-measure spaces without a prescribed correspondence, and quantum OT replaces scalar couplings by positive operators. In each case, the transport plan also has to encode structure carried by the support, the fibers, or the non-commutative state space.

Vector and Matrix-Valued Measures

Scalar OT transports a nonnegative density. In imaging, color processing, spectral analysis, diffusion tensor imaging and quantum-inspired models, the object attached to a point can instead have several nonnegative components or a positive semidefinite matrix. The first step beyond scalar OT is the positive vector-valued case, where the fiber remains linear and commutative but the channels may interact.

Positive Vector-Valued Measures

This models multi-channel densities such as color histograms, spectral bins or several species transported on the same domain. In a conservative model the mass of each channel is preserved, so one assumes μ0k(X)=μ1k(X)\mu_0^k(\X)=\mu_1^k(\X) for every kk. The natural vector-valued extension therefore starts from the positive cone R+m\RR_+^m.

To keep the notation readable, first assume that the endpoints and the curve have densities. The direct analogue of Benamou--Brenier fixes a vector density ut(x)R+mu_t(x)\in\RR_+^m and a spatial flux Vt(x)=(Vt,1,,Vt,d)(Rm)dV_t(x)=(V_{t,1},\ldots,V_{t,d})\in(\RR^m)^d, where Vt,kV_{t,\ell}^k is the momentum of channel kk in spatial direction \ell. The conservative vector transport cost associated with an action density Φ\Phi is

WΦ2(μ0,μ1):=infu,V01 ⁣XΦ(ut(x),Vt(x))dxdt\mathcal W_{\Phi}^2(\mu_0,\mu_1) \eqdef \inf_{u,V} \int_0^1\!\int_\X \Phi(u_t(x),V_t(x))\,\d x\,\d t

subject to the endpoint constraints u0dx=μ0u_0\d x=\mu_0, u1dx=μ1u_1\d x=\mu_1 and the componentwise continuity equation

tut+xVt=0,(xVt)k==1dxVt,k.\partial_t u_t+\nabla_x\cdot V_t=0, \qquad (\nabla_x\cdot V_t)^k = \sum_{\ell=1}^d\partial_{x_\ell}V_{t,\ell}^k.

Thus each component satisfies its own continuity equation, but the cost may still couple the components. Singular curves are handled as in scalar dynamic OT by replacing densities and fluxes by measures and using the lower semicontinuous perspective recession convention.

A simple quadratic family is obtained from a mobility matrix M(u)S+m\mathsf M(u)\in\mathbb S_+^m:

ΦM(u,V)==1dVM(u)V,\Phi_{\mathsf M}(u,V) = \sum_{\ell=1}^d V_\ell^\top \mathsf M(u)^\dagger V_\ell,

with the usual convention that the value is finite only when each VV_\ell belongs to the range of M(u)\mathsf M(u). One chooses M\mathsf M so that this matrix perspective is convex and one-homogeneous in (u,V)(u,V); this holds for the linear positive mobilities below. For m=1m=1 and M(u)=u\mathsf M(u)=u, one recovers exactly the scalar Benamou--Brenier action. For

Mdiag(u)=diag(u1,,um),\mathsf M_{\mathrm{diag}}(u)=\operatorname{diag}(u_1,\ldots,u_m),

the channels move independently. Non-diagonal mobilities are the simplest way to couple the coordinates while keeping the same componentwise conservation law. For instance, with q=m1/2(1,,1)q=m^{-1/2}(1,\ldots,1) and κ0\kappa\geq0,

Mκ(u)=diag(u)+κ(k=1muk)qq\mathsf M_\kappa(u) = \operatorname{diag}(u) + \kappa\left(\sum_{k=1}^m u_k\right)qq^\top

increases the mobility in the common channel direction qq while leaving transverse directions controlled by the diagonal part. The local cost of moving one component can therefore depend on the densities and momenta of the other components, even though each component mass remains conserved.

Proof

For Mdiag(u)=diag(u1,,um)\mathsf M_{\mathrm{diag}}(u)=\operatorname{diag}(u_1,\ldots,u_m), the action separates as

=1dVMdiag(u)V=k=1mVk2uk,\sum_{\ell=1}^d V_\ell^\top\mathsf M_{\mathrm{diag}}(u)^\dagger V_\ell = \sum_{k=1}^m\frac{|V^k|^2}{u^k},

where Vk=(V1k,,Vdk)V^k=(V_1^k,\ldots,V_d^k) is the spatial momentum of channel kk. The constraint (3) also separates into tuk+Vk=0\partial_t u^k+\nabla\cdot V^k=0. The minimization therefore splits into mm independent scalar Benamou--Brenier problems. If mk>0m_k>0, normalizing ρtk=utk/mk\rho_t^k=u_t^k/m_k and ptk=Vtk/mkp_t^k=V_t^k/m_k factors the channel action as mkptk2/ρtkm_k\int |p_t^k|^2/\rho_t^k, hence the scalar value is the displayed mkW22m_k\Wass_2^2 term. If mk=0m_k=0, nonnegativity and conservation force the whole channel to vanish. Summing over all channels proves the claim.

The conservative positive-cone model above is the basic extension of Benamou--Brenier. Adding a source term tu+V=S\partial_t u+\nabla\cdot V=S and a convex perspective penalty in SS gives unbalanced or reaction--transport variants Maas et al., 2015Maas et al., 2016Dolbeault et al., 2009Mielke, 2013. These generalized transport models include dissipation and density modulation. The figure below contrasts the exact diagonal case κ=0\kappa=0, where each positive channel is transported by its quantile map, with a large-κ\kappa illustrative common-mode interpolation in which the channels move more coherently. The endpoints are two-mode mixtures: at each spatial mode the two channels have Gaussian profiles with the same center but different amplitudes.

<IPython.core.display.Image object>

One-dimensional positive R+2\RR_+^2-valued transport displayed by arrow glyphs at eight time levels. Each endpoint is a mixture of two localized Gaussian modes, and, inside each mode, both channel profiles have the same center. Each arrow is proportional to the local fiber value (ut1(x),ut2(x))(u_t^1(x),u_t^2(x)), and time runs vertically from the red source to the blue target. Left: for κ=0\kappa=0, the diagonal mobility gives two independent scalar quantile geodesics. Right: a large-κ\kappa common-mode interpolation bends the display toward q=21/2(1,1)q=2^{-1/2}(1,1), illustrating the effect of a mobility that favors coherent channel motion while keeping the same componentwise continuity equation.

The interactive demo keeps the same glyph idea and lets the coupling strength bend the fibers toward a common channel direction.

Interactive panel. Use the coupling and mixture controls to see how vector-valued mass transports both location and channel composition.

Positive Matrix-Valued Measures

The next simplest fiber is the positive matrix cone. This is the simplest tensor-valued model beyond vectors: the diagonal entries behave like positive channels, while the eigenvectors encode local orientations.

If A\mathcal A has density A(x)0A(x)\succeq0, then trA(x)\operatorname{tr}A(x) is the scalar amount of mass at xx, while, wherever trA(x)>0\operatorname{tr}A(x)>0, the normalized matrix A(x)/trA(x)A(x)/\operatorname{tr}A(x) records an internal covariance or orientation. This is the matrix analogue of the positive vector case: diagonal matrices encode nonnegative vector components, and non-diagonal matrices add a local eigenbasis.

The conservative Benamou--Brenier model fixes a matrix density At(x)S+mA_t(x)\in\mathbb S_+^m and symmetric matrix fluxes Pt(x)=(Pt,1,,Pt,d)(Sm)dP_t(x)=(P_{t,1},\ldots,P_{t,d})\in(\mathbb S^m)^d. With no flux through the boundary of X\X, the full matrix mass XAt(x)dx\int_\X A_t(x)\d x is conserved, so the endpoints must have the same total matrix. The model minimizes the matrix-perspective action

Wmat2(A0,A1):=infA,P01 ⁣X=1dtr ⁣(Pt,AtPt,)dxdt\mathcal W_{\mathrm{mat}}^2(\mathcal A_0,\mathcal A_1) \eqdef \inf_{A,P} \int_0^1\!\int_\X \sum_{\ell=1}^d \operatorname{tr}\!\left(P_{t,\ell}^{\top} A_t^\dagger P_{t,\ell}\right) \d x\,\d t

subject to A0dx=A0A_0\d x=\mathcal A_0, A1dx=A1A_1\d x=\mathcal A_1 and to the matrix-valued continuity equation

tAt+xPt=0,xPt==1dxPt,.\partial_t A_t+\nabla_x\cdot P_t=0, \qquad \nabla_x\cdot P_t = \sum_{\ell=1}^d\partial_{x_\ell}P_{t,\ell}.

Here AA^\dagger denotes the Moore--Penrose inverse, with the usual perspective convention: the action is finite only when the columns of each Pt,P_{t,\ell} belong to the range of AtA_t. The map (A,P)tr(PAP)(A,P)\mapsto\operatorname{tr}(P^\top A^\dagger P) is the matrix fractional function and is jointly convex on A0A\succeq0. This gives the simplest non-trivial matrix-valued transport model: spatial motion is conservative, but the fiber carries orientation through the eigenvectors of At(x)A_t(x).

Proof

The continuity equation (11) is diagonal entry by diagonal entry and gives tuk+Vk=0\partial_t u^k+\nabla\cdot V^k=0. Moreover,

=1dtr ⁣(Pt,AtPt,)=k=1mVtk2utk,\sum_{\ell=1}^d \operatorname{tr}\!\left(P_{t,\ell}^{\top}A_t^\dagger P_{t,\ell}\right) = \sum_{k=1}^m\frac{|V_t^k|^2}{u_t^k},

with the same scalar perspective convention as before. The admissible set and the action are therefore exactly those of the diagonal vector model.

The restriction to a fixed diagonal basis gives eigenvalue transport; it should be read as a commuting submodel, not as a claim that non-diagonal excursions can never change the unrestricted value. The genuinely matrix-valued case starts when the eigenspaces vary with xx or along the interpolation, so that the transported object carries both mass and orientation. Static matrix-valued Monge--Kantorovich problems and dual test-function metrics were developed in Ning & Georgiou, 2014Jiang et al., 2012Ning et al., 2015; dynamic versions and related non-commutative geometries appear in Chen et al., 2016Chen et al., 2020Carlen & Maas, 2014Peyré et al., 2019. The figure below shows the analogous independent/coupled contrast for positive 2×22\times2 matrix fibers, using two localized matrix modes whose eigenvalue profiles share a common center at each mode.

<IPython.core.display.Image object>

Positive 2×22\times2 matrix-valued transport on a one-dimensional base. Each endpoint is a mixture of two localized matrix modes; within one mode, both eigenvalue profiles are Gaussian bumps with the same center. Each ellipse is the glyph of a positive semidefinite matrix At(x)A_t(x), with axes given by eigenvectors and eigenvalues. Left: the matrices are diagonal in a fixed basis, giving the commuting tensor analogue of independent vector channels. Right: a coupled illustrative interpolation bends packet motion toward the trace-density transport and uses non-commuting eigendirections; the superposition remains positive semidefinite and produces spatially varying orientations.

Interactive panel. Use the coupling and rotation controls to compare matrix-valued transport of anisotropic local structure.

Gromov--Wasserstein

Gromov--Wasserstein compares spaces through their internal distance structures rather than through a fixed ambient ground cost. This is the right extension for graphs, shapes and point clouds whose points are not pre-aligned.

Discrete Formulation

Optimal transport needs a ground cost CC to compare histograms (a,b)(a,b), and thus cannot be used directly if the histograms are not defined on the same underlying space, or if one cannot pre-register these spaces to define a ground cost. Instead, assume that two matrices DRn×nD\in\RR^{n\times n} and DRm×mD'\in\RR^{m\times m} represent relationships between points. A typical scenario is when these matrices are powers of distance matrices. The discrete Gromov--Wasserstein problem reads

GW((a,D),(b,D))p:=minPU(a,b)i,j,i,jΔ(Di,i,Dj,j)pPi,jPi,j,\operatorname{GW}((a,D),(b,D'))^p \eqdef \min_{P\in\mathbf U(a,b)} \sum_{i,j,i',j'} \Delta(D_{i,i'},D'_{j,j'})^p P_{i,j}P_{i',j'},

where p1p\geq1 and Δ\Delta is usually Δ(u,v)=uv\Delta(u,v)=|u-v|. This is a non-convex quadratic problem over the transport polytope. In the uniform case with m=nm=n and PP constrained to be a permutation matrix, it becomes a Quadratic Assignment Problem, already NP-hard in full generality Loiola et al., 2007. The relaxed coupling formulation can therefore be read as a soft graph-matching model Lyzinski et al., 2016.

<IPython.core.display.Image object>

Gromov--Wasserstein correspondences under increasing deformation. The red and blue point clouds are not compared through an ambient Euclidean cross-cost; instead, the GW coupling compares their internal pairwise distances. A perfectly isometric copy admits a clean structural match, while mild and deliberately stronger deformations progressively bend the correspondence.

The interactive demo uses a fixed structural correspondence and lets the deformation change the pairwise-distance residual. This isolates the quantity minimized by the GW objective.

Interactive panel. Use the deformation and point controls to inspect correspondences when only within-space distances are meaningful.

When D,DD,D' are genuine distance matrices, the construction below defines a distance between metric spaces equipped with a probability distribution, up to measure-preserving isometries Mémoli, 2011Sturm, 2012Schmitzer & Schnörr, 2013. The same construction also explains why GW satisfies the triangle inequality after quotienting by isometries, and its relation to Hausdorff and Gromov--Hausdorff distances is discussed at the end of the section.

General Setting

The natural setting is that of Polish metric spaces; compactness is often assumed when one wants existence and a clean metric statement without adding tightness hypotheses.

For metric-measure spaces X=(X,dX,α)\mathbb X=(\X,d_\X,\alpha) and Y=(Y,dY,β)\mathbb Y=(\Y,d_\Y,\beta), define

GW(X,Y)p:=minπΠ(α,β)X2×Y2Δ(dX(x,x),dY(y,y))pdπ(x,y)dπ(x,y).\operatorname{GW}(\mathbb X,\mathbb Y)^p \eqdef \min_{\pi\in\Couplings(\alpha,\beta)} \int_{\X^2\times\Y^2} \Delta(d_\X(x,x'),d_\Y(y,y'))^p \d\pi(x,y)\d\pi(x',y').
Proof

Let π\pi be any coupling between α\alpha and β\beta. For two independent pairs (X,Y),(X,Y)π(X,Y),(X',Y')\sim\pi, the reverse triangle inequality gives

XXYYXY+XY.\left|\norm{X-X'}-\norm{Y-Y'}\right| \leq \norm{X-Y}+\norm{X'-Y'}.

Taking the LpL^p norm and using Minkowski gives a bound by 2(xypdπ)1/p2(\int\norm{x-y}^p\d\pi)^{1/p}. Optimizing over π\pi proves the claim.

Proof

If GW(X,Y)=0\operatorname{GW}(\mathbb X,\mathbb Y)=0 and π\pi is optimal, then dX(x,x)=dY(y,y)d_\X(x,x')=d_\Y(y,y') holds ππ\pi\otimes\pi-almost everywhere. By continuity, this equality holds on supp(π)2\operatorname{supp}(\pi)^2. Both X\mathbb X and Y\mathbb Y are isometric to the support space (supp(π),dπ,π)(\operatorname{supp}(\pi),d_\pi,\pi), where

dπ((x,y),(x,y)):=12dX(x,x)+12dY(y,y).d_\pi((x,y),(x',y')) \eqdef \frac12d_\X(x,x')+\frac12d_\Y(y,y').

The first projection is measure-preserving and distance-preserving on supp(π)\operatorname{supp}(\pi), and compactness gives surjectivity onto supp(α)\operatorname{supp}(\alpha); the same argument applies to the second projection.

For the triangle inequality, glue optimal couplings between X,Y\mathbb X,\mathbb Y and between Y,Z\mathbb Y,\mathbb Z. The projected X×Z\X\times\Z marginal is feasible, and the pointwise triangle inequality together with Minkowski gives

GW(X,Z)GW(X,Y)+GW(Y,Z).\operatorname{GW}(\mathbb X,\mathbb Z) \leq \operatorname{GW}(\mathbb X,\mathbb Y) + \operatorname{GW}(\mathbb Y,\mathbb Z).

Symmetry and non-negativity are immediate.

The metric structure also gives geodesics. Sturm’s construction allows one to speak about interpolation, barycenters and gradient flows directly on the space of metric-measure spaces, even though the intermediate space lives on a product support and is therefore expensive numerically Sturm, 2012.

Proof

For s<ts<t, couple Xs\mathbb X_s and Xt\mathbb X_t by the diagonal coupling on Z\mathcal Z. The distance difference is exactly

dt(z,z)ds(z,z)=(ts)(dX1(x1,x1)dX0(x0,x0)),d_t(z,z')-d_s(z,z') = (t-s)\big(d_{\X_1}(x_1,x'_1)-d_{\X_0}(x_0,x'_0)\big),

so GW(Xs,Xt)(ts)D\operatorname{GW}(\mathbb X_s,\mathbb X_t)\leq(t-s)D, where D=GW(X0,X1)D=\operatorname{GW}(\mathbb X_0,\mathbb X_1). Applying the triangle inequality to X0,Xs,Xt,X1\mathbb X_0,\mathbb X_s,\mathbb X_t,\mathbb X_1 gives the reverse bound.

<IPython.core.display.Image object>

Local distortion in a mildly non-isometric GW match. The left panel colors transport segments by the average residual induced by the displayed hard correspondence. The right panel shows the pairwise-distance residual matrix dX(xi,xi)dY(yσ(i),yσ(i))|d_\X(x_i,x_{i'})-d_\Y(y_{\sigma(i)},y_{\sigma(i')})|, with darker entries marking larger local distortion. This matrix is the local contribution minimized by the discrete GW objective for the displayed correspondence.

Interactive panel. Use the deformation and shift controls to see where a Gromov-Wasserstein correspondence preserves or distorts pairwise distances.

Proof

Fix any πΠ(α,β)\pi\in\Couplings(\alpha,\beta). It induces a coupling (x,y)(αx,βy)(x,y)\mapsto(\alpha_x,\beta_y) between the profile laws, hence

Wp(EX,EY)pX×YWp(αx,βy)pdπ(x,y).\Wass_p(\mathsf E_\mathbb X,\mathsf E_\mathbb Y)^p \leq \int_{\X\times\Y} \Wass_p(\alpha_x,\beta_y)^p\,\d\pi(x,y).

For fixed (x,y)(x,y), the map (x,y)(dX(x,x),dY(y,y))(x',y')\mapsto(d_\X(x,x'),d_\Y(y,y')) pushes the same coupling π\pi to a coupling between αx\alpha_x and βy\beta_y. Integrating the resulting one-dimensional OT bound over (x,y)(x,y) gives the GW objective for π\pi. Taking the infimum over π\pi proves the claim.

This lower bound is useful computationally because the profile cost matrix Cij=Wp(αxi,βyj)pC_{ij}=\Wass_p(\alpha_{x_i},\beta_{y_j})^p is an ordinary OT cost between points. Solving this easier OT problem gives a geometry-aware initialization for the non-convex GW iterations; it is the same idea used above as a useful initialization principle for a non-convex solver.

Entropic Regularization and Fused GW

For the common squared distortion Δ(u,v)2=(uv)2\Delta(u,v)^2=(u-v)^2, one often computes a stationary point of the entropic relaxation

minPU(a,b)ED,D(P)ϵH(P).\min_{P\in\mathbf U(a,b)} \mathcal E_{D,D'}(P)-\epsilon H(P).

Although the objective is non-convex, successive linearizations lead to a practical mirror-descent scheme Peyré et al., 2016. Up to an irrelevant global factor in the gradient, one alternates

P(+1):=minPU(a,b)P,C()ϵH(P),P^{(\ell+1)} \eqdef \min_{P\in\mathbf U(a,b)} \langle P,C^{(\ell)}\rangle-\epsilon H(P),

with

C():=D2a1m+1n(D2b)2DP()D.C^{(\ell)} \eqdef D^{\odot2}a\,\mathbf 1_m^\top + \mathbf 1_n(D'^{\odot2}b)^\top - 2D\,P^{(\ell)}\,D'^\top.

Each update is an ordinary entropic OT problem and can therefore be solved with Sinkhorn iterations. This improves scalability and smooths the landscape, but it does not remove the non-convexity of the GW objective. This is the standard entropic GW solver used to compute soft maps between domains.

Fused Gromov--Wasserstein augments the structural term with a feature transport cost Vayer et al., 2019. In the discrete case, given a cross-feature cost MRn×mM\in\RR^{n\times m} and a parameter λ[0,1]\lambda\in[0,1], one minimizes

FGWλ,p((a,D),(b,D))p:=minPU(a,b)(1λ)i,jMijPij+λi,j,i,jΔ(Dii,Djj)pPijPij.\operatorname{FGW}_{\lambda,p}((a,D),(b,D'))^p \eqdef \min_{P\in\mathbf U(a,b)} (1-\lambda)\sum_{i,j}M_{ij}P_{ij} + \lambda \sum_{i,j,i',j'} \Delta(D_{ii'},D'_{jj'})^pP_{ij}P_{i'j'}.

The endpoints λ=0\lambda=0 and λ=1\lambda=1 recover feature-only OT and pure GW respectively; intermediate values trade attribute matching against structural matching. The first term compares node attributes in the usual OT sense, and the second compares intrinsic geometry; this is useful when two spaces have both distances and features, and the two sources of information may disagree.

<IPython.core.display.Image object>

Feature information and intrinsic geometry in fused Gromov--Wasserstein. Small inner disks encode binary node features. Feature-only OT follows the attributes even when this crosses the shape structure, pure GW follows the intrinsic ordering, and fused GW balances the feature term with the pairwise-distance distortion.

Interactive panel. Use the geometry-weight and feature-conflict controls to balance structural matching against feature agreement.

Hausdorff and Gromov--Hausdorff Viewpoints

If A,BA,B are compact subsets of a common metric space (Z,dZ)(\mathcal Z,d_\mathcal Z), their Hausdorff distance is

dHZ(A,B)=max{supaAinfbBdZ(a,b),supbBinfaAdZ(a,b)}.d_{\mathrm H}^{\mathcal Z}(A,B) = \max\left\{ \sup_{a\in A}\inf_{b\in B}d_\mathcal Z(a,b), \sup_{b\in B}\inf_{a\in A}d_\mathcal Z(a,b) \right\}.

The Gromov--Hausdorff distance removes the common ambient space by minimizing this quantity over all isometric embeddings into a third space:

dGH(X,Y)=infZ,ϕ,ψdHZ(ϕ(X),ψ(Y)).d_{\mathrm{GH}}(\X,\Y) = \inf_{\mathcal Z,\phi,\psi} d_{\mathrm H}^{\mathcal Z}(\phi(\X),\psi(\Y)).

Equivalently, it is half the minimal distortion of a correspondence between X\X and Y\Y Gromov, 2001Mémoli, 2007. This is a worst-case set distance: every point must be matched with small distortion. Gromov--Wasserstein replaces correspondences by probability couplings and worst-case distortion by averaged distortion. It is therefore better adapted to noisy sampled shapes and weighted graphs, but it can ignore small sets of mass that would dominate the Hausdorff distance.

Quantum Optimal Transport

Quantum optimal transport replaces probability vectors by density matrices and scalar couplings by positive operators on a tensor product space. This is the right language when the transported objects are matrix-valued signals, covariance-like descriptors or quantum states, and it exposes a precise bridge between OT, non-commutative entropy and operator scaling Ning & Georgiou, 2014Chen et al., 2016Chen et al., 2020Peyré et al., 2019Golse et al., 2019Chakrabarti et al., 2019.

Finite-Dimensional States and Couplings

A joint quantum state between Cn\mathbb C^n and Cm\mathbb C^m is a matrix THnm+T\in\mathbb H_{nm}^+ acting on CnCm\mathbb C^n\otimes\mathbb C^m. Its marginals are the partial traces, defined by duality through

tr(FTrBT)=tr((FIm)T),tr(GTrAT)=tr((InG)T).\operatorname{tr}(F\,\operatorname{Tr}_B T) = \operatorname{tr}((F\otimes I_m)T), \qquad \operatorname{tr}(G\,\operatorname{Tr}_A T) = \operatorname{tr}((I_n\otimes G)T).

for all FHnF\in\mathbb H_n and GHmG\in\mathbb H_m. Thus TrB(T)Hn+\operatorname{Tr}_B(T)\in\mathbb H_n^+ and TrA(T)Hm+\operatorname{Tr}_A(T)\in\mathbb H_m^+ play exactly the role of the two marginals of a classical coupling.

The feasible set is never empty, since ABA\otimes B has marginals AA and BB.

Proof

Introduce Hermitian Lagrange multipliers FF and GG for the two marginal constraints. Using (40), the Lagrangian is

tr(FA)+tr(GB)+tr ⁣((CFImInG)T).\operatorname{tr}(FA)+\operatorname{tr}(GB) + \operatorname{tr}\!\left((C-F\otimes I_m-I_n\otimes G)T\right).

Minimizing over T0T\succeq0 gives a finite lower bound if and only if CFImInG0C-F\otimes I_m-I_n\otimes G\succeq0, in which case the infimum in TT is 0. When A,B0A,B\succ0, the coupling ABA\otimes B is strictly feasible, so Slater’s theorem gives equality of primal and dual values. The semidefinite case follows by restricting to supports or by approximation.

The dual potentials have the usual scalar gauge freedom: replacing (F,G)(F,G) by (F+tIn,GtIm)(F+tI_n,G-tI_m) leaves both the constraint and the value unchanged because tr(A)=tr(B)=1\operatorname{tr}(A)=\operatorname{tr}(B)=1.

Entropic Regularization and Bregman Iterations

For ϵ>0\epsilon>0 define

QOTCϵ(A,B)=minT0{tr(CT)+ϵH(T):TrB(T)=A, TrA(T)=B}.\operatorname{QOT}_C^\epsilon(A,B) = \min_{T\succeq0} \left\{ \operatorname{tr}(CT)+\epsilon H(T): \operatorname{Tr}_B(T)=A,\ \operatorname{Tr}_A(T)=B \right\}.

This is the non-commutative analogue of entropic OT: the Shannon entropy of a coupling is replaced by the trace entropy of a density matrix Peyré et al., 2019Chakrabarti et al., 2019.

Proof

The feasible set is compact and nonempty, and it contains the positive definite point ABA\otimes B. The trace entropy is strictly convex on positive semidefinite matrices, hence the regularized primal has a unique minimizer. Slater’s condition justifies the Lagrange dual computation. The Fenchel identity

supT0tr(YT)ϵH(T)=ϵtrexp(Y/ϵ)\sup_{T\succeq0} \operatorname{tr}(YT)-\epsilon H(T) = \epsilon\,\operatorname{tr}\exp(Y/\epsilon)

is the matrix analogue of the scalar exponential conjugacy. Applying it to the Lagrangian with Y=FIm+InGCY=F\otimes I_m+I_n\otimes G-C gives (46), and the stationarity condition gives (47); differentiating the dual objective with respect to FF and GG yields the two marginal equations.

Writing K=exp(C/ϵ)K=\exp(-C/\epsilon), the objective differs by a constant from ϵ\epsilon times the quantum KL divergence

DH(TK)=tr ⁣(T(logTlogK)T+K).D_H(T\mid K) = \operatorname{tr}\!\left( T(\log T-\log K)-T+K \right).

The exact quantum analogue of Sinkhorn is an implicit alternating Bregman projection scheme onto the affine marginal sets

MA={T0:TrB(T)=A},MB={T0:TrA(T)=B}.\mathcal M_A=\{T\succeq0:\operatorname{Tr}_B(T)=A\}, \qquad \mathcal M_B=\{T\succeq0:\operatorname{Tr}_A(T)=B\}.
Proof

Since logK=C/ϵ\log K=-C/\epsilon,

tr(CT)+ϵH(T)=ϵDH(TK)ϵtr(K).\operatorname{tr}(CT)+\epsilon H(T) = \epsilon D_H(T\mid K)-\epsilon\operatorname{tr}(K).

For the projection of a positive definite matrix SS onto MA\mathcal M_A, the affine set contains the positive definite point AIm/mA\otimes I_m/m. The entropy derivative is singular at the boundary, so the projection lies in the interior of the positive cone and the first variation has the form

logTlogSΛIm=0,\log T-\log S-\Lambda\otimes I_m=0,

for a Hermitian multiplier Λ\Lambda. Hence T=exp(logS+ΛIm)T=\exp(\log S+\Lambda\otimes I_m). If S=Te(F,G)S=T_e(F,G), this is again Te(F+ϵΛ,G)T_e(F+\epsilon\Lambda,G); the multiplier is fixed by the marginal equation. The same argument applies to MB\mathcal M_B. Finally, the first-order optimality condition for maximizing (46) over one block is exactly the corresponding marginal equation, so the Bregman and block-dual views coincide.

In the diagonal case this proposition gives the usual multiplicative Sinkhorn updates. In the non-commutative case, however, the exact block equations

TrBTe(F,G)=A,TrATe(F,G)=B\operatorname{Tr}_B T_e(F,G)=A, \qquad \operatorname{Tr}_A T_e(F,G)=B

do not admit scalar division formulas, because the exponential of FIm+InGCF\otimes I_m+I_n\otimes G-C cannot be separated unless the local potential commutes with the cost.

Gurvits Scaling and Quantum Sinkhorn

The algorithm often called quantum Sinkhorn comes from the operator-scaling literature of Gurvits and subsequent developments Gurvits, 2003Gurvits, 2004Georgiou & Pavon, 2015Garg & Oliveira, 2018. It replaces the true Gibbs coupling (47) by the symmetric factorization

Ts(F,G)=exp ⁣(Z2ϵ)exp(C/ϵ)exp ⁣(Z2ϵ)=(UV)K(UV),Z=FIm+InG,T_s(F,G) = \exp\!\left(\frac{Z}{2\epsilon}\right) \exp(-C/\epsilon) \exp\!\left(\frac{Z}{2\epsilon}\right) = (U\otimes V)K(U\otimes V), \qquad Z=F\otimes I_m+I_n\otimes G,

where U=exp(F/(2ϵ))U=\exp(F/(2\epsilon)), V=exp(G/(2ϵ))V=\exp(G/(2\epsilon)) and K=exp(C/ϵ)K=\exp(-C/\epsilon). If [Z,C]=0[Z,C]=0, then Ts(F,G)=Te(F,G)T_s(F,G)=T_e(F,G); otherwise this is a Strang-type symmetric surrogate.

Fix a Choi convention and let K:HmHn\mathcal K:\mathbb H_m\to\mathbb H_n be the completely positive map represented by the positive Choi matrix KK; let K\mathcal K^\star be its Hilbert--Schmidt adjoint. Up to the transpose dictated by the chosen Choi convention, the marginal equations for the symmetric coupling take the operator-scaling form

UK(V2)U=A,VK(U2)V=B,U\,\mathcal K(V^2)\,U=A, \qquad V\,\mathcal K^\star(U^2)\,V=B,

and can be enforced by the congruence normalizations

RV=K(V2),URV1/2(RV1/2ARV1/2)1/2RV1/2,SU=K(U2),VSU1/2(SU1/2BSU1/2)1/2SU1/2.\begin{aligned} R_V&=\mathcal K(V^2), & U&\leftarrow R_V^{-1/2} \left(R_V^{1/2} A R_V^{1/2}\right)^{1/2} R_V^{-1/2}, \\ S_U&=\mathcal K^\star(U^2), & V&\leftarrow S_U^{-1/2} \left(S_U^{1/2} B S_U^{1/2}\right)^{1/2} S_U^{-1/2}. \end{aligned}

These inverse square roots are well-defined when K0K\succ0 and U,V,A,B0U,V,A,B\succ0. This is Gurvits/operator scaling with prescribed targets; when all matrices are diagonal it reduces to classical Sinkhorn scaling, and when the targets are proportional to identities it matches the usual bistochastic operator-scaling normalization, up to the conventional trace normalization.

References
  1. Maas, J., Rumpf, M., Schönlieb, C., & Simon, S. (2015). A generalized model for optimal transport of images including dissipation and density modulation. ESAIM: Mathematical Modelling and Numerical Analysis, 49(6), 1745–1769.
  2. Maas, J., Rumpf, M., & Simon, S. (2016). Generalized optimal transport with singular sources. arXiv Preprint arXiv:1607.01186.
  3. Dolbeault, J., Nazaret, B., & Savaré, G. (2009). A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2), 193–231.
  4. Mielke, A. (2013). Geodesic convexity of the relative entropy in reversible Markov chains. Calculus of Variations and Partial Differential Equations, 48(1–2), 1–31.
  5. Ning, L., & Georgiou, T. T. (2014). Metrics for matrix-valued measures via test functions. 53rd IEEE Conference on Decision and Control, 2642–2647.
  6. Jiang, X., Ning, L., & Georgiou, T. T. (2012). Distances and Riemannian metrics for multivariate spectral densities. IEEE Transactions on Automatic Control, 57(7), 1723–1735.
  7. Ning, L., Georgiou, T. T., & Tannenbaum, A. (2015). On matrix-valued Monge–Kantorovich optimal mass transport. IEEE Transactions on Automatic Control, 60(2), 373–382. 10.1109/TAC.2014.2350171
  8. Chen, Y., Georgiou, T. T., & Tannenbaum, A. (2016). Matrix optimal mass transport: a quantum mechanical approach. arXiv Preprint arXiv:1610.03041.
  9. Chen, Y., Gangbo, W., Georgiou, T. T., & Tannenbaum, A. (2020). On the matrix Monge-Kantorovich problem. European Journal of Applied Mathematics, 31(4), 574–600. 10.1017/S0956792519000172
  10. Carlen, E. A., & Maas, J. (2014). An analog of the 2-Wasserstein metric in non-commutative probability under which the fermionic Fokker–Planck equation is gradient flow for the entropy. Communications in Mathematical Physics, 331(3), 887–926.
  11. Peyré, G., Chizat, L., Vialard, F.-X., & Solomon, J. (2019). Quantum entropic regularization of matrix-valued optimal transport. European Journal of Applied Mathematics, 30(6), 1079–1102. 10.1017/S0956792517000274
  12. Loiola, E. M., de Abreu, N. M. M., Boaventura-Netto, P. O., Hahn, P., & Querido, T. (2007). A survey for the quadratic assignment problem. European Journal of Operational Research, 176(2), 657–690. 10.1016/j.ejor.2005.09.032
  13. Lyzinski, V., Fishkind, D. E., Fiori, M., Vogelstein, J. T., Priebe, C. E., & Sapiro, G. (2016). Graph matching: relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 60–73.
  14. Mémoli, F. (2011). Gromov–Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4), 417–487.
  15. Sturm, K.-T. (2012). The space of spaces: curvature bounds and gradient flows on the space of metric measure spaces (Preprint 1208.0434). arXiv.