Semi-discrete and Wasserstein-1

This chapter develops three computational consequences of duality. Eliminating one potential gives the semi-dual; for discrete measures, this viewpoint leads to auction algorithms, while a continuous source and discrete target lead to Laguerre-cell geometry. The final part specializes duality to $\Wass_1$, where Lipschitz functions and flow fields replace convex potentials. The material connects auction and network-flow methods (Bertsekas, 1992; Bertsekas & Eckstein, 1988), computational geometry (Aurenhammer et al., 1998; Mérigot, 2011; Mérigot, 2013), and the Kantorovich–Rubinstein and Beckmann formulations (Kantorovich & Rubinstein, 1958; Beckmann, 1952).

from pathlib import Path
import sys

from IPython.display import Image as DisplayImage
from IPython.display import display

here = Path.cwd()
myst_dir = None
for candidate in [here, here.parent, here / "myst", here.parent / "myst", here.parent.parent / "myst"]:
    if (candidate / "ot4ml_web.py").exists():
        myst_dir = candidate.resolve()
        sys.path.insert(0, str(myst_dir))
        break

if myst_dir is None:
    raise RuntimeError("Could not locate myst/ot4ml_web.py")

repo_root = myst_dir.parent
thumbnails = repo_root / "notebooks-figures" / "thumbnails"

def show_book_figure(name, width=760):
    display(DisplayImage(filename=str(thumbnails / f"{name}.png"), width=width))

Semi-dual

The semi-dual eliminates one potential by an exact $c$-transform. It preserves concavity while removing the explicit pointwise inequality constraint.

General Measure Semi-dual

For arbitrary measures, partial maximization converts the constrained two-potential dual into an unconstrained optimization over one function.

Denote the extended full-dual objective by

\mathcal E_0(f,g) \eqdef \begin{cases} \displaystyle \int_\X f\,\d\alpha+\int_\Y g\,\d\beta, & (f,g)\in\Potentials(c),\\ -\infty,&\text{otherwise}. \end{cases}

(1)

Thus $\mathcal L_c(\alpha,\beta)=\max_{f,g}\mathcal E_0(f,g)$. For fixed $g$, feasibility is equivalent to $f\leq g^{\bar c}$. Since $\alpha$ is nonnegative, the largest admissible choice $f=g^{\bar c}$ maximizes the objective and gives

\mathcal L_c(\alpha,\beta) = \sup_{g\in\Cc(\Y)}\mathcal E(g), \qquad \mathcal E(g) \eqdef \mathcal E_0(g^{\bar c},g) = \sup_{f\in\Cc(\X)}\mathcal E_0(f,g) = \int_\X g^{\bar c}\,\d\alpha+\int_\Y g\,\d\beta.

(2)

Partial maximization preserves concavity. Moreover, $\mathcal E(g+s)=\mathcal E(g)$ because $(g+s)^{\bar c}=g^{\bar c}-s$ and both measures have unit mass. Potentials are therefore defined only up to an additive constant, while the optimization is unconstrained.

Discrete Semi-dual

For two discrete measures

\alpha=\sum_{i=1}^n a_i\delta_{x_i}, \qquad \beta=\sum_{j=1}^m b_j\delta_{y_j}, \qquad \C_{ij}=c(x_i,y_j),

(3)

with common total mass $M$, the same notation is used for vectors $f\in\mathbb R^n$ and $g\in\mathbb R^m$:

\mathcal E_0(f,g) \eqdef \begin{cases} \langle f,a\rangle+\langle g,b\rangle,&f\oplus g\leq\C,\\ -\infty,&\text{otherwise}. \end{cases}

(4)

Eliminating the source vector gives

\mathcal L_{\C}(a,b) = \max_{g\in\mathbb R^m}\mathcal E(g), \qquad \mathcal E(g) = \mathcal E_0(g^{\bar\C},g) = \sum_{i=1}^n a_i(g^{\bar\C})_i + \sum_{j=1}^m b_jg_j,

(5)

where

(g^{\bar\C})_i=\min_{1\le j\le m}(\C_{ij}-g_j).

(6)

The function $\mathcal E$ is concave, piecewise affine, and invariant under $g\mapsto g+s\mathbf 1$. If ties are resolved by choosing $\sigma_g(i)\in\arg\min_j(\C_{ij}-g_j)$, then a supergradient is

b-\widehat b(g), \qquad \widehat b_j(g)=\sum_{i:\,\sigma_g(i)=j}a_i.

(7)

This supergradient is the mismatch between the desired target mass and the mass currently attracted by each target coordinate. At ties, splitting source mass among active targets describes the full superdifferential.
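The discrete semi-dual and its supergradient can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `c_transform`, `semi_dual` and `supergradient` are hypothetical, the data are random, and ties are broken by the smallest index.

```python
import numpy as np

def c_transform(C, g):
    """Discrete transform (g^C)_i = min_j (C_ij - g_j)."""
    return (C - g[None, :]).min(axis=1)

def semi_dual(C, a, b, g):
    """Semi-dual value E(g) = <a, g^C> + <b, g>."""
    return a @ c_transform(C, g) + b @ g

def supergradient(C, a, b, g):
    """One supergradient b - b_hat(g); ties go to the smallest index (assumption)."""
    sigma = np.argmin(C - g[None, :], axis=1)
    b_hat = np.bincount(sigma, weights=a, minlength=len(b))
    return b - b_hat

# Illustrative random instance with uniform weights of equal total mass.
rng = np.random.default_rng(0)
n, m = 6, 4
C = rng.random((n, m))
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
g = rng.normal(size=m)

value = semi_dual(C, a, b, g)
shifted = semi_dual(C, a, b, g + 3.0)   # same value: invariance under g -> g + s
grad = supergradient(C, a, b, g)        # coordinates sum to zero
```

Since the two measures have equal mass, `value` and `shifted` agree up to rounding, and the supergradient inequality $\mathcal E(h)\le\mathcal E(g)+\langle b-\widehat b(g),h-g\rangle$ can be checked for any test vector `h`.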

Auction Algorithm

The auction algorithm is derived from coordinate maximization of the semi-dual of the linear assignment problem. Its practical form uses bidder-specific dual-weight updates that cross the selected row’s next indifference threshold by $\varepsilon$; this controlled relaxation prevents jamming at nonsmooth ties. We follow the account of Mérigot and Thibert (Mérigot & Thibert, 2020), itself based on Bertsekas’ auction algorithm and its $\varepsilon$-scaling refinement (Bertsekas, 1981; Bertsekas & Eckstein, 1988; Bertsekas, 1992). In this section, $\C\in\mathbb R^{n\times n}$ and $n\ge2$. A permutation matrix $P$ represents the probability coupling $P/n$.

Coordinate Ascent and Discrete Laguerre Cells

Write $g_j$ for the target Kantorovich potential. Specializing the discrete semi-dual to uniform weights gives

\mathcal E(g) = \frac1n\sum_{i=1}^n(g^{\bar\C})_i+\frac1n\sum_{j=1}^n g_j, \qquad (g^{\bar\C})_i=\min_{1\le j\le n}(\C_{ij}-g_j).

(8)

Thus $(g^{\bar\C},g)$ is dual feasible. This is the discrete $\bar\C$-transform of Remark (Discrete $c$-transform), with the same sign convention as in the general and semi-discrete formulations.

The discrete Laguerre cells are

\operatorname{Lag}^{\mathrm D}_j(g) = \left\{i:\ \C_{ij}-g_j\le \C_{ik}-g_k\ \text{for every }k\right\}.

(9)

They can overlap at ties. They are the finite counterparts of the general semi-discrete Laguerre cells in Definition (Laguerre Cells and Power Diagrams): the displayed cell is that definition restricted to row indices. If every row has a unique minimizer, then

\frac{\partial}{\partial g_j}\mathcal E(g) = \frac{1-|\operatorname{Lag}^{\mathrm D}_j(g)|}{n}.

(10)

An overfull cell therefore calls for a decrease of its dual weight. Proposition (Dual Certificate for an Assignment) shows that a perfect matching in the contact graph $i\in\operatorname{Lag}^{\mathrm D}_j(g)$ certifies optimality of $g$; conversely, assignment duality and complementary slackness produce such a matching at every maximizer.

For $i\in\operatorname{Lag}^{\mathrm D}_j(g)$, define

\operatorname{bid}_j(g,i) = \min_{k\ne j}(\C_{ik}-g_k)-(\C_{ij}-g_j).

(11)

When the cell is nonempty, the largest maximizing decrement along the negative $j$th coordinate is the largest such bid (Mérigot & Thibert, 2020). At a tie, even this largest maximizing decrement can vanish, so naive coordinate ascent may jam before reaching a dual maximizer.
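A two-by-two example makes the bid computation concrete. The sketch below is illustrative only: the helper names `laguerre_cells`, `bid` and `semi_dual` are hypothetical, and the cost matrix is chosen by hand so that both rows prefer the first target at $g=0$.

```python
import numpy as np

def laguerre_cells(C, g):
    """Boolean matrix: cells[i, j] is True when row i lies in Lag_j(g)."""
    reduced = C - g[None, :]
    return np.isclose(reduced, reduced.min(axis=1, keepdims=True))

def bid(C, g, i, j):
    """bid_j(g, i): best alternative reduced cost minus that of target j."""
    reduced = C[i] - g
    return np.delete(reduced, j).min() - reduced[j]

def semi_dual(C, g):
    """Uniform-weight semi-dual value."""
    return ((C - g[None, :]).min(axis=1).sum() + g.sum()) / C.shape[0]

C = np.array([[1.0, 2.0], [1.0, 3.0]])   # hand-picked costs (assumption)
g = np.zeros(2)

cells = laguerre_cells(C, g)              # both rows sit in the cell of target 0
rows_in_cell0 = np.flatnonzero(cells[:, 0])
bids = [bid(C, g, i, 0) for i in rows_in_cell0]

g_new = g.copy()
g_new[0] -= max(bids)                     # exact coordinate step: largest bid
```

Here the bids are 1 and 2, and lowering $g_1$ by the largest bid raises the semi-dual from 1 to 1.5, which equals the optimal average assignment cost for this matrix.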

Bids and Relaxed Contacts

Auction avoids jamming by moving an unassigned row $\varepsilon$ beyond its next indifference point. Let $j_0$ and $j_1$ be its best and second-best targets:

j_0\in\arg\min_j(\C_{ij}-g_j), \qquad j_1\in\arg\min_{j\ne j_0}(\C_{ij}-g_j).

(12)

The winning dual weight is updated by

\Delta_i = \operatorname{bid}_{j_0}(g,i)+\varepsilon = (\C_{i,j_1}-g_{j_1})-(\C_{i,j_0}-g_{j_0})+\varepsilon, \qquad g_{j_0}\leftarrow g_{j_0}-\Delta_i.

(13)

Unlike exact coordinate maximization, this update uses the selected row’s bid rather than the maximum bid over the whole cell. Afterward, the reduced cost of $j_0$ is exactly $\varepsilon$ above that of the unchanged alternative $j_1$. The row nevertheless takes $j_0$; its former owner, if any, becomes unassigned. Because of this overshoot, a bid need not increase the nonsmooth semi-dual. Coordinate ascent motivates the update, but $\varepsilon$-complementary slackness is the invariant used in the convergence proof.

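A bid gives the newly assigned row $\varepsilon$-complementary slackness: right after the update, the reduced cost of $j_0$ exceeds the row minimum by exactly $\varepsilon$. Decreasing $g_{j_0}$ can only make $j_0$ less attractive to rows assigned elsewhere, and the previous owner of $j_0$ is removed. Every iteration therefore preserves the condition.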

Algorithm: Bertsekas Auction

Input: Cost matrix $\C\in\mathbb R^{n\times n}$ , tolerance $\varepsilon>0$ , initial target potential $g\in\mathbb R^n$ (default $g=0$ ).

Output: Permutation matrix $P$ and target potential $g$ ; the source potential is $g^{\bar\C}$ .

Initialize: Set $P=0$ .

While some row $i$ satisfies $\sum_jP_{ij}=0$ do:

Choose any such row $i$ .
Set $j_0\in\arg\min_j(\C_{ij}-g_j)$ and $j_1\in\arg\min_{j\ne j_0}(\C_{ij}-g_j)$ .
Set $\Delta\leftarrow(\C_{i,j_1}-g_{j_1})-(\C_{i,j_0}-g_{j_0})+\varepsilon$ .
Set $g_{j_0}\leftarrow g_{j_0}-\Delta$ .
If $P_{i_0,j_0}=1$ for some $i_0$, then set $P_{i_0,j_0}\leftarrow0$.
Set $P_{i,j_0}\leftarrow1$ .

Return $P$ and $g$ .
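The steps above can be sketched with dense scans in NumPy. This is a minimal sketch under stated assumptions: the names `auction` and `brute_force_cost` are hypothetical, unassigned rows are taken in stack order (the algorithm allows any order), and the brute-force check is only feasible for tiny $n$.

```python
import itertools
import numpy as np

def auction(C, eps, g=None):
    """Auction with dense scans; returns (P, g, number of bids). Needs n >= 2."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    g = np.zeros(n) if g is None else np.array(g, dtype=float)
    owner = np.full(n, -1)                 # owner[j] = row holding column j, or -1
    P = np.zeros((n, n), dtype=int)
    unassigned = list(range(n))
    n_bids = 0
    while unassigned:
        i = unassigned.pop()               # any unassigned row is allowed
        reduced = C[i] - g
        j0 = int(np.argmin(reduced))
        j1 = int(np.argmin(np.where(np.arange(n) == j0, np.inf, reduced)))
        delta = reduced[j1] - reduced[j0] + eps
        g[j0] -= delta
        if owner[j0] >= 0:                 # evict the former owner of j0
            P[owner[j0], j0] = 0
            unassigned.append(int(owner[j0]))
        P[i, j0] = 1
        owner[j0] = i
        n_bids += 1
    return P, g, n_bids

def brute_force_cost(C):
    """Exact optimal total cost by enumeration (tiny n only)."""
    n = C.shape[0]
    return min(sum(C[i, p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
C = rng.random((5, 5))
eps = 1e-3
P, g, n_bids = auction(C, eps)
gap = (C * P).sum() - brute_force_cost(C)  # total-cost gap, at most n * eps
```

On this instance the returned matrix is a permutation, and the total-cost gap stays below $n\varepsilon$, which matches the normalized bound once divided by $n$.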

The figure below shows actual iterates on planar point clouds that are also used in another figure. For the current target potential, define the auction reduced costs

r_{ij}(g)=\C_{ij}-g_j-(g^{\bar\C})_i\geq 0.

(15)

Exact zeros are the discrete Laguerre contacts of row $i$, whereas an owned edge only needs to satisfy $r_{ij}(g)\leq\varepsilon$ by $\varepsilon$-complementary slackness.

At an intermediate state, the current ownership matching is $M=\{(i,j):P_{ij}=1\}$ ; the labels in the figure report its cardinality.

Geometric progression of the unit-mass transportation auction. Thick violet segments form the current partial ownership matching, while thin translucent segments show the $2n$ unmatched edges with smallest auction reduced costs. The labels report matching cardinality and cumulative bid count. With $\varepsilon=0.002$ , the final assignment is reached after 505 bids and coincides with the exact squared-distance optimum.

Interactive panel. Vary the bid increment and inspect how assignments and target dual weights evolve toward complementary slackness.

Fixed Tolerance

For a prescribed tolerance, relaxed contact gives both an optimality certificate and a finite bound on the number of bids.

Proposition: Fixed-$\varepsilon$ Auction Convergence and Complexity

Set $R_{\C}=\max_{i,j}\C_{ij}-\min_{i,j}\C_{ij}$. Started from $g=0$, Algorithm (Bertsekas Auction) terminates after at most

n\left(\left\lfloor R_{\C}/\varepsilon\right\rfloor+1\right)

(16)

bids. It returns a permutation matrix satisfying $\varepsilon$-complementary slackness and

0 \le \frac1n\langle\C,P\rangle - \min_{P'\in\mathcal P_n^{\mathrm{perm}}}\frac1n\langle\C,P'\rangle \le \varepsilon.

(17)

Dense scans require $O(n^2(1+R_{\C}/\varepsilon))$ operations and $O(n^2)$ storage. If $\C$ is integer-valued and $\varepsilon<1/n$, the assignment is exactly optimal.

$\varepsilon$-Scaling

The cold-start estimate above exposes the limitation of a single tolerance. A large $\varepsilon$ is fast but gives only a coarse certificate, whereas the small $\varepsilon$ required for high accuracy can produce a bid count proportional to $R_{\C}/\varepsilon$ . Continuation first learns a rough dual-potential landscape and then sharpens it instead of restarting from zero.

Algorithm (Auction With $\varepsilon$-Scaling) starts at the cost scale $\max\{R_{\C},\eta\}$ and halves the tolerance until it reaches the requested value $\eta$. Each phase rebuilds the ownership matrix but retains the target potential from the preceding phase. The previous complete assignment already certifies approximate contact for this potential, so the next auction refines an existing dual landscape.

Algorithm: Auction With $\varepsilon$-Scaling

Input: Cost matrix $\C\in\mathbb R^{n\times n}$ , final tolerance $\eta>0$ .

Output: Permutation matrix $P$ and target potential $g$ ; the source potential is $g^{\bar\C}$ .

Initialize: Set $R_{\C}\leftarrow\max_{i,j}\C_{ij}-\min_{i,j}\C_{ij}$ and $g\leftarrow0$ .

If $R_{\C}=0$, then return any permutation matrix and $g=0$.

Set $\varepsilon\leftarrow\max\{R_{\C},\eta\}$ .

While $\varepsilon>\eta$ do:

Set $(P,g)\leftarrow\operatorname{Auction}(\C,\varepsilon,g)$ .
Set $\varepsilon\leftarrow\max\{\varepsilon/2,\eta\}$ .

Set $(P,g)\leftarrow\operatorname{Auction}(\C,\eta,g)$ .

Return $P$ and $g$ .
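The continuation loop can be sketched as follows. This is an illustrative sketch, not the book's implementation: `auction` and `auction_scaling` are hypothetical names, each phase restarts ownership from scratch while keeping the potential, and the integer test matrix and tolerance are arbitrary choices.

```python
import itertools
import numpy as np

def auction(C, eps, g):
    """One auction phase started from potential g; ownership is rebuilt."""
    n = C.shape[0]
    g = np.array(g, dtype=float)
    owner = np.full(n, -1)                  # owner[j] = row holding column j
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        reduced = C[i] - g
        j0 = int(np.argmin(reduced))
        best = reduced[j0]
        reduced[j0] = np.inf                # mask j0 to find the second best
        g[j0] -= reduced.min() - best + eps
        if owner[j0] >= 0:
            unassigned.append(int(owner[j0]))
        owner[j0] = i
    P = np.zeros((n, n), dtype=int)
    P[owner, np.arange(n)] = 1
    return P, g

def auction_scaling(C, eta):
    """Halve the tolerance from max(R, eta) down to eta, reusing g."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    R = C.max() - C.min()
    g = np.zeros(n)
    if R == 0:
        return np.eye(n, dtype=int), g, 0
    eps, phases = max(R, eta), 0
    while eps > eta:
        P, g = auction(C, eps, g)
        eps = max(eps / 2, eta)
        phases += 1
    P, g = auction(C, eta, g)
    return P, g, phases + 1

rng = np.random.default_rng(1)
C = rng.integers(0, 20, size=(6, 6)).astype(float)
eta = 0.1                                   # below 1/n = 1/6
P, g, phases = auction_scaling(C, eta)
exact = min(sum(C[i, p[i]] for i in range(6))
            for p in itertools.permutations(range(6)))
```

With integer costs and $\eta<1/n$, the cost of `P` coincides with the brute-force optimum `exact`, and `phases` equals $1+\lceil\log_2(R_{\C}/\eta)\rceil$ on this instance.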

Proposition: Complexity of $\varepsilon$-Scaling

For $\eta>0$, Algorithm (Auction With $\varepsilon$-Scaling) returns a permutation satisfying $\eta$-complementary slackness and (17) with $\varepsilon$ replaced by $\eta$. If $R_{\C}>0$, it uses at most

1+\left\lceil\log_2^+\!\left(\frac{R_{\C}}{\eta}\right)\right\rceil, \qquad \log_2^+(s)=\max\{0,\log_2 s\},

(19)

auction phases. Its dense worst-case complexity is

O\!\left(n^3\left(1+\log_+\frac{R_{\C}}{\eta}\right)\right), \qquad \log_+(s)=\max\{0,\log s\}.

(20)

If $R_{\C}=0$ , it terminates immediately. For integer-valued $\C$ and $\eta<1/n$ , its output is exactly optimal.

A cold-started $\eta$ -auction has bound $O(n^2(1+R_{\C}/\eta))$ , while scaling uses a logarithmic number of $O(n^3)$ phases. Thus scaling is a high-accuracy guarantee, not an unconditional speedup: its bound improves the cold-start estimate when $R_{\C}/\eta$ is large compared with $n(1+\log_+(R_{\C}/\eta))$ . For integer costs, choosing $\eta<1/n$ gives an exact assignment in $O(n^3(1+\log_+(nR_{\C})))$ operations.

Semi-discrete

The semi-discrete case is the setting where dual potentials become weights of Laguerre cells. This gives both geometry and algorithms for quantization and density fitting.

Discrete Targets and Laguerre Cells

Consider the case where

\beta=\sum_{j=1}^m b_j\delta_{y_j}

(22)

has distinct atoms and positive weights; zero-weight atoms can be removed. The same construction applies if $\alpha$ is discrete, after exchanging the roles of $\alpha$ and $\beta$. Restricting the minimization in Definition ($c$-Transform) to the support of $\beta$, equivalently applying that definition with the discrete target space $\Y=\{y_j\}_{j=1}^m$ and identifying a vector $g\in\RR^m$ with the function $g:\Y\to\RR$ defined by $g(y_j)=g_j$, gives the discrete $\bar c$-transform

g^{\bar c}(x) \eqdef \min_{1\le j\le m} \bigl(c(x,y_j)-g_j\bigr).

(23)

This maps a vector $g$ to a continuous function because it is the minimum of finitely many continuous functions. Using this transform when $\beta$ is discrete yields the finite-dimensional semi-dual

\mathcal{L}_c(\alpha,\beta) = \max_{g\in\RR^m} \mathcal{E}(g) \eqdef \mathcal E_0(g^{\bar c},g) = \int_\X g^{\bar c}(x)\,\d\alpha(x) + \sum_{j=1}^m g_j b_j .

(24)

The objective is invariant under $g\mapsto g+s\mathbf 1$, so one may impose the gauge $\sum_j g_j=0$.

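The geometric object encoded by the dual weights is a weighted nearest-neighbor diagram: each source point is assigned to the target atom that realizes the discrete $\bar c$-transform. The resulting Laguerre cell of $y_j$ is $\mathcal L_j(g)=\{x:\ c(x,y_j)-g_j\le c(x,y_k)-g_k\ \text{for every }k\}$.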
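On sampled source points, the discrete $\bar c$-transform and the induced cell labels reduce to a row-wise minimum. The sketch below assumes the squared Euclidean cost and random illustrative data; the helper names `reduced_costs`, `c_transform` and `laguerre_labels` are hypothetical.

```python
import numpy as np

def reduced_costs(x, y, g):
    """Matrix c(x_k, y_j) - g_j for the squared Euclidean cost (assumption)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return sq - g[None, :]

def c_transform(x, y, g):
    """Values of the discrete transform at the points x."""
    return reduced_costs(x, y, g).min(axis=1)

def laguerre_labels(x, y, g):
    """Index of the Laguerre cell containing each point (smallest index at ties)."""
    return reduced_costs(x, y, g).argmin(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))      # samples standing in for the source
y = rng.normal(size=(5, 2))         # target atoms
g = np.zeros(5)

labels_voronoi = laguerre_labels(x, y, g)   # g = 0 gives Voronoi cells
g_shift = g + 2.5                            # a constant shift leaves cells unchanged

g_up = g.copy()
g_up[0] += 0.5                               # raising g_0 can only enlarge cell 0
count_before = np.sum(labels_voronoi == 0)
count_after = np.sum(laguerre_labels(x, y, g_up) == 0)
```

The comparison of `count_before` and `count_after` illustrates the monotone dependence of a cell on its own weight.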

For quadratic costs, varying the dual weights moves the walls between adjacent cells while keeping them parallel. This is the geometric mechanism by which the cell masses are adjusted.

The figure below follows this adjustment from unweighted Voronoi cells to a power diagram whose cell masses match the prescribed discrete target weights.

Laguerre cells for semi-discrete quadratic transport. The red contours show a continuous source density $\alpha$ given by a three-component Gaussian mixture on the right. The twenty-one colored circular sites are the atoms of the discrete target $\beta$ , sampled from a compact cloud on the left; each site color matches its Laguerre cell. Starting from ordinary Voronoi cells, semi-dual weight updates deform the cells so that the $\alpha$ -mass captured by each cell approaches the prescribed target mass.

The interactive demo exposes the dual-weight mechanism directly. Increase the number of weight updates to watch cells with too little mass expand and cells with too much mass shrink.

Interactive panel. Use the weight and seed controls to deform Laguerre cells and watch how their areas respond to semi-discrete masses.

Mass Balance

The semi-dual energy can be rewritten as

\mathcal{E}(g) = \sum_{j=1}^m \int_{\mathcal{L}_j(g)} \left(c(x,y_j)-g_j\right)\,\d\alpha(x) + \langle g,b\rangle .

(26)

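When cell boundaries have zero $\alpha$-measure, the partial derivatives are $\partial_{g_j}\mathcal E(g)=b_j-\int_{\mathcal L_j(g)}\d\alpha$. The first-order optimality condition therefore says that solving the semi-discrete dual amounts to choosing weights $g$ so that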

\int_{\mathcal{L}_j(g)}\d\alpha=b_j \qquad\text{for every }j.

(30)

The gradient components sum to zero, consistently with the gauge invariance. Conversely, balanced cells define the piecewise-constant map $T(x)=y_j$ on $\mathcal{L}_j(g)$ . Its graph lies in the contact set $g^{\bar c}(x)+g_j=c(x,y_j)$ , so continuous complementary slackness proves that both the map and the weights are optimal. For the quadratic cost, uniqueness follows from Brenier’s theorem when $\alpha$ has a density.

The sign of the gradient has a direct geometric interpretation. Increasing $g_j$ lowers the corresponding power distance and expands $\mathcal L_j(g)$; decreasing $g_j$ shrinks it. Semi-dual ascent can therefore be read as a mass-balancing procedure on a power diagram.

The figure below makes the sign of this gradient geometric; its dotted outline marks the balanced cell.

Dual weights control Laguerre cell masses in the semi-discrete quadratic problem. The same blue target sites and red Gaussian source density are used in all panels; only the highlighted violet weight is changed. The dotted violet outline is the balanced cell. If the highlighted cell has too little source mass, then $b_j-\alpha(\mathcal L_j(g))>0$ and the ascent update increases the weight, expanding the cell outward. If it has too much mass, the update decreases the weight, shrinking it inward. At balance, the cell mass matches the prescribed target mass and the first-order update vanishes.

Interactive panel. Vary the target weights and the number of dual updates to watch Laguerre cells rebalance their masses.

Quadratic power diagrams have polyhedral cells and can be computed efficiently using computational-geometry algorithms (Aurenhammer, 1987; Aurenhammer et al., 1998; Mérigot, 2011). Expanding the cost shows that $\mathcal L_j(g)$ is the region where the affine function $x\mapsto-2\langle x,y_j\rangle+\norm{y_j}^2-g_j$ is smallest among all indices. The lower envelope of these affine functions gives the power diagram, while the lower hull of the lifted sites $(y_j,\norm{y_j}^2-g_j)$ gives its dual regular triangulation. For a planar source, this is a three-dimensional hull and Chan’s output-sensitive algorithm costs $O(m\log Q)$ for $Q$ hull vertices (Chan, 1996). A three-dimensional source lifts to four dimensions and is not covered by that particular bound.
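The affine reformulation is easy to check numerically: dropping the common term $\norm{x}^2$ does not change which index attains the minimum. The sketch below compares both labelings on random data; the function names are hypothetical and no hull computation is attempted.

```python
import numpy as np

def power_labels_direct(x, y, g):
    """Cell index from the quadratic cost ||x - y_j||^2 - g_j."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return (sq - g[None, :]).argmin(axis=1)

def power_labels_lifted(x, y, g):
    """Cell index from the affine functions -2<x, y_j> + ||y_j||^2 - g_j."""
    heights = (y ** 2).sum(axis=1) - g      # last coordinate of the lifted sites
    affine = -2.0 * x @ y.T + heights[None, :]
    return affine.argmin(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 2))
y = rng.normal(size=(8, 2))
g = rng.normal(scale=0.3, size=8)

direct = power_labels_direct(x, y, g)
lifted = power_labels_lifted(x, y, g)       # same cells, computed from the lift
```

Both labelings agree away from exact ties, which is why lower-envelope and convex-hull routines can be used for quadratic costs.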

Stochastic Optimization

The semi-discrete formulation is useful because the objective is an expectation with respect to $\alpha$ :

\mathcal{E}(g) = \int_\X E(g,x)\,\d\alpha(x) = \EE_X(E(g,X)), \qquad E(g,x)\eqdef g^{\bar c}(x)+\langle g,b\rangle .

(31)

Away from cell boundaries, the stochastic gradient of the integrand is

\nabla_g E(g,x) = \left(b_j-\mathbf{1}_{\mathcal{L}_j(g)}(x)\right)_{j=1}^m,

(32)

an unbiased estimator of $\nabla\mathcal{E}(g)$ when cell boundaries have $\alpha$ -measure zero. One can therefore maximize the semi-dual without first discretizing $\alpha$ : the measure is used as a black box from which independent samples are drawn, a natural setup in high-dimensional statistics and machine learning.

Starting from $g^{(0)}=0$ , stochastic gradient ascent draws $x_\ell\sim\alpha$ and performs

g^{(\ell+1)} \eqdef g^{(\ell)} + \tau_\ell\nabla_g E(g^{(\ell)},x_\ell).

(33)

The stochastic supergradient has zero coordinate sum and preserves the gauge. For almost-sure stochastic-approximation convergence, one typically imposes

\sum_{\ell=0}^{\infty}\tau_\ell=\infty, \qquad \sum_{\ell=0}^{\infty}\tau_\ell^2<\infty.

(34)

For example, one may use $\tau_\ell=\tau_0(1+\ell/\ell_0)^{-q}$ with $1/2<q\leq1$ . The standard finite-horizon rate instead concerns averaged iterates.

This stochastic viewpoint is one of the main algorithmic advantages of the semi-discrete formulation (Mérigot, 2011; Genevay et al., 2016).

Algorithm: Semi-discrete Laguerre Ascent

Input: Source measure $\alpha$ , target atoms $(y_j,\b_j)$ , cost $c$ , steps $(\tau_k)_{k=0}^{K-1}$ , tolerance $\mathrm{tol}$ , maximum iteration count $K$ .

Output: Semi-discrete dual weights $\gD$ and Laguerre cells.

Initialize: Set $\gD^{(0)}=0$ .

For $k=0,\ldots,K-1$ do:

Compute cells: $\Laguerre_j(\gD^{(k)}) = \enscond{x}{c(x,y_j)-\gD^{(k)}_j\leq c(x,y_\ell)-\gD^{(k)}_\ell\quad\forall \ell}.$
Compute masses: $m_j^{(k)}=\int_{\Laguerre_j(\gD^{(k)})}\d\al .$
If $\max_j\abs{m_j^{(k)}-\b_j}\leq\mathrm{tol}$ then return $\gD^{(k)}$ and the current cells.
Update $\gD^{(k+1)} = \gD^{(k)}+\tau_k\bigl(\b-m^{(k)}\bigr).$

Compute the cells of $\gD^{(K)}$ .

Return $\gD^{(K)}$ and these cells.
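The deterministic ascent can be sketched by replacing $\alpha$ with quadrature nodes. This is an illustrative sketch under several assumptions: squared Euclidean cost, a constant step instead of a sequence $(\tau_k)$, a uniform source on $[0,1]$ discretized on a midpoint grid, and a hypothetical function name `laguerre_ascent`.

```python
import numpy as np

def laguerre_ascent(x, w, y, b, tau, tol=1e-3, max_iter=100):
    """Ascent on quadrature nodes x with weights w (summing to 1)."""
    g = np.zeros(len(b))
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    for k in range(max_iter):
        labels = (sq - g[None, :]).argmin(axis=1)               # compute cells
        masses = np.bincount(labels, weights=w, minlength=len(b))
        if np.abs(masses - b).max() <= tol:
            return g, labels, k
        g = g + tau * (b - masses)                              # ascent update
    labels = (sq - g[None, :]).argmin(axis=1)
    return g, labels, max_iter

# Uniform source on [0, 1], two atoms with prescribed masses 0.3 and 0.7.
N = 10_000
x = ((np.arange(N) + 0.5) / N)[:, None]
w = np.full(N, 1.0 / N)
y = np.array([[0.2], [0.8]])
b = np.array([0.3, 0.7])

g, labels, iters = laguerre_ascent(x, w, y, b, tau=0.5)
```

In this one-dimensional example the cell boundary sits at $x=0.5+(g_1-g_2)/1.2$, so the balanced weights satisfy $g_1-g_2=-0.24$; the returned `g` approaches this value up to the tolerance.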

Algorithm: Stochastic semi-discrete ascent

Input: Source sampler $x\sim\alpha$ , target atoms $(y_j,\b_j)$ , steps $(\tau_\ell)_{\ell=0}^{L-1}$ , iteration count $L$ .

Output: Stochastic semi-discrete dual weights $\gD$ .

Initialize: Set $\gD^{(0)}=0$ .

For $\ell=0,\ldots,L-1$ do:

Draw $x_\ell\sim\alpha$ .
Set $j_\ell=\min\argmin_j\bigl(c(x_\ell,y_j)-\gD_j^{(\ell)}\bigr)$ .
For $j=1,\ldots,m$ do

$\gD_j^{(\ell+1)} = \gD_j^{(\ell)} + \tau_\ell\bigl(\b_j-\ones_{\{j=j_\ell\}}\bigr).$

Return $\bar\gD_L=L^{-1}\sum_{\ell=0}^{L-1}\gD^{(\ell)}$ .
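A sample-based version of this loop is sketched below. The sketch assumes the squared Euclidean cost and the step schedule $\tau_\ell=\tau_0(1+\ell/\ell_0)^{-q}$; the function name `stochastic_ascent`, the default constants and the uniform source on $[0,1]$ are illustrative assumptions.

```python
import numpy as np

def stochastic_ascent(sample, y, b, n_steps, tau0=0.2, ell0=100.0, q=0.75, seed=0):
    """Averaged stochastic ascent; sample(rng) draws one point of the source."""
    rng = np.random.default_rng(seed)
    g = np.zeros(len(b))
    g_sum = np.zeros(len(b))
    for ell in range(n_steps):
        g_sum += g                                   # accumulate g^(ell)
        x = sample(rng)
        j = int(np.argmin(((x - y) ** 2).sum(axis=1) - g))   # smallest index at ties
        step = tau0 * (1.0 + ell / ell0) ** (-q)
        grad = b.copy()
        grad[j] -= 1.0                               # b - indicator of the active cell
        g = g + step * grad
    return g_sum / n_steps

# Same toy problem as a uniform source on [0, 1] with two atoms.
y = np.array([[0.2], [0.8]])
b = np.array([0.3, 0.7])
g_bar = stochastic_ascent(lambda rng: rng.random(1), y, b, n_steps=20_000)
boundary = 0.5 + (g_bar[0] - g_bar[1]) / 1.2          # cell boundary implied by g_bar
```

For this problem the balanced boundary is $x=0.3$; the averaged iterate gives a boundary close to that value, with fluctuations that depend on the seed and the step schedule.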

Optimal Quantization

Optimal quantization asks for the best discrete approximation of a measure by $m$ codepoints. It is the geometric core of vector quantization, compression and $k$ -means clustering.

Free Masses and Prescribed Weights

The classical problem optimizes both codepoint positions and their probabilities. For a measure $\alpha$ , define

\mathcal{Q}_m(\alpha) \eqdef \min_{Y=(y_j)_{j=1}^m,\;b\in\simplex_m} \Wass_p\left(\alpha,\sum_{j=1}^m b_j\delta_{y_j}\right).

(37)

This problem is classical in approximation theory and information theory (Graf & Luschgy, 2000; Lloyd, 1982).

The equal-weight case, $b_j=1/m$ , prescribes the weights and is treated at the end of this section.

The deterministic quantization error typically decays like $m^{-1/d}$ in dimension $d$. This rate mirrors the empirical optimal-transport sample-complexity rate: both are governed by the spacing $m^{-1/d}$ of points in dimension $d$. Quantization is best-case and deterministic, while empirical OT is random, but both display the same curse of dimensionality. Zador’s theorem further identifies the sharp asymptotic constant and limiting codepoint density (Graf & Luschgy, 2000).

For fixed codepoints $Y$, the powered cost $b\mapsto\Wass_p^p(\alpha,\sum_jb_j\delta_{y_j})$ is convex, whereas its $p$-th root $\Wass_p$ need not be. The dependence on $Y$ is nonconvex, and minimizing over $Y$ is computationally hard in general. The rest of this section distinguishes the free-mass Lloyd reduction from the fixed-weight geometry underlying finite-particle $\mathcal W_2$ gradient flows.

Lloyd Algorithm

The computational appeal of quantization comes from splitting the nonconvex search over sites into two elementary operations. For fixed sites, the optimal assignment is purely local: each point is sent to one of its nearest sites, and the resulting cells are Voronoi cells. This is the assignment step behind Lloyd’s algorithm and the $k$ -means method.

Proposition: Free Masses Give Voronoi Cells

For the cost $c(x,y)=d(x,y)^p$ , fix distinct codepoints $Y=(y_j)_{j=1}^m$ . Duplicate codepoints can be merged beforehand. Minimizing over the weights $b\in\simplex_m$ gives

\min_{b\in\simplex_m} \Wass_p^p \left(\alpha,\sum_j b_j\delta_{y_j}\right) = \int_\X \min_{1\le j\le m} c(x,y_j)\,\d\alpha(x).

(40)

An optimal coupling is induced by sending each $x$ to a nearest codepoint. The corresponding cells are the Voronoi cells

\mathcal{V}_j(Y) \eqdef \left\{ x : c(x,y_j)\le c(x,y_{j'}) \quad\text{for all }j' \right\},

(41)

up to arbitrary tie-breaking on common boundaries.

Consequently, the quantization energy can be written in nearest-centroid form:

\mathcal{Q}_m(\alpha)^p = \min_Y \mathcal{F}(Y), \qquad \mathcal{F}(Y) \eqdef \int_\X \min_{1\le j\le m} c(x,y_j)\,\d\alpha(x).

(42)

At a differentiability point of this energy, any local minimizer with nonempty cells satisfies the centroid condition

y_j \in \operatorname*{arg\,min}_{y} \int_{\mathcal{V}_j(Y)} c(x,y)\,\d\alpha(x).

(43)

For the squared Euclidean cost, this becomes

y_j = \frac{\int_{\mathcal{V}_j(Y)} x\,\d\alpha(x)} {\int_{\mathcal{V}_j(Y)} \d\alpha}.

(44)

Lloyd’s algorithm, also known as the $k$-means algorithm, iterates this fixed point: assign points to nearest sites, then replace each site by the centroid of its cell (Lloyd, 1982). The assignment and centroid steps each minimize the appropriate block, so the objective cannot increase. Because the problem is nonconvex, this monotonicity does not guarantee a global minimizer. For a finite data set with squared Euclidean loss, $k$-means++ gives an expected logarithmic approximation guarantee (Arthur & Vassilvitskii, 2007).

Continuous Lloyd Flow

There is also an infinitesimal version of Lloyd’s fixed point, but it should first be understood on finite labelled configurations. Assume that $c(x,y)=\norm{x-y}^2$ and that $\alpha$ does not charge Voronoi boundaries. For a configuration $Y$ , define, on nonempty cells,

a_j(Y)=\alpha(\mathcal V_j(Y)), \qquad b_j(Y)=\frac{1}{a_j(Y)} \int_{\mathcal V_j(Y)}x\,\d\alpha(x)

(45)

as the cell mass and centroid. Empty cells are singular points of the vector field; one either freezes them, as in the algorithm below, or reseeds them. The relaxed step

y_j^{(k+1)} = y_j^{(k)}+\tau\big(b_j(Y^{(k)})-y_j^{(k)}\big), \qquad 0<\tau\le 1,

(46)

is an explicit-Euler step for the cell-mass preconditioned gradient flow of the quantization energy $\mathcal F$ ,

\dot y_j(t)=b_j(Y_t)-y_j(t).

(47)

Indeed, at differentiability points of $\mathcal F$ , the envelope theorem gives

\nabla_{y_j}\mathcal F(Y) = 2\int_{\mathcal V_j(Y)}(y_j-x)\d\alpha(x) = 2a_j(Y)(y_j-b_j(Y)),

(48)

so that

\dot y_j(t) = -\frac{1}{2a_j(Y_t)}\nabla_{y_j}\mathcal F(Y_t).

(49)

Equivalently, this is the gradient flow of $\mathcal F$ for the site metric $g_Y(U,V)=2\sum_j a_j(Y)\langle u_j,v_j\rangle$ ; it is not the unweighted Euclidean gradient flow unless the masses are absorbed into the time step. Along smooth portions of the flow,

\frac{\d}{\d t}\mathcal F(Y_t) = -2\sum_j a_j(Y_t)\|b_j(Y_t)-y_j(t)\|^2 \le 0.

(50)
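The gradient formula (48) and the energy decay of the relaxed step (46) can be checked on an empirical measure. The sketch below uses random samples in place of $\alpha$; the names `energy`, `masses_and_centroids` and `relaxed_lloyd_step` are hypothetical, and empty cells are frozen.

```python
import numpy as np

def energy(x, Y):
    """Quantization energy F(Y) for the empirical measure on x."""
    sq = ((x[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return sq.min(axis=1).mean()

def masses_and_centroids(x, Y):
    """Voronoi masses a_j(Y) and centroids b_j(Y); empty cells keep their site."""
    sq = ((x[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    labels = sq.argmin(axis=1)
    counts = np.bincount(labels, minlength=Y.shape[0])
    a = counts / x.shape[0]
    b = Y.copy()
    for j in np.flatnonzero(counts):
        b[j] = x[labels == j].mean(axis=0)
    return a, b

def relaxed_lloyd_step(x, Y, tau):
    """Explicit-Euler step y_j + tau * (b_j(Y) - y_j)."""
    _, b = masses_and_centroids(x, Y)
    return Y + tau * (b - Y)

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 2))
Y0 = rng.normal(size=(5, 2))

a, b = masses_and_centroids(x, Y0)
grad_formula = 2.0 * a[:, None] * (Y0 - b)     # formula (48) at Y0

Y = Y0.copy()
energies = [energy(x, Y)]
for _ in range(20):
    Y = relaxed_lloyd_step(x, Y, tau=0.5)
    energies.append(energy(x, Y))              # nonincreasing sequence
```

A central finite difference of `energy` at `Y0` reproduces `grad_formula` up to small errors caused by samples that switch cells, and the recorded energies never increase for $0<\tau\le1$.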

If $\eta_t=\sum_j w_j\delta_{y_j(t)}$ carries fixed positive weights, independent of the Voronoi masses $a_j(Y_t)$ , this labelled particle ODE is equivalently a weak continuity equation,

\partial_t\eta_t+\operatorname{div}(v_t\eta_t)=0, \qquad v_t(y_j(t))=b_j(Y_t)-y_j(t),

(51)

in the sense of the measure evolutions introduced elsewhere in this book. The weights $w_j$ in this transport equation are auxiliary weights for the moving labelled particles; they are not the Voronoi masses used to define the quantization energy. If one records instead the free-weight projection

\nu_{Y_t}=\sum_j a_j(Y_t)\delta_{y_j(t)},

(52)

then, formally,

\partial_t\nu_{Y_t}+\operatorname{div}(v_t\nu_{Y_t}) = \sum_j\dot a_j(Y_t)\delta_{y_j(t)}, \qquad v_t(y_j(t))=\dot y_j(t).

(53)

Thus the free-weight quantizer evolves by a balance equation, not by pure transport. This is why the construction is intrinsically finite-dimensional: Voronoi cells, centroids and labels define the velocity, and a canonical extension to arbitrary measures is not obtained by replacing $Y$ with the support of a measure. Indeed, any measure with dense support would have zero support-distance quantization error.

Mean-Field Limit and Ultrafast Diffusion

There is nevertheless a precise high-resolution continuum theory when the number $m$ of codepoints tends to infinity. If $\alpha=\rho\,\d x$, the limiting Eulerian variable is the density $\sigma$ of sites, meaning heuristically that $m\sigma(x)\,\d x$ codepoints lie in $\d x$. Thus the limit is $m\to+\infty$, not a limit in the exponent of the PDE. In one dimension, Caglioti, Golse and Iacobelli embed the ordered particle configuration in $L^2(0,1)$ and prove quantitative convergence of the discrete gradient flow toward a limiting flow (Caglioti et al., 2015). A perturbative two-dimensional analysis around the optimal hexagonal lattice is developed in Caglioti et al. (2018). For $p$-quantization in dimension $d$, set $r=p/d$, so $r=2/d$ for the quadratic cost used in this section, and write $\mathcal F_p(Y)=\int\min_j\norm{x-y_j}^p\,\d\alpha(x)$. For well-prepared configurations whose empirical site distributions converge to $\sigma\,\d x$, the rescaled energy is described, up to a universal cell-shape constant, by

\mathcal G_\rho(\sigma) \eqdef \int_\Omega \rho(x) \sigma(x)^{-r}\,\d x, \qquad \int_\Omega \sigma\,\d x=1.

(54)

Formally,

m^r\mathcal F_p(Y^{(m)})\simeq C_{p,d}\mathcal G_\rho(\sigma), \qquad m^r\mathcal Q_m(\alpha)^p \longrightarrow C_{p,d}\min_\sigma\mathcal G_\rho(\sigma).

(55)

Since the first variation is $-r\rho\sigma^{-r-1}$ , the formal $\mathcal W_2$ -gradient flow is the weighted ultrafast diffusion equation

\partial_t \sigma = -r\,\operatorname{div}\!\left( \sigma\nabla\!\left(\frac{\rho}{\sigma^{r+1}}\right) \right),

(56)

with periodic or no-flux boundary conditions. Iacobelli studies the associated one-dimensional very-fast-diffusion equation and its convergence to equilibrium (Iacobelli, 2019); Iacobelli, Patacchini and Santambrogio then use the JKO scheme and Wasserstein-gradient-flow tools to prove well-posedness, regularity estimates and convergence for a multidimensional weighted version (Iacobelli et al., 2019). When $\rho>0$, set $\omega=\rho^{1/(r+1)}$ and $u=\sigma/\omega$. The same equation becomes

\partial_t u = -\frac{r+1}{\omega}\operatorname{div}\!\left(\omega\nabla(u^{-r})\right),

(57)

which makes the negative exponent, hence the ultrafast-diffusion character, explicit. Its stationary site density is proportional to $\rho^{d/(d+p)}$ .
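The stationary density can be checked on a grid: among normalized site densities, $\sigma\propto\rho^{d/(d+p)}$ should give the smallest value of $\mathcal G_\rho$. The sketch below is a discretized sanity check with an arbitrary positive density $\rho$ on $[0,1]$; the helper names `G` and `normalize` are hypothetical.

```python
import numpy as np

def G(rho, sigma, r, dx):
    """Riemann-sum approximation of the limit energy int rho * sigma^(-r)."""
    return np.sum(rho * sigma ** (-r)) * dx

def normalize(sigma, dx):
    """Rescale a positive grid function to unit mass."""
    return sigma / (np.sum(sigma) * dx)

d, p = 1, 2
r = p / d
n_grid = 400
dx = 1.0 / n_grid
grid = (np.arange(n_grid) + 0.5) * dx

rho = normalize(1.0 + 0.8 * np.sin(2 * np.pi * grid), dx)   # illustrative density
sigma_star = normalize(rho ** (d / (d + p)), dx)            # claimed stationary density
sigma_unif = normalize(np.ones(n_grid), dx)                 # uniform sites
sigma_rho = normalize(rho, dx)                              # sites sampled like rho

values = {
    "stationary": G(rho, sigma_star, r, dx),
    "uniform": G(rho, sigma_unif, r, dx),
    "proportional to rho": G(rho, sigma_rho, r, dx),
}
```

Because $\sigma\mapsto\sigma^{-r}$ is convex, the discretized energy is convex too, and its minimum over normalized densities equals $\bigl(\sum\rho^{1/(1+r)}\,\Delta x\bigr)^{1+r}$, attained at `sigma_star`.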

The following figure shows the relaxed Lloyd flow in a two-dimensional toy problem.

Relaxed Lloyd flow from a source Gaussian-mixture initialization toward a different target Gaussian-mixture density. The blue contours and shading show the target density $\alpha$ , while the colored disks are the moving codepoints initialized from the source mixture. The faint curves trace the labelled sites under the explicit-Euler Lloyd ODE. The right panel displays the relative quantization energy, illustrating the monotone decay of the objective along the relaxed iterations.

The next figure follows the associated Voronoi cells and generators through Lloyd iterations, making the decrease of the quantization energy visible.

Lloyd quantization for the same continuous density and twenty-one initial sites as the Laguerre-cell figure. The red contours show the density $\alpha$ , while the colored disks are the current codepoints and have the same colors as their Voronoi cells. The iterations move the initially left-located sites toward the high-density region and reshape the cells according to centroidal Voronoi geometry.

The interactive demo separates the nonconvex geometry from the fixed-point update: increase the iteration counter and watch sites migrate toward the density before settling into a local centroidal configuration.

Interactive panel. Use the iteration and site controls to compare Lloyd-style quantization steps with the semi-discrete geometry.

Algorithm: Lloyd quantization

Input: Source measure $\alpha$ , initial codepoints $Y^{(0)}=(y_j^{(0)})_{j=1}^m$ , squared Euclidean cost, tolerance $\mathrm{tol}$ , maximum iteration count $K$ .

Output: Codepoints $Y=(y_j)_{j=1}^m$ .

Initialize: Set $d_0=+\infty$ and $k=0$ .

While $d_k>\mathrm{tol}$ and $k<K$ do:

Set $k\leftarrow k+1$ .
Compute Voronoi cells: $\VV_j(Y^{(k-1)}) = \enscond{x}{c(x,y_j^{(k-1)})\leq c(x,y_\ell^{(k-1)})\quad\forall \ell}.$
For each nonempty cell $\VV_j$ do

$y_j^{(k)} = \frac{\int_{\VV_j(Y^{(k-1)})}x\,\d\al(x)} {\int_{\VV_j(Y^{(k-1)})}\d\al(x)}.$

For each empty cell $\VV_j$ do:

Set $y_j^{(k)}=y_j^{(k-1)}$ .

Set $d_k=\max_j\norm{y_j^{(k)}-y_j^{(k-1)}}$ .

Return $Y^{(k)}$ .
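On a finite sample, the algorithm above is the usual $k$-means iteration. The sketch below is a minimal version with the same stopping rule; the function name `lloyd`, the Gaussian samples and the random initialization are illustrative assumptions.

```python
import numpy as np

def lloyd(x, Y0, tol=1e-6, max_iter=100):
    """Lloyd iterations on samples x; returns sites and the energy history."""
    Y = np.array(Y0, dtype=float)
    energies = []
    for _ in range(max_iter):
        sq = ((x[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)                  # Voronoi assignment
        energies.append(sq.min(axis=1).mean())      # energy before the update
        Y_new = Y.copy()                            # empty cells keep their site
        for j in range(Y.shape[0]):
            members = x[labels == j]
            if len(members) > 0:
                Y_new[j] = members.mean(axis=0)     # centroid step
        shift = np.linalg.norm(Y_new - Y, axis=1).max()
        Y = Y_new
        if shift <= tol:
            break
    return Y, np.array(energies)

rng = np.random.default_rng(0)
x = rng.normal(size=(3000, 2))
Y0 = x[rng.choice(len(x), size=6, replace=False)]   # initial sites drawn from the data
Y, energies = lloyd(x, Y0)
```

The recorded energies are nonincreasing, as expected from the two block minimizations, but the limit depends on the initialization.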

Quantization with Fixed Equal Weights

The free-mass formulation above optimizes the positions and the weights of the atoms. A different problem is obtained by prescribing the weights. In the equal-weight case, set

\nu_Y\eqdef \frac1m\sum_{j=1}^m\delta_{y_j}, \qquad \mathcal F_{\rm eq}(Y) \eqdef \frac12\mathcal W_2^2(\alpha,\nu_Y),

(58)

and minimize only over the positions $Y=(y_j)_j$ . Assume that the sites are distinct, that $\alpha$ has a density, and that cell boundaries have zero $\alpha$ -mass. Let $C_j(Y)$ be the Laguerre cell transported to $y_j$ , so that $\alpha(C_j(Y))=1/m$ , and define its centroid

\bar x_j(Y) \eqdef m\int_{C_j(Y)} x\,\d\alpha(x).

(59)

At differentiability points, the envelope theorem gives

\nabla_{y_j}\mathcal F_{\rm eq}(Y) = \frac1m\bigl(y_j-\bar x_j(Y)\bigr).

(60)

Locally, while labels remain optimally matched, the $\mathcal W_2$ metric on equal-weight empirical measures induces the particle metric $g_Y(U,V)=m^{-1}\sum_j\langle u_j,v_j\rangle$ . Hence the associated $\mathcal W_2$ gradient flow is the coupled system

\dot y_j(t) = \bar x_j(Y_t)-y_j(t), \qquad j=1,\ldots,m.

(61)

Equivalently, $\nu_{Y_t}$ satisfies a continuity equation with velocity $v_t(y_j(t))=\bar x_j(Y_t)-y_j(t)$ . This is the so-called finite-particle $\mathcal W_2$ gradient-flow viewpoint developed more systematically in Chapter Paragraph; the fixed-weight cells are Laguerre cells rather than the free-mass Voronoi cells used by Lloyd’s method.
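As a rough illustration of the flow $\dot y_j=\bar x_j(Y)-y_j$ , the sketch below replaces $\alpha$ by $n=mk$ equally weighted samples, so that the Laguerre cells become a balanced assignment giving exactly $k$ samples to each site. The helper `equal_weight_step`, the step size `tau` and the test data are assumptions of this example; the balanced assignment is computed with SciPy's `linear_sum_assignment` on a cost matrix whose columns are replicated $k$ times.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def equal_weight_step(X, Y, tau=1.0):
    """One explicit Euler step of dy_j/dt = xbar_j - y_j on samples X.

    X has n = m * k rows; every site receives exactly k samples (mass 1/m).
    Returns the updated sites and the energy (1/2) W_2^2 at the input sites.
    """
    n, m = len(X), len(Y)
    k = n // m
    if n != m * k:
        raise ValueError("the number of samples must be a multiple of m")
    cost = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # Replicating each site k times turns the balanced problem into a permutation.
    rows, cols = linear_sum_assignment(np.repeat(cost, k, axis=1))
    labels = np.empty(n, dtype=int)
    labels[rows] = cols // k  # replicated column index -> site index
    energy = cost[np.arange(n), labels].mean()
    # Centroid of the (discrete) equal-mass cell of each site.
    xbar = np.stack([X[labels == j].mean(axis=0) for j in range(m)])
    return Y + tau * (xbar - Y), energy

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))              # 200 samples standing in for alpha
Y = rng.uniform(-3.0, -2.0, size=(10, 2))  # m = 10 sites of mass 1/10 each
energies = []
for _ in range(20):
    Y, energy = equal_weight_step(X, Y, tau=0.5)
    energies.append(energy)
```

For any step `tau` in $(0,1]$ the recorded energies are non-increasing, because moving each site toward the centroid of its current cell lowers the cost of the current assignment and re-solving the assignment can only lower it further. Unlike the Voronoi step of Lloyd's method, every cell keeps mass $1/m$ throughout.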

Equal-Weight Quantization on the Line

The following classical scalar quantization result gives the precise form of the inverse-CDF rule for equal-weight quadratic quantization (Graf & Luschgy, 2000). The atoms are not exactly the midpoint quantiles in general; they are the averages of the quantile function over equal mass bins. Midpoint inverse-CDF samples are nevertheless asymptotically equivalent and are often the most convenient rule in numerical examples.

Proposition: One-Dimensional Equal-Weight Quantization

Let $\alpha\in\mathcal M_+^1(\mathbb R)$ have finite second moment and quantile function $Q=F_\alpha^{-1}$ . For

\mathcal Q_{m,\mathrm{eq}}(\alpha)^2 \eqdef \min_{y_1\le\cdots\le y_m} \mathcal W_2^2 \left(\alpha,\frac1m\sum_{i=1}^m\delta_{y_i}\right),

(62)

set $I_i=((i-1)/m,i/m]$ . Then the sorted minimizer is unique and its $i$ th atom is

y_i^\star = m\int_{I_i} Q(u)\,\d u,

(63)

and

\mathcal Q_{m,\mathrm{eq}}(\alpha)^2 = \sum_{i=1}^m \int_{I_i} \left|Q(u)-y_i^\star\right|^2\,\d u.

(64)

If $Q\in C^1([0,1])$ , then

m^2\mathcal Q_{m,\mathrm{eq}}(\alpha)^2 \longrightarrow \frac1{12}\int_0^1 |Q'(u)|^2\,\d u.

(65)

Thus the common deterministic rule $m^{-1}\sum_i\delta_{Q((i-1/2)/m)}$ should be read as a midpoint approximation of the optimal bin-average formula. Orthogonal projection onto constants shows that it has the same leading squared error under the same smoothness assumptions; for the uniform law on $[0,1]$ , both rules coincide and give the regular grid $y_i=(i-1/2)/m$ . Random sampling has a different asymptotic regime.
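The bin-average and midpoint rules can be compared numerically. The sketch below approximates the integrals over the quantile bins $I_i$ by a midpoint quadrature with `s` nodes per bin; the function name `quantization_errors` and the test quantile function $Q(u)=u+0.3\sin(2\pi u)/(2\pi)$ are assumptions chosen for illustration, for which $\frac1{12}\int_0^1|Q'|^2=\frac{1.045}{12}$ .

```python
import numpy as np

def quantization_errors(Q, m, s=400):
    """Squared W_2 errors of bin-average and midpoint atoms (numerical quadrature)."""
    # s quadrature nodes inside each quantile bin I_i = ((i-1)/m, i/m].
    u = (np.arange(m)[:, None] + (np.arange(s)[None, :] + 0.5) / s) / m
    q = Q(u)
    y_avg = q.mean(axis=1)                 # y_i* = m * int_{I_i} Q(u) du
    y_mid = Q((np.arange(m) + 0.5) / m)    # midpoint inverse-CDF rule
    # sum_i int_{I_i} |Q(u) - y_i|^2 du, approximated by an average over all nodes.
    err_avg = ((q - y_avg[:, None]) ** 2).mean()
    err_mid = ((q - y_mid[:, None]) ** 2).mean()
    return y_avg, y_mid, err_avg, err_mid

def Q_uniform(u):
    return u

def Q_wavy(u):
    # Increasing because Q'(u) = 1 + 0.3 cos(2 pi u) > 0.
    return u + 0.3 * np.sin(2 * np.pi * u) / (2 * np.pi)

m = 200
ya_u, ym_u, ea_u, em_u = quantization_errors(Q_uniform, m)
ya_w, ym_w, ea_w, em_w = quantization_errors(Q_wavy, m)
limit_uniform = 1.0 / 12.0
limit_wavy = (1.0 + 0.3 ** 2 / 2.0) / 12.0
```

For the uniform law the two sets of atoms coincide, and in both examples `m**2 * err_avg` should sit close to the predicted limit, with `err_mid` never smaller than `err_avg` since the bin average is the best constant on each bin.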

Proposition: Quantile-Process Asymptotics for Random Placement

Let $\alpha\in\mathcal M_+^1(\mathbb R)$ have quantile $Q\in C^1([0,1])$ , and let $\widehat\alpha_m=m^{-1}\sum_{i=1}^m\delta_{X_i}$ with $X_i$ i.i.d. with law $\alpha$ . If $B$ denotes the standard Brownian bridge on $[0,1]$ , then

m\,\mathcal W_2^2(\alpha,\widehat\alpha_m) \overset{\mathrm{law}}{\longrightarrow} \int_0^1 B(u)^2 |Q'(u)|^2\,\d u,

(69)

and, in expectation,

m\,\mathbb E\!\left[ \mathcal W_2^2(\alpha,\widehat\alpha_m) \right] \longrightarrow \int_0^1 u(1-u)|Q'(u)|^2\,\d u.

(70)

Combining these propositions gives a sharp contrast between optimal placement and random placement on the line. Deterministic equal-weight quantization has squared error of order $m^{-2}$ , hence $\mathcal W_2$ error of order $m^{-1}$ , while i.i.d. empirical sampling has expected squared error of order $m^{-1}$ , hence root-mean-square $\mathcal W_2$ error of order $m^{-1/2}$ . This is consistent with broader empirical OT sample-complexity theory (Dereich et al., 2013; Fournier & Guillin, 2015; Weed & Bach, 2019).
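A small Monte Carlo experiment makes the two rates concrete for the uniform law on $[0,1]$ , where $Q(u)=u$ and the predicted limits are $\frac1{12}$ for $m^2\mathcal Q_{m,\mathrm{eq}}^2$ and $\int_0^1u(1-u)\,\d u=\frac16$ for $m\,\mathbb E[\mathcal W_2^2]$ . The helper name, the value of `m`, the seed and the number of trials below are arbitrary choices for this sketch.

```python
import numpy as np

def w2_sq_uniform_vs_atoms(y):
    """Exact W_2^2 between Uniform[0,1] and (1/m) sum_i delta_{y_i}."""
    y = np.sort(np.asarray(y, dtype=float))
    m = len(y)
    a = np.arange(m) / m   # left ends of the quantile bins
    b = a + 1.0 / m        # right ends of the quantile bins
    # int_a^b (u - y_i)^2 du in closed form, since Q(u) = u for the uniform law.
    return (((b - y) ** 3 - (a - y) ** 3) / 3.0).sum()

m = 500
rng = np.random.default_rng(0)

# Deterministic placement: midpoints, which are also the bin averages here.
scaled_optimal = m ** 2 * w2_sq_uniform_vs_atoms((np.arange(m) + 0.5) / m)

# Random placement: average over independent empirical measures.
trials = [w2_sq_uniform_vs_atoms(rng.random(m)) for _ in range(1000)]
scaled_random = m * np.mean(trials)
```

The first quantity is scaled by $m^2$ and the second only by $m$ , which is exactly the gap between the two regimes: both scaled values stay bounded, so the unscaled squared errors differ by one power of $m$ .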

The figure below illustrates both parts of this comparison: optimal atoms are uniform in quantile coordinates, and their squared error decays one power of $m$ faster than the expected squared empirical error (equivalently, half a power faster in root-mean-square $\mathcal W_2$ error).

One-dimensional equal-weight quantization in quantile coordinates. Left: for a smooth positive density on $[0,1]$ , the colored atoms are bin averages of the inverse CDF over equal quantile intervals, while the gray atoms show one i.i.d. empirical draw with the same number of particles. Right: expected squared $\mathcal W_2$ errors. The deterministic bin averages and midpoint quantiles follow the $m^{-2}$ squared-error law, whereas i.i.d. empirical measures follow the slower $m^{-1}$ expected squared-error law.

Interactive panel. Change the one-dimensional law, the number of atoms and the Monte Carlo seed to compare optimal equal-weight quantization with random empirical sampling. The right panel recomputes the squared $\Wass_2$ error curves from the quantile formula.

Wasserstein-1 Norm

The $\Wass_1$ distance has an especially transparent dual: the admissible potentials are exactly 1-Lipschitz test functions. This makes $\Wass_1$ the meeting point between transport, PDE formulations and weak norms on signed measures.

c-Transform for Wasserstein-1

Assume that $d$ is a distance on $\X=\Y$ and take the ground cost $c(x,y)=d(x,y)$ .

By the preceding proposition, a closed dual pair has the form $(f,-f)$ with $\Lip(f)\le1$ . The Kantorovich dual therefore becomes the Kantorovich--Rubinstein formula

\Wass_1(\alpha,\beta) = \max_f \left\{ \int_\X f\,\d(\alpha-\beta) : \Lip(f)\le1 \right\} =: \norm{\alpha-\beta}_{W_1}.

(76)

This expression depends only on the signed measure $\xi=\alpha-\beta$ . On compact $\X$ , the same supremum defines the Kantorovich--Rubinstein norm on finite signed measures with zero mass (Kantorovich & Rubinstein, 1958). Homogeneity and the triangle inequality are immediate, while definiteness follows because Lipschitz functions separate finite Radon measures. On a noncompact pointed space, one uses normalized Lipschitz functions and measures with finite first moment.

For a discrete signed measure $\alpha-\beta=\sum_k r_k\delta_{z_k}$ with $\sum_k r_k=0$ ,

\Wass_1(\alpha,\beta) = \max_{(f_k)_k} \left\{ \sum_k f_k r_k : |f_k-f_\ell|\le d(z_k,z_\ell) \quad\text{for all }k,\ell \right\}.

(77)

This finite-dimensional linear program can be solved by generic interior-point or first-order methods. If $N$ support points are involved, however, it still contains $O(N^2)$ Lipschitz constraints, mirroring the $O(nm)$ coupling variables of the original discrete Kantorovich LP; the dual formulation alone does not remove the all-pairs structure. The gain comes on structured metric spaces where the distance is generated locally: it is then enough to impose Lipschitz inequalities on neighboring pairs, because summing along paths recovers the constraints between arbitrary points. The one-dimensional ordered case is the first example; the graph-geodesic case is described later in the proposition on $\Wass_1$ and Beckmann flow on a graph.

When $d(x,y)=|x-y|$ on $\RR$ , ordering the support points $z_1\le z_2\le\cdots$ reduces the constraints to neighboring pairs:

\Wass_1(\alpha,\beta) = \max_{(f_k)_k} \left\{ \sum_k f_k r_k : |f_{k+1}-f_k|\le z_{k+1}-z_k \quad\text{for all }k \right\}.

(78)

In one dimension this is equivalent to the cumulative formula given in the one-dimensional transport section.
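The ordered problem can be solved in closed form, which the following sketch checks numerically. It assumes the cumulative formula in the form $\sum_k|R_k|(z_{k+1}-z_k)$ with $R_k=\sum_{\ell\le k}r_\ell$ , and builds a dual potential by saturating each neighbor constraint with the sign opposite to $R_k$ ; the function name `w1_line` and the four-point example are illustrative choices.

```python
import numpy as np

def w1_line(z, r):
    """W_1 for the signed measure sum_k r_k delta_{z_k} on the real line."""
    order = np.argsort(z)
    z = np.asarray(z, dtype=float)[order]
    r = np.asarray(r, dtype=float)[order]
    gaps = np.diff(z)        # z_{k+1} - z_k >= 0
    R = np.cumsum(r)[:-1]    # signed mass accumulated up to z_k
    primal = np.sum(np.abs(R) * gaps)  # cumulative formula
    # Dual potential: f_{k+1} - f_k = -sign(R_k) (z_{k+1} - z_k).
    f = np.concatenate([[0.0], np.cumsum(-np.sign(R) * gaps)])
    dual = np.sum(f * r)
    return primal, dual, z, f

# alpha = (delta_0 + delta_3) / 2 and beta = (delta_1 + delta_4) / 2.
z = [0.0, 1.0, 3.0, 4.0]
r = [0.5, -0.5, 0.5, -0.5]
primal, dual, z_sorted, f = w1_line(z, r)

# The neighbor constraints imply the all-pairs Lipschitz constraints.
lipschitz_ok = np.all(
    np.abs(f[:, None] - f[None, :])
    <= np.abs(z_sorted[:, None] - z_sorted[None, :]) + 1e-12
)
```

In this example each half unit of mass moves a distance one, so both the cumulative value and the dual value equal 1. Summation by parts, $\sum_kf_kr_k=-\sum_kR_k(f_{k+1}-f_k)$ when $\sum_kr_k=0$ , explains why saturating the neighbor constraints is optimal.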

Wasserstein-1 on Euclidean Spaces

In Euclidean space, the Lipschitz constraint has a local differential form and its dual variable is a flux. Let $\alpha,\beta$ have finite first moments and set $\xi=\alpha-\beta$ . Rademacher’s theorem gives

\Wass_1(\alpha,\beta) = \sup_{f\in W_{\mathrm{loc}}^{1,\infty}(\RR^d)} \left\{ \int_{\RR^d} f\,\d\xi : \norm{\nabla f}_{L^\infty}\le1 \right\}.

(79)

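The flux need not have a Lebesgue density: Dirac-to-Dirac transport already produces a measure concentrated on a segment. Let $m\in\mathcal M(\RR^d;\RR^d)$ be a vector-valued Radon measure, let $|m|$ denote its total variation, and define $\langle\operatorname{div}m,\varphi\rangle=-\int\langle\nabla\varphi,\d m\rangle$ for smooth compactly supported $\varphi$ . Dualizing the gradient constraint then gives

\Wass_1(\alpha,\beta) = \min_{m} \left\{ |m|(\RR^d) : \operatorname{div}m=\xi \right\}.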

This is the Beckmann formulation (Beckmann, 1952; Santambrogio, 2015). If $m=w\,\d x$ , its cost is $\int\norm{w(x)}\,\d x$ . Away from the supports of $\alpha$ and $\beta$ , $\operatorname{div}m=0$ , which expresses local conservation of mass.
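The Dirac-to-Dirac case can be worked out by hand. The computation below is a sketch that takes the Beckmann constraint to be $\operatorname{div}m=\alpha-\beta$ , matching the sign convention $\operatorname{div}_Gm=r$ of the graph version, and writes $\mathcal H^1\llcorner[x,y]$ for the length measure on the segment; both are notational assumptions of this example.

```latex
\alpha=\delta_x,\qquad \beta=\delta_y,\qquad L\eqdef\norm{y-x}>0,\qquad
\tau\eqdef\frac{y-x}{L},\qquad m\eqdef\tau\,\mathcal H^1\llcorner[x,y].

% Divergence of m, tested against a smooth \varphi:
\langle\operatorname{div}m,\varphi\rangle
 = -\int_0^L \langle\nabla\varphi(x+s\tau),\tau\rangle\,\d s
 = -\int_0^L \frac{\d}{\d s}\varphi(x+s\tau)\,\d s
 = \varphi(x)-\varphi(y)
 = \int\varphi\,\d(\alpha-\beta).

% Cost of m:
|m|(\RR^d)=\mathcal H^1([x,y])=L=\norm{x-y}=\Wass_1(\delta_x,\delta_y).
```

The flux is thus a unit tangent field carried by the segment, pointing from the source $x$ to the target $y$ . It is singular with respect to Lebesgue measure as soon as $d\ge2$ , and its divergence vanishes away from the two endpoints.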

Once discretized with finite elements, the dual Lipschitz problem and the Beckmann problem become nonsmooth convex optimization problems. The same formulation extends to complete Riemannian manifolds by replacing straight segments with minimizing geodesics and using tangent-valued Radon measures.

Graph Distances and Beckmann Flows

Finite graphs give a simple discrete instance where a metric is generated by local moves, so the all-pairs Lipschitz constraints collapse to edge constraints.

This graph distance turns $\Wass_1$ into a finite-dimensional flow problem.

Proposition: $\Wass_1$ and Beckmann Flow on a Graph

Let $G=(V,E)$ be a connected finite graph with positive edge lengths $(\ell_e)_{e\in E}$ and graph geodesic distance $d_G$ . For probability vectors $a,b$ on $V$ , set $r=a-b$ and orient each edge $e=(i,j)$ . If

(\nabla_G f)_e=f_j-f_i, \qquad \operatorname{div}_G=-\nabla_G^*

(83)

are the finite-difference gradient and negative adjoint, then

\Wass_{1,G}(a,b) = \max_f \left\{ \sum_{i\in V} f_i r_i : |f_i-f_j|\le\ell_e \quad\forall e=(i,j) \right\} = \min_m \left\{ \sum_{e\in E}\ell_e |m_e| : \operatorname{div}_G m=r \right\}.

(84)

The vector $m=(m_e)_{e\in E}$ is an oriented edge flow, and the constraint $\operatorname{div}_G m=r$ is conservation of mass at each vertex.

The figure below shows the optimal edge flux on both a quasi-regular and a nonuniform Delaunay graph, emphasizing that the Beckmann variables live only on graph edges.

Graph Beckmann formulation of $\Wass_1$ on a Delaunay graph. Red and blue disks encode the positive and negative parts of $r=\alpha-\beta$ . Violet arrows display the signed edge flow $m$ : orientation gives the sign, width is proportional to $\sqrt{|m_e|}$ , and the flow satisfies the conservation constraint $\operatorname{div}_G m=r$ .

The interactive graph view lets the source and sink clusters move and changes the graph resolution. It makes the transshipment interpretation of $\Wass_1$ visible: signed mass is routed through local edges rather than matched only by straight source-to-target segments.

Interactive panel. Use the graph and demand controls to inspect how Wasserstein-1 transport becomes a flow problem on edges.

Remark: Sparse LP and Network Simplex

Let $N=|V|$ and $M=|E|$ . Writing $m=m^+-m^-$ turns graph Beckmann transport into

\min_{m^+,m^-\geq0}\sum_{e\in E}\ell_e(m^+_e+m^-_e) \quad\text{subject to}\quad \operatorname{div}_G(m^+-m^-)=r .

(86)

This LP has $2M$ nonnegative variables and $N-1$ independent balance constraints, versus $N^2$ variables and $2N-1$ independent constraints for the dense transport LP. On a sparse graph, $M=O(N)$ .

Equivalently, replace each undirected edge by two directed arcs. The result is a minimum-cost transshipment problem to which the network simplex applies: a basis is a spanning tree, and a pivot inserts a non-tree arc and routes flow around the resulting cycle (Bertsekas & Eckstein, 1988; Orlin, 1997). A basic implementation costs $O(PM)$ for $P$ pivots on a sparse graph. Although $P$ depends on the pivot rule, polynomial minimum-cost-flow algorithms are available, and the edge formulation is usually far smaller than the dense transport LP.
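The sparse LP with split variables $m=m^+-m^-$ is small enough to hand to a generic solver. The sketch below uses SciPy's `linprog` with the HiGHS backend rather than a network simplex, builds $\operatorname{div}_G=-\nabla_G^*$ as a dense matrix for readability, and drops one balance row to keep $N-1$ independent constraints; the function name `graph_w1` and the two toy graphs are assumptions of this example.

```python
import numpy as np
from scipy.optimize import linprog

def graph_w1(n_vertices, edges, lengths, r):
    """Graph Beckmann transport via the LP in the variables (m_plus, m_minus)."""
    M = len(edges)
    # Finite-difference gradient: (grad f)_e = f_j - f_i for e = (i, j).
    grad = np.zeros((M, n_vertices))
    for e, (i, j) in enumerate(edges):
        grad[e, i], grad[e, j] = -1.0, 1.0
    div = -grad.T  # div_G = -grad_G^*
    lengths = np.asarray(lengths, dtype=float)
    r = np.asarray(r, dtype=float)
    cost = np.concatenate([lengths, lengths])  # l_e (m_e^+ + m_e^-)
    # div(m^+ - m^-) = r; the last row is redundant on a connected graph.
    A_eq = np.hstack([div, -div])[:-1]
    res = linprog(cost, A_eq=A_eq, b_eq=r[:-1], bounds=(0, None), method="highs")
    m = res.x[:M] - res.x[M:]
    return res.fun, m, div

# Path graph 0-1-2-3 with edge lengths 1, 2, 1.
r_path = np.array([0.5, -0.5, 0.5, -0.5])
value_path, m_path, div_path = graph_w1(
    4, [(0, 1), (1, 2), (2, 3)], [1.0, 2.0, 1.0], r_path
)

# Unit square: one unit of mass travels between opposite corners.
r_square = np.array([1.0, 0.0, -1.0, 0.0])
value_square, m_square, div_square = graph_w1(
    4, [(0, 1), (1, 2), (2, 3), (3, 0)], [1.0, 1.0, 1.0, 1.0], r_square
)
```

On the path graph the flow is forced by conservation and equals $(0.5,0,0.5)$ with cost 1. On the square the geodesic distance between opposite corners is 2, and the optimal flow is not unique because both routes have the same length. For large sparse graphs one would store the incidence matrix in a sparse format instead of a dense array.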

This graph formulation is the transshipment version of $\Wass_1$ . It is the natural discrete analogue of the Beckmann formulation: gradients are edge differences, divergences are incidence-matrix balances, and geodesic distance is shortest-path length. It can be solved by min-cost flow methods on sparse graphs, while entropic or KL-projection variants lead to flow-Sinkhorn algorithms for graph $\Wass_1$ (Beckmann, 1952; Peyré, 2026).

References

Bertsekas, D. P. (1992). Auction algorithms for network flow problems: a tutorial introduction. Computational Optimization and Applications, 1(1), 7–66.
Bertsekas, D. P., & Eckstein, J. (1988). Dual coordinate step methods for linear network flow problems. Mathematical Programming, 42(1), 203–243.
Aurenhammer, F., Hoffmann, F., & Aronov, B. (1998). Minkowski-type theorems and least-squares clustering. Algorithmica, 20(1), 61–76.
Mérigot, Q. (2011). A multiscale approach to optimal transport. Computer Graphics Forum, 30(5), 1583–1592.
Mérigot, Q. (2013). A comparison of two dual methods for discrete optimal transport. In Geometric science of information (pp. 389–396). Springer.
Kantorovich, L., & Rubinstein, G. S. (1958). On a space of totally additive functions. Vestn Leningrad Universitet, 13, 52–59.
Beckmann, M. (1952). A continuous model of transportation. Econometrica, 20, 643–660.
Mérigot, Q., & Thibert, B. (2020). Optimal Transport: Discretization and Algorithms. arXiv preprint arXiv:2003.00855. https://doi.org/10.48550/arXiv.2003.00855
Bertsekas, D. P. (1981). A new algorithm for the assignment problem. Mathematical Programming, 21(1), 152–171.
Aurenhammer, F. (1987). Power diagrams: properties, algorithms and applications. SIAM Journal on Computing, 16(1), 78–96.
Chan, T. M. (1996). Optimal output-sensitive convex hull algorithms in two and three dimensions. Discrete & Computational Geometry, 16(4), 361–368.
Genevay, A., Cuturi, M., Peyré, G., & Bach, F. (2016). Stochastic optimization for large-scale optimal transport. Advances in Neural Information Processing Systems, 3440–3448.
Graf, S., & Luschgy, H. (2000). Foundations of Quantization for Probability Distributions (Vol. 1730). Springer.
Lloyd, S. (1982). Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Arthur, D., & Vassilvitskii, S. (2007). k-means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1027–1035.