
Optimal Transport for Machine Learners

Abstract

Modern machine learning repeatedly manipulates probability measures: empirical datasets, generated samples, latent distributions, class-conditional laws, particle systems, weights of wide networks and attention patterns. Optimal transport is useful in this setting because it compares such objects by asking how mass should move. It therefore combines a statistically meaningful notion of discrepancy with a geometry of interpolation, dual certificates and variational dynamics. This makes OT a common language for losses, generative modeling, domain adaptation, robust learning, barycenters, gradient flows and mean-field descriptions of learning algorithms.

This book presents the main OT techniques with these machine-learning uses in mind. It starts from finite assignment and the Monge map viewpoint, passes to Kantorovich couplings and dual potentials, and then explains the algorithmic ideas that make transport usable: linear programming, semi-discrete cells, Sinkhorn scaling and low-dimensional projections. The same objects are then reused as a geometry of measures, giving Wasserstein distances, barycenters, gradient flows, dynamic formulations and Gaussian/Bures formulas. The final chapters emphasize the variants most relevant to modern ML: divergences and adversarial losses, entropic and unbalanced relaxations, robust or spectral ground geometries, Gromov and quantum extensions, and transport-based views of generative models, mean-field networks and attention dynamics. The goal is to keep the mathematics explicit while exposing the computational and geometric intuitions needed to turn OT into a working toolbox for machine learners.

Guide to the Literature and Scope

Several books already cover optimal transport from complementary viewpoints. The two-volume monograph of Rachev and Rüschendorf (Rachev & Rüschendorf, 1998, Volumes I and II) gives a broad probabilistic treatment of mass transportation and its applications. Villani's books (Villani, 2003; Villani, 2009) are the standard references for the modern mathematical theory, from Kantorovich duality to curvature, concentration and geometric analysis. Santambrogio's text (Santambrogio, 2015) offers a concise applied-mathematics route through the same foundations, with a strong emphasis on PDEs and variational arguments. Ambrosio, Gigli and Savaré (Ambrosio et al., 2006) develop the metric-space theory of gradient flows that underlies the dynamical part of the subject.

On the computational side, Peyré and Cuturi (Peyré & Cuturi, 2019) provide the reference account of numerical OT, entropic regularization and applications in data sciences. Galichon's book (Galichon, 2016) explains the economic and matching-theoretic viewpoint, while the statistical theory of OT is developed in the recent lecture notes of Chewi, Niles-Weed and Rigollet (Chewi et al., 2024). Recent surveys complement these books by emphasizing scalable algorithms and machine-learning applications (Khamis et al., 2024; Montesuma et al., 2023), as well as the role of OT in imaging and graphics (Bonneel & Digne, 2023). These references remain the natural places to find exhaustive proofs, historical details and specialized variants.

The aim here is different and more selective. The book keeps the core mathematics explicit, but organizes it around the questions that repeatedly arise in machine learning: how to compare singular empirical measures, how to compute differentiable transport losses, how regularization changes optimization and statistics, how dual potentials become discriminators, and how transport geometry produces flows of particles, neurons and tokens. The intended contribution is therefore not a replacement for the references above, but a compact bridge between rigorous OT and the geometric intuitions needed to use it in modern ML.

Interactive Web Book

This web version gives the LaTeX book a second life as an interactive reading environment.

References
  1. Rachev, S. T., & Rüschendorf, L. (1998). Mass Transportation Problems: Volume I: Theory. Springer.
  2. Rachev, S. T., & Rüschendorf, L. (1998). Mass Transportation Problems: Volume II: Applications. Springer.
  3. Villani, C. (2003). Topics in Optimal Transportation (Vol. 58). American Mathematical Society.
  4. Villani, C. (2009). Optimal Transport: Old and New (Vol. 338). Springer.
  5. Santambrogio, F. (2015). Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Birkhäuser.
  6. Ambrosio, L., Gigli, N., & Savaré, G. (2006). Gradient Flows in Metric Spaces and in the Space of Probability Measures. Springer.
  7. Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport with Applications to Data Sciences. Foundations and Trends in Machine Learning, 11(5–6), 355–607. doi:10.1561/2200000073
  8. Galichon, A. (2016). Optimal Transport Methods in Economics. Princeton University Press.
  9. Chewi, S., Niles-Weed, J., & Rigollet, P. (2024). Statistical Optimal Transport. arXiv preprint arXiv:2407.18163.
  10. Khamis, A., Tsuchida, R., Tarek, M., Rolland, V., & Petersson, L. (2024). Scalable Optimal Transport Methods in Machine Learning: A Contemporary Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2024.3379571
  11. Montesuma, E. F., Mboula, F. N., & Souloumiac, A. (2023). Recent Advances in Optimal Transport for Machine Learning. arXiv preprint arXiv:2306.16156.
  12. Bonneel, N., & Digne, J. (2023). A Survey of Optimal Transport for Computer Graphics and Computer Vision. Computer Graphics Forum, 42(2), 439–460. doi:10.1111/cgf.14778