Optimal Transport for Machine Learners

Abstract

Modern machine learning repeatedly manipulates probability measures: empirical datasets, generated samples, latent distributions, class-conditional laws, particle systems, parameter distributions of wide networks and attention patterns. Optimal transport is useful in this setting because it compares such objects by asking how mass should move. It combines a statistically meaningful discrepancy with a geometry of interpolation, dual certificates and variational dynamics, making OT a common language for losses, generative modeling, domain adaptation, robust learning, barycenters, gradient flows and mean-field descriptions of learning algorithms.

This book uses optimal transport as a meeting point between probability, PDEs, optimization and statistics, with modern machine learning as the organizing pressure. In current learning systems, probability distributions are no longer peripheral objects: datasets are empirical laws, generators define push-forward laws, samplers solve evolution equations, and large models move information through populations of particles, parameters and tokens. Comparing and evolving such distributions has become a central mathematical problem, with particular urgency in generative AI. The resulting tensions are both conceptual and computational: empirical measures are singular, dimensions are large, parametrizations are non-convex, and numerical approximations cannot be separated from statistical error. The goal is to expose the OT tools that organize these tensions, while keeping their connection to the training and deployment of large models in view.
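As a concrete instance of comparing two empirical laws by asking how mass should move, the following minimal sketch computes a Wasserstein distance between two samples on the real line. It assumes both samples have the same size and uniform weights, in which case an optimal plan for the cost |x - y|^p with p >= 1 pairs the sorted points of one sample with the sorted points of the other. The function name `sorted_matching_distance` is hypothetical and introduced here for illustration only; it is not part of POT or of the book's code.

```python
def sorted_matching_distance(xs, ys, p=1):
    """p-Wasserstein distance between two uniform empirical measures on the line.

    Assumption of this sketch: both samples have the same size n, so each
    point carries mass 1/n and an optimal plan for the cost |x - y|**p
    (p >= 1) matches the i-th smallest point of xs to the i-th smallest
    point of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Average transport cost of the sorted (monotone) matching.
    mean_cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return mean_cost ** (1.0 / p)


# Illustrative data (not taken from the book).
source = [0.0, 1.0, 3.0]
shifted = [x + 2.0 for x in source]   # every point translated by 2
target = [5.0, 4.0, 2.0]              # unordered on purpose; sorting handles it

print(sorted_matching_distance(source, shifted))        # translation by 2 costs 2.0
print(sorted_matching_distance(source, shifted, p=2))   # same value for p = 2
print(sorted_matching_distance(source, target))
```

The two samples here share no common point, yet the distance stays finite and reflects how far the mass has to travel, which is the behaviour that makes transport distances usable on singular empirical measures. In higher dimensions no sorting shortcut is available and a linear program or a regularized solver is needed instead.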

Several books already cover optimal transport from complementary viewpoints. The two-volume monograph of Rachev and Rüschendorf (Rachev & Rüschendorf, 1998) gives a broad probabilistic treatment of mass transportation and its applications. Villani’s books (Villani, 2003; Villani, 2009) are the standard references for the modern mathematical theory, from Kantorovich duality to curvature, concentration and geometric analysis. Santambrogio’s text (Santambrogio, 2015) offers a concise applied-mathematics route through the same foundations, with a strong emphasis on PDEs and variational arguments, and Ambrosio, Gigli and Savaré (Ambrosio et al., 2006) develop the metric-space theory of gradient flows that underlies the dynamical part of the subject. On the computational side, Peyré and Cuturi (Peyré & Cuturi, 2019) provide the reference account of numerical OT, entropic regularization and applications in data sciences; Galichon’s book (Galichon, 2016) explains the economic and matching-theoretic viewpoint; and the statistical theory of OT is developed in the recent lecture notes of Chewi, Niles-Weed and Rigollet (Chewi et al., 2025). Recent surveys complement these books by emphasizing scalable algorithms and machine-learning applications (Khamis et al., 2024; Montesuma et al., 2023), as well as the role of OT in imaging and graphics (Bonneel & Digne, 2023). These references remain the natural places to find exhaustive proofs, historical details and specialized variants.

All material for this book, including the code used to reproduce the figures, is available at gpeyre/ot4ml. Most computational figures were produced with the Python Optimal Transport (POT) library Flamary et al., 2021. The author warmly thanks the POT team and contributors for their important and sustained effort in making reliable optimal-transport algorithms available to the community.

Interactive Web Book

This web version gives the LaTeX book a second life as an interactive reading environment:

the mathematical exposition stays close to the book;
the publication figures sit directly beside small parameter panels;
the reader can change meaningful quantities and immediately see their influence;
the interface stays focused on the book content.

References

Rachev, S. T., & Rüschendorf, L. (1998). Mass Transportation Problems: Volume I: Theory. Springer.
Rachev, S. T., & Rüschendorf, L. (1998). Mass Transportation Problems: Volume II: Applications. Springer.
Villani, C. (2003). Topics in Optimal Transportation (Vol. 58). American Mathematical Society.
Villani, C. (2009). Optimal Transport: Old and New (Vol. 338). Springer.
Santambrogio, F. (2015). Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Birkhäuser.
Ambrosio, L., Gigli, N., & Savaré, G. (2006). Gradient Flows in Metric Spaces and in the Space of Probability Measures. Springer.
Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport: With Applications to Data Science. Foundations and Trends in Machine Learning, 11(5–6), 355–607. 10.1561/2200000073
Galichon, A. (2016). Optimal Transport Methods in Economics. Princeton University Press.
Chewi, S., Niles-Weed, J., & Rigollet, P. (2025). Statistical Optimal Transport (Vol. 2364). Springer. 10.1007/978-3-031-85160-5
Khamis, A., Tsuchida, R., Tarek, M., Rolland, V., & Petersson, L. (2024). Scalable Optimal Transport Methods in Machine Learning: A Contemporary Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 10.1109/TPAMI.2024.3379571
Montesuma, E. F., Mboula, F. N., & Souloumiac, A. (2023). Recent Advances in Optimal Transport for Machine Learning. arXiv Preprint arXiv:2306.16156.
Bonneel, N., & Digne, J. (2023). A Survey of Optimal Transport for Computer Graphics and Computer Vision. Computer Graphics Forum, 42(2), 439–460. 10.1111/cgf.14778
Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., Gautheron, L., Gayraud, N. T. H., Janati, H., Rakotomamonjy, A., Redko, I., Rolet, A., Schutz, A., Seguy, V., Sutherland, D. J., … Vayer, T. (2021). POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78), 1–8. http://jmlr.org/papers/v22/20-451.html