Journal of Mathematical Analysis and Applications 215, 86–94 (1997)
Article No. AY975618
On the Monotonicity of Optimal Transportation Plans*

J. A. Cuesta-Albertos
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Spain

C. Matrán
Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Spain

and

A. Tuero-Díaz
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Spain

Submitted by N. H. Bingham

Received November 22, 1996
We obtain analytical properties of the maps that give optimal transportation plans for the $L^2$-Wasserstein distance. We take advantage of the monotonicity of such optimal transportation plans to discuss their measurability and continuity. As a main result of independent interest we obtain the a.e. continuity of monotone (in the sense of Zarantonello) operators. © 1997 Academic Press
* Research partially supported by DGICYT, Grants PB95-0715-C02-00, 01 and 02.

0022-247X/97 $25.00. Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

The Monge–Kantorovich mass-transportation problem is one of the best-known formulations of a widely treated problem in several areas of mathematics (see the book by Rachev [10] for a detailed account, or [6] for a recent survey). In this formulation, the objective is to minimize the total cost of transportation of a mass, whose location is initially given by a probability measure $P$, to a final location given by another probability measure $Q$.

Our set-up will be the following: we will assume that the probabilities are defined on the Borel sets of the Euclidean space $\mathbb{R}^p$, $p \ge 1$, with its usual inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, and that the cost of transportation of a unit mass between its initial location $x$ and final location $y$ is measured by $\|x - y\|^2$. Let us denote by $\mathcal{P}_2$ the set of the probability measures $P$ on $\mathbb{R}^p$ with $\int \|x\|^2 \, dP < \infty$, and, given $P, Q \in \mathcal{P}_2$, let $M(P, Q)$ be the set of the probability measures on $\mathbb{R}^p \times \mathbb{R}^p$ with marginal distributions $P$ and $Q$. The $L^2$-Wasserstein distance between $P$ and $Q$ is the value $W(P, Q)$ defined through

$$W^2(P, Q) := \inf \left\{ \int \|x - y\|^2 \, \mu(dx, dy) : \ \mu \in M(P, Q) \right\}. \tag{1}$$
The infimum in (1) is attained, so that to find $W(P, Q)$ it is enough to obtain, in some probability space $(\Omega, \sigma, \nu)$, a pair $(X_0, Y_0)$ of random vectors (r.v.'s) with distribution laws $\mathcal{L}(X_0) = P$ and $\mathcal{L}(Y_0) = Q$ such that

$$\int \|X_0 - Y_0\|^2 \, d\nu = \inf \left\{ \int \|X - Y\|^2 \, d\nu : \ \mathcal{L}(X) = P, \ \mathcal{L}(Y) = Q \right\} \quad \left( = W^2(P, Q) \right).$$

Such a pair $(X_0, Y_0)$ is called an ($L^2$-)optimal transportation plan (in short, o.t.p.) for $(P, Q)$.

However, even in this "easy" set-up concerning transportation on Euclidean spaces, it is widely assumed that only the one-dimensional problem admits a general explicit solution. It was proved in [3] that, under continuity assumptions on $P$, the $L^2$-optimal transportation plan $(X_0, Y_0)$ can be represented as $(X_0, \psi(X_0))$ for some suitable optimal map $\psi$. Therefore $\psi$ defines an optimal plan for the transportation. A remarkable fact is that the optimality of a map $\psi$ is essentially independent of the distribution of $X_0$ (see [7]).

It is well known that, in the real case, optimal maps coincide with increasing arrangements. Increasing maps also play an important role in the multivariate case (see, for instance, [3, 8]), but the relevant concept of increasing map is that of monotone operator in the sense of Zarantonello:

Given $D \subset \mathbb{R}^p$, a mapping $H : D \to \mathbb{R}^p$ is said to be Z-increasing if $\langle H(x) - H(x'), x - x' \rangle \ge 0$ for every $x, x'$ in $D$.

An important fact about the structure of the o.t.p.'s on $\mathbb{R}^p$ is the existence of recent monotonicity results even in the case in which the cost of transportation is related to the $L^q$-norm, $1 \le q < \infty$ (see [3, 11] for $q = 2$ and [12–14, 9] for general $q$).
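As an illustration of the one-dimensional statement that optimal maps coincide with increasing arrangements, the following minimal sketch pairs two small samples of equal size by sorting and compares the resulting cost with a brute-force search over all one-to-one pairings. It assumes uniform empirical measures on finitely many real points; the function names and the sample values are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def sorted_coupling(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y (increasing arrangement)."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    return list(zip(sorted(xs), sorted(ys)))


def mean_squared_cost(pairs):
    """Average of |x - y|^2 over the coupled pairs."""
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def brute_force_w2_squared(xs, ys):
    """Minimum mean squared cost over all one-to-one pairings (small samples only)."""
    return min(mean_squared_cost(list(zip(xs, perm))) for perm in permutations(ys))


# Hypothetical sample values, chosen only for illustration.
xs = [0.3, -1.2, 2.5, 0.9]
ys = [1.0, 4.0, -0.5, 2.2]

pairs = sorted_coupling(xs, ys)
w2_sq = mean_squared_cost(pairs)
```

Here the sorted pairing and the brute-force minimum should agree up to rounding, which is the discrete counterpart of the increasing-arrangement solution on the line. The brute-force search is only feasible for very small samples.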
In this paper we provide some technical results which complete and justify some non-obvious steps of the proof in [3]. These results allow us to conclude that every optimal map is a.s. Z-increasing (Theorem 2.3). Then we obtain some properties of Z-increasing maps which, as a consequence, are also satisfied by the o.t.p.'s. In particular, paralleling the fact that increasing functions on the line have at most a countable set of discontinuity points, in Theorem 2.8 we will prove the almost everywhere continuity of multidimensional Z-increasing functions. This basic property has been the main tool for the proof in [4] of the consistency of Monte Carlo procedures for obtaining approximations to the o.t.p.'s. These procedures are currently the only practical way to (approximately) determine o.t.p.'s.
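To give a rough feeling for what a Monte Carlo approximation of an optimal map looks like, here is a toy one-dimensional sketch. It is not the procedure studied in [4]; it only assumes that $P$ is the uniform law on $(0, 1)$ and $Q$ is the law of $U^2$ with $U$ uniform, in which case the increasing map $x \mapsto x^2$ transports $P$ to $Q$. The function names, sample sizes, and seed are hypothetical choices made for illustration.

```python
import random


def empirical_map(n, seed=0):
    """Couple two independent samples of size n by their order statistics."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))        # sample from P = Uniform(0, 1)
    ys = sorted(rng.random() ** 2 for _ in range(n))   # independent sample from Q = law of U^2
    return list(zip(xs, ys))


def sup_error(pairs):
    """Largest deviation between the empirical map and the increasing map x -> x^2."""
    return max(abs(y - x ** 2) for x, y in pairs)


err_small = sup_error(empirical_map(50))
err_large = sup_error(empirical_map(5000))
```

With a large sample the empirical pairing should stay close to $x \mapsto x^2$. The sketch says nothing about rates, and it does not cover the multivariate case, where sorting is no longer available.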
2. THE RESULTS

The symbol $\lambda_p$ will denote the Lebesgue measure on $\mathbb{R}^p$, and absolute continuity of a measure $\mu$ with respect to $\lambda_p$ will be denoted by $\mu \ll \lambda_p$. Given a pair of r.v.'s $(X, Y)$, $S_x(Y)$ denotes the support of a regular conditional distribution of $Y$ given $X = x$, and $\#A$ denotes the cardinal of the set $A$. Given a vector $v \in \mathbb{R}^p$, we denote by $\langle v \rangle$ the linear space spanned by $v$, and the superscript $\perp$ denotes orthogonal subspace.

A precise formulation of the Z-increasing character of optimal maps is provided by the following statement (whose proof, valid for separable Hilbert spaces, can be found in Theorem 2.3 and Corollary 2.7 of [3]).

THEOREM 2.1. Let $P \in \mathcal{P}_2$ and assume that there exists an orthogonal basis, $\{v_n\}_{n=1}^{p}$, such that, for each $n$ and almost every $\omega$ in $\langle v_n \rangle^{\perp}$, the conditional distribution function on $\langle v_n \rangle$ given $\omega$ is continuous. Let $(X, Y)$ be an optimal transportation plan for $(P, Q)$ defined on the probability space $(\Omega, \sigma, \nu)$. Then we have that:

(a) $P\{x : \#S_x(Y) = 1\} = 1$.

(b) $\nu \otimes \nu \{(\omega, \omega') : \langle X(\omega) - X(\omega'), Y(\omega) - Y(\omega') \rangle < 0\} = 0$.

From statement (a) it is evident that if $(X, Y)$ is an o.t.p. for $(P, Q)$ and we consider the $\nu$-a.s. defined choice function $x \to H(x)$, where $H(x)$ is the only element in $S_x$, then we obtain that $Y = H(X)$, $\nu$-a.s., and that, from (b),

$$P \otimes P \{(x, x') : \langle H(x) - H(x'), x - x' \rangle < 0\} = 0. \tag{2}$$

The hypotheses in Theorem 2.1 are trivially satisfied by probability measures that are absolutely continuous with respect to $\lambda_p$. Therefore, for simplicity, we will generally assume that the probability measure $P$ satisfies this property.
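A discrete analogue of condition (2) can be checked directly. The sketch below assumes two uniform empirical measures on five points of $\mathbb{R}^2$ each, finds a cost-minimizing one-to-one assignment by brute force, and then tests the Z-increasing inequality for every pair of source points. In this discrete setting the inequality follows from the fact that swapping the images of two points cannot lower the total squared cost. The point sets and all function names are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def inner(a, b):
    """Usual inner product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def diff(a, b):
    """Componentwise difference a - b."""
    return tuple(ai - bi for ai, bi in zip(a, b))


def optimal_assignment(xs, ys):
    """Permutation s minimizing sum_i |x_i - y_s(i)|^2, found by brute force."""
    n = len(xs)
    return min(permutations(range(n)),
               key=lambda s: sum(sq_dist(xs[i], ys[s[i]]) for i in range(n)))


def is_z_increasing(points, images, tol=1e-12):
    """Check <H(x) - H(x'), x - x'> >= 0 for all pairs, where H(points[i]) = images[i]."""
    return all(inner(diff(images[i], images[j]), diff(points[i], points[j])) >= -tol
               for i in range(len(points)) for j in range(len(points)))


# Hypothetical integer point sets, so that all arithmetic is exact.
xs = [(0, 0), (2, 1), (1, 3), (4, 4), (3, 0)]
ys = [(1, 1), (5, 2), (0, 4), (2, 2), (6, 5)]

sigma = optimal_assignment(xs, ys)
images = [ys[k] for k in sigma]
```

For instance, `is_z_increasing(xs, images)` should hold for the optimal assignment, whereas the pairing that sends $(0,0)$ to $(1,0)$ and $(1,0)$ to $(0,0)$ fails the test.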
Two open questions remain about $H$: its measurability and the existence of a $P$-probability one set $D$ in which $H$ is a Z-increasing map. Both questions are solved in Theorem 2.3, where we apply Lemma 2.2, which is taken from [11]. In this lemma, $\partial f$ denotes the subdifferential of the function $f$. In connection with the second question, it is worth noting that it is not trivial that (2) implies the existence of a $P$-probability one set on which $H$ is Z-increasing, in spite of the fact that this has sometimes been taken for granted (see [3, 15]).

LEMMA 2.2. Let $P, Q$ be probability measures on $\mathbb{R}^p$. The pair $(X, Y)$ of $\mathbb{R}^p$-valued random vectors is an optimal transportation plan for $(P, Q)$ if and only if there exists a lower semicontinuous proper convex function $f$ on $\mathbb{R}^p$ such that $Y \in \partial f(X)$ $\nu$-a.s.

THEOREM 2.3. Let $(X, Y)$ be an optimal transportation plan for $(P, Q)$ and assume that $P \ll \lambda_p$. Then there exists a $P$-probability one set $D$ and a map $H : D \to \mathbb{R}^p$ such that

(a) $Y = H(X)$, $\nu$-a.s.

(b) $H$ is Borel-measurable.

(c) $H$ is Z-increasing on $D$.

Proof. Measurability of $D := \{x : \#S_x = 1\}$ when $Y$ is a real r.v. can be easily proved by handling a conditional distribution function $F(y / X = x)$ of $Y$ given $X = x$ and noting that the set $\{x : \#S_x = 1\}$ coincides with

$$\bigcap_{k \in \mathbb{N}} \left\{ x : \sup \left\{ F(y / X = x) - F(y' / X = x) : \ y, y' \in \mathbb{Q}, \ 0 < y - y' < k^{-1} \right\} = 1 \right\}.$$

Now let $\{f_n\}_{n=1}^{p}$ be defined by $f_n(y) := \langle y, e_n \rangle$ for some orthonormal basis $\{e_n\}_{n=1}^{p}$. If $P(\cdot / X = x)$ is a regular conditional distribution of $Y$ given $X = x$, then $F_n(z / X = x) := P(f_n(Y) \le z / X = x)$ are conditional distribution functions of $f_n(Y)$. Moreover the corresponding supports satisfy

$$\{x : \#S_x(Y) = 1\} = \bigcap_{n} \{x : \#S_x(f_n(Y)) = 1\},$$

and, if we denote this set by $A$, we have that, if $B$ is a Borel set, then

$$\{x \in A : H(x) \in B\} = \{x \in A : P(B / X = x) = 1\}.$$

Therefore we have proved that there exists a $P$-probability one measurable set $A$ and a map $H$ whose restriction to $A$ is measurable and such that $(X, H(X))$ is an o.t.p.
An equivalent condition to $Y \in \partial f(X)$, $\nu$-a.s., is that $\langle X, Y \rangle = f(X) + f^*(Y)$, $\nu$-a.s. (where $f^*$ denotes the conjugate function of $f$). Thus, taking into account the measurability of the functions involved ($f^*$ is also a lower-semicontinuous proper convex function) and Lemma 2.2, we have that $U = \{x \in A : H(x) \in \partial f(x)\}$ is a $P$-probability one measurable set. Since the subdifferentials of lower-semicontinuous proper convex functions are Z-increasing operators (see Theorem 2.3 in [1]), (c) is also proved.

Theorem 2.3 covers our objectives from the point of view of the o.t.p.'s, but it does not answer the question: Does condition (2) imply the existence of a $P$-probability one set on which $H$ is Z-increasing? This is because not every Z-increasing map is an o.t.p. The answer to this question is also positive from the following result, whose proof can be found in [5].

THEOREM 2.4. Let $P \ll \lambda_p$ be a probability measure. Let $T$ be a set such that $P(T) = 1$ and suppose that $H : T \to \mathbb{R}^p$ is a map such that

$$P \otimes P \{(x, x') : \langle x - x', H(x) - H(x') \rangle < 0\} = 0.$$

Then there exists a $P$-probability one set $D$ such that $H$ is Z-increasing on $D$.

We now address the study of the a.s. continuity of the Z-increasing maps, for which we will use some additional notation and a previous result which we state as a lemma. This lemma is included in Theorem 1.5 in [1] and it will be generalized in Lemma 2.7.

We will denote by $H : D \subset \mathbb{R}^p \to \mathbb{R}^p$ a Z-increasing map defined on the measurable set $D$; $\{e_1, \dots, e_p\}$ is a fixed orthonormal basis in $\mathbb{R}^p$, and, given $x \in \mathbb{R}^p$, we denote $x^j := \langle x, e_j \rangle$, $j = 1, \dots, p$. We base our continuity proof on the study of the maps

$$H^j(x) := \langle H(x), e_j \rangle \quad \text{and} \quad \mathbf{H}^j(x) := H^j(x) \, e_j, \qquad j = 1, \dots, p.$$

LEMMA 2.5. Assume that $D = \mathbb{R}^p$ and let $\{x_n\}_n$ be a convergent sequence in $\mathbb{R}^p$. Then the sequences $\{H^j(x_n)\}_n$, $j = 1, 2, \dots, p$, are bounded.

PROPOSITION 2.6. Let $H : \mathbb{R}^p \to \mathbb{R}^p$ be a Z-increasing map. Then $H$ is almost everywhere continuous with respect to the Lebesgue measure.

Proof. Since $H(x) = \sum_j \mathbf{H}^j(x)$, the result will be proved if we show that $H^j$, $j = 1, \dots, p$, is $\lambda_p$-a.e. continuous. In order to do so, let us assume that $H^1$ is not $\lambda_p$-a.e. continuous and let $D(H^1)$ be the set of discontinuities of $H^1$. It is well known (see, for instance, the footnote on p. 343 of [2]) that the set of discontinuities of every function is Borel-measurable.
On the other hand, by using a sequence $\{K_n\}_n$ of compact sets such that $K_n \uparrow \mathbb{R}^p$, the hypothesis implies that $\lambda_p[D(H^1) \cap K_n] > 0$ for some $n$. Hence, from Lemma 2.5, we obtain that the function $H^1$ is bounded by a constant $K$ on the measurable non-null set $M := D(H^1) \cap K_n$.

Let $V_1(x) := \{x + \alpha e_1 : \alpha \in \mathbb{R}\}$. There exists $x_0$ orthogonal to $e_1$ such that $V_1(x_0) \cap M$ is not countable (if this set were countable for every $x_0$ in $\langle e_1 \rangle^{\perp}$ then $\lambda_p(M) > 0$ would not be possible). Since $H^1$ is bounded on any compact set, we can define

$$H^1(x+) := \limsup_{y \to x} H^1(y) \quad \text{and} \quad H^1(x-) := \liminf_{y \to x} H^1(y).$$

Also, for every $\delta > 0$, let us define the sets

$$B_\delta^1 := \{x : H^1(x+) > H^1(x) + \delta\}, \qquad B_\delta^2 := \{x : H^1(x+) < H^1(x) - \delta\},$$

$$C_\delta^1 := \{x : H^1(x-) < H^1(x) - \delta\}, \qquad C_\delta^2 := \{x : H^1(x-) > H^1(x) + \delta\}.$$

Since $B_{\delta_n}^1 \cup B_{\delta_n}^2 \cup C_{\delta_n}^1 \cup C_{\delta_n}^2 \uparrow D(H^1)$ as $\delta_n \downarrow 0$, the nondenumerability of the set $V_1(x_0) \cap M$ implies the existence of $\delta > 0$ such that one of the sets

$$B_\delta^1 \cap V_1(x_0) \cap M, \quad B_\delta^2 \cap V_1(x_0) \cap M, \quad C_\delta^1 \cap V_1(x_0) \cap M, \quad \text{or} \quad C_\delta^2 \cap V_1(x_0) \cap M$$

is not denumerable. Assume, for instance, that the set $B_\delta^1 \cap V_1(x_0) \cap M$ is not denumerable. Then there exists a sequence $\{y_n\}_n$ in this set such that

$$\langle y_n, e_1 \rangle < \langle y_{n+1}, e_1 \rangle, \qquad \forall n \in \mathbb{N}.$$

If we show that $H^1(y_{n+1}) \ge H^1(y_n+)$ the proposition will be proved because, by definition of $B_\delta^1$, we would then have that $H^1(y_n) > H^1(y_1) + (n - 1)\delta$, and this would contradict that $H^1(y_n) \le K$, $\forall n \in \mathbb{N}$.

To simplify the notation we only consider the case $n = 1$. Let $\{x_m\}_m$ be a sequence which converges to $y_1$ and such that $\lim_m H^1(x_m) = H^1(y_1+)$. Then

$$0 \le \langle H(y_2) - H(x_m), y_2 - x_m \rangle = \sum_j \left( H^j(y_2) - H^j(x_m) \right) \left( y_2^j - x_m^j \right).$$
But, from Lemma 2.5, the sequence $\{H^j(x_m)\}_m$ is bounded and $\lim_m (y_2^j - x_m^j) = 0$ if $j \ge 2$; therefore we have that

$$
\begin{aligned}
0 &\le \liminf_m \left( H^1(y_2) - H^1(x_m) \right) \left( y_2^1 - x_m^1 \right) \\
&= \liminf_m \left[ \left( H^1(y_2) - H^1(x_m) \right) \left( y_2^1 - y_1^1 \right) + \left( H^1(y_2) - H^1(x_m) \right) \left( y_1^1 - x_m^1 \right) \right] \\
&= \liminf_m \left( H^1(y_2) - H^1(x_m) \right) \left( y_2^1 - y_1^1 \right),
\end{aligned}
$$

because $\lim_m (y_1^1 - x_m^1) = 0$ and $\{H^1(x_m)\}_m$ is bounded. From these inequalities it is evident that $H^1(y_1+) \le H^1(y_2)$.

Finally we will obtain an extension of the previous result to cover situations where $H$ is defined on a proper subset of $\mathbb{R}^p$ and a probability measure $P$ is considered instead of the Lebesgue measure. The generalization of Lemma 2.5 is not trivial because Theorem 1.5 in [1] applies for sequences which converge to an interior point in the domain of $H$. Nevertheless, in Lemma 2.7 we generalize Lemma 2.5 by assuming that the probability $P$ possesses an additional property which is always satisfied if $P \ll \lambda_p$ (see [5] for a proof).

Given $x \in \mathbb{R}^p$ and $\delta > 0$ we denote by $B(x, \delta)$ the open ball with radius $\delta$ centered at $x$ and, if $x, z \in \mathbb{R}^p$ and $\delta, \alpha > 0$, we denote by $S(x, z, \delta, \alpha)$ the set

$$B(x, \delta) \cap \{y \ne x : \operatorname{ang}(y - x, z) < \alpha\},$$

where $\operatorname{ang}(y - x, z)$ denotes the angle between the vectors $y - x$ and $z$. Given $x$ in the support of a probability $P$, let us say that $P$ satisfies property C at $x$ if for every $z \in \mathbb{R}^p$ and $\delta, \alpha > 0$, we have that $P[S(x, z, \delta, \alpha)] \ne 0$. We will say that $P$ satisfies property C if

$$P\{x : P[S(x, z, \delta, \alpha)] > 0, \ \forall z \in \mathbb{R}^p, \ \delta \text{ and } \alpha > 0\} = 1.$$

LEMMA 2.7. Let $\{x_n\}_n \subset D$ be a sequence that converges to a point $x$ where $P$ satisfies C. Then the sequences $\{H^j(x_n)\}_n$, $j = 1, 2, \dots, p$, are bounded.

Proof. If we suppose that $\{H^j(x_n)\}_n$ is not bounded for some $j$ then there exists a subsequence (which we can identify with the original one) such that if we denote

$$J^+ := \{j : H^j(x_n) \to +\infty\} \quad \text{and} \quad J^- := \{j : H^j(x_n) \to -\infty\},$$

then $J^+ \cup J^-$ is not empty and the sequences $\{H^j(x_n)\}_n$ are bounded if $j \notin J^+ \cup J^-$. Given $\varepsilon > 0$, let us define

$$A_\varepsilon := \{y \ne x : y^j > x^j + \varepsilon, \ j \in J^+ \ \text{and} \ y^j < x^j - \varepsilon, \ j \in J^-\}.$$

The probability $P$ satisfies property C at $x$, so that there exists $\varepsilon > 0$ such that $P(A_\varepsilon) > 0$. Consider $x_0 \in A_\varepsilon$ and set $K := \sup_j |H^j(x_0)|$ and

$$K^* := \sup \left( K, \ \sup_{j \notin J^+ \cup J^-} \left\{ \sup_n |H^j(x_n)| \right\} \right).$$

Trivially $K^* < \infty$ and we have that

$$
\begin{aligned}
\langle H(x_0) - H(x_n), x_0 - x_n \rangle &= \sum_j \left( H^j(x_0) - H^j(x_n) \right) \left( x_0^j - x_n^j \right) \\
&\le 2K^* \sum_{j \notin J^+ \cup J^-} |x_0^j - x_n^j| + \sum_{j \in J^+} \left( H^j(x_0) - H^j(x_n) \right) \left( x_0^j - x_n^j \right) \\
&\quad - \sum_{j \in J^-} \left( H^j(x_n) - H^j(x_0) \right) \left( x_0^j - x_n^j \right),
\end{aligned}
$$

which converges to $-\infty$ because the differences $\{|x_0^j - x_n^j|\}_n$ are bounded for every $j$ and they are greater than $\varepsilon / 2$ from an index onward if $j \in J^+ \cup J^-$. But this is not possible because $H$ is a Z-increasing map.

Now, taking into account that property C is satisfied by absolutely continuous measures and the preceding argument, we can conclude with the following result.

THEOREM 2.8. Let $H : D \subset \mathbb{R}^p \to \mathbb{R}^p$ be a Z-increasing map and let $P \ll \lambda_p$ be a probability measure such that $P(D) > 0$. Then

$$P\{x \in D : H \text{ is not continuous at } x\} = 0.$$
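A simple map that is Z-increasing but not continuous everywhere may help to visualize Theorem 2.8. The sketch below uses $H(x) = x / \|x\|$ for $x \ne 0$ and $H(0) = 0$ on $\mathbb{R}^2$, which is a selection of the subdifferential of the Euclidean norm. This example is an assumption made for illustration and does not appear in the paper; the grid and the tolerance are arbitrary choices. The map is discontinuous only at the origin, a set of Lebesgue measure zero.

```python
import math
from itertools import product


def H(x):
    """Selection of the subdifferential of the Euclidean norm on R^2."""
    r = math.hypot(*x)
    return (0.0, 0.0) if r == 0.0 else (x[0] / r, x[1] / r)


def monotonicity_gap(x, y):
    """Value of <H(x) - H(y), x - y>, which is nonnegative for a Z-increasing map."""
    hx, hy = H(x), H(y)
    return (hx[0] - hy[0]) * (x[0] - y[0]) + (hx[1] - hy[1]) * (x[1] - y[1])


# Check the inequality on a grid that contains the origin.
grid = [(i / 2.0, j / 2.0) for i, j in product(range(-4, 5), repeat=2)]
min_gap = min(monotonicity_gap(x, y) for x in grid for y in grid)

# Approaching the origin from opposite sides gives different values of H.
jump = abs(H((1e-9, 0.0))[0] - H((-1e-9, 0.0))[0])
```

On the grid the smallest gap should be nonnegative up to rounding, while the first component of $H$ jumps by 2 across the origin. A finite grid check is of course only a sanity test and not a proof of monotonicity.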
ACKNOWLEDGMENTS The authors thank an anonymous referee and Professor N. H. Bingham who suggested several modifications which have considerably improved the paper.
REFERENCES

1. V. Barbu, "Nonlinear Semigroups and Differential Equations in Banach Spaces," Noordhoff, Leiden, 1976.
2. P. Billingsley, "Probability and Measure," Wiley, New York, 1986.
3. J. A. Cuesta-Albertos and C. Matrán, Notes on the Wasserstein metric in Hilbert spaces, Ann. Probab. 17 (1989), 1264–1276.
4. J. A. Cuesta-Albertos, C. Matrán, and A. Tuero-Díaz, Optimal transportation plans and convergence in distribution, J. Multivariate Anal. 60 (1997), 72–83.
5. J. A. Cuesta-Albertos, C. Matrán, and A. Tuero-Díaz, "Properties of the Optimal Maps for the $L^2$-Wasserstein Distance," Tech. Report, 1996.
6. J. A. Cuesta-Albertos, C. Matrán, S. T. Rachev, and L. Rüschendorf, Mass transportation problems in probability theory, Math. Sci. 21 (1996), 34–72.
7. J. A. Cuesta-Albertos, L. Rüschendorf, and A. Tuero-Díaz, Optimal coupling of multivariate distributions and stochastic processes, J. Multivariate Anal. 46 (1993), 355–361.
8. P. Glasserman and D. D. Yao, "Optimal Couplings Are TP and More," Tech. Report, Gresman School of Business, Columbia University, 1996.
9. R. McCann, Existence and uniqueness of monotone measure-preserving maps, Duke Math. J. 80, No. 2 (1996), 309–323.
10. S. T. Rachev, "Probability Metrics and the Stability of Stochastic Models," Wiley, New York, 1991.
11. L. Rüschendorf and S. T. Rachev, A characterization of random variables with minimum $L^2$-distance, J. Multivariate Anal. 32 (1990), 48–54.
12. L. Rüschendorf, Bounds for the distributions with multivariate marginals, in "Stochastic Order and Decision under Risk" (K. Mosler and M. Scarssini, Eds.), 285–310. IMS Lecture Notes, Vol. 19, Inst. Math. Statist., Hayward, CA.
13. L. Rüschendorf, Fréchet-bounds and their applications, in "Advances in Probability Measures with Given Marginals" (Dall'Aglio, Kotz, and Salinetti, Eds.), pp. 151–188, Kluwer Academic, Dordrecht, 1991.
14. C. S. Smith and M. Knott, On Hoeffding-Fréchet bounds and cyclic monotone relations, J. Multivariate Anal. 40 (1992), 328–334.
15. A. Tuero-Díaz, On the stochastic convergence of representations based on Wasserstein metrics, Ann. Probab. 21, No. 1 (1993), 72–85.