Ergodic decomposition for inverse Wishart measures on infinite positive-definite matrices

The ergodic unitarily invariant measures on the space of infinite Hermitian matrices have been classified by Pickrell and Olshanski-Vershik. The much-studied complex inverse Wishart measures form a projective family, thus giving rise to a unitarily invariant measure on infinite positive-definite matrices. In this paper we completely solve the corresponding problem of ergodic decomposition for this measure.


Informal introduction and historical overview
Since their introduction in the 1920s [34], the Wishart measures have been ubiquitous in mathematics, physics and statistics. They appear in diverse fields, from statistical analysis [20], [18] to stochastic processes [28], [3] and free probability [9], [23]. More recently, there has been renewed interest in connection with their applications in quantum transport [11], [12] and their enumerative properties, in particular relations to Hurwitz numbers [10]. This paper stems from the following remarkable property of these measures and its consequences: the inverse Wishart measures $\{M^{(\nu),N}\}_{N \ge 1}$ (where $\nu$ is a real parameter greater than $-1$), defined in (3) on (positive-definite) Hermitian matrices, form a projective family under the so-called corners maps given in (1) below. The goal of this paper is to study the corresponding unitarily invariant (namely, invariant under conjugation by unitary matrices) measure $M^{(\nu)}$ on infinite (positive-definite) Hermitian matrices and to describe explicitly how it decomposes into ergodic components.
These ergodic measures for the action by conjugation of the inductive limit of unitary groups on infinite Hermitian matrices have been classified in classical works of Pickrell [26] and Olshanski and Vershik [25]. They are parametrized by the infinite dimensional space $\Omega$ defined in (2) below and depend on a set of parameters $(\{\alpha^+\}, \{\alpha^-\}, \gamma_1, \gamma_2) \in \mathbb{R}_+^\infty \times \mathbb{R}_+^\infty \times \mathbb{R} \times \mathbb{R}_+$. As we will see in Section 3, the $\alpha$ parameters are asymptotic eigenvalues, while $\gamma_1$ and $\gamma_2$ are related to the asymptotic trace and the asymptotic sum of squares respectively; the parameter $\gamma_2$ is also called the 'Gaussian component'. The study of $\gamma_1$ especially, and also of $\gamma_2$, is a much more difficult task than describing the $\alpha$'s.
The problem of ergodic decomposition of certain distinguished unitarily invariant measures on infinite Hermitian matrices was initiated by Borodin and Olshanski in [2]. They considered the Hua-Pickrell measures depending on a complex parameter $s$. These measures were first studied by Hua in his classical book [19], and implicit in his calculations is consistency under the corners maps; see also Neretin's generalization [24]. Borodin and Olshanski described the $\alpha$ parameters and proved that $\gamma_2 = 0$ for $s = 0$. The determination of $\gamma_1$ and $\gamma_2$ was left open for many years until recently, in a breakthrough work [31], Qiu proved that $\gamma_2 = 0$ for general parameter $s$ and completely described $\gamma_1$ for real $s$ (see Remark 5.5 for more on this restriction). In the case of the inverse Wishart measures $M^{(\nu)}$ we are able to completely describe all the parameters for all $\nu > -1$. This is achieved in Theorem 1.6, the main result of this paper.
A closely related problem is the ergodic decomposition of the so-called Pickrell measures [27] (depending on a real parameter s) on infinite square complex matrices. The ergodic unitarily invariant (by multiplication to the left and to the right) measures on infinite square complex matrices have also been classified. These are parametrized by a different infinite dimensional space that is a subset of R ∞ + × R + (there is no analogue of γ 1 ). The explicit description of the ergodic decomposition has been settled in a series of papers by Bufetov [4], [5], [6] (see also [8]), which have been very influential for us. We should mention that the papers of Bufetov [4], [5], [6] and Qiu [31] also study the infinite case of the problem of ergodic decomposition, namely when the corresponding matrix measures no longer have finite mass. Since this requires quite different techniques we will not consider it in this work.
Finally, before closing this informal introduction we remark that a key role in all these papers [2], [4], [5], [6], [8], [31] is played by orthogonal polynomials. In the case of the Hua-Pickrell measures these are the Pseudo-Jacobi polynomials, and in the case of the Pickrell measures these are the Jacobi polynomials. The analogous role in this paper is played by the Bessel [22] and also the Laguerre polynomials.
In the next subsections we give the necessary background to make the informal discussion above precise and state our results rigorously.

Ergodic unitarily invariant measures on infinite Hermitian matrices
Let $U(N)$ and $H(N)$ be the group of $N \times N$ unitary matrices and the space of $N \times N$ Hermitian matrices respectively. Let $U(\infty)$ be the inductive limit of unitary groups, $U(\infty) = \varinjlim U(N)$, under the natural embeddings. In more explicit terms, an element of $U(\infty)$ is an infinite matrix whose top-left corner is an $N \times N$ unitary matrix for some $N$ and the rest is the identity matrix.
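Consider the so-called corners maps $\pi^N_{N-1} : H(N) \to H(N-1)$:
$$\pi^N_{N-1}\left[(X_{ij})_{i,j=1}^{N}\right] = (X_{ij})_{i,j=1}^{N-1}. \qquad (1)$$
Let $H$ be the space of infinite Hermitian matrices, defined as the projective limit $H = \varprojlim H(N)$ under the corners maps. Moreover, let $H_+(N) \subset H(N)$ denote the space of $N \times N$ positive-definite matrices, namely matrices with positive eigenvalues. By Cauchy's interlacing theorem we get that $\pi^N_{N-1} : H_+(N) \to H_+(N-1)$. Thus, we can also correctly define the projective limit $H_+ = \varprojlim H_+(N)$. The group $U(\infty)$ acts on $H$ by conjugation. It is a classical theorem of Pickrell [26] and also Olshanski and Vershik [25] that the ergodic measures on $H$ for this action (namely those for which every $U(\infty)$-invariant subset has mass 0 or 1) are parametrized by the infinite dimensional space $\Omega \subset \mathbb{R}^{2\infty+2}$:
$$\Omega = \Big\{ \omega = (\alpha^+, \alpha^-, \gamma_1, \delta) : \alpha^\pm = (\alpha^\pm_1 \ge \alpha^\pm_2 \ge \dots \ge 0),\ \gamma_1 \in \mathbb{R},\ \delta \ge 0,\ \sum_{i}(\alpha^+_i)^2 + \sum_{i}(\alpha^-_i)^2 \le \delta \Big\}. \qquad (2)$$
Observe that $\Omega$ is a locally compact space. We then have the following classification theorem, see [26], [25], also [13].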
There exists a parametrization of the ergodic $U(\infty)$-invariant probability measures on the space $H$ by the points of the space $\Omega$, described as follows.
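Given $\omega \in \Omega$, the characteristic function of the ergodic measure $M_\omega$ is given by
$$\int_H e^{i\,\mathrm{Tr}(\mathrm{diag}(r_1,\dots,r_n,0,0,\dots)X)}\, M_\omega(dX) = \prod_{j=1}^{n} F_\omega(r_j),$$
where, for $\omega = (\alpha^+, \alpha^-, \gamma_1, \delta)$:
$$F_\omega(x) = e^{i\gamma_1 x - \frac{\gamma_2}{2}x^2} \prod_{k=1}^{\infty} \frac{e^{-i\alpha^+_k x}}{1 - i\alpha^+_k x} \prod_{k=1}^{\infty} \frac{e^{i\alpha^-_k x}}{1 + i\alpha^-_k x}, \qquad \gamma_2 = \delta - \sum_{k}(\alpha^+_k)^2 - \sum_{k}(\alpha^-_k)^2 \ge 0.$$
Note that the characteristic function $F_\omega$ is well defined for all $\omega \in \Omega$ by the fact that the sum of squares of the $\alpha$'s is finite. Also, observe that the parameter $\gamma_2$ corresponds to an infinite random Hermitian matrix with independent (subject to the Hermitian constraint) complex Gaussian entries with mean 0, having variance $\gamma_2$ on the diagonal.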
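As a small numerical illustration of the corners map and of Cauchy's interlacing theorem, which is what makes the projective limit of positive-definite matrices well defined, the following sketch takes a 2 × 2 positive-definite Hermitian matrix and checks that its 1 × 1 corner is trapped between the two eigenvalues. The helper names and the sample entries are illustrative assumptions, not notation from the paper.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues (largest first) of the Hermitian matrix [[a, b], [conj(b), d]]."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius

def corner(a, b, d):
    """Corners map from 2x2 to 1x1 matrices: keep only the top-left entry."""
    return a

# Hypothetical sample matrix; it is positive definite since a > 0 and a*d - |b|^2 > 0.
a, b, d = 3.0, 1.0 + 0.5j, 2.0
lam1, lam2 = eigenvalues_2x2(a, b, d)
x = corner(a, b, d)

# Cauchy interlacing: lam1 >= x >= lam2, so the corner is again positive definite.
print(lam1, x, lam2)
```

The same inequality chain, applied corner by corner, is what sends positive-definite matrices to positive-definite matrices under every corners map.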

The inverse Wishart measures M (ν) on infinite positive-definite matrices
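For $\nu > -1$, consider the complex Wishart (or Laguerre ensemble) probability measure on $N \times N$ Hermitian matrices, supported on $H_+(N)$, with density proportional to
$$\det(Y)^{\nu} e^{-\mathrm{Tr}\,Y}\, \mathbf{1}(Y \in H_+(N))\, dY,$$
where throughout the paper, for a Hermitian matrix $Y$, $dY$ denotes Lebesgue measure on $H(N)$ (we suppress dependence on $N$):
$$dY = \prod_{i=1}^{N} dY_{ii} \prod_{1 \le i < j \le N} d\,\mathrm{Re}\,Y_{ij}\; d\,\mathrm{Im}\,Y_{ij}.$$
The restriction $\nu > -1$ ensures that the normalization constant is finite. Under the change of variables $Y = 2X^{-1}$ we obtain the central object of study in this paper, the inverse Wishart probability measures on $H_+(N)$:
$$M^{(\nu),N}(dX) \propto \det(X)^{-\nu-2N} e^{-2\,\mathrm{Tr}\,X^{-1}}\, \mathbf{1}(X \in H_+(N))\, dX. \qquad (3)$$
Observe that, for all $N \in \mathbb{N}$, the measure $M^{(\nu),N}$ is unitarily invariant, namely invariant under the action of $U(N)$ by conjugation. Then, we have the following consistency result: Proposition 1.2. Let $\nu > -1$. Then, for all $N \ge 1$, $(\pi^{N+1}_N)_* M^{(\nu),N+1} = M^{(\nu),N}$. Thus, by Kolmogorov's theorem we obtain a unitarily invariant measure $M^{(\nu)}$ on $H$ that is supported on $H_+$. Proposition 1.2 is proven in Section 2 as Proposition 2.3. A key role in the proof is played by the Bessel orthogonal polynomials [22].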
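The change of variables $Y = 2X^{-1}$ can be checked by hand in the scalar case $N = 1$, where the Wishart measure is a Gamma law with density $y^{\nu}e^{-y}/\Gamma(\nu+1)$ and the pushforward has density $2^{\nu+1}x^{-\nu-2}e^{-2/x}/\Gamma(\nu+1)$. The sketch below compares the two distribution functions numerically; the function names, the quadrature rule and the sample values of $\nu$ and $t$ are assumptions made for illustration only.

```python
import math

def wishart_density(y, nu):
    """Density of the N = 1 Wishart (Gamma(nu + 1)) law: y^nu e^{-y} / Gamma(nu + 1)."""
    return y ** nu * math.exp(-y) / math.gamma(nu + 1.0)

def inverse_wishart_density(x, nu):
    """Density of X = 2 / Y for N = 1: proportional to x^(-nu - 2) e^(-2 / x)."""
    return 2.0 ** (nu + 1.0) * x ** (-nu - 2.0) * math.exp(-2.0 / x) / math.gamma(nu + 1.0)

def midpoint_rule(f, lo, hi, n=20000):
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

nu, t = 0.5, 1.5
# P(X <= t), computed from the inverse Wishart density ...
lhs = midpoint_rule(lambda x: inverse_wishart_density(x, nu), 0.0, t)
# ... should agree with P(Y >= 2 / t), computed from the Wishart density (tail cut at 40).
rhs = midpoint_rule(lambda y: wishart_density(y, nu), 2.0 / t, 40.0)
print(lhs, rhs)
```

For general $N$ the same computation picks up the Jacobian $\det(X)^{-2N}$ of matrix inversion on $H(N)$, which accounts for the exponent $-\nu-2N$.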

Description of ergodic decomposition of M (ν)
Consider the measure $M^{(\nu)}$ defined above for $\nu > -1$. It is a result of Borodin and Olshanski, see Proposition 4.4 in [2], that there exists a unique probability measure $m^{(\nu)}$ on $\Omega$ such that:
$$M^{(\nu)} = \int_{\Omega} M_\omega\, m^{(\nu)}(d\omega).$$
Here, the equality is interpreted as holding when both sides are integrated against a test function, see Section 3. The main result of this paper is the explicit description of $m^{(\nu)}$. To proceed and state it precisely we need some more definitions and background on determinantal measures.
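Determinantal measures. Let $X$ be a locally compact Polish space equipped with a $\sigma$-finite reference measure $\mu$. Let $\mathrm{Conf}(X)$ be the space of point configurations over $X$. Points in a point configuration $\mathcal{X} \in \mathrm{Conf}(X)$ will be called particles. We can embed $\mathrm{Conf}(X)$ in the space of finite measures on $X$ by $\mathcal{X} \mapsto \sum_{x \in \mathcal{X}} \delta_x$, and with the induced topology $\mathrm{Conf}(X)$ is a Polish space. A Borel probability measure $P$ on $\mathrm{Conf}(X)$ is called a determinantal point process or measure, see [29], with (Hermitian) kernel $K : X \times X \to \mathbb{C}$ if for any $n \in \mathbb{N}$ and any function $\Phi \in C_c(X^n)$, continuous and of compact support, we have:
$$\int_{\mathrm{Conf}(X)} \sum_{x_{i_1}, \dots, x_{i_n} \in \mathcal{X}} \Phi(x_{i_1}, \dots, x_{i_n})\, dP(\mathcal{X}) = \int_{X^n} \Phi(x_1, \dots, x_n) \det\left[K(x_i, x_j)\right]_{i,j=1}^{n} d\mu(x_1) \cdots d\mu(x_n), \qquad (6)$$
where the sum is taken over ordered $n$-tuples of particles with pairwise distinct labels. The measure $P$ satisfying (6) is completely determined by the pair $(K, \mu)$ and, dropping dependence on $\mu$ (usually fixed), we denote it by $P_K$.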
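Throughout this paper the reference measure will be Lebesgue measure. We now define a distinguished determinantal measure on $(0, \infty)$. Definition 1.5. Define the Bessel kernel $\mathsf{J}_\nu$ by:
$$\mathsf{J}_\nu(x, y) = \frac{\sqrt{x}\, J_{\nu+1}(\sqrt{x})\, J_\nu(\sqrt{y}) - \sqrt{y}\, J_{\nu+1}(\sqrt{y})\, J_\nu(\sqrt{x})}{2(x - y)},$$
where $J_\nu$ is the Bessel function of order $\nu$. This kernel gives rise to the Bessel determinantal point process $P_{\mathsf{J}_\nu}$ on $(0, \infty)$ of Tracy and Widom [32].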
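To make the Bessel kernel concrete, the sketch below evaluates it in two ways, assuming the standard Tracy-Widom form: the closed expression with $J_\nu$ and $J_{\nu+1}$, and the integral representation $\frac{1}{4}\int_0^1 J_\nu(\sqrt{xt})J_\nu(\sqrt{yt})\,dt$. The power-series evaluation of $J_\nu$, the function names and the sample arguments are illustrative assumptions; the series is only adequate for moderate arguments.

```python
import math

def bessel_j(nu, z, terms=40):
    """Bessel function J_nu(z), z >= 0, summed from its power series (moderate z only)."""
    half = z / 2.0
    return sum((-1) ** m / (math.factorial(m) * math.gamma(m + nu + 1.0))
               * half ** (2 * m + nu) for m in range(terms))

def bessel_kernel(nu, x, y):
    """Closed form of the Bessel kernel for x != y (assumed Tracy-Widom normalisation)."""
    sx, sy = math.sqrt(x), math.sqrt(y)
    numerator = (sx * bessel_j(nu + 1.0, sx) * bessel_j(nu, sy)
                 - sy * bessel_j(nu + 1.0, sy) * bessel_j(nu, sx))
    return numerator / (2.0 * (x - y))

def bessel_kernel_integral(nu, x, y, n=400):
    """(1/4) * int_0^1 J(sqrt(x t)) J(sqrt(y t)) dt, via t = s^2 and Simpson's rule in s."""
    sx, sy = math.sqrt(x), math.sqrt(y)
    f = lambda s: 0.5 * s * bessel_j(nu, sx * s) * bessel_j(nu, sy * s)
    h = 1.0 / n  # n must be even for Simpson's rule
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

closed_form = bessel_kernel(1.0, 3.0, 7.0)
integral_form = bessel_kernel_integral(1.0, 3.0, 7.0)
print(closed_form, integral_form)
```

The integral form exhibits the kernel as a positive operator, which is one way to see that it defines a point process.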
We are ready to give the explicit description of the ergodic decomposition of the inverse Wishart measures M (ν) . Theorem 1.6. Let ν > −1. The spectral measure m (ν) associated to M (ν) is concentrated on Ω + 0 :

Moreover, the law of the parameters α
Theorem 1.6 above will be proven by an approximation procedure, the so-called 'ergodic method' of Olshanski and Vershik [25], that is explained in Section 3. For this asymptotic analysis we will exploit the relation with the Laguerre ensemble.

Organisation of the paper
In Section 2, we prove consistency for the inverse Wishart measures. This is proved at the level of eigenvalue measures first and then lifted to matrices. The key ingredient is the backward shift equation for Bessel polynomials. In Section 3, we recall in detail the approximation procedure of Vershik, Borodin and Olshanski. In Section 4, we study the α parameters. We obtain the parameters α + as limits of the Bessel orthogonal polynomial ensemble, which is a determinantal point process. In Section 5, we obtain certain estimates for the kernel of the Bessel orthogonal polynomial ensemble. These are essential for the study of γ 1 and γ 2 . In Section 6, we prove that γ 2 ≡ 0. In Section 7, we treat the γ 1 parameter. A key role is played by positivity along with the estimates from Section 5. Finally, in Section 8 we simply put everything together to conclude the proof of Theorem 1.6.
Acknowledgements. I would like to thank Alexei Borodin and Grigori Olshanski for some useful comments and pointers to the literature. Research supported by ERC Advanced Grant 740900 (LogCorRM).

Consistency of M (ν),N
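Let $\nu > -1$. Recall that we are interested in the normalized (probability) measure:
$$M^{(\nu),N}(dX) = \mathrm{const}_{\nu,N} \det(X)^{-\nu-2N} e^{-2\,\mathrm{Tr}\,X^{-1}}\, \mathbf{1}(X \in H_+(N))\, dX,$$
where the normalizing constant $\mathrm{const}_{\nu,N}$ will be given explicitly below. By Weyl's integration formula we get the induced probability measure on eigenvalues in the Weyl chamber $W^N = \{x \in \mathbb{R}^N : x_1 \ge \dots \ge x_N\}$ (we write $W^N_+$ if all coordinates are positive):
$$\mu^\nu_N(dx) = \overline{\mathrm{const}}_{\nu,N} \prod_{i=1}^{N} w^\nu_N(x_i)\, \Delta_N(x)^2\, \mathbf{1}(x \in W^N_+)\, dx,$$
with the Bessel weight $w^\nu_N(\cdot)$ on $(0, \infty)$:
$$w^\nu_N(x) = x^{-\nu-2N} e^{-2/x},$$
and we will write $\Delta_N(x) = \prod_{1 \le i < j \le N}(x_i - x_j)$ for the Vandermonde determinant. Moreover, the constants $\mathrm{const}_{\nu,N}$ and $\overline{\mathrm{const}}_{\nu,N}$ are related through the constant appearing in Weyl's integration formula. It will be convenient to introduce the following notation: we define the map $\mathrm{eval}_N : H(N) \to W^N$, where $\mathrm{eval}_N(H)$ is the vector of eigenvalues of $H$ in weakly decreasing order. In this notation we have: $(\mathrm{eval}_N)_* M^{(\nu),N} = \mu^\nu_N$.
Bessel polynomials. The orthogonal polynomials with respect to $w^\nu_N(x)$, which are called the Bessel polynomials [22], will be important for us. The reference for all the facts stated below is Chapter 9.13, pages 244-247, of [21].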
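For $\nu > -1$, there exist monic orthogonal polynomials $p_0(\cdot; \nu, N), \dots, p_{N-1}(\cdot; \nu, N)$ of degrees $0, \dots, N-1$ with respect to $w^\nu_N$. These can be expressed in terms of hypergeometric functions but we shall not need any explicit expression here. Their norms, for $n = 0, \dots, N-1$, are explicit and involve the Pochhammer symbol $(a)_k = \prod_{i=1}^{k}(a + i - 1)$, $(a)_0 = 1$. Also, we have the following key relation, called the backward shift equation, that relates orthogonal polynomials with respect to $w^\nu_N$ to orthogonal polynomials with respect to $w^\nu_{N+1}$:
$$\frac{d}{dx}\left[w^\nu_N(x)\, p_n(x; \nu, N)\right] = -(\nu + 2N - n)\, w^\nu_{N+1}(x)\, p_{n+1}(x; \nu, N+1).$$
Furthermore, by writing the Vandermonde determinant in terms of the monic orthogonal polynomials $p_n(\cdot; \nu, N)$, a standard calculation gives that:
$$\int_{W^N_+} \prod_{i=1}^{N} w^\nu_N(x_i)\, \Delta_N(x)^2\, dx = \prod_{n=0}^{N-1} \|p_n(\cdot; \nu, N)\|_2^2,$$
and thus we also obtain an explicit expression for the normalizing constant of $\mu^\nu_N$. Now, we introduce the following Markov kernel $\Lambda^{N+1}_N$ from $W^{N+1}$ to $W^N$, defined for $x \in \mathring{W}^{N+1}$ (the interior of $W^{N+1}$) as:
$$\Lambda^{N+1}_N(x, dy) = \frac{N!\, \Delta_N(y)}{\Delta_{N+1}(x)}\, \mathbf{1}(y \prec x)\, dy,$$
where $y \prec x$ denotes interlacing: $x_1 \ge y_1 \ge x_2 \ge \dots \ge y_N \ge x_{N+1}$. This Markov kernel has a random matrix interpretation, due to Baryshnikov [1] (the required computation is already implicit in the classical book of Gelfand and Naimark [16]), which easily extends to any $x \in W^{N+1}$ (even when the coordinates coincide):
$$\mathrm{Law}\left[\mathrm{eval}_N\left(\pi^{N+1}_N\left(U\, \mathrm{diag}(x)\, U^*\right)\right)\right] = \Lambda^{N+1}_N(x, \cdot),$$
where $U$ is a random (Haar distributed) unitary matrix from $U(N+1)$. We will first prove consistency at the level of eigenvalues: Proposition 2.1. Let $\nu > -1$. Then, for all $N \ge 1$, $\mu^\nu_{N+1} \Lambda^{N+1}_N = \mu^\nu_N$. Proof. Since both sides are probability measures, it suffices to prove that they are equal up to a multiplicative constant (we will use the notation $\propto$ for this). Since $\mu^\nu_{N+1}$ is supported on $\mathring{W}^{N+1}$ we can write (note that since $y \prec x$ and $x \in W^{N+1}_+$ we have $y \in W^N_+$):
$$\left[\mu^\nu_{N+1} \Lambda^{N+1}_N\right](dy) \propto \Delta_N(y) \int_{y \prec x} \prod_{i=1}^{N+1} w^\nu_{N+1}(x_i)\, \Delta_{N+1}(x)\, dx\; dy.$$
Note that we can write: $\Delta_{N+1}(x) = \det\left[p_{j-1}(x_i; \nu, N+1)\right]_{i,j=1}^{N+1}$.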
We thus need to show: By multilinearity of the determinant we obtain that the LHS of (11) is where y 0 = ∞, y N+1 = 0. Now, by the backward shift equation we can perform the integral inside the determinant and obtain for i ≥ 2: We note that when evaluated at y 0 = ∞ and y N+1 = 0 the terms above vanish. Hence, we get that the LHS of (11) is proportional to (note that the entry with index (2,2) is the , similarly for the entry with index (N+1,2)): Successively adding column j to column j + 1 we obtain that this is equal to, (11) immediately follows.
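The role of the restriction $\nu > -1$ and of the monic orthogonal polynomials in this section can be made concrete with the moments of the weight $x^{-\nu-2N}e^{-2/x}$, which equal $2^{k-\nu-2N+1}\Gamma(\nu+2N-k-1)$ and are finite only for $k < \nu + 2N - 1$, so the moments up to order $2N-2$ needed for degrees $0, \dots, N-1$ exist exactly when $\nu > -1$. The sketch below builds the monic orthogonal polynomials by Gram-Schmidt and checks the standard identity that the Hankel determinant of the moments equals the product of their squared norms. All function names and the sample values of $\nu$ and $N$ are assumptions for illustration; the polynomials are not computed from the hypergeometric formulas of [21].

```python
import math

def bessel_weight_moment(k, nu, N):
    """int_0^inf x^k x^(-nu-2N) exp(-2/x) dx = 2^(k-nu-2N+1) Gamma(nu+2N-k-1)."""
    shape = nu + 2 * N - k - 1
    if shape <= 0:
        raise ValueError("this moment of the Bessel weight is infinite")
    return 2.0 ** (k - nu - 2 * N + 1) * math.gamma(shape)

def inner(p, q, moments):
    """Inner product of polynomials given as coefficient lists (lowest degree first)."""
    return sum(pi * qj * moments[i + j] for i, pi in enumerate(p) for j, qj in enumerate(q))

def monic_orthogonal_polynomials(moments, N):
    """Gram-Schmidt on 1, x, ..., x^(N-1); returns monic polynomials as coefficient lists."""
    polys = []
    for n in range(N):
        monomial = [0.0] * n + [1.0]
        p = monomial[:]
        for q in polys:
            c = inner(monomial, q, moments) / inner(q, q, moments)
            for i, qi in enumerate(q):
                p[i] -= c * qi
        polys.append(p)
    return polys

def determinant(matrix):
    """Gaussian elimination without pivoting (adequate for a positive-definite matrix)."""
    a = [row[:] for row in matrix]
    det = 1.0
    for c in range(len(a)):
        det *= a[c][c]
        for r in range(c + 1, len(a)):
            factor = a[r][c] / a[c][c]
            for k in range(c, len(a)):
                a[r][k] -= factor * a[c][k]
    return det

nu, N = 0.5, 3
moments = [bessel_weight_moment(k, nu, N) for k in range(2 * N - 1)]
polys = monic_orthogonal_polynomials(moments, N)
squared_norms = [inner(p, p, moments) for p in polys]
hankel_det = determinant([[moments[i + j] for j in range(N)] for i in range(N)])
norm_product = math.prod(squared_norms)
print(hankel_det, norm_product)
```

Asking for one more moment, `bessel_weight_moment(2 * N - 1, -0.5, N)`, raises the error, which reflects that only finitely many Bessel polynomials are orthogonal for a given weight.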

Remark 2.2. As pointed out to me by Grigori Olshanski, a similar idea of using the backward shift equation to prove consistency appears in the study of the q-zw measures on the quantized Gelfand-Tsetlin graph [17]. The corresponding orthogonal polynomials are the Pseudo big q-Jacobi polynomials, see [17] and the references therein.
We will now, making use of Baryshnikov's result [1], prove consistency for the matrix measures.
On the other hand from Proposition 2.1 we have: Hence, we need to show: Now, let λ = (λ 1 ≥ λ 2 ≥ · · · ≥ λ N+1 ) be fixed and consider the orbital measure on H(N + 1) where U is Haar distributed in U(N + 1). Then, since (eval N+1 ) * n λ = δ λ , Baryshnikov's result [1] can be written as: By Weyl's integration formula we have: So we obtain: By linearity of the operations involved and so we finally arrive at: Remark 2.4. The exact same scheme of proof, can be applied to the case of the Hua-Pickrell measures. One uses, instead the backward shift equation for the Pseudo-Jacobi polynomials, see Section 9.9 of [21].
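Baryshnikov's random matrix interpretation of the Markov kernel can be checked by simulation in the smallest case, from $2 \times 2$ to $1 \times 1$, where the kernel reduces to the uniform distribution on $[x_2, x_1]$. The sketch below samples the top-left entry of $U\,\mathrm{diag}(x_1, x_2)\,U^*$ using the fact that a row of a Haar unitary is a normalised standard complex Gaussian vector, and compares the empirical mean and variance with those of the uniform law. The function name, seed, sample size and eigenvalues are illustrative assumptions.

```python
import random

def corner_entry(x1, x2, rng):
    """Top-left entry of U diag(x1, x2) U* for a Haar-distributed U in U(2).

    Only the first row of U matters; it is a standard complex Gaussian
    vector normalised to unit length.
    """
    g1 = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    g2 = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    t = abs(g1) ** 2 / (abs(g1) ** 2 + abs(g2) ** 2)
    return t * x1 + (1.0 - t) * x2

rng = random.Random(12345)  # fixed seed so the run is reproducible
x1, x2 = 3.0, 1.0
samples = [corner_entry(x1, x2, rng) for _ in range(100000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

# A uniform law on [x2, x1] has mean (x1 + x2) / 2 and variance (x1 - x2)^2 / 12.
print(mean, (x1 + x2) / 2.0)
print(variance, (x1 - x2) ** 2 / 12.0)
```

Every sample lies between $x_2$ and $x_1$, which is the interlacing constraint in this case.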
We finally reinterpret the proposition above to obtain the following matrix integral. This is an analogue of Hua's integrals [19], [24] for the inverse Wishart measures. Let For X ∈ H(N + 1) we write:

Approximation of spectral measures
In order to proceed we will need to get a handle on the abstract spectral measure m (ν) . The following approximation procedure of Olshanski-Vershik [25] and Borodin-Olshanski [2], based on the ergodic method of Vershik [33] allows us to do so.

Define the corner map π ∞
We also set, A matrix X ∈ H is called regular and we will write X ∈ H reg if the following limits exist: We can easily see that i α + i (X) 2 + α − i (X) 2 ≤ δ(X) and we define: We also define r N : H → Ω by: and similarly r ∞ : H → Ω, that is defined correctly on H reg and thus almost everywhere on H as we shall see next: It has been shown in Section 5 of [2] that any U(∞)-invariant probability measure M on H is supported on H reg . Moreover, define the unique spectral measure P M associated to M (in the particular case M = M (ν) we write as we did in the introduction m (ν) = P M (ν) ) by, see Section 5 of [2]: where the equality is understood as follows: for all Borel functions F on H: Then, by Section 5 of [2] the spectral measure P M is in fact given by: Finally, see Section 5 of [2], we have weak convergence of probability measures: We will need some more definitions used in later sections. For X (deterministic or random) in H and H reg respectively define the point configurations: omitting possible zeroes. We will write C (ν) . For M on H that is U(∞)-invariant we will let P N and P (we drop dependence on M) denote the corresponding random point configurations, namely C N (X) and C(X) if Law(X) = M.
We need a final definition for a general measure $P$ on $\mathrm{Conf}(X)$. We define its correlation functions $\rho_n$ for each $n \ge 1$ (if they exist) with respect to $\mu$, by replacing the right hand side of (6) by:
$$\int_{X^n} \Phi(x_1, \dots, x_n)\, \rho_n(x_1, \dots, x_n)\, d\mu(x_1) \cdots d\mu(x_n).$$
In particular, for a determinantal measure $P_K$ we have, for all $n \ge 1$: $\rho_n(x_1, \dots, x_n) = \det\left[K(x_i, x_j)\right]_{i,j=1}^{n}$.
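For a determinantal measure the $n$-point correlation function is the determinant of the kernel evaluated at the points. A minimal worked instance, chosen here for illustration and not taken from the paper, is the Laguerre ensemble with two particles and $\nu = 0$: the orthonormal polynomials for the weight $e^{-x}$ are $1$ and $x - 1$, and the two-point function computed from the kernel must agree with $2!$ times the symmetric joint density $(x-y)^2 e^{-x-y}/2$.

```python
import math

def kernel(x, y):
    """Correlation kernel for N = 2, nu = 0: built from the orthonormal pair 1 and x - 1."""
    return math.exp(-(x + y) / 2.0) * (1.0 + (x - 1.0) * (y - 1.0))

def rho2_from_kernel(x, y):
    """Two-point correlation function as the 2 x 2 determinant of the kernel."""
    return kernel(x, x) * kernel(y, y) - kernel(x, y) * kernel(y, x)

def rho2_from_density(x, y):
    """2! times the symmetric joint density (x - y)^2 exp(-x - y) / 2 of the two particles."""
    return (x - y) ** 2 * math.exp(-x - y)

for point in [(0.3, 1.7), (2.0, 0.1), (4.5, 4.5)]:
    print(point, rho2_from_kernel(*point), rho2_from_density(*point))
```

The vanishing of the two-point function on the diagonal is the repulsion between particles that is typical of determinantal processes.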

The α − parameters
The following proposition is obvious from the definitions in Section 3: Proposition 4.1. Let $\nu > -1$. Then $\alpha^-_i(X) = 0$ for all $i \ge 1$, for $M^{(\nu)}$-a.e. $X \in H_{reg}$. Thus, we can restrict our attention to the parameters $\alpha^+$, which form a random point configuration on $X = (0, \infty)$ (throughout, the reference measure $\mu$ on $(0, \infty)$ will be Lebesgue measure), which we go on to study next.

Explicit expression for the correlation kernel and the α + parameters
It is a standard result from random matrix theory that the measure $\mu^\nu_N$ gives rise to a determinantal point process on $(0, \infty)$ with $N$ particles. It is the orthogonal polynomial ensemble associated to the Bessel weight $w^\nu_N$. We will denote its correlation kernel by $K^\nu_N$. This is given explicitly in terms of Bessel polynomials:
$$K^\nu_N(x, y) = \sqrt{w^\nu_N(x)\, w^\nu_N(y)} \sum_{n=0}^{N-1} \frac{p_n(x; \nu, N)\, p_n(y; \nu, N)}{\|p_n(\cdot; \nu, N)\|_2^2}.$$
However, we will not make use of this expression, and for the purposes of the asymptotic analysis we will instead exploit the connection to the Wishart ensemble and Laguerre polynomials. First, we need to recall a simple fact about transformations of determinantal measures on $X = (I_1, I_2) \subset \mathbb{R}$. Let $f : X \to X$ be a $C^1$ bijection and let $g = f^{-1}$ be its inverse. Then, $f$ induces a homeomorphism of $\mathrm{Conf}(X)$: for a configuration $\mathcal{X}$ the particles of $f(\mathcal{X})$ are of the form $f(x)$, $x \in \mathcal{X}$. Let $P_K$ be a determinantal measure on $X$. Then, its pushforward under $f$ is also a determinantal measure, with kernel $\sqrt{g'(x)\, g'(y)}\, K(g(x), g(y))$.
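The Laguerre ensemble $L^\nu_N(dx)$, defined for $\nu > -1$, is the probability measure on $W^N_+$:
$$L^\nu_N(dx) \propto \prod_{i=1}^{N} x_i^{\nu} e^{-x_i}\, \Delta_N(x)^2\, dx.$$
By Weyl's integration formula this is the eigenvalue law of the complex Wishart measure on Hermitian matrices, but we will not need this fact. We will denote, for $n \ge 0$, by $L^\nu_n$ the monic Laguerre polynomials, orthogonal with respect to the measure $x^\nu e^{-x}\,dx$ on $(0, \infty)$ (note that this, unlike $w^\nu_N$, does not depend on $N$). Their squared norm is given by $\|L^\nu_n\|_2^2 = n!\,\Gamma(n + \nu + 1)$. It is a standard result that the Laguerre ensemble $L^\nu_N(dx)$ gives rise to a determinantal point process on $(0, \infty)$ with $N$ particles, and its correlation kernel $L^\nu_N(x, y)$, with respect to Lebesgue measure, is given by:
$$L^\nu_N(x, y) = (xy)^{\nu/2} e^{-(x+y)/2} \sum_{n=0}^{N-1} \frac{L^\nu_n(x)\, L^\nu_n(y)}{n!\,\Gamma(n + \nu + 1)}.$$
Using the Christoffel-Darboux formula we can further write, for all $N \ge 1$:
$$L^\nu_N(x, y) = \frac{(xy)^{\nu/2} e^{-(x+y)/2}}{(N-1)!\,\Gamma(N + \nu)} \cdot \frac{L^\nu_N(x)\, L^\nu_{N-1}(y) - L^\nu_{N-1}(x)\, L^\nu_N(y)}{x - y}.$$
Thus, the correlation kernel $K^\nu_N$ of the determinantal point process associated to $\mu^\nu_N$ is given by:
$$K^\nu_N(x, y) = \frac{2}{xy}\, L^\nu_N\!\left(\frac{2}{x}, \frac{2}{y}\right).$$
Furthermore, the scaled point process $C^{(\nu)}_N(X)$, where $\mathrm{Law}(X) = M^{(\nu)}$, is determinantal on $(0, \infty)$ with correlation kernel $\mathsf{K}^\nu_N$, with respect to Lebesgue measure, given by:
$$\mathsf{K}^\nu_N(x, y) = N\, K^\nu_N(Nx, Ny),$$
and also, in terms of the Laguerre kernel:
$$\mathsf{K}^\nu_N(x, y) = \frac{2}{N x y}\, L^\nu_N\!\left(\frac{2}{Nx}, \frac{2}{Ny}\right). \qquad (19)$$
Proposition 4.2. Let $\nu > -1$. Then, we have uniformly on compacts in $(0, \infty)$:
$$\lim_{N \to \infty} \mathsf{K}^\nu_N(x, y) = \frac{8}{xy}\, \mathsf{J}_\nu\!\left(\frac{8}{x}, \frac{8}{y}\right).$$
Furthermore, the determinantal point process $C^{(\nu)}_N$ converges to the determinantal point process $C^{(\nu)}$ with this limiting kernel, and so do all the correlation functions. Finally, under the transformation $x \mapsto 8/x$ we get the Bessel point process $P_{\mathsf{J}_\nu}$. Proof. We first write, using relation (19):
$$\mathsf{K}^\nu_N(x, y) = \frac{8}{xy} \cdot \frac{1}{4N}\, L^\nu_N\!\left(\frac{z_1}{4N}, \frac{z_2}{4N}\right), \qquad z_1 = \frac{8}{x},\ z_2 = \frac{8}{y}.$$
Then, the first statements of the proposition are immediate consequences of the following well-known facts: the uniform convergence on compacts in $(0, \infty)$ of $\frac{1}{4N} L^\nu_N\!\left(\frac{z_1}{4N}, \frac{z_2}{4N}\right)$ to $\mathsf{J}_\nu(z_1, z_2)$, and the convergence of the Laguerre ensemble determinantal point process at the hard edge scaling limit to the Bessel process, see [32], [14], [15]. These results follow from uniform on compacts asymptotics for Laguerre polynomials, see [30]. The final statement is immediate from the transformation rule for determinantal point processes.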
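The Christoffel-Darboux rewriting of the Laguerre kernel can be verified numerically for small $N$. The sketch below evaluates the monic Laguerre polynomials from their explicit coefficients, with squared norms $n!\,\Gamma(n+\nu+1)$, and compares the kernel written as a sum over $n = 0, \dots, N-1$ with its Christoffel-Darboux form. The function names and the sample values of $N$, $\nu$, $x$, $y$ are assumptions chosen for illustration.

```python
import math

def monic_laguerre(n, nu, x):
    """Monic Laguerre polynomial of degree n for the weight x^nu e^{-x}, evaluated at x."""
    return sum((-1) ** (n - k) * math.comb(n, k)
               * math.gamma(n + nu + 1.0) / math.gamma(k + nu + 1.0) * x ** k
               for k in range(n + 1))

def squared_norm(n, nu):
    """Squared L^2 norm of the monic Laguerre polynomial: n! Gamma(n + nu + 1)."""
    return math.factorial(n) * math.gamma(n + nu + 1.0)

def weight_factor(nu, x, y):
    """Square root of w(x) w(y) for w(x) = x^nu e^{-x}."""
    return (x * y) ** (nu / 2.0) * math.exp(-(x + y) / 2.0)

def kernel_as_sum(N, nu, x, y):
    """Correlation kernel as a sum over the first N orthonormal polynomials."""
    return weight_factor(nu, x, y) * sum(
        monic_laguerre(n, nu, x) * monic_laguerre(n, nu, y) / squared_norm(n, nu)
        for n in range(N))

def kernel_christoffel_darboux(N, nu, x, y):
    """The same kernel through the Christoffel-Darboux formula (requires x != y)."""
    numerator = (monic_laguerre(N, nu, x) * monic_laguerre(N - 1, nu, y)
                 - monic_laguerre(N - 1, nu, x) * monic_laguerre(N, nu, y))
    return weight_factor(nu, x, y) * numerator / (squared_norm(N - 1, nu) * (x - y))

N, nu, x, y = 4, 0.5, 0.7, 2.3
sum_form = kernel_as_sum(N, nu, x, y)
cd_form = kernel_christoffel_darboux(N, nu, x, y)
print(sum_form, cd_form)
```

The Christoffel-Darboux form only involves the two polynomials of degrees $N-1$ and $N$, which is what makes the large-$N$ asymptotics tractable.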

Remark 4.3. Using the representation of the kernel $K^\nu_N$, and thus of its rescaled version, in terms of the Bessel polynomials, it is possible to give an alternative proof of Proposition 4.2. The analysis boils down to asymptotics for hypergeometric functions.
Finally, the following completes the description of the α parameters.
Proof. Observe that, since the α + i (X) are strictly decreasing (by Proposition 4.2 they form a determinantal point process) it suffices to prove that: C (ν) (X) has infinitely many points for M (ν) − a.e. X ∈ H reg .
Under the map $x \mapsto 8/x$, by Proposition 4.2, it then suffices to prove that the Bessel point process $P_{\mathsf{J}_\nu}$ has infinitely many particles almost surely, which is a well-known result. It is a consequence of the fact that $\mathsf{J}_\nu$ is a projection kernel of infinite rank.

An estimate on the correlation kernel
In this section we obtain certain, uniform in N, estimates on K ν N , that will be useful for the description of the parameters γ 1 and γ 2 . First, we have the following estimate, which will be used to obtain that almost surely γ 2 ≡ 0.
The estimate above is an immediate consequence of the next result. Proposition 5.2 will moreover be the key estimate in proving that almost surely Using relation (19), then in terms of the Laguerre kernel $L^\nu_N$ it will suffice to prove: Proposition 5.3. Let $\nu > -1$. For any $\epsilon > 0$ there exists $R = R(\epsilon)$ large enough such that for all $N \in \mathbb{N}$:
Bounds for Laguerre polynomials. In order to prove Proposition 5.3 we first need to recall some bounds for Laguerre polynomials from Szegő [30]. Theorem 7.6.4 in [30] gives for the monic Laguerre polynomials $L^\nu_n$ (note that in [30] the result is written for the polynomials with leading coefficient $(-1)^n/n!$): In principle, the constant cnst in the range $n^{-1} \le x \le w$ depends on $w$. However, a close inspection, see Theorem 8.22.5 in [30], shows that it is of order 1 and thus we have decided to drop dependence on $w$. Now, recall that $\|L^\nu_n\|_2^2 = n!\,\Gamma(n + \nu + 1)$. Then, using the classical fact for ratios of Gamma functions, $\Gamma(n + a)/\Gamma(n + b) \sim n^{a-b}$ as $n \to \infty$, and the trivial bound $e^{-x} \le 1$, we obtain for the squared normalized Laguerre wavefunction $L^\nu_N$ (with a different constant cnst): With these preliminaries in place we are ready to prove Proposition 5.3.
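Two of the ingredients recalled above lend themselves to a quick numerical check: the squared norm $n!\,\Gamma(n+\nu+1)$ of the monic Laguerre polynomials, and the asymptotics of ratios of Gamma functions, taken here in the assumed form $\Gamma(n+a)/\Gamma(n+b) \sim n^{a-b}$. The sketch below computes the norm exactly from the moments $\Gamma(k+\nu+1)$ of the weight $x^\nu e^{-x}$ and evaluates the Gamma ratio through `math.lgamma`; the function names and sample parameters are illustrative assumptions.

```python
import math

def monic_laguerre_coefficients(n, nu):
    """Coefficients (lowest degree first) of the monic Laguerre polynomial of degree n."""
    return [(-1) ** (n - k) * math.comb(n, k)
            * math.gamma(n + nu + 1.0) / math.gamma(k + nu + 1.0)
            for k in range(n + 1)]

def squared_norm_from_moments(n, nu):
    """int_0^inf p_n(x)^2 x^nu e^{-x} dx, using the moments Gamma(k + nu + 1)."""
    c = monic_laguerre_coefficients(n, nu)
    return sum(c[i] * c[j] * math.gamma(i + j + nu + 1.0)
               for i in range(n + 1) for j in range(n + 1))

def gamma_ratio_normalised(n, a, b):
    """Gamma(n + a) / (Gamma(n + b) * n^(a - b)), which tends to 1 as n grows."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n + b) - (a - b) * math.log(n))

nu, n = 0.5, 4
norm_numeric = squared_norm_from_moments(n, nu)
norm_formula = math.factorial(n) * math.gamma(n + nu + 1.0)
ratio = gamma_ratio_normalised(10 ** 6, nu + 1.0, 1.0)
print(norm_numeric, norm_formula, ratio)
```

With $a = \nu + 1$ and $b = 1$ the Gamma ratio gives $\Gamma(n+\nu+1)/n! \sim n^{\nu}$, which is the kind of power of $n$ that appears when the polynomial bounds are normalised.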
Proof of Proposition 5.3. Let $\epsilon > 0$ be arbitrary. Below, we will pick (in this order) constants $w, l, R$ depending on $\epsilon$. We will use the notation $\lesssim$ to mean $\le$ up to a constant, with the implicit constant being independent of $\epsilon$ (and obviously of the quantities $w, l, R$ depending on it) and uniform in $N$. It is clear that it suffices to prove that we can find $R$ large enough such that uniformly in $N \in \mathbb{N}$: First, recall that We split the integral, with $w$ to be picked according to $\epsilon$ (large): We then, using the fact that $\int_0^\infty L^\nu_n(y)\,dy = 1$ for all $n$, easily estimate We now focus on the integral and split the range of summation, for $l$ to be picked depending on $\epsilon$ (small), as follows: We will moreover, in the range $1 \le n < lN$, split the integral further as: Note that, since the integrand is positive, in case $R/N > 1/n$, we simply estimate: Thus, we have the bound where we define We now go on to estimate each of these terms individually. Using the bound for $L^\nu_n$ in the range $0 \le x \le n^{-1}$, we estimate $J_1$: We split into three cases. For $\nu > 0$ we have: While, for $\nu = 0$: Finally, for $-1 < \nu < 0$: Observe that the last bound decreases as $R$ increases since $\nu$ is negative. Now, using the bound for $L^\nu_n$ in the range $n^{-1} \le x \le w$, we estimate $J_2$: Finally, we turn to $J_3$. We assume that $R$ is large enough so that $R \ge 1/l$. The fact that this is possible (not trivial a priori since both quantities depend on $\epsilon$) will be clear by the choices made below. Thus, the bound for $L^\nu_n$ valid in the range $n^{-1} \le x \le w$ is valid throughout the range of integration $R/N \le x \le w$. Hence, we estimate Putting everything together we obtain: Hence, we can choose $(w, l, R)$ as follows: Note that, in all cases above, the requirement $R \ge 1/l$, used to bound $J_3$, is satisfied. Thus, if we take $R = R(\epsilon)$ as above, we get uniformly in $N \in \mathbb{N}$ The proof is complete.
The proof of Proposition 5.3 also gives the following lemma, written in terms of K ν N : This as we see in the next section gives that γ 2 ≡ 0.
On the other hand, in the analysis of the parameter $\gamma_1$, one is led to consider the following term: For $s \in \mathbb{R}$, due to the symmetry of the kernel, $K^{s,N}_{HP}(-x, -x) = K^{s,N}_{HP}(x, x)$, it is immediate that this term vanishes identically. Then, one is left with terms that can be easily estimated by (39). In general, the problem of estimating (40) (when it is not trivial) appears to be open.

The parameter $\gamma_2$
We have the following proposition.
This is exactly the statement of Proposition 5.1.

The parameter $\gamma_1$
Proposition 7.1. Let ν > −1. Then In particular, e. X ∈ H reg . Moreover, assume that Let P N , P be the corresponding point processes (of α + 's) on (0, ∞) and let ρ (N) 1 and ρ 1 be their first correlation functions with respect to Lebesgue measure (assuming they exist). Assume that for any Φ ∈ C c ((0, ∞)) Then, we have We first need an elementary lemma. Lemma 7.3. Assume we are given numbers ∀N ≥ 1 Moreover, assume the following limit exists and is finite: Note that by Fatou's lemma (and positivity) Let Φ be a continuous function on (0, ∞) such that Proof. Observe that, there exists k such that α + k+1 < ǫ. Then α + k+1,N < ǫ for N sufficiently large and α + i,N < ǫ for i ≥ k + 1 by monotonicity. Also, α + i < ǫ for i ≥ k + 1. Therefore and by the assumptions of the lemma The statement now follows.
Proof of Proposition 7.2. First, observe that M is supported on the subset H * reg ⊂ H reg that we now define. An element X ∈ H * reg iff: Fix a continuous function Φ(x) ≥ 0, vanishing for x large enough, such that Φ(x) = x near 0. For any X ∈ H * reg we set: Apply the previous lemma to the sequences α + i,N = α + i,N (X), α + i = α + i (X) for X ∈ H * reg (note that all conditions are satisfied) to get: φ N (X) → φ ∞ (X) + ∆(X).
Observe that all three functions φ N , φ ∞ , ∆ are non-negative and thus Fatou's lemma gives: Associate the point configurations C N (X), C(X) to X ∈ H * reg . Then so that by the definition of the correlation functions We now proceed to show that lim sup Since ∆(X) ≥ 0 on H * reg we get that ∆(X) ≡ 0 for M − a.e. X ∈ H * reg , from which, recalling that M is supported on H * reg the conclusion of the proposition follows. To this end, decompose Φ(x) as follows, for arbitrary ǫ > 0: Proof. First of all, we note that M (ν) is supported on the subset H + reg ⊂ H reg that we now define. An element X ∈ H + reg iff: Moreover, if we define for R > 0 the subset H +,R reg ⊂ H + reg such that X ∈ H +,R reg iff α + 1 (X) < R we easily see that γ 1 (X) < ∞, for M (ν) − a.e. X ∈ H +,R reg .
Furthermore, by positivity it actually suffices to show where the expectation E is with respect to M (ν) . We calculate, using Fatou's lemma and the underlying determinantal structure The last claim is the statement of Lemma 5.4. The description of the law of the parameters α + = α + 1 ≥ α + 2 ≥ α + 3 ≥ · · · ≥ 0 under m (ν) viewed as a point configuration on (0, ∞) is given by Proposition 4.2. The proof is complete.