On the Scaling Limits of Determinantal Point Processes with Kernels Induced by Sturm-Liouville Operators

By applying an idea of Borodin and Olshanski [J. Algebra 313 (2007), 40-60], we study various scaling limits of determinantal point processes with trace class projection kernels given by spectral projections of selfadjoint Sturm-Liouville operators. Instead of studying the convergence of the kernels as functions, the method directly addresses the strong convergence of the induced integral operators. We show that, for this notion of convergence, the Dyson, Airy, and Bessel kernels are universal in the bulk, soft-edge, and hard-edge scaling limits. This result allows us to give a short and unified derivation of the known formulae for the scaling limits of the classical random matrix ensembles with unitary invariance, that is, the Gaussian unitary ensemble (GUE), the Wishart or Laguerre unitary ensemble (LUE), and the MANOVA (multivariate analysis of variance) or Jacobi unitary ensemble (JUE).


Introduction
We consider determinantal point processes on an interval $\Lambda = (a, b)$ with trace class projection kernel

$$K_n(x, y) = \sum_{j=0}^{n-1} \phi_j(x)\,\phi_j(y), \tag{1.1}$$

where $\phi_0, \phi_1, \ldots, \phi_{n-1}$ are orthonormal in $L^2(\Lambda)$; each $\phi_j$ may have some dependence on $n$ that we suppress from the notation. We recall (see, e.g., Anderson, Guionnet and Zeitouni 2010, §4.2) that for such processes the joint probability density of the $n$ points is given by

$$p_n(x_1, \ldots, x_n) = \frac{1}{n!} \det_{i,j=1}^{n} K_n(x_i, x_j),$$

the mean counting probability is given by the density $\rho_n(x) = n^{-1} K_n(x, x)$, and the gap probabilities are given, by the inclusion-exclusion principle, in terms of a Fredholm determinant, namely

$$E_n(J) = \mathbb{P}(\{x_1, \ldots, x_n\} \cap J = \emptyset) = \det(I - 1_J K_n 1_J).$$
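As a small numerical illustration of (1.1), the following sketch builds the projection kernel from the orthonormal Hermite functions and checks that its trace is $n$. The choice of the Hermite functions (weight $e^{-x^2}$ on the real line), the function names, and the trapezoidal quadrature are assumptions made for this example only; they are not taken from the text above.

```python
import math

def hermite_functions(n, x):
    """Values phi_0(x), ..., phi_{n-1}(x) of the orthonormal Hermite functions (n >= 1)."""
    values = []
    prev, cur = 0.0, math.pi ** -0.25 * math.exp(-0.5 * x * x)
    for j in range(n):
        values.append(cur)
        # three-term recurrence:
        # phi_{j+1} = sqrt(2/(j+1)) * x * phi_j - sqrt(j/(j+1)) * phi_{j-1}
        prev, cur = cur, (math.sqrt(2.0 / (j + 1)) * x * cur
                          - math.sqrt(j / (j + 1)) * prev)
    return values

def kernel(n, x, y):
    """Projection kernel K_n(x, y) = sum_{j<n} phi_j(x) phi_j(y), cf. (1.1)."""
    return sum(a * b for a, b in zip(hermite_functions(n, x),
                                     hermite_functions(n, y)))

def trace(n, half_width=10.0, steps=2000):
    """Trapezoidal approximation of tr K_n, the integral of K_n(x, x) over the real line."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * kernel(n, x, x)
    return total * h

if __name__ == "__main__":
    n = 5
    print("tr K_n        =", trace(n))       # should be close to n
    print("mass of rho_n =", trace(n) / n)   # rho_n = K_n(x, x) / n has unit mass
```

The truncation to $[-10, 10]$ is adequate only for small $n$; for larger $n$ the window would have to grow like $\sqrt{2n}$.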
The various scaling limits are usually derived from a pointwise convergence of the kernel function $K_n(x, y)$, obtained by considering the large-$n$ asymptotics of the eigenfunctions $\phi_j$, which can be technically very involved.¹ Borodin and Olshanski (2007) suggested, for discrete point processes, a different, conceptually and technically much simpler approach based on selfadjoint difference operators. We will show that their method, generalized to selfadjoint Sturm-Liouville operators, allows us to give a short and unified derivation of the various scaling limits for the unitary random matrix ensembles (GUE, LUE/Wishart, JUE/MANOVA) that are based on the classical orthogonal polynomials (Hermite, Laguerre, Jacobi).
Second, if $\tilde\Lambda_n \subset \tilde\Lambda = (\tilde a, \tilde b)$ with $\tilde a_n \to \tilde a$, $\tilde b_n \to \tilde b$, we aim for a selfadjoint operator $\tilde L$ on $L^2(\tilde\Lambda)$ with a core $C$ such that eventually $C \subset D(\tilde L_n)$ and

$$\tilde L_n u \to \tilde L u \qquad (u \in C). \tag{1.2}$$

The point is that, if the test functions from $C$ are particularly nice, such a convergence is just a simple consequence of the locally uniform convergence of the coefficients of the differential operators $\tilde L_n$, a convergence that is, typically, an easy calculus exercise. Now, given (1.2), the concept of strong resolvent convergence (see Theorem 2) immediately yields,² if $0 \notin \sigma_{pp}(\tilde L)$,

$$\tilde K_n 1_{\tilde\Lambda_n} = 1_{(-\infty,0)}(\tilde L_n)\, 1_{\tilde\Lambda_n} \overset{s}{\longrightarrow} 1_{(-\infty,0)}(\tilde L).$$
Third, we take an interval $J \subset \tilde\Lambda$, eventually satisfying $J \subset \tilde\Lambda_n$, such that the operator $1_{(-\infty,0)}(\tilde L)\,1_J$ is trace class with kernel $\tilde K(x, y)$ (which can be obtained from the generalized eigenfunction expansion of $\tilde L$, see §A.2). Then, we immediately get the strong convergence $\tilde K_n 1_J \overset{s}{\longrightarrow} \tilde K 1_J$.
Remark 1.1. Tao (2012, §3.3) sketches the Borodin-Olshanski method, applied to the bulk and edge scaling of GUE, as a heuristic device. Because of the microlocal methods that he uses to calculate the projection $1_{(-\infty,0)}(\tilde L)$, he puts his sketch under the headline "The Dyson and Airy kernels of GUE via semiclassical analysis".

Scaling Limits and Other Modes of Convergence.
Given that one just has to establish the convergence of the coefficients of a differential operator (instead of an asymptotic of its eigenfunctions), the Borodin-Olshanski method is an extremely simple device to determine all the scalings $x = \sigma_n \xi + \mu_n$ that would yield some meaningful limit $\tilde K_n 1_J \to \tilde K 1_J$, namely in the strong operator topology. Other modes of convergence have been studied in the literature, ranging from some weak convergence of $k$-point correlation functions over pointwise convergence of the kernel functions to the convergence of gap probabilities, that is,

$$\det(I - 1_J \tilde K_n 1_J) \to \det(I - 1_J \tilde K 1_J).$$

² By "$\overset{s}{\longrightarrow}$" we denote the strong convergence of operators acting on $L^2$.

From a probabilistic point of view, the latter convergence is of particular interest and has been shown in at least three ways:
(1) By Hadamard's inequality, convergence of the determinants follows directly from the locally uniform convergence of the kernels $K_n$ (Anderson et al. 2010, Lemma 3.4.5) and, for unbounded $J$, from further large deviation estimates (Anderson et al. 2010, Lemma 3.3.2). This way, the limit gap probabilities in the bulk and soft-edge scaling limit of GUE can rigorously be established (see, e.g., Anderson et al. 2010, §§3.5 and 3.7).
(2) Since $A \mapsto \det(I - A)$ is continuous with respect to the trace class norm (Simon 2005, Thm. 3.4), $\tilde K_n 1_J \to \tilde K 1_J$ in trace class norm would generally suffice. Such a convergence can be proved by factorizing the trace class operators into Hilbert-Schmidt operators and obtaining the $L^2$-convergence of the factorized kernels once more from locally uniform convergence; see the work of Johnstone on the scaling limits of the LUE/Wishart ensembles (2001) and that on the limits of the JUE/MANOVA ensembles (2008).
(3) Since $1_J \tilde K_n 1_J$ and $1_J \tilde K 1_J$ are selfadjoint and positive semi-definite, yet another way is by observing that the convergence $\tilde K_n 1_J \to \tilde K 1_J$ in trace class norm is, for continuous kernels, equivalent (Simon 2005, Thm. 2.20) to the combination of both the convergence $\tilde K_n 1_J \to \tilde K 1_J$ in the weak operator topology and the convergence of the traces

$$\operatorname{tr} \tilde K_n 1_J \to \operatorname{tr} \tilde K 1_J. \tag{T}$$

Once again, these convergences follow from locally uniform convergence of the kernels; see Deift (1999, §8.1) for an application of this method to the bulk scaling limit of GUE.

Since convergence in the strong operator topology implies convergence in the weak one, the Borodin-Olshanski method would thus establish the convergence of gap probabilities if we were only able to show condition (T) by some additional, similarly short and simple argument. Note that, by the ideal property of the trace class, condition (T) implies the same condition for all $J' \subset J$. We fall, however, short of conceiving a proof strategy for condition (T) that would be independent of all the laborious proofs of locally uniform convergence of the kernels.
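The gap probabilities discussed in this list are Fredholm determinants of the form $\det(I - 1_J K 1_J)$. The following sketch approximates such a determinant for the Dyson (sine) kernel on $J = (0, s)$. The midpoint (Nyström-type) discretization, the pure-Python determinant, and all function names are choices made for this illustration and are not part of the method described in the text.

```python
import math

def dyson(x, y):
    """Dyson (sine) kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 on the diagonal."""
    d = x - y
    return 1.0 if d == 0.0 else math.sin(math.pi * d) / (math.pi * d)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return det

def gap_probability(kernel, lo, hi, m=40):
    """Approximate det(I - 1_J K 1_J) on J = (lo, hi) using m midpoint nodes."""
    h = (hi - lo) / m
    nodes = [lo + (i + 0.5) * h for i in range(m)]
    mat = [[(1.0 if i == j else 0.0) - h * kernel(nodes[i], nodes[j])
            for j in range(m)] for i in range(m)]
    return determinant(mat)

if __name__ == "__main__":
    for s in (0.1, 0.5, 1.0):
        print(s, gap_probability(dyson, 0.0, s))
```

For short intervals the result can be compared with the first terms of the expansion of the determinant, $1 - s + \pi^2 s^4/36$, which follow from the trace and the Hilbert-Schmidt norm of the kernel on $(0, s)$.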
Remark 1.2. Contrary to the discrete case originally considered by Borodin and Olshanski, it is also not immediate to infer from the strong convergence of the induced integral operators the pointwise convergence of the kernels,

$$\tilde K_n(\xi, \eta) \to \tilde K(\xi, \eta).$$

For instance, in §2, we will need just a single such instance, namely

$$\tilde K_n(0, 0) \to \tilde K(0, 0), \tag{$K_0$}$$

to prove a limit law $\tilde\rho_n(t)\,dt \overset{w}{\longrightarrow} \tilde\rho(t)\,dt$ for the mean counting probability. Using mollified Dirac deltas, pointwise convergence would generally follow, for continuously differentiable $\tilde K_n(\xi, \eta)$, if we were able to bound, locally uniformly, the gradient of $\tilde K_n(\xi, \eta)$. Then, by dominated convergence, criterion (T) would already be satisfied if we established an integrable bound of $\tilde K_n(\xi, \xi)$ on $J$. Since the scaling laws are, however, maneuvering just at the edge between trivial cases (i.e., zero limits) and divergent cases, it is conceivable that a proof of such bounds might not be significantly simpler than a proof of convergence of the gap probabilities itself.

The Main Result.
Using the Borodin-Olshanski method, we will prove the following general result on selfadjoint Sturm-Liouville operators; a result that adds a further class of problems to the universality (Kuijlaars 2011) of the Dyson, Airy, and Bessel kernels in the bulk, soft-edge, and hard-edge scaling limits.
Theorem 1. Consider a selfadjoint realization $L_n$ of the formally selfadjoint Sturm-Liouville operator³

$$L_n = -\frac{d}{dx}\Big(p(x)\frac{d}{dx}\Big) + q_n(x) - \lambda_n$$

on $\Lambda = (-\infty, \infty)$, $\Lambda = (0, \infty)$, or $\Lambda = (0, 1)$ with $p, q_n \in C^\infty(\Lambda)$ and $p(x) > 0$ for all $x \in \Lambda$ such that, for $t \in \Lambda$ and $n \to \infty$, […] asymptotically up to an error $O(n^{-1})$; the scaling exponents should satisfy […]. We assume that these expansions can be differentiated⁴ at least twice, that the roots of $\tilde q(t) - \omega$ are simple and that the spectral projection $K_n = 1_{(-\infty,0)}(L_n)$ satisfies $\operatorname{tr} K_n = n$. Let $x = \sigma_n \xi + \mu_n$ induce the transformed projection $\tilde K_n$ and let $x = n^{\kappa} t$ induce the transformed mean counting probability density $\tilde\rho_n$. Then the following scaling limits hold.
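• Bulk scaling limit: given $t \in \Lambda$ with $\tilde q(t) < \omega$, the scaling […] yields, for any bounded interval $J$, the strong operator limit $\tilde K_n 1_J \overset{s}{\longrightarrow} K_{\mathrm{Dyson}} 1_J$. Under condition ($K_0$) and if $\tilde\rho$ has unit mass on $\Lambda$, there is the limit law $\tilde\rho_n(t)\,dt \overset{w}{\longrightarrow} \tilde\rho(t)\,dt$.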
• Hard-edge scaling limit: given that $\Lambda = (0, \infty)$ or $\Lambda = (0, 1)$ with […]. Here, if $0 \le \alpha < 1$, the selfadjoint realization $L_n$ is defined by means of the boundary condition […] (1.6)

³ Since, in this paper, we always consider a particular selfadjoint realization of a formal differential operator, we will use the same letter to denote both.
⁴ We say that an expansion […]

Remark 1.3. Whether the intervals $J$ can be chosen unbounded or not depends on whether the limit operator $\tilde K 1_J$ would then be trace class or not, see the explicit formulae for the traces in the appendix: only in the former case do we get the representation of the scaling limit in terms of a particular integral kernel. Note that we can never use $J = \Lambda$ since $\operatorname{tr} K_n = n \to \infty$.
Outline of the paper. The proof of Theorem 1 is the subject of §2. In §3 we apply it to the classical orthogonal polynomials, which yields a short and unified derivation of the known formulae for the scaling limits for the classical unitary random matrix ensembles (GUE, LUE/Wishart, JUE/MANOVA). In fact, by a result of Tricomi, the only input needed is the weight function $w$ of the orthogonal polynomials; from there one gets in a purely formula-based fashion (by simple manipulations which can easily be coded in any computer algebra system), first, to the coefficients $p$ and $q_n$ as well as to the eigenvalues $\lambda_n$ of the Sturm-Liouville operator $L_n$ and next, by applying Theorem 1, to the particular scaling limits.
To emphasize that our main result and its application are largely independent of concretely identifying the kernel $\tilde K$ of the spectral projection $1_{(-\infty,0)}(\tilde L)$, we postpone this identification to the appendix: there, using generalized eigenfunction expansions, we calculate the Dyson, Airy, and Bessel kernels directly from the limit differential operators $\tilde L$.

Proof of the Main Result for Sturm-Liouville Operators
To begin with, we first state the expansions of the coefficients, which also motivate the particular regularity assumptions made in Theorem 1.
Limit Law. The result for the bulk scaling limit allows us, in passing, to calculate a limit law of the mean counting probability $\rho_n(x)\,dx$. We observe that, by virtue of […]. Thus, to get to a limit, we have to assume condition ($K_0$), that is,⁸ […]. We then get, for $\tilde q(t) \neq \omega$ (that is, almost everywhere in $\Lambda$), […]. Hence, by Helly's selection theorem, the probability measure $\tilde\rho_n(t)\,dt$ converges vaguely to $\tilde\rho(t)\,dt$, which is, in general, just a sub-probability measure. If, however, it is checked that $\tilde\rho(t)\,dt$ has unit mass, the convergence is weak.
⁷ It is the cancellation of the powers of $n$ in the last equality that called for assumption (1.4).
⁸ Following Knuth, the bracketed notation $[S]$ stands for 1 if the statement $S$ is true, 0 otherwise.
⁹ Note that, by the assumption made on the simplicity of the roots of $\tilde q(t) - \omega$, we have $\tilde q'(t_*) \neq 0$.
Hard-Edge Scaling Limit. In the case $a = 0$, we take the scaling […] with $\sigma_n = o(1)$ appropriately chosen, to explore the vicinity of this "hard edge"; note that such a scaling yields $\tilde\Lambda = (0, \infty)$. To simplify we make the assumptions stated in (1.5). We see that […]. If we choose […] (1); that is, $\tilde L_n u \to \tilde L u$, […]. We recall that, if $\alpha \ge 1$, the limit $\tilde L$ is essentially selfadjoint on $C_0^\infty(\tilde\Lambda)$ and that the spectrum of its unique selfadjoint extension is absolutely continuous: $\sigma(\tilde L) = \sigma_{ac}(\tilde L) = [-1, \infty)$. The spectral projection can be calculated by a generalized eigenfunction expansion, yielding the Bessel kernel (A.5); see Lemma A.3.
Remark 2.1. The theorem also holds in the case $0 \le \alpha < 1$ if the particular selfadjoint realization $L_n$ is defined by the boundary condition (1.6), see Remark A.1.

Projection Kernels Associated to Classical Orthogonal Polynomials
In this section we apply Theorem 1 to the kernels associated with the classical orthogonal polynomials, that is, the Hermite, Laguerre, and Jacobi polynomials. In random matrix theory, the thus induced determinantal processes are modeled by the spectra of the Gaussian Unitary Ensemble (GUE), the Laguerre Unitary Ensemble (LUE) or Wishart ensemble, and the Jacobi Unitary Ensemble (JUE) or MANOVA¹¹ ensemble.
To prepare the study of the individual cases, we first discuss their common structure. Let $P_n(x)$ be the sequence of classical orthogonal polynomials belonging to the weight function $w(x)$ on the interval $(a, b)$. We normalize $P_n(x)$ such that $\langle \phi_n, \phi_n \rangle = 1$, where $\phi_n(x) = w(x)^{1/2} P_n(x)$. The functions $\phi_n$ form a complete orthogonal set in $L^2(a, b)$; conceptual proofs of the completeness can be found, e.g., in Andrews, Askey and Roy (1999) (§5.7 for the Jacobi polynomials, §6.5 for the Hermite and Laguerre polynomials). By a result of Tricomi (see Erdélyi, Magnus, Oberhettinger and Tricomi 1953, §10.7), the $P_n(x)$ satisfy the eigenvalue problem […], where $p(x)$ is a quadratic polynomial¹² and $r(x)$ a linear polynomial such that […]. In terms of $\phi_n$, a simple calculation shows that […]. Therefore, by the completeness of the $\phi_n$, the formally selfadjoint Sturm-Liouville operator $L = -\frac{d}{dx}\, p(x) \frac{d}{dx} + q(x)$ has a particular selfadjoint realization (which we continue to denote by the letter $L$) with spectrum $\sigma(L) = \{\lambda_0, \lambda_1, \lambda_2, \ldots\}$ and corresponding eigenfunctions $\phi_n$. Hence, the projection kernel (1.1) induces an integral operator $K_n$ with $\operatorname{tr} K_n = n$ that satisfies […]. Note that this relation remains true if we choose to make some parameters of the weight $w$ (and, therefore, of the functions $\phi_j$) depend on $n$. For the scaling limits of $K_n$, we are now in the realm of Theorem 1: given the weight $w(x)$ as the only input, all the other quantities can now simply be obtained by routine calculations.

¹⁰ The error term is $O(n^{-\min(1, 2\kappa)})$, which amounts to $O(n^{-1})$ if $\kappa \ge \tfrac{1}{2}$ (as throughout §3).
¹¹ MANOVA = multivariate analysis of variance
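The passage from the weight to a Sturm-Liouville eigenvalue problem can be checked numerically in the Hermite case. The sketch below rests on assumptions that are not stated in the text as it stands: the weight $w(x) = e^{-x^2}$, the Pearson-type relation $(p w)' = r w$ with $p(x) = 1$ and $r(x) = -2x$, and the formulas $q = (r - p')^2/(4p) + (r' - p'')/2$ and $\lambda_n = -n\,(r' + \tfrac{1}{2}(n-1)p'')$, which are our own derivation under that relation (they reproduce $\lambda_n = n(n + \alpha + \beta + 1)$ in the Jacobi case). The function names are hypothetical.

```python
import math

def hermite_function(n, x):
    """Orthonormal Hermite function phi_n(x) = w(x)^(1/2) P_n(x) for w(x) = exp(-x^2)."""
    prev, cur = 0.0, math.pi ** -0.25 * math.exp(-0.5 * x * x)
    for j in range(n):
        prev, cur = cur, (math.sqrt(2.0 / (j + 1)) * x * cur
                          - math.sqrt(j / (j + 1)) * prev)
    return cur

def q(x):
    """Potential q = (r - p')^2 / (4 p) + (r' - p'') / 2 for p = 1, r = -2x (assumed form)."""
    p, dp, ddp = 1.0, 0.0, 0.0
    r, dr = -2.0 * x, -2.0
    return (r - dp) ** 2 / (4.0 * p) + (dr - ddp) / 2.0

def eigenvalue(n):
    """lambda_n = -n (r' + (n - 1) p'' / 2) for p = 1, r = -2x (assumed form); equals 2n."""
    dr, ddp = -2.0, 0.0
    return -n * (dr + 0.5 * (n - 1) * ddp)

def residual(n, x, h=1e-3):
    """Finite-difference residual of -(p phi')' + q phi - lambda_n phi with p = 1."""
    phi = hermite_function(n, x)
    second = (hermite_function(n, x + h) - 2.0 * phi
              + hermite_function(n, x - h)) / (h * h)
    return -second + q(x) * phi - eigenvalue(n) * phi

if __name__ == "__main__":
    for n in range(4):
        print(n, eigenvalue(n), residual(n, 0.7))
```

The residuals are of the size of the finite-difference error, which is consistent with $\phi_n$ being an eigenfunction of $-\frac{d^2}{dx^2} + x^2 - 1$ for the eigenvalue $2n$.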

Hermite Polynomials. The weight is […] and, therefore, […]. Theorem 1 is applicable and we directly read off the following well-known scaling limits of the GUE (see, e.g., Anderson et al. 2010, Chap. 3):
• bulk scaling limit: if $-\sqrt{2} < t < \sqrt{2}$, the transformation $x = \dfrac{\pi \xi}{n^{1/2} \sqrt{2 - t^2}} + n^{1/2} t$ induces $\tilde K_n$ with a strong limit given by the Dyson kernel;
• limit law: the transformation $x = n^{1/2} t$ induces $\tilde\rho_n$ with a weak limit given by the Wigner semicircle law […];
• soft-edge scaling limit: the transformation $x = \pm\big(2^{-1/2} n^{-1/6} \xi + \sqrt{2n}\big)$ induces $\tilde K_n$ with a strong limit given by the Airy kernel.
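The bulk scaling and the limit law listed above can be explored numerically. The sketch below evaluates the rescaled Hermite kernel $\sigma_n K_n(\sigma_n \xi + \mu_n, \sigma_n \eta + \mu_n)$ with $\sigma_n = \pi / (n^{1/2}\sqrt{2 - t^2})$ and $\mu_n = n^{1/2} t$, and the rescaled density $n^{-1/2} K_n(n^{1/2} t, n^{1/2} t)$. It assumes the weight $e^{-x^2}$, the Dyson kernel in the form $\sin(\pi(\xi - \eta))/(\pi(\xi - \eta))$, and the semicircle density $\pi^{-1}\sqrt{2 - t^2}$ on $(-\sqrt{2}, \sqrt{2})$; these are standard forms consistent with the scalings above but are not quoted from the text. Note that the comparison is pointwise, which is a classically known but different mode of convergence than the strong operator convergence asserted by Theorem 1.

```python
import math

def hermite_functions(n, x):
    """Values phi_0(x), ..., phi_{n-1}(x) of the orthonormal Hermite functions."""
    values = []
    prev, cur = 0.0, math.pi ** -0.25 * math.exp(-0.5 * x * x)
    for j in range(n):
        values.append(cur)
        prev, cur = cur, (math.sqrt(2.0 / (j + 1)) * x * cur
                          - math.sqrt(j / (j + 1)) * prev)
    return values

def kernel(n, x, y):
    """Hermite projection kernel K_n(x, y)."""
    return sum(a * b for a, b in zip(hermite_functions(n, x),
                                     hermite_functions(n, y)))

def scaled_kernel(n, t, xi, eta):
    """Bulk-rescaled kernel sigma_n * K_n(sigma_n xi + mu_n, sigma_n eta + mu_n)."""
    sigma = math.pi / (math.sqrt(n) * math.sqrt(2.0 - t * t))
    mu = math.sqrt(n) * t
    return sigma * kernel(n, sigma * xi + mu, sigma * eta + mu)

def scaled_density(n, t):
    """Rescaled mean counting density n^(1/2) * rho_n(n^(1/2) t) with rho_n = K_n(x, x) / n."""
    x = math.sqrt(n) * t
    return kernel(n, x, x) / math.sqrt(n)

def dyson(d):
    """Assumed form of the Dyson kernel as a function of d = xi - eta."""
    return 1.0 if d == 0.0 else math.sin(math.pi * d) / (math.pi * d)

def semicircle(t):
    """Assumed form of the Wigner semicircle density on (-sqrt(2), sqrt(2))."""
    return math.sqrt(max(2.0 - t * t, 0.0)) / math.pi

if __name__ == "__main__":
    n, t = 200, 0.5
    print(scaled_kernel(n, t, 0.3, -0.2), dyson(0.5))
    print(scaled_density(n, t), semicircle(t))
```

Increasing $n$ should bring the printed pairs closer together; the forward recurrence used here is adequate for moderate $n$ but is not meant as a robust evaluation scheme for very large $n$ or arguments far outside the bulk.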
Laguerre Polynomials. The weight is $w(x) = x^{\alpha} e^{-x}$ on $\Lambda = (0, \infty)$; hence […]. In random matrix theory, the corresponding determinantal point process is modeled by the spectra of complex $n \times n$ Wishart matrices with a dimension parameter $m \ge n$; the Laguerre parameter $\alpha$ is then given by $\alpha = m - n \ge 0$. Of particular interest in statistics (Johnstone 2001) is the simultaneous limit $m, n \to \infty$ with $m/n \to \theta \ge 1$, for which we get […]. Theorem 1 is applicable and we directly read off the following well-known scaling limits of the Wishart ensemble (Johnstone 2001):
• bulk scaling limit: […] induces $\tilde K_n$ with a strong limit given by the Dyson kernel;
• limit law: the scaling $x = nt$ induces $\tilde\rho_n$ with a weak limit given by the Marčenko-Pastur law […];
• soft-edge scaling limit: with signs chosen consistently as either $+$ or $-$,

$$x = \pm n^{1/3} \theta^{-1/6} t_\pm^{2/3}\, \xi + n t_\pm \tag{3.1}$$

induces $\tilde K_n$ with a strong limit given by the Airy kernel.
Remark 3.1. The scaling (3.1) is better known in the asymptotically equivalent form […], which is obtained from (3.1) by replacing $\theta$ with $m/n$ (see Johnstone 2001, p. 305).
In the case $\theta = 1$, which implies $t_- = 0$, the lower soft-edge scaling (3.1) breaks down and has to be replaced by a scaling at the hard edge:
• hard-edge scaling limit: if $\alpha = m - n$ is a constant,¹³ $x = \xi/(4n)$ induces $\tilde K_n$ with a strong limit given by the Bessel kernel $K^{(\alpha)}_{\mathrm{Bessel}}$.
¹³ By Remark 2.1, there is no need to restrict ourselves to $\alpha \ge 1$: since $\phi_n(x) = x^{\alpha/2} \chi_n(x)$ with $\chi_n(x)$ extending smoothly to $x = 0$, we have, for $\alpha \ge 0$, […]. Hence, the selfadjoint realization $L_n$ is compatible with the boundary condition (1.6).

Jacobi Polynomials. The weight is […] and $\lambda_n = n(n + \alpha + \beta + 1)$. In random matrix theory, the corresponding determinantal point process is modeled by the spectra of complex $n \times n$ MANOVA matrices with dimension parameters $m_1, m_2 \ge n$; the Jacobi parameters $\alpha, \beta$ are then given by $\alpha = m_1 - n \ge 0$ and $\beta = m_2 - n \ge 0$. Of particular interest in statistics (Johnstone 2008) is the simultaneous limit $m_1, m_2, n \to \infty$ with […], for which we get […]. Theorem 1 is applicable and we directly read off the following (less well-known) scaling limits of the MANOVA ensemble (Collins 2005, Johnstone 2008):
• bulk scaling limit: if $t_- < t < t_+$, […] induces $\tilde K_n$ with a strong limit given by the Dyson kernel;
• limit law: (because of $\kappa = 0$ there is no transformation here) $\rho_n$ has a weak limit given by the law (Wachter 1980) […];
• soft-edge scaling limit: with signs chosen consistently as either $+$ or $-$, […] induces $\tilde K_n$ with a strong limit given by the Airy kernel.
Remark 3.2. Johnstone (2008, p. 2651) gives the soft-edge scaling in terms of a trigonometric parametrization of $\theta$ and $\tau$. By putting […] we immediately get $t_\pm = \sin^2 \frac{\phi \pm \psi}{2}$ and (3.2) becomes […].

In the case $\theta = \tau = 1/2$, which is equivalent to $m_1/n, m_2/n \to 1$, we have $t_- = 0$ and $t_+ = 1$. Hence, the lower and the upper soft-edge scaling (3.2) break down and have to be replaced by a scaling at the hard edges:
• hard-edge scaling limit: if $\alpha = m_1 - n$, $\beta = m_2 - n$ are constants,¹⁴ $x = \xi/(4n^2)$ induces $\tilde K_n$ with a strong limit given by the Bessel kernel $K^{(\alpha)}_{\mathrm{Bessel}}$; […] $K^{(\beta)}_{\mathrm{Bessel}}$ is obtained for $x = 1 - \xi/(4n^2)$.

A. Appendices
A.1. Generalized Strong Convergence. The notion of strong resolvent convergence (Weidmann 1980, §9.3) links the convergence of differential operators, pointwise for an appropriate class of smooth test functions, to the strong convergence of their spectral projections. We recall a slight generalization of that concept, which allows the underlying Hilbert space to vary.
Specifically we consider, on an interval $(a, b)$ (not necessarily bounded) and on a sequence of subintervals $(a_n, b_n) \subset (a, b)$ with $a_n \to a$ and $b_n \to b$, selfadjoint operators

$$L : D(L) \subset L^2(a, b) \to L^2(a, b), \qquad L_n : D(L_n) \subset L^2(a_n, b_n) \to L^2(a_n, b_n).$$
By means of the natural embedding (that is, extension by zero) we take $L^2(a_n, b_n) \subset L^2(a, b)$; the multiplication operator induced by the characteristic function $1_{(a_n, b_n)}$, which we continue to denote by $1_{(a_n, b_n)}$, constitutes the orthogonal projection of $L^2(a, b)$ onto $L^2(a_n, b_n)$. Following Stolz and Weidmann (1993, §2), we say that $L_n$ converges to $L$ in the sense of generalized strong convergence (gsc), if

$$R_z(L_n)\, 1_{(a_n, b_n)} \overset{s}{\longrightarrow} R_z(L)$$

for some $z \in \mathbb{C} \setminus \mathbb{R}$, and hence, a fortiori, for all such $z$, in the strong operator topology of $L^2(a, b)$.¹⁵

Theorem 2 (Stolz and Weidmann (1993, Thm. 4/5)). Let the selfadjoint operators $L_n$ and $L$ satisfy the assumptions stated above and let $C$ be a core of $L$ such that, eventually, $C \subset D(L_n)$.
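(i) If $L_n u \to L u$ for all $u \in C$, then $L_n \overset{\mathrm{gsc}}{\longrightarrow} L$.
(ii) If $L_n \overset{\mathrm{gsc}}{\longrightarrow} L$ and if the endpoints of the interval $\Delta \subset \mathbb{R}$ do not belong to the pure point spectrum $\sigma_{pp}(L)$ of $L$, the spectral projections to $\Delta$ converge as $1_\Delta(L_n)\, 1_{(a_n, b_n)} \overset{s}{\longrightarrow} 1_\Delta(L)$.

¹⁴ For the cases $0 \le \alpha < 1$ and $0 \le \beta < 1$, see the justification of the limit given in Footnote 13.
¹⁵ We denote by $R_z(L) = (L - z)^{-1}$ the resolvent of an operator $L$.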

A.2. Generalized Eigenfunction Expansion of Sturm-Liouville Operators. Let $L$ be a formally selfadjoint Sturm-Liouville operator on the interval $(a, b)$, with smooth coefficient functions $p > 0$ and $q$. We have the limit point case (LP) at the boundary point $a$ if there is some $c \in (a, b)$ and some $z \in \mathbb{C}$ such that there exists at least one solution of $(L - z)u = 0$ in $(a, b)$ for which $u \notin L^2(a, c)$; otherwise, we have the limit circle case (LC) at $a$. According to the Weyl alternative (Weidmann 1980, Thm. 8.27) this is then true for all such $c$ and $z$ and, moreover, in the LP case there exists, for every $z \in \mathbb{C} \setminus \mathbb{R}$, a one-dimensional space of solutions $u$ of the equation $(L - z)u = 0$ for which $u \in L^2(a, c)$. The same structure and notion hold for the boundary point $b$.
Theorem 3. Let $L$ be a formally selfadjoint Sturm-Liouville operator on the interval $(a, b)$ as defined above. If there is the LP case at $a$ and $b$, then $L$ is essentially selfadjoint on the domain $C_0^\infty(a, b)$ and, for $z \in \mathbb{C} \setminus \mathbb{R}$, the resolvent $R_z(L) = (L - z)^{-1}$ of its unique selfadjoint extension (which we continue to denote by the letter $L$) is of the form […] (A.1). Here $u_a$ and $u_b$ are the non-vanishing solutions of the equation $(L - z)u = 0$, uniquely determined up to a factor by the conditions $u_a \in L^2(a, c)$ and $u_b \in L^2(c, b)$ for some $c \in (a, b)$, and $W$ denotes the Wronskian […], which is a constant for $x \in (a, b)$.
A more general formulation (and a proof) that includes the LC case can be found, e.g., in Weidmann (1980, Thm. 8.22/8.29).¹⁶ In the following, we write (A.1) briefly in the form […]. If the imaginary part of the thus defined Green's kernel $G_z(x, y)$ has finite boundary values as $z$ approaches the real line from above, there is a simple formula for the spectral projection associated with $L$ that often applies if the spectrum of $L$ is absolutely continuous.
Theorem 4. (i) Assume that there exists, as $\epsilon \downarrow 0$, the limit $\pi^{-1} \operatorname{Im} G_{\lambda + i\epsilon}(x, y) \to K_\lambda(x, y)$, locally uniformly in $x, y \in (a, b)$ for each $\lambda \in \mathbb{R}$ except for some isolated points $\lambda$ for which the limit is replaced by $\epsilon \operatorname{Im} G_{\lambda + i\epsilon}(x, y) \to 0$. Then the spectrum is absolutely continuous, $\sigma(L) = \sigma_{ac}(L)$, and, for a Borel set $\Delta$, […] (A.2)

¹⁶ See Weidmann (1987, pp. 41-42) for a proof that $C_0^\infty(a, b)$ is a core if the coefficients are smooth.
(ii) Assume further, for some $(a', b') \subset (a, b)$, […]. Then $1_\Delta(L)\, 1_{(a', b')}$ is a Hilbert-Schmidt operator on $L^2(a, b)$ with kernel […].

Proof. With $E$ denoting the spectral resolution of the selfadjoint operator $L$, we observe that, for a given $\phi \in C_0^\infty(a, b)$, the Borel-Stieltjes transform of the positive measure $\mu_\phi(\lambda) = \langle E(\lambda)\phi, \phi \rangle$ can be simply expressed in terms of the resolvent as follows (see Lax 2002, §32.1): […]. If we take $z = \lambda + i\epsilon$ and let $\epsilon \downarrow 0$, we obtain by the locally uniform convergence of the integral kernel of $R_z$ that there exists either the limit […] or, at isolated points $\lambda$, $\epsilon \operatorname{Im} \langle R_{\lambda + i\epsilon}(L)\phi, \phi \rangle \to 0$. By a theorem of de la Vallée-Poussin (see Simon 2005, Thm. 11.6(ii/iii)), the singular part of $\mu_\phi$ vanishes, $\mu_{\phi,\mathrm{sing}} = 0$; by Plemelj's reconstruction the absolutely continuous part satisfies (see Simon 2005, Thm. 11.6(iv)) $d\mu_{\phi,\mathrm{ac}}(\lambda) = \langle K_\lambda \phi, \phi \rangle\, d\lambda$.
[…], approximation shows that $E_{\mathrm{sing}} = 0$, that is, $\sigma(L) = \sigma_{ac}(L)$. Since $\langle 1_\Delta(L)\phi, \phi \rangle = \int_\Delta d\mu_\phi(\lambda)$, we thus get, by the symmetry of the bilinear expressions, the representation (A.2), which finishes the proof of (i). The Hilbert-Schmidt part of part (ii) follows using the Cauchy-Schwarz inequality and Fubini's theorem and yet another density argument; the trace class part follows from Gohberg, Goldberg and Krupnik (2000, Thm. IV.8 […] is a selfadjoint, positive-semidefinite operator.

We apply this theorem to the spectral projections used in the proof of Theorem 1. The first two examples could have been dealt with by Fourier techniques (Tao 2012, §3.3); applying, however, the same method in all the examples renders the approach more systematic.
Example 1 (Dyson kernel). Consider $Lu = -u''$ on $(-\infty, \infty)$. Since $u \equiv 1$ is a solution of $Lu = 0$, both endpoints are LP; for a given $\operatorname{Im} z > 0$ the solutions $u_a$ ($u_b$) of $(L - z)u = 0$ being $L^2$ at $-\infty$ ($\infty$) are spanned by […]. Thus, Theorem 3 applies: $L$ is essentially selfadjoint on $C_0^\infty(-\infty, \infty)$, the resolvent of its unique selfadjoint extension is represented, for $\operatorname{Im} z > 0$, by the Green's kernel […]. For $\lambda > 0$ there is the limit […]; for $\lambda < 0$ the limit is zero; both limits are locally uniform in $x, y \in \mathbb{R}$. For $\lambda = 0$ there would be divergence, but we obviously have […] locally uniformly in $x, y \in \mathbb{R}$. Hence, Theorem 4 applies: $\sigma(L) = \sigma_{ac}(L) = [0, \infty)$ and (A.2) holds for each Borel set $\Delta \subset \mathbb{R}$. Given a bounded interval $(a, b)$, we may estimate for the specific choice $\Delta = (-\infty, \pi^2)$ […]. Therefore, Theorem 4 yields that $1_{(-\infty, \pi^2)}(L)\, 1_{(a, b)}$ is Hilbert-Schmidt with the Dyson kernel […] restricted to $x, y \in (a, b)$. Here, the last equality is simply obtained from […]. Since the resulting kernel is continuous for $x, y \in (a, b)$, Theorem 4 gives that $1_{(-\infty, \pi^2)}(L)\, 1_{(a, b)}$ is a trace class operator with trace […].
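The steps of Example 1 can be traced numerically. The sketch below assumes the standard free Green's kernel $G_z(x, y) = \frac{i}{2\sqrt{z}}\, e^{i\sqrt{z}\,|x - y|}$ (branch with $\operatorname{Im}\sqrt{z} > 0$) for $Lu = -u''$, the resulting boundary value $\pi^{-1} \operatorname{Im} G_{\lambda + i0}(x, y) = \cos(\sqrt{\lambda}\,(x - y)) / (2\pi\sqrt{\lambda})$ for $\lambda > 0$, and the Dyson kernel in the form $\sin(\pi(x - y))/(\pi(x - y))$. These are standard formulas supplied for illustration; they are not quoted from the example, and the function names are hypothetical.

```python
import cmath
import math

def green(z, x, y):
    """Assumed Green's kernel of -d^2/dx^2 - z on the real line for Im z > 0."""
    root = cmath.sqrt(z)  # principal branch: Im root > 0 whenever Im z > 0
    return 1j / (2.0 * root) * cmath.exp(1j * root * abs(x - y))

def spectral_density(lam, x, y):
    """Boundary value of Im G_{lam + i eps} / pi as eps -> 0, for lam > 0."""
    s = math.sqrt(lam)
    return math.cos(s * (x - y)) / (2.0 * math.pi * s)

def projection_kernel(x, y, steps=2000):
    """Integral of spectral_density over 0 < lam < pi^2, computed with lam = s^2.

    After the substitution the integrand is cos(s (x - y)) / pi on 0 < s < pi;
    the integral is approximated by the midpoint rule.
    """
    h = math.pi / steps
    return sum(math.cos((k + 0.5) * h * (x - y)) for k in range(steps)) * h / math.pi

def dyson(x, y):
    """Assumed form of the Dyson kernel."""
    d = x - y
    return 1.0 if d == 0.0 else math.sin(math.pi * d) / (math.pi * d)

if __name__ == "__main__":
    x, y = 0.3, 1.0
    print(green(2.0 + 1e-9j, x, y).imag / math.pi, spectral_density(2.0, x, y))
    print(green(-2.0 + 1e-9j, x, y).imag)          # tends to zero below the spectrum
    print(projection_kernel(x, y), dyson(x, y))
```

The substitution $\lambda = s^2$ removes the integrable singularity at $\lambda = 0$, which is why a plain midpoint rule suffices here.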
To summarize, we have thus obtained the following lemma.