Precise Deviations Results for the Maxima of Some Determinantal Point Processes: the Upper Tail

We prove precise deviations results in the sense of Cramér and Petrov for the upper tail of the distribution of the maximal value for a special class of determinantal point processes that play an important role in random matrix theory. Here we cover all three regimes of moderate, large and superlarge deviations, for which we determine the leading order description of the tail probabilities. As a corollary of our results we identify the region within the regime of moderate deviations for which the limiting Tracy-Widom law still predicts the correct leading order behavior. Our proofs use that the determinantal point process is given by the Christoffel-Darboux kernel for an associated family of orthogonal polynomials. The necessary asymptotic information on this kernel has mostly been obtained in [Kriecherbauer T., Schubert K., Schüler K., Venker M., Markov Process. Related Fields 21 (2015), 639-694]. In the superlarge regime these results do not suffice, and we impose stronger assumptions on the point processes. The results of the present paper and the relevant parts of [Kriecherbauer T., Schubert K., Schüler K., Venker M., Markov Process. Related Fields 21 (2015), 639-694] have been proved in the dissertation [Schüler K., Ph.D. Thesis, Universität Bayreuth, 2015].


Model and general assumptions
In this paper we consider a class of determinantal point processes that is prominent in random matrix theory. There, a well studied type of ensemble consists of probability measures on $N \times N$ Hermitian matrices that are invariant under unitary conjugation (unitary invariant ensembles) and for which the joint distribution of the vector $\lambda \in \mathbb{R}^N$ of eigenvalues has a density of the form
$$P_{N,V}(\lambda) = \frac{1}{Z_{N,V}} \prod_{1 \le j < k \le N} |\lambda_j - \lambda_k|^2 \prod_{j=1}^{N} e^{-N V(\lambda_j)}. \tag{1.1}$$
The function $V : \mathbb{R} \to \mathbb{R}$ should be viewed as a parameter of the model and is supposed to grow sufficiently fast at $\pm\infty$ that the measure can be normalized by a constant $Z_{N,V}$. Note that choosing $V$ to be a quadratic function leads to the classical Gaussian unitary ensemble (GUE). We refer the interested reader to [1,2,11,23,33,34] for recent monographs on random matrix theory.
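For numerical experiments with (1.1) in the Gaussian case $V(x) = x^2/2$, one does not need to diagonalize a full Hermitian matrix. The following minimal Python sketch assumes the tridiagonal $\beta = 2$ matrix model of Dumitriu and Edelman (a technique that is not used in this paper), rescaled by $N^{-1/2}$ so that its eigenvalues have the density (1.1) with $V(x) = x^2/2$; all function names are hypothetical. The largest eigenvalue is located by Sturm-sequence bisection and should lie close to $b_V = 2$, the right endpoint of the semicircle law, which is the equilibrium measure for this choice of $V$.

```python
import math
import random


def sample_tridiagonal_gue(n, rng):
    """Diagonal and off-diagonal of a random symmetric tridiagonal matrix whose
    eigenvalues have density (1.1) with V(x) = x^2 / 2 (Dumitriu-Edelman, beta = 2)."""
    s = 1.0 / math.sqrt(n)                     # rescaling exp(-sum x^2/2) -> exp(-n sum x^2/2)
    diag = [rng.gauss(0.0, 1.0) * s for _ in range(n)]
    # chi_{2m} / sqrt(2) for m = n-1, ..., 1, using chi_{2m}^2 ~ Gamma(shape m, scale 2)
    off = [math.sqrt(rng.gammavariate(m, 2.0) / 2.0) * s for m in range(n - 1, 0, -1)]
    return diag, off


def count_below(diag, off, x):
    """Number of eigenvalues smaller than x (Sturm count via the LDL^T pivots of T - x I)."""
    count, d = 0, 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d
        if d == 0.0:                           # guard against division by zero
            d = -1e-300
        if d < 0.0:
            count += 1
    return count


def largest_eigenvalue(diag, off, tol=1e-12):
    """Bisection for the top eigenvalue, started from a Gershgorin bound."""
    n = len(diag)
    radius = max(abs(diag[i])
                 + (abs(off[i - 1]) if i > 0 else 0.0)
                 + (abs(off[i]) if i < n - 1 else 0.0) for i in range(n))
    lo, hi = -radius, radius
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) == n:   # all eigenvalues lie below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


rng = random.Random(2024)
diag, off = sample_tridiagonal_gue(500, rng)
print(largest_eigenvalue(diag, off))   # close to b_V = 2
```

The bisection only needs $O(N)$ operations per step, which is why the tridiagonal form is convenient for sampling $\lambda_{\max}$ repeatedly.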
The determinantal nature of the point process on $\mathbb{R}$ induced by the probability measure $dP_{N,V}(\lambda) = P_{N,V}(\lambda)\, d\lambda$ on $N$-point configurations is due to the square of the Vandermonde determinant appearing in (1.1). In fact, there exist functions $K_{N,V} : \mathbb{R}^2 \to \mathbb{R}$ such that all marginal densities (also called correlation functions) can be expressed as determinants of the $N \times N$ matrix $K_{N,V}(\lambda) := (K_{N,V}(\lambda_j, \lambda_k))_{1 \le j,k \le N}$ and of its principal minors, e.g., $P_{N,V}(\lambda) = \frac{1}{N!} \det[K_{N,V}(\lambda)]$ (see, e.g., [37, Section 2.3], see also [2, Section 4.2], [1, Chapters 4 and 11] and references therein). Moreover, the kernels can be expressed in terms of orthogonal polynomials with respect to the measure $e^{-NV(x)}\,dx$ on $\mathbb{R}$. More precisely, denote by $(p_j^{N,V})_j$ the uniquely defined sequence of polynomials with $\deg p_j^{N,V} = j$ and positive leading coefficients that satisfies
$$\int_{\mathbb{R}} p_j^{N,V}(x)\, p_k^{N,V}(x)\, e^{-NV(x)}\, dx = \delta_{jk}, \qquad j, k \ge 0.$$
Then
$$K_{N,V}(x,y) = e^{-\frac{N}{2}(V(x)+V(y))} \sum_{j=0}^{N-1} p_j^{N,V}(x)\, p_j^{N,V}(y). \tag{1.2}$$

We are interested in deviations results for the distribution of the largest component $\lambda_{\max} := \max\{\lambda_1, \ldots, \lambda_N\}$ of $\lambda$ in the limit $N \to \infty$. To be definite in our subsequent discussion, we now introduce the general assumptions (GA) on the functions $V$ that will be required throughout this paper. These are certainly not the most general conditions for our results to hold, but they reduce the technicalities in the proofs to a minimum.
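For $V(x) = x^2/2$ the orthonormal polynomials are rescaled Hermite polynomials, so the kernel (1.2) can be evaluated with the three-term recurrence for the Hermite functions $h_j$ after the substitution $y = x\sqrt{N/2}$. The sketch below (helper names hypothetical, GUE case only) does this and checks numerically that $\int_{\mathbb{R}} K_{N,V}(x,x)\,dx = N$, the expected number of points of an $N$-point determinantal process.

```python
import math


def orthonormal_functions(x, n, N):
    """psi_j(x) = p_j(x) * exp(-N * V(x) / 2) for j < n, with V(x) = x^2 / 2.

    The p_j are the orthonormal polynomials for the weight exp(-N x^2 / 2); they are
    obtained from Hermite functions h_j via the substitution y = x * sqrt(N / 2).
    """
    y = x * math.sqrt(N / 2.0)
    scale = (N / 2.0) ** 0.25            # keeps int psi_j(x)^2 dx = 1
    h_prev, h = 0.0, math.pi ** -0.25 * math.exp(-0.5 * y * y)
    values = []
    for j in range(n):
        values.append(scale * h)
        # three-term recurrence for Hermite functions
        h_prev, h = h, math.sqrt(2.0 / (j + 1)) * y * h - math.sqrt(j / (j + 1.0)) * h_prev
    return values


def kernel(x, y, N):
    """K_{N,V}(x, y) = sum_{j < N} psi_j(x) psi_j(y), cf. (1.2)."""
    return sum(a * b for a, b in zip(orthonormal_functions(x, N, N),
                                     orthonormal_functions(y, N, N)))


def trapezoid(f, a, b, m):
    """Composite trapezoidal rule with m subintervals."""
    h = (b - a) / m
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, m)))


N = 12
expected_points = trapezoid(lambda x: kernel(x, x, N), -6.0, 6.0, 4000)
print(expected_points)  # approximately N = 12
```

The same helpers can be used to check the reproducing property $\int K_{N,V}(x,y)K_{N,V}(y,x)\,dy = K_{N,V}(x,x)$ of this projection kernel.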
A function $V$ is said to satisfy (GA) if (1)-(3) hold: (1) $V : \mathbb{R} \to \mathbb{R}$ is real analytic; (2) $V'$ is strictly monotonically increasing (convexity assumption); Note that conditions (2) and (3) imply at least linear growth of $V(x)$ for $|x| \to \infty$, which suffices to ensure the integrability of $P_{N,V}$.

Equilibrium measure and upper tail rate function
One important ingredient in the analysis of the probability measure $P_{N,V}$ on $\mathbb{R}^N$ is the functional
$$I_V(\mu) := \iint \log|x-y|^{-1}\, d\mu(x)\, d\mu(y) + \int V\, d\mu \tag{1.3}$$
on the set $\mathcal{M}_1(\mathbb{R})$ of Borel probability measures on $\mathbb{R}$. Indeed, if $\mu_\lambda := \frac{1}{N}\sum_{j=1}^{N} \delta_{\lambda_j}$ denotes the normalized counting measure of $\lambda$ and the diagonal $\{x = y\}$ is excluded from the double integral, then (1.1) can be written as $P_{N,V}(\lambda) = Z_{N,V}^{-1} \exp(-N^2 I_V(\mu_\lambda))$. It is therefore plausible that $P_{N,V}$ concentrates around those vectors $\lambda$ for which $I_V(\mu_\lambda)$ is close to the infimum of $I_V(\mu)$, where $\mu$ ranges over $\mathcal{M}_1(\mathbb{R})$. In fact, this observation can be used to derive a large deviations principle both for the counting measure $\mu_\lambda$ and for $\lambda_{\max}$ under $P_{N,V}$ for $V$ satisfying (GA) (see, e.g., [2, Section 2.6] and references therein). For a definition of a large deviations principle, see [17]. However, this is not the approach of the present paper, in which the analysis is based on the determinantal representation of $P_{N,V}$. Let us summarize some well known facts about the minimization of the functional $I_V$, see, e.g., [11, Chapter 6], [34, Chapter 11] and references therein, see also [36, Chapter 2] for a derivation of the facts relevant in the present paper. For a large class of functions $V$, including those satisfying (GA), the functional $I_V$ has a unique minimizer $\mu_V$, called the equilibrium measure with respect to the external field $V$. The equilibrium measure $\mu_V$ has compact support, and we denote by $b_V$ the maximum of its support. From a heuristic point of view we expect $\lambda_{\max}$ to fluctuate around $b_V$. To describe these fluctuations, the gradient $L_V$ of $I_V$ at $\mu_V$ comes into play. It is given by
$$L_V(x) := V(x) + 2\int \log|x-y|^{-1}\, d\mu_V(y). \tag{1.4}$$
The Euler-Lagrange equations for the above variational problem imply that there exists a real number $l_V$ such that $L_V$ equals $l_V$ on the support of $\mu_V$ and is $\ge l_V$ elsewhere. Hence the function
$$\eta_V := L_V - l_V \tag{1.5}$$
is non-negative and vanishes identically on the support of the equilibrium measure. Observe that $\eta_V$ coincides with the rate function of the large deviations principle for the upper tail of $\lambda_{\max}$, see [2, Section 2.6, Theorem 2.6.6].
It is a remarkable and useful fact that for strictly convex and sufficiently smooth functions $V$ (e.g., $C^3$ will do) there is an almost explicit representation for $\eta_V$: there exist (and this is the implicit part) unique reals $a_V < b_V$, called Mhaskar-Rakhmanov-Saff numbers, that are uniquely defined by the two equations (1.6). As it turns out, the support of $\mu_V$ equals the interval $[a_V, b_V]$. Define the function $G_V$ as in (1.7). Observe that $G_V > 0$ by condition (2) of (GA). To the right of the support of $\mu_V$, the function $\eta_V$ is then given by (1.8). Note that the prefactor $\frac{4}{3}$ has no significance other than the standard convention that $\gamma_V = 1$ for $V(x) = x^2/2$. Secondly, we remind the reader that the equilibrium measure can also be expressed in terms of the just defined quantities. Indeed, on its support $\mu_V$ is given by dµ
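For $V(x) = x^2/2$ everything is explicit. Assuming the normalization in which the equilibrium measure is the semicircle law $\frac{1}{2\pi}\sqrt{4 - x^2}\,dx$ on $[a_V, b_V] = [-2, 2]$, one finds $\eta_V(t) = \int_2^t \sqrt{x^2 - 4}\,dx$ for $t \ge 2$. The Python sketch below (function names hypothetical) compares the closed form of this integral with a midpoint rule and checks the edge behaviour $\eta_V(t) \approx \frac{4}{3}(t - b_V)^{3/2}$, which is consistent with the convention $\gamma_V = 1$ mentioned above.

```python
import math


def eta_gue(t):
    """eta_V(t) = int_2^t sqrt(x^2 - 4) dx for V(x) = x^2/2 and t >= 2 (closed form)."""
    r = math.sqrt(t * t - 4.0)
    return 0.5 * t * r - 2.0 * math.log(0.5 * (t + r))


def eta_gue_midpoint(t, m=100000):
    """Midpoint-rule approximation of the same integral, as an independent check."""
    h = (t - 2.0) / m
    return h * sum(math.sqrt((2.0 + (k + 0.5) * h) ** 2 - 4.0) for k in range(m))


for t in (2.001, 2.5, 3.0):
    edge = 4.0 / 3.0 * (t - 2.0) ** 1.5   # (4/3) gamma_V^{3/2} (t - b_V)^{3/2} with gamma_V = 1
    print(t, eta_gue(t), eta_gue_midpoint(t), eta_gue(t) / edge)
```

The last column tends to 1 as $t \downarrow 2$, while for larger $t$ the full function $\eta_V$ departs from its edge approximation.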

Fluctuations: Tracy-Widom law, large and moderate deviations principles
After the brief review of the equilibrium measure we are now ready to discuss the fluctuations of $\lambda_{\max}$ around $b_V$. They are of order $N^{-2/3}$ and, appropriately rescaled (here we also need the just defined $\gamma_V$), they are asymptotically described by the celebrated ($\beta = 2$) Tracy-Widom distribution $F_{TW}$ [38], i.e.,
$$\lim_{N\to\infty} P_{N,V}\big(\tilde\lambda_{\max} \le s\big) = F_{TW}(s), \qquad \tilde\lambda_{\max} := \gamma_V N^{2/3}(\lambda_{\max} - b_V), \tag{1.10}$$
uniformly for $s \in \mathbb{R}$ (see, e.g., [1, Section 6.3], [12, Chapter 6], [34, Section 13.1] and references therein). This result can be viewed as an analogue of the central limit theorem for the arithmetic mean of $N$ independent and identically distributed random variables. Note that the role of the normal distribution is taken by the Tracy-Widom distribution and that the power law of the fluctuations has changed from $N^{-1/2}$ to $N^{-2/3}$. As is the case for the classical CLT, it is natural to ask for corresponding deviations results. Roughly speaking, this means describing how fast the tail probabilities tend to zero if $s$ is not kept fixed as in (1.10) but is allowed to grow with $N$. In this paper we are only concerned with the upper tail. Before we formulate our precise deviations results for $\lambda_{\max}$ in Theorems 1.1 and 1.5, we begin by stating our results in a weaker but possibly more familiar form that is related to the large deviations principles introduced by Varadhan (see, e.g., [17]). We will show in Corollary 1.2 how to derive these from our main results.
Recall the definition of $\eta_V$ in (1.5) (see also (1.8)). Then for $t > b_V$ we have the relation (1.11), where the O-term is uniform for $t$ in compact subsets of $(b_V, \infty)$. Formula (1.11) implies in particular a large deviations principle for $\lambda_{\max}$ with speed $N$ and rate function $J_V(t) = \eta_V(t)$ if $t \ge b_V$ and $J_V(t) = \infty$ otherwise. Indeed, applying the large deviations principle for the empirical measure of the eigenvalues, we obtain $\limsup_N \frac{1}{N}\log P_{N,V}(\lambda_{\max} \le t) = -\infty$ for any $t < b_V$, see [2, equation (2.6.30)]. Furthermore, together with the assertion $\lim_N \frac{1}{N}\log P_{N,V}(\lambda_{\max} \ge t) = -\eta_V(t)$ for all $t \ge b_V$, Theorem 4.1.11 in [17] allows us to derive a large deviations principle from the limiting behavior of probabilities on a basis of the topology, see also [21]. Observe that for any $V$ satisfying (GA), the growth condition (2.6.2) and Assumption 2.6.5 in [2, Theorem 2.6.6] are fulfilled. The latter follows from Lemmas 4.5 and 4.6 in [26] (stated for discrete Coulomb gases, but the proof for a continuous Coulomb gas is essentially the same). Hence (1.11) reproves Theorem 2.6.6 in [2] under stronger assumptions but provides more information on the higher order terms.
Large deviations principles for extremal values have already been proved in the much more general setting of mean field models with Coulomb gas interactions that do not necessarily possess the structure of determinantal point processes (see [6], [2, Section 2.6.2], [26, Section 4], [7,10,22]). Note that these results do not provide rates of convergence as presented in (1.11). Another class of repulsive particle systems that is not determinantal but can be expressed as an average of determinantal ones by a stochastic linearization procedure has been introduced in [24]. For such ensembles a result comparable to (1.11) has been obtained in [28]. Recently, the author of [3] proved a large deviations principle for the largest eigenvalue of Wigner matrices without Gaussian tails, namely such that the distribution tails $P(|X_{1,1}| > t)$ and $P(|X_{1,2}| > t)$ behave like $e^{-bt^\alpha}$ and $e^{-at^\alpha}$, respectively, for some $a, b \in (0, \infty)$ and $\alpha \in (0, 2)$. This large deviations principle has speed $N^{\alpha/2}$ and an explicit rate function depending only on the tail distributions of the $X_{i,j}$.
We turn to the regime of moderate deviations. Theorem 1.1 below implies, for any $\alpha \in (0, \frac{2}{3})$, moderate deviations principles for the rescaled random variable $\tilde\lambda_{\max}/N^\alpha$ with $\tilde\lambda_{\max}$ as appearing in (1.10). Here the speed is $N^{3\alpha/2}$ and the rate function is $J(s) := \frac{4}{3}s^{3/2}$. This can be seen from the following corollary of Theorem 1.1, relation (1.12), where the O-terms are uniform for $s \in [1, N^{2/3}]$, thus connecting the Tracy-Widom regime with the regime of large deviations.
In the regime of moderate deviations less is known. Upper and lower bounds on the left hand side of (1.12) of the correct order were shown in [30] for Hermitian β-ensembles that are determinantal for β = 2 only and agree in this case with the Gaussian unitary ensemble. A result of the form (1.12) has been proved in [28] for the class of repulsive particle systems introduced in [24]. Finally we refer the reader to the moderate deviations results in [18,19,20] on certain statistics of eigenvalues for Wigner matrices and to the moderate deviations results in [5,31,32] on combinatorial models that are closely related to random matrix theory.

Precise deviations results I: moderate and large deviations
As already mentioned, all the above results follow from our precise deviations results in the sense of Cramér and Petrov. Here the goal is to identify the leading order description of the tail probabilities themselves. Our first main result allows us to obtain simultaneously the leading order description of the upper tail probabilities $P_{N,V}(\lambda_{\max} > t)$ resp. $P_{N,V}(\tilde\lambda_{\max} > s)$ in the regimes of large resp. moderate deviations. To state it conveniently, we introduce the function $F_{N,V}$ defined in (1.14).

Theorem 1.1. Assume that $V$ satisfies (GA) and recall the notation introduced above. Then the upper tail probability satisfies for all $t > b_V$ the relation (1.15).

Observe that Theorem 1.1 immediately implies (1.11) with the uniformity of the O-term claimed there. We would like to point out that for (1.15) the uniformity of the O-term is stated not only for compact but also for bounded subsets of $(b_V, \infty)$, extending the region of validity up to $b_V$. Note, however, that there exists a positive number $C_V$ depending on the constant in the O-term such that for $0 < t - b_V \le C_V N^{-2/3}$ statement (1.15) already follows from the boundedness of $P_{N,V}(\lambda_{\max} > t)/F_{N,V}(t)$, which is easy to derive using (1.7)-(1.9). This lack of informative value of (1.15) for these values of $t$ is no problem, since they belong to the Tracy-Widom regime and (1.13) with $[0, \gamma_V C_V] \subset B$ fills the gap.
Next we turn to the regime of moderate deviations. It is instructive to translate Theorem 1.1 to the rescaled variable $\tilde\lambda_{\max} = \gamma_V N^{2/3}(\lambda_{\max} - b_V)$. Since $\tilde\lambda_{\max} > s$ holds if and only if $\lambda_{\max} > t(s) := b_V + \gamma_V^{-1} s N^{-2/3}$, we only need to evaluate $F_{N,V}(t(s))$. Using again (1.7)-(1.9) and the assumed real analyticity of $V$, one obtains for positive $s = o(N^{2/3})$, i.e., in particular in the regime of moderate deviations, the expansions (1.17) and (1.18) for some sequence $(d_{j,V})_{j \ge 1}$ of real numbers depending on $V$. From these formulas (1.12) is immediate, at least for $s \in [C, cN^{2/3}]$, where the positive numbers $c, C$ depend on the constants in the O-terms of (1.15) and (1.17). To extend (1.12) to all of $[1, N^{2/3}]$, one may use the monotonicity of $P_{N,V}(\tilde\lambda_{\max} > s)$ in $s$ for the lower end, and for the upper end one shows that $s^{3/2} e^{N\eta_V(t(s))} F_{N,V}(t(s))$ is bounded away from 0.
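As an illustration of such expansions, consider the GUE case $V(x) = x^2/2$, where $b_V = 2$, $\gamma_V = 1$ and, under the semicircle normalization, $\eta_V(t) = \int_2^t \sqrt{x^2 - 4}\,dx$. Expanding $\sqrt{x^2 - 4} = 2\sqrt{x-2}\,\big(1 + \frac{x-2}{8} + O((x-2)^2)\big)$ gives $N\eta_V(t(s)) = \frac{4}{3}s^{3/2} + \frac{1}{10}s^{5/2}N^{-2/3} + O(s^{7/2}N^{-4/3})$, so the corrections to the rate function are powers of $sN^{-2/3}$. The sketch below (function names hypothetical) checks the size of the first correction numerically; it concerns this single example only, not the general coefficients $d_{j,V}$.

```python
import math


def eta_gue(t):
    """eta_V(t) for V(x) = x^2/2 and t >= 2 (closed form of int_2^t sqrt(x^2 - 4) dx)."""
    r = math.sqrt(t * t - 4.0)
    return 0.5 * t * r - 2.0 * math.log(0.5 * (t + r))


def rescaled_exponent(s, N):
    """N * eta_V(t(s)) with t(s) = b_V + s / (gamma_V N^{2/3}), here b_V = 2 and gamma_V = 1."""
    return N * eta_gue(2.0 + s * N ** (-2.0 / 3.0))


def first_correction_ratio(s, N):
    """(N eta_V(t(s)) - (4/3) s^{3/2}) divided by the predicted term s^{5/2} N^{-2/3} / 10."""
    excess = rescaled_exponent(s, N) - 4.0 / 3.0 * s ** 1.5
    return excess / (0.1 * s ** 2.5 * N ** (-2.0 / 3.0))


for N in (10**3, 10**6):
    print(N, rescaled_exponent(2.0, N), first_correction_ratio(2.0, N))   # ratio close to 1
```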
We return to the leading order description of $P_{N,V}(\tilde\lambda_{\max} > s)$ in the regime of moderate deviations. The first observation is that combining (1.15)-(1.18) leads to a series representation for the upper tail that is the exact analogue of the Cramér series for sums of independent variables [9], see also [35, Section 5.8]. Secondly, to determine the leading order we only need to keep those terms of the series in (1.18) that do not vanish as $N$ becomes large. A computation shows that for any $k \in \mathbb{N}_0$ and positive $s$ we have in the limit $N \to \infty$ the relation (1.19). Note that $\eta_{V,0}(s, N) = \frac{4}{3}s^{3/2}$ does not depend on $N$ and is just the rate function $J$ introduced above (1.12). The results of our discussion are summarized in statements a) and b) of the following corollary.

Corollary 1.2. Assume that $V$ satisfies (GA) and recall the notation introduced above. i) For any $q \in (0, \infty)$ and any sequence of positive reals $(p_N)_N$ satisfying $p_N < q$ and $\lim_{N \to \infty}$ is the distribution function of a standard normally distributed random variable.
Proof. We are only left to show statement c). Observe that $\alpha_0 = 2/3 - 2/5 = 4/15$. Therefore it follows from (1.15)-(1.19) that for all $s \in (0, q_N]$: Using in addition the asymptotics of the Tracy-Widom distribution (see, e.g., [4, equations (1) and (25)], cf. [1, Chapter 9] and references therein)

To the best of our knowledge there are three deviations results in the literature of comparable precision for models that have the Tracy-Widom distribution as their limit law. The first two are concerned with the upper tail in the moderate regime. Firstly, in [31] the leading order description is given for the length of the longest increasing subsequence of a random permutation. Secondly, for the largest particles of the ensembles that were introduced in [24], precise deviations were proved in [28] with slightly worse O-terms that are due to the averaging procedure used there. The third result is contained in [13,

4. An important topic of random matrix theory is the question of universality. A good example of a universality result is (1.10). After an appropriate linear rescaling that involves only two $V$-dependent numbers $b_V$ and $\gamma_V$, the distribution of the largest eigenvalue tends in the limit $N \to \infty$ to the Tracy-Widom distribution, which has no $V$-dependency whatsoever. If one considers large deviations principles, one sees a transition from universal to non-universal behavior. Based on the same rescaling as in the Tracy-Widom regime, (1.12) implies moderate deviations principles with the universal rate function $J(s) = \frac{4}{3}s^{3/2}$. In contrast, the rate function $\eta_V$ of the large deviations principle depends fully on $V$.
This transition becomes even more elaborate when one considers precise deviations. Again there is no universality in the regime of large deviations. However, in the regime of moderate deviations there is an infinite cascade of regions where the level of V -dependency changes. More precisely, the leading order description of P N,

Precise deviations results II: superlarge deviations
The task of providing the leading order description for the upper tail of $\lambda_{\max}$ would be fully completed by Corollary 1.2 if we were allowed to choose $q = \infty$ in statement b) i). It therefore remains to extend the result of Theorem 1.1 to unbounded subsets of $(b_V, \infty)$. In the case of sums of independent variables the corresponding question has been raised under the heading of "superlarge deviations" (see, e.g., [8]), and we will also use this terminology.
Our second main result states that under additional assumptions on $V$ the leading order description of the upper tail remains unchanged also in the superlarge regime. In order to formulate our result, we introduce new conditions on $V$ that concern both the size of the region to which $V$ can be extended analytically and the growth of $\operatorname{Re}(V(z))$ as $\operatorname{Re}(z) \to \infty$ on this region.
(2) There exist $n \in \mathbb{N}$ and $x_0 > 0$ such that $V$ has an analytic extension on $U(n, x_0)$. Moreover, there exists a constant $d_V > 0$ such that for all $z \in U(n, x_0)$:

Our result on superlarge deviations, which appears to be the first in the realm of random matrix theory, interacting particle systems and related combinatorial models, reads: Recall the definition of $F_{N,V}$ in (1.14). Then, for sufficiently large values of $N$,

The assumptions that are imposed on $V$ in addition to (GA) are in no way optimal. They are chosen such that at least convex polynomials, and in particular the Gaussian unitary ensemble, are included. For $V$ that do not satisfy these extra conditions one may try to modify the arguments in the proofs of Lemmas 3.1 and A.2. One sees, e.g., from the arguments around (A.12) that the faster $V(x)$ grows for $x \to \infty$, the smaller the domain of analyticity of $V$ needs to be.

Overview of the remaining parts of the paper
The key in proving both of our theorems is that the upper tail probabilities $P_{N,V}(\lambda_{\max} > t) = 1 - P_{N,V}(\lambda_{\max} \le t)$ are complementary to the gap probabilities that no component of $\lambda$ is contained in the interval $(t, \infty)$. For determinantal ensembles (1.1), gap probabilities can be expressed in terms of the kernel (1.2) [1, Section 4.6], and one obtains
$$P_{N,V}(\lambda_{\max} > t) = \sum_{k=1}^{N} \frac{(-1)^{k+1}}{k!} \int_{(t,\infty)^k} \det\big[K_{N,V}(x_i, x_j)\big]_{1 \le i,j \le k}\, dx_1 \cdots dx_k. \tag{1.21}$$
As it turns out, for all of our analysis just the first term $\int_t^\infty K_{N,V}(x,x)\,dx$ in the sum of (1.21) already determines the leading order behavior of the tail probabilities. In the situation of moderate and large deviations we show in Section 2 that the asymptotic results on the Christoffel-Darboux kernel $K_{N,V}(x,x)$ provided in [27], together with a well-known estimate on $K_{N,V}(x,x)$ for large values of $x$, suffice to prove Theorem 1.1. The latter estimate for large $x$ is needed in the proof because the leading order description of the Christoffel-Darboux kernel in [27] is uniform only for $x$ in bounded subsets of $(b_V, \infty)$. In order to treat superlarge deviations, uniformity is also required in unbounded subsets of $(b_V, \infty)$. This is achieved in Section 3 under additional assumptions on $V$ that are formulated in Theorem 1.5. In this situation some of the arguments of [27] need to be improved, which is the content of Appendix A.
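The dominance of the first term of (1.21) can be observed numerically in the GUE case $V(x) = x^2/2$, using the Hermite-function representation of the kernel (an assumption specific to this example). The sketch below approximates $P_{N,V}(\lambda_{\max} > t) = 1 - \det(I - K_{N,V})$ on $(t, \infty)$ by a Nyström discretization with the trapezoidal rule, truncating the interval at a hypothetical cutoff `upper`, and compares it with the first term $\int_t^\infty K_{N,V}(x,x)\,dx$. Because the discretized kernel matrix is positive semidefinite, the computed tail always lies between the first term minus half its square and the first term, in line with the determinant bounds used in the proof of Theorem 1.1.

```python
import math


def orthonormal_functions(x, n, N):
    """psi_j(x) = p_j(x) exp(-N x^2 / 4), j < n: orthonormal functions for V(x) = x^2 / 2."""
    y = x * math.sqrt(N / 2.0)
    scale = (N / 2.0) ** 0.25
    h_prev, h = 0.0, math.pi ** -0.25 * math.exp(-0.5 * y * y)
    values = []
    for j in range(n):
        values.append(scale * h)
        h_prev, h = h, math.sqrt(2.0 / (j + 1)) * y * h - math.sqrt(j / (j + 1.0)) * h_prev
    return values


def log_det(a):
    """Logarithm of a positive determinant (Gaussian elimination with partial pivoting)."""
    a = [row[:] for row in a]
    n, sign, total = len(a), 1.0, 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if p != k:
            a[k], a[p] = a[p], a[k]
            sign = -sign
        pivot = a[k][k]
        sign = sign if pivot > 0.0 else -sign
        total += math.log(abs(pivot))
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    if sign < 0.0:
        raise ValueError("determinant is not positive")
    return total


def upper_tail(t, N, upper=6.0, m=120):
    """Nystrom approximation of 1 - det(I - K) on (t, upper) and of the first term of (1.21)."""
    h = (upper - t) / (m - 1)
    nodes = [t + k * h for k in range(m)]
    weights = [h] * m
    weights[0] = weights[-1] = 0.5 * h           # trapezoidal rule
    psi = [orthonormal_functions(x, N, N) for x in nodes]
    root_w = [math.sqrt(w) for w in weights]
    kmat = [[root_w[i] * root_w[j] * sum(u * v for u, v in zip(psi[i], psi[j]))
             for j in range(m)] for i in range(m)]
    first_term = sum(kmat[i][i] for i in range(m))
    i_minus_k = [[(1.0 if i == j else 0.0) - kmat[i][j] for j in range(m)]
                 for i in range(m)]
    tail = -math.expm1(log_det(i_minus_k))      # 1 - det(I - K) without cancellation
    return tail, first_term


tail, first = upper_tail(2.3, 10)
print(tail, first, tail / first)
```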
The results in [27] have been obtained using the Deift-Zhou [16] nonlinear steepest descent method for Riemann-Hilbert problems, following and improving on previous applications [14,15,29,39] to orthogonal polynomials and to random matrices.
Finally, we would like to mention that we are also able to treat the case that the domain of definition of $V$ is bounded but still contains the support of the equilibrium measure $\mu_V$ in its interior. We refer the reader to Remark 2.2.

Proof of Theorem 1.1
As advertised at the end of the Introduction, we begin by analyzing the first summand of (1.21).

Lemma 2.1. Assume that $V$ satisfies (GA) and let $\eta_V$ and $F_{N,V}$ be given as in (1.5) (see also (1.8)) and (1.14). There exists a number $C > 0$ such that (2.1) holds for all $t > b_V + CN^{-2/3}$. The error bound is uniform for $t$ in bounded subsets of $(b_V + CN^{-2/3}, \infty)$.
Proof. Let $S > b_V$ be arbitrary but fixed. We derive (2.1) uniformly for $t \in (b_V + CN^{-2/3}, S]$. We first show that we only need to consider the integral over a bounded domain. To this end, observe that it follows from (1.14), (1.8), and (1.7) that there exist positive numbers $d, D$ such that $F_{N,V}(t) \ge d e^{-ND}$ for all $N$ and all $t \in (b_V, S]$ (choose, e.g., $D = \eta_V(S) + 1$). Next, we use (2.2). Well-known estimates from the theory of log-gases (see, e.g., [34, Theorem 11.1.2], see also [24, Lemma 5.2], [25, Lemma 4.4]), together with the fact that any $V$ satisfying (GA) grows at least linearly for $x \to \infty$, yield the existence of positive constants $L, \tilde D, \tau$ such that $\rho_N(x) \le \tilde D e^{-N\tau x}$ for all $N$ and $x \ge L$. Choosing $M \ge L$ with $M\tau > D$, we see that (2.3) holds uniformly for $t \in (b_V, S]$. We turn to the remaining part of the integral over the domain $(t, M)$, where we may assume $M > S + 1$ without loss of generality. In this domain we use the information on the integrand provided by [27, Theorem 1.5(ii)]. Observe that the result in [27] is stated using a linear rescaling $\lambda_V$ that maps $[a_V, b_V]$ onto $[-1, 1]$. Note in addition that the function $\eta_V$ of the present paper equals $\eta_V \circ \lambda_V^{-1}$ in [27]. It follows that there exists a positive number $C$ (corresponding to $c^{-1}$ in [27]) such that We proceed as in [36, Lemma 4.8].
with (observe that $\eta_V$ is strictly monotone on $(b_V, \infty)$, thus invertible) By the mean value theorem, for every $u \in (0, \eta_V(M) - \eta_V(t)]$ there exists $0 < \xi_u < u$ such that $f(u) = f(0) + f'(\xi_u)\,u$. To estimate $f'$ we use (1.8) and obtain
Since $G_V$ is smooth and strictly positive and . Moreover, $f(\xi) < f(0)$, and the above mean value representation for $f(u)$ yields With this representation the integral on the right of (2.5) becomes trivial. Recall that we have chosen $M > S + 1$. Thus for all $t \in (b_V, S]$ we have As $\int_0^\infty u e^{-Nu}\,du = N^{-2}$, we have derived (2.4), and the proof is complete.
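The mechanism behind Lemma 2.1 is a Laplace-type endpoint estimate: after the substitution $u = \eta_V(x) - \eta_V(t)$ the leading term is $f(0)/N$, and the remainder is smaller by a factor of order $1/N$ because $\int_0^\infty u e^{-Nu}\,du = N^{-2}$. The sketch below illustrates this for the model integral $\int_t^\infty e^{-N(\eta_V(x) - \eta_V(t))}\,dx$ with the GUE rate function (an illustrative choice in which the kernel prefactor is omitted), whose leading term is $1/(N\eta_V'(t))$; for this integral the relative error is approximately $\eta_V''(t)/(N\eta_V'(t)^2)$.

```python
import math


def eta_gue(t):
    """eta_V(t) for V(x) = x^2/2 and t >= 2 (closed form of int_2^t sqrt(x^2 - 4) dx)."""
    r = math.sqrt(t * t - 4.0)
    return 0.5 * t * r - 2.0 * math.log(0.5 * (t + r))


def laplace_ratio(t, N, m=20000):
    """int_t^inf exp(-N (eta(x) - eta(t))) dx divided by its leading term 1 / (N eta'(t))."""
    slope = math.sqrt(t * t - 4.0)            # eta'(t)
    # eta is convex on (2, inf), so the integrand is below exp(-60) beyond t + length
    length = 60.0 / (N * slope)
    h = length / m
    base = eta_gue(t)
    vals = [math.exp(-N * (eta_gue(t + k * h) - base)) for k in range(m + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))   # trapezoidal rule
    return integral * N * slope


for N in (50, 200, 800):
    r = laplace_ratio(3.0, N)
    print(N, r, N * (1.0 - r))   # N (1 - ratio) approaches eta''(3) / eta'(3)^2 = 3 / (5 sqrt 5)
```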
Proof of Theorem 1.1. Let $C$ denote the constant introduced in Lemma 2.1. Bounding the probability $P_{N,V}(\lambda_{\max} > t)$ by 1, it follows from (1.8) and from the boundedness of $G_V$ that the quotient $P_{N,V}(\lambda_{\max} > t)/F_{N,V}(t)$ is a bounded function of $t$ on the interval $(b_V, b_V + CN^{-2/3}]$. Therefore formula (1.15) holds in that region simply by an appropriate choice of the constant in the O-term. From now on we may restrict our attention to values $t \in (b_V + CN^{-2/3}, S]$ for some arbitrary but fixed number $S > b_V$. For such values of $t$ we first record the rough bounds that follow from Lemma 2.1, (1.14), (1.8) and from the strict positivity and continuity of In order to use (2.9) for the estimates of the summands in (1.21) with $k \ge 2$, we recall a basic fact from linear algebra (Hadamard's inequality). Suppose that $A = (A_{ij})_{ij}$ is a real, positive semidefinite $k \times k$ matrix. Then the determinant of $A$ can be estimated by the product of the diagonal entries of $A$,
$$\det A \le \prod_{i=1}^{k} A_{ii}. \tag{2.10}$$
It is not difficult to see from (1.2) that $(K_{N,V}(x_i, x_j))_{1 \le i,j \le k}$ is a positive semidefinite matrix (it is a Gram matrix), and we can apply (2.10). Together with Fubini's theorem we arrive at
$$\int_{(t,\infty)^k} \det\big[K_{N,V}(x_i,x_j)\big]_{1\le i,j\le k}\, dx_1\cdots dx_k \le \Big(\int_t^\infty K_{N,V}(x,x)\, dx\Big)^k. \tag{2.11}$$
Combining (1.21), (2.11), and (2.9) gives and Lemma 2.1 completes the proof.

2) and (GA)(3) (in the case of infinite $L_-$ resp. $L_+$) it is assumed that there exist $L_- < a_V < b_V < L_+$ solving equations (1.6), which implies that the support of the equilibrium measure is contained in the interior of $J$. In random matrix theory this corresponds to the case of "soft edges" (see assumption [27, (GA)$_1$] and the discussion preceding it).
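Hadamard's inequality (2.10), used in the proof of Theorem 1.1, is easy to test numerically. In the sketch below (helper names hypothetical) random Gram matrices play the role of $(K_{N,V}(x_i,x_j))_{i,j}$, which by (1.2) is itself the Gram matrix of the vectors $\big(e^{-NV(x_i)/2}p_j^{N,V}(x_i)\big)_{0 \le j < N}$; the determinant is computed from a Cholesky factorization.

```python
import math
import random


def cholesky_det(a):
    """Determinant of a symmetric positive definite matrix: product of squared Cholesky pivots."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return math.prod(low[i][i] ** 2 for i in range(n))


def random_gram(k, dim, rng):
    """Gram matrix (<v_i, v_j>)_{i,j} of k random Gaussian vectors in R^dim."""
    vecs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    return [[sum(x * y for x, y in zip(u, v)) for v in vecs] for u in vecs]


rng = random.Random(1)
for _ in range(5):
    a = random_gram(4, 6, rng)
    det_a = cholesky_det(a)
    diag_prod = math.prod(a[i][i] for i in range(4))
    print(det_a, diag_prod, det_a <= diag_prod)   # Hadamard: det(A) <= prod_i A_ii
```

Equality holds exactly when the underlying vectors are orthogonal, i.e., when the matrix is diagonal.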
In the case $L_+ < \infty$, the tail probabilities $P_{N,V}(\lambda_{\max} > t)$ are obviously equal to 0 for $t \ge L_+$. For $b_V < t < L_+$ the leading order of $P_{N,V}(\lambda_{\max} > t)$ is provided by the leading order of the integral $\int_t^{L_+} K_{N,V}(x,x)\, dx$. A computation shows that $F_{N,V}(t)$ describes the leading order of $P_{N,V}(\lambda_{\max} > t)$ if $N(L_+ - t) \to \infty$ for $N \to \infty$. For all this, as well as for the results of Theorems 1.1 and 1.5, it is irrelevant whether $L_-$ is finite or infinite (see [36] for more details).

Proof of Theorem 1.5
In the same way as Theorem 1.1 followed from Lemma 2.1, the result on superlarge deviations, Theorem 1.5, is a consequence of the following lemma. Let $\eta_V$ and $F_{N,V}$ be given as in (1.5) (see also (1.8)) and (1.14). Then, for sufficiently large values of $N$,

Proof. Again we follow the arguments of Section 2, but we omit the splitting of the domain of integration in the proof of Lemma 2.1. This can be done in the following way. We replace (2.3) by the relations (3.1) and (3.2), valid uniformly for all $t \ge b_V + 1$. Relation (3.1) will be proved in Appendix A (see Theorem A.1; here we need that $N$ is sufficiently large) for all $V$ satisfying (GA)$_\infty$ by adapting the arguments of [27]. The second claim (3.2) is the content of the subsequent lemma.

Proof. We begin by comparing $\eta_V$ with $V$. Using that the equilibrium measure $\mu_V$ is a probability measure supported on $[a_V, b_V]$, it follows from the definition of $\eta_V$ via (1.5), (1.4) that for all $x \ge b_V + 1$ we have (3.3). For $V$ satisfying (GA) we know that $V(x)$ grows at least linearly as with $f$ as in (2.6). In order to derive a bound on $f'$ we use, in addition to (2.7), also
We are now able to derive for all u ≥ 0 the representation
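For the comparison of $\eta_V$ with $V$ at the start of the last proof, assume the representation $L_V(x) = V(x) + 2\int \log|x-y|^{-1}\,d\mu_V(y)$ of (1.4). For $x \ge b_V + 1$ the logarithmic term lies between $-2\log(x - a_V)$ and $-2\log(x - b_V) \le 0$, so $\eta_V(x) - V(x)$ grows at most logarithmically. In the GUE example $V(x) = x^2/2$ (semicircle normalization) one even has $\eta_V(x) = V(x) - 2\log x - 1 + O(x^{-2})$, which the following sketch confirms numerically.

```python
import math


def eta_gue(t):
    """eta_V(t) for V(x) = x^2/2 and t >= 2 (closed form of int_2^t sqrt(x^2 - 4) dx)."""
    r = math.sqrt(t * t - 4.0)
    return 0.5 * t * r - 2.0 * math.log(0.5 * (t + r))


for x in (3.0, 10.0, 100.0, 1000.0):
    diff = eta_gue(x) - 0.5 * x * x                   # eta_V(x) - V(x)
    print(x, diff, diff + 2.0 * math.log(x) + 1.0)    # last column is O(x^{-2})
```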

A Appendix
The purpose of this appendix is to establish the asymptotic formula (3.1) that was used in the proof of Theorem 1.5. Here $A$ denotes an invertible complex $2 \times 2$ matrix [27, equation (4.3)], the vector valued function $m$ is defined by $m := k \circ \lambda_V^{-1}$ with $k$ as given in [27, Theorem 1.3(a), case $x > 1$], and finally $\hat R_+ := R_+ \circ \lambda_V^{-1}$ with the matrix valued function $R_+$ of [27, Lemma 3.8]. Note that $R_+$ also depends on $N$, which is suppressed in the notation.
Let us first evaluate $m$. Keeping in mind that $\eta_V$ of the present paper equals $\eta_V \circ \lambda_V^{-1}$ of [27] and using definition [27, equation (1.18)], we obtain (A.2). Before discussing $\hat R_+$ we can already explain how the leading order description of $K_{N,V}(x,x)$ arises. Write
$$\hat R_+(y)^{-1}\hat R_+(x) = \underbrace{\mathrm{Id}}_{(1)} + \underbrace{\hat R_+(y)^{-1}\big(\hat R_+(x) - \hat R_+(y)\big)}_{(2)}. \tag{A.3}$$
The contribution to $K_{N,V}(x,x)$ from the first summand (1) = Id, i.e., when the term $\hat R_+(y)^{-1}\hat R_+(x)$ in (A.1) is replaced by Id and the limit $y \to x$ is taken, equals the leading order term in (3.1). To estimate the contribution from (2) in (A.3) we use Lemma A.2 below, which takes the role of [27, Theorem 3.9] in the proof of [27, Theorem 1.5(ii)]. Denote by $X_0$ the positive number that is introduced in Lemma A.2. Statement (ii) of Lemma A.2 and the fundamental theorem of calculus provide $\hat R_+(x) - \hat R_+(y) = O\big(\frac{|x-y|}{Nxy}\big)$ for all $x, y > X_0$ with $x \ne y$. Statement (i) implies that there exists $X_1 \ge X_0$ such that for all $y > X_1$ the matrix $\hat R_+(y)$ is sufficiently close to the identity matrix that $\hat R_+(y)^{-1}$ is uniformly bounded for $y > X_1$ and for all $N$. It follows from (A.2) that $m(x) = e^{-\frac{N}{2}\eta_V(x)}\,O(1)$ uniformly for $x > X_0$. The combination of all these estimates shows that the contribution of the second summand (2) in (A.3) to $K_{N,V}(x,x)$, again in the limit $y \to x$, is bounded by uniformly for $x > X_1$. Thus we are only left to prove (3.1) uniformly for $x \in [b_V + 1, X_1]$ in case this set is not empty. This, however, follows already from [27, Theorem 1.5(ii)].
We now turn to the analysis of the matrix valued function $\hat R_+$, which equals $R_+$ of [27] up to the linear rescaling $\lambda_V$. The function $R$ is analytic on $\mathbb{C} \setminus \Sigma_R$, where $\Sigma_R$ denotes an unbounded contour that is sketched in Fig. 1 (cf. [27, Fig. 2], and observe that the rightmost circle is not present since we are in the case $J = \mathbb{R}$). Note that the role of $b_V$ is taken by 1 since we consider $R$ instead of $\hat R$. The definition of $R$ is rather involved [27, Lemma 3.8], but we do not need it. All that is important for us is that $R$ solves the Riemann-Hilbert problem stated in [27, Lemma 3.8(i)$_R$, (ii)$_R$]. The functions $R_\pm$ that appear there are defined on $\Sigma_R$ as the limits of $R$ when approaching $\Sigma_R$ from the left resp. right with respect to the orientation of $\Sigma_R$. This finally answers the question of how the function $\hat R_+$ appearing in (A.1) is defined. The next piece of information that we use from [27, Lemma 3.8] is the smallness of $\Delta_R$, which appears in the jump matrix $\mathrm{Id} + \Delta_R$ of the Riemann-Hilbert problem (A.4) for $R$ on $\Sigma_R$, as expressed in (A.5). This implies for sufficiently large values of $N$ that $R$ has a representation (A.6) as a Cauchy transform, where $\hat\mu$ is the solution of a particular singular integral equation (see [27, Proof of Theorem 3.9, in particular equation (3.35)], cf. [11, Section 7.5] for more background information). From the smallness of $\Delta_R$ as expressed in (A.5) it follows that the underlying singular integral operator is of the form $\mathrm{Id} + O(N^{-1})$ as an operator on $L^2(\Sigma_R)$, and its inverse is thus uniformly bounded for $N$ sufficiently large. This then gives $\|\hat\mu\|_{L^2(\Sigma_R)} = O(N^{-1})$ for $N$ sufficiently large. In fact, this is the only argument in the proof of Theorem 1.5 where we use that $N$ is assumed to be large. For the remaining part of our discussion we assume that $N$ satisfies this requirement. There is a difficulty in using (A.6) for our purposes: we are interested in $R_+(x)$ for large values of $x$.
Therefore $x \in \Sigma_R$, and the singularity in the denominator of (A.6) does not allow for pointwise bounds if we only know that the numerator is in some $L^q$ space.
As in [27] we deal with this issue by using the assumed real analyticity of $V$. Then, by (1.5), (1.4), the function $\eta_V$ is also real analytic on $(b_V, \infty)$. From the definition of $\Delta_R$ [27, equations (3.24) and (3.10)] on the relevant part of $\Sigma_R$, i.e., the rightmost half line in Fig. 1, it follows that the jump matrix $\mathrm{Id} + \Delta_R$ of the Riemann-Hilbert problem (A.4) also has an analytic extension. In this situation one may deform the contour of the Riemann-Hilbert problem into the region of analyticity of the jump matrix (see, e.g., [11, Section 7.3] for a discussion in a more general setting). Given $x$ large, we use the contour $\Sigma_x$ that is obtained from $\Sigma_R$ by replacing the interval $(x - \kappa_x, x + \kappa_x)$ by a half circle, as shown in Fig. 2. Of course, $\kappa_x > 0$ has to be chosen such that the lower half disc centered at $x$ with radius $\kappa_x$ is contained in the domain of analyticity of $\Delta_R$. The jump matrix of the modified Riemann-Hilbert problem is still given by $\mathrm{Id} + \Delta_R$ on the lower half circle, and its solution $R_x$ coincides with $R$ except on the lower half disc. Thus we have
$$R_+(x) = R_x(x) \quad\text{and}\quad R_+'(x) = R_x'(x). \tag{A.7}$$
Again we may express the solution $R_x$ of the modified Riemann-Hilbert problem by a Cauchy transform (A.8). To ensure that $\|\hat\mu_x\|_{L^2(\Sigma_x)} = O(N^{-1})$ uniformly in $x$, we need to verify that relation (A.5) holds uniformly if $\Sigma_R$ is replaced by $\Sigma_x$. Thus we have to estimate $\Delta_R$ on the lower half circle. This is the point where we begin using the additional assumptions on $V$ that are formulated in (GA)$_\infty$(2). It follows from the definition [27, equations (3.24) and (3.10)] that $\Delta_R(z) = O(|e^{-N\eta_V(z)}|)$ for $\lambda_V(z) \in U(n, x_0)$ and $x_0$ sufficiently large. Due to (3.3), which continues to hold on $U(n, x_0)$, it follows from the lower bound on $\operatorname{Re}(V)$ formulated in (GA)$_\infty$(2) that $\operatorname{Re}(\eta_V(z))$ also grows at least linearly as $\operatorname{Re}(z)$ becomes large. Hence there exists $\tilde d > 0$ such that (A.9) holds for $\lambda_V(z) \in U(n, x_0)$ and $x_0$ sufficiently large.
This clearly implies (A.5) for $\Sigma_x$ with an O-term that is uniform in $x$.
Correspondingly we write $A_j$ resp. $B_j$, $j \in \{1, 2\}$, for the contributions to the values of $R_x(x) - \mathrm{Id}$ resp. of $R_x'(x)$ that stem from integration over $\Sigma_x^{(j)}$ in (A.8) resp. in (A.10). Since $\|\Delta_R\|_{L^1(\Sigma_x)}$, $\|\hat\mu_x\|_{L^2(\Sigma_x)}$, and $\|\Delta_R\|_{L^2(\Sigma_x)}$ are all of order $1/N$, uniformly for sufficiently large $x$, the numerator in (A.8), (A.10) is also of order $1/N$ in the $L^1(\Sigma_x)$-norm, and it follows immediately that (A.11) holds for sufficiently large values of $x$. We turn to the contribution from $\Sigma_x^{(2)}$. Due to (A.9), and since the length of $\Sigma_x^{(2)}$ is bounded by $\pi x$, we have for large $x$ that $\Delta_R = O(x e^{-N\tilde d x/2})$ in both the $L^1(\Sigma_x^{(2)})$- and the $L^2(\Sigma_x^{(2)})$-norm. Since the distance from $x$ to $\Sigma_x^{(2)}$ is bounded below by the radius $\kappa_x$, we obtain the estimate (A.12). Assumption (GA)$_\infty$(2) ensures that we may choose $\kappa_x$ of order $x^{-n}$, which amply suffices for proving that $A_2$ resp. $B_2$ can be bounded in the same way as $A_1$ resp. $B_1$ in (A.11). In summary, we have derived for $x$ sufficiently large. Since $\hat R_+ = R_+ \circ \lambda_V^{-1}$, these estimates carry over to $\hat R_+$, which is precisely the content of hold uniformly for all $x > X_0$.