Orthogonal Rational Functions on the Unit Circle with Prescribed Poles not on the Unit Circle

Orthogonal rational functions (ORF) on the unit circle generalize orthogonal polynomials (poles at infinity) and Laurent polynomials (poles at zero and infinity). In this paper we investigate the properties of and the relation between these ORF when the poles are all outside or all inside the unit disk, or when they can be anywhere in the extended complex plane outside the unit circle. Some properties of matrices that are the product of elementary unitary transformations will be proved and some connections with related algorithms for direct and inverse eigenvalue problems will be explained.


Introduction
Orthogonal rational functions (ORF) on the unit circle are well known as generalizations of orthogonal polynomials on the unit circle (OPUC). The pole at infinity of the polynomials is replaced by poles "in the neighborhood" of infinity, i.e., poles outside the closed unit disk. The recurrence relations for the ORF generalize the Szegő recurrence relations for the polynomials.
If µ is the orthogonality measure supported on the unit circle and L²_µ the corresponding Hilbert space, then the shift operator T_µ : L²_µ → L²_µ : f(z) ↦ zf(z) restricted to the polynomials has a representation with respect to the orthogonal polynomials that is a Hessenberg matrix. However, if instead of a polynomial basis one uses a basis of orthogonal Laurent polynomials, obtained by alternating between poles at infinity and poles at the origin, then a full unitary representation of T_µ with respect to this basis is a five-diagonal CMV matrix [12].
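For readers who want to experiment, the five-diagonal structure is easy to reproduce numerically in the polynomial (OPUC) special case. The following sketch builds a finite CMV matrix as a product of elementary 2 × 2 unitary blocks; the parameter values and the particular L·M convention are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cmv(alphas):
    # Finite CMV matrix from Schur/Verblunsky parameters a_0, ..., a_{n-1};
    # the first n-1 lie in the open disk, the last on the circle, which
    # makes the truncation unitary. C = L @ M with 2x2 unitary blocks
    # (one common convention; signs and transposes vary in the literature).
    a = np.asarray(alphas, dtype=complex)
    n = len(a)

    def theta(t):                       # elementary 2x2 unitary block
        r = np.sqrt(1 - abs(t) ** 2)
        return np.array([[np.conj(t), r], [r, -t]])

    L = np.eye(n, dtype=complex)
    M = np.eye(n, dtype=complex)
    for j in range(0, n - 1, 2):        # blocks Theta_0, Theta_2, ...
        L[j:j + 2, j:j + 2] = theta(a[j])
    for j in range(1, n - 1, 2):        # blocks Theta_1, Theta_3, ...
        M[j:j + 2, j:j + 2] = theta(a[j])
    if n % 2 == 1:                      # degenerate 1x1 boundary block
        L[-1, -1] = np.conj(a[-1])
    else:
        M[-1, -1] = np.conj(a[-1])
    return L @ M
```

The resulting matrix is unitary and has bandwidth two (five diagonals); multiplying the elementary blocks in a different order changes the matrix but, as discussed later in this paper, not its spectrum.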
The previous ideas have been generalized to the rational case by Velázquez in [47]. He showed that the representation of the shift operator with respect to the classical ORF is not a Hessenberg matrix but a matrix Möbius transform of a Hessenberg matrix. However, a full unitary representation can be obtained if the shift is represented with respect to a rational analog of the Laurent polynomials by alternating between a pole inside and a pole outside the unit disk. The resulting matrix is a matrix Möbius transform of a five-diagonal matrix.
This paper is a contribution to the Special Issue on Orthogonal Polynomials, Special Functions and Applications (OPSFA14). The full collection is available at https://www.emis.de/journals/SIGMA/OPSFA2017.html. The purpose of the paper, though, is not to discuss quadrature in particular; quadrature is just an example application that does not require much extra terminology and notation. The main purpose is to give a general framework on which to build the many applications of ORF. Just as orthogonal polynomials are used in about every branch of mathematics, ORF can be used with the extra freedom to exploit the location of the poles. For example, it can be shown that ORF can be used to solve multipoint moment problems as well as more general rational interpolation problems, where the locations of the poles inside and outside the circle are important for engineering applications like system identification, model reduction, filtering, etc. When modelling the transfer function of a linear system, poles should be chosen inside as well as outside the disk to guarantee that the transient as well as the steady state of the system is well modelled. It would lead us too far to also include the interpolation properties of multipoint Padé approximation and the related applications in several branches of engineering. We only provide the basics in this paper so that it can be used in the context of more applied papers.
The interpretation of the recursion for the ORFs as a factorization of a matrix into elementary unitary transformations illustrates that the spectrum of the resulting matrix is independent of the order in which the elementary factors are multiplied. As far as we know, this fact was previously unknown in the linear algebra community, except in particular cases like unitary Hessenberg matrices. As an illustration, we develop some preliminary results in Section 11 in a linear algebra setting that is slightly more general than the ORF situation.
In the last decades, many papers appeared on inverse eigenvalue problems for unitary Hessenberg matrices and rational Krylov methods. Some examples are [4,30,35,36,37,38,44]. These use elementary operations that are very closely related to the recurrence that will be discussed in this paper. However, they are not the same and often miss the flexibility discussed here. We shall illustrate some of these connections with certain algorithms from the literature in Section 12.
The outline of the paper is as follows. In Section 2 we introduce the main notation used in this paper. The linear spaces and the ORF bases are given in Section 3. Section 4 brings the Christoffel-Darboux relations and the reproducing kernels, which form an essential element for obtaining the recurrence relation given in Section 5, but also for the PORF in Section 6, to be used for quadrature formulas in Section 7. The alternative representation of the shift operator is given in Section 8 and its factorization in elementary 2 × 2 blocks in the subsequent Section 9. We end by drawing some conclusions about the spectrum of the shift operator and about the computation of rational Szegő quadrature formulas in Section 10. The ideas that we present in this paper, especially the factorization of unitary Hessenberg matrices in elementary unitary factors, are also used in the linear algebra literature, mostly in the finite-dimensional situation. These elementary factors, and what can be said about the spectrum of their product, are the subject of Section 11. Such elementary unitary transformations are intensively used in numerical algorithms such as Arnoldi-based Krylov methods, where they are known as core transformations. Several variants of these rational Krylov methods exist. The algorithms are quite similar to, yet different from, our ORF recursion, as we explain briefly in Section 12, illustrating why we believe the version presented in this paper has distinct advantages.

Basic definitions and notation
We use the following notation. C denotes the complex plane, C̄ the extended complex plane (its one-point compactification), R the real line, R̄ the closure of R in C̄, T the unit circle, D the open unit disk, D̄ = D ∪ T, and E = C̄ \ D̄. For any number z ∈ C̄ we define z_* = 1/z̄ (setting 1/0 = ∞ and 1/∞ = 0), and for any complex function f we define f_*(z) = f̄(z_*).
To approximate an integral ∫_T f(z) dµ(z), where µ is a probability measure on T, one may use Szegő quadrature formulas. The nodes of this quadrature can be computed by using the Szegő polynomials. Orthogonality in this paper will always be with respect to the inner product ⟨f, g⟩ = ∫_T f̄(z) g(z) dµ(z).
The weights of the n-point quadrature are all positive, the nodes are on T, and the formula is exact for all Laurent polynomials f ∈ span{z^k : |k| ≤ n − 1}.
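The simplest instance can be checked numerically: for the normalized Lebesgue measure dµ = dθ/2π the Szegő polynomials are φ_n(z) = z^n, the para-orthogonal polynomial is z^n + τ, its zeros are the n-th roots of −τ, and all weights equal 1/n. A minimal sketch (this closed form is specific to the Lebesgue measure; for general µ the nodes and weights must be computed as described later in the paper):

```python
import numpy as np

def szego_nodes_weights(n, tau=1.0 + 0.0j):
    # Nodes: zeros of the para-orthogonal polynomial z^n + tau,
    # i.e. the n-th roots of -tau; weights: all equal to 1/n.
    k = np.arange(n)
    nodes = np.exp(1j * (np.angle(-tau) + 2 * np.pi * k) / n)
    weights = np.full(n, 1.0 / n)
    return nodes, weights

def quad(f, nodes, weights):
    # Apply the quadrature rule to a function on the unit circle.
    return np.sum(weights * f(nodes))
```

The rule integrates z^k exactly for |k| ≤ n − 1 (the integral is 1 for k = 0 and 0 otherwise) and fails at k = ±n, which shows the exactness domain is sharp.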
This has been generalized to rational functions with a set of predefined poles. The corresponding quadrature formulas are then rational Szegő quadratures. This has been discussed in many papers and some of the earlier results were summarized in the book [9]. We briefly recall some of the results that are derived there. The idea is the following. Fix a sequence α = (α_k)_{k∈N} with {α_k}_{k∈N} ⊂ D, and consider the subspaces of rational functions defined by

L_0 = C,   L_n = { p_n(z)/π_n(z) : p_n ∈ P_n },   π_n(z) = ∏_{k=1}^n (1 − ᾱ_k z),   n ≥ 1,

where P_n is the set of polynomials of degree at most n. These rational functions have their poles among the points in α_* = {α_{j*} = 1/ᾱ_j : α_j ∈ α}. We denote the corresponding sequence as α_* = (α_{j*})_{j∈N}. Let φ_n ∈ L_n \ L_{n−1} with φ_n ⊥ L_{n−1} be the nth orthogonal rational basis function (ORF) in a nested sequence. It is well known that these functions have all their zeros in D (see, e.g., [9, Corollary 3.1.4]). However, the quadrature formulas we have in mind should have their nodes on the circle T. Therefore, para-orthogonal rational functions (PORF) are introduced. They are defined by

Q_n(z, τ) = φ_n(z) + τ φ*_n(z),   τ ∈ T,

where besides the ORF φ_n(z) = p_n(z)/π_n(z), also the "reciprocal" function φ*_n(z) = p*_n(z)/π_n(z) = z^n p_{n*}(z)/π_n(z) is introduced. These PORF have n simple zeros {ξ_{nk}}_{k=1}^n ⊂ T (see, e.g., [9, Theorem 5.2.1]) so that they can be used as nodes for the quadrature formulas I_n(f) = ∑_{k=1}^n w_{nk} f(ξ_{nk}), and the weights are all positive, given by w_{nk} = 1/∑_{j=0}^{n−1} |φ_j(ξ_{nk})|² (see, e.g., [9, Theorem 5.4.2]).
The purpose of this paper is to generalize the situation where the α j are all in D to the situation where they are anywhere in the extended complex plane outside T. This will require the introduction of some new notation.
So consider a sequence α with {α_k} ⊂ D and its reflection in the circle β = (β_j)_{j∈N}, where β_j = 1/ᾱ_j = α_{j*} ∈ E. We now construct a new sequence γ = (γ_j)_{j∈N} where each γ_j is either equal to α_j or to β_j.
With each of the sequences α, β, and γ we can associate orthogonal rational functions. They will be closely related as we shall show. The ORF for the γ sequence can be derived from the ORF for the α sequence by multiplying with a Blaschke product just like the orthogonal Laurent polynomials are essentially shifted versions of the orthogonal polynomials (see, e.g., [15]).
Because σ^α_n = σ^β_n, we can remove the superscript and just write σ_n. We also use the notation u(·), which maps complex numbers onto T, and later we shall use π^ν_{n*} to mean the product ∏_{j=1}^n of the corresponding factors built from the ν_{j*}. Note that ζ^α_j = (ζ^β_j)_* = 1/ζ^β_j. Moreover, if α_j = 0 and hence β_j = ∞, then ζ^α_j(z) = z and ζ^β_j(z) = −1. Next define the finite Blaschke products for ν ∈ {α, β}; it is important to note that here ν ≠ γ (for the definition of B^γ_n = B_n see below). Like we have split up the denominators π_n = π̇^α_n π̇^β_n in the α-factors and the β-factors, we also define for n ≥ 1 the corresponding split products, so that we can define the finite Blaschke products for the γ sequence. Note that the reflection property of the factors also holds for the products: B^α_n = (B^β_n)_* = 1/B^β_n, B_{n*} = 1/B_n, and (Ḃ^α_n Ḃ^β_n)_* = 1/(Ḃ^β_n Ḃ^α_n).
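The reflection properties of the elementary factors are easy to verify numerically. A minimal sketch, assuming the plain convention ζ_a(z) = (z − a)/(1 − āz); the paper's σ-normalization only changes unimodular constants:

```python
import numpy as np

def zeta(z, a):
    # Elementary Blaschke-type factor; this plain convention may differ
    # from the paper's sigma-normalized zeta by a unimodular constant.
    return (z - a) / (1 - np.conj(a) * z)

def substar(f, z):
    # The substar conjugate (f)_*(z) = conj(f(1/conj(z))).
    return np.conj(f(1 / np.conj(z)))
```

The first assertion below checks (ζ_a)_* = 1/ζ_a; the second shows that for the reflected point b = 1/ā the product ζ_a ζ_b is a unimodular constant, i.e., ζ^β_j equals 1/ζ^α_j up to normalization.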

Linear spaces and ORF bases
We can now introduce our spaces of rational functions for n ≥ 0. The dimension of L^ν_n is n + 1 for ν ∈ {α, β, γ}, but note that the dimension of L̇^ν_n for ν ∈ {α, β} can be less than n + 1. Indeed, some of the Ḃ^ν_j may be repeated, so that for example the dimension of L̇^α_n is only |a_n| + 1, with |a_n| the cardinality of a_n, and similarly for ν = β. A similar statement holds for ν = γ. Occasionally we shall also need this dotted notation.
Lemma 3.1. If f ∈ L_n then f/Ḃ^β_n ∈ L^α_n and f/Ḃ^α_n ∈ L^β_n. In other words, L_n = Ḃ^β_n L^α_n = Ḃ^α_n L^β_n. This is true for all n ≥ 0 if we set Ḃ^α_0 = Ḃ^β_0 = 1.
Proof. This is trivial for n = 0 since then L_n = C. If f ∈ L_n and n ≥ 1, then it is of the form given above.
Recall that β*_j = −1 and σ_j = 1 if β_j = ∞ (and hence α_j = 0); we can leave these factors out, and we shall write ∏· for the product instead of ∏, the dot meaning that we leave out all the factors for which α_j = 1/β̄_j = 0.
and thus the first part follows. The second part is similar.
Proof. By our previous lemma, Ḃ^β_n L^α_{n−1} = ζ̇^β_n Ḃ^β_{n−1} L^α_{n−1} = ζ̇^β_n L_{n−1}. The second relation is proved in a similar way.
To introduce the sequences of orthogonal rational functions (ORF) for the different sequences ν ∈ {α, β, γ}, recall the inner product, which we can write with our ( )_*-notation as ⟨f, g⟩ = ∫_T f_*(z) g(z) dµ(z), where µ is assumed to be a probability measure that is positive a.e. on T.
Lemma 3.3. The function φ^α_n Ḃ^β_n belongs to L_n and it is orthogonal to the n-dimensional subspace ζ̇^β_n L_{n−1} for all n ≥ 1. Similarly, the function φ^β_n Ḃ^α_n belongs to L_n and it is orthogonal to the n-dimensional subspace ζ̇^α_n L_{n−1}, n ≥ 1.
Proof. First note that φ^α_n Ḃ^β_n ∈ L_n by Lemma 3.1. By definition φ^α_n ⊥ L^α_{n−1}. The result thus follows from Lemma 3.2 and the fact that ⟨f, g⟩ = ⟨Ḃ^ν_n f, Ḃ^ν_n g⟩. The second claim follows by symmetry.
Note that ζ̇^β_n L_{n−1} = L_{n−1} if γ_n = α_n. Thus, up to normalization, φ^α_n Ḃ^β_n is the same as φ_n, and similarly, if γ_n = β_n, then φ_n and φ^β_n Ḃ^α_n are the same up to normalization.
Lemma 3.4. For n ≥ 1 the function Ḃ^α_n (φ^α_n)_* belongs to L_n and it is orthogonal to ζ̇^α_n L_{n−1}. Similarly, for n ≥ 1 the function Ḃ^β_n (φ^β_n)_* belongs to L_n and it is orthogonal to ζ̇^β_n L_{n−1}.
We now define the reciprocal ORFs (recall f_*(z) = f̄(z_*)) by φ^ν*_n = B^ν_n φ^ν_{n*} for ν ∈ {α, β}. For the ORF in L_n, however, we set φ*_n = Ḃ^α_n Ḃ^β_n φ_{n*}. Note that by definition B_n is either Ḃ^α_n or Ḃ^β_n depending on γ_n being α_n or β_n, while in the previous definition we do not multiply with B_n but with the product Ḃ^α_n Ḃ^β_n. The reason is that we want the operation ( )* to be a map from L^ν_n to L^ν_n for all ν ∈ {α, β, γ}.
Remark 3.5. As the operation ( )* is a map from L^ν_n to L^ν_n, it depends on n and on ν. So to make the notation unambiguous we should in fact use something like f^[ν,n] if f ∈ L^ν_n. However, in order not to overload our notation, we shall stick to the notation f*, since it should always be clear from the context to which space f belongs. Note that we also used the same notation to transform polynomials. This is just a special case of the general definition. Indeed, a polynomial of degree n belongs to L^α_n for a sequence α where all α_j = 0, j = 0, 1, 2, ..., and for this sequence B^α_n(z) = z^n. Note that for a constant a ∈ L_0 = C we have a* = ā. Although ( )* is mostly used for scalar expressions, we shall occasionally use A* where A is a matrix whose elements are all in L_n. Then the meaning is that we take the ( )* conjugate of each element in its transpose. Thus if A is a constant matrix, then A* has the usual meaning of the adjoint or complex conjugate transpose of the matrix. We shall need this in Section 8.
Remark 3.6. It might also be helpful for the further computations to note the following. If p_n is a polynomial of degree n with a zero at ξ, then p*_n will have a zero at ξ_* = 1/ξ̄. Hence, if ν ∈ {α, β, γ} and φ^ν_n = p^ν_n/π^ν_n, then φ^ν_{n*} = p^ν_{n*}/π^ν_{n*}. We know by [9, Corollary 3.1.4] that φ^α_n has all its zeros in D, hence p^α_n does not vanish in E and p^α*_n does not vanish in D. By symmetry, φ^β_n has all its zeros in E and p^β*_n does not vanish in E. For the general φ_n, it depends on γ_n being α_n or β_n. However, from the relations between φ_n and φ^α_n or φ^β_n that will be derived below, we will be able to guarantee that at least for z = ν_n we have φ^ν*_n(ν_n) ≠ 0 and p^ν*_n(ν_n) ≠ 0 for all ν ∈ {α, β, γ} (see Corollary 3.11 below).
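In the polynomial case the ( )* operation is just conjugation plus coefficient reversal, and the zero reflection ξ ↦ 1/ξ̄ of the remark above can be checked directly. A small sketch, using the ascending-coefficient convention of numpy.polynomial:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def pstar(p):
    # For a degree-n polynomial p (ascending coefficients), the reciprocal
    # polynomial p*(z) = z^n conj(p(1/conj(z))) is the conjugated reversal
    # of the coefficient vector.
    return np.conj(np.asarray(p))[::-1]
```

Starting from a polynomial with zeros inside D, the zeros of p* are their reflections 1/ξ̄ in the unit circle, hence outside D̄.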
The orthogonality conditions define φ_n and φ*_n uniquely up to normalization. So let us now make the ORFs unique by imposing an appropriate normalization. Assume from now on that the φ^ν_n refer to orthonormal functions in the sense that ‖φ^ν_n‖ = 1. This makes them unique up to a unimodular constant. Defining this constant is what we shall do now.
Suppose γ_n = α_n; then φ_n and φ^α_n Ḃ^β_n are both in L_n and orthogonal to L_{n−1} (Lemma 3.3). If we assume ‖φ_n‖ = 1 and ‖φ^α_n‖ = 1, hence ‖φ^α_n Ḃ^β_n‖ = ‖φ^α_n‖ = 1, it follows that there must be some unimodular constant s^α_n ∈ T such that φ_n = s^α_n φ^α_n Ḃ^β_n. Of course, we have by symmetry that for γ_n = β_n there is some s^β_n ∈ T such that φ_n = s^β_n φ^β_n Ḃ^α_n. To define the unimodular factors s^α_n and s^β_n, we first fix φ^α_n and φ^β_n uniquely as follows. We know that φ^α_n has all its zeros in D and hence φ^α*_n has all its zeros in E, so that φ^α*_n(α_n) ≠ 0. Thus we can take φ^α*_n(α_n) > 0 as a normalization for φ^α_n. Similarly, for φ^β_n we can normalize by φ^β*_n(β_n) > 0. In both cases we have made the leading coefficient with respect to the basis {B^ν_j}_{j=0}^n positive, since φ^α_n(z) = φ^α*_n(α_n) B^α_n(z) + ψ^α_{n−1}(z) with ψ^α_{n−1} ∈ L^α_{n−1}, and φ^β_n(z) = φ^β*_n(β_n) B^β_n(z) + ψ^β_{n−1}(z) with ψ^β_{n−1} ∈ L^β_{n−1}. Before we define the normalization for the γ sequence, we prove the following lemma, which is a consequence of the normalization of the φ^α_n and the φ^β_n.
For the normalization of the φ_n, we can do two things: either we make the normalization of φ_n simple and choose for example φ*_n(γ_n) > 0, similar to what we did for φ^α_n and φ^β_n (but this is somewhat problematic, as we shall see below), or we can insist on keeping the relation with φ^α_n and φ^β_n simple as in the previous lemma and arrange that s^α_n = s^β_n = 1. We choose the second option.
Let us assume that γ_n = α_n. Denote φ_n = p_n/(π̇^α_n π̇^β_n) and φ^α_n = p^α_n/π^α_n, with p_n and p^α_n both polynomials in P_n. We already know that there is some s^α_n ∈ T such that φ_n = s^α_n Ḃ^β_n φ^α_n. Take the ( )* conjugate and multiply with Ḃ^α_n Ḃ^β_n to get φ*_n = s^α_n Ḃ^β_n φ^α*_n. It now takes some simple algebra to reformulate this as

φ*_n(z) = ς_n p*_n(z)/(π̇^α_n(z) π̇^β_n(z)) = s^α_n ς_n p^α*_n(z)/(π̇^α_n(z) π̇^β_n(z)).

This implies that p*_n(z) = s^α_n p^α*_n(z) · ∏_{j∈b_n} (−|β_j|), and thus that p*_n(z) has the same zeros as p^α*_n(z), none of which is in D. Thus the numerator of φ*_n will not vanish at α_n ∈ D, but one of the factors (1 − β̄_j α_n) from π̇^β_n(α_n) could be zero. Thus a normalization φ*_n(α_n) > 0 is not an option in general. We could however make s^α_n = 1 when we choose p*_n(α_n)/p^α*_n(α_n) > 0 or, since φ^α*_n(α_n) > 0, this is equivalent with ς_n p*_n(α_n)/π^α_n(α_n) > 0. Yet another way to put this is requiring that φ*_n(z)/Ḃ^β_n(z) is positive at z = α_n. This does not give a problem with 0 or ∞, since it is clear that neither the numerator nor the denominator will vanish for z = α_n. Of course a similar argument can be given if γ_n = β_n. Then we choose φ*_n(z)/Ḃ^α_n(z) to be positive at z = β_n, or equivalently ς_n p*_n(β_n)/π^β_n(β_n) · ∏_{j∈a_n} (−|α_j|) > 0. Let us formulate the result about the numerators as a lemma for further reference.
Lemma 3.8. With the normalization that we just imposed, the numerators p^ν_n of φ^ν_n = p^ν_n/π^ν_n, ν ∈ {α, β, γ}, n ≥ 1, are related by the expressions below, where as before ς_n = ∏_{j=1}^n σ_j.
Proof. The first expression, for γ_n = α_n, has been proved above. The second one follows in a similar way from the relation φ_n(z) = φ^β*_n(z) Ḃ^α_n(z). Indeed, with −σ_j α_j = −|α_j| the result follows. The case γ_n = β_n is similar.
Thus we have proved the following theorem. It says that if γ_n = α_n, then φ_n is a 'shifted' version of φ^α_n, where 'shifted' means multiplied by Ḃ^β_n:

φ_n(z) = Ḃ^β_n(z)[a_0 B^α_0(z) + · · · + a_n B^α_n(z)] = a_0 Ḃ^β_n(z) + · · · + a_n Ḃ^α_n(z),

and a similar interpretation holds if γ_n = β_n. We summarize this in the following theorem.
Theorem 3.9. Assume all ORFs φ^ν_n, ν ∈ {α, β, γ}, are orthonormal with positive leading coefficient, i.e., normalized as above.
Corollary 3.10. We have for all n ≥ 1 that (φ^ν_n)* ⊥ ζ^ν_n L^ν_{n−1}, ν ∈ {α, β, γ}.
Corollary 3.11. The rational functions φ^α_n and φ^α*_n are in L^α_n and hence have all their poles in {β_j : j = 1, ..., n} ⊂ E, while the zeros of φ^α_n are all in D and the zeros of φ^α*_n are all in E. The rational functions φ^β_n and φ^β*_n are in L^β_n and hence have all their poles in {α_j : j = 1, ..., n} ⊂ D, while the zeros of φ^β_n are all in E and the zeros of φ^β*_n are all in D. The rational functions φ_n and φ*_n are in L_n and hence have all their poles in {β_j : j ∈ a_n} ∪ {α_j : j ∈ b_n}.
The zeros of φ_n are the same as the zeros of φ^α_n, and thus are all in D, if γ_n = α_n; they are the same as the zeros of φ^β_n, and thus all in E, if γ_n = β_n.
Proof. It is well known that the zeros of φ^α_n are all in D [9, Corollary 3.1.4], and because φ^β_n = φ^α_{n*}, this means that the zeros of φ^β_n are all in E. If γ_n = α_n, i.e., n ∈ a_n, then φ_n = φ^α_n Ḃ^β_n = φ^α_n / ∏·_{j∈b_n} ζ^α_j, and the product with Ḃ^β_n will only exchange the poles 1/ᾱ_j = β_j, j ∈ b_n, of φ^α_n for poles α_j = 1/β̄_j, so the zeros of φ^α_n are left unaltered.
The proof for n ∈ b n is similar.
One may summarize this as follows: for f ∈ L^ν_n the f_* transform reflects both zeros and poles in T (since z ↦ z_* = 1/z̄), while the transform f ↦ f*, as it is defined in the spaces L^ν_n, ν ∈ {α, β, γ}, keeps the poles but reflects the zeros, since the multiplication with the respective factors B^α_n, B^β_n and Ḃ^α_n Ḃ^β_n only undoes the reflection of the poles that resulted from the f_* operation.

Christoffel-Darboux relations and reproducing kernels
For ν ∈ {α, β, γ}, one may define the reproducing kernels for the space L^ν_n. Given the ORF φ^ν_k, the kernels are defined by k^ν_n(z, w) = ∑_{k=0}^n φ^ν_k(z) φ̄^ν_k(w). They reproduce f ∈ L^ν_n by ⟨k^ν_n(·, z), f⟩ = f(z). The proof of the Christoffel-Darboux relations goes exactly as in the classical case and we shall not repeat it here (see, e.g., [9, Theorem 3.1.3]).
hold for n ≥ 0, ν ∈ {α, β, γ} and z, w not among the poles of φ ν n and not on T.
As an immediate consequence we have the following relations, which hold true for n ≥ 0 and z, w ∉ T ∪ {β_j : j ∈ a_n} ∪ {α_j : j ∈ b_n}.
Proof. The first relation was shown directly above for the case γ_n = α_n. It also follows in the case γ_{n+1} = α_{n+1} by using in the second CD-relation the first expressions from Theorem 3.9 for φ_{n+1} and φ*_{n+1}. The relation is thus valid independently of γ_n = α_n or γ_n = β_n. Similarly, the second expression was derived before in the case γ_n = β_n; but again, it also follows from the second CD-relation and the first expressions from Theorem 3.9 for φ_{n+1} and φ*_{n+1} in the case γ_{n+1} = β_{n+1}. Again the relation holds independently of γ_n = α_n or γ_n = β_n.
Alternatively, the second relation can also be derived from the second CD-relation in the case γ_{n+1} = α_{n+1}, but using the second expressions from Theorem 3.9 for φ_{n+1} and φ*_{n+1}.
The latter corollary cannot immediately be used when ν = γ, because γ_n could be equal to some pole of φ_n if it equals some 1/γ̄_j for j < n. In that case we can remove the denominators in the CD relation and keep only the numerators; the CD relation then becomes a relation between these numerators.
Thus, the first form gives a corresponding relation for the numerators. Evaluating a polynomial at infinity means taking its highest degree coefficient; i.e., if q_n(z) is a polynomial of degree n, then q_n(∞) stands for its coefficient of z^n. The second form of (4.1) gives corresponding relations when γ_{n+1} or γ_n equals ∞. Coupling the first and the second form in (4.1) gives further relations: one for γ_{n+1} = ∞ and γ_n ≠ ∞, and a similar one for γ_n = ∞ and γ_{n+1} ≠ ∞. If γ_{n+1} = ∞ and γ_n = ∞, the denominators in (4.1) have to be replaced by 1. To summarize, the relations of Corollary 4.3 may not hold for the ORF if ν = γ, but similar relations do hold for the numerators, as stated in the next corollary.
is the numerator in the CD relation and p_n(z) is the numerator of the ORF φ_n for the sequence γ, then we have for n ≥ 0

Recurrence relation
The recurrence for the φ^α_n is well known; for a proof see, e.g., [9, Theorem 4.1.1]. For φ^β_n the proof can be copied by symmetry. However, also for ν = γ the same recurrence and its proof can be copied, with the exception that the derivation fails when p*_n(γ_{n−1}) = 0, where p_n = φ_n π_n. This can (only) happen if (1 − |γ_n|)(1 − |γ_{n−1}|) < 0 (i.e., one of these γ's is in D and the other is in E). We shall say that φ_n is regular if p*_n(γ_{n−1}) ≠ 0. If ν = α or ν = β, then the whole sequence (φ^ν_n)_{n≥0} will be automatically regular. Thus we have the following theorem.
Theorem 5.1. Let ν ∈ {α, β, γ}, and if ν = γ assume moreover that φ^ν_n is regular; then the following recursion holds, with initial condition φ^ν_0, where H^ν_n is a nonzero constant times a unitary matrix. The constant η^ν_{n1} is chosen such that the normalization condition for the ORFs is maintained. The other constant η^ν_{n2} is then automatically related to η^ν_{n1} by η^ν_{n2} = η^ν_{n1} σ_{n−1} σ_n. The Szegő parameter λ^ν_n is given by the expression below, where φ^ν_n(z) = p^ν_n(z)/π^ν_n(z).
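In the OPUC special case (all α_j = 0) this theorem reduces to the classical coupled Szegő recurrence for the orthonormal polynomials. A minimal numerical sketch, assuming the standard normalization e_n = (1 − |λ_n|²)^{1/2} and taking the unimodular factors η equal to 1; the parameter values in the test are arbitrary illustrations:

```python
import numpy as np

def szego_recurrence(lams):
    # Coupled Szego recurrence for orthonormal OPUC.
    # phi, phis hold ascending coefficients of phi_n and phi*_n.
    # Step: phi_n  = (z*phi_{n-1} + lam_n * phi*_{n-1}) / e_n,
    #       phi*_n = (conj(lam_n)*z*phi_{n-1} + phi*_{n-1}) / e_n,
    # with e_n = sqrt(1 - |lam_n|^2), lam_n in the open unit disk.
    phi = np.array([1.0 + 0j])      # phi_0 = 1
    phis = np.array([1.0 + 0j])     # phi*_0 = 1
    for lam in lams:
        e = np.sqrt(1 - abs(lam) ** 2)
        zphi = np.concatenate(([0], phi))     # multiply phi_{n-1} by z
        ps = np.concatenate((phis, [0]))      # pad phi*_{n-1} to degree n
        phi, phis = (zphi + lam * ps) / e, (np.conj(lam) * zphi + ps) / e
    return phi, phis
```

The second row of the recurrence automatically preserves the structural identity φ*_n(z) = z^n conj(φ_n(1/z̄)), i.e., φ*_n is the conjugated coefficient reversal of φ_n, and the leading coefficient stays positive.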
Proof . We immediately concentrate on the general situation ν = γ. Of course ν = α and ν = β will drop out as special cases. For simplicity assume that γ n and γ n−1 are not 0 and not ∞.
The technicalities when this is not true are left as an exercise. This is easy because one formally follows the steps of the proof below, but replaces a linear factor involving infinity by its coefficient of infinity (like 1 − ∞z ⇒ −z and z − ∞ ⇒ −1), and evaluating a polynomial at ∞ means taking its leading coefficient.
First we show that there are some numbers c_n and d_n such that the combination below vanishes. This can be written as N(z)/[(z − γ_{n−1}) π_{n−1}(z)]. Thus c_n and d_n are defined by the conditions N(γ_{n−1}) = N(1/γ̄_{n−1}) = 0. If we denote φ_k = p_k/π_k, and thus φ*_k = p*_k ς_k/π_k, it is clear that the first condition gives one relation and the second one another.
Note that p*_{n−1}(γ_{n−1}) cannot be zero by Corollary 3.11, and that also p*_n(γ_{n−1}) does not vanish, by our assumption of regularity.
Furthermore, by using the orthogonality of φ_n and φ_{n−1} and Corollary 3.10, it is not difficult to show that φ ⊥ L_{n−2}, so that it must be identically zero.
By taking the ( )* transform (in L_n) we obtain the corresponding relation for φ*_n. This proves the recurrence by taking e_n = |d_n| and η_{n1} = σ_{n−1} u(d_n). It remains to show that the initial step for n = 1 is true.
This implies that λ_1 is indeed given by the general formula. In case γ_0 = β_0 = ∞ we have ζ_0(z) = 1/z, and again λ_1 is given by the general formula. This proves the theorem.
Remark 5.2. If ν ∈ {α, β} we could rewrite λ^ν_n in terms of φ^ν_n: by dividing and multiplying with the appropriate denominators π^ν_n one gets an equivalent expression. Note that this η^ν_n ∈ T as well, but it differs from the η^ν_n in the previous theorem. However, if ν = γ, then this expression has the disadvantage that γ_{n−1} could be equal to 1/γ̄_n, or to a pole of φ_n, in which case it would not make sense to evaluate these expressions at γ_{n−1}. The latter expressions only make sense if we interpret them as limiting values, where one has to assume that lim_{ξ→0} [ξ/ξ] = 1. We shall from now on occasionally use these expressions with this interpretation, but the expressions for λ^ν_n from Theorem 5.1 using the numerators are more direct since they immediately give the limiting values.
Note that λ^α_n is the value of a Blaschke product with all its zeros in D evaluated at α_{n−1} ∈ D, and therefore λ^α_n ∈ D. Similarly, λ^β_n is the value of a Blaschke product with all its zeros in E, evaluated at β_{n−1} ∈ E, so that λ^β_n ∈ D. Since the zeros of φ_n are the zeros of φ^α_n if n ∈ a_n and the zeros of φ^β_n if n ∈ b_n, it follows that if n and n − 1 are both in a_n or both in b_n, then λ_n ∈ D; but if n ∈ a_n and n − 1 ∈ b_n, or vice versa, then λ_n ∈ E. Therefore we can choose e_n as the positive square root of the resulting expression. That expression is derived in [9, Theorem 4.1.2] for the case ν = α by using the CD relations. By symmetry, it also holds for ν = β. For ν = γ, the same relation can be obtained by stripping the denominators, as we explained after the proof of the CD-relation in Section 4.
What goes wrong with the recurrence relation when φ_n is not regular? From the proof of Theorem 5.1 it follows that then d_n = 0. We still have the relation with a unimodular factor in T, and p*_n(γ_{n−1}) = 0. Thus, there is some positive constant e_n and some η_{n1} ∈ T such that the first term in the sum between square brackets vanishes. Applying Theorem 5.1 in this case would give λ_n = ∞, and the previous relations show that we only have to reinterpret the matrix in Theorem 5.1 accordingly. This is in line with how we have dealt with ∞ so far, where the rule of thumb was to replace the relevant factors by their coefficients of infinity. So let us therefore also use Theorem 5.1 with this interpretation when φ_n is not regular, and thus λ_n = ∞. With the expressions at the end of Section 4, it can also be shown that in this case the expression for e²_n corresponds to replacing 1 − |λ_n|² by −1 when λ_n = ∞. Since this non-regular situation can only occur when (1 − |γ_n|)(1 − |γ_{n−1}|) < 0, this expression for e²_n is indeed positive. A similar rule can be applied if γ_n or γ_{n−1} is infinite: just replace 1 − |∞|² by −1 in this or the previous expression. The positivity of the expressions for e²_n also follows from the following result.
Theorem 5.3. The Szegő parameters satisfy for all n ≥ 1:
If γ_n = α_n and γ_{n−1} = α_{n−1}, then λ_n = λ^α_n = λ̄^β_n ∈ D.
If γ_n = β_n and γ_{n−1} = β_{n−1}, then λ_n = λ^β_n = λ̄^α_n ∈ D.
If γ_n = α_n and γ_{n−1} = β_{n−1}, then λ_n = 1/λ^β_n = 1/λ̄^α_n ∈ E.
If γ_n = β_n and γ_{n−1} = α_{n−1}, then λ_n = 1/λ^α_n = 1/λ̄^β_n ∈ E.
Proof. Suppose γ_n = α_n and γ_{n−1} = α_{n−1}; then the claim follows by Theorems 5.1 and 3.9, or better still by Lemma 3.8. Using p^α_n(z) = ς_n p^β*_n(z) · ∏_{j=1}^n (−|α_j|) and α_j = 1/β̄_j, the previous relation becomes the stated one. The proof for γ_n = β_n and γ_{n−1} = β_{n−1}, n ≥ 1, is similar. Next consider γ_n = α_n and γ_{n−1} = β_{n−1}; then the third claim follows. The remaining case, γ_n = β_n and γ_{n−1} = α_{n−1}, is again similar.
Remark 5.4. It should also be clear that the expressions for the parameters λ^ν_n of Theorem 5.1 are for theoretical use only. They are expressed in terms of p^ν_n, the numerator of φ^ν_n, which is the object that should result from the computation and is hence unknown at the moment of computing λ^ν_n. For practical use, these parameters λ^ν_n should be obtained in a different way. In inverse eigenvalue problems the recursion is used backwards, i.e., for decreasing degrees of the ORFs, and then these expressions can of course be used.
Even if we know the λ^ν_n, then in Theorem 5.1 we still need the normalizing factor η^ν_{n1} ∈ T, which is characterized by the condition that "η^ν_{n1} is chosen such that the normalization condition of the ORFs is maintained". We could compute φ^ν_n with η^ν_{n1} = 1, check the value of φ^ν*_n(γ_{n−1}), and derive η^ν_{n1} from it. What this means is shown in the next lemma.
Lemma 5.5. For ν ∈ {α, β}, the phase θ^ν_n of the unitary factor η^ν_{n1} = e^{iθ^ν_n} is given by the following (equivalent) expressions.
Proof. Take the first relation of Theorem 5.1 and evaluate it at z = ν_{n−1}; the result then follows.
Note that this expression for η^ν_{n1} is well defined because φ^ν*_n(ν_{n−1}) ≠ 0 if ν ∈ {α, β}. For ν = γ, the expression is a bit more involved, but it can be obtained in a similar way from the normalization conditions given in Theorem 3.9. We skip the details.
Another solution of the recurrence relation is formed by the functions of the second kind. As in the classical case (i.e., for ν = α) we can introduce them for ν ∈ {α, β, γ} (see [9, p. 83]) via an integral formula involving D(t, z) = (t + z)/(t − z) and E(t, z) = D(t, z) + 1 = 2t/(t − z). This results in relations which may be generalized (see [9, Lemma 4…]) to expressions with f arbitrary in L^ν_{(n−1)*}. It also holds (see [9, Lemma 4.2.3]) that a similar relation is valid with g arbitrary in L^ν_{n*}(ν_{n*}). Recall that L^ν_{n*}(ν_{n*}) is the space of all functions in L^ν_{n*} that vanish at z = ν_{n*} = 1/ν̄_n. This space is spanned by {B^ν_k/B^ν_n : k = 0, ..., n − 1} if ν ∈ {α, β}. For ν = γ, the space can be characterized via Lemma 3.1.
Theorem 5.6. The following relations for the functions of the second kind hold for n ≥ 0. We assume the normalization of Theorem 3.9.
Proof. This is trivial for n = 0, hence suppose n ≥ 1 and γ_n = α_n. Moreover, using φ^α_{n*} = φ^β_n, and since Ḃ^α_n ∈ L^β_{n*}(β_{n*}), we also get the second part. It follows in a similar way that ψ*_n = ψ^β_n Ḃ^α_n. The case γ_n = β_n is proved similarly.
With these relations, it is not difficult to mimic the arguments of [9, Theorem 4.2.4] and obtain the following.

Para-orthogonal rational functions
Before we move on to quadrature formulas, we define the para-orthogonal rational functions (PORF) by Q^ν_n(z, τ^ν_n) = φ^ν_n(z) + τ^ν_n φ^ν*_n(z), with τ^ν_n ∈ T. The PORF Q^ν_n(z, τ^ν_n) is in L^ν_n; it is called para-orthogonal because it is not orthogonal to L^ν_{n−1}, but it is orthogonal to a particular subspace of L^ν_{n−1} of dimension n − 1.
One may also define associated functions of the second kind in a similar way. From Theorems 3.9 and 5.6 we can easily obtain the following corollary.
Corollary 6.2. With the notation Q_n and P_n for the PORF and the associated functions just introduced, we have for n ≥ 1 the relations below.
Proof. Assume that γ_n = α_n; then φ_n = φ^α_n Ḃ^β_n and φ*_n = φ^α*_n Ḃ^β_n. The first relation follows, and in a similar way one obtains the second. The proofs for P_n and for γ_n = β_n are similar.
We are now ready to state that the zeros of the para-orthogonal rational functions Q ν n (z, τ ν n ) are simple and lie on T, no matter whether ν = α, β or γ.

Corollary 6.3. Take n ≥ 1, τ n ∈ T and ν ∈ {α, β, γ}; then the zeros of Q ν n (z, τ n ) are all simple and on T.
Proof. The case ν = α was proved in [9, Theorem 5.2.1] and, by symmetry, this also holds for ν = β. For ν = γ, assume that γ n = α n ; then by the previous corollary Q n (z, τ n ) = B β n (z)Q α n (z, τ n ), so that the zeros of Q n (z, τ n ) are the zeros of Q α n (z, τ n ) (and hence will not cancel against any of the poles).
The proof for γ n = β n is similar.
These PORF properties are important because the zeros will deliver the nodes for the rational Szegő quadrature as we will explain in the next section.
For further reference we give the following property.
Proposition 6.4. For n ≥ 1 and ν ∈ {α, β, γ}, the PORF satisfy the following relation (using the notation introduced above).

Proof. Just take the recurrence relation of Theorem 5.1 and premultiply it with [1 τ n ]. After rearrangement of the terms, the expression above results.
The importance of this property is that, up to a constant nonzero factor c ν n , we can compute Q ν n (z, τ n ) by exactly the same recurrence that gives φ ν k in terms of φ ν k−1 and φ ν * k−1 for k = 1, 2, . . . , n, except that we have to replace the trailing parameter λ ν n by a unimodular constant τ ν n to get Q ν n (z, τ n ). When we are interested in the zeros of Q ν n (z, τ n ), then this normalizing factor c ν n does not matter since it is nonzero.
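Since the PORF is obtained by running the same recurrence with the trailing parameter replaced by a unimodular constant, the polynomial special case (all poles at infinity) can be sketched numerically. The following Python/NumPy sketch uses a sign convention for the Szegő parameters that is local to this sketch, and the values `delta` and `tau` are hypothetical, not taken from the paper:

```python
import numpy as np

def szego_phi(delta):
    """Szego-type recurrence phi_k = z*phi_{k-1} + delta_k*phistar_{k-1}.

    Coefficient lists are stored highest degree first (numpy convention).
    Returns (phi, phistar) after consuming all parameters in `delta`.
    """
    phi = np.array([1.0 + 0j])          # phi_0 = 1
    phistar = np.array([1.0 + 0j])      # phi_0* = 1
    for d in delta:
        zphi = np.append(phi, 0.0)               # z * phi_{k-1}
        ps = np.append([0.0], phistar)           # pad phistar to same degree
        phi = zphi + d * ps
        phistar = np.conj(phi[::-1])             # reversed, conjugated coefficients
    return phi, phistar

def porf(delta, tau):
    """Para-orthogonal polynomial: same recurrence, but the trailing
    Szego parameter is replaced by a unimodular constant tau."""
    phi, phistar = szego_phi(delta[:-1])
    return np.append(phi, 0.0) + tau * np.append([0.0], phistar)

delta = [0.3, -0.5, 0.2]        # hypothetical parameters, |delta_k| < 1
tau = np.exp(0.7j)              # |tau| = 1
Q = porf(delta, tau)
zeros = np.roots(Q)
# All zeros of the para-orthogonal polynomial lie on the unit circle,
# mirroring Corollary 6.3 in this special case.
```

The normalizing constant c ν n mentioned above indeed plays no role here: only the zeros of Q are computed.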

Quadrature
We start by proving that the subspace of rational functions in which the quadrature formulas will be exact only depends on the points {α k : k = 0, . . . , n − 1} no matter whether the points α k are introduced as a pole α k or α k * = 1/α k in the sequence γ n = (γ k ) n k=0 .
Proof. The space L n contains rational functions of degree at most n whose denominators have zeros in {β j : j ∈ a n } ∪ {α j : j ∈ b n }. The space L n * contains rational functions of degree at most n whose denominators have zeros in {β j : j ∈ b n } ∪ {α j : j ∈ a n }. Thus the space R n contains rational functions of degree at most 2n whose denominators have zeros in {β j : j = 1, . . . , n} ∪ {α j : j = 1, . . . , n}. Since the denominators of functions in L α n have zeros in {β j : j = 1, . . . , n} and those of functions in L α n * have zeros in {α j : j = 1, . . . , n}, the denominators of functions in R α n have zeros in {β j : j = 1, . . . , n} ∪ {α j : j = 1, . . . , n}, so that R n = R α n . Of course a similar argument shows that also R n = R β n .
As is well known from the classical case associated with the sequence α discussed for example in [9,Chapter 5], the rational Szegő quadrature formulas are of the form where ξ α nj = ξ α nj (τ α n ), j = 0, . . . , n − 1 are the zeros of the para-orthogonal rational function Q α n (z, τ α n ) with τ α n ∈ T and the weights are given by These formulas have maximal degree of exactness, meaning that and that no n-point interpolatory quadrature formula with nodes on T and positive weights can be exact in a larger (2n − 1)-dimensional space of the form L α n · L α (n−1) * or L α n * · L α n−1 . They are however exact in a subspace of dimension 2n − 1 of a different form as shown in [39] and [17] for Laurent polynomials and in [11] for the rational case.
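For the simplest special case, the Lebesgue measure on T with all poles at infinity, the n-point Szegő quadrature can be written down in closed form and its domain of exactness checked directly. A minimal Python/NumPy sketch (the value of τ is an arbitrary hypothetical choice):

```python
import numpy as np

# Szego quadrature specialized to the Lebesgue measure dtheta/(2*pi) on the
# unit circle (all Szego parameters zero): the n nodes are the n-th roots of
# a unimodular tau and all weights equal 1/n.
n = 5
tau = np.exp(1.3j)                       # free unimodular parameter
nodes = tau**(1 / n) * np.exp(2j * np.pi * np.arange(n) / n)
weights = np.full(n, 1 / n)

# Exactness on the Laurent monomials z^k for |k| <= n-1:
# the exact integral of z^k over the circle is 1 if k == 0, else 0.
for k in range(-(n - 1), n):
    approx = np.sum(weights * nodes**k)
    exact = 1.0 if k == 0 else 0.0
    assert abs(approx - exact) < 1e-12
```

This is exactness in the 2n − 1 dimensional span of {z^k : |k| ≤ n − 1}, the polynomial analogue of the space R n−1 discussed above.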
Our previous results show that taking an arbitrary order of selecting the γ does not add new quadrature formulas. Indeed, if γ n = α n then the zeros of Q n (z, τ n ), Q α n (z, τ n ) and Q β n (z, τ n ) are all the same, i.e., ξ nj (τ n ) = ξ α nj (τ n ) = ξ β nj (τ n ), j = 1, . . . , n. Dropping the dependency on τ from the notation, it is directly seen that and thus we also have w nj = w α nj and similarly w nj = w β nj . Therefore I n = I α n = I β n for appropriate choices of the defining parameters, i.e., τ α n = τ n and τ β n = τ n if γ n = α n , τ β n = τ n and τ α n = τ n if γ n = β n .
An alternative expression for the weights is also known ([9, Theorem 5.4.1]) and it will obviously also coincide for ν ∈ {α, β, γ}: where Q ν n is the PORF with zeros {ξ ν nj } n−1 j=0 and P ν n the associated functions of the second kind as in Corollary 6.2. We have dropped again the obvious dependence on the parameter τ ν n from the notation.
A conclusion to be drawn from this section is that whether we choose the sequence γ, α or β, the resulting quadrature formula is an n-point formula that is exact in the space R n−1 = L n−1 · L (n−1) * . Such a quadrature formula is called an n-point rational Szegő quadrature, and these are the only positive interpolatory quadrature formulas exact in this space. They are unique up to the choice of the parameter τ n ∈ T. Thus, to obtain the nodes and weights for a general sequence γ and a choice of τ n ∈ T, we may as well compute them for the sequence α and an appropriate τ α n ∈ T, or for the sequence that alternates between one α and one β. The resulting quadrature will be the same.
In practice nodes and weights of a quadrature formula are computed by solving an eigenvalue problem. It will turn out that also in these computations, nothing is gained from adopting a general order for the γ sequence but that a sequence alternating between α and β (the balanced situation) is the most economical one. We shall come back to this in Section 10 after we provide in the next sections the structure of the matrices whose eigenvalues will give the nodes of the quadrature formula.
Rational interpolatory and Szegő quadrature formulas with arbitrary order of the poles including error estimates and numerical experiments were considered in [19].

Block diagonal with Hessenberg blocks
The orthogonal polynomials are a special case of ORF obtained by choosing a sequence γ = α that contains only zeros. Another special case is given by the orthogonal Laurent polynomials (OLP) obtained by choosing an alternating sequence γ = (0, ∞, 0, ∞, 0, ∞, . . .). In [47], Velázquez described spectral methods for ORF on the unit circle that were based on these two special choices. The result was that the shift operator T µ : L 2 µ → L 2 µ : f (z) → zf (z) has a matrix representation that is a matrix Möbius transform of a structured matrix. The structure is a Hessenberg structure in the case of γ = α and it is a five-diagonal matrix (a so-called CMV matrix) for the sequence γ = (0, β 1 , α 2 , β 3 , α 4 , . . .). In the case of orthogonal (Laurent) polynomials the Möbius transform turns out to be just the identity and we get the plain Hessenberg matrix for the polynomials and the plain CMV matrix for the Laurent polynomials.
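The structural claim, Hessenberg versus five-diagonal depending only on the multiplication order of the same elementary factors, can be illustrated numerically. A Python/NumPy sketch with hypothetical parameters; the 2 × 2 blocks below follow a convention of our own choosing, not necessarily the paper's:

```python
import numpy as np

def elementary(m, k, d):
    """Unitary m x m matrix: identity except for a 2 x 2 block at
    rows/columns (k, k+1), parametrized by d with |d| < 1."""
    rho = np.sqrt(1 - abs(d)**2)
    G = np.eye(m, dtype=complex)
    G[k:k+2, k:k+2] = [[-d, rho], [rho, np.conj(d)]]
    return G

m = 8
ds = [0.3, -0.2 + 0.4j, 0.5j, -0.6, 0.1, 0.25, -0.35j]  # hypothetical parameters
Gs = [elementary(m, k, d) for k, d in enumerate(ds)]

# Hessenberg ordering: all factors in increasing index order.
H = np.linalg.multi_dot(Gs)
# CMV-like ordering: group the factors with even and odd Python index;
# within each group the factors act on disjoint rows and commute.
C = np.linalg.multi_dot([Gs[k] for k in range(0, len(Gs), 2)]) \
  @ np.linalg.multi_dot([Gs[k] for k in range(1, len(Gs), 2)])

# Both products are unitary; H is upper Hessenberg, C is five-diagonal.
assert np.allclose(H @ H.conj().T, np.eye(m))
assert np.allclose(C @ C.conj().T, np.eye(m))
for i in range(m):
    for j in range(m):
        if i > j + 1:
            assert abs(H[i, j]) < 1e-14          # Hessenberg structure
        if abs(i - j) > 2:
            assert abs(C[i, j]) < 1e-14          # five-diagonal structure
```

The grouped factors form two block-diagonal matrices whose blocks are offset by one row, which is exactly why their product has bandwidth two.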
In [16], the OLP case was discussed using an alternative linear algebra approach when the 0, ∞ choice did not alternate nicely, but the order in which they were added was arbitrary. Then the structured matrix generalized to a so-called snake-shape matrix referring to the zig-zag shape of the graphical representation that fixed the order in which the elementary factors of the matrix had to be multiplied. This structure is a generalization of both the Hessenberg and the CMV structures mentioned above. It is a block diagonal where the blocks alternate between upper and lower Hessenberg structure.
To illustrate this for our ORF, we start in this section by using the approach of Velázquez in [47], skipping all the details just to obtain the block structure of the matrix that will represent the shift operator. In the next sections we shall use the linear algebra approach of [16] to analyze the computational aspects of obtaining the quadrature formulas.
We start from the recurrence for the φ α k and transform it into a recurrence relation for the φ k , which will eventually result in a representation of the shift operator.
To avoid a tedious book-keeping of normalizing constants, we will just exploit the fact that there are some constants such that certain dependencies hold. For example the recurrence φ α n (z) = e n η n1 will be expressed as Similarly, by combining the recurrence for φ α n and φ α * n from Theorem 5.1 and eliminating φ α n−1 , we have α n φ α * n ∈ span α n φ α n , α n−1 φ α * n−1 .
We first use (8.6) for m = n to write α * n φ n = α * n Ḃ β n φ α * n = σ n α n Ḃ β n−1 φ α * n , which implies that we shall need expressions for α n φ α * n . We use (8.3) repeatedly in combination with the previous relation to get where in the last step we used γ l = α l and hence Ḃ β l−1 = Ḃ β l and Ḃ β l−1 φ α l = φ l . Thus if n > m, also γ n−1 = β n−1 , so that the first term is α n−1 φ n−1 as well, and we have found the desired relation. Now we are left with the remaining case n = m, i.e., γ n−1 = α n−1 while γ n = β n . This is the missing link that should connect a β block to the previous α block at the boundary . . . , α m−1 , β m , . . . .

Using (8.3) repeatedly we get
After multiplying with Ḃ β k = Ḃ β k+1 = · · · = Ḃ β n−1 we get We plug this into our previous expression for α * n φ n and we arrive at To summarize, (8.4) and (8.7) in the case γ n = α n and (8.8) and (8.9) in the case γ n = β n show short recurrences for the φ p that fully rely on the recursion for the α-related quantities φ α n and φ α * n . They use only factors α p and α * p , and in the recurrence relations only the rational Szegő parameters λ α p for the α sequence are used. The longest recursions are needed to switch from an α to a β block, which is for n ∈ {m − 1, m} as given in (8.7) and (8.9), since these need all elements from both blocks.
This analysis should illustrate that as long as we are in a succession of α's we are building up an upper Hessenberg block. At the instant the α sequence switches to a β sequence one starts building a lower Hessenberg block, which switches back to upper Hessenberg when α's appear again. If we alternate between one α and one β, we get the so-called CMV-type matrix, which is a five-diagonal matrix with a characteristic block structure as given in Fig. 2.
Suppose we start with a set of α's, and set and let us recycle our earlier notations n (z) = 1 − α n z and * n (z) = z − α n to now act on an arbitrary operator T by doing the transform for the whole sequence α simultaneously as follows: A (T ) = I − T A * and * A (T ) = T − A where I is the identity operator. Then (at least formally) we have with G having the structure shown in Fig. 1. In the special case that the α's and β's alternate as in we get the five-diagonal matrix as in [12] for polynomials and for ORF in [47]. Since for k ≥ 1 each α 2k−1 -column is the last in an α block and each β 2k -column is the first in a β block, we get a succession of 4 × 2 blocks that shift down by two rows as illustrated by Fig. 2.
The particular role of the last column in an α block and the first column in a β block is due to the fact that we have chosen to derive these relations for the φ k from the recurrence for the φ α k leading to factors A and * A and the use of parameters λ α k . A symmetric derivation could have been given by starting from the φ β k recursion, in which case the β blocks will correspond to upper Hessenberg blocks and the α blocks to lower Hessenberg blocks. Then the longer recurrence would occur in the last column of the β block and the first column of the α block. The shape of the matrix would be the transposed shape of the figures shown here. However while it is rather simple to deal with α j = 0 in the current derivation, it requires dealing with more technical issues to write down the formulas with the corresponding β j equal to ∞. Therefore we stick to the present form. We can define as in [47] an operator Möbius transform of an operator T by and its inverse Recall the meaning of the ( ) * notation for a (constant) matrix from Remark 3.5. Note that A is like A , except that the minus sign is changed into a plus sign and T A * is changed into A * T . It should also be clear that the alternating case (i.e., the case of the CMV representation) gives the smallest possible bandwidth, as was also reported in [12]. Now it is not difficult to see that (8.10) is equivalent to To see this, exactly the same argument as used in [47] to derive relation (19) of that paper can be applied here. Instead we give a direct derivation. The relation (8.10) can be written as where we used the fact that the matrix η A = η A * and its inverse commute with any diagonal matrix. The matrices G and G have the same shape, but G is rescaled to make it isometric. So we see that on L = span{φ 0 , φ 1 , . . .} the operator zI has a matrix representation ζ A (G) with respect to the φ-basis. We shall have a more careful discussion of this fact in the next section.

Factorization of a general CMV matrix
It is well known (see [6, 47] in the rational case and [16] for the polynomial case) that the Hessenberg matrix G = H (obtained when γ = α) can be written as an infinite product of elementary matrices, where I k−1 and I ∞ are the identity matrices of sizes (k − 1) and ∞ respectively, I 0 is the empty matrix, δ k = λ α k is the k-th Szegő parameter and η k := √(1 − |δ k |²). Also the CMV matrix C ε associated with the alternating sequence γ = ε, where γ 2k−1 = α 2k−1 and γ 2k = β 2k for all k ≥ 1, uses the same elementary transforms, but they are multiplied in a different order. To find explicit expressions for these elementary factors in our case, taking into account the proper normalization, requires a more detailed analysis. We start with the following. Eliminate the term α * n−1 φ α n−1 with the first line of the recurrence; after inserting the value of e α n from (5.1) and rearranging, we get an expression equivalent to formula (9.2). Derivation of the other form (9.3) is obvious.
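A small numerical sketch of this factorization in the polynomial setting (finite size, hypothetical parameters δ k, and a 2 × 2 block convention local to this sketch): multiplying the same elementary factors in increasing order gives an upper Hessenberg matrix whose subdiagonal carries the complementary parameters η k, while the reversed order gives a lower Hessenberg matrix.

```python
import numpy as np

def elementary(m, k, d):
    """Unitary factor: identity except a 2x2 block at rows/columns (k, k+1)."""
    rho = np.sqrt(1 - abs(d)**2)
    G = np.eye(m, dtype=complex)
    G[k:k+2, k:k+2] = [[-d, rho], [rho, np.conj(d)]]
    return G

m = 6
deltas = [0.4, -0.3 + 0.2j, 0.1j, 0.5, -0.25]   # hypothetical Szego parameters
Gs = [elementary(m, k, d) for k, d in enumerate(deltas)]

H_up = np.linalg.multi_dot(Gs)           # increasing order: upper Hessenberg
H_lo = np.linalg.multi_dot(Gs[::-1])     # decreasing order: lower Hessenberg

assert np.allclose(H_up @ H_up.conj().T, np.eye(m))   # both products are unitary
for i in range(m):
    for j in range(m):
        if i > j + 1:
            assert abs(H_up[i, j]) < 1e-14    # upper Hessenberg pattern
        if j > i + 1:
            assert abs(H_lo[i, j]) < 1e-14    # lower Hessenberg pattern
# The subdiagonal of the Hessenberg form carries eta_k = sqrt(1 - |delta_k|^2):
for k, d in enumerate(deltas):
    assert np.isclose(H_up[k + 1, k], np.sqrt(1 - abs(d)**2))
```

Since each 2 × 2 block here is symmetric, the decreasing-order product is the transpose of the increasing-order one, which matches the "transposed shape" remark made earlier for the β-based derivation.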
Note that G α n is unitary but G α n is not unless |λ α n | = 1 or α n = α n−1 .
We use vertical bars to separate the different blocks. These separations are of course clear from the notation α and β, but since that will not be so obvious when we denote the corresponding functions in the vectors below, we introduce them here and keep them at the corresponding positions in the vectors that we are about to introduce. That should improve the readability of the formulas.
We consider the vector Φ = [φ 0 , φ 1 , φ 2 , . . .]. Apply successively the G ν n to the right on the Φ vector. Keep an increasing order for the factors within an α block and a decreasing order for the factors covering a β block. Thus for the example given above the order will be γ = (α 0 , α 1 , . . . , α n−1 | β n , . . . , β k−1 | α k , α k+1 , . . . , α m−1 | β m , . . . , β ℓ−1 | α ℓ , α ℓ+1 , . . .), with G ν n and G ν n , ν ∈ {α, β} as in (9.8) and similarly G n ν = η −1 A G n ν η A , for n = 1, 2, . . . ; ν ∈ {α, β}. Note that we have aligned these formulas properly so that a shift in the blocks from the first to the second line in (9.9) is made clear. The rationale is that in a G n β group the indices are decreasing from left to right and in a G n α group the indices increase from left to right. This is what defines the shape of the result. We used the vertical bars to separate α and β blocks in the sequence γ as before. Now we use double vertical bars to indicate what corresponds to the G n α and G n β blocks on the third line. On the second line in (9.9) you may note that the first α in an α block of the γ sequence corresponds to a G α factor that migrates to the front of the product group of the previous β block. This is because we analysed the γ sequence using the parameters λ α k of the α recurrence (recall the definitions of the G α k and G β k from Theorem 9.1 and equation (9.6)).
To see the effect of applying all these elementary operations, we start with the first block G 1 α . Using (9.5) when multiplying Φ( While multiplying this result with G α k G β k−1 · · · G β n+1 , we make use of (9.5) and (9.7) to obtain α * 0 φ 0 , . . . , α * n−2 φ n−2 , α n−1 φ α * n−1Ḃ β n−1 α * n φ α nḂ n , α * n+1 φ n+1 , . . . , α * k−1 φ k−1 α k φ * k , α k+1 φ k+1 , . . . and the remaining multiplication G β n of G 1 β links the current β block to the previous α block, resulting in The next block is again an α block treated by the product G 2 α , and one may continue like this to finally get since η A is invertible. This G can be factored as the product of elementary matrices having only one non-trivial 2 × 2 diagonal block: we made G as well as these elementary factors G k unitary.
These elementary factors are grouped together in blocks as a consequence of the α and β blocks in the γ sequence. For example G 2 β = G α ℓ G β ℓ−1 · · · G β m corresponds to a β block. Because ℓ > m there must be at least two factors in such a β block: the first one is G α ℓ and the last one is a G β m . However for an α block we have the product G 2 α = G α k+1 · · · G α m−1 , thus if k = m − 1 then the initial index k + 1 is larger than the end index m − 1. Thus if an α block has only one element then there are no factors in this product, which means that this G 2 α is just the identity.
In the case of γ = α (no β's) there is of course only one infinite block, so n = ∞ and there is no k, m, ℓ, . . . , and G is the familiar upper Hessenberg matrix. In the case of an alternating α-β sequence α 1 , β 2 , α 3 , β 4 , . . . , we have
where we grouped the product of the G α k factors as C α o and of the G β k factors as C β e . This rearrangement of the factors is possible because the blocks commute when their subscripts differ by at least two. If we define S e = diag(1, S 2 , S 4 , S 6 , . . .), S 2k = diag(σ 2k , 1), then of course the analogous relation holds. In the case of an alternating sequence of the form β 1 , α 2 , β 3 , α 4 , . . . , as in the previous case, these are the classical CMV matrices as also given in [47], except for the S factors, which were not in [47]. That is because we have used a particular normalization for the φ n -basis. A slightly different normalization of the φ n -basis will remove the S factors. Indeed, replace all φ n by ϕ α n = ς β n φ n . To see this, note that the relation (9.6) between G β n and G α n can also be written as follows (multiply with σ n σ̄ n = 1). The first and the last of these factors can then be moved to the ϕ α n so that the β-relation (9.7) changes accordingly. Note that the multiplication on the right is now with G α n and not with G β n anymore. For the α-case, i.e., γ n = α n , nothing essentially changes since then ς β n = ς β n−1 . It should be clear that, following the same arguments used above, G α is exactly like G in (9.10) except that all G β j are replaced by a G α j , that is, replace all the G β k factors in G by the corresponding G α k by removing the S k factors in (9.6), and G becomes G α . From (9.11) we derive the matrix representation of the shift operator in L with respect to the basis (φ 0 , φ 1 , φ 2 , . . .). With respect to the basis Φ α , the expression is the same except that G should be replaced by G α . We summarize our previous results.
Theorem 9.2. In the general case of a sequence γ, multiplying the vector of basis functions Φ(z) = [φ 0 (z), φ 1 (z), φ 2 (z), . . .] by z corresponds to multiplying it from the right with the infinite matrix ζ A (G), where A = diag(α 0 , α 1 , . . .), η A = √(I − A * A), and G is a product of unitary factors G k defined in (9.8), where the order of multiplication is from left to right in a block of successive α-values and from right to left in a block of successive β-values, as explained in detail above.
For the sequence α we get the classical Hessenberg matrix. The classical CMV matrices are obtained for α 1 , β 2 , α 3 , β 4 , . . . and in the case β 1 , α 2 , β 3 , α 4 , . . . . If we use the slightly different orthonormalized basis (ϕ α n ) n∈N , then the previous relations still hold true, except that all G β j can be replaced by G α j .
Note that λ α n = 0 does not give any problem for these formulas of the G-factors. If γ = α then all φ α n are regular, but for a general sequence γ, it is possible that φ n is not regular. Recall that then λ n = ∞ but by Theorem 5.3 this means that λ α n = 0 and thus there is no problem in using the G-matrices introduced above, even if the ORF sequence is not regular.
Another thing to note here is that, as we remarked before, the factors G ν k are unitary, although the factors G ν k = η −1 A G ν k η A are in general not unitary. Moreover, note that η A is invertible but A is not (since for example α 0 = 0) and that A and η A commute. To see that the relation holds, we multiply out and bring everything to one side, which gives an expression that we have to prove is zero. Because η −2 A A = Aη −2 A and η −2 A A * = A * η −2 A , the first two terms vanish. The four remaining terms can be multiplied from the left and the right with η −1 A . Because η A A * = A * η A and η A A = Aη A , the last term equals A * A and combined with the second term −I it equals −η 2 A . Multiplying everything again with η −1 A from the left and from the right gives the result. The same holds for the infinite case, i.e., in the whole of L 2 µ if L is dense (see also [47]). When using the recursion for the φ n with respect to a general sequence γ, Theorem 5.1 completed by Lemma 5.5 says what the Szegő parameters λ n are. Note however that for the basis Φ corresponding to a general sequence γ, the matrix G had factors G α k and G β k that were all defined in terms of λ α k and not λ k (see Theorem 9.1 and equation (9.6)). Theorem 5.3 explains that, depending on γ n and γ n−1 , a λ n is either equal to λ α n or its conjugate, or to the inverses of these. Thus it should be no problem to obtain all the λ α n from the λ n or vice versa. To illustrate how to use the quantities for a general γ sequence in generating the G α n matrix we consider the following example. If n − 1 ∈ b n and n ∈ a n , then γ n = α n , γ n−1 = β n−1 , λ α n = 1/λ n , φ α n = φ n /Ḃ β n , and φ α * n = φ * n /Ḃ β n . This allows us to express η α n1 and all the other elements of G n in terms of the γ-related elements. If we assume that φ n is regular, then the matrix can be written in terms of u n = (λ n /|λ n |)η α n1 ∈ T. It still requires η α n1 in terms of γ-elements. This is a bit messy but still possible using φ α * n = φ * n /Ḃ β n and Lemma 5.5.
The factor 1/Ḃ β n has an essential role here because we have to evaluate φ α * n (z) for z = α n−1 , and if 1/α n−1 = β n−1 ∈ {γ k : k ∈ b n }, which means that α n−1 has been introduced as a pole in a previous step, then this pole in φ * n will cancel against the same pole in Ḃ β n . Thus we should use a limiting process, or expand φ α * n = φ * n /Ḃ β n , or make the cancellation explicit as we did for example in equation (3.1). We skip further details.
If n is not a regular index, then λ n = ∞ and in that case the matrix G α n has the form

Spectral analysis
The full spectral analysis of the matrices ζ A (G) and how this relates to the shift operator in L 2 µ is given in [47] for the case γ = α (then G = H is a Hessenberg matrix) and in the case of the alternating γ = ε where either ε 2k = α 2k and ε 2k+1 = β 2k+1 or ε 2k = β 2k and ε 2k+1 = α 2k+1 (then G = C is a CMV matrix). Velázquez shows for the cases ν ∈ {α, ε} that if T µ is the shift operator on L 2 µ (i.e., multiplication with z in L 2 µ ) and if L is the closure of L ∞ in L 2 µ , then the matrix representation of T µ restricted to L with respect to the basis Φ α = [ϕ α 0 , ϕ α 1 , ϕ α 2 , . . .] is given by ζ A (G α ) (see [47]). We mention here the basis (ϕ α k ) k∈N because in [47] only one type of G α k matrices is used. As we have explained, this is related to the precise normalization used for the basis functions, but as long as the basis is orthogonal and has norm 1, a further normalization with a unimodular constant will not influence the spectral properties of the matrix representations, so that similar results hold for the basis Φ and the matrix ζ A (G).
If moreover L is dense in L 2 µ and hence L = L 2 µ , then ζ A (G) is not only isometric but a unitary representation of the full shift operator. For α compactly included in D this happens for γ = α when log µ ∈ L 1 , and in the case γ = ε under a similar condition. In [47] it is also explained how the spectrum of ζ A (G) = η −1 A * A (G) A (G) −1 η A relates to the spectrum of the pencil ( * A (G), A (G)). In the case of an alternating sequence ε, the matrix G was a CMV matrix which we denoted as C. It was then possible to regroup the even and odd elementary factors so that C = C α o C β e or C = C α e C β o with C ν o = G ν 1 G ν 3 G ν 5 · · · and C ν e = G ν 2 G ν 4 G ν 6 · · · for ν ∈ {α, β}. As a consequence, because all factors are unitary, the previous result could be formulated in terms of the spectral properties of the pencils (AC β * e + C α o , C β * e + A * C α o ) or (AC β * o + C α e , C β * o + A * C α e ). In the general case, such a regrouping is possible but rather complicated, because it will depend on the sizes of the α and β blocks in the γ sequence, i.e., on the number of elementary G-factors that appear in the product in increasing or decreasing order (see (9.9)).
There are also spectral interpretations for the eigenvalues of the finitely truncated sections of the matrices. Again, a careful discussion is given in [47]. If for an operator T we denote its truncation to L n as T n = P n T |L n with P n the orthogonal projection operator on L n , then Velázquez shows that the zeros of φ n+1 are the eigenvalues of the finite matrices V n = ζ An (G n ) with G n the leading principal submatrix of size (n + 1) of the matrix G and A n = diag(α 0 , . . . , α n ). As illustrated in [47, equation (19)], G n is the product of unitary matrices G k of the form (9.1) for k = 1, . . . , n + 1, except that G n+1 is 'truncated' and is of the form diag(I n ⊕ −δ n+1 ) where δ n+1 (up to a unimodular constant) is λ n+1 ; thus this truncated factor G n+1 is not unitary because λ n+1 ∉ T in general. For γ = α this is proved in [47, Theorem 4.5] and for γ = ε in [47, Theorem 5.8].
The same arguments can be used for a general sequence γ. Indeed (9.5) and (9.7) show that in the truncated version we shall end up with χ n+1 of the form c n+1 φ α n+1 Ḃ β n+1 or c n+1 φ α * n+1 Ḃ β n+1 , depending on whether the truncation falls inside an α block or a β block. In both cases, this is a constant multiple of φ n+1 (see Theorem 3.9). As in [47] this can be transformed into a form which shows that a zero of φ n+1 corresponds to an eigenvalue of V n = ζ An (G n ). Thus the following theorem is still valid for a general γ.

Theorem 10.3. For the general sequence γ let G n be the truncation of size n + 1 of the matrix G, and define V n = ζ An (G n ); then the eigenvalues of V n are the zeros of φ n+1 . The (left) eigenvector corresponding to an eigenvalue ξ is given by Φ n (ξ) = [φ 0 (ξ), . . . , φ n (ξ)].
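In the polynomial special case (A = 0, so the Möbius transform ζ An is the identity) the content of Theorem 10.3 can be checked numerically: the eigenvalues of the truncated product, with the last factor replaced by a scalar, coincide with the zeros of φ n+1 generated by the recurrence. The parameter values and sign conventions below are local to this Python/NumPy sketch:

```python
import numpy as np

def szego_phi(delta):
    # phi_k = z*phi_{k-1} + delta_k*phistar_{k-1}; coefficients highest-first.
    phi = np.array([1.0 + 0j])
    phistar = np.array([1.0 + 0j])
    for d in delta:
        phi = np.append(phi, 0.0) + d * np.append([0.0], phistar)
        phistar = np.conj(phi[::-1])
    return phi

def elementary(m, k, d):
    rho = np.sqrt(1 - abs(d)**2)
    G = np.eye(m, dtype=complex)
    G[k:k+2, k:k+2] = [[-d, rho], [rho, np.conj(d)]]
    return G

delta = [0.3 + 0.1j, -0.4j, 0.2 - 0.3j]   # hypothetical parameters, |delta_k| < 1
n = len(delta)

# Truncated factorization: full elementary factors for the first n-1
# parameters, then a diagonal factor carrying the scalar -delta_n.
T = np.eye(n, dtype=complex)
for k in range(n - 1):
    T = T @ elementary(n, k, delta[k])
T = T @ np.diag(np.append(np.ones(n - 1), -delta[-1]))

eigs = np.sort_complex(np.linalg.eigvals(T))
zeros = np.sort_complex(np.roots(szego_phi(delta)))
assert np.allclose(eigs, zeros)
```

The truncated factor is not unitary (|δ n| < 1), so the eigenvalues lie strictly inside the unit disk, matching the location of the zeros of an orthogonal polynomial on the unit circle.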
For the unitary truncation, we first form a unitary truncation G u n of G n by keeping all G k with k < n + 1 and replacing G n+1 by G u n+1 = diag(1, . . . , 1, τ ) for some τ ∈ T. We shall then get one of two expressions (recall the definition of the PORF Q n+1 (z, τ ) in (6.1)), depending on whether the truncation falls inside an α block or a β block. By Corollary 6.2 this is proportional to Q n+1 (z, τ ) in both cases for some appropriate choice of τ . Therefore the eigenvalues of U n = ζ An (G u n ) will give the nodes of the (n + 1)-point rational Szegő quadrature formula for some particular choice of the τ -parameter.
Again this is worked out in detail in [47] for the Hessenberg and CMV matrices. As in the previous theorem, also for the unitary truncations we can use similar arguments for a general sequence and we therefore have also Theorem 10.4. For the general sequence γ let G u n be the unitary truncation of size n + 1 of the matrix G, and define U n = ζ An (G u n ), then the eigenvalues of U n will be the zeros of the PORF Q n+1 for some value of the τ -parameter. The (left) eigenvector corresponding to an eigenvalue ξ is given by Φ n (ξ) = [φ 0 (ξ), . . . , φ n (ξ)].
Remark 10.5. Since U n is unitary, its right eigenvectors are the conjugate transposes of the left eigenvectors. Thus if F is the matrix whose entry at position (i, j) is φ j (ξ i ), and Λ = diag(ξ 0 , . . . , ξ n ) contains the eigenvalues of U n , then F U n = ΛF . A row of F is [φ 0 (ξ i ), . . . , φ n (ξ i )], which is not normalized in the 2-norm sense, but we know that φ 0 = 1. In linear algebra it is more usual to write the eigenvalue decomposition of a unitary (or normal) matrix U n as U * U n = ΛU * with U unitary, thus with orthonormal rows and columns. Thus if D is the diagonal matrix whose diagonal entries are the elements on the first row of U (which in our case are all nonzero), then F and U are related by a diagonal scaling. In other words, the weights w i of the quadrature formula are given by the modulus squared of the elements in the first row of U .
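A minimal numerical check of this remark in the simplest case (Lebesgue measure, all Szegő parameters zero, size 3): the unitarily truncated matrix reduces to a companion-like matrix, the nodes are the cube roots of τ, and the squared moduli of the first row of the eigenvector matrix reproduce the equal weights 1/n. The value of τ is a hypothetical choice:

```python
import numpy as np

# Unitary truncation in the Lebesgue case: with all Szego parameters zero
# the elementary factors are plain swaps, and the truncated product becomes
# a companion-like matrix whose eigenvalues are the cube roots of tau.
tau = np.exp(0.4j)
U_n = np.array([[0, 0, tau],
                [1, 0, 0],
                [0, 1, 0]], dtype=complex)

lam, U = np.linalg.eig(U_n)            # U_n is normal with distinct eigenvalues,
weights = np.abs(U[0, :])**2           # so the columns of U are orthonormal

assert np.allclose(np.abs(lam), 1.0)   # nodes on the unit circle
assert np.allclose(lam**3, tau)        # they are cube roots of tau
assert np.allclose(weights, 1 / 3)     # equal weights for the Lebesgue measure
assert np.isclose(weights.sum(), 1.0)
```

The eigenvectors here are proportional to (ξ², ξ, 1), so each first component has squared modulus 1/3 after normalization, which is exactly the statement of the remark in this special case.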
Note that the full matrices never need to be constructed explicitly. There exist efficient algorithms to perform all the required operations when the matrix is stored as a product of elementary 2 × 2 factors (see for example [3] or [16]). In fact these techniques can handle an arbitrary order in the product with the same complexity (in terms of floating point operations) as for example the Hessenberg or CMV ordering. It will however reduce the programming complexity slightly, requiring less bookkeeping, if we use for example a CMV structure. In the case of the CMV matrix, we do not even need a product at all to explicitly write down the matrices involved. Indeed, the odd and even factors C u on and C u en of the unitarily truncated CMV matrix C u n are just block diagonals with all blocks of size at most 2 × 2, with the exception of G n+1 which is diagonal. Hence a pencil like (A n C u * en + C u on , C u * en + A * n C u on ) that allows one to compute the quadrature is extremely simple for numerical computations [7, Theorem 7.3].
We mentioned already in Section 7 that an arbitrary sequence γ does not give rise to quadrature formulas other than those already given by the sequence α or the alternating sequence ε. We now find that there is no computational advantage either: it is not very rewarding in a practical computation of rational Szegő quadrature to work out all the details in the general case. The CMV matrix obviously has the smallest possible bandwidth; it is conceptually simpler and somewhat easier to implement.
There is however a remarkable linear algebra side effect that became clear from the previous analysis. As far as we know, this was previously unknown in the linear algebra community, and we will discuss some preliminary results in the next section.

AMPD matrices
In the previous section, it was recalled from [47] that under the appropriate conditions, the spectrum of the unitary shift T µ can be recovered from the matrix representation ζ A (G α ) with respect to the basis [ϕ α 0 , ϕ α 1 , . . .], whatever the sequence γ is. Now the matrix G α is a product of elementary matrices that differ from the identity only by one 2 × 2 block of the general form (9.1). The only difference between the Hessenberg and the CMV form is the order in which these factors are multiplied.
Something of that form was known for a finite unitary Hessenberg matrix. It can be decomposed as a product of unitary matrices of the type (9.1) and multiplying these factors in any order will not change the spectrum of the matrix (see, e.g., [3]).
Since the zeros of the ORFs are given as the eigenvalues of the matrix ζ An (G n ), where G n is either a finite Hessenberg or a CMV matrix, and hence these differ only by the order of the elementary factors, we might expect that if G n is any matrix written as the product of n elementary factors of the form (9.1), then the spectrum of ζ An (G n ) will not change if we change the order of the elementary factors in the product. This is a consequence of the fact that the spectrum of G n does not depend on the order of its elementary factors.
Since this section is partly detached from the ORF theory, the notation used is also local and should not be confused with the same notation used in other sections, unless stated explicitly.
We define an AMPD matrix as a matrix of the form AM + D where A and D are diagonal matrices and M is a product of G-matrices: matrices that differ from the identity only in one 2 × 2 diagonal block as in (11.1) (where I_0 denotes the empty matrix), which is more general than the matrices defined in (9.1).
The purpose is first to show that the spectrum of an AMPD matrix does not depend on the order in which the G-matrices are multiplied. The proof of the next lemma was provided by Marc Van Barel in a private discussion with the first author. Note that D − λI is still diagonal if D is, so that if we replace D by D − λI in the determinants, we obtain the characteristic polynomials of the matrices AGM + D and AMG + D respectively. Equality of these polynomials implies the equality of their roots, i.e., of the eigenvalues.
In fact the result is more general since we could replace D as well as A by any diagonal matrices that are functions of λ and the determinants would still be the same.
It is interesting to note that Â M̂ + D̂, where the hat denotes deletion of the last row and column, is the same as the matrix AM + D in which the last row and the last column are deleted. This will be used in Theorem 11.3 below.
Example 11.2. If in the previous lemma we set

G_1 = [[α_1, β_1, 0], [γ_1, δ_1, 0], [0, 0, 1]],   G_2 = [[1, 0, 0], [0, α_2, β_2], [0, γ_2, δ_2]],

then for arbitrary diagonal matrices A = diag(a_1, a_2, a_3) and D = diag(d_1, d_2, d_3) a direct evaluation of the matrices AG_1G_2 and AG_2G_1 gives

AG_1G_2 = [[a_1α_1, a_1α_2β_1, a_1β_1β_2], [a_2γ_1, a_2α_2δ_1, a_2β_2δ_1], [0, a_3γ_2, a_3δ_2]],
AG_2G_1 = [[a_1α_1, a_1β_1, 0], [a_2α_2γ_1, a_2α_2δ_1, a_2β_2], [a_3γ_1γ_2, a_3γ_2δ_1, a_3δ_2]].

Note that the diagonal elements are the same. If we add D to this diagonal, then with the Laplace rule it is immediately seen that det(AG_1G_2 + D) = det(AG_2G_1 + D).
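The determinant identity of Example 11.2 is easy to verify numerically. The following sketch (random entries and sizes are our own illustrative choices, not part of the paper's development) checks it for generic 2 × 2 blocks.

```python
import numpy as np

def g_matrix(n, i, block):
    """Embed a 2x2 block into the identity of size n at rows/columns (i, i+1)."""
    G = np.eye(n, dtype=complex)
    G[i:i + 2, i:i + 2] = block
    return G

rng = np.random.default_rng(0)
n = 3
B1 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B2 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
G1, G2 = g_matrix(n, 0, B1), g_matrix(n, 1, B2)
A = np.diag(rng.standard_normal(n))
D = np.diag(rng.standard_normal(n))

# the diagonals of AG1G2 and AG2G1 agree, and so do the determinants after adding D
d12 = np.linalg.det(A @ G1 @ G2 + D)
d21 = np.linalg.det(A @ G2 @ G1 + D)
print(abs(d12 - d21))  # ~0
```

The same check works for any diagonal A and D, in line with the remark that A and D may even depend on λ.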
If π = (π_1, π_2, . . . , π_k) is a permutation of the numbers (1, 2, . . . , k), then with a notation like ∏_{i∈π} G_i, where G_i ∈ C^{(n+1)×(n+1)} are square matrices, we mean the product from left to right in the order given by π. Thus

∏_{i∈π} G_i = G_{π_1} G_{π_2} · · · G_{π_k}. (11.2)

Recall that G_i and G_j of the form (11.1) commute whenever |i − j| > 1. This means that of the k! possible permutations of the numbers (1, 2, . . . , k), only 2^{k−1} different matrices will result when we consider all possible permutations in the product ∏_{i∈π} G_i. Indeed, once the order is fixed in which the first k − 1 matrices {G_i : i = 1, 2, . . . , k − 1} are multiplied, G_k will have to multiply this product either from the right or from the left, depending on whether it is to the right or to the left of G_{k−1} in the product (11.2). This is because G_k commutes with all previous G_i with i < k − 1.
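The count of 2^{k−1} distinct products can be illustrated numerically. The sketch below (random real blocks, our own illustrative choice) enumerates all k! orders for k = 4 factors and counts the distinct matrices that result.

```python
import numpy as np
from itertools import permutations

def g_matrix(n, i, block):
    """Embed a 2x2 block into the identity of size n at rows/columns (i, i+1)."""
    G = np.eye(n, dtype=complex)
    G[i:i + 2, i:i + 2] = block
    return G

rng = np.random.default_rng(1)
k = 4                 # k elementary factors G_1, ..., G_k, matrices of size k + 1
n = k + 1
Gs = [g_matrix(n, i, rng.standard_normal((2, 2))) for i in range(k)]

products = set()
for perm in permutations(range(k)):
    M = np.eye(n, dtype=complex)
    for i in perm:
        M = M @ Gs[i]
    products.add(tuple(np.round(M, 10).ravel()))

# G_i and G_j commute for |i - j| > 1, so only 2**(k-1) of the k! = 24
# orderings give genuinely different matrices
print(len(products))  # 8
```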
Let P^N_n = [I_n 0] ∈ C^{n×N} be the matrix projecting onto C^n. We define for any matrix M ∈ C^{N×N} its truncation to C^{n×n} as P^N_n M (P^N_n)*. The following main theorem states that the determinant, and hence the eigenvalues, of the truncation of an AMPD matrix do not depend on the order in which its G_i factors are multiplied.

Theorem 11.3. Let π = (π_1, π_2, . . . , π_n) be an arbitrary permutation of (1, 2, . . . , n) and define M_π = ∏_{i∈π} G_i with G_i of the form (11.1), let A and D both be arbitrary diagonal matrices of size n + 1, and finally set P = P^{n+1}_n. Then det(P(AM_π + D)P*) will not depend on π.
Proof. The proof is by induction. For n = 2 we need to consider P(AG_1G_2 + D)P* and P(AG_2G_1 + D)P*. It can be checked with the expressions of AG_1G_2 and AG_2G_1 from Example 11.2 that

det(P(AG_1G_2 + D)P*) = det [[a_1α_1 + d_1, a_1α_2β_1], [a_2γ_1, a_2α_2δ_1 + d_2]]

and

det(P(AG_2G_1 + D)P*) = det [[a_1α_1 + d_1, a_1β_1], [a_2α_2γ_1, a_2α_2δ_1 + d_2]].

Both expressions give the same determinant.
For the induction step we recall that G_n commutes with all G_k with k < n − 1, thus ∏_{i∈π} G_i equals either G_n ∏_{i∈π′} G_i or ∏_{i∈π′} G_i G_n, where π′ is the same as π after removing the value n.
In the first case, with G_n at the beginning, we obtain as in the proof of Lemma 11.1 that det[P(AM_π + D)P*] = det(Â α_n M̂_{π′} + D̂), where we introduced for any matrix F the hat-notation to mean F̂ = PFP*. Note that Â α_n M̂_{π′} + D̂ is an AMPD matrix of size n which has only n − 1 factors (α_n can be assimilated in Â). By induction it is independent of the permutation π′. The second case, when G_n is at the end of the product, is completely similar. The α_n in front of r_{n−1} will move to c_{n−1} and we find the same expression. This implies that the determinant is not only independent of π′ but that we can also insert n anywhere in π′ and still get the same result, which proves the induction step.
We have as an immediate consequence that the truncation is not essential.
Corollary 11.4. The determinant, and hence also the eigenvalues, of an AMPD matrix AM_π + D do not depend on the permutation π.
Proof. To prove this for an AMPD matrix of size n with n − 1 factors, consider the AMPD matrix of the previous theorem and assume that G_n is just the identity matrix. It is then clear that the truncated AMPD matrix Â M̂_{π′} + D̂ is equal to an AMPD matrix of size n with M̂_{π′} = P M_π P*, with π′ equal to π in which n is removed. By the previous theorem, its determinant is independent of the permutation π′.
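Corollary 11.4 can be checked numerically as follows; the G-blocks are random (and need not be unitary), an illustrative choice of ours.

```python
import numpy as np
from itertools import permutations

def g_matrix(n, i, block):
    """Embed a 2x2 block into the identity of size n at rows/columns (i, i+1)."""
    G = np.eye(n, dtype=complex)
    G[i:i + 2, i:i + 2] = block
    return G

rng = np.random.default_rng(2)
k = 3                 # three G-factors, so AMPD matrices of size k + 1 = 4
n = k + 1
Gs = [g_matrix(n, i, rng.standard_normal((2, 2))) for i in range(k)]
A = np.diag(rng.standard_normal(n))
D = np.diag(rng.standard_normal(n))

specs = []
for perm in permutations(range(k)):
    M = np.eye(n, dtype=complex)
    for i in perm:
        M = M @ Gs[i]
    specs.append(np.sort_complex(np.linalg.eigvals(A @ M + D)))

# all k! orderings of the factors give the same spectrum
spread = max(np.abs(s - specs[0]).max() for s in specs)
print(spread)  # tiny, at the level of rounding errors
```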
As we mentioned earlier, we could have replaced A and D by any two diagonal matrices that are functions of λ and the determinants would still be independent of the permutation π.
RAMPD matrices are rational AMPD matrices. They are the product of an AMPD matrix and the inverse of an AMPD matrix, for example

R = (AM + C)(BM + D)^{−1} (11.3)

with A, B, C, D all diagonal and M a product of G-factors. Again it can be shown that the spectrum of R is independent of the product order in the matrix M.
Theorem 11.5. Let R = (AM_π + C)(BM_π + D)^{−1} be a RAMPD matrix as defined above, with M_π = ∏_{k∈π} G_k; then the eigenvalues of R do not depend on the permutation π. If BM_π + D is not invertible, then a slightly more general formulation is that the pencil (AM_π + C, BM_π + D) has eigenvalues that do not depend on π.
Proof. This problem can be conveniently solved by reducing it to the AMPD case. Indeed, the eigenvalues of the pencil are the solutions of the characteristic equation det((AM_π + C) − λ(BM_π + D)) = 0. It is obvious that this determinant can be rewritten as det(Ã M_π + D̃) in which Ã = A − λB and D̃ = C − λD are diagonal matrices. By what has been proved above, the determinant of the AMPD matrix Ã M_π + D̃ does not depend on the permutation π.
In the context of ORF, the G_k matrices were unitary, thus of the general form (9.1). Note that then M_π = ∏_{k∈π} G_k is unitary, and so is its unitary truncation. Conversely, any unitary matrix can be written as a unitary truncation of such a product, because by a sequence of Givens transformations it can be brought into unitary Hessenberg form, which can be written as a product of such factors. All these elementary transforms can be merged into a unitary truncation of a product of n + 1 factors G_k if the unitary matrix has size n + 1.
The results about the zeros of the para-orthogonal functions suggested that the eigenvalues of the RAMPD matrix are independent of the permutation π, and moreover, if U is the unitary matrix of eigenvectors, then the squared absolute values of the entries on the first row of U are also independent of π. This is because these values give the weights of the quadrature whose nodes are the corresponding eigenvalues and since these quadratures are uniquely defined, the weights computed must be the same.
With the ORF interpretation of the previous section we can be more precise and prove that the following theorem holds.
Theorem 11.6. Let π be a permutation of (1, . . . , n) and M_π = ∏_{k∈π} G_k be the product of unitary factors G_k of the form (9.1) with all δ_k ∈ D. Let A be a diagonal matrix of size n + 1 with all entries in D. Let R_π be the unitary matrix defined by (11.3) with eigenvalue decomposition R_π = V_π Λ_π V_π*, with V_π the unitary matrix whose columns are the eigenvectors. Then Λ_π and the absolute values of all the entries in V_π are independent of π.
Proof. First note that it is sufficient to prove this when M_π is a unitary truncation of a product of n + 1 unitary factors. As in Corollary 11.4 it will then also hold for an untruncated product of n unitary factors by choosing the factor G_{n+1} to be the identity.
Remark 11.7. The condition |δ_k| < 1 in the previous theorem is important because δ_k ∈ T can result in a matrix M_π that is the direct sum of diagonal blocks. Suppose as an example π = (1, 2, . . . , n), so that M_π is Hessenberg; if δ_n ∈ T, then the last column of M_π will be [0, 0, . . . , 0, δ_n]*, and this will give an eigenvector [0, . . . , 0, 1]* whose first entry is obviously zero, so that the link with ϕ_0^α = 1 cannot be made. In general, diagonal G_k factors result in a reducible M_π, which means that it is a direct sum of submatrices. This corresponds to a breakdown of the recurrence relation of the ORFs.
Even in the case of orthogonal Laurent polynomials, that is, when the diagonal matrix A = 0, the refinement of Theorem 11.6 was not observed in [16]. Without our ORF interpretation, a proof based purely on linear algebra arguments seems to be difficult.
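The A = 0 case of Theorem 11.6 can be illustrated numerically. Since the exact form (9.1) is not reproduced here, we use a standard Szegő-type 2 × 2 unitary block as a stand-in (an assumption of ours); possible extra unimodular factors in (9.1) would not affect the moduli being compared.

```python
import numpy as np
from itertools import permutations

def szego_block(delta):
    """A 2x2 unitary block built from a parameter delta in the open unit disk."""
    eta = np.sqrt(1 - abs(delta) ** 2)
    return np.array([[-np.conj(delta), eta], [eta, delta]])

def g_matrix(n, i, block):
    """Embed a 2x2 block into the identity of size n at rows/columns (i, i+1)."""
    G = np.eye(n, dtype=complex)
    G[i:i + 2, i:i + 2] = block
    return G

deltas = [0.3 + 0.2j, -0.5 + 0.1j, 0.2 - 0.4j]
k, n = len(deltas), len(deltas) + 1
Gs = [g_matrix(n, i, szego_block(d)) for i, d in enumerate(deltas)]

results = []
for perm in permutations(range(k)):
    M = np.eye(n, dtype=complex)
    for i in perm:
        M = M @ Gs[i]
    w, V = np.linalg.eig(M)           # M is unitary, so |w_j| = 1
    order = np.argsort(np.angle(w))   # pair the eigen-data across factor orders
    results.append((w[order], np.abs(V[0, order])))

# both the eigenvalues and the first-row moduli (the quadrature weights)
# are the same for every ordering of the factors
eig_spread = max(np.abs(w - results[0][0]).max() for w, _ in results)
row_spread = max(np.abs(r - results[0][1]).max() for _, r in results)
print(eig_spread, row_spread)
```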

Relation with direct and inverse eigenvalue problems
We are convinced that the generalization of the ORFs that was developed in the previous sections can be efficiently used to analyse algorithms that are developed and used to solve direct and inverse eigenvalue problems. A direct problem is: given a matrix, find its eigenvalues and eigenvectors, which allows, for example, as we have explained, to compute the quadrature formula; in an inverse problem, we are given the eigenvalues and the weights of the quadrature and we have to find the matrix. Of course the latter only makes sense if some structure is imposed on the matrix. The problem related to what has been discussed in this paper is the inverse problem for unitary Hessenberg matrices (see [14, Section 8] or much earlier papers like [1,2,28]). The methods described there are restricted to orthogonal polynomials. The more general recursion for orthogonal Laurent polynomials of [16] can be used to construct more general snake-shaped matrices, including Hessenberg and CMV matrices. The algorithms to solve these problems apply the recurrence relation for the orthogonal (Laurent) polynomials in reverse order, descending from degree n to degree 0. These inverse problems relate to discrete least squares approximation. This is described in a linearized form in [41], but proper rational approximation with preassigned poles could be applied when working not with vector polynomials but with ORFs on the unit circle. A proposal is made in [42,43], but the ORFs there are different from ours. We explain the difference in Section 12.1 below.
The direct methods apply the recursion in the forward direction. The idea is simple: the power method for a matrix A computes vectors v_k = A^k v for some vector v = v_0 and k = 0, 1, 2, . . . , q. With proper normalization this converges to an eigenvector corresponding to the dominant eigenvalue, while the inverse power method computes v_{−k} = A^{−k} v for k = 0, 1, 2, . . . , p, which approximates an eigenvector for the dominant eigenvalue of A^{−1}, i.e., the smallest eigenvalue of A. For the spaces spanned by {v_k}_{k=−p}^{q} one wants to find an orthogonal basis to project the problem onto a space that is usually much smaller than the size of the matrix A. In polynomial terms, this means finding a set of orthogonal (Laurent) polynomials spanning the space generated by {z^k}_{k=−p}^{q}. In each step of the algorithm either p or q is increased by 1. We are thus in the case considered in [16], i.e., the sequence γ has only entries 0 and ∞. For a unitary matrix, the eigenvalues are all on the unit circle, and the poles 0 and ∞ of the Laurent polynomials are the points inside and outside the circle that are as far away from the boundary as possible. So one may not expect the best possible approximation of the spectrum when considering a unitary truncation of the Hessenberg or general snake-shaped matrix. Therefore it is essential to select poles at strategic places much closer to the circle to put more emphasis on regions where the eigenvalues are more important. The effect of the location of the poles can be observed for example in the numerical experiments reported in [6,7,10].
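The construction sketched above (orthonormalizing the vectors v_{−p}, . . . , v_q and projecting) can be illustrated as follows; the matrix, the vector and the sizes are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, q = 50, 3, 3

# a random unitary matrix (eigenvalues on the unit circle) via a QR step
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
A = Q
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# the "balanced" Krylov basis A^{-p} v, ..., v, ..., A^q v, orthonormalized by QR
powers = [np.linalg.matrix_power(A, j) @ v for j in range(-p, q + 1)]
V, _ = np.linalg.qr(np.array(powers).T)

# project A onto this (p + q + 1)-dimensional space; the Ritz values lie in the
# closed unit disk and approximate part of the spectrum of the unitary matrix
Ap = V.conj().T @ A @ V
ritz = np.linalg.eigvals(Ap)
print(ritz.shape)  # (7,)
```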
When the matrix is not unitary, then the underlying theory is for polynomials and ORFs that are orthogonal with respect to a measure supported on the real line or a real interval, and convergence has been studied based on the asymptotic behaviour of the ORFs and the approximation error (see for example [5,13]). For the unitary case such a theory is much less developed. One of the reasons may be that the link with the ORFs as explained in this paper is not clear. Part of the reason is that the setup in the literature is slightly different, since the Hessenberg matrix is replaced by its inverse and an extra diagonal is introduced. Since a unitary Hessenberg matrix can be written as the product of the elementary G_k factors, its inverse is also the product of (inverse) G_k factors and is therefore computationally as simple as a Hessenberg matrix. The study of the inverse of a Hessenberg matrix has a long history; see the many references in [45,46]. Such an inverse is known to be related to semi-separable matrices. These are matrices in which every rectangular block that can be taken from the lower (or upper) triangular part has a low rank. Several more detailed variants of the definition exist in the literature, but a simple variant goes as follows. A matrix is lower (upper) semi-separable of semi-separability rank r if all submatrices which can be chosen out of the lower (upper) triangular part of the matrix have rank at most r, and rank r is reached for at least one of them. They can be characterized by few (say O(n)) parameters. For a simple example, take two vectors v, w ∈ C^n and form a matrix A whose upper triangular part (including the diagonal) is equal to the upper triangular part of vw* and whose strictly lower triangular part is equal to the strictly lower triangular part of wv*; then A is a semi-separable matrix of rank 1 (Hermitian, except for the diagonal which may not be real). The vectors v and w are called its generators.
The generators and the ranks for the upper and lower triangular parts may differ in general, and several other more general definitions exist. The inverse of an invertible lower semi-separable matrix of rank 1 is an upper Hessenberg matrix. Any Hermitian matrix is unitarily similar to a tridiagonal matrix, and the inverse of a tridiagonal matrix is again a semi-separable matrix. That explains why the literature on unitary and Hermitian eigenvalue problems and inverse eigenvalue problems involves semi-separable (plus diagonal) matrices. The extra diagonal appears because these papers do not rely on the Möbius transform ζ(G) that we used above but on a less symmetric framework. On the other hand, when the symmetry with respect to the unit circle is lost, the poles can be anywhere in the complex plane except at infinity, which means that the important case of (Laurent) polynomials is excluded.
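The rank-1 example with generators v and w can be written out directly; the following sketch builds such a matrix and verifies its semi-separability rank.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# upper triangular part (diagonal included) taken from v w^*,
# strictly lower triangular part taken from w v^*
A = np.triu(np.outer(v, w.conj())) + np.tril(np.outer(w, v.conj()), k=-1)

# every block chosen out of the upper triangular part has rank (at most) 1
ranks = [np.linalg.matrix_rank(A[:i + 1, i:]) for i in range(n)]
print(ranks)  # all ones: semi-separability rank 1
```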
What we have developed in this paper is however flexible enough to give up the symmetry with respect to the unit circle too, but without excluding the point at infinity. This is illustrated in the next section. In the subsequent one we shall explain why the semi-separable-plus-diagonal algorithms are used in the literature.

Using the general recursion while dealing with infinity
Suppose γ = α; then our derivation given in Sections 8 and 9 shows that

Φ(I − A*z)H_α = Φ(zI − A)

with H_α an upper Hessenberg matrix. It is irreducible because the subdiagonal elements are proportional to 1 − |λ_k|². If we go through the derivation, then the same arguments used for the sequence α also apply when we do exactly the same steps using the recursion of Theorem 5.1 for the general sequence γ, where we assume for the moment that γ_i = ∞ does not appear. We then arrive again at the above relation, except that all α's are replaced by γ's. Thus (assume for simplicity that the whole sequence is regular, and recall that all |γ_k| ≠ 1)

Φ(I − Γ*z)H_γ = Φ(zI − Γ) and zΦ = Φ(H_γ + Γ)(I + Γ*H_γ)^{−1}, (12.1)

where Γ = diag(γ_0, γ_1, . . .) and η_γ = (I − Γ*Γ)^{1/2} in H̃_γ = η_γ^{−1} H_γ η_γ, which is again upper Hessenberg. An important note is in order here. The previous notation is purely formal. The expressions will involve quantities 1 − |λ_k|² and 1 − |γ_k|² that can be negative, so that their square roots, for example in the definition of η_γ, are problematic. However, remember the relation (5.1), which was the consequence of the fact that 1 − |λ_n|² can only be negative if (1 − |γ_n|²)(1 − |γ_{n−1}|²) < 0. Therefore bringing the factors η_γ and η_γ^{−1} inside the G-factors will give square roots of positive elements. Just note that in the product of diag(1/√(1 − |γ_{n−1}|²), 1/√(1 − |γ_n|²)) with the 2 × 2 block with entries −λ_n η_{n1}, √(1 − |λ_n|²), √(1 − |λ_n|²), λ_n η_{n1}, all square roots are taken from positive numbers. To keep the link with what was done for the α sequence, we used the previous notation and we shall continue doing so in the sequel. In any case the representation H̃_γ in this setting is a Hessenberg matrix that can be computed in a proper way and we do not need a more general snake-shaped matrix. By doing this we lose the possibility to include γ_k = ∞.
In our previous setting this was not a problem because we reduced everything to the α parameters, so that γ_k = ∞ was easily dealt with by mapping it to α_k = 0, and this fitted seamlessly in the factors I − A*z and zI − A. In the current general setting however, a choice γ_k = ∞ requires the diagonal element ℓ_k(z) = 1 − γ_k z to be replaced by −z, and similarly ℓ*_k(z) = z − γ_k will become −1, which means that some of the entries in the I matrix of (12.1) have to be replaced by zero, and the second relation does not hold anymore. To avoid this, we can use the strategy that was used before when switching between the formalism for the α and for the β sequence, except that we now use this strategy to switch between a finite and an infinite entry in the γ sequence.
Without going through all the details, the duality between α and β can be transferred to a duality between γ = {γ_1, γ_2, . . .} ⊂ C \ T and the reciprocal sequence γ̌ = (γ̌_1, γ̌_2, . . .) with γ̌_k = 1/γ̄_k, k = 1, 2, . . .. The ORF for the sequence γ we denote as φ_n and the ORF for the sequence γ̌ we denote as φ̌_n. If ν = (ν_1, ν_2, . . .) is a sequence that picks ν_k = γ_k if γ_k is finite and ν_k = γ̌_k = 0 if γ_k = ∞, then, in analogy with Theorem 3.9, with proper normalization, the ORFs for this ν sequence, which are denoted as φ^ν_k, can be given explicitly. First note that as long as we are dealing with finite γ_k's we can use the recurrence relation and write the analog of the relation given in Theorem 9.1, just replacing the α's by γ's (which by our convention means to remove the superscripts), and write

[ℓ*_{n−1} φ_{n−1}  ℓ_n φ*_n] = [ℓ_{n−1} φ*_{n−1}  ℓ_n φ_n] G̃^γ_n (12.2)

with

G̃^γ_n = (N^γ_n)^{−1} G^γ_n N^γ_n,  N^γ_n = diag((1 − |γ_{n−1}|²)^{−1/2}, (1 − |γ_n|²)^{−1/2}),

G^γ_n = σ_{n−1} η_{n1} [[−λ_n η_{n1}, √(1 − |λ_n|²)], [√(1 − |λ_n|²), λ_n η_{n1}]] diag(1, σ_n).

Now, if some γ_k = ∞ is involved, we shall switch to the γ̌ sequence, which means that we avoid γ_k = ∞ and replace it by γ̌_k = 0. The factor B̌_n will thus only pick up a factor ζ_{ν_k} = 1/z each time γ_k = ∞, hence when ν_k = 0, so that B̌_n = z^{−i_n}, where i_n = #{j : γ_j = ∞, 1 ≤ j ≤ n}. Since in the previous relations, as long as γ_n ≠ ∞, we have B̌_{n−1} = B̌_n, it is no problem to write

[ℓ_{n−1} B̌_{n−1} φ_{n−1}  ℓ_n B̌_n φ*_n] = [ℓ_{n−1} B̌_{n−1} φ*_{n−1}  ℓ_n B̌_n φ_n] G^ν_n

with G^ν_n = G̃^γ_n. If however γ_n = ∞, then ℓ_n = −z and this z-factor we absorb in B̌_n. This is to say that ℓ_n B̌_{n−1} = ℓ*_n B̌_n. Hence the same relation holds where now G^ν_n is like G̃^γ_n but with γ_n = ∞ replaced by 0. We can proceed exactly as in the case of the α–β duality, but now apply it to the finite–infinite duality. Thus using ν_k = γ_k if γ_k is finite and ν_k = 0 when γ_k = ∞, and setting

G̃ = η_N^{−1} G η_N,  η_N = (I − N*N)^{1/2},  N = diag(ν_0, ν_1, ν_2, . . .), (12.4)

then the analog of (9.9) becomes (all the γ_i's denoted explicitly as such are finite values and denoted as ∞ when not finite) a relation with c_{n+1} a constant, I_n the identity matrix of size n + 1 and Γ*_n = diag(γ̄_0, . . . , γ̄_n). If z is a zero of φ_{n+1}, then Φ_n(I_n − Γ*_n z)H_n = Φ_n, and this can be rearranged as

zΦ_n = Φ_n(I_n − H_n^{−1})Γ̌_n with Γ̌_n = (Γ*_n)^{−1} = diag(γ̌_0, . . . , γ̌_n), where γ̌_k = 1/γ̄_k.

This clearly requires the restrictive conditions that H_n and Γ_n are invertible; thus γ_k = 0 is excluded. It shows that z is an eigenvalue of the matrix (I_n − H_n^{−1})Γ̌_n and that the left eigenvector is [φ_0(z), . . . , φ_n(z)]. By a similar argument, if H_n is replaced by H^u_n, a unitary truncation of H, then the eigenvalue z will be on T.
This shows that ž is an eigenvalue of Ȟ_n^{−1} + Γ̌_n and the corresponding left eigenvector is Φ̌_n(ž) = [φ̌_0(ž), . . . , φ̌_n(ž)]. This Ȟ_n^{−1} + Γ̌_n is the semi-separable-plus-diagonal matrix that we mentioned before. This is the relation given in [43, Theorem 2.7] where it was obtained in the context of an inverse eigenvalue problem. See also [21,42] which also rely on this structure.
The matrix Ȟ_n^{−1} is a semi-separable matrix, being the inverse of an irreducible Hessenberg matrix, and it is as structured as the Hessenberg matrix itself. Indeed, if Ȟ_n can be written as a product of unitary G_k factors, except for the last one which is truncated, then the semi-separable Ȟ_n^{−1} is the product of the G_k^{−1} in reverse order. Obviously, this has the same computational complexity.
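The claim that the inverse is computationally as simple can be illustrated with the factored form; the 2 × 2 blocks below are a Szegő-type stand-in for the unitary G_k factors (an assumption of ours, since (9.1) is not reproduced here).

```python
import numpy as np

def szego_block(delta):
    """A 2x2 unitary block built from a parameter delta in the open unit disk."""
    eta = np.sqrt(1 - abs(delta) ** 2)
    return np.array([[-np.conj(delta), eta], [eta, delta]])

def g_matrix(n, i, block):
    """Embed a 2x2 block into the identity of size n at rows/columns (i, i+1)."""
    G = np.eye(n, dtype=complex)
    G[i:i + 2, i:i + 2] = block
    return G

deltas = [0.3 + 0.2j, -0.5 + 0.1j, 0.2 - 0.4j, 0.1 + 0.6j]
n = len(deltas) + 1
Gs = [g_matrix(n, i, szego_block(d)) for i, d in enumerate(deltas)]

H = np.linalg.multi_dot(Gs)  # a unitary upper Hessenberg matrix
# its inverse is the product of the inverses (here adjoints) of the
# factors in reverse order: no extra computational cost
Hinv = np.linalg.multi_dot([G.conj().T for G in reversed(Gs)])

# the strictly lower triangular part of Hinv has rank-1 blocks (semi-separable)
ranks = [np.linalg.matrix_rank(Hinv[i:, :i]) for i in range(1, n)]
print(np.allclose(Hinv, np.linalg.inv(H)), ranks)
```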
Since this approach allows only finite poles, the theory was later extended in [36], which basically uses the technique explained in the previous section to include ∞ in the algorithm. The Hessenberg matrix with bulges below the diagonal is then called an extended Hessenberg matrix and the associated Krylov algorithms are called extended Krylov methods.
A selection of poles in an arbitrary order is implemented in this way to solve inverse and direct eigenvalue problems, QR factorization of unitary (Hessenberg) matrices, rational Krylov type methods, rational Szegő quadrature, and rational approximation problems like in [3,22,26,35,36,37,38]. We hope that future publications will illustrate that with the approach from the first 10 sections of this paper we have provided a more elegant framework to solve all these problems.

Conclusion
We have systematically developed the basics about orthogonal rational functions on the unit circle whose poles can be freely chosen anywhere in the complex plane as long as they are not on the unit circle T = {z ∈ C : |z| = 1}. The traditional case of all poles outside the closed disk and the so-called balanced situation where the poles alternate between inside and outside the disk are included as special instances. The case where all the poles are inside the disk is somewhat less manageable because it is not so easy to deal with the poles at infinity in this formalism, but it has been included anyway. The link between the ORF with all poles outside, all poles inside, and the general case with arbitrary location of the poles is clearly outlined. It is important to note that previous attempts to consider the general situation assumed that once some pole 1/γ is chosen, γ may not appear anywhere in the sequence of remaining poles, which would for example exclude the balanced situation.

We have shown how the classical Christoffel–Darboux formula, the recurrence relation, and the relation to rational Szegő quadrature formulas can be generalized. Finally we analyzed the matrix representation of the shift operator with respect to the general basis as a matrix Möbius transform of a generalization of a snake-shaped matrix, and how the latter can be factorized as a product of elementary unitary 2 × 2 blocks. It is shown that there is no theoretical or computational advantage in computing rational Szegő quadrature formulas for a general sequence of poles. It is however conceptually slightly simpler to consider the balanced situation, i.e., the generalization of the CMV representation.

In the last two sections we have made a link with the linear algebra literature. In Section 11 we have given an illustration by proving some properties of AMPD matrices that are directly inspired by the ORF recursion.
A relation with direct and inverse eigenvalue problems for unitary matrices and the related rational Krylov methods is briefly discussed in Section 12, illustrating that the approach given in this paper is more general and might therefore be a more appropriate choice.
As a final note we remark that a very similar approach can be taken for ORF on the real line or a real interval, in which case the unitary Hessenberg matrices will be replaced by tridiagonal Jacobi matrices and their generalizations, when poles are introduced in arbitrary order from anywhere in the complex plane excluding the real line or the interval.