On products of delta distributions and resultants

We prove an identity in integral geometry, showing that if $P_x$ and $Q_x$ are two polynomials, $\int dx \, \delta(P_x) \otimes \delta(Q_x)$ is proportional to $\delta(R)$ where $R$ is the resultant of $P_x$ and $Q_x$.


Introduction
In their classical text [2], Gelfand and Shilov introduce generalized functions localized on smooth submanifolds of $\mathbb{R}^n$. A most important example is $\delta(f)$ with $f : \mathbb{R}^n \to \mathbb{R}$ a $C^1$ function whose gradient $\operatorname{Grad} f$ vanishes nowhere on $f = 0$.
In this note, we introduce a variant of the original definition of $\delta(f)$ and extend it to deal with certain cases when $f$ and $\operatorname{Grad} f$ have common zeros. A salient feature of this extension is that $\delta(f)$, when defined, is no longer always a distribution, but a positive $\sigma$-finite measure.
Our aim is to prove the following combinatorial identity.

Proposition 1. Let $\tilde J(u, v)$ denote the expression obtained from $J(u, v)$ by interchanging the roles of $P_x$ and $Q_x$. Then:
(i) $J$ and $\tilde J$ differ (at most) by a sign; precisely, $\tilde J(u, v) = (-1)^{|A||B|-1} J(u, v)$.
(ii) $J$ is a polynomial in $(u, v) \in \mathbb{R}^{A \cup B}$.
(iii) The $\delta$-identity
$$\int_{\mathbb{R}} dx \, \delta(P_x(u)) \otimes \delta(Q_x(v)) = |J(u, v)| \, \delta(R(u, v)) \qquad (1)$$
(involving $J$ as the multiplier) holds, where for each $x \in \mathbb{R}$, $\delta(P_x(u))$ and $\delta(Q_x(v))$ are positive $\sigma$-finite measures on $\mathbb{R}^A$ and $\mathbb{R}^B$ respectively, and $\delta(R(u, v))$ is a positive $\sigma$-finite measure on $\mathbb{R}^{A \cup B}$.

The main point is the combinatorial identity eq.(1), to which we refer as the $\delta$-identity in the sequel. It is grounded in integral geometry, but the identity may also be useful in the following context, occurring in probability theory or in statistical mechanics.
Consider a random variable $X = (x_1, x_2, \cdots, x_p)$, taking its values in a compact set $\Omega \subset \mathbb{R}^p$, with a density $\mu(x_2, \cdots, x_p)$; consider two r.v. $P$ and $Q$ that are functions of $X$ and polynomials in $x_1$. The joint probability density function (PDF) of $P$ and $Q$ may be written as
$$\rho(p, q) = \int_\Omega dx_1 \cdots dx_p \, \mu(x_2, \cdots, x_p) \, \delta(P(x) - p) \, \delta(Q(x) - q) \qquad (2)$$
(which is equivalent to saying that for two smooth test functions $\varphi$ and $\psi$, $\mathbb{E}(\varphi(P)\psi(Q)) = \int \varphi(p)\psi(q)\,\rho(p, q)\,dp\,dq$). We rewrite eq.(2) as
$$\rho(p, q) = \int dx_2 \cdots dx_p \, \mu(x_2, \cdots, x_p) \int dx_1 \, \delta(P(x) - p) \, \delta(Q(x) - q). \qquad (3)$$
If the two polynomials $P(x) - p$ and $Q(x) - q$ of the variable $x_1$ have only real roots, identity eq.(1) enables one to write the $x_1$ integral in eq.(3) in terms of $\delta(R)$, $R$ the resultant of $P(x) - p$ and $Q(x) - q$ in the variable $x_1$, and of $J$, the expression defined in Proposition 1.

This situation is encountered in the following particular case of Horn's problem. Consider two $3 \times 3$ real traceless symmetric matrices $A$ and $B$, acted upon by conjugation by matrices of $SO(3)$. We take $A$ and $B$ as random variables, independently and uniformly distributed on their orbits, characterised by their eigenvalues $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta = (\beta_1, \beta_2, \beta_3)$. Then the non-trivial coefficients of the characteristic polynomial of their sum $A + R B R^{-1}$, where $R \in SO(3)$ is taken randomly and uniformly according to the normalised Haar measure, define two random variables $P$ and $Q$. Following eq.(2), we write the PDF of $P$ and $Q$ as an integral over $SO(3)$. For example, if $R$ is expressed in terms of Euler angles $(\theta, \phi, \psi)$, a simple calculation shows that $P(R)$ and $Q(R)$ are (degree 2) polynomials in $c := \cos\theta$. This is precisely where identity eq.(1) simplifies things a great deal, reducing the previous integral to a $(\phi, \psi)$-integral of $\delta(R(\phi, \psi))\,J$, where $R$ is the resultant of the two polynomials $P - p$ and $Q - q$, and $J$ is the Jacobian of Proposition 1. One of the two integrations over $\phi, \psi$ localizes on the zeros of $R$ belonging to the appropriate interval, and one is left with a single integral over the remaining variable. This method has been used in [1] to analyse the divergences that (somewhat unexpectedly) arise in the PDF $\rho(p, q)$.
We refer the reader to that article for further details. That calculation, carried out in the simple case of two degree 2 polynomials, was the source of inspiration for the more general formula eq.(1).
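The localization mechanism behind eq.(1) rests on the basic property of the resultant: it vanishes exactly when the two polynomials share a root. As a quick numerical sanity check, one can verify this with sympy (the cubics below are arbitrary illustrative choices, not the polynomials of [1]):

```python
import sympy as sp

x = sp.symbols('x')

# Illustrative cubics with known roots.
P = sp.expand((x - 1) * (x - 2) * (x + 3))   # roots 1, 2, -3
Q_shared = x**3 - x                          # roots 0, 1, -1: shares the root 1 with P
Q_disjoint = x**3 - x + 1                    # no root in common with P

# The resultant in x vanishes precisely when the two polynomials have a
# common root, which is what makes delta(R) the right carrier for the
# x-integral of delta(P_x) (x) delta(Q_x).
print(sp.resultant(P, Q_shared, x))      # 0
print(sp.resultant(P, Q_disjoint, x))    # a nonzero integer
```

Since the computation is exact integer arithmetic, the vanishing in the shared-root case is exact, not approximate.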
Before we can prove Proposition 1, we need to make sense of the δ functions that appear on each side of eq.(1). This is the goal of the next section where they are defined as positive σ-finite measures.

Preliminaries
Mainly to fix notation, we recall the definition of the $\delta$ measure.

Definition 1. The measure $\delta_x$, the unit mass concentrated at the point $x \in \mathbb{R}^n$, is defined by $\delta_x(M) := \mathbf{1}_{x \in M}$ for $M$ an arbitrary Borel subset of $\mathbb{R}^n$.

Remark 1.
An immediate consequence is that if $x \in \mathbb{R}^n$ and $\varphi : \mathbb{R}^n \to \mathbb{R}$ is any Borel function, then $\int_{\mathbb{R}^n} \varphi \, \delta_x = \varphi(x)$. Hence, if $f : \mathbb{R}^n \to \mathbb{R}$ is any Borel function, $f \delta_x = f(x)\,\delta_x$, where the left-hand side is the product of the measure $\delta_x$ by the function $f$ and the right-hand side is the product of the measure $\delta_x$ by the constant $f(x)$.
The measure $\delta_x$ is a positive $\sigma$-finite measure. Moreover, the measure of any bounded set is finite. Thus $\delta_x$ also defines a distribution: if $\varphi : \mathbb{R}^n \to \mathbb{R}$ is $C^\infty$ with compact support, then $\langle \delta_x, \varphi \rangle := \varphi(x)$, where the bracket can be interpreted as the integral of $\varphi$ against the measure $\delta_x$. This is of course the definition of the $\delta$ distribution found in any textbook.
The support of δ x is a single point. Our aim is to define positive σ-finite measures whose support is a finite union of affine hyperplanes in R n and which are natural extensions of δ x . We do this in three steps. Our definition parallels closely the construction in [2]. We shall briefly comment on the similarities and differences in Remark 4.
-First step: we define $\delta(f)$ when $f : \mathbb{R} \to \mathbb{R}$ is a regular function. The one-dimensional case is mainly for motivation and also facilitates later comparison with the construction in [2].

Definition 2. A function $f : \mathbb{R} \to \mathbb{R}$ is called regular if $f$ is $C^1$ and its derivative $f'$ vanishes nowhere on $f = 0$.
Definition 3. Let $f : \mathbb{R} \to \mathbb{R}$ be a regular function. The measure $\delta(f)$ is defined by
$$\delta(f) := \sum_{x \in \mathbb{R},\, f(x) = 0} \frac{1}{|f'(x)|}\,\delta_x.$$

If $f : \mathbb{R} \to \mathbb{R}$ is regular, the number of its zeros in any bounded set is finite (else the set of zeros would accumulate at a finite point, where necessarily $f = f' = 0$) and the derivative is nonzero there. Thus $\delta(f)$ is a positive $\sigma$-finite measure such that the measure of any bounded Borel set is finite. Hence $\delta(f)$ can also be seen as a distribution: if $\varphi : \mathbb{R} \to \mathbb{R}$ is $C^\infty$ with compact support, then $\langle \delta(f), \varphi \rangle := \sum_{x \in \mathbb{R},\, f(x) = 0} \frac{1}{|f'(x)|}\,\varphi(x)$, in fact a finite sum (but the number of terms may depend on the support of $\varphi$).
Remark 2. If $\theta_k : \mathbb{R} \to \mathbb{R}$, $k \in \mathbb{N}$, is a sequence of Borel functions such that the sequence of measures $\theta_k(t)\,dt$ converges weakly to $\delta_0$ (that is, if $\lim_{k \to +\infty} \int_{\mathbb{R}} \varphi(t)\,\theta_k(t)\,dt = \varphi(0)$ for every continuous function $\varphi : \mathbb{R} \to \mathbb{R}$ with compact support), and if $f_{a,x} : \mathbb{R} \to \mathbb{R}$, $t \mapsto a(t - x)$, where $x \in \mathbb{R}$ and $a \in \mathbb{R}^*$ (the condition under which $f_{a,x}$ is regular), then $\delta(f_{a,x}) = \frac{1}{|a|}\,\delta_x$ is the weak limit of the sequence of measures $\theta_k(f_{a,x}(t))\,dt = \theta_k \circ f_{a,x}(t)\,dt$. This explains the presence of the denominators in the definition of $\delta(f)$ and motivates that $\delta \circ f$ would also be an appropriate notation.
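The weak limit of Remark 2 can be checked numerically. In the sketch below, the Gaussian nascent $\delta$ sequence, the test function $\cos$, and the parameter values are all arbitrary illustrative choices; the Riemann sum of $\theta_k(f_{a,x}(t))\,\varphi(t)$ approaches $\varphi(x)/|a|$:

```python
import numpy as np

def theta(t, k):
    # Gaussian nascent delta: theta_k(t) dt -> delta_0 weakly as k grows.
    return k / np.sqrt(np.pi) * np.exp(-(k * t) ** 2)

a, x0, k = 3.0, 0.7, 200.0            # f_{a,x0}(t) = a*(t - x0); illustrative values
t = np.linspace(-10.0, 10.0, 400001)
dt = t[1] - t[0]

# Riemann sum of theta_k(a*(t - x0)) * phi(t) with phi = cos;
# the limit predicted by Remark 2 is phi(x0)/|a|.
val = np.sum(theta(a * (t - x0), k) * np.cos(t)) * dt
print(val, np.cos(x0) / abs(a))       # the two numbers agree closely
```

The factor $1/|a|$ in the output is exactly the denominator that Definition 3 builds into $\delta(f)$.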
The following simple result plays a crucial role.

Lemma 1. If $f, g : \mathbb{R} \to \mathbb{R}$ are regular functions and the loci $f = 0$ and $g = 0$ do not intersect, then $h = fg$ is regular and
$$\delta(fg) = \frac{1}{|g|}\,\delta(f) + \frac{1}{|f|}\,\delta(g). \qquad (5)$$
More generally, if $f_1, \cdots, f_k$ are regular functions with no pairwise common zeros, then the product $f_1 \cdots f_k$ is regular and
$$\delta(f_1 \cdots f_k) = \sum_{j=1}^{k} \frac{1}{\prod_{i \neq j} |f_i|}\,\delta(f_j). \qquad (6)$$

Proof. For the first formula, the term $\frac{1}{|f|}\,\delta(g)$ on the right-hand side, for instance, is well defined as the product of the measure $\delta(g)$ by the measurable function $\frac{1}{|f|}$, which is finite in a neighborhood of the support of $\delta(g)$. The validity of the first formula is then a consequence of $(fg)'(x) = f(x)g'(x)$ when $x$ is a zero of $g$, and of Remark 1. The second formula can be justified in a similar way, or from the first by recursion.
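The first formula of Lemma 1 can be illustrated with two quadratics whose zero sets are disjoint (an arbitrary choice): pairing both sides with a test function gives the same number, because $(fg)'(r) = f'(r)g(r)$ at a zero $r$ of $f$.

```python
import numpy as np

# Illustrative regular functions with disjoint zero sets:
# f vanishes at {0, 2}, g vanishes at {1, -1}.
f = np.poly1d([1.0, -2.0, 0.0])   # x^2 - 2x
g = np.poly1d([1.0, 0.0, -1.0])   # x^2 - 1
fg = f * g

phi = lambda r: np.exp(-r ** 2)   # arbitrary positive test function

def pairing(p, weight):
    # <delta(p), phi * weight> = sum over zeros r of p of phi(r)*weight(r)/|p'(r)|
    return sum(phi(r) * weight(r) / abs(p.deriv()(r)) for r in p.roots.real)

lhs = pairing(fg, lambda r: 1.0)                   # <delta(fg), phi>
rhs = pairing(f, lambda r: 1.0 / abs(g(r))) \
    + pairing(g, lambda r: 1.0 / abs(f(r)))        # <(1/|g|)delta(f) + (1/|f|)delta(g), phi>
print(lhs, rhs)                                    # equal up to rounding
```

The agreement is exact up to floating-point error, since both sides are finite sums over the same four zeros.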
-Second step: we define δ(f ) for non-constant affine functions in arbitrary dimensions.
Definition 4. If $f : \mathbb{R}^n \to \mathbb{R}$ is a non-constant affine function, then there is a linear form $l_1 \neq 0$ on $\mathbb{R}^n$ such that $f - l_1$ is constant. One can find $n - 1$ other linear forms $l_2, \cdots, l_n$ such that $l_1, \cdots, l_n$ form a basis of linear forms on $\mathbb{R}^n$. Consequently $z : \mathbb{R}^n \to \mathbb{R}^n$, $y := (y_1, \cdots, y_n) \mapsto z(y) := (f(y), l_2(y), \cdots, l_n(y))$ defines a non-singular (affine) change of coordinates. If $M$ is a Borel subset of $\mathbb{R}^n$, we set $\tilde M := z(M) \cap (\{0\} \times \mathbb{R}^{n-1})$ (by this we mean the points of $z(M) \subset \mathbb{R}^n$ whose first coordinate is 0). We define the measure $\delta(f)$ by
$$\delta(f)(M) := \int_{\tilde M} |{\rm Jac}(y, z)| \, dz_2 \cdots dz_n,$$
where ${\rm Jac}(y, z) := \det\left(\frac{\partial y_i}{\partial z_j}\right)_{ij} = 1/{\rm Jac}(z, y)$ is the standard Jacobian, a constant that could be pulled out of the integral.
The fact that the definition of δ(f ) is intrinsic, i.e. that δ(f )(M ) is independent on how l 1 is completed into a basis of linear forms, is guaranteed by the multiplicativity of the Jacobian determinant.
An immediate consequence of the definition is that if $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a positive Borel function, then
$$\int_{\mathbb{R}^n} \varphi \, \delta(f) = |{\rm Jac}(y, z)| \int_{\mathbb{R}^{n-1}} \tilde\varphi(0, z_2, \cdots, z_n) \, dz_2 \cdots dz_n,$$
where $\tilde\varphi := \varphi \circ z^{-1}$. If $M$ is a bounded Borel subset of $\mathbb{R}^n$, then $\tilde M$ is also bounded, so the positive measure $\delta(f)$ is such that $\delta(f)(M)$ is finite. Hence $\delta(f)$ can also be seen as a distribution: if $\varphi : \mathbb{R}^n \to \mathbb{R}$ is $C^\infty$ with compact support, then
$$\langle \delta(f), \varphi \rangle := |{\rm Jac}(y, z)| \int_{\mathbb{R}^{n-1}} \tilde\varphi(0, z_2, \cdots, z_n) \, dz_2 \cdots dz_n, \qquad (7)$$
where $\tilde\varphi := \varphi \circ z^{-1}$ is again $C^\infty$ with compact support.
If n = 1 we may write f = f a,x as in Remark 2. A simple computation shows that the definitions of δ(f ) in Definitions 3 and 4 coincide.
If $f : \mathbb{R}^n \to \mathbb{R}$ is a non-constant affine function and $a \in \mathbb{R}^*$, then $af$ is a non-constant affine function and $\delta(af) = \frac{1}{|a|}\,\delta(f)$.

-Third and final step: we define $\delta(f)$ for certain products of non-constant affine functions.
Definition 5. Let $H(\mathbb{R}^n)$ be the space of functions $f : \mathbb{R}^n \to \mathbb{R}$ that can be written as a product $f = f_1 \cdots f_k$ of non-constant, pairwise non-proportional, affine functions $f_1, \cdots, f_k$ from $\mathbb{R}^n$ to $\mathbb{R}$. We say that $(f_1, \cdots, f_k)$ is a factorization of $f \in H(\mathbb{R}^n)$.
We note that generically $f_i$ and $f_j$, $i \neq j$, have common zeros (an affine subspace of dimension $n - 2$, unless the zero loci are parallel hyperplanes), so that $\{f = 0\}$ is not a smooth submanifold of $\mathbb{R}^n$.
Definition 6. Suppose that $f \in H(\mathbb{R}^n)$ and let $(f_1, \cdots, f_k)$ be a factorization of $f$. We define
$$\delta(f) := \sum_{j=1}^{k} \frac{1}{\prod_{i \neq j} |f_i|}\,\delta(f_j),$$
with $\delta(f_j)$ as in Definition 4.
It is readily seen that the right-hand side in the definition is still a positive measure on $\mathbb{R}^n$, and it is $\sigma$-finite because $\{\prod_{i \neq j} f_i = 0\}$ meets the hyperplane $f_j = 0$ (the support of $\delta(f_j)$) in a finite union of affine subspaces of dimension $n - 2$.
We have to check that $\delta(f)$ is defined intrinsically.

Lemma 2. For $f \in H(\mathbb{R}^n)$, the right-hand side in the definition of $\delta(f)$ does not depend on the factorization of $f$.
The space $H(\mathbb{R})$ is simply the space of non-constant real polynomials whose roots are real and simple. Thus in dimension 1, the condition that the affine functions in the factorization be pairwise non-proportional is equivalent to the condition that $f$ be regular. It is immediate to check that for $f \in H(\mathbb{R})$, the definitions of $\delta(f)$ in Definitions 3 and 6 coincide.
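This coincidence can be spot-checked: for $f(x) = \prod_i (x - r_i)$ with simple real roots (the roots below are an arbitrary choice), the Definition 3 weight $1/|f'(r_j)|$ equals the Definition 6 weight $1/\prod_{i \neq j} |r_j - r_i|$.

```python
import numpy as np

roots = np.array([-1.5, 0.3, 2.0])   # arbitrary simple real roots
f = np.poly1d(roots, r=True)         # f(x) = (x + 1.5)(x - 0.3)(x - 2.0)
j = 1                                # inspect the root r_j = 0.3

# Definition 3: the mass at r_j is 1/|f'(r_j)|.
w_def3 = 1.0 / abs(f.deriv()(roots[j]))

# Definition 6: factorize f into the affine pieces x - r_i;
# the mass at r_j is 1/prod_{i != j} |r_j - r_i|.
w_def6 = 1.0 / np.prod([abs(roots[j] - r) for i, r in enumerate(roots) if i != j])

print(w_def3, w_def6)   # identical, since f'(r_j) = prod_{i != j} (r_j - r_i)
```

The underlying algebraic fact is just the product rule evaluated at a simple root.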
If $f$, $g$ and $fg$ belong to $H(\mathbb{R}^n)$, it is immediate to check that
$$\delta(fg) = \frac{1}{|g|}\,\delta(f) + \frac{1}{|f|}\,\delta(g),$$
which is formally identical to eq.(5). The extension to a product of a finite number of factors leads to a counterpart of eq.(6). In fact, once $\delta(f)$ is defined when $f$ is a non-constant affine function, the definition we have given of $\delta(f)$ for $f \in H(\mathbb{R}^n)$ could be seen as a formal extension of the identity eq.(6) to this more general context. However, there is one important difference between the two situations: if $f \in H(\mathbb{R}^n)$, the $\delta(f)$-measure of a compact set is clearly finite if the set contains no common zero of $f_i$ and $f_j$ for any $i \neq j$, but it may be infinite otherwise. Similarly, positive $C^\infty$ functions with compact support may have an infinite integral against $\delta(f)$.

As an illustration, take $n = 2$, $f(y) = y_1 y_2$, and $\varphi : \mathbb{R}^2 \to \mathbb{R}$ a $C^\infty$ function, positive, with compact support and such that $\varphi(0, 0) > 0$ (i.e. $\varphi$ does not vanish in a neighborhood of the intersection $y_1 = y_2 = 0$). Then
$$\int_{\mathbb{R}^2} \varphi \, \delta(f) = \int_{\mathbb{R}} \frac{\varphi(y_1, 0)}{|y_1|}\,dy_1 + \int_{\mathbb{R}} \frac{\varphi(0, y_2)}{|y_2|}\,dy_2 = +\infty.$$
This is the reason why, for a generic $f \in H(\mathbb{R}^n)$, $n > 1$, the measure $\delta(f)$ cannot be interpreted as a distribution, and distribution theory on $\mathbb{R}^n$ cannot be the natural setting.
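The divergence in this example is logarithmic, as one can see numerically: cutting the contribution of one axis off at $|y_1| > \varepsilon$, the integral grows like $-2\log\varepsilon$ as $\varepsilon \to 0$. (The Gaussian profile below is an arbitrary stand-in for a generic $\varphi$ with $\varphi(0,0) > 0$.)

```python
import numpy as np

phi = lambda y1: np.exp(-y1 ** 2)    # stand-in for phi(y1, 0), positive at the origin

def axis_integral(eps):
    # 2 * integral_{eps}^{10} phi(y)/|y| dy  (both half-axes), trapezoid rule
    y = np.linspace(eps, 10.0, 200001)
    f = phi(y) / y
    return 2.0 * (np.sum(f) - 0.5 * (f[0] + f[-1])) * (y[1] - y[0])

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, axis_integral(eps))   # increases by about 2*log(10) per decade
```

Each tenfold shrinking of the cutoff adds roughly $2\log 10 \approx 4.605$ to the integral, confirming the unbounded mass near $y_1 = y_2 = 0$.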
It will be clear from the discussion in Remark 4 below that a positive $\sigma$-finite measure $\delta(f)$ can be defined for more general $f$'s, but each of the functions $P_x$, $Q_x$, $R$ involved in eq.(1) is a product of pairwise non-proportional affine functions on some $\mathbb{R}^n$, so our construction is enough to make sense of $\delta(P_x)$, $\delta(Q_x)$, $\delta(R)$, and we have achieved the preliminary goal of making sense of both sides of the $\delta$-identity eq.(1).
We now apply the definitions to an explicit computation illustrating their naturalness. In what follows, $a$ is always a fixed parameter. Recall that $P_x(u) := a \prod_{\alpha \in A} (x - u_\alpha)$, a notation suggesting that $x$ is a (real) parameter and $u \in \mathbb{R}^A$ is the variable. But we could also denote the same expression by $P_u(x)$, a notation suggesting that $u$ is a (vector) parameter and $x \in \mathbb{R}$ is the variable. Finally, we could denote the same expression by $P(x, u)$, a function on $\mathbb{R} \times \mathbb{R}^A$. No matter how we split between parameters and variables, $a \prod_{\alpha \in A} (x - u_\alpha)$ is a product of $|A|$ affine functions. We compare $\delta(P_x)$, $\delta(P_u)$ and $\delta(P)$.
-The function $P_x$ is a product of $|A|$ pairwise non-proportional affine functions of $u \in \mathbb{R}^A$. We may use $(u_{\alpha'} - x\delta_{\alpha,\alpha'})_{\alpha' \in A}$ as coordinates near the zero locus of $x - u_\alpha$. We infer from the definitions that, if $\varphi$ is a positive Borel function on $\mathbb{R}^A$,
$$\int_{\mathbb{R}^A} \varphi \, \delta(P_x) = \sum_{\alpha \in A} \frac{1}{|a|} \int \varphi(u)\Big|_{u_\alpha = x} \, \frac{\prod_{\alpha' \neq \alpha} du_{\alpha'}}{\prod_{\alpha' \neq \alpha} |x - u_{\alpha'}|}. \qquad (9)$$
This equation (and its counterpart for $Q_x$) will be the starting point of our proof of the $\delta$-identity eq.(1).
-The polynomial (in $x$) $P_u(x)$ is regular if and only if the $u_\alpha$'s are pairwise distinct, which is also the condition under which the affine functions $f_\alpha(x) := x - u_\alpha$ of $x$ are pairwise non-proportional, and all definitions of $\delta(P_u)$ lead to
$$\int_{\mathbb{R}} \varphi \, \delta(P_u) = \sum_{\alpha \in A} \frac{1}{|P_u'(u_\alpha)|}\,\varphi(u_\alpha)$$
for any positive Borel function $\varphi$ on $\mathbb{R}$, where $P_u'$ denotes the derivative of $P_u$ with respect to $x$.

-The function $P$ is a product of $|A|$ pairwise non-proportional affine functions on $\mathbb{R} \times \mathbb{R}^A$. We infer from the definitions that, if $\varphi$ is a positive Borel function on that product space,
$$\int \varphi \, \delta(P) = \sum_{\alpha \in A} \frac{1}{|a|} \int_{\mathbb{R}^A} \varphi(u_\alpha, u) \, \frac{\prod_{\alpha' \in A} du_{\alpha'}}{\prod_{\alpha' \neq \alpha} |u_\alpha - u_{\alpha'}|} = \sum_{\alpha \in A} \frac{1}{|a|} \int_{\mathbb{R} \times \mathbb{R}^{A \setminus \{\alpha\}}} \varphi(x, u)\Big|_{u_\alpha = x} \, \frac{dx \prod_{\alpha' \neq \alpha} du_{\alpha'}}{\prod_{\alpha' \neq \alpha} |x - u_{\alpha'}|}.$$
The first formula emerges if one uses $(x - u_\alpha; (u_{\alpha'})_{\alpha' \in A})$, while the second emerges if one uses $(x; (u_{\alpha'} - x\delta_{\alpha,\alpha'})_{\alpha' \in A})$ as coordinates on $\mathbb{R} \times \mathbb{R}^A$ near the zero locus of $x - u_\alpha$.

Remark 3.
It is clear that all these formulae are pretty much one and the same, though the one for $P_u$ involves the restriction that the components of the vector $u$ be pairwise distinct (unless we are prepared to deal with non-$\sigma$-finite measures).
We finally turn to a comparison of the construction given above with the one in [2]. The following cursory discussion is used nowhere in the sequel, and the uninterested reader can jump directly to Section 3. The discussion involves the generalization of the notion of regular function to functions from $\mathbb{R}^n$ to $\mathbb{R}$.
Definition 7. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called regular if $f$ is $C^1$ and its gradient $\operatorname{Grad} f$ vanishes nowhere on $f = 0$.
Remark 4. In [2], Gelfand and Shilov define $\delta(f)$ (which we denote by $\delta_{GS}(f)$ to make later comparison with our definitions easier and avoid confusion) as a generalized function when $f : \mathbb{R}^n \to \mathbb{R}$ is an arbitrary regular function. We have already stressed that this is not an automatic consequence of the definition of $\delta_0$ (either as a measure or as a distribution). However (maybe modulo the inclusion of absolute values of Jacobians, see the first point below), the definition chosen by Gelfand and Shilov is the "only" natural one, the one that can be manipulated most intuitively. For instance, it behaves nicely if some variables in $f$ are treated as parameters. We have seen an instance of this in our context when we compared $\delta(P_x)$, $\delta(P_u)$ and $\delta(P)$.
Another view of the naturalness of the definition of $\delta_{GS}(f)$ is that if $\theta_k : \mathbb{R} \to \mathbb{R}$, $k \in \mathbb{N}$, is a sequence of smooth functions converging towards $\delta_0$ in the distribution topology (on $\mathbb{R}$), then the compositions $\theta_k(f) = \theta_k \circ f$ converge towards $\delta(f)$ in the distribution topology (on $\mathbb{R}^n$). We have seen a trivial counterpart of this in Remark 2. The "approximation of distributions by smooth maps" procedure is in fact the path followed by Hörmander in [3] to define the composition of a general distribution on $\mathbb{R}^m$ with functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ satisfying appropriate conditions (surjectivity of the differential everywhere in Chapter 6, less stringent conditions allowing certain singularities in Section 8.2).
The reader is invited to consult [2] for the detailed definition of $\delta_{GS}$ and of other distributions localized on $f = 0$. Coming back to the comparison, one can check the following:

-First, in one dimension, $\delta_{GS}(f)$ is the distribution associated to the measure $\delta(f)$ from Definition 3, with one little proviso: Gelfand and Shilov do not include an absolute value for the derivatives in the denominators. More generally, they do not include absolute values for Jacobians, because they use the framework of differential forms, whereas densities better fit our needs. When we talk of $\delta_{GS}$ in the sequel, we always have in mind that absolute values are included.
-Second, an affine function $f : \mathbb{R}^n \to \mathbb{R}$ is regular if and only if it is non-constant, and then $\delta_{GS}(f)$ is the distribution associated to the measure $\delta(f)$ from Definition 4. In fact, the definition of $\delta_{GS}(f)$ for a general regular function follows the same pattern, using localization and replacing the linear change of coordinates by the implicit function theorem. Borrowing the notations from Definition 4, $z_1 = f(y), z_2, \cdots, z_n$ become arbitrary local coordinates, the Jacobian is not a constant anymore, and the formula for $\langle \delta_{GS}(f), \varphi \rangle$ is similar to eq.(7), but with ${\rm Jac}(y, z)$ replaced by ${\rm Jac}(y, z)\big|_{z_1 = 0}$. Using a partition of unity, one may assume that the local coordinates are well-defined in an open set containing the support of $\varphi$, so that $\tilde\varphi = \varphi \circ z^{-1}$ extends as a $C^\infty$ function with compact support. The consistency of the procedure is guaranteed by the general change of variable formula. We refer again to [2] for all details.
-Third, the identity eq.(5) (with $\delta_{GS}$ substituted for $\delta$, and interpreted as an identity between distributions) remains true whenever $g, f, fg : \mathbb{R}^n \to \mathbb{R}$ are regular. Then the analog of eq.(6) also holds. Thus, if $f \in H(\mathbb{R}^n)$ is regular, $\delta_{GS}(f)$ is the distribution associated to the measure $\delta(f)$ from Definition 6. It is immediate that an element of $H(\mathbb{R}^n)$ is regular if and only if all factors in a factorization have the same linear part but pairwise distinct constant terms. As already noticed, all members of $H(\mathbb{R})$ are regular, but generic members of $H(\mathbb{R}^n)$, $n > 1$, are not. For such members $f$, the measure $\delta(f)$ assigns an infinite measure to certain bounded Borel subsets of $\mathbb{R}^n$, and $\delta(f)$ cannot be interpreted as a distribution.
-Fourth, however, every $f \in H(\mathbb{R}^n)$ with factorization $(f_1, \cdots, f_k)$ is regular on a dense connected open subset of $\mathbb{R}^n$, namely the complement $O_f$ of $\cup_{i \neq j} \{f_i = f_j = 0\}$. The Gelfand-Shilov construction defines a distribution (which we still denote by $\delta_{GS}(f)$) on $O_f$, which is positive on positive test functions (with support in $O_f$ by definition), hence defines a positive $\sigma$-finite measure on $O_f$, and on $O_f$ eq.(6) again holds. Hence this measure coincides with $\delta(f)$ (as defined in Definition 6) on Borel subsets of $O_f$. Consequently $\delta(f)$ can also be interpreted as an extension of (the positive $\sigma$-finite measure on $O_f$ associated to) $\delta_{GS}(f)$ to the whole of $\mathbb{R}^n$. The ambiguity is a positive $\sigma$-finite measure with support in $\cup_{i \neq j} \{f_i = f_j = 0\}$, and Definition 6 consists in extending by the zero measure. We shall not try to explore whether some other extensions would preserve the nice properties of the definition of $\delta(f)$, in particular whether some generalized version of eq.(1) could be obtained in that way. Finally, we note that extension by the zero measure could be used much more generally to extend a (positive $\sigma$-finite) measure defined on some Borel subset of a measurable space to the whole space, as is customarily done for the similar construction of extensions of distributions.

Proof of Proposition 1
Proof. To avoid clumsy notation, we write simply $P(x)$ (resp. $Q(x)$) for $P_u(x)$ (resp. $Q_v(x)$) in this proof.
We start with the proof of (i). Recall the defining formula for $J$. The same manipulation on $\tilde J$ brings it to a comparable form. But if $S = c \prod_{\gamma \in C} (x - w_\gamma)$ is an arbitrary polynomial of degree at least 2 with simple zeros, one has the "well known" identity
$$\sum_{\gamma \in C} \frac{1}{S'(w_\gamma)} = 0.$$
For the polynomial $S$ at hand, this entails the sign relation announced in (i).
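The "well known" identity invoked here is easy to test numerically, say for a quintic with arbitrary simple roots:

```python
import numpy as np

w = np.array([-2.0, -0.5, 0.3, 1.1, 2.7])   # arbitrary simple roots (degree 5 >= 2)
S = np.poly1d(w, r=True)                    # S(x) = prod (x - w_gamma), with c = 1

# The sum over the roots of 1/S'(w_gamma) vanishes for degree >= 2.
total = sum(1.0 / S.deriv()(r) for r in w)
print(total)   # ~ 0 up to rounding
```

(The identity itself follows, for instance, from the partial fraction expansion of $1/S$ and the vanishing of the $1/x$ coefficient at infinity when $\deg S \geq 2$.)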
To prove (ii), we simply note that the original formula for $J$ is a polynomial in $u$ with coefficients rational in $v$, while the original formula for $\tilde J$ is a polynomial in $v$ with coefficients rational in $u$. As $J$ and $\tilde J$ differ (at most) by a sign, altogether $J$ is a polynomial in $(u, v)$.
We turn finally to the proof of (iii). We rewrite eq.(9) in a number of more compact forms, each of which we shall use freely, namely
$$\delta(P_x) = \sum_{\alpha \in A} \frac{1}{|a \prod_{\alpha' \neq \alpha} (x - u_{\alpha'})|}\,\delta_x(u_\alpha) = \sum_{\alpha \in A} \frac{1}{|P'(u_\alpha)|}\,\delta_x(u_\alpha)$$
$$\phantom{\delta(P_x)} = \sum_{\alpha \in A} \frac{1}{|a \prod_{\alpha' \neq \alpha} (x - u_{\alpha'})|}\,\delta(u_\alpha - x) = \sum_{\alpha \in A} \frac{1}{|P'(u_\alpha)|}\,\delta(u_\alpha - x).$$
In the first line, $\delta_x$ is the one-dimensional $\delta$ function for a unit mass at the point $x$ along the coordinate axis $\alpha$, while in the second line $\delta(u_\alpha - x)$ is a distribution on $\mathbb{R}^A$, namely $\delta(f)$ for the affine function $f : u \mapsto u_\alpha - x$. To get the second equality in each line, we have used that $a \prod_{\alpha' \neq \alpha} (x - u_{\alpha'})$ and $a \prod_{\alpha' \neq \alpha} (u_\alpha - u_{\alpha'}) = P'(u_\alpha)$ are equal on the support of $\delta_x(u_\alpha)$ or of $\delta(u_\alpha - x)$.

Now $\delta(P_x)\delta(Q_x)$, which could be rewritten more carefully as $\delta(P_x) \otimes \delta(Q_x)$, is well defined as the product of two measures, $\delta(P_x)$ on $\mathbb{R}^A$ and $\delta(Q_x)$ on $\mathbb{R}^B$. When $P$ and $Q$ are monic of degree 1, the identity
$$\int_{\mathbb{R}} dx \, \delta_x(u)\,\delta_x(v) = \delta(u - v)$$
holds for every positive Borel function integrated against both sides, by the very definition of $\delta(u - v)$ as a positive measure on $\mathbb{R}^2$. Thus (iii) holds in this special case, and we shall use it to prove the general case. Namely, we translate the special case, where $\delta(u_\alpha - v_\beta)$ is to be interpreted as the measure in the plane $(u_\alpha, v_\beta) \in \mathbb{R}^2$, into
$$\int_{\mathbb{R}} dx \, \delta(u_\alpha - x)\,\delta(v_\beta - x) = \delta(u_\alpha - v_\beta),$$
where now $\delta(u_\alpha - x)$ is interpreted as the measure $\delta(f)$ on $\mathbb{R}^A$ for $f(u) = u_\alpha - x$, $\delta(v_\beta - x)$ is interpreted as the measure $\delta(g)$ on $\mathbb{R}^B$ for $g(v) = v_\beta - x$, and $\delta(u_\alpha - v_\beta)$ as the measure $\delta(h)$ on $\mathbb{R}^{A \cup B}$ for $h(u, v) = u_\alpha - v_\beta$. Thus
$$\int_{\mathbb{R}} dx \, \delta(P_x) \otimes \delta(Q_x) = \sum_{\alpha \in A} \sum_{\beta \in B} \frac{1}{|P'(u_\alpha)\,Q'(v_\beta)|}\,\delta(u_\alpha - v_\beta), \qquad (13)$$
where we have used that $\frac{1}{P'(u_\alpha)Q'(v_\beta)}$ and $\frac{1}{P'(v_\beta)Q'(u_\alpha)}$ coincide on the support $\Sigma_{\alpha,\beta}$ of the measure $\delta(u_\alpha - v_\beta)$. On the other hand, using Definition 6 for $R(u, v)$, we get the analogous decomposition, eq.(14), of $\delta(R(u, v))$ over the measures $\delta(u_\alpha - v_\beta)$, with explicit multipliers, and these multipliers simplify on $\Sigma_{\alpha,\beta}$. Fix $\alpha \in A$ and $\beta \in B$. If $\beta' \neq \beta$, then $\prod_{\beta'' \neq \beta'} \prod_{\alpha'' \in A} (v_{\beta''} - u_{\alpha''})$ contains a factor $v_\beta - u_\alpha$. Thus, if $v_\beta = u_\alpha$, all terms but the one corresponding to $\beta' = \beta$ in the sum defining $J(u, v)$ vanish. Hence, on $\Sigma_{\alpha,\beta}$, $J(u, v)$ reduces to a single term. Using eq.(15), we obtain that, on $\Sigma_{\alpha,\beta}$, the multipliers in eqs.(13) and (14) agree. Comparison of eqs.(13) and (14) establishes the validity of (iii).
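The comparison of eqs.(13) and (14) relies, implicitly, on the classical factorization of the resultant over root differences, ${\rm Res}_x(P, Q) = a^{|B|} b^{|A|} \prod_{\alpha, \beta} (u_\alpha - v_\beta)$, which exhibits $R(u, v)$ as a product of pairwise non-proportional affine functions. A symbolic check in the simplest nontrivial case ($|A| = |B| = 2$, arbitrary leading coefficients; this assumes sympy's `resultant` follows the standard Sylvester convention):

```python
import sympy as sp

x, u1, u2, v1, v2 = sp.symbols('x u1 u2 v1 v2')
a, b = 3, 5                          # arbitrary leading coefficients

P = sp.expand(a * (x - u1) * (x - u2))
Q = sp.expand(b * (x - v1) * (x - v2))

# Classical identity: Res_x(P, Q) = a^deg(Q) * b^deg(P) * prod_{i,j} (u_i - v_j)
R = sp.resultant(P, Q, x)
expected = a**2 * b**2 * sp.expand((u1 - v1) * (u1 - v2) * (u2 - v1) * (u2 - v2))
print(sp.simplify(R - expected))     # 0
```

This factorization is what allows Definition 6 to be applied directly to $R(u, v)$.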
Remark 5. The explicit expression of the polynomial $J(u, v)$ is quite complicated in general. It is rather simple in the case when $Q$ is of degree 2: $J(u, v)$ then reduces to a simple divided difference, hence is obviously polynomial in $(u, v)$. Even the case when $P$ and $Q$ are of degree 3 leads to a prohibitively complicated explicit expression for $J(u, v)$ as a polynomial.
Remark 6. It is a simple consequence of eq.(12) that integration over $x$ does not introduce unexpected singularities: the measure of a compact set $K$ in $\mathbb{R}^{A \cup B}$ under $\int dx \, \delta(P_x)\delta(Q_x)$ is finite if $K$ does not meet any hyperplane $u_\alpha = u_{\alpha'}$, $\alpha \neq \alpha'$, or $v_\beta = v_{\beta'}$, $\beta \neq \beta'$.