Asymptotics for the number of blocks in a conditional Ewens-Pitman sampling model

Stefano Favaro (University of Torino and Collegio Carlo Alberto)
Shui Feng (McMaster University)

Abstract


The study of random partitions has been an active research area in probability over the last twenty years. A quantity that has attracted a lot of attention is the number of blocks in the random partition. Depending on the application, this quantity may represent, for example, the number of species in a sample from a population of individuals or the number of cycles in a random permutation. In the context of Bayesian nonparametric inference, this quantity is associated with the exchangeable random partition induced by sampling from certain prior models, for instance the Dirichlet process and the two-parameter Poisson-Dirichlet process. In this paper we generalize some existing asymptotic results from this prior setting to the so-called posterior, or conditional, setting. Specifically, given an initial sample from a two-parameter Poisson-Dirichlet process, we establish conditional fluctuation limits and conditional large deviation principles for the number of blocks generated by a large additional sample.
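
To make the conditional setting concrete, the following is a minimal simulation sketch, not the paper's method. It assumes the standard sequential sampling rule for the two-parameter Poisson-Dirichlet process (the two-parameter Chinese restaurant scheme), which the abstract does not spell out: after s draws falling into k blocks, the next draw opens a new block with probability (theta + alpha*k)/(theta + s). The function name simulate_new_blocks and the parameter names alpha, theta, n, j, m are illustrative choices, not notation taken from this page.

```python
import random


def simulate_new_blocks(alpha, theta, n, j, m, rng=None):
    """Simulate the number of new blocks created by m additional draws,
    given an initial sample of n draws that fell into j blocks.

    Assumes the two-parameter Chinese restaurant rule with
    0 <= alpha < 1 and theta > -alpha (an assumption of this sketch).
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    if not (n >= 1 and 1 <= j <= n and m >= 0):
        raise ValueError("need n >= 1, 1 <= j <= n and m >= 0")
    rng = rng or random.Random()

    blocks = j
    for i in range(m):
        # n + i draws have been seen so far; the next one opens a new
        # block with probability (theta + alpha * blocks) / (theta + n + i).
        p_new = (theta + alpha * blocks) / (theta + n + i)
        if rng.random() < p_new:
            blocks += 1
    return blocks - j


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    rng = random.Random(2014)
    draws = [simulate_new_blocks(0.5, 1.0, 50, 10, 10_000, rng) for _ in range(20)]
    print(sum(draws) / len(draws))
```

Under this rule the number of new blocks depends on the initial sample only through its size n and its number of blocks j, which is why the sketch takes just those two summaries as input. Repeating the simulation for growing m gives an empirical view of the kind of conditional fluctuations the abstract refers to.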

Pages: 1-15

Publication Date: February 18, 2014

DOI: 10.1214/EJP.v19-2881


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.