On the concentration of the missing mass

Daniel Berend (Ben-Gurion University)
Aryeh Kontorovich (Ben-Gurion University)


A random variable is sampled from a discrete distribution. The missing mass is the probability of the set of points not observed in the sample. We sharpen and simplify McAllester and Ortiz's results (JMLR, 2003) bounding the probability of large deviations of the missing mass. Along the way, we refine and rigorously prove a fundamental inequality of Kearns and Saul (UAI, 1998).
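To make the definition concrete, here is a minimal Python sketch, not taken from the paper, that computes the missing mass of a sample drawn from a known discrete distribution. It assumes outcomes are indexed 0, 1, 2, ... and that draws are i.i.d. The function names `missing_mass` and `expected_missing_mass` and the example distribution are illustrative assumptions. The expectation formula, the sum of p_i (1 - p_i)^n over outcomes, follows directly from the definition under i.i.d. sampling.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the outcomes that never appear in `sample`.

    `probs[i]` is the probability of outcome i (hypothetical indexing).
    """
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs, n):
    """Expected missing mass after n i.i.d. draws: sum of p_i * (1 - p_i)^n."""
    return sum(p * (1 - p) ** n for p in probs)


if __name__ == "__main__":
    # Illustrative distribution, not from the paper.
    probs = [0.5, 0.25, 0.125, 0.125]
    n = 5
    rng = random.Random(0)  # fixed seed so the run is reproducible
    sample = rng.choices(range(len(probs)), weights=probs, k=n)
    print("sample:", sample)
    print("missing mass:", missing_mass(probs, sample))
    print("expected missing mass:", expected_missing_mass(probs, n))
```

The paper's results concern how tightly the first quantity concentrates around the second; the sketch only evaluates the two quantities and does not implement any of the bounds.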

Pages: 1-7

Publication Date: January 9, 2013

DOI: 10.1214/ECP.v18-2359


