Department of Human Development and Family Studies, The Pennsylvania State University, University Park, PA 16802, USA
Academic Editor: Junbin B. Gao
Copyright © 2010 Beau Abar and Eric Loken. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study examined a historical mixture model approach to the evaluation of ratings made in “gold standard” and two-rater contingency tables. Peirce's i and the derived average, i_ave, were discussed in relation to Cohen's κ, a widely used index of reliability in the behavioral sciences. Sample size, population base rate of occurrence, the true “science of the method”, and guessing rates were manipulated across simulations. In “gold standard” situations, Peirce's i tended to recover the true reliability of ratings as well as or better than κ. In two-rater situations, i_ave recovered the true reliability as well as or better than κ in most cases. The empirical utility and potential theoretical benefits of mixture model methods in estimating reliability are discussed, as are the associations between the i statistics and other modern mixture model approaches.