Journal of Probability and Statistics
Volume 2012 (2012), Article ID 642403, 15 pages
Research Article

A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

¹Human Genetics, Genome Institute of Singapore, 60 Biopolis, Genome No. 02-01, Singapore 138672
²Department of Statistics and Applied Probability, National University of Singapore, 3 Science Drive 2, Singapore 117546

Received 20 September 2011; Accepted 28 October 2011

Academic Editor: Yongzhao Shao

Copyright © 2012 Jingyuan Zhao and Zehua Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. The approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened using $L_1$-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the Jeffreys prior penalty (Firth, 1993). A sequence of nested candidate models is then formed, and these models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project of the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pairwise multiple testing approach (Marchini et al., 2005) and the LASSO-patternsearch algorithm (Shi et al., 2007).
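To make the final step of the selection stage concrete, the following is a minimal sketch of choosing among nested candidate models with an extended BIC. It assumes the maximized log-likelihoods of the nested models (the model with the top-k ranked features, for k = 0, 1, 2, ...) have already been computed. It also uses the basic Chen and Chen (2008) form, EBIC_gamma = -2 log L + k log n + 2 gamma log C(p, k), with a single pool of p candidate features. The variant applied in this paper to main and interaction effects may differ. The function names and the log-likelihood values in the usage example are hypothetical and serve only as illustration.

```python
import math


def log_binom(p: int, k: int) -> float:
    """log C(p, k) via lgamma, which avoids overflow when p is large (e.g. SNP counts)."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik: float, k: int, n: int, p: int, gamma: float) -> float:
    """Extended BIC in the basic Chen-Chen form (an assumption, see lead-in).

    loglik: maximized log-likelihood of a model with k features
    n:      sample size, p: number of candidate features, gamma in [0, 1]
    gamma = 0 recovers the ordinary BIC.
    """
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_binom(p, k)


def select_nested(logliks, n, p, gamma=1.0):
    """Pick the model size minimizing EBIC over a nested sequence.

    logliks[k] is the maximized log-likelihood of the model containing the
    top-k ranked features (hypothetical input; fitting is not shown here).
    Returns (best_k, list_of_scores).
    """
    scores = [ebic(ll, k, n, p, gamma) for k, ll in enumerate(logliks)]
    best_k = min(range(len(scores)), key=scores.__getitem__)
    return best_k, scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods for nested models of size 0..4.
    lls = [-693.1, -650.0, -620.0, -615.0, -612.0]
    print(select_nested(lls, n=1000, p=10000, gamma=0.0)[0])  # ordinary BIC
    print(select_nested(lls, n=1000, p=10000, gamma=1.0)[0])  # EBIC, gamma = 1
```

With these illustrative numbers, the ordinary BIC (gamma = 0) selects the 3-feature model, while gamma = 1 selects the 2-feature model. The extra 2 gamma log C(p, k) term grows with the size of the feature pool, so larger gamma favors sparser models when p is much larger than n.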