Distinguishing true from false positives in genomic studies: p values

Broer, Linda; Lill, Christina M.; Schuur, Maaike; Amin, Najaf; Roehr, Johannes T.; Bertram, Lars; Ioannidis, John P. A.; van Duijn, Cornelia M.

doi:10.1007/s10654-012-9755-x

METHODS
Published: 01 February 2013

European Journal of Epidemiology, volume 28, pages 131–138 (2013)

Linda Broer¹,
Christina M. Lill²,³,
Maaike Schuur⁴,
Najaf Amin¹,
Johannes T. Roehr²,
Lars Bertram²,
John P. A. Ioannidis⁵,⁶,⁷,⁸ &
Cornelia M. van Duijn¹

Abstract

Distinguishing true from false positive findings is a major challenge in human genetic epidemiology. Several strategies have been devised to facilitate this, including the positive predictive value (PPV) and a set of epidemiological criteria known as the “Venice” criteria. The PPV measures the probability of a true association given a statistically significant finding, while the Venice criteria grade credibility based on the amount of evidence, consistency of replication and protection from bias. The vast majority of journals use significance thresholds to identify true positive findings. We studied the effect of p value thresholds on the PPV and used the PPV and Venice criteria to define usable thresholds of statistical significance. Theoretical and empirical analyses of data published on AlzGene show that at a nominal p value threshold of 0.05 most “positive” findings will turn out to be false if the prior probability of association is below 0.10, even if the statistical power of the study is higher than 0.80. However, even in underpowered studies (power of 0.25) with a low prior probability of 1 × 10⁻³, a p value of 1 × 10⁻⁵ yields a high PPV (>96%). We show that a p value threshold of 1 × 10⁻⁵ gives very strong evidence of association in almost all studies. However, in the case of a very high prior probability of association (0.50) a p value threshold of 0.05 may be sufficient, while for studies with a very low prior probability of association (1 × 10⁻⁴; genome-wide association studies, for instance) 1 × 10⁻⁷ may serve as a useful threshold to declare significance.
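The relationship between prior probability, power, p value threshold and PPV described in the abstract can be illustrated with a short calculation. The sketch below assumes the standard Bayes' rule form, PPV = power × prior / (power × prior + α × (1 − prior)); the exact formulation used in the full text is not visible in this preview, and the function name `ppv` and the scenario labels are illustrative only.

```python
def ppv(prior: float, power: float, alpha: float) -> float:
    """Positive predictive value of a significant finding (Bayes' rule).

    Assumed form, not taken from the paper's full text:
    prior -- prior probability that the association is true
    power -- probability of a significant result when it is true
    alpha -- p value threshold (probability of a significant result when it is not)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Scenarios loosely following the numbers quoted in the abstract.
# Where the abstract gives no power, 0.80 is an assumption.
scenarios = [
    ("underpowered, low prior, strict threshold", 1e-3, 0.25, 1e-5),
    ("very high prior, nominal threshold", 0.50, 0.80, 0.05),
    ("GWAS-like prior, genome-wide threshold", 1e-4, 0.80, 1e-7),
]

for label, prior, power, alpha in scenarios:
    print(f"{label}: PPV = {ppv(prior, power, alpha):.4f}")
```

Under these assumptions the first scenario gives a PPV of about 0.96, in line with the ">96%" figure quoted in the abstract.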

Acknowledgments

The study was supported by grants from the Centre for Medical Systems Biology (CMSB) and the ENGAGE Consortium, grant agreement HEALTH-F4-2007-201413. The genetics databases used for this project have been made possible by the kind support of the Cure Alzheimer’s Fund (CAF), the Michael J. Fox Foundation (MJFF) for Parkinson’s Research, the National Alliance for Research on Schizophrenia and Depression (NARSAD), Prize4Life, and EMD Serono. C.M.L. is supported by the Fidelity Biosciences Research Initiative. L. Bertram is supported by the German Ministry for Education and Research (BMBF).

Conflict of interest

The authors declare no competing interests.

Author information

Authors and Affiliations

Department of Epidemiology, Erasmus University Medical Center Rotterdam, PO Box 2040, 3000 CA, Rotterdam, The Netherlands
Linda Broer, Najaf Amin & Cornelia M. van Duijn
Neuropsychiatric Genetics Group, Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
Christina M. Lill, Johannes T. Roehr & Lars Bertram
Department of Neurology, University Medical Center of the Johannes Gutenberg-University, Mainz, Germany
Christina M. Lill
Department of Neurology, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Maaike Schuur
Department of Medicine, Stanford Prevention Research Center, Stanford, CA, USA
John P. A. Ioannidis
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
John P. A. Ioannidis
Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA, USA
John P. A. Ioannidis
Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
John P. A. Ioannidis

Corresponding author

Correspondence to Cornelia M. van Duijn.

Additional information

Linda Broer, Christina M. Lill, and Maaike Schuur are shared first authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 364 kb)

About this article

Cite this article

Broer, L., Lill, C.M., Schuur, M. et al. Distinguishing true from false positives in genomic studies: p values. Eur J Epidemiol 28, 131–138 (2013). https://doi.org/10.1007/s10654-012-9755-x

Received: 01 May 2012
Accepted: 11 December 2012
Published: 01 February 2013
Issue Date: February 2013
DOI: https://doi.org/10.1007/s10654-012-9755-x
