Abstract
Background Several genetic variants are associated with the risk of idiopathic pulmonary fibrosis (IPF). These have not been systematically reviewed.
Methods We searched the PubMed, Embase and GWAS Catalog databases for studies indexed between inception and 15 January 2024 describing genetic variants associated with IPF susceptibility. We included studies describing common associated single nucleotide polymorphisms (SNPs). We excluded studies describing rare variants, non-SNP variants and those without an allelic model analysis. We recorded study type, participant characteristics, genotyping methods, IPF diagnostic criteria, the SNPs and the respective genes, odds ratios, and other details. We also searched databases for functions of the identified genes.
Results The primary search retrieved 2697 publications; we included 42 studies. There were nine genome-wide association/linkage studies, while 27 were candidate gene studies. The studies included 22–11 160 IPF subjects. 88 SNPs in 58 genes or loci were found associated with IPF susceptibility. MUC5B rs35705950 was the most studied SNP. Most (n=51) SNPs were in the intronic or intergenic regions; only 11 were coding sequence variants. The SNPs had odds ratios ranging from 0.27 to 7.82 for an association with IPF. Only 22 SNPs had moderate–large effects (OR >1.5 or <0.67). Only 49.1% of the associated genes have a known functional role in IPF; the roles of G protein-related signalling and transcriptional regulation (zinc-finger proteins) remain unexplored.
Conclusion Several common SNPs in over 50 genes have been found associated with IPF susceptibility. These variants may inform gene panels for future studies (PROSPERO CRD42023408912).
Shareable abstract
Our systematic review suggests that several common single nucleotide polymorphisms in over 50 genes have been found associated with susceptibility to idiopathic pulmonary fibrosis. The functional role of more than half of them remains unexplored. https://bit.ly/3VTqxkC
Introduction
Idiopathic pulmonary fibrosis (IPF) is a progressive lung disorder caused by aberrant wound repair mechanisms following epithelial injury [1]. About a third of the risk of IPF may be attributable to common genetic variants [2, 3]. It is proposed that IPF is highly polygenic with potentially hundreds of unknown variants associated with disease susceptibility [4]. Genetic association studies are crucial: they improve our understanding of disease pathogenesis, and drug development based on genetically supported pathways is more likely to succeed [4].
Candidate gene studies have found associations between some germline genetic variants and IPF [5–7]. In the past decade, genome-wide association studies (GWASs) have discovered several other variants using a hypothesis-free approach [2, 4, 8]. Genes related to airway mucin production, surfactant biology, telomere maintenance, cell–cell adhesion, cell cycle regulation, Toll-like receptor signalling and cytokine signalling have been linked to IPF [9]. While the downstream mechanisms of some genetic variants have been elucidated, several others remain unexplored. Even the known variants have been explored in only a few ethnic groups. Both variant discovery studies and confirmation studies need to be extended to other populations.
No systematic review of the known genetic variants implicated in IPF has been published. We saw a need to compile a list of variants associated with IPF to act as a resource for informing future research studies. A panel of “hotspots” constructed from such a compendium might provide a starting point for genetic studies in resource-constrained settings. Herein, we present a systematic review of studies that have described an association of common single nucleotide polymorphisms (SNPs) with IPF occurrence.
Methods
This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) collaboration 2020 recommendations. The registered protocol is available through PROSPERO (CRD42023408912).
Search strategy
We searched the PubMed, Embase and GWAS Catalog databases from inception to March 2023 for studies reporting associations between SNPs and IPF. We updated the search on 15 January 2024. We used the following search string: (“idiopathic pulmonary fibrosis” OR “familial interstitial pneumonia” OR “familial pulmonary fibrosis”) AND (mutation OR polymorphism OR variant OR “single nucleotide polymorphism” OR sequencing) AND (gene OR genetic OR genome OR exome). We did not apply any restrictions. We also searched the reference lists of the selected articles and our personal files for any other relevant studies.
Study selection
Two authors (S.D. and R.S.) screened the titles and abstracts of all studies. Studies were selected for full-text evaluation if they were primary articles investigating an association between IPF and an SNP. We resolved disagreements between reviewers by discussion. Candidate gene studies and GWASs with a control group that reported association(s) of a common or low-frequency SNP (minor allele frequency (MAF) ≥1% in the general population) with IPF were included. Both discovery and validation studies were eligible. We excluded 1) non-English articles, editorials, conference abstracts and proceedings, preprints, letters and correspondence, case reports and review articles, 2) articles or information from articles on rare variants (MAF <1% in the general or control population), 3) articles or information from articles on genetic variants other than SNPs, and 4) articles in which an allelic model had not been studied or presented. The study outcome was either an increased or decreased risk of IPF in subjects with the variant allele relative to controls expressed as an odds ratio with 95% confidence intervals and the level of significance.
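The outcome measure described above, an allelic odds ratio with a 95% confidence interval, can be illustrated with a minimal sketch. The Woolf (log-scale) interval used here is an assumption, as the included studies may have used other methods or adjusted models; the function name and the allele counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_effect, case_other, control_effect, control_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (two alleles per diploid subject),
    not subject counts. z=1.96 gives an approximate 95% interval.
    """
    counts = (case_effect, case_other, control_effect, control_other)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    # Odds of carrying the effect allele in cases divided by the odds in controls.
    odds_ratio = (case_effect * control_other) / (case_other * control_effect)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 100 cases and 100 controls, i.e. 200 alleles per group.
or_value, ci_lower, ci_upper = allelic_odds_ratio(60, 140, 30, 170)
print(f"OR = {or_value:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level; it does not account for multiple testing.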
Data extraction and synthesis
The information from selected articles was entered into an MS Excel spreadsheet. For each study, the following data were extracted: study type, country, the IPF subject cohort included, ethnicity or ancestry, number of cases and controls, the variables matched between cases and controls, genotyping method, gene name, dbSNP ID (also called rsID, a unique identifier assigned to a SNP submitted to the SNP database (dbSNP) of the National Center for Biotechnology Information (NCBI)), chromosome location, effect allele, other allele, minor or effect allele frequency in the cases and controls, OR (95% CI), p-value, and statistical adjustments (such as Bonferroni correction). If both discovery and validation cohorts were present and a meta-analysis had been performed, we reported the results of the meta-analysis. We preferred results that were best adjusted for various factors such as the number of signals studied. We applied a standard p-value of <0.05 for single candidate gene studies. We have reported the significance level with Bonferroni correction for candidate gene studies analysing more than one SNP, even if it was not reported by the study. We derived the Bonferroni-corrected significance level by dividing 0.05 by the number of SNPs tested. For GWASs, in general, a p-value <5×10⁻⁸ was used, unless specified otherwise.
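The significance rules above can be sketched as follows. This is an illustrative reading of the text rather than the code used for the review; the function names and the example p-values are hypothetical.

```python
GWAS_THRESHOLD = 5e-8  # genome-wide significance level stated in the text
NOMINAL_ALPHA = 0.05   # level applied to single-SNP candidate gene studies


def bonferroni_threshold(n_snps_tested, alpha=NOMINAL_ALPHA):
    """Bonferroni-corrected significance level: alpha divided by the number of SNPs tested."""
    if n_snps_tested < 1:
        raise ValueError("at least one SNP must have been tested")
    return alpha / n_snps_tested


def is_significant(p_value, study_type, n_snps_tested=1):
    """Apply the threshold that matches the study type ('gwas' or 'candidate')."""
    if study_type == "gwas":
        return p_value < GWAS_THRESHOLD
    if study_type == "candidate":
        return p_value < bonferroni_threshold(n_snps_tested)
    raise ValueError("study_type must be 'gwas' or 'candidate'")


# Hypothetical example: a candidate gene study testing 20 SNPs.
print(bonferroni_threshold(20))
print(is_significant(0.004, "candidate", n_snps_tested=20))
print(is_significant(0.004, "candidate", n_snps_tested=1))
```

With 20 SNPs tested, a p-value of 0.004 passes the nominal 0.05 level but not the corrected one, which is why the corrected level was reported even when a study did not apply it.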
Study quality
We assessed study quality using the quality of genetic association studies (Q-Genie) tool [10]. The tool has 11 domains, each domain having a maximum score of 7. Higher scores represent better study quality. We considered a study to be of good quality if its Q-Genie score was >45, moderate quality if the score was 36–45 and poor quality if the score was 35 or less.
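The quality bands above can be expressed as a short sketch. It assumes integer total scores and takes the upper bound of 77 from the 11 domains with a maximum of 7 each; the function name is hypothetical.

```python
MAX_TOTAL_SCORE = 11 * 7  # 11 domains, each with a maximum score of 7


def qgenie_quality(total_score):
    """Map a total Q-Genie score to the quality band used in this review.

    Assumes an integer total; the lower bound of the scale is not stated
    in the text, so only non-negative totals up to 77 are accepted.
    """
    if not 0 <= total_score <= MAX_TOTAL_SCORE:
        raise ValueError("total score must lie between 0 and 77")
    if total_score > 45:
        return "good"
    if total_score >= 36:
        return "moderate"
    return "poor"


# Hypothetical totals for three studies.
for score in (55, 40, 30):
    print(score, qgenie_quality(score))
```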
Secondary searches
We searched for the reported SNPs on the SNP database of the NCBI of the National Library of Medicine. We recorded the chromosome locations based on the Genome Reference Consortium Human Build 38 (GRCh38). We also recorded the functional consequences. We searched the Gene database of the NCBI and the Human Gene Database (www.genecards.org) for the known and predicted functions of the genes with SNPs associated with IPF in the included studies. We also searched the PubMed and Google Scholar databases with terms containing the gene name AND “pulmonary fibrosis” to find studies describing the mechanisms by which the respective genes contributed to the pathogenesis of pulmonary fibrosis.
Results
The primary search retrieved 2697 studies (figure 1). We assessed 126 publications in detail and included 42 for our analysis [2, 4–8, 11–46]. We have presented studies that found no association with IPF in table S1. We did not extract further detailed information from these studies.
Demographics and IPF diagnosis
Most (26/42) studies were from North America and Europe while nine studies were from Asia (table 1). 19 studies included only Caucasian subjects (table S2). There was substantial overlap between the IPF patient cohorts used in some studies. The cohorts from the University of Chicago, University of Pittsburgh, National Jewish Hospital, University of California San Francisco, the Lung Tissue Research Consortium and several cohorts from IPF drug trials have been frequently utilised in the included studies [2, 11, 12, 14, 23, 26]. Nearly all studies used the contemporaneous American Thoracic Society/European Respiratory Society criteria for IPF diagnosis [1, 47–51]. Partanen et al. [41] used various diagnostic criteria including physician diagnosis, self-reports, the International Classification of Diseases and others for IPF diagnosis.
Study quality
Most (74%) studies were of moderate-to-good quality (table S3). The mean Q-Genie score for GWASs was 55, while that for candidate gene studies was 40 (p<0.001). About 82% of GWASs were of good quality compared to only 26% of the candidate gene studies (p<0.001).
Study subjects
The studies included 22–11 160 IPF subjects; the number of control subjects ranged from 50 to nearly 1.3 million [5, 33, 41]. 11 studies specified whether they included familial or sporadic IPF subjects; eight of these included both familial and sporadic cases. 14 studies included controls matched to the cases by age, sex or smoking, while 20 studies used unmatched controls.
Study methods
All the studies used a case–control design. There were nine GWASs; two other studies presented an imputation or meta-analysis from an earlier GWAS (table 1) [2, 4, 23, 38]. There were 27 candidate gene studies. The GWASs used array-based genotyping or whole genome sequencing. Candidate gene studies used varied genotyping methods; the TaqMan SNP genotyping assay was the most common. Candidate gene studies tested 1–20 SNPs (table S2). 18 studies appropriately used statistical correction for multiple hypothesis testing. Nine studies performed multiple hypothesis testing but did not correct statistically for it.
Genes and SNPs associated with IPF
Overall, the studies reported the association of 88 SNPs with IPF (table 2) in 58 genes or loci (53 genes, five loci). The most frequently described variant (27 studies) was the rs35705950 SNP in the MUC5B gene (table S4), followed by rs2076295 in the desmoplakin (DSP) gene. Several loci in the genes MUC2, MUC5AC and TOLLIP lost their significant association with IPF after conditioning on MUC5B rs35705950 [4]. Most (n=51) SNPs found in the included studies were in the intronic or intergenic regions; only 11 SNPs were in the coding sequence of a gene (table S5). The reported SNPs had ORs ranging from 0.27 to 7.82 for an association with IPF; the highest OR was reported for the HECTD2 rs537322302 SNP followed by the MUC5B rs35705950 SNP (table 2). Only 22 SNPs had ORs of >1.5 or <0.67.
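The effect-size cut-offs above (OR >1.5 or <0.67) can be applied to a list of odds ratios as in the sketch below. The cut-offs are taken from the text, and 0.67 is approximately the reciprocal of 1.5, so the rule is roughly symmetric for risk and protective alleles. The function name and the example values other than 0.27 and 7.82 are hypothetical.

```python
UPPER_CUTOFF = 1.5   # risk-increasing alleles with OR above this value
LOWER_CUTOFF = 0.67  # protective alleles with OR below this value (about 1/1.5)


def effect_size_category(odds_ratio):
    """Classify an odds ratio as 'moderate-large' or 'small' using the review's cut-offs."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    if odds_ratio > UPPER_CUTOFF or odds_ratio < LOWER_CUTOFF:
        return "moderate-large"
    return "small"


# 0.27 and 7.82 are the extremes reported in the review; 0.80 and 1.20 are illustrative.
for value in (0.27, 0.80, 1.20, 7.82):
    print(value, effect_size_category(value))
```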
Gene functions
13 genes with identified variants potentially affected fibroblast biology and extracellular matrix (ECM) synthesis or breakdown, while eight were linked to cell replication and cell cycle control (table 3, figure 2). Five each were linked to inflammation-related pathways and telomere maintenance. Potential pathways affected by the implicated genes that remain little explored include G protein-related signalling pathways (four genes) and the regulation of gene transcription and expression (three genes). About 11 genes affect other cellular processes and pathways; their role in IPF remains unknown. Overall, the available literature described a mechanism of contribution to pulmonary fibrosis for only 26 (49.1%) of the 53 genes with associated SNPs (table 3).
Discussion
We found that SNPs in over 50 genes or regions have been linked to IPF in genomic DNA studies. Most genetic variants were in the noncoding regions of the genome. The mechanisms by which the implicated genes contribute to pulmonary fibrosis have been elucidated for fewer than half of these genes.
Most variants reported here have been discovered in GWASs using a “hypothesis-free” approach. Only 22 of the 88 SNPs had odds ratios of >1.5 or <0.67 suggesting moderate–large effect sizes (table 2). The remaining majority reflect small effects, indicating the highly polygenic nature of IPF. Experts suggest that GWAS signals that have an OR <1.5 are mostly attributable to cryptic population stratification [52, 53]. Moreover, different studies, especially GWASs, have identified different IPF risk SNPs in the same gene that lie close to each other based on their chromosomal location. These are likely in the same genomic risk loci and are likely to be in significant linkage disequilibrium [54]. Thus, the number of distinct loci associated with IPF might be much smaller. This reiterates the fact that only a small proportion of the heritability of IPF is explained by the known genetic variation while large numbers of variants remain undiscovered [4]. A recent GWAS based on whole genome sequencing data estimated the SNP heritability of IPF to be about 32% [46].
Several variants are associated with IPF in certain ethnicities only. We found that SNPs in CR1, HYAL1, TGFB1 and others that have been reported to increase IPF susceptibility in candidate gene studies in Asians have not been identified as risk loci in GWASs in Caucasians [4, 7, 29, 38]. Four of the seven novel variants reported in a recent multi-ancestry GWAS were mostly driven by non-European ancestry [41]. The MUC5B rs35705950 SNP increases IPF susceptibility in Asians, but its minor allele is much less frequent in the general Asian population [55]. As our review shows, there is a dearth of GWASs in populations other than Caucasians [8, 41]. Future GWASs in unexplored ethnic populations may identify several novel IPF risk loci.
The SNPs discovered in the included studies were in the genes associated with mucin production, fibroblast biology, ECM metabolism, cell cycle control and others, implicating known pathways involved in IPF pathogenesis [9]. Several genes discovered in recent GWASs are linked to novel pathways with no known mechanisms in relation to pulmonary fibrosis. Notable amongst these are genes encoding proteins involved in G-protein related signal transduction, also known as molecular switches (ARHGDIG, ATP11A, GPR157, RAPGEF2), and members of the zinc finger family proteins involved in the regulation of transcription and expression of other genes (MYNN, ZKSCAN1, ZNF462). The genes ATP11A and ZKSCAN1 were also identified by an earlier GWAS [4]. In addition, most reported genetic variants lie in the noncoding, mostly intronic regions. Intronic variants may be associated with the expression of the same or a neighbouring gene and may be in linkage disequilibrium with a genetic variant in the coding or promoter region [52]. Post-GWAS experiments including resequencing of the implicated genes and gene expression studies might reveal important causal variants for lung fibrogenesis.
The number of IPF subjects included in the GWASs of IPF has increased from 242 in a 2008 study [8] to 11 160 in the 2022 study by Partanen et al. [41]. The latter study is the largest GWAS meta-analysis performed to date and includes subjects from different ethnicities around the world. With the availability of larger genome-wide studies, the role of candidate gene studies may arguably become irrelevant. Yet, we chose to include both small candidate gene studies and GWASs in this review as both study types have strengths and limitations. GWASs help in identifying novel variant–trait associations, sometimes leading to the discovery of novel biological mechanisms [11, 52]. The data from these studies can be shared easily, resulting in progressively larger meta-analyses that might discover newer variants [4, 41]. Meta-analysis of genetic data across ancestries might magnify the associations of causal variants and tone down the association of proxies [52]. It would also help identify the effects of cryptic population stratification in GWASs. Recently, a GWAS found a significant sex interaction for three novel SNPs in IPF while other common SNPs such as MUC5B did not have a sex interaction effect [56]. Such interaction studies help refine polygenic risk scores, which may, in future, become predictive tools. However, GWASs alone have limited clinical predictive value and cannot, by themselves, identify causal variants. Candidate gene studies are also crucial, as they can be performed with fewer resources, covering smaller but much more diverse populations, and in patient cohorts much better characterised than larger biobank-based databases. They are not burdened by multiple testing, may reveal associations with larger effects and allow detailed downstream experiments that might pinpoint the actual causal variants and their pathogenetic mechanisms [52].
We have attempted to compile all the reported SNPs associated with IPF to act as a resource for informing future research studies. We have provided detailed information on the locations, dbSNP IDs and known functions of the implicated genes and variants that can act as a ready reference to construct gene panels. Such panels can be used for clinical cohort characterisation as well as cohort stratification for drug trials. There is a lack of information on IPF genetics in several ethnic groups around the globe. Ideally, GWASs must be performed in these populations to confirm known associations and find novel ones. As GWASs require greater resources, a panel of “hotspots” constructed from the set of SNPs compiled herein might provide a starting point for genetic studies in resource-constrained settings. Gene or SNP sets can also be designed to inform studies on polygenic risk scores. Our review also highlights the unmet need to explore the mechanisms by which the associated SNPs affect cellular processes. Alternatively, the actual protein-altering variants must be discovered for which these SNPs might be acting as proxies. Currently available SNP data from GWASs and their meta-analyses and the novel data from future large-scale genomic studies can be analysed using artificial intelligence (AI) systems that are improving their performance exponentially [57]. Conventional variant-calling tools are likely to be subject to systematic errors and biases due to subtleties of sample preparation, sequencing technology and context, and biologic phenomena such as somatic mosaicism. AI algorithms can process these biases with a known reference standard of variant calls and produce better calls [57].
Our review has several limitations. We have described only common SNPs and not included other types of genetic variation such as rare variants, insertions, deletions, copy number variants, translocations and others. We also excluded studies that did not describe an allelic model. We consciously chose to restrict ourselves to allelic models of common (and low frequency) SNPs to keep the review concise and focused. Also, most reported signals come from GWASs that detect mainly SNPs and describe the allelic model. We have not performed meta-analyses of the association of individual SNPs with IPF. Our purpose was to compile the large number of SNPs associated with IPF for informing future research studies and not to calculate the summary effects of the individual SNPs. Meta-analyses of studies on the association of MUC5B and TGFB1 with IPF are already available [55, 58, 59]. For the interested reader, we have summarised the studies excluded because of the absence of an association of the explored SNP with IPF or lack of the description of an allelic model (table S1).
To conclude, several common SNPs in over 50 genes have been found associated with IPF susceptibility in different studies. The reviewed variants may be used to inform gene panels for future studies.
Questions for future research
Which SNPs or other genetic variants affect susceptibility to IPF in hitherto unexplored populations?
What are the functions of the genes found linked to IPF in GWASs and candidate gene studies?
By what mechanisms do the implicated genes influence the pathogenesis of pulmonary fibrosis?
Do intronic variants found linked to IPF act as proxies for certain variants in the coding or transcription-regulating regions of the genome?
Is it feasible to devise a gene or SNP panel to characterise IPF susceptibility in clinical cohorts or in study populations recruited for pharmacological research?
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERR-0018-2024.SUPPLEMENT
Footnotes
Provenance: Submitted article, peer reviewed.
Data sharing: Data collected for the study is from published literature and will be made available to qualified researchers after the publication of the manuscript upon reasonable request made to the corresponding author (S. Dhooria).
Conflict of interest: All authors have nothing to disclose.
- Received January 30, 2024.
- Accepted June 18, 2024.
- Copyright ©The authors 2024
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org