Abstract
Epigenetic modifications are emerging as important regulatory mechanisms of gene expression in lung disease, given that they are influenced by environmental exposures and genetic variants, and that they regulate immune and fibrotic processes. In this review, we introduce these concepts with a focus on the study of DNA methylation and histone modifications and discuss how they have been applied to lung disease, and how they can be applied to sarcoidosis. This information has implications for other exposure and immunologically mediated lung diseases, such as chronic beryllium disease, hypersensitivity pneumonitis, and asbestosis.
In this review, we summarise recent findings relating to epigenetic modifications in lung disease with a focus on sarcoidosis, and propose guidelines and rationale for future epigenomic studies of sarcoidosis https://bit.ly/3d7Is1B
Introduction to epigenetics
Epigenetics refers to potentially heritable changes in gene expression that are not the result of changes to the underlying DNA sequence. Environmental exposures and stimuli influence epigenetic processes that in turn mediate gene expression, often by modifying chromatin and its 3-dimensional structure. As such, the epigenome (the sum of epigenetic modifications across the individual's genome) is influenced both by the environment and underlying genetics. Unlike mutations to DNA sequence, epigenetic marks can be dynamic and reversible, rendering them excellent targets for therapeutics, as well as biomarkers to track disease course and distinguish subgroups in heterogeneous disease. Interestingly, they may also be inherited and pass on traits and the effects of exposures from parents or even grandparents to offspring. These observations have been difficult to explain owing to a global loss of the majority of known epigenetic marks present in parental chromosomes during primordial germ cell development and after fertilisation. However, potential mechanisms of transgenerational epigenetic inheritance are being unravelled [1]. One of the key recent observations is that transcription factors (TFs) may act as carriers of epigenetic information during germ cell and pre-implantation development [2]. Epigenetic marks have been associated with many biological processes such as ageing and dosage compensation between sexes, in which the non-coding RNA Xist is transcribed from the inactivated X chromosome and is required for stable silencing. Xist expression has been shown to be regulated by both DNA methylation and histone modifications [3]. Three of the most studied epigenetic processes are DNA methylation, modification of histone tails, and non-coding RNAs such as microRNAs (miRNAs) and long non-coding RNAs. Given the recent review of miRNAs in sarcoidosis [4], we will mainly focus on DNA methylation and histone modifications in this review (figure 1).
Definitions of the commonly used terms in epigenetic studies are given in table 1.
Epigenetic regulation of DNA accessibility. Histone acetyltransferases (HATs) and methyltransferases (HMTs) add acetyl (Ac) and methyl (Me) groups to specific residues on histone tails to modulate chromatin compaction. DNA methyltransferases (DNMTs) methylate CpG sites. Promoter methylation blocks binding of RNA polymerase (POL) and transcription factors to negatively regulate gene expression. HDAC: histone deacetylase.
Commonly used terms in epigenetic studies
DNA methylation
DNA methylation refers to the addition of a methyl group to cytosine bases in DNA, which can alter accessibility of genetic loci to transcriptional machinery as well as recruit downstream proteins for chromatin remodelling. Methylation is often concentrated in gene bodies, repeat sequences and intergenic regions, with methylation in the promoters of housekeeping genes being rare to allow for gene transcription to occur [5]. Increased DNA methylation in the context of regulatory elements often indicates decreased accessibility to that locus by proteins and other molecules and leads to repression of transcription. However, recent studies have identified TFs that bind methylated DNA [6]. DNA methyltransferases (DNMTs) target CpG dinucleotides, which are loci characterised by a cytosine followed by a guanine in the DNA sequence, and catalyse the transfer of a methyl group to the 5th carbon of cytosine, forming 5-methylcytosine. Cytosine methylation in a non-CpG context is rare and is mainly restricted to stem cells and brain tissue [7]. Sites of epigenetic variability are often marked with CpG islands, conserved GC-rich regions of the genome with greater-than-average concentrations of CpGs, and more recently identified nearby elements, CpG island shores [8]. CpG islands are often found in the regulatory regions of genes such as cis promoters and more distant regulatory elements such as enhancers [9]. Demethylation of CpGs can occur either actively, through successive oxidation reactions catalysed by the ten-eleven translocation (TET) family of enzymes and eventual replacement with cytosine through base excision repair [10], or passively, in which newly synthesised DNA is not methylated by DNMTs. DNA methylation can be assayed genome wide through sodium bisulfite conversion. Sodium bisulfite causes deamination of cytosine to uracil (which is ultimately amplified as thymine), but this reaction occurs orders of magnitude more slowly in 5-methylcytosine [11].
Next-generation sequencing or array-based genotyping can then be performed, with thymines mapping to expected cytosine positions representing unmethylated sites. The most commonly used platform for studies of DNA methylation in human samples is the Infinium MethylationEPIC BeadChip, which interrogates 850k CpGs [12]; Illumina has also recently released the Infinium Mouse Methylation BeadChip, with coverage for 285k CpGs in the mouse genome.
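The read-out logic of bisulfite conversion can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a description of any published pipeline: conversion is assumed to be complete, reads are assumed to be error free and to span the whole top strand of a short made-up reference, and the function names (`bisulfite_convert`, `methylation_fraction`) are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite treatment and amplification.

    Unmethylated cytosines are deaminated and read as T; cytosines at the
    given (0-based) methylated positions are protected and still read as C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def cpg_positions(reference):
    """0-based positions of the cytosine in every CpG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def methylation_fraction(reference, reads):
    """Per-CpG fraction of reads that kept a C (methylated) rather than a T."""
    fractions = {}
    for pos in cpg_positions(reference):
        c_count = sum(1 for read in reads if read[pos] == "C")
        t_count = sum(1 for read in reads if read[pos] == "T")
        if c_count + t_count > 0:
            fractions[pos] = c_count / (c_count + t_count)
    return fractions


# Toy reference with CpGs at positions 1, 5 and 9 (assumed example data).
reference = "ACGTTCGACCGA"
# Four DNA molecules, each with its own set of methylated cytosines.
molecules = [{1, 5}, {1}, {1, 5}, {1}]
reads = [bisulfite_convert(reference, m) for m in molecules]
print(methylation_fraction(reference, reads))
```

In this toy example the first CpG is methylated in every molecule, the second in half of them and the third in none, which is the kind of per-site proportion that sequencing and array platforms estimate.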
Histone modifications
Histone proteins form octamers around which DNA is wrapped to form nucleosomes [13]. Compaction of DNA into chromatin is dependent upon this process. Chromatin density can be dynamically altered through chemical modifications to the protruding tails of certain histones by histone-modifying enzymes. Histone tails can be modified in many ways, including methylation, acetylation, ubiquitylation, and SUMOylation (addition of small ubiquitin-like modifiers), as well as the removal of chemical groups that result from these processes [14]. Acetylation of histone tails is often associated with gene activation, as acetylation of lysine residues neutralises the positive charge that otherwise increases affinity between histones and DNA, resulting in decompaction of chromatin at that locus. A common acetylation mark, that of lysine 27 on histone H3 (H3K27ac), is a mark of active enhancers and active promoters. Histone acetyltransferases acetylate histone tails, histone deacetylases (HDACs) remove acetyl groups from histone tails, and bromodomain proteins are chromatin readers that recognise and bind acetylated histones and play a key role in transmission of epigenetic memory across cell divisions and in transcription regulation. Methylation is associated both with activation and repression of genes depending on the level of methylation at different residues in histone tails. For example, H3K4me1 is present at poised and active enhancers while H3K4me3 marks poised or active promoters. As with acetylation, dedicated enzymes write and erase these marks: histone methyltransferases add methyl groups to histone tails while histone demethylases remove them. The Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects have created maps of histone modifications in cell lines and primary cells/tissues, providing important public resources for the research community [15]. Histone modifications can be queried genome wide through chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) [16].
ChIP-seq involves cross-linking of proteins of interest to genomic DNA, fragmentation, pulling down proteins of interest with antibodies, and purification and amplification of bound DNA, which is then sequenced to provide a genome-wide view of protein binding.
Recent epigenomic advances
There are many new technologies that enable epigenomic research. While ChIP-seq provides detailed information on histone modifications associated with different chromatin states, it is costly and time-consuming to perform multiple ChIP-seq experiments. Because of this, the field has moved to broad assessment of chromatin accessibility [17], with the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) now commonly used. ATAC-seq utilises a transposase which preferentially inserts in open chromatin to ligate sequencing adapters to these regions. Sequencing then produces “pile-ups” of reads aligning to open chromatin regions of the genome [18]. ATAC-seq has been optimised for many sample types, including frozen cells/tissues not amenable to more traditional ChIP-seq protocols [19].
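The idea of read “pile-ups” marking open chromatin can be sketched in a few lines of Python. This is a simplified illustration only: reads are given as half-open (start, end) intervals on a single short toy region, and open regions are called with a fixed depth threshold, whereas real peak callers model background signal statistically. The function names and the threshold are assumptions made for the example.

```python
def coverage(reads, length):
    """Per-base read depth over a region of the given length.

    reads: list of (start, end) half-open intervals, 0-based.
    Uses a difference array so each read costs constant time.
    """
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for change in diff[:length]:
        running += change
        depth.append(running)
    return depth


def call_open_regions(depth, min_depth):
    """Return half-open intervals where depth stays at or above min_depth."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = i
        elif d < min_depth and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Toy aligned reads: three overlap near position 4, one lies on its own.
reads = [(2, 6), (3, 7), (4, 8), (12, 15)]
depth = coverage(reads, 16)
print(depth)
print(call_open_regions(depth, min_depth=2))
```

The isolated read at positions 12 to 15 does not reach the threshold, so only the region with overlapping reads is reported as accessible.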
The advent of single-cell sequencing, in which nucleic acid is measured in thousands of suspended single cells from tissue, has been applied to epigenomics, with single-cell methylation, histone modification, and non-coding RNA sequencing methodologies. Single-cell techniques allow for the quantification of thousands of cellular features in thousands of cells within the same experiment. Protocols have also been developed and are being implemented for large-scale experiments to assess multiple -omes [20]. Another important technology in epigenomics is high-resolution contact mapping, which assays chromatin contacts genome wide [21]. In this assay, chromatin is crosslinked using formaldehyde, fragmented, and proximity ligated. The resulting fragments are amplified and sequenced, providing a map of chromatin contacts.
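How sequenced ligation products become a contact map can be shown with a small Python sketch. It assumes each ligation product has already been mapped to a pair of genomic coordinates on one toy chromosome, and it skips the filtering and normalisation steps that real analyses require. The function name, bin size and coordinates are hypothetical.

```python
def contact_matrix(pairs, bin_size, n_bins):
    """Count ligation pairs per pair of genomic bins (symmetric matrix)."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # A contact between bins i and j is the same as one between j and i.
            matrix[j][i] += 1
    return matrix


# Toy mapped pairs: two long-range contacts and one local contact.
pairs = [(100, 2500), (150, 2600), (1200, 1300)]
for row in contact_matrix(pairs, bin_size=1000, n_bins=3):
    print(row)
```

Off-diagonal counts in such a matrix correspond to distant loci that were close in three-dimensional space when the chromatin was crosslinked.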
Epigenetics and the immune system
Epigenetic regulation is essential in the development and function of the immune system, including cellular differentiation. This is seen in the differentiation of naïve CD4+ helper T-cells (Th) into distinct Th subsets [22, 23]. Cell fate is determined by the presence of signalling molecules that can induce and repress expression of certain transcription factors, as their active transcription drives differentiation. For example, the cytokine interleukin (IL)-12 drives differentiation of Th1 cells, which are characterised by the active transcriptional state of the TF T-bet. Th2 differentiation, on the other hand, is induced by IL-4, with transcription of GATA-3 marking mature Th2 cells. These processes are regulated by DNA methylation and histone modifications [22]. Specifically, methylation of a DNaseI-hypersensitive region in the 3′ end of IL-4 has been associated with Th1 differentiation, while demethylation of a proximal promoter and regulatory elements in the IL-4 gene results in Th2 differentiation [24, 25]. Th1/Th2 cytokine production is affected by methylation of the promoter of interferon (IFN)-γ, impacting TF binding [26, 27]. Similarly, the TF FOXP3, regulated by DNA methylation, is critical for differentiation of regulatory T-cells [23].
Exposures and epigenetics
Environmental exposures often result in epigenetic changes that can drive, modify or indicate disease. As such, epigenetic changes may mediate the effects of exposures on human health. Epigenetics has already been shown to play a significant role in the development, maintenance and responses of the respiratory system, while studies linking exposure to air pollution, epigenetics and disease are emerging [28]. Tobacco smoke has long been known to be linked to decreased lung function and risk of COPD and lung cancer [29]. Exposure of airway and bronchial epithelial cells to cigarette smoke results in histone modifications associated with transcriptional repression and generalised genome-wide demethylation [30]. Exposure to air pollution has been linked to increased morbidity and mortality [31]. Inhalation of particulate matter from traffic-related air pollution and of other byproducts produced in certain occupations has been associated with declining lung health and respiratory disease [32].
Introduction to sarcoidosis
Sarcoidosis is a granulomatous disease affecting an estimated 60 out of 100 000 Americans per year, with higher prevalence observed in African–Americans and women [33]. Outcomes are variable and associated with race, sex and lower income [34]. Women tend to present at older ages and with less severe disease as measured by the Scadding scale [35]. While sarcoidosis is a systemic disease, affecting almost any organ, or multiple organs, in the body, the most prominent and severe manifestations are related to lung, heart and neurologic involvement as they are associated with increased morbidity and mortality. Pulmonary sarcoidosis is present in 90% of affected individuals [36, 37] and can be asymptomatic; if symptoms manifest, coughing, wheezing, and difficulty breathing are common, as with many interstitial lung diseases. Pulmonary fibrosis, which tends to occur later in disease, is the main cause of mortality in affected individuals. The course of disease is difficult to predict, with at least 25% of patients developing chronic or progressive disease requiring treatment [38] while acute disease often resolves within months without the need for treatment. Patients with chronic disease are older and more often male [35]. Treatment is generally initiated in symptomatic patients experiencing organ impairment, reduced quality of life or risk of complications, and generally consists of corticosteroids such as prednisolone, or immunosuppressants such as methotrexate and azathioprine in steroid-intolerant patients and as steroid-sparing agents in long-term therapy for patients with chronic disease [39].
At the molecular level, sarcoidosis appears to have many similarities to other granulomatous diseases, including chronic beryllium disease (CBD) [35]. The pathologic hallmark of sarcoidosis is the noncaseating granuloma, characterised by a cluster of immune cells with a core of macrophages, histiocytes and multinucleated giant cells surrounded by a layer of lymphocytes. A Th1-predominant immune response is noted in sarcoidosis, with increased IFN-γ and tumour necrosis factor (TNF)-α in sera, lymph nodes and bronchoalveolar lavage (BAL) [40]. Fibroblasts and collagen can be found surrounding and extending from granulomas and likely drive fibrosis of affected organs. Granulomas are thought to form in response to foreign substances, including certain antigens, organisms and foreign bodies; antigen presentation drives T-cell differentiation and promotes an inflammatory response, including macrophage recruitment, which in turn results in release of chemokines to attract cells to the site of inflammation. Ultimately, granulomatous inflammation is thought to be an immunological mechanism to isolate and sequester antigens/pathogens/foreign substances in the body. Some granulomatous diseases, such as tuberculosis, are triggered by bacterial or fungal infection [41], while others, such as CBD and hypersensitivity pneumonitis, are driven by occupational and/or environmental exposures [42].
Despite an ultimately unknown aetiology, studies have identified many genetic, environmental and host immunologic risk factors associated with sarcoidosis development, course and outcomes [43].
Genetic risk factors
Familial clustering of cases has long indicated a likely genetic component to disease. Family studies suggest that between one-third and two-thirds of the risk of disease is due to genetic variation [44, 45]. Early genetic studies focused on the human leukocyte antigen (HLA) region, which showed both protective and risk-increasing alleles associated with sarcoidosis and related phenotypes in a variety of populations [46–53]. The ACCESS study showed that the HLA-DRB1*1101 allele was associated with disease in both European and African–Americans [49]. Interestingly, this allele has since been shown to significantly interact with insecticide exposure at work to confer increased sarcoidosis risk [54]. A number of studies have now identified interactions between genetic variants and exposures in sarcoidosis [55, 56].
Multiple association studies have identified further genetic associations with disease outside of the HLA region. Variants have been identified in genes such as ANXA11 [57–62], BTNL2 [63, 64], XAF1 [65] and NOTCH4 [66]. The GenPhenReSa consortium genotyped over 19 000 individuals on the Illumina Immunochip, and found novel associations mapping to the genes ATXN2/SH2B3, IL12B, MANBA/NFKB1, FAM117B and IL23R [67]. A recent genome-wide association study (GWAS) in a Japanese cohort identified signals in CCL24, STYXL1-SRRM3 and IL23R [68]. Many of the aforementioned genes have roles in immune response. However, most of the identified variants in these studies have small effect sizes, explaining a small proportion of disease risk, and are population specific. Furthermore, many identified genetic variants are intronic or intergenic and require further functional studies to elucidate their role in disease.
Genetic architecture has also been shown to affect disease presentation. Löfgren syndrome, a distinct phenotype of sarcoidosis characterised by joint pain and erythema nodosum, has a favourable prognosis and often resolves within 2 years. A recent study comparing sarcoidosis patients with and without Löfgren syndrome identified multiple genetic loci which confer risk for one phenotype but not the other [69].
Environmental risk factors
Many studies have focused on identifying causal antigens that may drive sarcoidosis; however, without a known and replicable antigenic stimulus, cell and animal models have been difficult to develop and often do not recapitulate key features of disease [70, 71]. Regardless, many associations between disease and environmental factors have been observed. Studies in multiple populations have observed variable incidence of disease by season, with incidence peaking in winter months in the USA [72, 73], as well as geographical clustering, such as in the north-eastern USA, suggesting a likely environmental cause or stressor [74]. The ACCESS study observed associations between sarcoidosis and agricultural employment, insecticide exposure, and work environments with mould/mildew exposures [75]. Multiple studies have also linked disease to wood combustion, including high rates of disease in firefighters [76, 77] and associations with wood stove and fireplace use [78], as well as to silica exposure [79–81]. While these exposures may directly drive disease, they may also predispose individuals to develop disease by compromising the host immune response.
Studies have implicated a possible microbial antigen in disease [82–84], with multiple groups finding mycobacterial DNA and proteins in sarcoidosis tissue samples [82, 84]. Despite this, there is no evidence of active mycobacterial infection in sarcoidosis patients. A recent 16S sequencing study interrogated pulmonary microbiota in sarcoidosis BAL compared to other interstitial lung diseases [85] and found no significant differences in composition or diversity between groups.
Immunological risk factors
Aside from genetic variation in immune genes such as the HLA region, host immunity is altered in disease. Recent studies have shown that Th17.1 cells are the predominant CD4+ T-cell subset in disease, originating from Th17 cells and expressing IFN-γ, and are associated with inflammation and autoimmunity [86]. Sarcoidosis BAL is heavily enriched for Th17.1 cells while Th17 and Th1 percentages did not differ from controls [87]. Broos et al. [88] showed that in sarcoidosis BAL, 48% of Th cells are Th17.1s, 17% are Th1s, and that Th17.1 cells are also enriched in the mediastinal lymph nodes.
T-cells in disease have also been shown to have impaired function, consistent with cell exhaustion. CD4+ T-cells in peripheral blood show increased expression of PDCD1, a gene which encodes PD1. This upregulation was confirmed in BAL and blood at the protein level [89]. Regulatory T-cells have also shown a diminished ability to counteract granulomatous inflammation in sarcoidosis compared to other granulomatous diseases [90]. Furthermore, biased T-cell receptor variable region gene repertoires have been observed in disease [91, 92].
Evidence for epigenetic regulation in sarcoidosis
The advancement of epigenomics, along with associated technologies and methodologies, allows for novel methods to study complex diseases for which traditional methods have proven ineffective. This is especially true in the respiratory system, which has many endogenous and exogenous inputs and is the site of many exposures. Several features of the disease indicate that epigenetic mechanisms may drive sarcoidosis (figure 2). 1) Gene expression studies demonstrate significant changes in transcriptional profiles in sarcoidosis transbronchial biopsies [93] and peripheral blood cells [94, 95]. 2) Differentiation of naïve CD4+ T-cells into Th1, Th2, Th17 and T-regulatory lineages, critical in sarcoidosis, is regulated by TFs via DNA methylation and histone modifications [23]. 3) Other immune-mediated diseases implicate epigenetic regulation of immune function and immunopathogenesis. Specifically, studies of DNA methylation in lupus [96], rheumatoid arthritis [97, 98], and in CD4+ T-cells in juvenile idiopathic arthritis [99], revealed new genes and regulatory elements associated with disease and disease severity. 4) Genes that are important in epigenetic regulation, including HDAC and other histone and chromatin modifying genes are associated with sarcoidosis [93]. Demographic differences in sarcoidosis presentation also implicate possible underlying epigenetic mechanisms. The relevance of epigenetics in sarcoidosis is also supported by findings in other lung diseases. Differential methylation has been found in tuberculosis, with those infected patients with active disease (5–10% of patients) demonstrating a different transcriptomic profile than infected individuals without signs of disease [100]. While asthma and atopy are Th2 driven, they are also epigenetically regulated [101]. Another important feature common to epigenetics and sarcoidosis is the strong environmental influence. 
Finally, sarcoidosis has a variable disease course, with some individuals developing more progressive lung disease and even fibrosis while others do not. Another fibrotic lung disease, idiopathic pulmonary fibrosis, is associated with extensive DNA methylation and gene expression differences in lung tissue [102], suggesting that similar differences are likely to be found in progressive forms of sarcoidosis.
Features common to sarcoidosis aetiology that can be mediated through epigenetic modifications, providing the rationale for the study of epigenetics in sarcoidosis. Th: T-helper cell.
Despite these strong lines of evidence for the potential of epigenetic regulation in sarcoidosis, investigations to date have been limited to studies of miRNAs [103–109], reviewed in [4], with one epigenome-wide association study (EWAS) of DNA methylation [110] and none interrogating histone modifications. However, we believe that epigenetic regulation of gene expression in sarcoidosis is a promising area of research that needs to be developed, and we use the Applications of Epigenomics to Sarcoidosis section to highlight a few potential directions.
In a 2019 study, our group investigated coupled differential methylation and expression in sarcoidosis as well as CBD [110]. In a comparison of BAL from nine progressive versus 15 remitting sarcoidosis patients, we found 516 CpGs near 420 unique genes significantly associated with disease progression (false discovery rate-adjusted p<0.05 and >5% methylation change). Interestingly, many differentially methylated sites were within or near differentially expressed genes, many of which are involved in immunity (HLA, chemokines, MSMB, KIR3DL2). We also observed considerable directional overlap between sites in sarcoidosis and CBD [110]. Koth et al. [94] observed a similar pattern in blood, with sarcoidosis transcriptomic profiles largely overlapping those of tuberculosis patients.
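The two-part significance filter described above (a false discovery rate-adjusted p-value combined with a minimum methylation change) can be sketched as follows. This is an illustrative Python example, not the analysis code of the cited study: it assumes the Benjamini-Hochberg procedure for the adjustment and interprets the ">5% methylation change" as an absolute difference in methylation proportion above 0.05. The CpG names and all values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-CpG results: raw p-values and mean methylation difference
# (progressive minus remitting) on the 0 to 1 proportion scale.
cpgs = ["cg_a", "cg_b", "cg_c", "cg_d", "cg_e"]
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
delta_beta = [0.12, 0.03, 0.20, -0.08, 0.30]

qvalues = benjamini_hochberg(pvalues)
hits = [
    name
    for name, q, delta in zip(cpgs, qvalues, delta_beta)
    if q < 0.05 and abs(delta) > 0.05
]
print(hits)
```

In this toy data set, one site passes the adjusted p-value threshold but changes too little, and two sites with nominal p<0.05 no longer pass after adjustment, so only a single CpG is retained.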
Applications of epigenomics to sarcoidosis
An increased awareness and understanding of epigenetics as a driver and modifier of lung disease has advanced the field. By characterising lung disease signatures of epigenetic modification, we can better study and elucidate the relationship between genetics, environment and disease, including disease pathogenesis (figure 3). We have long known that genetics and environment interact to drive many diseases, and the development of epigenetics is allowing us to explore this interaction in novel and exciting ways. The National Heart, Lung, and Blood Institute conducted a workshop to define next steps in sarcoidosis research [111] and epigenomics was defined as one of the priority areas in the application of -omics and systems biology to sarcoidosis research [112].
Conceptual overview of the proposed importance of epigenetics in sarcoidosis. Epigenetic marks influence expression of genes that regulate differentiation of T-cell subtypes, leading to pulmonary sarcoidosis, as well as other immune-mediated lung diseases, like chronic beryllium disease and hypersensitivity pneumonitis. Epigenetic marks may be important in disease biology and severity, and represent novel biomarkers of disease as well as therapeutic targets. Th: T-helper cell; Treg: regulatory T-cell.
Endotyping
Complex diseases and their clinical phenotypes are influenced by multiple genetic and environmental components, as well as their interactions. In many cases, modifying contributory factors can impact the resultant phenotype. As such, variable genotypes and phenotypes are often observed in complex disease. This can make study difficult, as a disease may result from multiple inputs and pathways. For example, asthma has long been studied and approached as a single disease, even though phenotypes of reversible airway obstruction can be achieved through multiple pathways. Transcriptional profiling of asthma has revealed two major groups of patients: those with Th2-driven inflammation (Th2 high) and those without (Th2 low) [113]. Corticosteroid treatment has been shown to be effective in Th2-high but not Th2-low asthma [114], highlighting the importance of subclassifying diseases based on driving mechanisms.
Subtypes of sarcoidosis have already been defined based on clinical characteristics and computed tomography features [115]. A recent study involving RNA-sequencing of 209 sarcoidosis BAL samples found that expression of many genes was associated with disease, as well as with specific clinical features [116]. Weighted gene co-expression network analysis followed by K-means clustering of genes within modules of interest showed four potential endotypes of disease relating to: 1) hilar lymphadenopathy and acute lymphocytic inflammation; 2) extraocular organ involvement and phosphoinositide 3-kinase (PI3K) activation; 3) chronic disease; and 4) multiorgan involvement with increased immune response. Similar to transcriptional endotypes of disease, the field will be able to define epigenetic endotypes of sarcoidosis; this work has begun in asthma [117].
Biomarkers
Omics studies have identified multiple protein and gene expression signatures associated with sarcoidosis. Unsurprisingly, many of the molecules upregulated in disease relate to processes such as T-cell skewing, granulomatous processes, and macrophage and lymphocyte activity. Protein quantification studies of exhaled breath condensate have shown multiple proteins associated with disease, such as TNF-α, TGF-β1, neopterin and angiotensin converting enzyme (ACE) [118, 119]. Protein studies of blood serum have shown that SAA, ACE, lysozyme, chitotriosidase, CXCL9, CXCL10, CXCL11, CCL18, matrix metalloproteinase (MMP)7, periostin and soluble IL-2 receptor (sIL-2R) levels are associated with disease [120–126], and that ACE and sIL-2R levels correlate with improvement in lung function during methotrexate treatment [127]. Chitotriosidase, ACE, sIL-2R, neopterin and TNF-α levels have also been shown to be increased and predictive of disease in BAL fluid, with sIL-2R levels being predictive of severity [128–131].
However, many current biomarkers are limited by lack of specificity as they are associated with broad immune activities upregulated in multiple diseases. To move past this issue, recent studies have derived signatures of disease consisting of multiplex signals from many molecules. In a proteomic screen, Bhargava et al. [132] observed 272 proteins significantly differentially abundant between sarcoidosis cases and controls. These proteins were enriched for both known (phagosome maturation and redox balance) and novel (integrin-linked kinase signalling) pathways in disease. Crouser et al. [133] showed gross transcriptomic differences in lung tissue and identified novel dysregulated genes in disease, among them MMP12 and ADAMDEC1. Koth et al. [94] performed a similar analysis in blood and built a classification model predicting disease with 92% specificity and sensitivity. Another group used peripheral blood gene expression to build a 20-gene signature distinguishing sarcoidosis with and without complications that may affect treatment [134]. These protein and gene expression studies pave the way for incorporating epigenetics into biomarker studies of sarcoidosis, as has been done in lung cancer [135].
Therapeutic targets
Current sarcoidosis therapeutics suppress inflammation in a relatively nonspecific manner, but have no effect on fibrosis. Epigenomic studies can implicate new molecules to target with therapeutics, which can be evaluated both in vivo and in vitro. The dynamic nature of epigenetic modifications provides opportunities to experimentally modulate biological pathways, which has proven invaluable in disease study and treatment. Specifically, DNA methylation changes have been shown to drive tumour formation and malignant progression [136], and thus have established basic mechanisms for disease pathogenesis and targets for intervention in cancer. DNMT inhibitors have been approved for the treatment of myelodysplastic syndrome and are in clinical trials for treatment of solid tumours [137]. Interestingly, a recent study showed that DNMT and HDAC inhibitors induce cryptic transcription of thousands of treatment-induced non-annotated transcription start sites [138]. While currently available DNMT inhibitors lack specificity for gene(s) of interest, locus-specific therapies are currently being developed using genome editing tools based on the CRISPR/dCas9 system [139, 140]. Additionally, histone mark-modifying drugs that are approved by the US Food and Drug Administration or in development are effective in targeting specific gene loci and pathways [141] and treating lung cancer [142].
Epigenome-wide association studies
EWAS, based largely on the methodology of GWAS, compare the epigenomes between cases and controls (although quantitative traits can be interrogated as well) to identify statistically significant associations between certain marks and disease. However, interpretation of the results of these studies has been confounded by multiple issues and has prompted a set of guidelines for the second generation of EWAS [143]. Some of the challenges for studies of DNA methylation include accounting for the effect of genetic variants on methylation levels, distinguishing inherent changes in DNA methylation within a cell versus changes observed due to differences in cell proportions, and interpretation of DNA methylation data in the absence of other epigenetic data. Going forward, epigenetic studies in sarcoidosis should be designed addressing these principles. It will be important to integrate DNA methylation with genetic variants, chromatin accessibility, 3-dimensional genome information, and TF binding data to provide insight into the function of the epigenome.
Methylation quantitative trait loci studies
As methylation is one mechanism through which genetic variation can impact disease, genomic and methylomic data can be integrated to identify possible functional mechanisms driving genetic associations with disease. Genetic loci associated with methylation of specific sites, known as DNA methylation quantitative trait loci (meQTLs), are heavily enriched for cis interactions, with genetic variants often affecting methylation of proximal CpGs. meQTLs have been shown to be enriched for intronic regions and miRNA binding sites, with the associated CpG sites enriched for promoter and enhancer regions [144]. meQTLs also co-localise with expression quantitative trait loci, and Pierce et al. [145] showed that, in a subset of these co-localised signals, either expression or methylation mediates the effect on the other. The same analytical principles can be used to identify associations between methylation and expression, termed expression quantitative methylation. Recently, Kim et al. [146] found 16 867 significant expression quantitative methylations in Puerto Rican children and adolescents with asthma. These CpGs were on average 378 kb from their associated genes and were enriched in enhancers, indicating the ability of expression quantitative methylation studies to identify methylation-expression relationships relevant to disease.
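As a rough illustration of a single cis meQTL test, the sketch below regresses methylation (beta values) at one CpG on genotype dosage (0, 1 or 2 copies of the alternate allele) at one nearby variant. The data, the function names and the 1 Mb cis window are assumptions made for illustration only; they are not taken from the cited studies, which additionally include covariates and multiple-testing correction.

```python
def is_cis(snp_pos, cpg_pos, window=1_000_000):
    """Treat a SNP-CpG pair on the same chromosome as cis if it lies within
    `window` base pairs. The 1 Mb default is an illustrative assumption."""
    return abs(snp_pos - cpg_pos) <= window


def meqtl_fit(dosage, beta):
    """Simple linear regression of methylation beta on genotype dosage.
    Returns (slope, intercept); the slope is the per-allele change in beta."""
    n = len(dosage)
    mean_g = sum(dosage) / n
    mean_m = sum(beta) / n
    sxy = sum((g - mean_g) * (m - mean_m) for g, m in zip(dosage, beta))
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    slope = sxy / sxx
    intercept = mean_m - slope * mean_g
    return slope, intercept


# Hypothetical toy data for six individuals.
dosage = [0, 0, 1, 1, 2, 2]
beta = [0.30, 0.34, 0.40, 0.44, 0.50, 0.54]

# Hypothetical positions on the same chromosome.
if is_cis(snp_pos=1_200_000, cpg_pos=1_250_000):
    slope, intercept = meqtl_fit(dosage, beta)
    print(f"per-allele change in methylation: {slope:.3f}")
```

The same regression, with expression in place of genotype as the predictor or outcome, gives the basic form of an expression quantitative methylation test.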
Important considerations for future epigenomic studies of sarcoidosis
Epigenomic research raises a myriad of practical concerns. Here, we address common considerations in designing these experiments and highlight how they may apply to sarcoidosis studies.
Tissue/cells of interest
Epigenetic marks are often cell- or tissue-specific, making it important to carefully consider the tissue of study. Samples such as blood and exhaled breath condensates are appropriate for biomarker studies, as easily accessible tissue obtained noninvasively allows many individuals to be screened. However, to better understand the molecular pathogenesis of disease, specific affected tissues and cell types should be interrogated. In studies of bulk tissue, epigenomic signals are derived from many cell types. This risks confounding, especially in diseases with known alterations in cell proportions such as sarcoidosis. Multiple approaches have been implemented to account for cellular heterogeneity in epigenomic data. The first approach is to perform statistical deconvolution of epigenomic profiles by relying on cell type-specific features. Deconvolution approaches have been widely used in peripheral blood profiling studies [147], and more recently in complex tissues [148]. Reference-free methods avoid the need for well-defined reference markers, which are often difficult to establish in hard-to-access tissues such as the lung. The second approach is to characterise cell populations in the sample with a method such as flow cytometry, and use the resulting counts to adjust for differences in cell proportions. The third approach is to isolate cell types of interest by flow cytometry; the concern with this approach is that cell sorting may affect cellular epigenomic patterns. Recently developed single cell technologies provide the best solution for identifying relevant cell populations, although they have technical limitations and are currently cost prohibitive [149].
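The idea behind reference-based deconvolution can be sketched for the simplest possible case of two cell types. The sketch assumes that the bulk methylation profile is a linear mixture of two known reference profiles measured at cell type-specific CpGs; all values and names are hypothetical. Published methods handle many cell types with constrained regression, whereas this example reduces to a one-parameter least-squares fit clipped to the interval [0, 1].

```python
def estimate_fraction(bulk, ref_a, ref_b):
    """Estimate the fraction p of cell type A in a bulk sample, assuming
    bulk = p * ref_a + (1 - p) * ref_b at each CpG (least-squares fit)."""
    num = sum((m - b) * (a - b) for m, a, b in zip(bulk, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    # A proportion must lie between 0 and 1.
    return min(1.0, max(0.0, num / den))


# Hypothetical reference methylation at four cell type-specific CpGs.
ref_a = [0.9, 0.1, 0.8, 0.2]   # e.g. a lymphocyte-like profile (assumed)
ref_b = [0.1, 0.9, 0.3, 0.6]   # e.g. a macrophage-like profile (assumed)

# Simulated bulk sample containing 70% of cell type A.
true_fraction = 0.7
bulk = [true_fraction * a + (1 - true_fraction) * b
        for a, b in zip(ref_a, ref_b)]

estimated = estimate_fraction(bulk, ref_a, ref_b)
print(f"estimated fraction of cell type A: {estimated:.2f}")
```

Estimated proportions of this kind can then be used as covariates in association models, in the same way as counts obtained by flow cytometry.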
Sample size
Most previous -omic studies in sarcoidosis have been limited by small sample sizes. Generally, studies of >1000 subjects are adequately powered to detect most methylation differences in peripheral blood [150]. Required sample sizes will be smaller for target tissues such as BAL. We have demonstrated that EWAS methodologies have sufficient power to detect gross differences in DNA methylation in BAL between progressive and remitting cases of disease (n=24); however, larger numbers of individuals are necessary to detect stable and replicable signatures that can be used for purposes such as biomarkers and endotyping. Sarcoidosis research consortia such as ACCESS and GRADS have pooled samples from multiple centres, providing greater numbers to be leveraged for large-scale -omic research. Additionally, signals derived from a population with varied geography, genetics and exposures are more likely to generalise to, and be validated in, additional populations that differ in these factors.
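The dependence of sample size on effect size can be made concrete with a simple power calculation. The sketch below uses the standard normal-approximation formula for comparing two group means at a single CpG. The effect sizes, standard deviation and number of tested CpGs are illustrative assumptions rather than values from the cited work, and the Bonferroni-style threshold is only one possible choice of epigenome-wide significance level.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha, power=0.8):
    """Approximate subjects needed per group to detect a mean methylation
    difference `delta` (two-sided test, common standard deviation `sd`)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Assumed epigenome-wide threshold: 0.05 divided by a hypothetical
# 450 000 tested CpGs.
alpha_ewas = 0.05 / 450_000

# Hypothetical scenarios: a small difference (as might be seen in blood)
# and a larger difference (as might be seen in a target tissue).
n_small_effect = n_per_group(delta=0.02, sd=0.05, alpha=alpha_ewas)
n_large_effect = n_per_group(delta=0.05, sd=0.05, alpha=alpha_ewas)

print(f"per-group n, 2% difference: {n_small_effect}")
print(f"per-group n, 5% difference: {n_large_effect}")
```

Because the required number scales with the inverse square of the effect size, the larger differences expected in disease-relevant tissue translate into markedly smaller cohorts, although replication still calls for larger numbers.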
Timing of sampling
Epigenetic marks are dynamic. Epigenomic assays at a single time-point provide only a snapshot of molecular events at that moment. It is thus important to quantify epigenetic marks at temporally relevant times in disease development. For a disease such as sarcoidosis, where investigators are interested in predicting disease course, longitudinal studies should be leveraged. Quantifying epigenomic marks at multiple time-points in disease course or treatment can add to the understanding of disease pathogenesis and identify epigenomic signatures that are related to disease stage and predictive of progression.
Conclusions
Sarcoidosis is a complex disease driven by a combination of genetic, environmental and host immunologic factors. While previous studies have elucidated many of these risk factors, there are still many gaps in our knowledge of sarcoidosis development and progression. Epigenomic studies will provide a means of linking risk factors to disease pathobiology to better understand disease course, aid in antigen studies, and subclassify patients based on molecular profiling to better direct research and treatment.
Footnotes
Provenance: Submitted article, peer reviewed.
Conflict of interest: I.R. Konigsberg has nothing to disclose.
Conflict of interest: L.A. Maier reports grants from National Institutes of Health, during the conduct of the study; and grants from National Institutes of Health, University of Cincinnati under a Mallinckrodt Foundation, MNK14344100 and ATYR1923-C-002, outside the submitted work. L.A. Maier is a member of the FSR Scientific Advisory Board. She does not receive any compensation for this activity.
Conflict of interest: I.V. Yang reports grants from NIH, during the conduct of the study; and personal fees from Eleven P15, outside the submitted work. I.V. Yang has a patent “Circulating Biomarkers of Preclinical Pulmonary Fibrosis” pending.
- Received March 16, 2021.
- Accepted April 6, 2021.
- Copyright ©The authors 2021
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org