Abstract
Multiplex protein technology has the potential to identify biomarkers for the differentiation, classification and improved understanding of the pathogenesis of interstitial lung disease. The aim of this study was to determine whether a 30-inflammatory biomarker panel could discriminate between healthy controls, sarcoidosis and systemic sclerosis (SSc) patients independently of other clinical indicators. We also evaluated whether a panel of biomarkers could differentiate between the presence or absence of lung fibrosis in SSc patients.
We measured 30 circulating biomarkers in 20 SSc patients, 21 sarcoidosis patients and 20 healthy controls using Luminex bead technology and used Fisher’s discriminant function analysis to establish the groups of classification mediators.
There were significant differences in median concentration measurements between study groups for 20 of the mediators but with considerable range overlap between the groups, limiting group differentiation by single analyte measurements. However, a 17-analyte biomarker model correctly classified 90% of study individuals to their respective group and another 14-biomarker panel correctly identified the presence of lung fibrosis in SSc patients.
These findings, if they are corroborated by independent studies in other centres, have potential for clinical application and may generate novel insights into the modulation of immune profiles during disease evolution.
Sarcoidosis, a multisystem granulomatous disease of unknown origin, which characteristically affects the lungs, is associated with upregulation of the monocyte/macrophage–T cell helper (Th) type 1 cell axis and polyclonal B cell activation 1. Systemic sclerosis (SSc) is a systemic autoimmune disease characterised by tissue fibrosis (skin and lung in particular), endothelial cell injury and persistent cellular immune activation (Th1, Th2 and B cells) 2–4. Currently, SSc is commonly classified according to the degree of skin involvement and the presence of SSc-related autoantibodies that correlate well with the clinical phenotypes. Several studies have shown that serum levels of a number of cytokines and chemokines are either up- or downregulated in patients with sarcoidosis and systemic sclerosis. Some of these have the potential to serve as markers of disease development since they also correlate with disease status and organ involvement 5, 6. Pulmonary fibrosis, an important complication of both conditions, responds poorly to current therapies. Therefore, there is considerable interest in identifying biomarkers that may support early therapeutic intervention and delineate a favourable response 7.
The use of analytical platforms capable of simultaneous analysis of large numbers of proteins (protein microarrays) can improve our understanding of the mechanistic basis of immune diseases and can also identify useful biomarkers for clinical diagnosis, subclassification of disease or response to therapy 8. The use of protein microarrays in immunology has largely concentrated on profiling secreted proteins (cytokines, chemokines or growth factors) or autoantibodies 9. Multiplex analysis of either secreted proteins or autoantibodies has been reported for a number of inflammatory diseases; however, there have been very few reports of the use of this technology in patients with respiratory disease in general and interstitial lung disease in particular 8, 10. Serum protein microarrays have enormous potential for risk stratification and as surrogate markers of response to treatment, given the inaccessibility of the lung to regular sampling. However, changes in the levels of such inflammatory biomarkers are not unique to individual clinical conditions and tend to overlap considerably between conditions. In addition, usually few mediators are concomitantly studied in such investigations and subsequently only mediators with demonstrable significant median changes are used to establish clinical differentiation by applying a threshold cut-off method outside the limits of normality. This tends to produce strict cut-off criteria that can only be fulfilled by a small subset of subjects. However, compared with individual measurements, it is possible that greater specificity and inclusivity may be achieved by a combination of mediators.
In this study, we describe the use of a protein microarray panel to discriminate between healthy controls and patients with two disparate immunologically driven conditions: pulmonary sarcoidosis or systemic sclerosis (conditions where immune activation may precede the onset of pulmonary fibrosis). The main aim of this study was to investigate inflammatory profiles using multibiomarkers consisting of cytokines, chemokines and growth factors and assess whether individual mediators or panels of mediators could classify study participants into their three distinct groups (healthy controls, sarcoidosis and systemic sclerosis) on the basis of biological measurements alone. Secondly, we tried to explore whether a multibiomarker panel could differentiate between the presence and absence of lung fibrosis in SSc. Thirdly, we assessed the complexity of correlations between mediators in health and disease. We found a multibiomarker panel of 17 analytes that could classify the subjects to their respective clinical group. We suggest that independent validation of these findings and further development of protein microarrays may provide new insights into the pathogenesis of fibrotic lung diseases and also the development of clinically informative biomarker panels.
MATERIALS AND METHODS
Patient recruitment
Serum was obtained from sarcoidosis and SSc patients. All patients satisfied the accepted international diagnostic criteria for their respective condition 11, 12. Demographic information is given in table 1. Given the epidemiology of both diseases it was not possible to match patient groups for age or sex. Patients were excluded if they were taking corticosteroid therapy at >10 mg of prednisolone per day or equivalent, or if they were on any of the powerful second-line immunosuppressants, such as cyclophosphamide, azathioprine or methotrexate. Biomarker results were later analysed for any confounding effects of sex, age or treatment. Serum from healthy controls was obtained from an anonymised research serum bank held within the Kennedy Institute, Charing Cross Hospital, Imperial College Healthcare NHS Trust (London, England) and a proportion of these were collected during this study. Sample handling and storage were identical for historical and recent controls and patient groups. Ethical permission for this study was obtained from the Brompton, Harefield and NHLI Ethics Committee (REC No 02-002 and 01-257).
Experimental method
30 proteins were measured in serum using a fluorescent bead-based technology (Luminex® Corporation, TX, USA). The assays were carried out in accordance with the kit manufacturer’s instructions. In brief, antibody-conjugated beads were prepared and aliquoted into a 96-well pre-wetted filter plate, before addition of either 100 μL of standard solution to the designated wells or 100 μL of serum, diluted 1:2 with assay diluent. Subsequent steps involved washes interspersed with the addition of biotinylated detector antibody and, later, Streptavidin-RPE solution. Finally, the plate was placed on the XY platform of the Luminex 100 instrument for analysis. STarStation software was used for data acquisition and analysis. Standard curves were generated for each analyte, and the mean fluorescence intensity value of each analyte in each well was converted into a concentration using the linear portion of the standard curve for all detected values. This value was then multiplied by the dilution factor of 2 to give the concentration of the analyte in the original serum.
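As an illustration of the conversion described above, the following Python sketch fits a straight line to a set of standards, inverts it to turn a mean fluorescence intensity (MFI) into an in-well concentration, and then multiplies by the dilution factor of 2. The standards, the MFI reading and the function names are hypothetical, and a straight line on the raw scale is an assumption made for simplicity; it is not a description of the curve-fitting routine used by the acquisition software.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mfi_to_serum_concentration(mfi, slope, intercept, dilution_factor=2):
    """Invert the standard line (MFI -> concentration), then undo the 1:2 dilution."""
    in_well = (mfi - intercept) / slope
    return in_well * dilution_factor


# Hypothetical standards: known concentrations (pg/mL) and their MFI readings,
# assumed to lie on the linear portion of the standard curve.
standard_conc = [10.0, 20.0, 40.0, 80.0]
standard_mfi = [150.0, 250.0, 450.0, 850.0]

slope, intercept = fit_line(standard_conc, standard_mfi)

# Hypothetical serum well reading of 350 MFI units.
serum_conc = mfi_to_serum_concentration(350.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, serum={serum_conc:.1f} pg/mL")
```

Readings that fall outside the range of the standards would need separate handling, which this sketch does not attempt.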
The fluorescent beads (FIDIS™) for this study were obtained from Biomedical Diagnostics (BMD, Croissy Beaubourg, France). Biomarkers assayed were classified according to their predominant immune function as follows: innate immunity (interleukin (IL)-1, -6, -8, -12p40, -15, interferon (IFN)-α, tumour necrosis factor (TNF)-α, TNF-RI and TNF-RII), adaptive immunity (IL-2, -4, -5, -10, -13, -17 and IFN-γ), chemokines (interferon-inducible protein (IP)-10, monokine induced by IFN-γ (MIG), macrophage inflammatory protein (MIP)-1α, MIP-1β, regulated on activation, normal T cell expressed and secreted (RANTES), monocyte chemoattractant protein (MCP)-1 and eotaxin), growth factors (granulocyte colony-stimulating factor (G-CSF), granulocyte–macrophage colony-stimulating factor (GM-CSF), epidermal growth factor (EGF), hepatocyte growth factor (HGF), fibroblast growth factor basic (FGFb) and vascular endothelial growth factor (VEGF)), and death receptor (DR)5.
Data analysis
For individual serum mediators, differences between study groups were detected using the Kruskal–Wallis test. When the p-value was <0.05, a Mann–Whitney test was used to compare each pair of the three groups. Spearman’s rank correlation was used to establish the interrelationship between analyte levels in each study group. Here we report all correlations and we highlight “strong correlations” that survive Bonferroni correction for multiple comparisons (p<0.001).
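The omnibus step of this procedure can be sketched in Python as follows. This is a minimal, dependency-free illustration rather than the SPSS routine used in the study: it computes the Kruskal–Wallis H statistic with the usual tie correction and takes the p-value from the chi-squared approximation, which for three groups (2 degrees of freedom) reduces to exp(−H/2). The function names and the example values are hypothetical, and the pairwise Mann–Whitney follow-up is not shown.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups; p uses the chi-squared approximation (2 df)."""
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    h /= 1 - tie_term / (n ** 3 - n)

    p = math.exp(-h / 2)  # chi-squared survival function for 2 degrees of freedom
    return h, p


# Hypothetical analyte levels in three groups (illustrative values only).
h_stat, p_value = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

With groups as small as those in the example, an exact test would usually be preferred to the chi-squared approximation; the approximation is used here only to keep the sketch short.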
Fisher’s discriminant function analysis was used for multigroup classification. A variable was entered into the “model” if the significance level of its F-value was <0.05 and was removed if the significance level was >0.1. The model was then used to classify each case into a diagnostic group. Cross-validation was also included, in which each case was classified by the functions derived from all cases other than that case. We measured the percentage accuracy of the model, i.e. the percentage of correct predictions made by the model, and the receiver operating characteristic (ROC) curve to gain insight into the decision-making ability of the model, i.e. how likely the model is to assign an individual to the correct group in the context of the other two groups. The probability from the discriminant analysis of a case belonging to a particular diagnostic category was used to build the ROC curve for the model. Discriminant function analysis was also used to determine which variables could differentiate between scleroderma patients with and without lung fibrosis. All analyses were conducted using the SPSS v14 software package (SPSS, Chicago, IL, USA).
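The leave-one-out cross-validation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SPSS procedure: it uses only two features so that the pooled covariance matrix can be inverted by hand, assumes equal prior probabilities for the groups, and omits the stepwise F-value entry and removal of variables. All data values and function names are hypothetical.

```python
def fit_discriminant(X, y):
    """Fit linear classification functions for two features.

    Assumes a pooled within-group covariance matrix and equal priors.
    Returns {group: (weights, constant)}.
    """
    classes = sorted(set(y))
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    # Pooled within-group covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, label in zip(X, y):
        d = [x[0] - means[label][0], x[1] - means[label][1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    dof = len(X) - len(classes)
    s = [[v / dof for v in row] for row in s]

    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    functions = {}
    for c, m in means.items():
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        const = -0.5 * (w[0] * m[0] + w[1] * m[1])
        functions[c] = (w, const)
    return functions


def classify(functions, x):
    """Assign x to the group whose classification function scores highest."""
    def score(c):
        w, const = functions[c]
        return w[0] * x[0] + w[1] * x[1] + const
    return max(functions, key=score)


def leave_one_out_accuracy(X, y):
    """Classify each case with functions derived from all other cases."""
    correct = 0
    for i in range(len(X)):
        functions = fit_discriminant(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if classify(functions, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical, well-separated two-analyte data (illustrative values only).
X = [[1.0, 1.2], [1.3, 0.9], [0.8, 1.1], [1.1, 0.7],
     [5.0, 1.0], [5.4, 1.3], [4.7, 0.8], [5.2, 1.1],
     [3.0, 6.0], [3.3, 6.4], [2.8, 5.7], [3.1, 6.2]]
y = ["control"] * 4 + ["sarcoidosis"] * 4 + ["SSc"] * 4

loo_accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {loo_accuracy:.1%}")
```

Because the functions are refitted without the held-out case on every iteration, the resulting accuracy is typically lower than the accuracy obtained when the model classifies the same cases it was built from.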
RESULTS
Analysis of individual serum proteins
20 serum proteins were elevated in one or both patient groups compared with controls (table 2). Innate immune system cytokines IL-1, -6, -8, TNF-RI, TNF-RII and growth factors EGF and HGF were elevated in both diseases. In scleroderma there was elevation of Th1 chemokines (IP-10, MIG, MIP-1α, MIP-1β), Th2 chemokines (MCP-1 and eotaxin), and the Th17 cytokine IL-17. In sarcoidosis there was elevation of Th1 chemokines (IP-10, MIG, MIP-1α, MIP-1β and RANTES) (table 3). There was considerable overlap in serum cytokine levels between the two disease groups so that it was not possible to classify patients using individual protein levels alone (see fig. 1).
We did find some differences in cytokine levels between groups according to complications. When sarcoidosis patients were classified according to the presence or absence of pulmonary infiltrates/fibrosis, only IL-6 was significantly increased (p = 0.026) in patients with pulmonary infiltrates/fibrosis (chest radiograph stage II–IV) compared with patients without pulmonary infiltrates/fibrosis (stage 0–I). In scleroderma, we observed a significant increase of EGF (p = 0.027) in those positive for the anti-Scl-70 autoantibody compared with the anti-Scl-70 negative group. We looked for any confounding effects of treatment. In sarcoidosis, IL-6 was significantly raised in treated patients compared with those on no therapy (median 44.2 pg·mL−1 versus 9.0 pg·mL−1; p = 0.036), suggesting it may be a marker for more severe disease. In scleroderma, FGFb was significantly higher in untreated patients than in treated patients (median 31.0 pg·mL−1 versus 0.1 pg·mL−1; p = 0.032) and eotaxin was elevated compared with healthy controls in untreated patients only (p = 0.002), suggesting that serum levels of both may be reduced by glucocorticoid therapy. However, the initial use of the Kruskal–Wallis test for these analyses denotes a post hoc approach and these associations do not survive correction for multiple comparisons. Nevertheless, the tendency towards significance is reported here as it can form the basis for independent confirmatory studies.
Correlation between serum biomarkers levels
We conducted a correlation analysis to establish the underlying protein networks in each study group. To visualise these correlations we used a node/edge format in which serum proteins are considered as nodes and the significant correlations as the lines (edges) in the graphs displayed (see fig. 2). In healthy controls there were strong positive correlations (p<0.001 after Bonferroni correction) between IL-17 and TNF-α and IFN-α and negative correlations between G-CSF and MIP-1β and EGF (fig. 2a). In patients with sarcoidosis the topology of the inflammatory protein network was different and there appeared to be significant positive correlations between IL-8 and EGF, TNF-α and IL-1 (fig. 2b). IL-1 levels also strongly correlated with IL-6 levels. The most complex inflammatory network was observed in scleroderma patients (fig. 2c), with three groups of significant relationships: positive correlations between IL-1 and GM-CSF, MIP-1α and IL-15; positive correlations between TNF-α and IL-15 and IL-12p40; and negative correlations between RANTES and IL-15, IL-2, IL-4 and IFN-α.
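The network representation described above can be sketched in Python as follows: Spearman’s rank correlation is computed for every pair of analytes, and each retained pair becomes a signed line between two nodes. The analyte names and values are placeholders, not study data. As a simplification, pairs are retained using a hypothetical cutoff on the absolute correlation coefficient (`min_abs_rho`), whereas the study retained correlations on the basis of Bonferroni-corrected p-values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    ordered = sorted(values)
    n = len(values)
    # First and last 1-based positions of v in the sorted list, averaged.
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
            for v in values]


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def correlation_edges(levels, min_abs_rho):
    """Return (node_a, node_b, sign) for each analyte pair passing the cutoff."""
    names = sorted(levels)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman_rho(levels[a], levels[b])
            if abs(rho) >= min_abs_rho:
                edges.append((a, b, "positive" if rho > 0 else "negative"))
    return edges


# Placeholder analytes measured in five hypothetical subjects.
levels = {
    "analyte_A": [1, 2, 3, 4, 5],
    "analyte_B": [2, 4, 5, 8, 9],
    "analyte_C": [9, 7, 6, 3, 1],
    "analyte_D": [3, 1, 4, 2, 5],
}

edges = correlation_edges(levels, min_abs_rho=0.9)
for a, b, sign in edges:
    print(f"{a} -- {b} ({sign})")
```

Running the same function separately on each study group would give one edge list per group, which is the information the three panels of the figure convey.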
Differentiation of the study groups by combination of mediators
We used discriminant analysis to build a “model” that can best predict to which group (healthy controls, sarcoidosis, SSc) a case belongs. Using all cases in this study we identified a group of 17 mediators (IL-1, TNF-α, IL-15, TNF-RI, TNF-RII, IFN-γ, IL-2, VEGF, IP-10, FGFb, MCP-1, MIP-1α, RANTES, G-CSF, eotaxin, HGF, MIG) that can be used to group cases into healthy controls, sarcoidosis or SSc. The 17-cytokine profile correctly classified 89.5% of cases. A limitation of this type of analysis is that the cytokine profile may be biased by results obtained from a small number of patients. We then cross-validated our findings by constructing a new analytical model whereby each individual patient was in turn removed from the original data set and the 17-cytokine profile obtained from the remaining cases was then used to determine whether the omitted case could be correctly classified as belonging to the healthy control, sarcoidosis or SSc patient group. This analysis correctly classified 84.2% of omitted cases. We also randomly assigned original cases into a training cohort (60% of original cases) to obtain new classification function coefficients for the 17 analytes and then applied the new parameters obtained to a test cohort that consisted of the remaining cases. In the training cohort 94.2% of cases were correctly classified, and in the test cohort 75.6% of cases were correctly classified. Overall, the highest percentage of classification accuracy was observed for healthy controls and SSc patients, with healthy controls showing the best classification accuracy. The percentage of classification accuracy by the algorithms in the three groups is shown in table 4. The area under the ROC curve, based on the discriminant function analysis probability of a case belonging to a particular diagnostic category, was healthy controls = 0.97; sarcoidosis = 0.96; SSc = 0.98.
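To illustrate how a per-group area under the ROC curve can be obtained from class-membership probabilities, the Python sketch below treats each group in turn as “positive” and all other cases as “negative”, and computes the area as the proportion of positive/negative pairs in which the positive case received the higher probability (ties count as half). The probabilities and labels are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(posteriors, labels):
    """Compute one AUC per group from per-case membership probabilities."""
    result = {}
    for group in sorted(set(labels)):
        scores = [p[group] for p in posteriors]
        flags = [label == group for label in labels]
        result[group] = roc_auc(scores, flags)
    return result


# Hypothetical membership probabilities for six cases (illustrative only).
labels = ["control", "control", "sarcoidosis", "sarcoidosis", "SSc", "SSc"]
posteriors = [
    {"control": 0.80, "sarcoidosis": 0.10, "SSc": 0.10},
    {"control": 0.60, "sarcoidosis": 0.30, "SSc": 0.10},
    {"control": 0.20, "sarcoidosis": 0.70, "SSc": 0.10},
    {"control": 0.10, "sarcoidosis": 0.20, "SSc": 0.70},  # a misclassified case
    {"control": 0.10, "sarcoidosis": 0.10, "SSc": 0.80},
    {"control": 0.05, "sarcoidosis": 0.35, "SSc": 0.60},
]

auc = one_vs_rest_auc(posteriors, labels)
for group, value in auc.items():
    print(f"{group}: AUC = {value:.3f}")
```

In this toy example a single misclassified sarcoidosis case lowers the area for both the sarcoidosis and SSc groups while leaving the control group unaffected, which shows why the area under the curve and the percentage accuracy can convey different information.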
We also evaluated which of the analytes contributed most to the accuracy in the generated predictive algorithms. We identified that six analytes, TNF-RII, IFN-γ, IP-10, RANTES, HGF and MIG, were the major contributors to the classification models with 78.9% of original grouped cases being correctly classified to their corresponding groups on these analytes alone. The ROC area under the curve for this model was healthy controls = 0.97, sarcoidosis = 0.90, SSc = 0.93.
We also wanted to explore whether, as a proof of principle due to the small number of cases, this approach could also be applied in differentiating between the presence or absence of lung fibrosis in SSc patients. 14 variables, IL-6, TNF-α, IFN-α, IL-17, IP-10, MIG, HGF, VEGF, TNF-RII, G-CSF, IFN-γ, MCP-1, RANTES and IL-5, classified with 100% accuracy those patients with or without lung fibrosis, with 90% of the cases correctly classified on cross-validation, although the small number of patients did not permit any further validation.
DISCUSSION
This study shows that a serum protein microarray has the capacity to distinguish healthy individuals from patients with two conditions: sarcoidosis and systemic sclerosis. In both diseases, respiratory complications are a major cause of morbidity and mortality. However, the extent of involvement is variable and markers that identify cases likely to progress or that may be candidates for immune-directed therapy are needed.
Analysis of median levels of individual cytokines confirmed previously published studies showing upregulation of IL-1β, IL-6, IL-8, IL-10, IL-12p40, IL-15, IL-17, TNF-RI, TNF-RII, MIP-1α, MIP-1β, MCP-1, HGF and IP-10 in SSc patients 13–23 and upregulation of IL-8, TNF-RI, TNF-RII, IL-12p40, MIP-1α, MIP-1β, MIG and IP-10 in sarcoidosis 24–27. Novel findings, as far as we are aware, for other individual biomarkers included increased serum IFN-α, MIG, IP-10, DR5 and eotaxin levels in SSc patients and elevated serum levels of RANTES, DR5, EGF and HGF in sarcoidosis compared with healthy controls. For some proteins (notably VEGF in both diseases and MCP-1 in sarcoidosis), we were unable to replicate previous reports 26, 28; the reasons for these discrepant findings could include assay sensitivity and/or patient population choice. Another interesting observation was the variable complexity of correlations between mediators in the three groups. From the network representation, it is apparent that in SSc there is a much more complex correlation profile between mediators than in the other two groups.
Sarcoidosis and SSc share the features of immune activation and the development of pulmonary fibrosis. Despite the identification of increased inflammatory mediator levels in sarcoidosis and SSc, no single protein proved capable of discriminating between controls, sarcoidosis and SSc, which is consistent with similar observations in other interstitial lung disease studies highlighted in a recent review 29. This was mainly due to the fact that, even for those mediators in which the majority of the patients were above the 95th percentile of the normal range, there was considerable overlap between disease groups. Therefore, we investigated beyond the individual analyte level by exploring whether a combination of mediators was more informative in terms of clinical group differentiation. For this purpose we built a 17-analyte differentiation model using the ROC to gain insight into the decision-making ability of the model and examined the accuracy of correct predictions made by the model since AUC and accuracy provide different information 30. We observed that the 17-analyte panel was able to distinguish, with 89.5% accuracy, between the three clinically distinct groups on the basis of serum inflammatory mediator levels alone. The robustness of this panel of analytes was verified by cross-validation and, separately, by the random selection of subsets of patients. Even with a reduction to six “core” analytes, a model with 78.9% clinical group classification accuracy was constructed. Interestingly, many of the analytes contributing to the models were not upregulated in a disease-specific manner. This raises the issue of whether the common practice of restricting analyte selection only to those with demonstrable differences in median levels is sound, especially since little additional discriminatory information is gained by selecting mediators that are up- or downregulated together.
We also wanted to explore, as a proof of principle, whether the same approach could be used to discriminate in the SSc group between patient subsets with and without lung involvement. The observation, however tentative, that biological measurements distinguish between the presence and absence of lung fibrosis in SSc suggests that inflammatory biomarkers might be useful in assessing the development of lung fibrosis in this disease. These data, though tantalising, need confirmation requiring serial measurements in a large cohort and, it can be surmised, with the incorporation of other proteins (for example, transforming growth factor-β1, connective tissue growth factor and platelet-derived growth factor) that have been implicated in the pathology of this disease.
In conclusion, we show that serum protein profiles can distinguish between sarcoidosis, SSc and healthy controls. In addition, multiplex protein technology may discriminate between the presence and absence of pulmonary fibrosis in patients with SSc. Although this tool is more complex than the usual indices used to make a clear diagnosis, the identification of disease-specific mediator profiles carries the distinct possibility that changes in these profiles can be explored in the future as a means of tracking disease evolution, response to treatment or differentiation of phenotypically unusual or overlapping patterns. These findings, if they are corroborated by separate, prospective validation cohorts from other centres, have the potential for clinical application and may generate novel insights into the immune basis of respiratory disease. In future work, the range of proteins that can be detected is likely to be expanded and studies are warranted to evaluate the prognostic value of variations in disease-specific profiles.
Support statement
This work was supported by generous funding from the UK Raynaud’s and Scleroderma Association, the Arthritis Research Campaign and the Asmarley Charitable Trust.
Statement of interest
None declared.
- Received February 17, 2009.
- Accepted May 28, 2009.
- © ERS Journals Ltd