Abstract
Interstitial lung disease (ILD) is the leading cause of morbidity and mortality in systemic sclerosis (SSc). We performed a systematic review to characterise the use and validation of pulmonary function tests (PFTs) as surrogate markers for systemic sclerosis-associated interstitial lung disease (SSc-ILD) progression.
Five electronic databases were searched to identify all relevant studies. Included studies either used at least one PFT measure as a longitudinal outcome for SSc-ILD progression (i.e. outcome studies) and/or reported at least one classical measure of validity for the PFTs in SSc-ILD (i.e. validation studies).
This systematic review included 169 outcome studies and 50 validation studies. Diffusing capacity of the lung for carbon monoxide (DLCO) was cumulatively the most commonly used outcome until 2010 when it was surpassed by forced vital capacity (FVC). FVC (% predicted) was the primary endpoint in 70.4% of studies, compared to 11.3% for % predicted DLCO. Only five studies specifically aimed to validate the PFTs: two concluded that DLCO was the best measure of SSc-ILD extent, while the others did not favour any PFT. These studies also showed respectable validity measures for total lung capacity (TLC).
Despite the current preference for FVC, available evidence suggests that DLCO and TLC should not yet be discounted as potential surrogate markers for SSc-ILD progression.
Despite the popularity of FVC, available evidence does not favour it as the best PFT surrogate marker for SSc-ILD http://ow.ly/l8vZ30j6kmj
Introduction
Systemic sclerosis (SSc) is a chronic and progressive autoimmune disease involving a complex interplay of microvasculopathy, disturbances in fibroblastic function and abnormalities of the immune system [1, 2]. In addition to disfiguring skin involvement, SSc patients can suffer from extensive internal organ damage, including interstitial lung disease (ILD) [3].
Systemic sclerosis-associated interstitial lung disease (SSc-ILD) is the leading cause of morbidity and mortality in SSc [4] and is estimated to occur in over 50% of patients [5]. The prognosis of patients with SSc-ILD can be poor and it is estimated that approximately 15% of patients will experience rapidly progressive ILD [4, 6]. For these reasons, the presence and progression of ILD are routinely monitored using pulmonary function tests (PFTs).
Since SSc-ILD was first described in 1949 [7], various PFT measures have been used to characterise its progression. However, forced vital capacity (FVC) has become the preferred surrogate marker for SSc-ILD despite the paucity of validation studies. To better understand the rationale behind FVC as the preferred outcome measure for SSc-ILD, we performed a systematic review of the literature aiming to outline the historical use and validation of PFT measures as surrogate markers for SSc-ILD progression.
The first objective was to determine the frequency at which the different PFT measures were used as outcomes in the study of SSc-ILD onset and progression. We then aimed to summarise the results of studies that validated PFT measures against either high-resolution computed tomography (HRCT) or lung biopsy results in SSc patients. The results of this systematic review would thus describe not only changing practices in the use of PFTs over time in SSc-ILD, but also assess the extent to which these trends were supported by available evidence.
Material and methods
The protocol for this review was registered with the PROSPERO international prospective register of systematic reviews (CRD42016039565). Institutional review board (IRB) approval was not required.
Eligibility criteria
Only studies published in English or in French were considered. To be as inclusive and comprehensive as possible all original research was eligible for inclusion, including published conference abstracts and clinical trial registrations. However, reviews, summary articles, case reports, commentaries, letters and editorials were excluded. No restrictions were placed on study design.
Eligible studies had to include patients with a diagnosis of SSc and, furthermore, SSc-ILD had to be the focus of the study. All included studies either 1) used at least one PFT measure as a longitudinal outcome for SSc-ILD; and/or 2) reported at least one classical measure of validity for the association between PFTs and either HRCT findings or lung biopsy results in SSc. The former studies are referred to hereafter as outcome studies and the latter as validation studies. Finally, a minimum of 20 SSc patients was required for inclusion in the review.
Information sources and search strategy
The MEDLINE (PubMed), Embase (Ovid), and Web of Science databases were searched on July 05, 2016 to identify all potentially relevant studies since January 1949. ClinicalTrials.gov and the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) were also searched for clinical trial registrations. Additionally, the reference lists of all studies identified via the electronic searches and adhering to the inclusion criteria were manually searched for further potentially eligible studies.
The search strategy, designed in consultation with a professional librarian, consisted of three main search terms: 1) systemic sclerosis; 2) interstitial lung disease; and 3) pulmonary function test. The detailed search strategy for the MEDLINE database is available in the supplementary material (e-Appendix 1).
Study selection
The title and abstract of each article were assessed independently for potential eligibility by two reviewers. Articles were only excluded if both reviewers came to this decision unanimously. Next, the reviewers performed a separate full-text assessment of the remaining articles to confirm their inclusion. Any disagreements were resolved by consensus. The kappa statistic was calculated at both stages of the screening process to assess inter-reviewer agreement.
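The kappa statistic corrects raw percent agreement for the agreement expected by chance alone. The following is a minimal sketch of Cohen's kappa for two reviewers making include/exclude decisions. The decision lists are hypothetical and purely illustrative; the review's actual screening data are not reproduced here, and the function name is an assumption.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical screening decisions for ten records (illustrative only).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "include"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the reviewers agree on 8 of 10 records, but because half of that agreement would be expected by chance, kappa is noticeably lower than the raw agreement.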
Data collection
Data from the included studies were extracted using a standardised, pre-piloted form. Data extracted from the outcome studies included the PFT measures used as outcomes for SSc-ILD progression, as well as the reasoning provided for these choices. Data collected from the validation studies included the chosen clinical reference standard, the HRCT scoring system utilised (if applicable) and the reported measures of validity.
Synthesis of results
For the outcome studies, the cumulative use over time of the different PFT measures was depicted graphically. The reasoning provided for the choice in primary PFT outcome measure was also summarised. For all analyses, all variations of diffusing capacity of the lung for carbon monoxide (DLCO), including DLCO adjusted for alveolar volume (VA) and DLCO corrected for haemoglobin (Hb), were grouped together into one DLCO measure.
Among the validation studies, we identified those whose primary goal was specifically to validate the different PFT measures in SSc-ILD. We focused explicitly on these validation studies and summarised their reported measures of validity in Forest plots. Given the variability in study population and ILD scoring method, no attempts were made to subject the data to meta-analysis. The results of the remaining validation studies, whose aim was not specifically to validate PFT measures, are presented in the supplementary material (e-Appendix 6).
Risk of bias
The quality of the outcome studies was not assessed, as the purpose of this review was not to summarise measures of effect. The quality of the studies whose specific aim was to validate PFTs in SSc-ILD was evaluated using the quality assessment of diagnostic accuracy studies (QUADAS-2) tool [8]. Publication bias was not assessed as there are, to our knowledge, no available methods to do so in the context of screening/diagnostic test accuracy.
Results
An adaptation of the preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram [9] outlining the study selection process is available in figure 1. The detailed search strategy retrieved 3258 records. Following removal of duplicates, the titles and abstracts of 2056 records were screened to assess eligibility. Both co-reviewers independently agreed on the exclusion of 1415 records, with an overall percent agreement of 0.91 and a kappa statistic of 0.78. To confirm inclusion, a full-text assessment was performed on the remaining 641 records of which 384 were excluded. The overall percent agreement and kappa statistic for the secondary screening were 0.85 and 0.69, respectively. The reasons for exclusion at both the primary and secondary screening stages are presented in the supplementary material (e-Appendix 2).
Following the screening process, an additional 55 abstracts and clinical trial registrations were excluded since they matched subsequently published full-text articles. Finally, the reference lists of the 202 included studies were manually searched for further potentially relevant studies. This resulted in the addition of ten records, for a total of 212 studies included in this systematic review. Of the 212 included records, 169 qualified as outcome studies [10–178] while 50 were listed as validation studies [45, 73, 109, 146, 153, 159, 167, 179–221]. Seven records satisfied the inclusion criteria for both outcome and validation studies [45, 73, 109, 146, 153, 159, 167].
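The study flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
screened = 2056              # titles/abstracts screened after de-duplication
excluded_primary = 1415      # excluded at title/abstract screening
excluded_full_text = 384     # excluded at full-text assessment
superseded = 55              # abstracts/registrations matching later full texts
added_from_references = 10   # found by searching reference lists

full_text_assessed = screened - excluded_primary
included_before_hand_search = full_text_assessed - excluded_full_text - superseded
total_included = included_before_hand_search + added_from_references

outcome_studies, validation_studies, both = 169, 50, 7
# Inclusion-exclusion: records in both categories are counted once.
unique_studies = outcome_studies + validation_studies - both

print(full_text_assessed, included_before_hand_search, total_included, unique_studies)
```

The two routes to the total (screening flow and study categories) agree, which is a quick consistency check on the reported numbers.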
Outcome study results
The relevant study characteristics of the 169 outcome studies are summarised in the supplementary material (e-Appendix 3). The earliest longitudinal study using PFTs as outcomes for SSc-ILD progression to be identified was published in 1982 [10]. For most of the study period, DLCO was cumulatively the most commonly used PFT outcome, followed closely by FVC. In 2010, FVC surpassed DLCO as the most often used PFT (figure 2a). Similarly, the percent cumulative use of DLCO (i.e. the yearly ratio of the cumulative count of a given PFT to the cumulative number of published articles) gradually decreased over time in favour of an increase in the percent cumulative use of FVC. These trends were especially prominent after 2005 (figure 2b). Interesting trends in the use of absolute and % predicted PFT values (the latter denoted by a % symbol after the PFT abbreviation) were also observed and are further described in the supplementary material (e-Appendix 4).
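The percent cumulative use defined above is, for each year, the running count of studies using a given PFT divided by the running count of all published studies. A minimal sketch follows; the yearly counts are hypothetical and do not reproduce the data behind figure 2.

```python
from itertools import accumulate

def percent_cumulative_use(yearly_pft_counts, yearly_article_counts):
    """Yearly ratio (in %) of cumulative PFT use to cumulative articles."""
    cum_pft = accumulate(yearly_pft_counts)
    cum_articles = accumulate(yearly_article_counts)
    # None marks years before any article has been published.
    return [100 * p / a if a else None for p, a in zip(cum_pft, cum_articles)]

# Hypothetical counts for four consecutive years (illustrative only).
articles_per_year = [4, 6, 10, 10]
fvc_per_year = [2, 4, 9, 9]     # studies using FVC as an outcome
dlco_per_year = [4, 4, 5, 3]    # studies using DLCO as an outcome

fvc_pct = percent_cumulative_use(fvc_per_year, articles_per_year)
dlco_pct = percent_cumulative_use(dlco_per_year, articles_per_year)
print(fvc_pct)
print([round(x, 1) for x in dlco_pct])
```

Because a single study can report more than one PFT, the percentages for different measures need not sum to 100 in any given year.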
Among the 169 outcome studies, 71 clearly specified a single primary PFT endpoint for SSc-ILD progression. While % predicted DLCO comprised only a small proportion of primary PFT endpoints overall (11.3%), figure 3 reveals that it was a relatively important main outcome in the 1980s and 1990s, only to be surpassed by % predicted FVC in the mid-2000s (which accounted for 70.4% of studies overall). Of these 71 studies, 42 (59.2%) did not report a reason to support their choice in outcome, 20 (28.2%) alluded to the main PFT measure's use in previous SSc-ILD studies, five (7.0%) stated that the chosen PFT had been previously validated, while four (5.6%) reported using it either due to its high sensitivity or specificity.
Interestingly, of the five studies that cited prior validation to justify their choice of PFT (% predicted FVC in four cases [69, 111, 114, 149] and % predicted DLCO in one [33]), only the study using % predicted DLCO referenced an article with reported measures of validity. The four studies which chose their main PFT outcome measure based on its high sensitivity (in the case of DLCO) [11, 14] or its high specificity (in the case of FVC) [128, 169] either did not provide a citation to support this claim [169], referred to studies performed outside the field of SSc-ILD [11, 14, 128], or cited SSc studies with fewer than 20 patients [11, 14].
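The breakdown of justifications among the 71 studies with a single primary PFT endpoint can be reproduced from the reported counts. The sketch below uses only the counts given in the text; the dictionary labels are paraphrases.

```python
# Reported counts of the reason given for the choice of primary PFT endpoint.
reasons = {
    "no reason reported": 42,
    "used in previous SSc-ILD studies": 20,
    "previously validated": 5,
    "high sensitivity or specificity": 4,
}

total = sum(reasons.values())
percentages = {reason: round(100 * n / total, 1) for reason, n in reasons.items()}

for reason, pct in percentages.items():
    print(f"{reason}: {pct}%")
```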
Validation study results
The relevant characteristics of the 50 validation studies and their measures of validity are summarised in the supplementary material (e-Appendix 5 and e-Appendix 6, respectively). Among these studies, only five had the clear objective of validating the different PFT measures in SSc-ILD [159, 180, 210, 213, 221]. Ideally, identifying the best surrogate marker for SSc-ILD progression should involve the comparison of longitudinal variations in PFT measures with concurrent variations in either HRCT scores or lung biopsy results. This would ensure that the surrogate marker changes along with the reference standard. However, all five of these validation studies performed cross-sectional assessments of PFT results against HRCT-assessed SSc-ILD severity.
Three of the five studies correlated PFT values with HRCT scores [180, 210, 221] and Forest plots of the correlation coefficients are available in figure 4. The first of these studies, published in 1997, concluded that DLCO was the best measure of SSc-ILD extent (figure 4a) [180]. The authors of the second study, published in 2013, reported that PFTs alone may not be sufficient to identify cases of SSc-ILD (figure 4b) [210]. Finally, the third study, published in 2016, concluded in favour of % predicted DLCO being the best surrogate marker for SSc-ILD extent at any one point in time (figure 4c) [221]. The remaining two studies to have validated PFT measures reported measures of sensitivity and specificity (figure 5) [159, 213]. Neither study concluded in favour of any specific PFT. Rather, they both warned against using only PFTs to identify ILD in SSc patients and suggested evaluating more inclusive screening tools. The quality of these five validation studies is described in the supplementary material (e-Appendix 7). All studies exhibited some risk of bias, generally due to the potential for verification bias. Further issues included a lack of information about the timing of the PFTs and HRCT scans and about whether proper blinding was performed.
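Sensitivity and specificity of the kind reported by two of these validation studies follow from a 2×2 cross-tabulation of an abnormal PFT result against the HRCT reference standard. The sketch below assumes hypothetical cell counts; it does not reproduce the values shown in figure 5, and the function name is illustrative.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity from the cells of a 2x2 table.

    tp: PFT abnormal, ILD present on HRCT
    fp: PFT abnormal, ILD absent on HRCT
    fn: PFT normal,   ILD present on HRCT
    tn: PFT normal,   ILD absent on HRCT
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without ILD")
    sensitivity = tp / (tp + fn)  # share of ILD cases flagged by the PFT
    specificity = tn / (tn + fp)  # share of non-ILD cases correctly not flagged
    return sensitivity, specificity

# Hypothetical counts for 100 patients (illustrative only).
sens, spec = diagnostic_accuracy(tp=30, fp=5, fn=20, tn=45)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy table the PFT misses 20 of 50 patients with HRCT-defined ILD, the kind of shortfall that would motivate caution about relying on PFTs alone to identify ILD.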
Of the remaining 45 studies, more than half aimed to validate specific HRCT scoring systems using PFTs as reference standards. This reflects the fact that, despite HRCT being the gold standard for assessing SSc-ILD presence, no consensus exists on the best way to score SSc-ILD severity. This explains the large variety of HRCT scoring systems used in SSc-ILD and renders it challenging to summarise results. It is also worth noting that only one study correlated PFTs with lung biopsy results [179] while six reported longitudinal correlations between PFTs and HRCT scores [73, 167, 183, 192, 197, 216]. While no formal synthesising analysis was performed on these studies, their results do not appear to overwhelmingly support the superiority of any particular PFT (supplementary material e-Appendix 6).
Discussion
We performed a comprehensive systematic review of the literature to explore the use of PFTs as surrogate markers for SSc-ILD onset and progression. This review confirmed the predominant selection of FVC as a primary endpoint in recent longitudinal SSc-ILD studies, despite a paucity of evidence from validation studies that FVC has better performance characteristics than other PFT measures.
Our results show that both FVC and DLCO were historically used to monitor SSc-ILD progression. However, given that many authors did not report why they chose a given PFT measure as an outcome, it is difficult to fully understand the recent shift in preference to FVC. A possible explanation is the widespread use of FVC as a therapeutic endpoint in a comparable disease, idiopathic pulmonary fibrosis (IPF) [222, 223], especially given the lack of connective tissue disease (CTD)-ILD trials. However, a lack of consensus still surrounds FVC as the best outcome measure even for IPF [224, 225]. Furthermore, IPF is characterised by usual interstitial pneumonia, a pathologically different subtype of ILD from the non-specific interstitial pneumonia pattern commonly observed in SSc-ILD [226].
Here we suggest two additional reasons which may explain FVC's current popularity in SSc, as supported by the results of this systematic review. First, given that DLCO can be confounded by the presence of pulmonary arterial hypertension (PAH), FVC is believed to be more specific than DLCO to ILD [227, 228]. However, evidence suggests that DLCO may be more sensitive than FVC [11, 180, 221]. While the appropriate trade-off between sensitivity and specificity is debatable, it is worth noting that a lower specificity due to confounding by PAH can be corrected for in analyses by using simple epidemiological techniques such as restriction and adjustment. In addition to being less specific to ILD, DLCO is also regarded as being inherently more variable than FVC given its dependence on the measurement of both the transfer coefficient of the lung for carbon monoxide (KCO) and VA. Indeed, DLCO is often not chosen as a measure of SSc-ILD progression because it is considered to not be as reproducible as FVC. However, it is worth noting that variability in FVC can be equally concerning. In fact, a recent study calculated FVC's minimal clinically important difference in SSc-ILD to be well within the range of measurement error at the individual level [229].
Secondly, we suggest that the pivotal Scleroderma Lung Study I (SLS I) also played an important role in the sudden decrease in use of DLCO in favour of FVC. SLS I was the first double-blind, randomised, placebo-controlled trial to study the effect of oral cyclophosphamide on SSc-ILD. The baseline-adjusted mean absolute 12-month difference in FVC, the study's primary endpoint, was 2.53% (0.28–4.79%), favouring cyclophosphamide. This improvement persisted at 24 months, at which point the mean absolute difference in FVC was 1.95% (1.2–2.6%). Secondary outcomes included DLCO and total lung capacity (TLC) with adjusted mean 12-month differences of −1.04% (p=0.43) and 4.09% (0.49–7.65%), respectively [50]. Despite the slight variability in these results and the modest treatment effects observed, FVC was subsequently described as having been validated as an endpoint in randomised controlled trials [230]. In fact, the quality of FVC as a measure for SSc-ILD is often erroneously assessed by its observed treatment effect, rather than by proper validation techniques against a gold-standard. Interestingly, our analyses revealed a steep increase in the use of FVC after the 2006 publication of SLS I, as well as a steep decrease in the use of DLCO following years of relative stability (figure 2b).
It is likely that FVC's specificity for ILD and perceived validity led to its current acceptance as the best surrogate marker for ILD within the SSc community. However, we suggest that this preference is not yet fully warranted given the results of studies which validated PFT measures against HRCT-assessed SSc-ILD severity. In fact, these five studies revealed that FVC does not have conclusively better performance characteristics than other PFT measures. Moreover, two of these studies suggested that DLCO best captured SSc-ILD severity [180, 221].
Also noteworthy was TLC's performance, which overlapped considerably with those of FVC and DLCO (figures 4, 5). Despite these results and the fact that the American Thoracic Society (ATS) defines restrictive lung diseases such as ILD by a reduction in TLC in the presence of a preserved or elevated forced expiratory volume in 1 s (FEV1)/FVC ratio [231], TLC has rarely been used as an outcome in SSc-ILD (figure 2) and has never been used as a main outcome (figure 3).
Ultimately, the best surrogate marker for SSc-ILD onset and progression may be a composite outcome consisting of a combination of two or more PFT measures. In fact, composite outcomes were used in several studies included in this systematic review. The most common combinations included decreases in FVC or DLCO of ≥10% [45, 55, 67, 95, 143], a decline of ≥10% in FVC or 15% in DLCO [46, 54, 83–84, 94, 101, 109–110, 140–141] and decreases in FVC or DLCO of ≥15% [20–21, 76, 117, 125, 133]. Few studies used an FVC–TLC composite endpoint [12, 93, 105] and only one study jointly considered FVC, TLC and DLCO [58]. Nevertheless, the validity of such endpoints remains to be fully evaluated, as only cross-sectional validation studies have been performed so far. These studies found composite outcomes consisting of FVC and DLCO, as well as FVC, DLCO and TLC, to have high sensitivity but low specificity (figure 5).
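The three most common FVC/DLCO composite definitions listed above differ only in their thresholds, so they can be expressed as one parametrised rule. The sketch below is illustrative: the function and rule names are hypothetical, declines are assumed to be supplied as positive percentages, and whether a decline is absolute or relative is left to the caller because it varies between studies.

```python
def meets_composite(fvc_decline, dlco_decline, fvc_threshold, dlco_threshold):
    """True if either PFT has declined by at least its threshold (in %)."""
    return fvc_decline >= fvc_threshold or dlco_decline >= dlco_threshold

# (FVC threshold, DLCO threshold) for the three definitions in the text.
COMPOSITE_RULES = {
    "FVC or DLCO >=10%": (10, 10),
    "FVC >=10% or DLCO >=15%": (10, 15),
    "FVC or DLCO >=15%": (15, 15),
}

# Hypothetical patient: FVC fell by 8%, DLCO fell by 12% (illustrative only).
results = {
    name: meets_composite(8, 12, fvc_thr, dlco_thr)
    for name, (fvc_thr, dlco_thr) in COMPOSITE_RULES.items()
}
print(results)
```

The same patient counts as having progressed under one definition but not under the other two, which illustrates why results from studies using different composite endpoints are hard to compare.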
This systematic review highlights the need to continue to focus efforts on identifying the best sole or composite PFT surrogate marker for SSc-ILD. Presently, the widespread use of FVC has translated into the identification of few predictors of SSc-ILD progression and into attenuated treatment effects in SSc-ILD randomised controlled trials [48, 50, 232], suggesting that FVC may only weakly reflect the extent of SSc-ILD. In fact, it is possible that FVC may also be affected by the cutaneous involvement of the chest wall in SSc [233]. Following a recent consensus exercise, the Outcome Measures in Rheumatology (OMERACT) CTD-ILD working group agreed on and proposed a clinically meaningful outcome in clinical trials of CTD-ILD (a ≥10% decline in FVC or a ≥5% to <10% decline in FVC in the presence of a ≥15% decline in DLCO). However, they also recognised that these measures remain to be validated [234]. Indeed, thorough PFT validation studies are needed to better assess the validity of different PFT measures. In particular, longitudinal studies should be carried out to explore the association between concurrent changes in PFTs and clinical reference standards. Future studies should also build upon recent research aiming to develop composite measures of SSc-ILD severity (e.g. measures of lung physiology, disease manifestations and patient-reported outcomes [162]). This will ultimately strengthen the quality of both clinical trials and epidemiological studies, as well as their results, and assist in clinical decision-making.
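The OMERACT composite outcome quoted above is a two-branch rule, which can be written out as a short function. This is a sketch under stated assumptions: the function name is hypothetical, declines are passed as positive percentages, and the text does not specify whether declines are relative or absolute, so that choice is left to the caller.

```python
def omeract_progression(fvc_decline_pct, dlco_decline_pct):
    """Apply the composite rule as worded in the text.

    Progression if FVC declined by >=10%, or if FVC declined by
    >=5% to <10% together with a >=15% decline in DLCO.
    """
    if fvc_decline_pct >= 10:
        return True
    return 5 <= fvc_decline_pct < 10 and dlco_decline_pct >= 15

# Hypothetical patients as (FVC decline %, DLCO decline %), illustrative only.
examples = [(12, 0), (7, 16), (7, 10), (3, 20)]
flags = [omeract_progression(fvc, dlco) for fvc, dlco in examples]
print(flags)
```

Note that under this rule a large isolated fall in DLCO (the last example) does not count as progression unless FVC has also declined by at least 5%.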
Our systematic review is not without certain limitations. First and foremost, we did not include studies which validated PFT measures against mortality given the already large scope of this review. Such studies are also integral in identifying the best PFT surrogate marker for SSc-ILD progression. Despite this exclusion, our results are complemented by those of a 2010 systematic review which found that DLCO was more consistently associated with mortality in SSc-ILD than FVC [235]. Since the publication of this review, further evidence has been amassed favouring the utility of DLCO. Indeed, one recent study found a decline in DLCO and KCO to be the most predictive of adverse outcomes, including death [236]. Likewise, a second recent study suggested that 1-year trends in an FVC and DLCO composite endpoint and 2-year trends in measures of gas transfer were most predictive of mortality [237].
Additional limitations include the decision to restrict validation studies to those with classical measures of validity and to not account for studies which reported regression coefficients and p-values. While this translated into a loss of information, it permitted a better focus on higher quality validation data and allowed for standardisation across included studies. Finally, while there is potential for publication bias, we believe that the nature of our research question, in addition to the large number of included studies and the extensive searching of multiple databases, alleviates this concern.
In summary, available evidence does not overwhelmingly favour one PFT measure as the best surrogate marker for SSc-ILD, rendering it challenging to support the current preference for FVC. Indeed, the perceived superiority of FVC is not reflected in rigorous PFT validation studies. While FVC has the potential to be a viable surrogate marker for SSc-ILD, it would be ill-advised at this stage to discount other potentially interesting PFT measures, such as DLCO and TLC.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary Material ERR-0102-2017_Supplementary_Material
Acknowledgements
The authors would like to thank the McGill University (Montreal, QC, Canada) epidemiology liaison librarian Genevieve Gore for her assistance in designing the detailed search strategy.
Footnotes
This article has supplementary material available from err.ersjournals.com
Provenance: Submitted article, peer reviewed.
Conflict of interest: M. Caron reports grants from the Fonds de Recherche du Québec (Santé PhD Studentship) and from the Canadian Institutes of Health Research (GSD–146268) during the conduct of the study. S. Hoa reports grants from the Université de Montréal Rheumatology Clinical Fellowship Program (Abbvie educational grant) and from the Arthritis Society's Postdoctoral Fellowship Award, during the conduct of the study.
- Received September 5, 2017.
- Accepted February 24, 2018.
- Copyright ©ERS 2018.
ERR articles are open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.