Abstract
Objective: This COnsensus-based Standards for the selection of health measurement INstruments (COSMIN)-based systematic review aims to identify and summarise the quality of measurement properties of dyspnoea-specific patient-reported outcome measures (PROMs) for patients with interstitial lung disease (ILD), pulmonary hypertension (PH) or connective tissue diseases (CTDs).
Methods and results: Relevant articles in PubMed and Embase were screened. Based on COSMIN analysis and the Grading of Recommendations, Assessment, Development and Evaluation approach, overall rating and level of evidence were assessed to formulate recommendations. We identified 26 publications on 10 PROMs. For patients with ILD, including CTD-associated ILD, nine PROMs were evaluated, of which the Dyspnea-12 (D12), EXACT-Respiratory Symptoms Idiopathic Pulmonary Fibrosis Breathlessness subscale (ERS-IPF-B), King's Brief Interstitial Lung Disease Health Status Questionnaire breathlessness and activities subscale (KBILD-B) and the University of California San Diego Shortness of Breath Questionnaire (UCSD-SOBQ) had high-quality evidence for sufficient internal consistency, without high-quality evidence of insufficient measurement properties. We reached this same conclusion regarding the D12 for use in patients with PH, including CTD-associated PH. Most PROMs in this systematic review have moderate- or low-quality evidence on construct validity and responsiveness.
Conclusion: Four dyspnoea-specific PROMs, D12, ERS-IPF-B, KBILD-B and UCSD-SOBQ, can be recommended for use in patients with ILD, including CTD-associated ILD. Of these four, the D12, despite the limited evidence and the lack of evidence on several important domains, is also suitable for use in patients with PH, including CTD-associated PH.
Currently, four dyspnoea-specific patient-reported outcomes can be recommended for use in ILD patients (including CTD-associated ILD): D12, ERS-IPF-B, KBILD-B and UCSD-SOBQ. The use of D12 is also suitable for patients with PH, including CTD-associated PH. https://bit.ly/3EsWSb1
Introduction
Dyspnoea is a cardinal and common symptom in patients with interstitial lung disease (ILD) or pulmonary hypertension (PH), both of which are lethal and frequent complications of connective tissue diseases (CTDs) [1, 2]. In these conditions, dyspnoea is associated with high self-reported disability and a high disease burden [3, 4].
Patients with ILD, PH or CTDs and their complications are usually monitored by a multidisciplinary team in expert centres that employ guideline-recommended evaluation. Various parameters, including clinical signs and symptoms such as dyspnoea, as well as multiple diagnostic tests are used to evaluate functional capacity, exercise capacity and heart and lung function. This approach is used to evaluate treatment and for the early detection of clinical deterioration, which enables clinicians to tailor the treatment regimen to the individual patient.
Because the course of ILD, PH and CTDs associated with these complications is highly heterogeneous, there is a need to further improve the monitoring of key symptoms such as dyspnoea. A possible option is the use of patient-reported outcome measures (PROMs). PROMs are increasingly recognised as important additional tools for assessing disease severity and response to therapy [5]. The use of a well-validated PROM on dyspnoea could help to evaluate this vital symptom in patients in a structured way, and could possibly be of aid in home-monitoring settings [6, 7].
Furthermore, it has been suggested that dyspnoea be part of the core set for outcome measures in trials for CTD-associated ILD and CTD-associated pulmonary arterial hypertension (PAH), as reported in a consensus statement published by the CTD-ILD working groups of Outcome Measures in Rheumatology (OMERACT) and Outcome Measures in Pulmonary Arterial Hypertension related to Systemic Sclerosis (EPOSS), respectively [8–11]. However, an overview of PROMs and their measurement properties used in literature to assess dyspnoea in patients with ILD, PH or CTDs is lacking to date. Such an overview would support choices for the use of outcome measures both in clinical practice and research.
This review aims to assess measurement properties of self-completed PROMs assessing dyspnoea applied in patients with a diagnosis of ILD or PH, according to the recently updated COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines [12]. With the outcome of this systematic review, we aim to provide clear and evidence-based recommendations for the most suitable available self-reported PROM on dyspnoea for these patients, and to identify gaps in knowledge on the measurement properties.
Methods
Prior to identification of articles and data extraction, the protocol of this systematic review was registered with PROSPERO, the international prospective register of systematic reviews (registration no. CRD42020150519). Furthermore, this review follows the COSMIN guidelines for systematic reviews of patient-reported outcomes [12], and adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement [13].
We first performed an exploratory literature search without date limits in both PubMed and Embase in July 2019. This literature search was assisted by our medical librarian and was designed to identify relevant PROMs, with an English version available, on dyspnoea in our target population. A PROM was included as possibly relevant if no help or specific instruction in a supervised setting was necessary and if it was applied in adult patients with CTD, ILD or PH. In total, our review team agreed upon 20 possibly relevant PROMs regarding dyspnoea in our population of interest for use in the systematic literature search. The exact search terms, search results and a list of these 20 PROMs are included in supplementary file S1.
Literature search strategy; identification of studies on measurement properties of selected PROMs
A systematic literature search without date limits was performed in both PubMed and Embase with the assistance of our medical librarian; it was undertaken in July 2019 and last updated in March 2022. This search was constructed to identify all studies that assessed measurement properties of the 20 selected PROMs. This search included three overall concepts: 1) all identified possibly relevant PROMs and their synonyms; 2) dyspnoea; and 3) measurement properties. We used a previously designed, highly sensitive filter for finding studies on measurement properties of measurement instruments for our search in PubMed [14]. For Embase, the same highly sensitive filter was adapted to the applicable format with the help of our librarian, and was comparable to the adapted highly sensitive PubMed filter for Embase described by Terwee et al. [14]. The exact terms and complete search strategy are included in supplementary file S2.
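As an illustration of how three search concepts can be combined, the following minimal sketch builds a Boolean query string. It assumes that synonyms within a concept are joined with OR and that the concepts are joined with AND; this is a common convention and is not stated in the text. The function name and the example terms are hypothetical, and the third concept is only a placeholder for the sensitive filter. The actual search strategy is the one in supplementary file S2.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND.

    `concepts` is a list of lists of search terms (illustrative only).
    """
    blocks = []
    for terms in concepts:
        quoted = [f'"{term}"' for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical, heavily shortened term lists for the three concepts.
concepts = [
    ["Dyspnea-12", "D12"],                      # 1) PROM names and synonyms
    ["dyspnea", "dyspnoea", "breathlessness"],  # 2) dyspnoea
    ["measurement properties"],                 # 3) placeholder for the filter
]

query = build_query(concepts)
print(query)
```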
Eligibility criteria
Published original full-text publications were considered eligible for inclusion if they aimed to assess measurement properties of one or more of the selected PROMs regarding dyspnoea in adult patients with CTD, ILD or PH. PROMs with identified studies on their measurement properties were evaluated in full length (item by item) by the review team. Content validity (face validity) for each PROM with retrieved publications on at least one measurement property was discussed in a consensus meeting. We deemed a PROM relevant for assessing dyspnoea in our target population when ≥50% of the items of the PROM or PROM subscale reflected the construct “dyspnoea”, and the PROM was self-reported (i.e. no help or specific instruction in a supervised setting was necessary). Only studies including ≥50% of patients with CTD, ILD or PH, or reporting these patients' subgroup results separately, were included. We excluded publications that only used the identified PROMs as an isolated outcome measurement instrument, e.g. in randomised controlled trials, or publications in which the PROM was used in a validation study of another instrument (e.g. a PROM not included in our review). Furthermore, we excluded publications in languages other than English, unpublished material, conference abstracts, poster presentations, case reports, editorials, letters and reviews.
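The two 50% thresholds above can be expressed as simple checks. The sketch below is illustrative only: the function names and the example counts are hypothetical, and in the review the judgement of whether an item reflects dyspnoea was made by consensus rather than computed.

```python
def prom_is_dyspnoea_specific(n_dyspnoea_items, n_items):
    """True when at least 50% of the items reflect the construct dyspnoea."""
    if n_items <= 0:
        raise ValueError("a PROM or subscale needs at least one item")
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return 2 * n_dyspnoea_items >= n_items


def study_meets_population_criterion(n_target, n_total,
                                     subgroup_reported_separately=False):
    """True when >=50% of patients have CTD, ILD or PH, or when results
    for these patients are reported as a separate subgroup."""
    if subgroup_reported_separately:
        return True
    if n_total <= 0:
        raise ValueError("a study needs at least one patient")
    return 2 * n_target >= n_total


# Hypothetical examples.
print(prom_is_dyspnoea_specific(6, 12))
print(study_meets_population_criterion(40, 100))
print(study_meets_population_criterion(40, 100, subgroup_reported_separately=True))
```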
Data extraction
Identified publications were imported into EndNote (Clarivate, Philadelphia, PA, USA) and, after removal of duplicates, entered into the online reviewing tool Rayyan (https://www.rayyan.ai/). Inclusion and exclusion criteria were independently applied by the same two reviewers (JL and CE) to titles and abstracts of the hits retrieved. In cases of discrepancy between the two reviewers, articles were read in full to decide on eligibility for inclusion. In case of disagreement, a third independent reviewer (MV) was asked to decide on possible inclusion. Furthermore, all references of the included publications were checked to search for additional studies. This resulted in the final list of publications for inclusion in the review.
Data on the included publications and the included PROMs were extracted by two authors independently (JL and CE). Study characteristics included patient age, gender, disease, disease duration, language of PROM used, country of study and study setting. PROM characteristics (e.g. items, scoring, feasibility and availability) were extracted from the questionnaires, background literature or user manuals. Data concerning instrument description and validation were collected using forms based on those suggested in the COSMIN user manual [15].
Measurement properties
According to the COSMIN methodology, eight measurement properties need to be assessed in order to determine the quality of a PROM, summarised in supplementary file S3, table A. However, because no gold standard exists for dyspnoea, criterion validity was not evaluated in our systematic review. For hypotheses testing for construct validity and responsiveness, we formulated a set of hypotheses about the expected direction and magnitude of the correlations between the PROM of interest and other PROMs, other constructs and other outcome measures, and of mean differences in scores between groups (supplementary file S3, table B).
Evaluation of measurement properties and grading of quality of evidence
The following four steps were completed. Of note: one publication can contain multiple studies on different measurement properties; therefore, each evaluation of a measurement property is indicated as “single study”.
Step 1: To evaluate the methodological quality, each single study on a measurement property was assessed and rated independently by two raters using the COSMIN risk of bias checklist [16]. Studies were graded as “very good”, “adequate”, “doubtful”, “inadequate” or “not assessed” for the methodology used to assess each measurement property. In this checklist, a different set of standards is evaluated for each measurement property and a “worst score counts” principle is applied.
Step 2: Based on the results of each single study, the performance of the PROM on a measurement property was rated against the updated criteria for good measurement properties as “sufficient”, “insufficient” or “indeterminate” [15]. These criteria are summarised in supplementary file S4.
Step 3: Individual ratings from step 2 for each of the included studies were pooled to provide an overall quality assessment per measurement property of each PROM. When pooling results from multiple studies, a measurement property of a PROM was deemed to have sufficient or insufficient overall quality if ≥75% of individual studies were graded as “sufficient” or “insufficient”, respectively; inconsistent quality if <75% of the results were graded as either “sufficient” or “insufficient”; and indeterminate quality if all individual studies were rated “indeterminate”.
Step 4: The evidence was summarised per measurement property, per PROM in a summary of findings table. The overall quality of evidence per PROM per measurement property (taking all data of included studies into account) was graded based on the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach for systematic reviews and clinical trials [17]. In this approach, the quality of evidence was graded as “high”, “moderate”, “low” or “very low”, with downgrading for 1) risk of bias, 2) inconsistency, 3) imprecision or 4) indirectness (supplementary file S5), based on the consensus of two reviewers (CE and JL).
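The “worst score counts” principle and the 75% pooling rule described above can be sketched in a few lines. This is a minimal illustration, not the COSMIN tooling itself: the function names are hypothetical, and the sketch assumes that the 75% threshold is computed over all single studies for a measurement property, including those rated “indeterminate”, which the text does not specify.

```python
# Ordered from best to worst methodological quality.
QUALITY_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(standard_ratings):
    """Return the lowest rating among the standards of one checklist box."""
    return max(standard_ratings, key=QUALITY_ORDER.index)


def pool_ratings(study_ratings):
    """Pool per-study ratings into an overall rating for one property.

    Assumption: the denominator is every single study, whatever its rating.
    """
    if not study_ratings:
        raise ValueError("no studies to pool")
    if all(r == "indeterminate" for r in study_ratings):
        return "indeterminate"
    n = len(study_ratings)
    for label in ("sufficient", "insufficient"):
        # 4 * count >= 3 * n is the integer form of count / n >= 0.75.
        if 4 * study_ratings.count(label) >= 3 * n:
            return label
    return "inconsistent"


# Hypothetical examples.
print(worst_score_counts(["very good", "doubtful", "adequate"]))
print(pool_ratings(["sufficient", "sufficient", "sufficient", "insufficient"]))
print(pool_ratings(["sufficient", "insufficient"]))
```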
To come to a final recommendation, PROMs were categorised as level A, B or C, as follows:
Level A: PROMs with at least low-quality evidence for sufficient internal consistency. Level A instruments are recommended for use and results obtained with these PROMs can be trusted.
Level B: PROMs not categorised as A or C. These PROMs have potential to be recommended for use, but require further research to assess their quality.
Level C: PROMs with high-quality evidence for an insufficient measurement property. PROMs categorised as C should not be recommended for further use.
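The three recommendation levels can be read as a small decision rule. The sketch below is an illustration under stated assumptions: content validity is taken as already established at inclusion, level C is checked before level A so that high-quality evidence for an insufficient property always takes precedence, and the function name and example inputs are hypothetical.

```python
# Evidence quality levels that count as "at least low".
AT_LEAST_LOW = {"low", "moderate", "high"}


def categorise(properties):
    """Assign level A, B or C to one PROM.

    `properties` maps a measurement property name to a tuple of
    (overall rating, quality of evidence).
    """
    # Level C: high-quality evidence for any insufficient property.
    for rating, quality in properties.values():
        if rating == "insufficient" and quality == "high":
            return "C"
    # Level A: at least low-quality evidence for sufficient internal consistency.
    rating, quality = properties.get("internal consistency", (None, None))
    if rating == "sufficient" and quality in AT_LEAST_LOW:
        return "A"
    # Level B: everything else.
    return "B"


# Hypothetical PROM profiles, not taken from the review's tables.
example_a = {
    "internal consistency": ("sufficient", "high"),
    "construct validity": ("inconsistent", "moderate"),
}
example_b = {"construct validity": ("insufficient", "low")}
example_c = {
    "internal consistency": ("sufficient", "high"),
    "reliability": ("insufficient", "high"),
}
print(categorise(example_a), categorise(example_b), categorise(example_c))
```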
Results
We identified 20 possibly relevant PROMs on dyspnoea in our target population, summarised in supplementary file S1. After careful consideration we decided to exclude publications evaluating measurement properties of the Breathlessness Beliefs Questionnaire, Cambridge Pulmonary Hypertension Outcome Review, Chronic Respiratory Questionnaire and St George's Respiratory Questionnaire because they are not dyspnoea specific; <50% of the included questions addressed dyspnoea or no separate dyspnoea subscale was present/evaluated. The search strategy did not yield publications on measurement properties of six of the 20 initially identified PROMs in our patient population: the Numerical Rating Scale, London Chest Activity of Daily Living scale, Scleroderma Burden Index, Pulmonary Functional Status and Dyspnea Questionnaire-modified, Idiopathic Pulmonary Fibrosis PROM-Dyspnoea and the Multidimensional Dyspnea Profile. As a result we assessed 26 publications evaluating 10 PROMs assessing dyspnoea in patients with CTD, ILD or PH [18–43]. A summary of the literature search results is presented in figure 1.
Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 adapted flowchart. #: n=4 patient-reported outcome measures (PROMs) excluded in screening phase, after content evaluation of PROM. ILD: interstitial lung disease; PH: pulmonary hypertension; CTD: connective tissue disease.
Of the 10 evaluated PROMs, the Dyspnoea-12 (D12) was initially developed in patients with ILD [44], EXACT-Respiratory Symptoms Idiopathic Pulmonary Fibrosis (ERS-IPF) was developed for patients with IPF, and the King's Brief Interstitial Lung Disease Health Status Questionnaire (KBILD) was developed for patients with ILD, including IPF and CTD-associated ILD [18, 22]. None of the included PROMs were specifically developed for patients with PH.
PROM summary and feasibility
A summary of the PROM characteristics is presented in table 1. In total we evaluated eight PROMs and two breathlessness subscales (the KBILD breathlessness and activities subscale (KBILD-B) and the ERS-IPF breathlessness subscale (ERS-IPF-B)). The recall period of the included PROMs ranged from the day of evaluation up to the last 2 weeks. Most PROMs are easy to use and complete, as documented in the retrieved publications. Translations into languages other than English are available for the following PROMs: D12 (six translations [50–53]), Functional Assessment of Chronic Illness Therapy-Dyspnea (FACIT-D) (>10 translations [54]), KBILD (>10 translations [55]), Medical Research Council dyspnoea scale (MRC) (12 translations [47]), oxygen cost diagram (OCD) (one translation in Spanish [56]), Patient-Reported Outcomes Measurement Information System-Dyspnea Severity short form (PROMIS-DS) (>10 available translations [54]) and University of California San Diego Shortness of Breath Questionnaire (UCSD-SOBQ) (>10 translations [57]).
Characteristics of included PROMs
The MRC is free for research and commercial use with appropriate credit; use of the ERS-IPF, KBILD, UCSD-SOBQ, PROMIS-DS and FACIT-D is dependent upon the setting (free for use in academic or clinical settings, fees applicable in case of commercial use). No information on user fees is reported for the D12; the questionnaire is provided in the appendix of the original article. For the OCD, the New Dyspnea Scale (NS) and the visual analogue scale-dyspnoea (VAS-D), no information is available.
Study summary
An overview of the included publications evaluating measurement properties is provided in table 2. We identified 26 publications, describing 27 separate patient samples. Studies were performed in 26 different countries. Language of PROMs used was not explicitly stated in 15 of the included publications, but most frequently involved English. Patient numbers included in individual studies varied between 11 and 1933.
Characteristics of included publications
Most of the studies included patients with ILD (including CTD-associated ILD) or IPF (22 publications on a total of nine PROMs, n=4088 patients). Two publications included analyses of PAH patients; one publication evaluated the D12 in two PAH patient samples (n=201, of which n=41 had CTD-associated PH) [21], while the other evaluated the UCSD-SOBQ and VAS for use in systemic sclerosis (SSc) including associated PAH (53 of 179 patients had associated PH) [38]. SSc was the only CTD without associated ILD or PH in which measurement properties of a dyspnoea-specific PROM were reported. Two publications described performance of the FACIT-D PROM in patients with SSc without associated PH or ILD (n=173) [30, 39].
Risk of bias assessment; methodological quality
The results of the individual risk of bias in each single study per measurement property per PROM are summarised in supplementary files S6, S7 and S8. Studies were mostly graded to be very good or adequate for the methodological quality used to assess measurement properties, except for seven of 69 single studies. Four studies had doubtful/inadequate methodological quality on reliability because of a high risk of recall bias [21, 22, 28, 29]. Two studies had doubtful methodological quality for hypothesis testing on construct validity due to the inclusion of a low number of patients [33] or removal of outliers without specification [37]. Lastly, one study had doubtful methodological quality on responsiveness due to use of an inadequate statistical method [26].
Measurement property assessment and level of evidence
The individual ratings for reported results of measurement properties in each study and overall GRADE levels of evidence per measurement property after pooling of results per PROM are provided in supplementary files S6, S7 and S8. No studies evaluated structural validity or cross-cultural validity/measurement invariance. Only one study assessed measurement error of the UCSD-SOBQ [28]. For construct validity and responsiveness, after pooling of individual study results, the overall ratings were either inconclusive or insufficient for all PROMs evaluated.
Summary of findings
A summary of findings per measurement property per PROM is provided in tables 3–5. According to the COSMIN methodology, PROMs with any level of content validity and at least low-quality evidence on internal consistency can be recommended for use when no insufficient measurement properties with high-quality evidence are present.
Summary of pooled findings on the quality of evidence on measurement properties of included PROMs in ILD, including CTD-associated ILD
Summary of pooled findings on the quality of evidence on measurement properties of included PROMs in PH, including CTD-associated PH
Summary of pooled findings on the rating and quality of evidence on measurement properties of included PROMs in CTD (SSc patients only)
Therefore, D12, ERS-IPF-B, KBILD-B and UCSD-SOBQ can be categorised as level A recommendations, indicating that these PROMs, or subscales respectively, can be recommended for evaluating dyspnoea in patients with ILD including CTD-associated ILD, and that results obtained with these PROMs can be trusted. We reached this same conclusion regarding the D12 for use in patients with PH, including CTD-associated PH.
All other PROMs included in this review, for which information on internal consistency is lacking, receive a level B recommendation, indicating that they have potential to be recommended for use, but require further research to assess their quality.
Lastly it must be noted that none of the included PROMs in this review have sufficient ratings for all evaluated measurement properties.
Recommendation
Of the PROMs with a level A recommendation in patients with ILD (including CTD-associated ILD), KBILD-B is the only PROM with high-quality evidence for reliability; the level of evidence for reliability of D12 is moderate, and that for ERS-IPF-B and UCSD-SOBQ is very low. All four PROMs have moderate-quality evidence on construct validity and responsiveness. Therefore, based on the available evidence, and acknowledging that evidence on other important measurement properties is lacking, the best PROM on dyspnoea in patients with ILD (including CTD-associated ILD) is the KBILD-B.
The D12 is the only included dyspnoea-specific PROM with a level A recommendation for use in patients with PH (including CTD-associated PH).
Discussion
To our knowledge, this is the first study that systematically reviews measurement properties of dyspnoea-specific PROMs in patients with ILD, PH or CTD using the COSMIN methodology.
We identified 10 relevant PROMs with information on at least one measurement property for this target population. Based on current evidence, four of these receive a level A recommendation using the COSMIN criteria; therefore, ERS-IPF-B, KBILD-B and UCSD-SOBQ can be recommended for use in ILD patients, and D12 in both ILD and PH patients, with or without associated CTD.
This study highlights gaps in current knowledge relating to currently used PROMs and the quality assessment of commonly used dyspnoea PROMs in these populations. The results of our systematic literature search demonstrated that for six of the initially identified possibly relevant PROMs, no studies could be found that examined their measurement properties, indicating that these PROMs cannot be recommended for the evaluation of dyspnoea in this population. Furthermore, among the PROMs evaluated, no studies adequately assessed the structural validity or cross-cultural validity/measurement invariance and only one study, with inadequate methodology and a low level of evidence, evaluated measurement error. Additionally, most of the included PROMs in this systematic review had inconsistent scores and moderate/low-quality evidence for construct validity and responsiveness. These essential properties need further exploration to strengthen future recommendations. Lastly, this systematic review indicates that the majority of the studies identified in our search included patients with ILD, with or without CTDs, and very few studies included patients with PH. For patients with CTD without associated ILD or PH, only the FACIT-D was evaluated in patients with SSc. Currently only low-quality evidence on construct validity and responsiveness is available and information on internal consistency was not evaluated for this PROM in these patients. We conclude that this PROM shows potential for use in patients with SSc; however, further research is necessary to be able to make an evidence-based recommendation.
Our study has two major strengths. First, we performed an extensive literature search, assisted by an experienced medical librarian and based on the recommendations developed by Terwee et al. [14], to identify relevant studies. Second, we used a well-developed consensus-based methodology to review the evidence on measurement properties [15].
Our study also has important limitations. First, we acknowledge that decisions regarding content validity of PROMs included in this review could be questioned. However, our research team combined experience from different areas of expertise, and we applied a conservative threshold of 50% or more of items on dyspnoea in the PROMs to demonstrate adequate content validity. Therefore, we feel that this methodology allowed us to identify the most appropriate PROMs of dyspnoea in this population. Second, we recognise that the exploratory search on PROMs or subscales reflecting dyspnoea in our target population is not exhaustive. We only considered PROMs for which an English version was available, which could have excluded possibly relevant PROMs or publications on measurement properties available in other languages. However, several of the PROMs we identified in this study were also recently described in the American Thoracic Society research statement as important, frequently used domain-specific patient-centred outcomes for dyspnoea in ILD [58]. Third, we used the expertise present in our research team to formulate hypotheses for construct validity and responsiveness concerning correlations with specific outcomes used in clinical practice. We recognise that different thresholds or less strict hypotheses could lead to different conclusions on sufficiency of the evidence. Furthermore, it is important to emphasise that the evaluation of the quality of individual study methodology using COSMIN guidelines, and of overall study quality using the GRADE approach, especially the grading of inconsistency and indirectness of the evidence, is subjective and open to differences in opinion [12]. We ensured consistency in rating across studies, and thereby minimised variability, by having two authors grade studies independently and resolving any conflicts by discussion with a third team member if needed.
Points for clinical practice
Currently four dyspnoea-specific PROMs can be recommended for use in patients with ILD (including CTD-associated ILD): D12, ERS-IPF-B, KBILD-B and UCSD-SOBQ.
The use of D12 is also suitable for patients with PH, including CTD-associated PH; however, this recommendation is based on a limited amount of evidence.
There is not enough evidence on measurement properties of dyspnoea-specific PROMs to give a recommendation for use in patients with CTD without associated ILD or PH.
Conclusion
Research performed with PROMs of poor or unknown quality constitutes a waste of resources and is unethical [59]. Therefore it is crucial to select PROMs with good-quality evidence on the sufficiency of their measurement properties. Clinicians need to be aware of the strengths and weaknesses of available PROMs, particularly when deciding which measure to use in a specific patient population. To summarise, there are four dyspnoea-specific PROMs that can be recommended for use in patients with ILD, including CTD-associated ILD. These are the D12, ERS-IPF-B, KBILD-B and UCSD-SOBQ, of which KBILD-B (despite limitations) can currently be considered the best PROM to evaluate dyspnoea. Of these PROMs, the D12 can also be recommended for use in patients with PH, with or without associated CTD.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Literature search on Dyspnea-specific PROMs in CTD-PH and ILD patients ERR-0091-2022.SUPPLEMENTS1
Literature Search strategy ERR-0091-2022.SUPPLEMENTS2
Definitions of measurement properties and hypotheses ERR-0091-2022.SUPPLEMENTS3
Updated Criteria for good measurement properties ERR-0091-2022.SUPPLEMENTS4
Grading of quality of evidence ERR-0091-2022.SUPPLEMENTS5
Supplementary table: Risk of bias assessment for ILD and CTD-ILD studies ERR-0091-2022.SUPPLEMENTS6
Supplementary table: Risk of bias assessment for PH studies ERR-0091-2022.SUPPLEMENTS7
Supplementary table: Risk of bias assessment for SSc patients only ERR-0091-2022.SUPPLEMENTS8
Acknowledgements
We would like to acknowledge O.Y. Chan, information specialist affiliated to the Radboud University Medical Library, Nijmegen, the Netherlands, for assistance on both constructing and performing the systematic literature search.
Footnotes
Provenance: Submitted article, peer reviewed.
Data availability statement: The authors confirm that the data supporting the findings of this study are available within the article or its supplementary materials.
Conflict of interest: The authors have no conflicts of interest to report.
- Received May 11, 2022.
- Accepted September 27, 2022.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org