Abstract
Background: The number of patients completing unsupervised home spirometry has recently increased due to more widely available portable technology and the COVID-19 pandemic, despite a lack of solid evidence to support it. This systematic methodology review and meta-analysis explores quantitative differences in unsupervised spirometry compared with spirometry completed under professional supervision.
Methods: We searched four databases to find studies that directly compared unsupervised home spirometry with supervised clinic spirometry using a quantitative comparison (e.g. Bland–Altman). There were no restrictions on clinical condition. The primary outcome was measurement differences in common lung function parameters (forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC)), which were pooled to calculate overall mean differences with associated limits of agreement (LoA) and confidence intervals (CI). We used the I2 statistic to assess heterogeneity, the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool to assess risk of bias and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to assess evidence certainty for the meta-analyses. The review has been registered with PROSPERO (CRD42021272816).
Results: 3607 records were identified and screened, with 155 full texts assessed for eligibility. We included 28 studies that quantitatively compared spirometry measurements, 17 of which reported a Bland–Altman analysis for FEV1 and FVC. Overall, unsupervised spirometry produced lower values than supervised spirometry, with wide variability, for both FEV1 (mean difference −107 mL; LoA= −509, 296; I2=95.8%; p<0.001; very low certainty) and FVC (mean difference −184 mL; LoA= −1028, 660; I2=96%; p<0.001; very low certainty).
Conclusions: Analysis under the conditions of the included studies indicated that unsupervised spirometry is not interchangeable with supervised spirometry for individual patients owing to variability and underestimation.
Tweetable abstract
Unsupervised spirometry for assessing patients with respiratory conditions requires caution as it underestimates values relative to spirometry in clinic; clinicians must consider the potential for differences. Home and clinic spirometry measurements are not interchangeable. https://bit.ly/42pqwG7
Introduction
Traditionally, patients perform spirometry in a clinic setting supervised by a trained professional under standardised conditions [1, 2]. Spirometry is used extensively for diagnosis and for monitoring patients with respiratory conditions between clinic visits, with measurements used to assess disease severity and control [3–5]. In clinical research, lung function is an important outcome measure to assess intervention efficacy [6–8] and is a constituent of clinical trial core outcome sets for chronic obstructive pulmonary disease (COPD) [9], bronchiectasis [10] and pulmonary infections [11].
The Association for Respiratory Technology & Physiology (ARTP) advise that spirometry should be conducted under the supervision of professionals who have completed comprehensive training [12]. However, accelerated by the COVID-19 pandemic and with increased pressure on healthcare systems, routine respiratory services and clinical trials have now adopted unsupervised remote spirometry, mainly for the monitoring of patients [13–15]. These portable spirometers have been advocated by healthcare providers [16–18] and have been positively received by patients [19–22], despite no conclusive evidence that unsupervised spirometry measurements are equivalent to supervised measurements obtained in clinic. It is crucial to know whether unsupervised assessments are valid and reliable before mass uptake. However, if feasible, remote unsupervised spirometry could support virtual healthcare services in routine care and enable more pragmatic trial designs [23], such as the development and scaling of decentralised clinical trials [24].
The primary objective of the review was to determine if spirometry measurements completed by patients unsupervised at home are different to those obtained under the supervision of a trained professional. It assessed differences between two quantitative methods of measurement used as part of respiratory care in clinical research and clinical practice, rather than assessing the effects of the care itself [25]. Secondary objectives were to explore adherence to unsupervised spirometry, patient satisfaction/acceptability, technical issues, quality of spirometry data, adverse events and costs.
Methods
The analysis explicitly focused on measurement differences between two methods, unsupervised and supervised spirometry (definitions in supplementary material), and so was completed as a test accuracy review. The protocol for this review was registered prospectively on PROSPERO (ID: CRD42021272816) [26], providing full details on methods. Variations between the protocol and review are described in the supplementary material.
Criteria for study inclusion
Studies were included if they compared values obtained from unsupervised spirometry to those obtained from supervised spirometry. These could be from using the same or different spirometers. There was no maximum time difference between measurements. Eligible study designs included cross-sectional, longitudinal, randomised or non-randomised controlled or crossover studies in which participants performed both forms of spirometry. Sub-studies embedded within another study were also eligible. There were no restrictions on the type of publications included but they had to be complete datasets, i.e. ongoing studies with preliminary data were not included.
Criteria for study exclusion
We excluded studies that had no comparison group or were not published in English.
Population, intervention, comparator, outcomes
The population, intervention, comparator, outcomes (PICO) terms are detailed fully in the published protocol [26]. In brief, there were no restrictions based on age, disease type or clinical condition. We deemed the intervention group for this review as unsupervised spirometry in the complete absence of a clinician or other professional support. The comparator group was supervised spirometry use in the presence of a clinician or other relevant professional and could include assistance via telephone or video link. The primary outcome was measurement of lung function in the intervention and comparator groups, primarily assessed by mean differences in forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), forced expiratory flow at 25–75% of FVC (FEF25–75%) and peak expiratory flow (PEF) measurements. Secondary outcomes included exploring adherence, quality criteria, cost, participant satisfaction/acceptability, technical issues and adverse events.
Search strategy
We conducted searches for relevant studies from database inception to 15 July 2021 in the electronic databases MEDLINE, Embase, the Cochrane Central Register of Controlled Trials (CENTRAL) and a grey literature search on Open Access Theses and Dissertations (OATD) (no eligible studies were retrieved from the grey literature search). We also checked the reference lists of eligible studies or related reviews for additional studies and did forward citation screening of eligible studies. The full search strategy including Medical Subject Headings (MeSH) terms was validated by a medical librarian. All database searches were completed by one author. Additional search strategy information, including the specific search strategy used for each source, can be found in the supplementary material.
Selection of studies
Search results were imported into the systematic review manager software, Covidence (www.covidence.org). This platform was used for abstract screening, full-text screening and data extraction of eligible studies. Any two authors independently screened titles and abstracts according to the eligibility criteria. Studies deemed potentially eligible had their full texts independently screened by any two authors. Any disagreements or uncertainties were resolved through discussion and, if needed, the involvement of other authors.
Data extraction
Data were extracted independently for each included study by two authors using a customised data extraction form developed in Covidence. Information extracted included the type of study, eligibility criteria, participant characteristics and details of the spirometers with associated measurements (mean differences, standard deviations and correlations). Outcomes of interest were FEV1, FVC, FEF25–75% and PEF. If any of the required data were not available or insufficient, they were requested from the corresponding author of the study.
Data synthesis and statistical methods
Mean differences and standard deviations were extracted from studies which reported Bland–Altman analyses. Standard deviations were inferred from confidence intervals and limits of agreement (LoA) when they were not reported directly. When no relevant data were reported we imputed the median standard deviation from all other studies reporting that outcome [27]. Meta-analyses were conducted using DerSimonian–Laird random-effects models and the statistical heterogeneity evaluated using the I2 statistic [28] and interpreted using the thresholds defined by the Cochrane Handbook [29]. We calculated pooled mean differences with associated 95% confidence intervals, which represent uncertainty around the mean bias estimate. However, to increase the clinical utility of our results, we also calculated pooled LoA around the mean differences, which reflects an interval within which 95% of the differences would lie for a given measurement. Pooled standard deviations were calculated according to the methods outlined in the Cochrane Handbook [29]. Supervised measurements were subtracted from unsupervised, so a negative mean difference indicated that unsupervised spirometry measurements were lower than supervised measurements. The percentage of results that would be expected to have a difference >200 mL was also calculated.
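The synthesis steps described above can be sketched in code. The following is a minimal illustration, not the authors' Stata analysis: it shows how a standard deviation of the differences might be recovered from reported LoA or from a 95% confidence interval, and how a DerSimonian–Laird random-effects estimate and the I2 statistic are computed from per-study mean differences. The function names and the example inputs are hypothetical, and the use of the normal quantile (rather than a t quantile for small samples) is an assumption.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile (assumption: normal, not t)


def sd_from_loa(lower, upper):
    """SD of the differences, assuming LoA = mean difference +/- 1.96 * SD."""
    return (upper - lower) / (2 * Z95)


def sd_from_ci(lower, upper, n):
    """SD of the differences from a 95% CI around the mean difference of n pairs."""
    return math.sqrt(n) * (upper - lower) / (2 * Z95)


def dersimonian_laird(means, sds, ns):
    """Pool per-study mean differences with a DerSimonian-Laird random-effects model."""
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]  # squared standard errors
    w = [1 / v for v in variances]                        # fixed-effect weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # Cochran's Q
    df = len(means) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    w_re = [1 / (v + tau2) for v in variances]            # random-effects weights
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # I2 as a percentage
    return {
        "pooled": pooled,
        "ci": (pooled - Z95 * se, pooled + Z95 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical per-study inputs (mL), for illustration only.
example = dersimonian_laird(
    means=[-50.0, -150.0, -200.0],
    sds=[200.0, 250.0, 300.0],
    ns=[40, 60, 30],
)
print(example)
```

The confidence interval returned here describes uncertainty in the pooled mean bias only; it is much narrower than the LoA, which describe the spread of individual differences. The pooled LoA reported in this review rely on pooled standard deviations calculated as per the Cochrane Handbook [29] and are not reproduced by this sketch.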
We grouped diseases as obstructive lung disease (COPD and asthma), interstitial lung disease (idiopathic pulmonary fibrosis, interstitial lung disease and pulmonary sarcoidosis), suppurative lung disease (cystic fibrosis) and transplant (lung transplantation and haematopoietic cell transplantation). We reported results for different clinical groups separately if they were presented within the same study and could be separated [30]. Additionally, we performed prespecified subgroup analysis to investigate heterogeneity by patient age (children versus adult) and risk of bias (low risk of bias versus high risk of bias). We used the baseline measurement for studies that conducted spirometry at multiple timepoints. Pearson and Spearman correlation coefficients measuring the association between unsupervised and supervised spirometry were extracted and pooled based on the Fisher transformation. Data relating to secondary outcomes are presented descriptively. We collated data if it was documented that spirometry had been conducted according to American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines [1]. Two-tailed tests were used throughout and the threshold for statistical significance was set to 0.05. All analyses were conducted using STATA version 16 (StataCorp, College Station, TX, USA).
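As an illustration of pooling correlation coefficients via the Fisher transformation, the sketch below transforms each coefficient to the z scale, averages with inverse-variance weights and back-transforms the result. The weighting scheme (n − 3, a fixed-effect choice) and the function name are assumptions; the review does not specify how weights were assigned, so this should not be read as the exact procedure used.

```python
import math


def pool_correlations(rs, ns):
    """Pool correlation coefficients on the Fisher z scale.

    Assumes each r lies strictly between -1 and 1 and each n exceeds 3.
    Weights are n - 3, the inverse of the approximate variance of z.
    """
    zs = [math.atanh(r) for r in rs]           # Fisher transformation
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    pooled = math.tanh(z_bar)                  # back-transform to the r scale
    ci = (math.tanh(z_bar - 1.959964 * se), math.tanh(z_bar + 1.959964 * se))
    return pooled, ci


# Hypothetical correlations and sample sizes, for illustration only.
pooled_r, (ci_low, ci_high) = pool_correlations([0.95, 0.90, 0.98], [30, 50, 25])
print(pooled_r, ci_low, ci_high)
```

Note that a high pooled correlation indicates association, not agreement, which is why the primary analysis is based on mean differences and LoA.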
Risk of bias assessment
We assessed risk of bias of included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool, which is the most appropriate method to assess risk of bias in this type of review [31]. It assesses risk across four key domains (D1–D4): patient selection, index test (unsupervised spirometry), reference standard (supervised spirometry) and patient flow/timing. A study was determined to be at an overall risk of bias if one or more domains were judged to be at high or unclear risk of bias. A study was determined to be at an overall low risk if all the following were applied: adequate quality criteria for both the supervised and unsupervised spirometry measures, appropriate patient selection and appropriate time intervals between the two measurements, as per QUADAS guidance. A tailored extraction form for this review was piloted and used (available in the supplementary material). Two review authors independently used the tool to assess eligible studies. Any disagreements or uncertainties were resolved through discussion and, if needed, the involvement of other authors.
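The overall judgement rule described above can be expressed as a small function. This is a sketch of the stated rule only (any domain at high or unclear risk makes the study "at risk"; all four domains at low risk makes it "low"); the domain keys and the function name are hypothetical and do not come from the QUADAS-2 tool or the review's extraction form.

```python
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
ALLOWED = {"low", "high", "unclear"}


def overall_risk(judgements):
    """Return 'low' only if every QUADAS-2 domain is judged low, else 'at risk'."""
    for domain in DOMAINS:
        if domain not in judgements:
            raise ValueError(f"missing judgement for domain: {domain}")
        if judgements[domain] not in ALLOWED:
            raise ValueError(f"unexpected judgement: {judgements[domain]!r}")
    if all(judgements[domain] == "low" for domain in DOMAINS):
        return "low"
    return "at risk"


# Hypothetical study with one unclear domain, for illustration only.
print(overall_risk({
    "patient_selection": "low",
    "index_test": "unclear",
    "reference_standard": "low",
    "flow_and_timing": "low",
}))
```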
Results
Literature search
Following database searches, reference list screening and citation screening, 3933 records were identified (figure 1). Following removal of duplicates, 3607 records had titles and abstracts screened and 155 full texts were checked for eligibility. After full-text screening, 28 studies were determined to meet all eligibility criteria and included in the review.
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.
Study characteristics
Characteristics of included studies are summarised in table 1. This review included 25 prospective studies and three retrospective studies. Included studies totalled 4560 patients, with study sizes ranging from nine to 2161 patients. Studies included four distinct patient cohorts: interstitial lung disease (n=10) [35, 41, 43–48, 54, 58], transplant (n=8) [36, 38, 42, 49, 51, 56, 57, 59], obstructive airways disease (n=5) [19, 39, 40, 50, 53] and suppurative lung disease (n=5) [30, 34, 37, 52, 55]. In all studies, unsupervised spirometry was completed using a portable handheld spirometer, with a variety of device manufacturers and models used. Eight out of 28 studies [19, 30, 34, 38, 42, 53, 56, 58] explicitly stated that the quality criteria for both unsupervised and supervised measurements followed ATS/ERS criteria. Five out of 28 studies [19, 34, 53, 57, 58] used the same spirometer for both the unsupervised and supervised measurements.
Table 1. Characteristics of included studies
Risk of bias
For the QUADAS risk of bias assessment (supplementary material), five studies were assessed as having an overall low risk of bias across all domains. 23 studies were assessed as at risk of bias due to one or more domains being at high or unclear risk, as per reasons described within the Methods section.
FEV1
Seventeen studies (4517 patients; 9855 data comparison points) reported Bland–Altman data for the primary outcome of FEV1. Pooled analysis showed an overall mean difference in FEV1 between unsupervised and supervised spirometry of −106 mL, i.e. lower when unsupervised (LoA= −509 mL, 296 mL; I2=95.8%; p≤0.001; figure 2). Six studies included patients with obstructive lung disease with a mean difference of −64 mL (LoA= −378 mL, 250 mL; I2=96.7%). Two studies included patients with interstitial lung disease with a mean difference of −187 mL (LoA= −721 mL, 348 mL; I2=0.0%). Three studies included patients with suppurative lung disease with a mean difference of −83 mL (LoA= −845 mL, 680 mL; I2=95.6%). Seven studies included transplant patients with a mean difference of −149 mL (LoA= −818 mL, 520 mL; I2=78.4%).
Figure 2. Meta-analysis of Bland–Altman for forced expiratory volume in 1 s (in mL) according to type of disease. LoA: limits of agreement. #: Gerzon et al. [30] reported values for asthma (2020a) and cystic fibrosis patients (2020b) separately and these were treated as two separate studies.
A subgroup analysis of four studies that had a low risk of bias found a mean difference of −103 mL (LoA= −444 mL, 238 mL; I2=96.2%; table 2) between unsupervised and supervised measurements. The 14 studies that were at risk of bias had a mean difference of −115 mL (LoA= −705 mL, 475 mL; I2=95.9%). Subgroup analysis for studies that compared unsupervised and supervised measurements within the same day and studies with adults and children are shown in table 2. The forest plots for these analyses are in the supplementary material. There was noticeable asymmetry with funnel plot estimates for FEV1 (supplementary material). In addition, 12 studies reported correlation values for FEV1 with an overall median of 0.949 (IQR 0.855, 0.982; supplementary material).
Table 2. Summary of findings and subgroup analyses
FVC
Seventeen studies (1307 patients; 1926 data comparison points) reported Bland–Altman data for FVC. Pooled analysis showed an overall mean difference between unsupervised and supervised spirometry of −184 mL, i.e. lower when unsupervised (LoA= −1028 mL, 660 mL; I2=96%; p<0.001; figure 3). Two studies included patients with obstructive lung disease with a mean difference of 2 mL (LoA= −328 mL, 333 mL; I2=77.8%). Eight studies included patients with interstitial lung disease with a mean difference of −199 mL (LoA= −753 mL, 354 mL; I2=72.4%). One study included patients with suppurative lung disease with a mean difference of −5 mL (LoA= −267 mL, 277 mL). Six studies included transplant patients with a mean difference of −260 mL (LoA= −1379 mL, 859 mL; I2=90.8%).
Figure 3. Meta-analysis of Bland–Altman for forced vital capacity (FVC) (in mL) according to type of disease. LoA: limits of agreement. #: Turner et al. [57] used forced expiratory volume in 6 s (FEV6) as a surrogate for FVC.
A subgroup analysis of four studies that had a low risk of bias found a mean difference of −118 mL (LoA= −886 mL, 650 mL; I2=88.5%; table 2) between unsupervised and supervised measurements. The 13 studies that were at risk of bias had a mean difference of −207 mL (LoA= −1098 mL, 684 mL; I2=96.6%). Subgroup analysis for studies that compared unsupervised and supervised measurements within the same day and studies with adults and children are shown in table 2. The forest plots for these analyses are in the supplementary material. There was noticeable asymmetry with funnel plot estimates (supplementary material). In addition, 14 studies reported correlation values for FVC with an overall median of 0.967 (IQR 0.940, 0.976; supplementary material).
FEF25–75% and PEF
Three studies reported FEF25–75% (146 patients; 899 data comparison points) with an overall mean difference of −19 mL·s−1 when unsupervised (LoA= −497 mL·s−1, 458 mL·s−1; I2=99.7%). Two studies reported PEF (124 patients; 400 data comparison points) with an overall mean difference of −92 mL·s−1 (LoA= −498 mL·s−1, 314 mL·s−1; I2=0.0%). The forest plots for these analyses are in the supplementary material.
Secondary outcomes
The secondary outcomes of adherence, patient satisfaction/acceptability, technical issues, quality of spirometry data, adverse events and costs are summarised in the supplementary material.
GRADE
For the primary outcome, the certainty of the evidence was deemed very low for all four measures (supplementary material).
Discussion
Despite the rationale for why unsupervised spirometry might be useful for monitoring patients with respiratory conditions [17, 18], we found that on average unsupervised spirometry measurements (FEV1 and FVC) were lower than supervised measurements, with wide variability. This is contrary to the interpretations in some studies that have stated that unsupervised spirometry values are acceptable. However, we found that the large variation has been obscured in many cases because of a focus on the mean differences with confidence intervals and correlations. We assert this is not the most appropriate analysis to assess agreement between two methods of measurement, as demonstrated by Bland and Altman [60], who highlighted the misconception that correlation is evidence for agreement and suggested that mean differences and LoA are needed [61], as were used in this review. We found a substantial percentage of results would be expected to have a difference >200 mL (table 2). Two clinical examples highlight important implications from the study findings (table 3).
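To make the idea of an expected percentage of differences beyond 200 mL concrete, the sketch below computes that proportion from a mean difference and a standard deviation of the differences. It assumes the differences are normally distributed, which may not match how the values in table 2 were derived; the function names and the example inputs are hypothetical and are not results from this review.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def proportion_beyond(mean_diff, sd, threshold=200.0):
    """Expected proportion of differences with absolute value above threshold.

    Assumes differences follow a normal distribution with the given
    mean and SD (same units as threshold, here mL).
    """
    above = 1 - normal_cdf((threshold - mean_diff) / sd)
    below = normal_cdf((-threshold - mean_diff) / sd)
    return above + below


# Hypothetical bias of -100 mL and SD of 200 mL, for illustration only.
share = proportion_beyond(mean_diff=-100.0, sd=200.0)
print(f"{share:.1%} of differences expected beyond 200 mL")
```

Because the calculation depends on both the bias and the spread, a small mean difference can still coexist with a large share of individual differences beyond the threshold when the standard deviation is wide.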
Table 3. Real-world clinical examples that could result from the findings of this review
For studies with a low risk of bias (table 2 and supplementary material), there were acceptable mean differences between the two methods, based on ATS/ERS repeatability criteria [1]. This suggests that unsupervised lung function might be useful as an outcome measure for the follow-up of groups of patients in large research studies because it is likely that the average results represent the population mean. However, the large variation and high heterogeneity reflect diverse patient demographics and different spirometry methods used across the studies.
Recently, the landscape of spirometry has changed, with new portable devices available for use by patients. Although most studies in this review were published from 2016 onwards, it is likely that future studies will experience a positive learning curve, meaning a greater number of patients will have prior experience with portable spirometers and might produce more accurate results unsupervised [41, 58]. This could also apply to the training of clinicians [53, 58]. The ARTP recommend that clinicians complete a practical examination and perform at least 50 spirometry tests on patients per year to remain competent in spirometry, and to evidence quality and consistency, highlighting the technicality of the procedures [12]. We advise relevant organisations to consider the increase in unsupervised spirometry and provide guidance on optimisation and use of home measurements. Furthermore, any new portable spirometers must be thoroughly validated, because a recent review found that only three of 10 devices from various manufacturers were technically acceptable [62].
Most studies that explored adherence reported good compliance to home spirometry. Patient satisfaction was also generally positive towards the portable spirometers and technical issues appeared to be low. However, a wide variety of methods were used to determine these outcomes. No costs were reported but these can be substantial, especially when paired with new airway clearance devices in some diseases [63].
Regarding strengths, this is the first review to comprehensively search for studies to estimate the difference between unsupervised and supervised spirometry. The large number of patients, conditions and comparisons provides strength to the meta-analysis. In the screening stage of the review, we identified several studies that reported feasibility with portable spirometers when used by patients in the presence of clinicians; this presence is likely to influence the measurements, meaning they do not represent a true unsupervised measurement. On this basis, such studies were not eligible for this review, and we only included studies that represent what would happen in real-world situations.
A limitation of the findings is the statistical heterogeneity in the pooled analysis and very low certainty of the evidence. Although there was significant heterogeneity in the magnitude of differences across all outcomes, unsupervised measurements were consistently lower than clinic measurements in almost all studies. We were unable to create a combined Bland–Altman plot to assess patterns of variability because individual patient values were not available. Furthermore, this was a cross-sectional analysis and did not explore successive measurements. Most studies were deemed at risk due to unclear methods. Some studies did not fully report metrics around the Bland–Altman analysis. We recommend that future studies report this fully [64]. Future reviews should explore the FEV1/FVC ratio and longitudinal analysis of data across time. As outlined in the new ERS/ATS guidelines [65], it is important we understand the reproducibility of spirometry measurements (unsupervised and supervised) and what indicates a clinically meaningful change over time. The majority of studies did not explicitly report if supervised measurements were performed under the instruction of a clinician or other trained professional. In addition, the majority of studies did not report if follow-up training on the quality of spirometry technique was performed with patients after the baseline visit to improve the quality of unsupervised measurements. The studies did not comprehensively describe how quality assessment of the spirometry was performed. Future studies should aim to provide more detail on aspects of patient training and quality assessment for both supervised and unsupervised spirometry in their methodology.
In conclusion, unsupervised home spirometry underestimates lung function measurements compared to supervised spirometry. We suggest caution and proper training if used owing to the possibility of underestimation and large variation in the differences between unsupervised and supervised measurements. Unsupervised home spirometry should not be used for diagnostic purposes; however, the results do suggest that unsupervised measurements may be suitable for outcome collection within large clinical research studies. The focus here is likely to be on the mean difference in lung function between the study groups and a large sample size could overcome the added measurement variation, and so represent the population mean. Any future research should use technically validated devices in a comprehensively trained population across multiple timepoints to fully understand the value of unsupervised home spirometry.
Points for clinical practice
It is crucial to know whether assessments, such as FEV1 and FVC, taken by unsupervised patients are consistent with measurements taken in a clinical setting with a trained professional. Based on this review, we urge caution when using unsupervised spirometry for individual patients and suggest clinicians consider the potential for differences because measurements are not interchangeable and can result in underestimation.
Supplementary material
Supplementary material ERR-0248-2022.SUPPLEMENT
Acknowledgements
We thank Marlies Wijsenbeek and Karen Moor from Erasmus MC University Medical Centre in Rotterdam, the Netherlands, for providing additional data to aid the meta-analysis. We thank Kate Hayes from Queen's University Belfast for insight on qualitative analysis. We also thank Richard Fallis from Queen's University Belfast for expertise on the search strategy.
Footnotes
Provenance: Submitted article, peer reviewed.
Data sharing: Study data are available on request from the corresponding author.
Author contributions: R. Anand: conceptualisation, methodology, searching, screening, full-text review, extraction, formal analysis, data curation, project administration, visualisation, validation, writing original draft, and manuscript review and editing. R. McLeese: methodology, screening, full-text review, extraction, interpretation, validation, and manuscript review and editing. J. Busby: methodology, screening, full-text review, extraction, statistical analysis, data curation, visualisation, and manuscript review and editing. J. Stewart: methodology, searching, screening, full-text review, extraction, interpretation, visualisation, and manuscript review and editing. M. Clarke: conceptualisation, methodology, screening, full-text review, interpretation, and manuscript review and editing. W.D-C. Man: interpretation, and manuscript review and editing. J. Bradley: conceptualisation, methodology, searching, screening, full-text review, extraction, funding acquisition, project administration, and manuscript review and editing. All authors had full access to all the data and agreed with the final decision to submit for publication. R. Anand and J. Busby accessed and verified the data.
Conflict of interest: R. Anand, M. Clarke and J. Bradley declare that they are investigators on an ongoing clinical trial investigating bronchiectasis in which patients complete home and clinic spirometry, which is funded by the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) programme and supported by PARI Pharma, outside the submitted work. J. Busby has received personal fees from NuvoAir for advisory board attendance, outside the submitted work. W.D-C. Man is part-funded by a NIHR Artificial Intelligence Award exploring the use of artificial intelligence software in the interpretation of primary care spirometry and is Honorary President of the Association for Respiratory Technology & Physiology (ARTP), outside the submitted work. All other authors have nothing to disclose.
- Received December 16, 2022.
- Accepted May 31, 2023.
- Copyright ©The authors 2023
This version is distributed under the terms of the Creative Commons Attribution Licence 4.0.