Abstract
The performance of gargling for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reverse transcriptase (RT)-PCR testing has not been previously reviewed. This review systematically assessed the performance of saline and water gargling for SARS-CoV-2 RT-PCR testing in the settings of diagnosing and monitoring viral shedding.
We included original studies comparing the performance of gargling and (oropharyngeal–)nasopharyngeal swabs for SARS-CoV-2 RT-PCR testing. Studies conducted in either suspected individuals or confirmed cases were included and analysed separately. The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were examined using random-effects models.
Gargles achieved a high overall sensitivity (91%), specificity (97%), PPV (95%) and NPV (91%) for SARS-CoV-2 RT-PCR testing. Studies using saline gargle and water gargle had an overall sensitivity of 97% and 86%, respectively. The sensitivity values were largely maintained for saline and water gargling on stratified analysis, for both diagnosis (96% and 92%) and viral shedding monitoring (98% and 78%). A higher sensitivity was also reported by studies using sterile saline (100%), a smaller amount of gargling solution (92% versus 87%) and a longer gargling duration (95% versus 86%).
Our results supported the use of gargling as a sampling approach for SARS-CoV-2 RT-PCR testing, which achieved a high sensitivity for both diagnosis and viral shedding monitoring purposes. Further investigation on the comparative performance of different gargling mediums is needed to draw a definitive conclusion.
Gargling is a reliable sampling approach for SARS-CoV-2 RT-PCR testing. Results suggest potential differential performance according to factors including the nature, sterility and volume of the gargling medium, and the duration of gargling, which need further exploration. https://bit.ly/3tYau81
Introduction
Since its first emergence in late 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread globally with the cumulative number of confirmed cases growing to more than 304 million (as of 9 January 2022) [1, 2]. The continuous reporting of new cases and the emergence of new variants of SARS-CoV-2 infections in many countries, including those already achieving a high level of vaccine uptake [1], highlighted the persistent importance of accurate and scalable diagnostic testing capacity in tackling the evolving pandemic [3]. For laboratory identification of SARS-CoV-2 in clinical diagnosis and monitoring of infection, reverse transcriptase (RT)-PCR on nasopharyngeal swab (NPS) is generally regarded as the gold-standard approach [4, 5]. In recent years, combined oropharyngeal–nasopharyngeal swab (ONPS) has increasingly been adopted as the reference gold standard because it is regarded as having a higher sensitivity and diagnostic yield [5–10]. However, important drawbacks of these technically difficult sampling approaches, including procedural discomfort, expertise and manpower implications [11], and the requirement for high-level personal protective equipment and negative-pressure settings [12], have hindered their widespread use in ambulatory care settings.
Aiming to overcome these drawbacks and to improve the scalability of SARS-CoV-2 testing in different community settings under heightened testing demand [13], a number of alternative sampling approaches have been explored, including pooled nasal and throat swabs [14–16], saliva [17, 18], nasal swabs [19–24], oropharyngeal swabs (throat swabs) [25–27], and saline- or water-based gargle [28, 29]. In a recent systematic review, pooled nasal and throat swabs were reported to have the best testing performance, with a high sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) [30]. Saliva and nasal swabs had lower but still acceptable sensitivities. Self-sampling of nasal and throat swabs and nasal swabs was not associated with any major compromise of testing performance [30]. As collection of these specimens is technically easier than NPS, the lessened manpower requirement for experienced healthcare workers (HCWs) and protective facilities would facilitate their use in large-scale testing initiatives in an ambulatory setting [3], especially in low- and middle-income countries where resources may be limited [30]. Their reduced invasiveness compared with NPS would also help to enhance their acceptability to patients [17, 18, 23]. Throat swabs, on the other hand, were reported to have a much lower sensitivity and PPV, and were thus not recommended as a reliable sampling approach [30].
The use of gargle as a testing sample, acquired by forcing the cricopharyngeal muscle to oscillate over the posterior pharyngeal wall with saline or water, has recently been explored as a self-sampling approach for SARS-CoV-2 testing. While gargling has been advocated by a number of recent narrative commentaries as a noninvasive, swab-free and virus transport medium-free sampling approach [31–34], a comprehensive review of the testing performance of gargle is still lacking. Findings from original studies of gargles remain inconclusive, ranging from the reporting of satisfactory [35] to poor sensitivity for detecting SARS-CoV-2 compared to NPS/ONPS as the gold standard [36]. In particular, the performance of gargling based on different mediums, including saline and water-based gargle, has been neither directly compared nor previously reviewed [35, 37].
Here we report a systematic review of the performance of gargling for the detection of SARS-CoV-2. We comprehensively review the performance of gargle samples with a view to informing the choice of the most appropriate alternative sampling approach for SARS-CoV-2 testing in an evidence-based manner. The review is stratified according to populations with different infection status to examine the testing performance in two scenarios: diagnosing SARS-CoV-2 infections in suspected populations presenting clinically in an ambulatory care setting, and monitoring viral shedding in populations with confirmed SARS-CoV-2 infections and discharged cases to inform downstream clinical and public health decisions in the hospital and community settings. We also assess the performance of saline gargle and water gargle for RT-PCR testing, and the determinants affecting their test performance.
Methods
Search strategy and selection criteria
A standardised search was done in PubMed, OVID Medline, Embase and Web of Science, using the search term “(((((gargle lavage) OR (gargle)) OR (gargling)) OR (mouthwash)) OR (mouth rinse)) AND ((((((((SARS-CoV-2) OR (sars cov 2)) OR (COVID-19)) OR (covid)) OR (ncov)) OR (2019-nCoV)) OR (n-CoV)) OR (coronavirus))”. Given the role of preprints in the timely dissemination of research studies during the coronavirus disease 2019 (COVID-19) pandemic, the medRxiv and bioRxiv servers were also searched using the search term “((SARS CoV 2) OR COVID19 OR covid) AND (gargle OR gargling OR mouthwash OR (mouth rinse))”. The search on preprint servers was simplified because of their reduced search functionality. The search was done on 28 September 2021, to retrieve articles from 1 January 2019 with no language restrictions. Additional relevant articles from the reference sections were also reviewed.
Original clinical studies examining the performance of gargles for SARS-CoV-2 RT-PCR testing using either NPS or ONPS as the reference gold standard were included. Studies conducted in either suspected individuals or confirmed cases of SARS-CoV-2 infection were both included, but were analysed separately to address the performance of gargling in the two corresponding contexts. Studies on suspected individuals are useful for examining the test performance for correctly diagnosing individuals having the infection [30]. On the other hand, studies on confirmed cases are useful for assessing test performance for continuous monitoring of virus shedding in patients in different stages to inform related clinical management and public health decisions. Studies without data on performance of gargles for identifying SARS-CoV-2 by RT-PCR, with a detailed collection method that deviated from the definition of gargling, or those that only examined viricidal effect of gargles, were excluded.
Two authors (N.N.Y.T. and H.C.S.) screened all the articles, with disagreement resolved by consensus together with a third author (D.K.M.I.). Studies identified from different databases were de-duplicated after screening. Two authors (N.N.Y.T. and H.C.S.) independently extracted data from the included studies, with disagreement resolved by consensus with a third author (D.K.M.I.). Two authors (N.N.Y.T. and D.K.M.I.) assessed studies for methodological quality, including risk of bias and applicability, adapted from the Quality Assessment of Diagnostic Accuracy Studies tool for diagnostic studies [38].
Data analysis
For the included studies, either individual data or summary estimates of sample size, number of true-positive, true-negative, false-positive and false-negative results in each study were extracted. By use of a standardised data extraction chart, we also retrieved information on study period, country, setting, positivity rate of the corresponding gold-standard test adopted (NPS/ONPS), symptomatic status of population, infection status of population, criteria for SARS-CoV-2 infection among confirmed cases, sampling approaches, gargling technique and procedures, peer-review status, age of population, inclusion of children population, target genes assessed, and the cycle threshold value for positive test result. These findings were checked for agreement.
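As a minimal sketch of how the four performance indicators examined in this review follow from the extracted true-positive, false-positive, false-negative and true-negative counts, the Python snippet below computes them for a single study. The class and function names and the example counts are hypothetical and are not taken from any included study. Specificity, PPV and NPV are withheld when a study has no reference-negative individuals, which is an assumption about how such studies would be handled at the study level.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PairedCounts:
    """2x2 counts for gargle versus the reference swab (hypothetical container)."""
    tp: int  # gargle positive, reference positive
    fp: int  # gargle positive, reference negative
    fn: int  # gargle negative, reference positive
    tn: int  # gargle negative, reference negative


def indicators(c: PairedCounts) -> Dict[str, Optional[float]]:
    """Return sensitivity, specificity, PPV and NPV; None where not estimable."""
    ref_pos = c.tp + c.fn
    ref_neg = c.fp + c.tn
    sensitivity = c.tp / ref_pos if ref_pos else None
    if ref_neg == 0:
        # Only reference-positive individuals: sensitivity is the sole estimable indicator.
        return {"sensitivity": sensitivity, "specificity": None, "ppv": None, "npv": None}
    test_pos = c.tp + c.fp
    test_neg = c.tn + c.fn
    return {
        "sensitivity": sensitivity,
        "specificity": c.tn / ref_neg,
        "ppv": c.tp / test_pos if test_pos else None,
        "npv": c.tn / test_neg if test_neg else None,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(indicators(PairedCounts(tp=45, fp=2, fn=5, tn=148)))
    print(indicators(PairedCounts(tp=18, fp=0, fn=2, tn=0)))
```

Returning None rather than a number for non-estimable indicators keeps a positives-only study from contributing a spurious PPV of 100% to any downstream pooling.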
The sensitivity, specificity, PPV and NPV of RT-PCR tests and the 95% confidence intervals were calculated for gargles, with random-effects meta-analyses using the inverse variance method [39] and a restricted maximum-likelihood estimator for heterogeneity [40, 41], with NPS or ONPS as the reference. Studies capturing both positive and negative individuals for the reference gold standard were included for the calculation of all four performance indicators, while studies including only individuals positive for the reference gold standard were only included in the calculation of sensitivity, as specificity, PPV and NPV cannot be calculated in the absence of negative cases. Freeman–Tukey double arcsine transformation was incorporated for normalising and stabilising the variance of sampling distribution of proportions, and pooled estimates were back-transformed using the harmonic mean [42]. The Q statistic and its p value were calculated to test whether effect sizes departed from homogeneity, and the I² statistic and its 95% CI were calculated to examine the proportion of dispersion due to heterogeneity [43, 44]. Heterogeneity was assessed as per the definition in the Cochrane Handbook for Systematic Reviews of Interventions [45]. To assess possible factors contributing to heterogeneity, subgroup analyses were done for gargles in individuals with and without confirmed SARS-CoV-2 infection, respectively. Stratified study-level characteristics included positivity rate of ONPS (<20% or ≥20%), geographical regions (European countries or non-European countries), symptomatic status of population (symptomatic only or symptomatic and asymptomatic), infection status of population (suspected only or suspected and confirmed), peer-review status (yes or no), inclusion of children population (yes or no), and number of target genes assessed in RT-PCR (one gene or two or more genes).
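The pooling steps described above can be illustrated with a short, self-contained Python sketch. It is not the analysis code used in this review, which was run in R with metafor and a restricted maximum-likelihood estimator. For brevity, the sketch substitutes the simpler DerSimonian-Laird moment estimator of between-study variance, so its numbers would not be expected to match the reported estimates exactly. The function names and the example counts are hypothetical. The transform and its variance follow the usual Freeman–Tukey double arcsine definition, and the back-transformation uses Miller's inverse evaluated at the harmonic mean of the study sample sizes.

```python
import math


def ft_transform(events, n):
    """Freeman-Tukey double arcsine transform of events/n and its sampling variance."""
    y = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    return y, 1.0 / (4 * n + 2)


def ft_back_transform(t, n):
    """Miller's inverse of the transform for a given sample size n."""
    s = math.sin(2 * t)
    if abs(s) < 1e-12:
        # Boundary values of the transform map to proportions of 0 or 1.
        return 0.0 if t < math.pi / 4 else 1.0
    inner = s + (s - 1.0 / s) / n
    root = math.sqrt(max(0.0, 1.0 - inner ** 2))
    return 0.5 * (1.0 - math.copysign(1.0, math.cos(2 * t)) * root)


def pool_proportions(studies):
    """Pool (events, n) pairs; returns pooled proportion, Q, I2 (%) and tau2.

    Uses DerSimonian-Laird for tau2 (an assumption of this sketch, not REML).
    """
    transformed = [ft_transform(x, n) for x, n in studies]
    y = [yi for yi, _ in transformed]
    w = [1.0 / vi for _, vi in transformed]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Inverse-variance weights including the between-study variance.
    w_re = [1.0 / (vi + tau2) for _, vi in transformed]
    pooled_t = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    n_hm = len(studies) / sum(1.0 / n for _, n in studies)
    return {"pooled": ft_back_transform(pooled_t, n_hm), "Q": q, "I2": i2, "tau2": tau2}


if __name__ == "__main__":
    # Hypothetical (true positives, reference positives) per study.
    print(pool_proportions([(48, 50), (30, 50), (45, 60)]))
```

Pooling on the transformed scale avoids the variance collapsing to zero for studies reporting proportions of 0% or 100%, which is the practical reason for using a variance-stabilising transformation with sensitivities this close to the upper bound.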
To examine the optimal practical implementation of gargling, the testing performance was stratified by gargle media (saline or water), sterilisation of gargles (yes or no), volume of gargling medium (≤5 mL or >5 mL) and duration of gargling procedures (≤10 s or >10 s). Comparisons between samples evaluated against NPS and ONPS were presented using forest plots for gargle, where the availability of studies adopting the two different reference approaches allowed. Sensitivity analysis of using a composite reference comparator, defined as positive if either the NPS, ONPS or gargle was positive, was conducted. Data were analysed with the metafor (version 2.4.0) package in R (version 3.6.0). The protocol was registered on PROSPERO (CRD42022309314).
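The composite reference comparator can be sketched as follows, assuming paired per-individual results are available as (reference swab positive, gargle positive) booleans. The function name and the example data are hypothetical. Under the composite definition an individual is counted as infected if either sample is positive, so a gargle-positive, swab-negative result is reclassified from a false positive to a true positive.

```python
def sensitivity_and_npv(pairs, composite=False):
    """Gargle sensitivity and NPV against the swab or the composite reference.

    pairs: iterable of (reference_positive, gargle_positive) booleans.
    """
    tp = fn = tn = 0
    for ref_pos, gargle_pos in pairs:
        # Composite reference: positive if either sample is positive.
        truth = (ref_pos or gargle_pos) if composite else ref_pos
        if truth and gargle_pos:
            tp += 1
        elif truth and not gargle_pos:
            fn += 1
        elif not truth and not gargle_pos:
            tn += 1
        # Gargle-positive, truth-negative results do not enter either indicator.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return sensitivity, npv


if __name__ == "__main__":
    # Hypothetical paired results: 8 concordant positives, 2 swab-only positives,
    # 1 gargle-only positive and 9 concordant negatives.
    pairs = ([(True, True)] * 8 + [(True, False)] * 2
             + [(False, True)] * 1 + [(False, False)] * 9)
    print(sensitivity_and_npv(pairs))
    print(sensitivity_and_npv(pairs, composite=True))
```

Because the gargle can no longer produce a false positive against a reference that includes its own result, specificity and PPV are 100% by construction under this definition, so the comparison is informative mainly for sensitivity and NPV.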
Results
From the 757 studies identified in our search, 342 duplicates were excluded. After screening the titles and abstracts of the remaining articles, 45 full texts were screened (figure 1). On the basis of our selection criteria, 27 of those studies were excluded and 18 studies [28, 29, 35, 36, 46–59] met our inclusion criteria (table 1). One study examined two study populations, of suspected infection and of confirmed cases, which were analysed separately [59]. Of these 19 study populations, seven assessed the diagnostic performance of gargle in populations with suspected infection [28, 49, 50, 54, 55, 57, 59], and 12 assessed the use of gargle for monitoring viral shedding in populations already confirmed to have SARS-CoV-2 infections, either as hospital inpatients or after being discharged (table 1) [29, 35, 36, 46–48, 51–53, 56, 58, 59]. Regarding the study location, there were five studies from Canada [28, 47, 50, 52, 57], six from Germany [36, 48, 49, 51, 54, 56] and one each from Finland [29], Turkey [55], China [46], Korea [58], Indonesia [59], Israel [53] and India [35]. These studies spanned from 23 March 2020 to 19 July 2021, covering the period of different prevailing SARS-CoV-2 variants, from the ancestral strain through the alpha variant to the delta variant. The main prevailing strains of SARS-CoV-2 included the ancestral strain for studies in North America, the ancestral and alpha strains for studies in European countries, the ancestral and delta strains for studies in Asian countries, and the ancestral strain for the study in Middle East countries. A total of 4444 individuals were included in the 18 eligible studies. Besides two that did not specify participants’ age [50, 56], all reviewed studies included an adult population, among which eight also included children aged below 18 years [28, 36, 47, 48, 52, 54, 57, 59].
12 studies [28, 29, 35, 46, 47, 49, 51, 52, 54–56, 58] used NPS and six studies [36, 48, 50, 53, 57, 59] used ONPS, all collected by HCWs, as the reference gold standard. A total of 8888 clinical samples were included in our analysis, including 4444 NPS/ONPS as control, and 4444 gargle samples for testing (1187 saline gargles, 2711 water gargles and 546 gargles without specification).
Overall pooled performance estimates of gargles among these studies included a sensitivity of 91% (95% CI 85–96), a specificity of 97% (92–100), a PPV of 95% (89–99) and an NPV of 91% (81–98; figure 2). When stratified by the testing setting, the overall performance generally remained similar in the two testing scenarios. Among the seven studies consisting of suspected individuals presenting with symptoms [28, 50, 54, 55, 57, 59] or asymptomatic individuals with known exposure to confirmed cases [28, 49, 50, 54, 57, 59], the diagnostic performance of gargle for SARS-CoV-2 infection included a sensitivity of 91% (95% CI 87–95), a specificity of 98% (94–100), a PPV of 93% (85–99) and an NPV of 98% (93–100; figure 3). Among the 12 studies on individuals confirmed to have SARS-CoV-2 infection [29, 35, 36, 46–48, 51–53, 56, 58, 59], the testing performance of gargle for monitoring viral shedding of SARS-CoV-2 included a sensitivity of 91% (95% CI 80–98), a specificity of 94% (81–100), a PPV of 95% (84–100) and an NPV of 78% (63–90; figure 4).
All performance indicators of gargles for RT-PCR testing were in general heterogeneous (I² ranged from 88.5% to 97.7%; figure 2), with diverse population characteristics and study settings. Overall, a lower testing sensitivity was reported in studies using ONPS as the reference standard (83% versus 96% in studies using NPS), or studies involving asymptomatic individuals only (62% versus 91% in studies with symptomatic and asymptomatic individuals and 91% in symptomatic individuals) (table 2). Studies including a child population had a similar pooled sensitivity (89%) to the overall figure. A lower PPV was found for studies with a lower disease prevalence (positive rate of less than 20%) (84% versus 96%), or studies done in Asia (85% versus 95% in European countries and 96% in North America; table 2). A lower NPV was found for studies with ONPS as the reference standard (81% versus 97%), studies with a disease prevalence greater than or equal to 20% (84% versus 100%) and studies done in Middle East countries (47% versus 99% in North America, 90% in European countries and 85% in Asia; table 2). Heterogeneity generally remained moderate to substantial (I² ≥30%) in most stratified analyses.
Regarding the medium of the gargle, saline gargle was used in eight studies [28, 35, 46, 47, 52–54, 56], with five specifying sterile saline gargle [35, 46, 47, 52, 56] while the other three did not specify medium sterility [28, 53, 54]; water gargle was used in seven studies [29, 36, 48–50, 55, 57], with one specifying sterile water gargle [36] and the rest using nonsterile water gargle, including natural spring water gargle and tap water gargle [29, 48–50, 55, 57]. One study used bean-extract gargle [58], while the remaining two did not specify the gargling mediums [51, 59]. Studies using saline gargle and using water gargle reported an overall sensitivity of 97% and 86% respectively, and a broadly similar overall specificity (96% versus 100%), PPV (94% versus 93%) and NPV (92% versus 96%) (table 2). The respective sensitivity of saline gargle and water gargle was maintained on stratification by testing settings, including for diagnostic testing on individuals with suspected infection (96% versus 92%) (supplementary table 2), and for monitoring viral shedding of SARS-CoV-2 in individuals with confirmed SARS-CoV-2 infection (98% versus 78%) (supplementary table 3). Among the eight studies using saline gargle, five studies specified using sterile saline gargle and generally suggested a potential of higher overall sensitivity (100% versus 91%) compared to the three that did not specify if the gargle was sterile or not (table 2).
The volume of gargling solution used in the studies ranged from 2 mL to 20 mL, and the duration of gargling during the sampling procedure varied from 5 s to 2 min. Regarding the volume of gargle, studies using ≤5 mL of gargling solution reported a higher overall testing sensitivity (92% versus 87%), specificity (99% versus 93%), PPV (97% versus 91%) and NPV (95% versus 74%), compared to those using >5 mL of gargling solution (table 2). For the duration of gargling during the sampling procedure, studies requiring a total gargling time of >10 s reported a higher overall testing sensitivity (95% versus 86%), specificity (100% versus 89%), PPV (98% versus 86%) and NPV (98% versus 78%), compared to those requiring ≤10 s of gargling (table 2). A robust examination of the impact of medium sterility, volume and gargling duration in different testing settings, however, was not possible due to the small number of studies on further stratification.
Sensitivity analysis of gargling using a composite reference comparator showed a similar sensitivity (90% versus 91%) and NPV (91% versus 91%) to full analysis using ONPS as the reference standard (supplementary figure 1, table 4). Quality assessment showed that all included studies were of good or acceptable quality and low risk of bias (supplementary figure 2). Sensitivity analysis was not performed regarding peer-review status as all included studies were peer-reviewed (table 2). Publication bias was examined by funnel plots of the performance indicators, which suggested the presence of a low risk of bias for sensitivity and PPV, but some risk of bias for specificity and NPV (supplementary figure 3).
Discussion
This is the first systematic review and meta-analysis examining self-collection of gargles for SARS-CoV-2 RT-PCR testing. We comprehensively evaluated the performance and reported pooled estimates of sensitivity, specificity, PPV and NPV of gargling approaches for the detection of SARS-CoV-2, and compared the differential performance of different gargling media, volumes and durations, and the contextual performance in diagnostic testing among suspected individuals and clinical virologic monitoring among confirmed COVID-19 patients. Our review included studies conducted from 23 March 2020 to 19 July 2021, covering the period of different prevailing SARS-CoV-2 variants, from the ancestral strain through the alpha variant to the delta variant. Our review minimised potential bias from different testing techniques by including only studies with RT-PCR testing procedures and excluding rapid diagnostic tests using other technologies. Studies with different positivity rates and population characteristics were included in our review to increase the generalisability of our findings, while stratified analyses provide context-specific estimates for different gargling mediums and procedures, and test application settings. By stratifying the performance indicators of gargling with respect to their reference gold standard, status of infection, gargling mediums, procedural implementation and study settings, our review identified the factors of heterogeneity and examined their potential impact on testing performance. The bias from using different reference comparators was minimal in our review, with similar results generated in the sensitivity analysis using a composite reference comparator and in the overall analysis using ONPS as a reference comparator.
Our findings suggest that gargling as a sampling approach offered a very good performance for detecting SARS-CoV-2 by RT-PCR, with an overall pooled sensitivity of 91%, specificity of 97%, PPV of 95% and NPV of 91%, making it a reliable alternative sampling option for SARS-CoV-2 testing. This is compatible with the reporting of the release of SARS-CoV-2 in a high quantity in the airway epithelial cells of an infected person [60]. The higher overall sensitivity from studies using saline gargle compared to those using water medium (97% versus 86%) suggested the potential dependency of testing performance on the nature of gargle medium, which is compatible with the suggestion that hypotonic solutions may prematurely lyse cells or disrupt lysosomes and degrade RNA and result in a sharp decline in the RT-PCR product [61]. However, further studies employing a head-to-head comparison of the two media would be needed before a definitive conclusion can be drawn regarding their comparative performance. Likewise, although the near-perfect overall sensitivity (100%) reported in studies exclusively using sterile saline gargle is compatible with previous reports of giving a similar viral yield for detection of SARS-CoV-2 by sterile normal saline and viral transport media (VTM) [62], and in agreement with the World Health Organization recommendation to use sterile normal saline as an alternative for RT-PCR testing when VTM is not available [4], this would need to be confirmed by further studies.
The near-perfect sensitivity achieved by saline gargle (97%) and sterile saline gargle (100%) implied the great advantage of reducing potential false negativity, which is of utmost importance and has significant implications for the control of an evolving COVID-19 epidemic in a community. As infected individuals not detected by the screening test will fail to be properly channelled to the required clinical management and isolation, they may seed further transmission and infection in the community. Likewise, false negativity of virologic monitoring among confirmed COVID-19 patients may result in premature discharge from hospitals or isolation facilities when still actively shedding viruses and risk further transmission. The maintained high sensitivity of saline gargle both for diagnosing SARS-CoV-2 infection in suspected individuals (96%) and for monitoring of viral shedding in confirmed COVID-19 patients (98%) supported its suitability for both testing purposes. For use as diagnostic testing in suspected individuals, the pooled sensitivity of saline gargle (96%) is slightly higher than those achieved by most other common sampling approaches, including nasal swab (86%), saliva (85%) and throat swab (68%), and only lower than that reported for pooled nasal and throat swab (97%) by a recent systematic review [30]. Although our results suggested that sensitivity may be further improved to near perfection by the exclusive use of sterile saline gargle, robust stratified examination for different testing settings was not allowable in this study. 
On the other hand, the sensitivity of 86% achieved by water gargle indicated that almost one seventh of infected cases may be missed, which suggested that it may not be an optimal sampling approach for the detection of SARS-CoV-2 in settings where potentially missing cases may be consequentially serious, and further studies with head-to-head comparisons are needed to better clarify the comparative performance of different gargle media.
In a disease screening setting, false-positive results may incur unnecessary confirmatory tests and risk wrongful hospitalisation with potential unnecessary risk exposure. False-positive results in a virologic monitoring setting are associated with delayed discharge. In the COVID-19 epidemic, cases with positive screening results are generally followed up by health agencies and verified by further confirmatory testing. However, the concerns associated with false positivity would be low for saline gargle owing to its high specificity (100%) and PPV (99%) in diagnostic testing, and high overall specificity (96%) and high overall PPV (94%) in different settings.
Our findings also help to refine the practical procedural considerations for using gargle as a sampling medium, by highlighting that using a larger amount of gargling solution (>5 mL) or a shorter gargling duration (≤10 s) may jeopardise testing sensitivity, potentially due to a dilution effect or an inability to capture sufficient viral particles. The need to avoid using an excessive amount of gargle is compatible with the usual usage of a relatively small amount (2–3 mL) of transport medium for swabs as per the US Centers for Disease Control and Prevention's recommendation [63]. According to our findings, adopting a certain minimal duration (>10 s) and a maximal allowable volume of saline (≤5 mL) would help to ensure an adequate amount of viral particles to be captured, and to give an optimal concentration for a sufficiently sensitive result.
Although a stratified comparison of performance in adults versus children was not possible with the available data, the lack of notable difference in sensitivity (91% versus 89%) between the pooled overall estimate from all 18 studies and that from the eight studies including children aged 3–18 years in their study population potentially suggested a comparable performance of gargling in adults and children, and that gargle would be an appropriate sampling option for children. The higher pooled sensitivity (89%) by gargle in studies including children as compared to that reported previously for saliva (83.6%) [64] further suggests gargling to be a viable option for a child population. Although some caution has been advocated for its use in infants due to the technical difficulty for this age group [65], gargling has also been used in some settings for the molecular testing of common respiratory infections in children [66, 67]. Further large-scale studies with different age groups would help to better address the differential performance of gargling at the extremes of age.
The observed lower PPV (84%) in studies with lower disease prevalence was compatible with the general understanding that, for a given sensitivity and specificity, PPV falls as prevalence decreases. On stratified analysis by study areas, a much lower PPV (85%) was observed in the subset of three studies conducted in Asia compared with those from other continents (95–98%). This may be related to the particularly low disease prevalence (4.17%) reported in a study from China [46].
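The dependence of predictive values on prevalence can be made concrete with Bayes' theorem. The sketch below is purely illustrative: it plugs the overall pooled sensitivity (91%) and specificity (97%) from this review into the standard formulae at a few chosen prevalences, and it does not reproduce the stratified pooled PPVs reported above, which were estimated empirically from the included studies. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' theorem for a given prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Pooled overall sensitivity and specificity from this review; the
    # prevalences are chosen for illustration (4.17% is the lowest reported).
    for prevalence in (0.0417, 0.20, 0.50):
        ppv, npv = predictive_values(0.91, 0.97, prevalence)
        print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these inputs the implied PPV is roughly 57% at a prevalence of 4.17% and roughly 88% at 20%, which shows how a single very low-prevalence study can pull down a regional pooled PPV even when the test itself performs consistently.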
Besides the analytic sensitivity of a test, its utility also depends on its availability, accessibility, acceptability, affordability, frequency of testing, and turnaround time [68, 69]. Moreover, the ease of sample collection and delivery, availability of consumables, and need for special protective equipment are critical factors that may further constrain its use in various community settings [57]. As many SARS-CoV-2-infected individuals can be asymptomatic [70] and potentially infectious before symptom onset [71, 72], a test that can be easily repeated would be highly desirable for testing suspected individuals, including potential contacts in the community and people under quarantine [68, 69]. Although generally regarded as a reference clinical sample for SARS-CoV-2 testing, NPS has limited utility in public health practice because of its technical difficulty, requirement of skilled HCWs for its collection [11], procedural discomfort, and expertise and facilities constraints [12]. Under the evolving COVID-19 pandemic situation, gargling as an alternative noninvasive swab-free self-sampling approach may offer a number of practical and logistical advantages if used in a suitable context in different countries. The simple execution of the gargle self-collection procedure allowed for repeated testing in the absence of trained HCWs and high-level personal protective equipment and negative-pressure facilities. The high stability of viral RNA in gargle at room temperature for several days [47, 57] without requiring cold-chain delivery [73] allowed for the option of late and flexible drop-off specimen delivery. Besides being technically simpler to collect [28] than deep-throat saliva, gargling also offered a number of technical advantages. The higher homogeneity and lower viscosity would help to reduce the risk of clogging of liquid handling systems and testing failure [36, 74, 75].
The reduced steps needed for extraction, resuspension and transfer would help to minimise the risk of cross-contamination and false-positivity [76]. Together, these factors lead to advantages of easy deployment and scalable capacity for any screening or surveillance regimens where frequent and large-scale mass testing would be needed [69], especially in low-to-middle-income countries and remote settings with limited healthcare infrastructure and clinic and hospital accessibility [3]. The minimised manpower and zero swab requirement also allow limited resources to be redeployed to other competing needs in resource-constrained settings. The high acceptability of self-collected gargle as reported by some studies suggested that it may also potentially help to abate the testing hesitancy that is not uncommonly encountered in mandatory testing scenarios [35, 47, 77]. Our findings provide an evidence-based justification for COVID-19 testing approaches in national and international guidelines, and support the recommendation of gargling as an inexpensive and easy sampling approach for routine repeated testing in various community settings [78], including primary and secondary schools, universities [77, 79, 80], nursing homes [81], daycare facilities for children aged 1–7 years [67], children's hospitals [82], homes [83] and in other specific high-risk populations having medically indicated needs for repeated testing [78].
This review has several limitations. First, publication bias might have resulted in an overestimation of some of the performance indicators we examined if studies with null or negative results were less likely to be published [84]. However, the presence of substantial risk of bias was not supported by the funnel plots of different performance indicators. Although only a small number of relevant primary studies are available, our extended search of multiple literature databases and the inclusion of preprints should have helped to minimise the risk of publication bias in our review. Second, substantial heterogeneity was observed in the performance indicators of gargles. Various factors for heterogeneity, including gargling media, volume of gargling solution, duration of gargling, positivity rate of ONPS, reference comparators, number of genes tested and study settings, were adjusted in stratified analysis and should have helped to assess their impact on the performance indicators. Third, various factors potentially contributing to the residual heterogeneity, including case definitions of SARS-CoV-2 infection, RNA extraction methods, RT-PCR targeted genes and time delay between symptom onset and sample collection, were not accounted for due to the limited number of studies available and may have affected our observed results [53]. Detailed exploration of the adjusted impact of these factors identified in the subgroup analyses by use of a multivariate meta-regression approach was not possible with the small sample size and reduced power due to a small number of available studies. Therefore, caution regarding the generalisability of the findings should be exercised when applying the results to real-life programmes with varying sampling and testing logistics in different settings. For instance, the long delay between sample collection and symptom onset in some studies may have impacted the observed sensitivity and may not be typical of real-life situations [36]. 
Fourth, studies examining the diagnostic performance of gargle in asymptomatic individuals only were underrepresented compared to studies on symptomatic individuals, which might have limited the generalisability of our results to general population screening. Although studies with different prevailing SARS-CoV-2 strains were included, covering the ancestral strain to delta variants, the performance of gargling, like that of all other sampling methods, may vary across variants, and our findings may not be fully applicable to newly emerging variants. As saline gargle and water gargle were not compared head-to-head within the included studies, the results may be affected by a number of factors including study population, prevalence, volume of gargling solution, instructions given and PCR methodology. The performance of saline and water gargle observed in our review would need to be confirmed by studies with a head-to-head comparison. Further studies examining the impact of duration and intensity of gargling, different volumes of gargling solutions, populations of different symptomatology, different RNA extraction methods, different cycle thresholds, different case definitions and different reference comparators are warranted to optimise the testing performance of gargling.
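To make the notion of heterogeneity in pooled performance estimates concrete, the following is a minimal sketch of how a random-effects pooled sensitivity and the accompanying heterogeneity statistics (Cochran's Q, I² and the between-study variance τ²) could be computed. It assumes a logit transformation with a 0.5 continuity correction and the DerSimonian-Laird estimator; these choices, the function name `pooled_sensitivity` and the input format are illustrative assumptions, not a description of the software or exact model used in this review.

```python
import math


def pooled_sensitivity(studies):
    """Random-effects pooling of sensitivity on the logit scale.

    `studies` is a list of (true_positives, false_negatives) tuples.
    This is a hypothetical helper for illustration only.
    """
    logits, variances = [], []
    for tp, fn in studies:
        # A 0.5 continuity correction keeps the logit finite when a cell is zero.
        tp_c, fn_c = tp + 0.5, fn + 0.5
        logits.append(math.log(tp_c / fn_c))
        variances.append(1.0 / tp_c + 1.0 / fn_c)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    df = len(studies) - 1

    # I²: percentage of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau².
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau² to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    pooled = 1.0 / (1.0 + math.exp(-pooled_logit))
    return {"pooled": pooled, "q": q, "i2": i2, "tau2": tau2}
```

With such a helper, studies reporting very different sensitivities would yield a high I², whereas identical studies would yield an I² of zero and a pooled estimate equal to the common (continuity-corrected) sensitivity.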
Conclusions and future perspectives
In summary, in this review we synthesised the pooled performance estimates of gargling and found that self-collection of gargle achieved a high performance compared with the gold standard of ONPS and represented a reliable alternative sampling approach for both the purposes of diagnosing SARS-CoV-2 infection and monitoring viral shedding. A potential dependency of testing performance on the nature and sterility of gargle media was highlighted, which suggested the need for further studies with head-to-head comparisons for assessing the use of different gargling media in SARS-CoV-2 testing. Our study also informed the practical implementation of gargling, in terms of duration of gargling and volume of solution.
As a simple and noninvasive sampling approach that can be easily conducted, self-collection of gargle appears to be a viable option to boost acceptability and accessibility, and to enable efficient, timely and frequent SARS-CoV-2 testing in different community settings. Its lower requirements for swabs, collection expertise and manpower, and its less stringent requirements for the specimen collection venue or delivery process, may help to overcome the traditional technical and resource limitations of nasopharyngeal swabbing. It may represent a suitable option for easy deployment and upscaling for mass testing in remote or resource-limited settings during the evolving pandemic. Our results provide a useful reference framework for the proper implementation of gargling and the interpretation of SARS-CoV-2 testing results using gargling, both for diagnosing SARS-CoV-2 infection and for virologic monitoring in different testing settings.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERR-0014-2022.SUPPLEMENT
Footnotes
Provenance: Submitted article, peer reviewed.
Author contributions: G.M. Leung and D.K.M. Ip conceived and designed the study. N.N.Y. Tsang and H.C. So screened literature and extracted data. N.N.Y. Tsang and H.C. So had access to and verified the data. B.J. Cowling advised on the statistical analysis. N.N.Y. Tsang did the statistical analysis. N.N.Y. Tsang, H.C. So, and D.K.M. Ip interpreted the data. N.N.Y. Tsang and D.K.M. Ip wrote the first draft of the manuscript, and all authors provided critical review and revision of the text and approved the final version. D.K.M. Ip had final responsibility for the decision to submit for publication.
Data sharing: The data supporting this meta-analysis are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author, upon reasonable request.
Conflict of interest: B.J. Cowling consults for Roche and Sanofi Pasteur. All other authors declare no competing interests.
Support statement: This project was supported by the Theme-based Research Scheme (T11-712/19-N) of the Research Grants Council of the Hong Kong Special Administrative Region Government. The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received February 9, 2022.
- Accepted June 21, 2022.
- Copyright © The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org