TABLE 3

Model training, validation, algorithm type and data source for selected studies in pulmonary medicine

First author [ref.]	Population/type of study	Source of data	Main findings	Reference standard/ground truth	Algorithm type	Datasets	Type of internal validation/availability of external validation
Solid pulmonary nodules/masses
Ardila [38]	CT chest of lung cancer screening patients/retrospective	NLST	Prediction of cancer risk based on CT findings. For testing dataset, AUC of 94.4% (95% CI 91.1–97.3%). For validation set, AUC of 95.5% (95% CI 88.0–98.4%)	Histology; follow-up	CNN	42 290 CT images from 14 851 patients	Not reported/yes
Baldwin [39]	CT chest of lung cancer screening patients/retrospective	The IDEAL study (Artificial Intelligence and Big Data for Early Lung Cancer Diagnosis)	The AUC for CNN was 89.6% (95% CI 87.6–91.5%), compared with 86.8% (95% CI 84.3–89.1%) for the Brock model (p≤0.005)	Histology; follow-up	CNN	1397 nodules in 1187 patients	Not reported/yes
Massion [40]	CT chest of lung cancer screening patients/retrospective	NLST for model derivation and internal validation/externally tested on cohorts from two academic institutions	The AUC for CNN was 83.5% (95% CI 75.4–90.7%) and 91.9% (95% CI 88.7–94.7%) on two different cohorts (Vanderbilt and Oxford University)	Histology; follow-up	CNN	14 761 benign nodules from 5972 patients, and 932 malignant nodules from 575 patients	Not reported/yes
Ciompi [41]	CT chest of lung cancer screening patients/retrospective	Training dataset from the Multicentric Italian Lung Detection trial and validation dataset from the Danish Lung Cancer Screening Trial	CNN can achieve performance at classifying nodule type within the interobserver variability among human experts (Cohen κ-statistics ranging from 0.58 to 0.65)	Expert consensus	CNN	1352 nodules for training set and 453 nodules for validation set	Random split sample validation/yes
Nam [42]	Chest radiographs to detect malignant nodule/retrospective	Analysis of data collected from Seoul National University Hospital, Boramae Hospital, National Cancer Center and University of California San Francisco Medical Center	Chest radiograph classification and nodule detection performance of deep learning-based automatic detection was in the range 0.92–0.99 (AUROC)	Expert consensus	CNN	43 292 chest radiographs from 34 676 patients	Random split sample validation/yes
Wang [43]	Mediastinal lymph node metastasis of NSCLC from ¹⁸F-FDG PET/CT images/retrospective	Data collected at the Affiliated Tumor Hospital of Harbin Medical University	The performance of CNN is not significantly different from classic machine-learning methods and expert radiologists	Expert consensus	CNN	1397 lymph node stations from 168 patients	Resampling method/no
Wang [44]	Solitary pulmonary nodule ≤3 cm, histologically confirmed adenocarcinoma/retrospective	Analysis of data collected from Fudan University Shanghai Cancer Center	Algorithm showed AUROC of 0.892, which was higher than that of three expert radiologists in classifying invasive adenocarcinoma from pre-invasive lesions	Histology	CNN	CT scans from 1545 patients	Random split sample validation/no
Zhao [45]	Thin-slice chest CT scan before surgical treatment; nodule diameter ≤10 mm/retrospective	Secondary analysis of data from Huadong Hospital affiliated to Fudan University	Based on classification of tumour invasiveness, deep-learning algorithm achieved better classification performance than the radiologists (63.3% versus 56.6%)	Histology	CNN	Pre-operative thin-slice CT; 523 nodules for training/128 nodules for testing	Not reported/no
Fibrotic lung diseases
Walsh [46]	HRCT showing diffuse fibrotic lung disease confirmed by at least two thoracic radiologists/retrospective	Secondary analysis of data from La Fondazione Policlinico, Universitario and University of Parma (Italy)	Interobserver agreement between the algorithm and the radiologists’ majority opinion (n=91) was good (κw=0.69)	Expert consensus	CNN	HRCT; 929 scans for training/89 scans for validation	Not reported/yes
Christe [47]	HRCT showing NSIP or UIP confirmed by two thoracic radiologists/retrospective	HRCT dataset from the Lung Tissue Research Consortium	Interobserver agreements between the algorithm and the radiologists’ opinion were fair to moderate (κw=0.33 and 0.47)	Expert consensus	CNN	HRCT; 105 patients (54 with NSIP and 51 with UIP)	Not reported/no
Raghu [48]	Whole-transcriptome RNA sequencing data from transbronchial biopsy samples/prospective	Bronchial Sample Collection for a Novel Genomic Test (BRAVE) study at 29 US and European sites	The molecular signatures had high specificity (88%) and sensitivity (70%) against diagnostic reference pathology (ROC-AUC 0.87, 95% CI 0.76–0.98)	Histology	ML; type not reported	94 patients in clinical utility analysis	Not reported/no
PH
Sweatt [49]	Peripheral blood biobank/prospective	Peripheral blood biobanked at Stanford University, USA and University of Sheffield, UK	Four distinct immunological clusters were identified. Cluster I, which had a unique set of upregulated proteins (TRAIL, CCL5, CCL7, CCL4, MIF), had the least favourable 5-year transplant-free survival rate (47.6%, 95% CI 35.4–64.1%)	N/A	Unsupervised ML	Blood biobanked; 281 patients for discovery cohort/104 patients for validation cohort	Resampling method/yes
Leha [50]	Echocardiographic parameters/retrospective	King's College Hospital (UK); University Medical Center Göttingen and University of Regensburg (Germany)	Among five ML algorithms, random forest of regression trees was the best method to identify PH patients (AUC 0.87, 95% CI 0.78–0.96), with an accuracy of 0.83	Right heart catheterisation	Five ML algorithms (random forest of classification trees, random forest of regression trees, lasso-penalised logistic regression, boosted classification trees, SVM)	90 patients with invasively determined PAP with corresponding echocardiographic estimations of PAP	Resampling method/no
Asthma
Wu [51]	100 clinical, physiological, inflammatory and demographic variables/prospective	Severe Asthma Research Program (SARP) cohort from National Heart, Lung, and Blood Institute (USA)	Four asthma clusters with differing CS responses were identified. Those in the CS-responsive cluster were older and had more nasal polyps and high blood eosinophils. This group showed the highest increase in lung function after CS	N/A	Unsupervised ML; MML-MKKC	346 adult asthmatics with paired (before and after CS) sputum data	Random split sample validation/no
Pleural effusion
Khemasuwan [52]	19 candidate clinical variables from retrospective cohort of patients with pleural infection	A tertiary care, university-affiliated hospital, Utah, USA	Candidate predictors of tPA/DNase failure were the presence of pleural thickening (48% relative importance) and presence of an abscess/necrotising pneumonia (24%)	N/A	Supervised ML (extreme gradient boosting coupled with decision trees)	84 patients with pleural infection who received intrapleural tPA/DNase	Random split sample validation/no
PFT interpretation and clinical diagnosis
Topalovic [53]	PFTs and clinical diagnosis/prospective	University Hospital Leuven (Belgium)	Pulmonologists’ interpretation of PFTs matched the guideline in 74.4±5.9% of cases and they made the correct diagnosis in 44.6±8.7%, whereas the AI algorithm matched the PFT pattern interpretations in 100% and assigned the correct diagnosis in 82% (p<0.0001)	ATS/ERS guideline and expert panel	ML; type not reported	Dataset based on 1430 historical cases/50 cases in prospective analysis	Not reported/yes
SARS-CoV-2 pandemic
Wang [54]	CT chest of patients with atypical pneumonia/retrospective	Xi'an Jiaotong University First Affiliated Hospital, Nanchang University First Hospital and Xi'an No.8 Hospital of Xi'an Medical College (China)	Internal validation achieved a total accuracy of 82.9% with specificity of 80.5% and sensitivity of 84%. The external testing dataset showed a total accuracy of 73.1% with specificity of 67% and sensitivity of 74%	Confirmed nucleic acid testing of SARS-CoV-2	CNN	CT images from 99 patients, of which 44 were confirmed cases of SARS-CoV-2	Random split sample validation/no
Li [55]	CT chest of patients with atypical pneumonia/retrospective	Six medical centres, China	AUC value for COVID-19 was 0.96 (95% CI 0.94–0.99). Sensitivity of 90% (95% CI 83–94%) and specificity of 96% (95% CI 93–98%)	Confirmed nucleic acid testing of SARS-CoV-2	CNN	4356 chest CT examinations from 3322 patients	Not reported/yes

PH: pulmonary hypertension; PFT: pulmonary function test; SARS-CoV-2: severe acute respiratory syndrome-coronavirus-2; CT: computed tomography; NLST: National Lung Screening Trial; ROC: receiver operating characteristic; AUC: area under the curve; CNN: convolutional neural network; AUROC: area under the ROC curve; NSCLC: nonsmall cell lung cancer; ¹⁸F-FDG PET: fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography; HRCT: high-resolution computed tomography; κw: weighted κ-coefficient; NSIP: nonspecific interstitial pneumonia; UIP: usual interstitial pneumonia; ML: machine learning; TRAIL: tumour necrosis factor-related apoptosis-inducing ligand; CCL: C-C motif chemokine ligand; MIF: macrophage migration inhibitory factor; N/A: not applicable; SVM: support vector machine; PAP: pulmonary arterial pressure; CS: corticosteroids; MML-MKKC: multiview learning-multiple kernel k-means clustering; tPA: tissue plasminogen activator; DNase: deoxyribonuclease; ATS: American Thoracic Society; ERS: European Respiratory Society; COVID-19: coronavirus disease 2019.