TABLE 3

Model training, validation, algorithm type and data source for selected studies in pulmonary medicine

| First author [ref.] | Population/type of study | Source of data | Main findings | Reference standard/ground truth | Algorithm type | Datasets | Type of internal validation/availability of external validation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Solid pulmonary nodules/masses | | | | | | | |
| Ardila [38] | CT chest of lung cancer screening patients/retrospective | NLST | Prediction of cancer risk based on CT findings. For testing dataset, AUC of 94.4% (95% CI 91.1–97.3%). For validation set, AUC of 95.5% (95% CI 88.0–98.4%) | Histology; follow-up | CNN | 42 290 CT images from 14 851 patients | Not reported/yes |
| Baldwin [39] | CT chest of lung cancer screening patients/retrospective | The IDEAL study (Artificial Intelligence and Big Data for Early Lung Cancer Diagnosis) | The AUC for CNN was 89.6% (95% CI 87.6–91.5%), compared with 86.8% (95% CI 84.3–89.1%) for the Brock model (p≤0.005) | Histology; follow-up | CNN | 1397 nodules in 1187 patients | Not reported/yes |
| Massion [40] | CT chest of lung cancer screening patients/retrospective | NLST for model derivation and internal validation/externally tested on cohorts from two academic institutions | The AUC for CNN was 83.5% (95% CI 75.4–90.7%) and 91.9% (95% CI 88.7–94.7%) on two different cohorts (Vanderbilt and Oxford University) | Histology; follow-up | CNN | 14 761 benign nodules from 5972 patients, and 932 malignant nodules from 575 patients | Not reported/yes |
| Ciompi [41] | CT chest of lung cancer screening patients/retrospective | Training dataset from the Multicentric Italian Lung Detection trial and validation dataset from the Danish Lung Cancer Screening Trial | CNN can classify nodule type with performance within the interobserver variability among human experts (Cohen κ-statistics ranging from 0.58 to 0.65) | Expert consensus | CNN | 1352 nodules for training set and 453 nodules for validation set | Random split sample validation/yes |
| Nam [42] | Chest radiographs to detect malignant nodule/retrospective | Analysis of data collected from Seoul National University Hospital, Boramae Hospital and National Cancer Center, University of California San Francisco Medical Center | Chest radiograph classification and nodule detection performances of deep learning-based automatic detection ranged from 0.92 to 0.99 (AUROC) | Expert consensus | CNN | 43 292 chest radiographs from 34 676 patients | Random split sample validation/yes |
| Wang [43] | Mediastinal lymph node metastasis of NSCLC from 18F-FDG PET/CT images/retrospective | Data collected at the Affiliated Tumor Hospital of Harbin Medical University | The performance of CNN was not significantly different from that of classic machine-learning methods and expert radiologists | Expert consensus | CNN | 1397 lymph node stations from 168 patients | Resampling method/no |
| Wang [44] | Solitary pulmonary nodule ≤3 cm, histologically confirmed adenocarcinoma/retrospective | Analysis of data collected from Fudan University Shanghai Cancer Center | Algorithm showed AUROC of 0.892, which was higher than that of three expert radiologists in distinguishing invasive adenocarcinoma from pre-invasive lesions | Histology | CNN | CT scans from 1545 patients | Random split sample validation/no |
| Zhao [45] | Thin-slice chest CT scan before surgical treatment; nodule diameter ≤10 mm/retrospective | Secondary analysis of data from Huadong Hospital affiliated to Fudan University | Based on classification of tumour invasiveness, deep-learning algorithm achieved better classification performance than the radiologists (63.3% versus 56.6%) | Histology | CNN | Pre-operative thin-slice CT; 523 nodules for training/128 nodules for testing | Not reported/no |
| Fibrotic lung diseases | | | | | | | |
| Walsh [46] | HRCT showing diffuse fibrotic lung disease confirmed by at least two thoracic radiologists/retrospective | Secondary analysis of data from La Fondazione Policlinico, Universitario and University of Parma (Italy) | Interobserver agreement between the algorithm and the radiologists’ majority opinion (n=91) was good (κw=0.69) | Expert consensus | CNN | HRCT; 929 scans for training/89 scans for validation | Not reported/yes |
| Christe [47] | HRCT showing NSIP or UIP confirmed by two thoracic radiologists/retrospective | HRCT dataset from the Lung Tissue Research Consortium | Interobserver agreements between the algorithm and the radiologists’ opinion were fair to moderate (κw=0.33 and 0.47) | Expert consensus | CNN | HRCT; 105 patients (54 with NSIP and 51 with UIP) | Not reported/no |
| Raghu [48] | Whole-transcriptome RNA sequencing data from transbronchial biopsy samples/prospective | Bronchial Sample Collection for a Novel Genomic Test (BRAVE) study in 29 US and European sites | The molecular signatures had high specificity (88%) and sensitivity (70%) against diagnostic reference pathology (ROC-AUC 0.87, 95% CI 0.76–0.98) | Histology | ML; type not reported | 94 patients in clinical utility analysis | Not reported/no |
| PH | | | | | | | |
| Sweatt [49] | Peripheral blood biobank/prospective | Peripheral blood biobanked at Stanford University, USA and University of Sheffield, UK | Four distinct immunological clusters were identified. Cluster I had a unique set of upregulated proteins (TRAIL, CCL5, CCL7, CCL4, MIF) and the least favourable 5-year transplant-free survival rate (47.6%, 95% CI 35.4–64.1%) | N/A | Unsupervised ML | Blood biobanked; 281 patients for discovery cohort/104 patients for validation cohort | Resampling method/yes |
| Leha [50] | Echocardiographic parameters/retrospective | King's College Hospital (UK); University Medical Center Göttingen and University of Regensburg (Germany) | Among five ML algorithms, random forest of regression trees was the best method to identify PH patients (AUC 0.87, 95% CI 0.78–0.96), with accuracy of 0.83 | Right heart catheterisation | Five ML algorithms (random forest of classification trees, random forest of regression trees, lasso-penalised logistic regression, boosted classification trees, SVM) | 90 patients with invasively determined PAP with corresponding echocardiographic estimations of PAP | Resampling method/no |
| Asthma | | | | | | | |
| Wu [51] | 100 clinical, physiological, inflammatory and demographic variables/prospective | Severe Asthma Research Program (SARP) cohort from National Heart, Lung, and Blood Institute (USA) | Four asthma clusters with differing CS responses were identified. Those in the CS-responsive cluster were older, had more nasal polyps and had high blood eosinophils. After CS, this group had the highest increase in lung function | N/A | Unsupervised ML; MML-MKKC | 346 adult asthmatics with paired (before and after CS) sputum data | Random split sample validation/no |
| Pleural effusion | | | | | | | |
| Khemasuwan [52] | 19 candidate clinical variables from retrospective cohort of patients with pleural infection | A tertiary care, university-affiliated hospital, Utah, USA | Candidate predictors of tPA/DNase failure were the presence of pleural thickening (48% relative importance) and presence of an abscess/necrotising pneumonia (24%) | N/A | Supervised ML (extreme gradient boosting coupled with decision trees) | 84 patients with pleural infection who received intrapleural tPA/DNase | Random split sample validation/no |
| PFT interpretation and clinical diagnosis | | | | | | | |
| Topalovic [53] | PFT tests and clinical diagnosis/prospective | University Hospital Leuven (Belgium) | Pulmonologists’ interpretation of PFTs matched guidelines in 74.4±5.9% of cases and gave the correct diagnosis in 44.6±8.7%, whereas the AI algorithm matched the PFT pattern interpretations in 100% and assigned the correct diagnosis in 82% (p<0.0001) | ATS/ERS guideline and expert panel | ML; type not reported | Dataset based on 1430 historical cases/50 cases in prospective analysis | Not reported/yes |
| SARS-CoV-2 pandemic | | | | | | | |
| Wang [54] | CT chest of patients with atypical pneumonia/retrospective | Xi'an Jiaotong University First Affiliated Hospital, Nanchang University First Hospital and Xi'an No. 8 Hospital of Xi'an Medical College (China) | Internal validation achieved a total accuracy of 82.9% with specificity of 80.5% and sensitivity of 84%. The external testing dataset showed a total accuracy of 73.1% with specificity of 67% and sensitivity of 74% | Confirmed nucleic acid testing of SARS-CoV-2 | CNN | CT images from 99 patients, of whom 44 were confirmed cases of SARS-CoV-2 | Random split sample validation/no |
| Li [55] | CT chest of patients with atypical pneumonia/retrospective | Six medical centres, China | AUC value for COVID-19 was 0.96 (95% CI 0.94–0.99). Sensitivity of 90% (95% CI 83–94%) and specificity of 96% (95% CI 93–98%) | Confirmed nucleic acid testing of SARS-CoV-2 | CNN | 4356 chest CT examinations from 3322 patients | Not reported/yes |

PH: pulmonary hypertension; PFT: pulmonary function test; SARS-CoV-2: severe acute respiratory syndrome-coronavirus-2; CT: computed tomography; NLST: National Lung Screening Trial; ROC: receiver operating characteristic; AUC: area under the curve; CNN: convolutional neural network; AUROC: area under the ROC curve; NSCLC: nonsmall cell lung cancer; 18F-FDG PET: fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography; HRCT: high-resolution computed tomography; κw: weighted κ-coefficient; NSIP: nonspecific interstitial pneumonia; UIP: usual interstitial pneumonia; ML: machine learning; TRAIL: tumour necrosis factor-related apoptosis-inducing ligand; CCL: C-C motif chemokine ligand; MIF: macrophage migration inhibitory factor; N/A: not applicable; SVM: support vector machine; PAP: pulmonary arterial pressure; CS: corticosteroids; MML-MKKC: multiview learning-multiple kernel k-means clustering; tPA: tissue plasminogen activator; DNase: deoxyribonuclease; ATS: American Thoracic Society; ERS: European Respiratory Society; COVID-19: coronavirus disease 2019.
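
The performance measures reported in this table (sensitivity, specificity, AUC and the κ-coefficient) can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not data from any of the cited studies, and the function names are illustrative. The κ shown is the unweighted Cohen κ; the weighted variant (κw) additionally gives partial credit for near-miss disagreements and is not implemented here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa: observed agreement corrected for chance."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n)
                     for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]   # model output
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # threshold at 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)                   # 0.75 0.75
print(auc(y_true, scores))          # 0.875
print(cohen_kappa(y_true, y_pred))  # 0.5
```

Sensitivity and specificity depend on the chosen threshold (0.5 here, an arbitrary choice), whereas AUC summarises discrimination across all thresholds, which is one reason the two kinds of figures in the table are not directly interchangeable.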