Elsevier

Clinica Chimica Acta

Volume 393, Issue 2, 17 July 2008, Pages 76-84

Detection of lung cancer using weighted digital analysis of breath biomarkers

https://doi.org/10.1016/j.cca.2008.02.021

Abstract

Background

A combination of biomarkers in a multivariate model may predict disease with greater accuracy than a single biomarker employed alone. We developed a non-linear method of multivariate analysis, weighted digital analysis (WDA), and evaluated its ability to predict lung cancer employing volatile biomarkers in the breath.

Methods

WDA generates a discriminant function to predict membership in disease vs no disease groups by determining a weight, a cutoff value, and a sign for each predictor variable employed in the model. The weight of each predictor variable was the area under the curve (AUC) of the receiver operating characteristic (ROC) curve minus a fixed offset of 0.55, where the AUC was obtained by employing that predictor variable alone as the sole marker of disease. The sign (±) was used to invert the predictor variable if a lower value indicated a higher probability of disease. When the discriminant function was used to predict the presence of disease in a particular patient, its value was the sum of the weights of all predictor variables that exceeded their cutoff values. The algorithm that generates the discriminant function is deterministic because its parameters are calculated from each predictor variable individually, without any optimization or adjustment. We employed WDA to re-evaluate data from a recent study of breath biomarkers of lung cancer, comprising the volatile organic compounds (VOCs) in the alveolar breath of 193 subjects with primary lung cancer and 211 controls with a negative chest CT.
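The WDA procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`auc`, `youden_cutoff`, `fit_wda`, `wda_score`) are hypothetical, the AUC is computed with the Mann-Whitney formulation, the sign is assumed to be negative when a variable's single-variable AUC falls below 0.5, and, because the text does not say how each cutoff value is chosen, the sketch assumes the cutoff that maximizes Youden's index (sensitivity + specificity - 1). Only the weight formula (AUC minus a fixed 0.55 offset) and the summation rule come directly from the text.

```python
import numpy as np

OFFSET = 0.55  # fixed offset subtracted from each single-variable AUC (from the text)


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen diseased subject scores higher than a randomly chosen control
    (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def youden_cutoff(x, labels):
    """ASSUMPTION: cutoff chosen as the observed value that maximizes
    sensitivity + specificity - 1, calling 'x > cutoff' a positive result."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_c, best_j = None, -np.inf
    for c in np.unique(x):
        pred = x > c
        j = pred[labels].mean() + (~pred[~labels]).mean() - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c


def fit_wda(X, labels):
    """Derive (sign, weight, cutoff) for each column independently,
    with no joint optimization across predictor variables."""
    params = []
    for x in np.asarray(X, dtype=float).T:
        # ASSUMPTION: invert the variable when low values indicate disease (AUC < 0.5)
        sign = 1.0 if auc(x, labels) >= 0.5 else -1.0
        xs = sign * x
        weight = auc(xs, labels) - OFFSET
        params.append((sign, weight, youden_cutoff(xs, labels)))
    return params


def wda_score(X, params):
    """Discriminant function: sum of the weights of all predictor variables
    whose sign-adjusted value exceeds its cutoff."""
    X = np.asarray(X, dtype=float)
    score = np.zeros(X.shape[0])
    for j, (sign, weight, cutoff) in enumerate(params):
        score += np.where(sign * X[:, j] > cutoff, weight, 0.0)
    return score
```

Applied to a data set like the one described here, `X` would hold one row per subject and one column per breath VOC, and `labels` would mark cancer (1) vs control (0). Each (sign, weight, cutoff) triple depends on one column only, which mirrors the deterministic, optimization-free construction described in the text; note that under this formula a variable whose sign-adjusted AUC lies between 0.50 and 0.55 receives a small negative weight.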

Results

The WDA discriminant function accurately identified patients with lung cancer in a model employing 30 breath VOCs (ROC curve AUC = 0.90; sensitivity = 84.5%, specificity = 81.0%). These results were superior to multilinear regression analysis of the same data set (AUC = 0.74, sensitivity = 68.4%, specificity = 73.5%). WDA test accuracy did not vary appreciably with TNM (tumor, node, metastasis) stage of disease, and results were not affected by tobacco smoking (ROC curve AUC = 0.92 in current smokers, 0.90 in former smokers). WDA was a robust predictor of lung cancer: random removal of one-third of the VOCs did not reduce the AUC of the ROC curve by > 10% (99.7% CI).

Conclusions

A test employing WDA of breath VOCs predicted lung cancer with accuracy similar to chest computed tomography. The algorithm identified dependencies that were not apparent with traditional linear methods. WDA appears to provide a useful new technique for non-linear multivariate analysis of data.

Introduction

Most diagnostic classifications are binary, e.g., dead or alive, disease or no disease, cancer or no cancer. The physician's task is to assign a patient correctly to one group or the other by applying generally accepted criteria of membership. This presents little difficulty if the difference between the two groups is defined by a single criterion that can be readily estimated; e.g., hypoglycemia can be distinguished from normoglycemia by measuring the blood glucose concentration and then determining whether it is above or below a designated cutoff value. However, clinical diagnosis is usually more difficult because assignment to one group or the other generally requires a combination of several different criteria. For example, a patient with acute streptococcal pneumonia may present with several different symptoms and signs, including fever, chills, productive cough, herpes labialis, vocal fremitus, and localized crackles in the chest. As an additional complication, not all of these features are required for the diagnosis, and some are more important than others. An experienced physician takes these difficulties into account by intuitively assigning a different diagnostic weight to each finding. For example, fever has a comparatively low diagnostic weight because it may be absent in elderly or immunosuppressed patients, whereas the diagnostic weight of chest crackles is much higher. This process of intuitively weighing all of these relative values and then incorporating them into a binary prediction is often described as “diagnostic clinical judgment”.

Since the exercise of diagnostic clinical judgment is an intuitive process, its outcome necessarily varies with an individual's skill and experience. As a consequence, physicians have sought to improve the accuracy and reproducibility of clinical judgment by employing formal algorithms to assign patients to appropriate diagnostic groups. As early as 1944, Jones proposed an algorithm for the diagnosis of rheumatic fever that employed a combination of “high weight” major criteria (including carditis and polyarthritis) and “low weight” minor criteria (including fever and arthralgia). As evidence of its clinical value, a modified version of this algorithm is still in clinical use more than 60 years later [1]. The accuracy and reproducibility of diagnostic laboratory tests can be improved in the same fashion: a predictive algorithm employing the relative diagnostic weights of two or more biomarkers of lung cancer in combination can predict disease with greater accuracy than a single biomarker employed alone [2].

Non-linear multivariate statistical analysis provides a useful tool for determining the relative weights of clinical markers of disease and incorporating them into new diagnostic algorithms. We have previously reported that biomarkers in the breath predict lung cancer [3], [4], breast cancer [5], pulmonary tuberculosis [6], and heart transplant rejection [7]. All of these tests assigned a relative weight to each of a number of different biomarkers (volatile organic compounds, or VOCs, in the breath) and incorporated them into a predictive algorithm with a binary outcome, i.e., disease or no disease. A non-linear multivariate model employing fuzzy logic predicted lung cancer with greater accuracy than multilinear analysis [8].

We report here a new method for non-linear modeling. Weighted digital analysis (WDA) determines the relative weights of a set of VOCs as biomarkers of disease, and incorporates them into an algorithm to predict the presence or absence of disease. We present evidence for the effectiveness of WDA in a reanalysis of data obtained from a previous study of breath VOC biomarkers of lung cancer.


Clinical study

This study has been reported previously [8]. In summary, breath samples were collected from 404 subjects: 193 patients with untreated primary lung cancer and 211 controls with no evidence of cancer on chest CT. VOCs in alveolar breath and ambient air were analyzed by gas chromatography and mass spectrometry. The data set comprised the breath VOCs of these two groups.

Determination of alveolar gradients

The alveolar gradient is the difference between the abundance of a VOC in breath and air, and the

Candidate biomarkers of lung cancer

Fig. 1 displays the accuracy of a typical single breath VOC employed as a biomarker of lung cancer. Note that this VOC alone did not significantly distinguish between cancer patients and the control group; it required the combined results of several VOCs to generate a discriminant function with a significant AUC value. Fig. 2 displays the effect of random patient assignment on predictive accuracy of individual breath VOCs, and how these results were employed to identify the optimal candidate

Discussion

A test employing WDA of a combination of breath VOCs accurately identified patients with lung cancer. The accuracy of the breath test may be compared directly with that of chest CT. In a large population screening study, chest CT detected lung cancer with 55% sensitivity and 95% specificity [11]. As the ROC curve in Fig. 4 demonstrates, at the point where the breath test sensitivity was 55%, its specificity was approximately 93%, i.e., close to that of chest CT.
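Reading a specificity off an empirical ROC curve at a fixed sensitivity, as in the 55% comparison with chest CT above, can be sketched as follows. This is an illustrative sketch only: `roc_points` and `specificity_at_sensitivity` are hypothetical helpers, the sketch treats a discriminant score at or above the threshold as a positive test, and it reports the best specificity among thresholds that reach at least the target sensitivity. That is one common convention for a step-shaped empirical ROC curve, not a method stated in the text.

```python
import numpy as np


def roc_points(scores, labels):
    """Empirical ROC curve as (sensitivity, specificity) pairs, one per distinct
    threshold, calling a subject positive when score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = []
    for t in np.unique(scores)[::-1]:  # from the strictest to the most lenient threshold
        pred = scores >= t
        sens = pred[labels].mean()
        spec = (~pred[~labels]).mean()
        points.append((sens, spec))
    return points


def specificity_at_sensitivity(scores, labels, target):
    """Highest specificity among operating points with sensitivity >= target.
    The most lenient threshold always has sensitivity 1, so a point exists."""
    return max(spec for sens, spec in roc_points(scores, labels) if sens >= target)
```

Given the WDA discriminant scores and cancer/control labels from the study (not reproduced here), a call such as `specificity_at_sensitivity(scores, labels, 0.55)` would correspond to the operating point read from Fig. 4.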

Table 1 lists the breath VOCs

Acknowledgements

This research was supported by SBIR award 5R44HL070411-03 from the National Heart, Lung, and Blood Institute of the National Institutes of Health. William N Rom was supported by Early Detection Research Network grant U01 CA086137. Michael Phillips is President and CEO of Menssana Research, Inc. All coauthors had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. We thank James J. Grady DrPH for reviewing

References (27)

  • R. Puatanachokchai et al. Alpha-benzene hexachloride exerts hormesis in preneoplastic lesion formation of rat hepatocarcinogenesis with the possible role for hepatic detoxifying enzymes. Cancer Lett (2006).
  • C.M. Bertea et al. Demonstration that menthofuran synthase of mint (Mentha) is a cytochrome P450 monooxygenase: cloning, functional expression, and characterization of the responsible gene. Arch Biochem Biophys (2001).
  • J.P.G. Schneider et al. Fuzzy logic-based tumor marker profiles including a new marker tumor M2-PK improved sensitivity to the detection of progression in lung cancer patients. Anticancer Res (2003).

1 Present address: Department of Psychiatry & Human Behavior, University of Mississippi Medical Center, Jackson, MS 39216, United States.

2 Present address: Saint Vincent Catholic Medical Centers, Staten Island Campus, Staten Island, NY 10310, United States.
