Abstract
The aim of this article is to summarise the published data on prognostic and predictive biomarkers for early-stage non-small cell lung cancer (NSCLC), and to discuss how to integrate them into clinical trials. Large phase III trials have been published in resected NSCLC with biomarkers identifying subsets of patients benefitting from perioperative chemotherapy. Initial findings of predictive implications for the DNA repair protein, ERCC1, were not confirmed in a larger series of patients because of the variability of the commercially available monoclonal ERCC1 antibody. Prediction of survival by RRM1 tumour expression was not confirmed in a prospective phase III trial in 275 patients with stage IV disease, precluding its use in early-stage NSCLC. BRCA1 mRNA tumour content also failed to predict platinum resistance in 287 stage IV NSCLC patients included in a phase II trial, and the results of a similar trial in early-stage patients are still pending. Of the several cDNA gene expression studies in early-stage NSCLC with non-overlapping prognostic signatures, few have been replicated in independent cohorts for prognostic value, and none has received external validation for predictive value. Therefore, the use of biomarkers predicting chemotherapy efficacy still needs additional validation before becoming routine practice in NSCLC patients whose tumours are pan-negative for oncogenic drivers.
Introduction
Despite recent advances, the 5-year survival rate of patients with stage I–III non-small cell lung cancer (NSCLC) remains <50%, even after early diagnosis and resection with curative intent, because of the occurrence of extrathoracic metastasis [1]. Meta-analyses found an absolute 5-year survival benefit of only 5–6% for perioperative cisplatin-based chemotherapy [2–4]. In the International Adjuvant Lung Cancer Trial (IALT) [5], a 4.1% survival advantage was observed in the chemotherapy arm, meaning that 25 patients had to be treated for just one patient to benefit. Therefore, in most stage I–II patients, perioperative chemotherapy is, at best, not required in terms of survival and is sometimes detrimental in terms of quality of life or toxicity. There is an urgent need to identify subgroups of patients who could avoid exposure to potentially toxic and ineffective treatment or, conversely, patients who could actually benefit from adjuvant treatment. Therefore, biomarker studies have been designed to identify which subsets of patients are most likely to benefit from specific therapeutic strategies or drugs [6].
Series: “Topics in thoracic oncology”. Edited by G. Zalcman and N. Girard. Number 4 in this series.
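The figure of 25 patients quoted for IALT is a number needed to treat (NNT), i.e. the reciprocal of the absolute survival benefit, rounded up. The following minimal sketch reproduces that arithmetic; the function name is illustrative and the only inputs taken from the text are the 4.1% IALT benefit and the upper 6% bound from the meta-analyses.

```python
import math


def number_needed_to_treat(absolute_benefit: float) -> int:
    """Return the NNT (rounded up) for an absolute benefit given as a fraction."""
    if not 0 < absolute_benefit <= 1:
        raise ValueError("absolute_benefit must be a fraction in (0, 1]")
    # NNT = 1 / absolute risk reduction, rounded up to a whole patient.
    return math.ceil(1 / absolute_benefit)


# 4.1% absolute survival advantage reported for IALT [5].
ialt_nnt = number_needed_to_treat(0.041)
# Upper bound of the 5-6% benefit reported by the meta-analyses [2-4].
meta_nnt = number_needed_to_treat(0.06)
print(ialt_nnt, meta_nnt)
```

Even at the more favourable 6% estimate, most treated patients are exposed to chemotherapy without a survival gain, which is the rationale for predictive biomarkers.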
Issues for predictive biomarkers in NSCLC
The first clinical, prospective, randomised trials in NSCLC patients using ancillary biomarkers have been published over the past decade. Despite REMARK (Reporting Recommendations for Tumor Marker Prognostic Studies) guidelines recommending stringent criteria for standardisation of prognostic biomarker studies [7], most recent studies simply could not fulfil such stringent criteria and still differ methodologically from each other, thus obscuring clinical value and currently impairing routine use of biomarkers in NSCLC [8].
For lung cancer, the only biomarkers with a significant predictive effect for a targeted therapy, as compared with chemotherapy, are epidermal growth factor receptor (EGFR) mutations [9, 10] and anaplastic lymphoma kinase (ALK) rearrangements [11] in oncogene-driven tumours, representing ∼15% of metastatic NSCLC. Indeed, five prospective phase III trials in Asian and Caucasian populations clearly showed that EGFR kinase inhibitors (erlotinib, gefitinib or afatinib) were superior to cisplatin-based chemotherapy in the first-line setting in patients with EGFR activating mutations [9, 12–15]. However, these trials only showed a progression-free survival (PFS) advantage, with no significant difference in overall survival, as crossover to EGFR tyrosine kinase inhibitors (TKIs) was frequent in patients randomised to the chemotherapy arm. In addition, these trials only included metastatic patients and did not address the question of adjuvant targeted therapy in EGFR-mutated patients with early-stage disease, an issue still awaiting validation. Indeed, large differences have been observed between the metastatic and adjuvant settings in colorectal and breast cancer, so the results of EGFR TKI trials in stage IV NSCLC patients cannot be extended to early-stage patients.
For ALK rearrangement, another molecular tumour alteration, the first-generation ALK inhibitor crizotinib has been shown to be superior to chemotherapy in the second-line setting [16]. Crizotinib validation versus chemotherapy may be demonstrated in phase III trials that have fulfilled their recruitment goals quickly, whereas second-generation ALK inhibitors have shown promising results in phase I–II trials [17, 18]. These results were obtained in metastatic patients. No data on the potential interest of such compounds in the adjuvant setting versus classical platinum-based adjuvant chemotherapy are available in early-stage NSCLC with ALK translocation. Since prognosis of stage IV patients with ALK-rearranged tumours appears to be slightly lower than for patients with EGFR mutations or wild-type tumours in retrospective studies, such an adjuvant role could reasonably be investigated in future prospective early-stage NSCLC trials. The trials would require a large cohort and a long follow-up in order to come to any firm conclusions, and they must take into account the rarity of this abnormality (4–5% of metastatic NSCLC, with no data on its frequency in early-stage disease). Other rare mutation events, such as B-Raf mutations [19], could potentially also have their own targeted therapy and appear to have a predictive value for efficacy of the specific kinase inhibitors in preliminary phase I studies [20]; however, they only represent 1–2% of NSCLC, with no available data in stage I–II patients. Finally, all available molecular-targeted therapies have been developed in non-squamous NSCLC, predominantly adenocarcinoma, and there is currently no predictive marker for any drug in squamous cell carcinomas, which still represent 30% of NSCLC. 
Over time, fibroblast growth factor receptor (FGFR) inhibitors (directed at FGFR1 and 4 amplification in 20% of squamous cell carcinoma and FGFR2 and 3 mutations in 6% of squamous cell carcinoma) [21, 22] and immune-checkpoint inhibitors (antibodies directed toward PD-1 or PD-L1) may show significant advantages over classical chemotherapy in the phase III setting [23]. At present, therefore, the large majority of NSCLC patients still require platinum-based chemotherapy, which yields a response rate of only 30–60% in stage IV or stage I–III disease. As many as 50% of these patients have p53 mutations, which have prognostic value but are not actionable, as they are not targeted by any specific drug [24].
Biomarkers of platinum-based chemotherapy efficacy
As all data come from trials in which the ability to perform the biomarker analysis was not an inclusion criterion, and because the retrospective specimen collection rate has often been low in prospective trials of non-oncogene-driven NSCLC, there is a risk that biomarker study populations differ from the whole population. In such studies it remains questionable whether prognostic findings in a subset can be extended to the whole study population (selection bias).
A majority of studies present the study population as a patient flow diagram; however, only a minority provide a statistical comparison of the subsets with and without biomarker analysis. For example, in the IALT phase III trial for stage I–IIIA resected NSCLC, ERCC1 tumour staining was performed in only 41% of the whole trial population, in specimens from the 28 centres that recruited >10 patients [25]. The 28 centres contributed 867 pathological blocks from 1045 (83%) included patients. In this subset, patients with or without ERCC1 analysis had similar clinical characteristics. However, no information was provided about possible clinical differences between the 867 patients in the ERCC1 subset and the 994 remaining patients. The only neoadjuvant chemotherapy phase III trial in early lung cancer, IFCT-0002, reported molecular analyses in 39% of the included patients, those for whom snap-frozen tumour preservation was possible; however, TUBB3 immunohistochemistry was possible in 78% of patients [26].
Is the sample representative of the disease treated in this trial?
In prognostic molecular studies, the specimen chosen for analysis could induce false interpretation simply because it only reflects a small part of an oligoclonal tumour. Thus, a core biopsy could differ from the whole tumour or from distant metastases. Alternatively, the group of patients submitted to molecular analysis could be falsely presented as representative of a particular clinical group, leading to an “over-interpretation bias” if molecular features are thought to be prototypic of a different population.
In metastatic NSCLC, histological diagnosis mainly relies on core biopsies often containing hundreds of tumour cells, sometimes representing <20% of the tissue biopsy. In early-stage NSCLC, it is possible to analyse core bronchial biopsies obtained by fibreoptic endoscopy or trans-thoracic computed tomography-guided biopsies before surgery, or surgical samples obtained after radical surgical resection. Taking into account tumour heterogeneity and the oligoclonal nature of lung cancer proliferation, it is questionable whether core biopsies are actually representative of the whole primary tumour or of the different clones that could arise in different parts of the tumour, in mediastinal metastatic (N2) nodes, or in subsequent brain, bone or liver metastases. Some studies have addressed this question and systematically compared molecular analysis of core biopsies and surgical specimens in the same patients; some histochemical markers were shown to vary considerably across the different types of pathological specimens analysed [27]. Fewer studies have compared primary NSCLC tumours and metastases that occurred synchronously or later in the disease evolution. In such studies, driver mutations, such as EGFR and p53 mutations detected in core biopsies or in the surgically removed primary tumour, are generally also detected in corresponding surgical specimens, distant metastases or synchronous clonal tumours [28–31]. The same features are seen for K-Ras mutations, but the data are mainly from studies of colorectal cancer [32]. Conversely, gain of such driver mutations appears to be rare in metastases or after treatment for initially mutation-negative primary tumours.
However, gain of additional non-driver molecular events seems possible within the primary tumour itself in different sub-clones or in distant metastases, such as gain or loss of chromosomal material or amplification or deletion of critical metastasis genes, influencing the expression of proteins that are crucial for chemoresistance. For a chemotherapy resistance biomarker, analysis of a single core sample from a surgical resection may therefore not be representative of the heterogeneity and variability of the whole tumour.
Is the method robust, reproducible, widely accessible and cost-effective?
As mentioned previously, molecular diagnosis in NSCLC patients often relies on small bronchial biopsies containing hundreds of tumour cells surrounded by stromal tissue. Therefore, molecular assays need to be sensitive, able to discriminate tumour-mutated cells from surrounding normal tissue and easy to perform in the laboratory. Recent progress in molecular biology techniques has resolved this issue for EGFR mutations in metastatic NSCLC by using cost-effective and sensitive allele-specific PCR-based assays or next-generation sequencing. This latter technology raises the issue of cost-effectiveness and some discordance across different laboratories is still being observed [33].
Methylation studies of gene promoters, such as that of the tumour-suppressor gene RASSF1A, have also been questioned. In the IFCT-0002 phase III trial [34], a robust cost-effective methylation-specific PCR assay was used, which produced reproducible results in two independent laboratories in the same series of patients and showed a prognostic and predictive role for RASSF1A [35]. In this study, RASSF1A methylation was associated with a significantly worse overall survival despite perioperative chemotherapy, suggesting that patients with RASSF1A methylation might benefit from alternative adjuvant treatments (e.g. immune-checkpoint inhibitors). Conversely, patients with RASSF1A methylation had significantly longer PFS with paclitaxel-based perioperative chemotherapy than methylated patients treated with gemcitabine-based chemotherapy, with a significant interaction test supporting this predictive value. Recently, more quantitative techniques have been used, such as pyrosequencing, that give additional precise information about the length of the promoter region being methylated, i.e. the intensity of promoter methylation. For promoter methylation assays, the main remaining issue is how methylation correlates with gene expression. In the IFCT-0002 trial, methylation detected by methylation-specific PCR was shown to correlate with statistically significant down-expression of RASSF1A protein as detected by Western blotting analysis of tumoural protein extracts [35]. Of the six major CpG island promoters, a methylation level of <20% is sufficient to affect RASSF1A protein expression [36]. Conversely, the methylation level at the MTHFR gene promoter needs to be >80% to downregulate MTHFR mRNA. For such a gene, there would not be a strict correlation between moderate levels of methylation as detected by sensitive methylation-specific PCR assays and protein expression. Therefore, methylation assays need to be correlated to gene expression for each specific gene.
Whereas immunohistochemistry on paraffin-embedded specimens is certainly the most effective technique for biomarker assessment and is widely used in hospital pathological laboratories, the issue of reproducibility remains for immunohistochemistry biomarkers. The historical example of ERCC1 immunohistochemistry shows how important it is that prognostic and predictive results are reproduced by different laboratories, in different geographical/ethnic populations (but comparable in terms of prognostic factors, especially TNM, type of treatment and histology), and during different time-periods. In their seminal paper, Olaussen et al. [25] reported the good prognostic value of high-intensity (above the median value) ERCC1 tumour immunostaining, and the predictive value of high tumour ERCC1 expression for lower overall survival in the adjuvant chemotherapy arm in stage I–IIIA patients. These results seemed to be confirmed by studies using ERCC1 mRNA quantification [37] in stage IV patients [38] and in a study with concomitant RRM1 immunostaining analysis or RRM1 mRNA level assessment [39, 40]. Moreover, in the long-term follow-up of the IALT study, although the survival advantage in the adjuvant chemotherapy arm disappeared, the predictive value of ERCC1 immunostaining was maintained after 7.5 years of median follow-up [41]. In the IFCT-0002 neoadjuvant series of patients, the prognostic role of ERCC1 immunostaining was also reported, although one major difference was the cut-off used in this study, i.e. the first quartile and not the median value [42].
The breakthrough impact of the IALT results in early-stage NSCLC patients led to a Spanish phase III trial by Cobo et al. [43], the GILT trial. The authors applied the putative predictive impact of ERCC1 expression to customise the chemotherapy treatment of stage IV NSCLC patients [43]. They randomised patients (1:2 ratio) between a control arm with docetaxel/cisplatin for all-comers and a genotypic arm according to ERCC1 mRNA expression. Patients with a high ERCC1 level were allocated to a non-cisplatin-based regimen with docetaxel followed by gemcitabine, whereas patients with low ERCC1 levels were treated with docetaxel followed by cisplatin. The primary end-point was overall response rate. They screened 444 patients, but only randomised 141 patients in the control arm and 228 in the genotypic arm. Indeed, there was a 14% failure rate for quantitative reverse transcriptase (qRT)-PCR and 18% of patients were excluded because of insufficient pathological material. Response rate was better in the genotypic arm than in the control arm (51.2% versus 39.3%, respectively), but PFS and overall survival did not significantly differ. Two major pitfalls were discussed to explain these disappointing results: first, the extrapolation of ERCC1 results in early-stage patients to metastatic patients; and secondly, the use of ERCC1 mRNA and not protein content. Some studies showed the multiplicity of ERCC1 isoforms [44] and the lack of correlation between qRT-PCR ERCC1 mRNA quantification and immunohistochemistry [45]. In addition, the impact of ERCC1 tumour content on sensitivity to gemcitabine could have obscured the final results of the study. However, failure of this customised trial in stage IV patients probably reflected a major underlying problem with the ERCC1 biomarker, as the results of further confirmatory studies in early-stage resected NSCLC were disappointing.
In actual fact, initial ERCC1 findings from the IALT study were finally refuted this year in a larger population of comparable early-stage NSCLC patients of the LACE-Bio consortium, which included patients from the IALT, CALGB 9633 and JBR10 trials of adjuvant platinum-based chemotherapy, in which adjuvant chemotherapy was randomised versus observation [44]. In this series of 1083 patients (as compared with the 867 initial patients from the IALT trial, thus only representing a modest increment of the study population), staining was performed with what was thought to be the same monoclonal ERCC1 antibody, clone 8F1, distributed by the same manufacturer. In fact, no predictive value was found in this confirmatory study, even when different quartile cut-off points were used. The authors reported that mRNA quantification of two different ERCC1 isoforms (out of the four described) was not associated with any prognostic value. But the puzzling issue was the high number of positive specimens with the same scoring based on a classical H-score: 77% of positive samples in the new study compared with 44% in the historical study of 2006. The authors checked for the absence of reader interpretation bias, and interobserver concordance was >95% with respect to ERCC1 positivity. They then demonstrated that the batches of the 8F1 monoclonal antibody used in the two studies actually differed in their specificity. Indeed, both batches recognised the four ERCC1 isoforms but with different affinities, and only the ERCC1 202 isoform seemed to be responsible for the DNA repair function, raising the possibility that a differential affinity of the two batches could have led to preferential detection of the 202 isoform and to the predictive value seen in the 2006 study results.
Before the LACE-Bio ERCC1 validation study results were reported, the French Intergroup IFCT launched a prospective phase II–III trial of customised adjuvant treatment in early-stage NSCLC patients [46]. The TASTE (Tailored Post-Surgical Therapy in Early stage NSCLC) trial incorporated ERCC1 staining and EGFR mutations in a customised adjuvant chemotherapy trial in non-squamous stage II and IIIA (non N2) patients with curative surgical resection. Patients were prospectively randomised to a control arm in which they received four cycles of pemetrexed adjuvant chemotherapy or a customised arm in which patients with EGFR mutations received erlotinib. In the EGFR wild-type cohort, the patients with high levels of ERCC1 were observed, and the patients with low levels of ERCC1 were treated with pemetrexed cisplatin adjuvant chemotherapy. The phase II part of this study was a feasibility trial, in which the end-point was the rate of patients able to start therapy within 2 months of surgery with the biomarker status readily available, with a non-feasibility boundary at 64%. 150 patients were included over 3 years in 29 centres: 74 in the standard arm and 76 in the experimental customised arm (seven EGFR mutated and 69 EGFR wild-type). The success rate was 80%: 120 out of 150 patients had molecular results within the two post-operative months and were actually randomised, thus showing the feasibility of such an approach in the prospective setting. For eight (5%) patients biomarker status was not available at the time and 17 (11%) patients had not recovered from surgery and were unable to receive adjuvant chemotherapy. However, a low rate of ERCC1-positive patients (25%) was observed; 44% had been expected, as in the IALT study.
The trial was stopped because of the results of the LACE-Bio ERCC1 replication study and the data on the probable discordant specificities of the 2006 and 2009–2012 ERCC1 monoclonal antibody batches, which could have explained the diverging frequencies of ERCC1-positive immunostaining in the IALT and TASTE studies.
Findings in the LACE-Bio ERCC1 study, as in the TASTE trial, showed that the monoclonal origin of a given antibody is no guarantee of its stability over time in terms of specificity, and that antibodies for clinical use need to be certified and cross-checked over a long period of time. Such control is necessary because multiple isoforms with functional differences may be recognised by the same antibody, which obscures the results of prognostic studies. This also underlines the need to associate well-designed functional studies with pure expression studies.
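The feasibility end-point of the TASTE phase II part can be illustrated by comparing the observed success rate (120 of 150 patients) with the 64% non-feasibility boundary. The sketch below uses an exact one-sided binomial tail probability; this test is an assumption made for illustration only, as the statistical design actually used by the investigators is not described here, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p0: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


n_included = 150   # patients included in the phase II part
n_on_time = 120    # patients with results and treatment start within 2 months
boundary = 0.64    # non-feasibility boundary quoted in the text

observed_rate = n_on_time / n_included
# Probability of observing at least 120 successes if the true rate were 64%
# (illustrative assumption, not the trial's documented analysis).
tail_probability = binomial_upper_tail(n_on_time, n_included, boundary)
print(f"observed rate: {observed_rate:.0%}, tail probability: {tail_probability:.2e}")
```

Under this assumed test, an observed rate of 80% lies well above the boundary, consistent with the conclusion that prospective biomarker-driven allocation was feasible.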
Such difficulty was not encountered with MutS homologue 2 (MSH2) commercial antibodies. The IALT group studied MSH2 immunostaining [41] using a commercially available antibody to screen for mutations in the MSH2 gene that are associated with microsatellite instability. In hereditary non-polyposis colorectal cancer these mutations lead to a lack of MSH2 protein expression. As for the initial findings with ERCC1, low MSH2 content, as determined by the H-score with a cut-off point set at the median value, was significantly associated with longer survival in patients receiving cisplatin-based adjuvant chemotherapy (predictive value), whereas high MSH2 content was associated with a good prognosis in the control group (prognostic value). A test for interaction between chemotherapy and MSH2 staining produced a borderline p-value (p=0.06), possibly because of a lack of power. There was a meaningful 8% difference in 5-year overall survival from 41% in the control arm to 49% in the chemotherapy arm for patients with MSH2 low tumours, with a median long-term follow-up of 7.5 years. In the neo-adjuvant chemotherapy trial IFCT0002, the authors reported the results of MSH2 staining performed on surgical samples, evaluated by the same pathologist as in IALT-Bio study with the same monoclonal antibody, FE11 (Calbiochem, Gibbstown, NJ, USA), in early-stage NSCLC patients that received chemotherapy before immunohistochemistry analyses [42]. The quality of the staining was good with a cut-off point reported to be the median value presented in the IALT-Bio study. An apparently divergent result was observed since patients with tumours with low MSH2 content experienced a significantly longer overall survival than patients with MSH2 high tumours. However, patients received chemotherapy before surgery and immunohistochemistry analyses; therefore, this result is coherent with a predictive effect for MSH2 staining. 
Therefore, confirmatory studies are required in order to determine whether an MSH2 biomarker could be reliably used for the prediction of adjuvant chemotherapy, in order to erase the scepticism induced by the recent failure with ERCC1.
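The ERCC1 and MSH2 analyses discussed above rely on an H-score dichotomised at a cut-off such as the median. As a minimal sketch, the classical H-score weights the percentage of tumour cells at each staining intensity (1 for weak, 2 for moderate, 3 for strong), giving a value from 0 to 300. The exact scoring variant used in each cited study is not detailed here, so the classical form is an assumption, and the staining percentages below are invented for illustration.

```python
from statistics import median


def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Classical H-score: intensity-weighted percentage of stained cells (0-300)."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if min(percentages) < 0 or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def dichotomise_at_median(scores):
    """Label each score 'high' (above the median) or 'low' (at or below it)."""
    cut_off = median(scores)
    return cut_off, ["high" if s > cut_off else "low" for s in scores]


# Hypothetical (weak %, moderate %, strong %) readings for four tumours.
readings = [(10, 20, 30), (50, 0, 0), (0, 0, 100), (20, 20, 20)]
scores = [h_score(*r) for r in readings]
cut_off, labels = dichotomise_at_median(scores)
print(scores, cut_off, labels)
```

Because the cut-off is derived from the series itself, a shift in antibody specificity between batches changes the distribution of scores and therefore which patients are labelled high or low, which is one reason why cut-offs defined in one study may not transfer to another.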
As mentioned previously, RRM1 tumour content has been evaluated in several studies by immunohistochemistry and by the more sophisticated immunofluorescence-based automated quantitative analysis technology, and has shown promising results. Again, high RRM1 content was found to correlate with good prognosis but with resistance to gemcitabine-based chemotherapy in retrospective series of stage IV or early-stage NSCLC patients. However, the most extensively studied monoclonal antibody was not commercially available and came from the group of Bepler; thus, other groups had limited access. In a small retrospective series of 187 early-stage (p-Stage I) resected patients, Zheng et al. [47] showed a good correlation between RRM1 and ERCC1 (the oldest batch). Patients with concomitantly high ERCC1 and high RRM1 were the subset with the best survival. A few months after this seminal paper was published, Simon et al. [48] reported the results of a phase II trial (MADeIt) customising chemotherapy in advanced NSCLC according to RRM1 and ERCC1 content using a time-consuming qRT-PCR methodology following laser capture microdissection on core biopsies. Out of 85 patients who were initially screened, the authors demonstrated in 53 out of 60 eligible patients that such an approach is feasible in a phase II setting. Patients with low RRM1 content received gemcitabine-based chemotherapy either in association with docetaxel (in the case of high ERCC1 content) or with carboplatin (in the case of low ERCC1 content). Patients with high RRM1 tumours received docetaxel-based chemotherapy with either vinorelbine (in the case of high ERCC1) or carboplatin (in the case of low ERCC1). In this overall population receiving customised treatment, the authors reported promising rates of partial response (44%) and disease control (88%), translating into an encouraging 1-year survival rate of 59% and a median survival of 13.3 months.
Subsequently, Bepler et al. [49] reported the results of a phase III trial that used immunohistochemistry with automated quantitative analysis technology for assessing ERCC1 and RRM1 tumour content in 267 stage IIIB/IV, performance status 0–1 patients. Patients were randomised 2:1 into an experimental customised chemotherapy arm according to ERCC1 and RRM1 tumour content, as in the previous phase II trial, and a control arm in which all patients received carboplatin plus gemcitabine regardless of their ERCC1 and RRM1 tumour content. The primary goal was to increase 6-month PFS from 38% to 50%. The secondary goal was to increase median overall survival from 8 months to 12 months. The phase II trial results suggested that these were reasonable objectives. 307 eligible patients were screened. The initial DNA repair gene expression analysis failed in 53 (17%), leading to 47 re-biopsies with successful immunohistochemistry analysis. Therefore, 275 patients were treated according to the design (experimental arm: n=183; control arm: n=92). The two arms were well balanced, with a 1:1 sex ratio, adenocarcinoma as the most frequent histology (52.5%) and 70% of patients being in performance status 1. The 6-month PFS in the experimental arm was 52%, but the control arm showed an unexpected 6-month PFS rate of 56.5%, with no significant difference, translating into the same 11-month median overall survival in both arms. Thus, customising chemotherapy according to ERCC1 and RRM1 failed to significantly increase survival in stage IV patients, precluding any use in early-stage patients.
High BRCA1 mRNA expression (as assessed by qRT-PCR) was also suggested as a prognostic biomarker of worse survival in resected NSCLC [50], and as a predictor of lower response rate in stage II–IIIA patients who received neoadjuvant cisplatin-based chemotherapy followed by complete resection [51]. Conversely, BRCA1 was reported to be a predictor of good response and longer time to progression in stage IV patients receiving a non-cisplatin-based regimen, the docetaxel plus gemcitabine doublet [52]. Therefore, BRCA1 was suggested as useful to customise chemotherapy: patients with high expression levels would respond more favourably to taxanes, whereas patients with low BRCA1 expression could be cisplatin sensitive. Such a concept led to the Spanish phase III trial customising adjuvant chemotherapy according to the BRCA1 mRNA expression quartile in the experimental arm (quartile 1: treatment by gemcitabine and cisplatin; quartile 2: treatment by docetaxel and cisplatin; quartile 3: treatment by docetaxel alone) versus a control arm of docetaxel plus cisplatin for all-comers with a 3:1 randomisation (Spanish Customized Adjuvant Treatment, NCT00478699). Up to May 2013, 500 patients had been included (non-customised control arm: n=108; customised experimental arm: n=392); however, at present, the final results are still pending. Meanwhile, results of another Spanish phase III trial customising chemotherapy according to BRCA1 and RAP80 tumour expression in advanced-stage NSCLC were presented at the 2013 American Society of Clinical Oncology meeting. RAP80 is another DNA-repair protein of which low expression was also suggested as influencing PFS and overall survival when associated with low levels of BRCA1 [53].
In this trial the reference arm was docetaxel/cisplatin for all-comers. The experimental customised arm (1:1 randomisation) allocated gemcitabine and cisplatin to patients with low RAP80 regardless of BRCA1 level, docetaxel and cisplatin to patients with intermediate RAP80 levels but low BRCA1 (tertiles 1 and 2), and docetaxel alone to patients with high RAP80 and BRCA1 expression [54]. Again, the final results were disappointing after randomisation of 382 (versus 480 initially planned) patients, with a lower median PFS in the experimental arm than in the control arm (4.38 versus 5.49 months, with a trend for significance; HR 1.35 (95% CI 1.02–1.78), p=0.07). Median overall survival was significantly lower by 4 months in the experimental arm (8.52 versus 12.66 months, p=0.006). The median PFS and overall survival were 2.5 months and 7.5 months, respectively, for the 43 patients receiving only docetaxel monotherapy in the first-line setting, clearly invalidating the experimental hypothesis of a better response and survival with taxanes for patients with higher RAP80 and BRCA1 content.
Successive failure of these phase III trials in stage IV patients has raised general uncertainty about chemotherapy customisation by the DNA repair pathway, as compared with the widely accepted customisation of treatment by oncogene-driven genotypes (EGFR and ALK). Although DNA repair status is of great interest, the lack of accurate tools precludes a reliable estimation of DNA repair efficacy in tumours and, therefore, the use of DNA repair biomarkers in advanced-stage as in early-stage patients.
Are the methods statistically sound and the results strictly confirmed by an internal or external validation method?
Once the question of the variability of molecular or immunohistochemistry assays across different laboratories is answered, limiting confounding bias is a major step of the prognostic analysis. This is performed using multivariate analysis controlling for the stratification factors of the trial and for confounding clinical or pathological characteristics that influence survival in the univariate analysis at a given p-value threshold of <0.1 or <0.2 (depending on the study).
The next point is to check that the results obtained in one specific series of patients could not have been obtained by chance, taking into account possible variations across different series of patients, including: geographical origin, smoking status (smokers or nonsmokers), chemotherapy-naïve or received chemotherapy, and varying sex ratios or ages (reproducibility of the biomarker prognostic influence).
Limiting the risk of false-positive results
The main methodological artefact is the multiplicity of analyses performed within a single series of patients. To our knowledge, IALT-Bio is the only trial listed in this review that mentioned an a priori power calculation for the biomarker study. The authors specified the need to analyse at least 800 specimens to provide a 66% power to detect a 20% survival difference at 5 years (HR 1.2), with a two-sided 1% α-level, taking into account 11 clinical and prognostic factors and three factors used in the stratified randomisation. IALT-Bio investigators reported multiple immunohistochemical analyses (at least 10) including ERCC1, MSH2 and p27, leading to a global model with ≥24 different variables [25, 41, 55, 56]. Additional variables should be added as grouped ERCC1/MSH2 and MSH2/p27 analyses were also reported. All reported p-values considered significant were ≤0.01 except one; low MSH2 prediction of a longer survival, p=0.03. However, two sequential analyses were performed at 5 years and 7.5 years of median follow-up. In this latter analysis, low ERCC1 content was still associated with better survival in patients who had received adjuvant chemotherapy, but with a substantial increase of HR between the 2006 and 2008 analyses, whereas the impact of adjuvant chemotherapy on overall survival was lost at 7.5 years of median follow-up in the clinical study population [57].
Some statistical methods have been proposed to correct for the potential effect of multiple analyses and limit the risk of false-positive results, such as the Bonferroni–Holm or Hochberg corrections, which are widely used for cDNA microarray analyses [58]. Such methods account not only for the multiplicity of variables, but also for the number of sequential analyses performed during the period of follow-up. However, to our knowledge, only the IFCT 0002 study reported the use of such a statistical correction in a prospective biomarker study in NSCLC, which reinforces the prognostic and predictive value of RASSF1A methylation [35].
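The two corrections named above can be sketched in a few lines. The code below is a minimal illustration of the Holm step-down and Hochberg step-up adjustments applied to a list of raw p-values; the function names and the example p-values are hypothetical and are not taken from any of the trials discussed.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep monotonicity
        adjusted[idx] = running_max
    return adjusted


def hochberg_adjust(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # Largest p is multiplied by 1, the next by 2, and so on.
        candidate = min(1.0, (rank + 1) * p_values[idx])
        running_min = min(running_min, candidate)  # keep monotonicity
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four biomarker analyses.
raw = [0.01, 0.04, 0.03, 0.005]
print("Holm:    ", holm_adjust(raw))
print("Hochberg:", hochberg_adjust(raw))
```

A result is then declared significant when its adjusted p-value is below the chosen α-level. Hochberg's procedure is never more conservative than Holm's, but its validity rests on additional assumptions about the dependence between tests.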
External and internal validation
Reproducibility of prognostic biomarker studies is a major issue. One possibility is to reproduce the molecular assay analysis in an independent series of patients from a distinct research group, but which is apparently comparable to the initial series in terms of all the identified clinical and pathological variables. This “external” validation has mainly been used in microarray studies of NSCLC specimens [59], since there is a complete lack of concordance across the numerous gene signatures that have been published and claimed to be of prognostic value (table 1) [66, 67]. One of the best characterised signatures comes from the NCIC JBR10 adjuvant chemotherapy trial, and was established in 134 out of the 482 patients who were included in the trial (observation arm: n=63; adjuvant chemotherapy arm: n=71) [61]. This cDNA expression microarray signature of 15 genes had a strong prognostic value (HR >13.0) in all patients, as well as in stage I patients and in stage II patients, thus showing good stability across different prognostic groups. This was replicated in four independent cohorts comprising patients with and without adjuvant chemotherapy, both major NSCLC histology subtypes (adenocarcinoma and squamous cell carcinoma) and patients from the USA or Europe [67]; however, HR for death was lower in those replication series (from 2.01 to 3.18). This signature was also predictive of survival in patients receiving adjuvant chemotherapy, as opposed to those who did not (univariate interaction test p<0.01), in the whole cohort as well as in the two subsets of stage I or II patients. This predictive value was replicated in the same series with another molecular technique (qRT-PCR). However, it could not be tested in an independent series of patients because of the small number of patients in each subset of the external series, and has therefore not received external validation.
A second molecular signature was initially derived in breast cancer [68], and was described as a malignancy-risk gene signature as it was enriched in cell proliferation genes and associated with cancer risk in normal breast tissue, possibly reflecting a common loss of cell cycle regulation in the earliest stages of cell transformation. This 94-gene signature was evaluated in the 442 samples from the Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma [67], which contained stage IA–III samples from resected NSCLC [63]. Replication studies used a series of 117 resected early-stage adenocarcinoma samples with no adjuvant chemotherapy and the JBR10 series of samples described above, in which half of the patients received adjuvant chemotherapy. Adjusted HR for death was 2.14 (95% CI 1.42–3.272) in the high-malignancy risk group, with a 45.2% 5-year overall survival as compared with 64.6% in the low-malignancy risk group. This prognostic value was replicated within each prognostic group (within each stage (I–III) in T2 tumours and within each N subset (N0–N2)). After adjusting for multiple testing at a false discovery rate level of 0.01, 67 out of the 94 genes of the initial signature were statistically associated with this prognostic value, of which overexpression of 48 genes involved in cell proliferation correlated with shorter overall survival. This prognostic value was reasonably well replicated in the subsets of patients from the three data sets who did not receive adjuvant chemotherapy, thus showing good external validation of the prognostic value. Moreover, a predictive value was reported in the JBR10 dataset with a significant interaction test (p-value for interaction 0.02).
The patients in the high-malignancy risk group who received adjuvant chemotherapy had significantly longer survival (5-year survival 72.7%) than those in the observation group (5-year survival 39.2%), whereas in the low-malignancy risk group no statistically significant survival differences were found between the adjuvant chemotherapy and observation groups. Again, as in the JBR10 study, the lack of gene-expression data in an independent cohort of patients with prospective randomisation between observation and adjuvant chemotherapy prevented the external validation of this predictive value.
The most convincing prognostic gene expression signature came from Kratz et al. [64], who derived a registered assay (Pervenio Lung RS Assay; Life Technologies, Paisley, UK) from a 14-gene expression qRT-PCR assay (11 prognostic genes and three reference genes) run on paraffin-embedded tissue samples from a cohort of 361 patients with stage I nonsquamous NSCLC resected in hospitals in California, USA. The signature was validated on an independent cohort of 433 stage I nonsquamous NSCLC patients resected at Kaiser Permanente hospitals (Northern California), and finally on an external validation cohort of 1006 stage I–III patients from centres in the China Trials Consortium. The multivariate prognostic value of this signature was assessed in the three series of patients, with a 20% decrease in 5-year overall survival, from 71.4% (95% CI 90.5–90.0) in the low-risk subset to 49.2% (95% CI 49.9–66.6) in the high-risk group from the initial series, and comparable quantitative overall survival results in the two validation series. The strength of this molecular signature was its practicability, since it only required paraffin-embedded specimens, as opposed to previously published signatures [61, 63], making it relevant to the routine management of patients with lung cancer. The validation in a community-based US cohort, as in Chinese patients, showed its stability over time and in different ethnicities, although analyses were centralised in a single laboratory. However, the study gave no insight into a potential predictive value for adjuvant chemotherapy, and did not establish the need for more aggressive treatment (adjuvant chemotherapy) rather than surgical resection alone in the highest prognostic risk group.
Moreover, a highly significant HR for death does not necessarily mean that a prognostic signature or biomarker will be of clinical utility, since it does not imply that the predictive accuracy of the marker is sufficient to justify its use in clinical practice. For example, a molecular signature with a positive predictive value of 0.80 (i.e. 20% of patients with a poor prognosis signature will not die) does not accurately predict which patients will or will not die, and therefore does not by itself justify a treatment decision. Such clinical utility could be assessed with the highest level of evidence by prospective trials with an “interaction” design, in which all patients are stratified by biomarker level and then randomised to one of two treatments [69].
However, an alternative “selection” design has been chosen for the international (USA, France and China) randomised, prospective validation phase III trial using the Pervenio signature. This trial is planned in 1050 stage I nonsquamous patients (700 high risk, 350 low risk) and will randomise the high-risk group between adjuvant chemotherapy with four cycles of a cisplatin/vinorelbine doublet and observation, whereas the low-risk group, as defined by the Pervenio assay, will only be observed for evaluation of survival. The primary end-point is a 30% increase of overall survival in the high-risk group of patients, and the projected accrual will give the trial an 87% power at a two-sided significance level of 5%. The lack of data on the predictive value leaves some uncertainty about the ability of the Pervenio signature to reach the survival end-point in this study, as it is unclear whether the signature actually identifies the patients who benefit from adjuvant chemotherapy. In addition, this selection design will not provide any information on the lack of benefit among low-risk patients.
When comparable external series of patients are lacking, some methods have been proposed to provide internal validation. The simplest method is to randomly select a training set of patients from the whole series, from which a multivariable prediction model, including the biomarker(s) of interest and the clinical predictors, can be derived. This model is then tested without any change in the rest of the patient series, which is used as a validation set. Such a strategy needs a large initial sample size in order to get precise estimates. Other more complex methods have been proposed, such as the leave-one-out cross-validation approach or the bootstrap method [70, 71]. Bootstrapping consists of drawing samples (with replacement) from the original data set to generate a large number of training sets (several hundred) of the same size as the original sample. A prediction model is developed on each training set and tested on the original data. Validation results are then reported as the average performance over the whole process. Unfortunately, recent biomarker studies conducted within phase III clinical trials in NSCLC did not include such validation procedures.
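The bootstrap procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited study: the "model" is a deliberately simple single-marker cut-off (the midpoint between the mean marker values of patients with and without the event, assuming that higher values indicate the event), performance is measured as classification accuracy, and the cohort is synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_threshold(sample):
    """Toy model: cut-off halfway between the mean marker value of
    patients with the event (y == 1) and without it (y == 0)."""
    events = [x for x, y in sample if y == 1]
    non_events = [x for x, y in sample if y == 0]
    if not events or not non_events:
        return None  # degenerate resample containing a single class
    return (mean(events) + mean(non_events)) / 2.0


def accuracy(threshold, data):
    """Share of patients correctly classified by the rule
    'marker above threshold means event'."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in data)


def bootstrap_validate(data, n_resamples=200, seed=0):
    """Average performance, on the original data, of models fitted on
    bootstrap resamples of the same size as the original sample."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # with replacement
        threshold = fit_threshold(resample)
        if threshold is not None:
            scores.append(accuracy(threshold, data))
    return mean(scores)


# Synthetic cohort: (marker value, event indicator); not real patient data.
gen = random.Random(42)
cohort = [(gen.gauss(1.0, 1.0), 1) for _ in range(50)]
cohort += [(gen.gauss(0.0, 1.0), 0) for _ in range(50)]
print(f"Bootstrap-averaged accuracy: {bootstrap_validate(cohort):.3f}")
```

In a real biomarker study the toy cut-off would be replaced by the multivariable prediction model of interest (for example, a Cox model including clinical predictors) and accuracy by a survival-appropriate performance measure.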
Conclusion
This article has focused on several methodological issues of prognostic and predictive studies of biomarkers in NSCLC clinical trials. Biomarker analyses in retrospective series of patients are still needed in order to set up a biologically robust and reproducible assay, and to generate hypotheses that need to be prospectively validated in broader series of patients provided by large prospective clinical trials. Such validations need a large specimen collection to ensure a sample size that provides sufficient statistical power to detect a significant and clinically relevant difference in survival according to the biomarker status (table 1). Multivariate prognostic analysis of several biomarkers, including all clinical, pathological and prognostic variables, and stratification factors of clinical trials, needs statistical corrections for multiple analyses to avoid reporting false-positive results. Ideally, external validation in comparable series of patients is desirable, but it is currently rarely possible for the predictive value. Alternatively, more complex internal cross-validation procedures could ensure the stability of the biomarker. Although all these recommendations have been published by the statistics subcommittee of the NCI-EORTC Working Group on Cancer Diagnostics, their systematic use remains rare in NSCLC. Recent prospective trials using so-called predictive markers in advanced NSCLC have failed to show any survival improvement in their customised arm. Prospective evaluation of prognostic molecular gene-expression signatures is still pending. Therefore, the routine clinical applicability of biomarkers in early-stage patients remains elusive and will need further investigation in the next decade.
Footnotes
Previous articles in this series: No 1: Girard N. Thymic epithelial tumours: from basic principles to individualised treatment strategies. Eur Respir Rev 2013; 22: 75–87. No 2: Dooms C, Muylle I, Yserbyt J, et al. Endobronchial ultrasound in the management of nonsmall cell lung cancer. Eur Respir Rev 2013; 22: 169–177. No 3: Lang-Lazdunski L. Surgery for nonsmall cell lung cancer. Eur Respir Rev 2013; 22: 382–404.
Provenance: Submitted article, peer reviewed.
Conflict of interest: Disclosures can be found alongside the online version of this article at err.ersjournals.com
- Received September 20, 2013.
- Accepted October 7, 2013.
- ©ERS 2013
ERR articles are open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 3.0.