Cancer remains a global health problem and is one of the leading causes of mortality. For instance, upper gastrointestinal (GI) malignancies, such as gastric cancer, account for an estimated 989,600 new cases and 738,000 deaths per year worldwide.1 Early diagnosis and adequate treatment of GI cancers are essential to increasing patient survival. Current diagnostics rely heavily on visualization of gross morphological changes using conventional white light reflectance (WLR) imaging. Recently, narrow band imaging (NBI) and autofluorescence imaging (AFI) have emerged for image-enhanced contrast of micro-vascularity and visualization of endogenous fluorophores in tissue. However, these techniques remain inadequate for efficient and accurate diagnosis of grossly invisible lesions during clinical examinations, as biopsies are typically performed randomly.2 Hence, the development of real-time advanced diagnostic technologies that probe the endogenous biomolecular properties of tissues for fully automated cancer diagnostics would represent a vital advancement toward more efficient diagnosis and management of gastric patients.
Several point-wise spectroscopic techniques (e.g., fluorescence, diffuse reflectance, and near-infrared (NIR) Raman spectroscopy) have shown promising potential for early cancer diagnosis in humans.2–6 NIR Raman spectroscopy is a unique optical technology capable of nondestructively probing endogenous biomolecules to resolve highly specific diagnostic information, and its spectral signatures can be assigned unambiguously to intra- and inter-cellular constituents of tissue (e.g., proteins, lipids and nucleic acids). For instance, in vitro and in vivo Raman spectroscopic cancer diagnosis has been demonstrated in a variety of organs (e.g., skin,7,8 oral cavity,9,10 nasopharynx and larynx,11–15 breast,16 colon,17,18 lung,19 cervix,20,21 bladder,22 prostate,23 esophagus,24 stomach,5,6,25–29 etc.). Very recently, we have developed a rapid, high-throughput Raman spectroscopic system integrated with multimodal wide-field imaging modalities (WLR/AFI/NBI), and demonstrated its utility for in vivo diagnosis of premalignant and malignant lesions in the upper GI tract (i.e., esophagus,24 stomach5,25–27). A series of publications have outlined the efficient in vivo diagnosis of different gastric lesions, including gastric dysplasia and neoplasia,5,25–27,29 and benign and malignant ulcers.6 The inter-anatomical variability within the normal stomach and esophagus has also been examined in detail using image-guided Raman spectroscopy.29 Several groups have also reported on the implementation of real-time tissue Raman spectroscopy.30,31 For instance, Feld et al. presented a synchronized laser Raman system to study atherosclerosis that could acquire and process Raman spectra within 2 s using spectral modeling of reference biochemicals with the aid of least squares regression techniques.30
Zeng et al. developed a real-time integrated Raman system for evaluating skin that could acquire and process the signal within 1.1 s.31 Nevertheless, most current Raman spectroscopic analyses have been limited to off-line post-processing for classification of spectra with cross-validation procedures, which imposes practical limitations, including the setting of exposure times, post hoc verification of spectrum quality, and the lack of automatic feedback mechanisms to clinicians for implementation of straightforward probabilistic diagnostics in clinical settings. Hence, fully automated tissue spectral quality verification and real-time tissue cancer diagnostics are vital to translating the Raman spectroscopic diagnostic technique into routine clinical endoscopic practice. In this work, we develop an on-line biomedical spectral diagnostic framework integrated with image-guided Raman endoscopy for real-time probabilistic detection of cancer in the upper GI tract. We also validate the efficacy of the on-line Raman framework for prospective prediction of gastric malignancies in patients at clinical endoscopy.
Materials and Methods
Raman Endoscopy Instrumentation
Figure 1 shows the integrated Raman spectroscopy and trimodal wide-field imaging system developed for in vivo tissue Raman measurements at endoscopy, which has been described in detail elsewhere.32 Briefly, the clinical Raman spectroscopic system consists of a spectrum stabilized 785 nm diode laser (maximum output: 300 mW, B&W TEK Inc., Newark, Delaware) electronically synchronized with a USB 6501 digital I/O (National Instruments, Austin, Texas), a transmissive imaging spectrograph (Holospec f/1.8, Kaiser Optical Systems, Ann Arbor, MI) equipped with a liquid nitrogen-cooled, NIR-optimized, back-illuminated and deep depletion charge-coupled device (CCD) camera ( at ; Spec-10: 400BR/LN, Princeton Instruments, Trenton, New Jersey), and a specially designed Raman endoscopic probe for both laser light delivery and in vivo tissue Raman signal collection. The 1.8 mm Raman endoscopic probe is composed of 32 collection fibers surrounding the central light delivery fiber with two stages of optical filtering incorporated at the proximal and distal ends of the probe for maximizing the collection of tissue Raman signals, while reducing the interference of Rayleigh scattered light, fiber fluorescence and silica Raman signals. The Raman probe can easily pass down to the instrument channel of medical endoscopes and be directed to suspicious tissue sites under the guidance of wide-field endoscopic imaging (WLR/AFI/NBI) modalities.32–34 The system acquires Raman spectra in the wavenumber range of 800 to from in vivo upper GI tissue within 0.5 s using the 785 nm excitation power of (spot size of 200 μm) with a spectral resolution of .
The present study is part of an ongoing nationwide gastric cancer screening program, focusing on early diagnosis and treatment of upper GI malignancies run by the Singapore gastric cancer epidemiology, clinical and genetic program (GCEP).35 All patients participating in this study preoperatively signed an informed consent, permitting the investigative in vivo Raman spectroscopic acquisition of gastric tissues undergoing endoscopy in the Endoscope Centre at the National University Hospital (NUH), Singapore. This study was approved by the Institutional Review Board (IRB) of the National Healthcare Group of Singapore. A total of 2748 in vivo gastric tissue spectra (2465 normal and 283 cancer) were acquired from the 305 patients recruited to construct the spectral database for developing diagnostic algorithms for gastric cancer diagnostics. Tissue histopathology serves as the gold standard for evaluation of the performance of Raman technique for in vivo tissue diagnosis and characterization.
Online Biomedical Spectroscopic Framework
The online biomedical Raman spectroscopic framework developed has been implemented as a graphical user interface (GUI) under the Matlab 2011a (Mathworks Inc., Natick, MA) scripting environment in a fast computing workstation (64 bit I7 quad-core 4GB memory). This framework has been thoroughly optimized for rapid data processing for real-time tissue diagnostics. Hardware components of the rapid Raman system (e.g., laser power control, spectrometer, CCD shutter and camera readout synchronization) have been interfaced to the Matlab software through libraries for different spectrometers/cameras [e.g., PVCAM library (Princeton Instruments, Roper Scientific, Inc., Trenton, NJ) and Omni Driver (Ocean Optics Inc., Dunedin, FL), etc.]. A schematic of the spectral acquisition and processing flow of online diagnostic framework is depicted in Fig. 2. The laser was electronically synchronized with the CCD shutter. The automatic adjustment of laser power, exposure time and accumulation of spectra were realized by scaling to within 85% of the total photon counts (e.g., 55,250 of 65,000 photons) based on preceding tissue Raman measurements, whereas an upper limit of 0.5 s was set to realize clinically acceptable conditions. The accumulation of multiple spectra and automatic adjustment of exposure time provides a rapid and straightforward methodology to prevent CCD saturation and to obtain high signal-to-noise ratios (SNR) for endoscopic applications. The Raman-shift axis (wavelength) was calibrated using a mercury/argon calibration lamp (Ocean Optics Inc., Dunedin, FL). The spectral response correction for the wavelength-dependence of the system was conducted using a standard lamp (RS-10, EG&G Gamma Scientific, San Diego, CA). The reproducibility of the platform can be continuously monitored with the laser frequency and Raman spectra of cyclohexane and acetaminophen as wavenumber standards. 
All the system performance measures including CCD temperature, integration time, laser power, CCD alignment are accordingly logged into a central database via SQL server. Due to the inter-anatomical and inter-organ spectral variances as we observed earlier,9,29 the online framework we designed implements organ-specific diagnostic models and can instantly switch among the spectral databases of different organs [e.g., esophagus, gastric, colon, cervix, bladder, lung, nasopharynx, larynx, and the oral cavity (hard palate, soft palate, buccal, inner lip, ventral and the tongue)], making this Raman platform a universal diagnostic tool for cancer detection at endoscopy.
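The automatic exposure adjustment described above can be illustrated with a minimal Python sketch. It assumes that the peak CCD count scales linearly with exposure time and that any time left within the 0.5 s limit is spent on accumulating frames. The function name `plan_acquisition` and the accumulation policy are illustrative assumptions and not the Matlab implementation used in this work.

```python
import math

FULL_SCALE_COUNTS = 65_000   # total photon counts quoted in the text
TARGET_FRACTION = 0.85       # aim for 85% of full scale (55,250 counts)
MAX_TOTAL_TIME_S = 0.5       # clinically acceptable upper limit


def plan_acquisition(prev_exposure_s, prev_peak_counts):
    """Suggest (exposure_s, n_accumulations) from the preceding measurement.

    Hypothetical helper: assumes peak counts grow linearly with exposure time.
    """
    if prev_exposure_s <= 0 or prev_peak_counts <= 0:
        # No usable history: fall back to a single frame at the upper limit.
        return MAX_TOTAL_TIME_S, 1
    count_rate = prev_peak_counts / prev_exposure_s      # counts per second
    target_counts = TARGET_FRACTION * FULL_SCALE_COUNTS
    exposure_s = min(target_counts / count_rate, MAX_TOTAL_TIME_S)
    # Accumulate as many frames as fit in the time budget (assumed policy).
    n_accumulations = max(1, math.floor(MAX_TOTAL_TIME_S / exposure_s))
    return exposure_s, n_accumulations
```

Under these assumptions, a bright site is measured with several short frames that each stay below saturation, whereas a dim site receives a single frame at the 0.5 s limit.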
On-Line Preprocessing and Outlier Detection
Real-time preprocessing of Raman signals was realized with the rapid detection of cosmic rays using the first derivative with a 99% confidence interval (CI) over the whole spectral range set as a maximum threshold. Data points lying outside of a threshold were interpolated to 2nd order. The spectra were further scaled with integration time and laser power. A first-order, five point Savitzky-Golay smoothing filter was used to remove noise in the intensity corrected spectra, while a 5th order modified polynomial constrained to the lower bound of the smoothed spectra was subtracted to resolve the tissue Raman spectrum alone. The Raman spectra were finally normalized to the integrated areas under the curves from 800 to , enabling a better comparison of the spectral shapes and relative Raman band intensities among different tissue pathologies. The spectra were then locally mean-centered according to the specific database to remove common variations in the data. Following preprocessing, the Raman spectra were fed to a model-specific outlier analysis.
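The preprocessing chain can be sketched in Python with NumPy as follows. This is a simplified illustration under stated assumptions, not the code used in this work: spikes are assumed to be one pixel wide and are replaced by linear rather than second-order interpolation, the 99% band is taken as 2.576 standard deviations of the first derivative, the first-order five-point Savitzky-Golay filter is written as the equivalent five-point moving average, and the modified polynomial is implemented as a generic iterative fit. All function names are hypothetical, and the final mean-centering against the organ-specific database is omitted.

```python
import numpy as np
from numpy.polynomial import Polynomial


def remove_cosmic_rays(x, y, z=2.576):
    """Replace single-pixel spikes flagged by the first derivative."""
    d = np.diff(y)
    outlier = np.abs(d - d.mean()) > z * d.std()   # outside the 99% band
    bad = np.zeros(y.size, dtype=bool)
    bad[1:-1] = outlier[:-1] & outlier[1:]         # both neighbouring differences flagged
    clean = y.copy()
    clean[bad] = np.interp(x[bad], x[~bad], y[~bad])   # linear fill (assumption)
    return clean


def smooth5(y):
    """Five-point moving average; equals first-order SG smoothing away from the edges."""
    padded = np.pad(y, 2, mode="edge")
    return np.convolve(padded, np.ones(5) / 5.0, mode="valid")


def modpoly_baseline(x, y, order=5, n_iter=100):
    """Iterative polynomial fit that is pushed toward the lower bound of y."""
    work = y.astype(float).copy()
    for _ in range(n_iter):
        fit = Polynomial.fit(x, work, order)(x)
        work = np.minimum(work, fit)               # clip peaks above the current fit
    return fit


def preprocess(x, counts, exposure_s, laser_mw):
    y = remove_cosmic_rays(x, np.asarray(counts, dtype=float))
    y = y / (exposure_s * laser_mw)                # scale by integration time and power
    y = smooth5(y)
    y = y - modpoly_baseline(x, y)                 # remove the autofluorescence background
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal area
    return y / area
```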
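We incorporate an on-line outlier detection scheme into biomedical spectroscopy as a high-level model-specific feedback tool in our on-line framework by using principal component analysis (PCA) coupled with Hotelling’s T² and Q-residual statistics36–38 (Fig. 2). Briefly, PCA reduces the dimension of the Raman spectra by decomposing them into linear combinations of orthogonal components [principal components (PCs)], such that the spectral variations in the dataset are maximized. The PCA model of the data matrix $X$ is defined by Ref. 38: $X = TP^{\mathsf{T}} + E$, where $T$ and $P$ are the scores and loadings of the retained PCs and $E$ contains the residuals. Hotelling’s T² measures the variation of a spectrum within the PCA model and, in its standard form, is defined by Ref. 36: $T_i^2 = \sum_{k=1}^{A} t_{ik}^2/\lambda_k$, where $t_{ik}$ is the score of spectrum $i$ on PC $k$, $\lambda_k$ is the variance of the scores on PC $k$, and $A$ is the number of retained PCs. The Q-residual measures the variation not captured by the model (lack of fit) and is defined by Ref. 36: $Q_i = \sum_j e_{ij}^2$, where $e_{ij}$ are the elements of row $i$ of $E$. Spectra whose T² or Q values exceed the confidence limits of the model are treated as outliers (Fig. 2). Accordingly, Hotelling’s T² and Q-residuals are two independent parameters providing quantitative information about the model fit. Using these parameters as indicators of spectral quality (e.g., probe contact mode, confounding factors, and white light interference), auditory feedback has been integrated into the on-line Raman diagnostic system, facilitating real-time probe handling advice and spectroscopic screening for clinicians during clinical endoscopic procedures.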
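A minimal NumPy sketch of PCA-based spectral verification with Hotelling's T² and Q-residuals is given below. It uses the standard definitions of both statistics. The acceptance limits are taken as empirical 99th percentiles of the training values, which is an assumption made to keep the example dependency-free; the source does not state how its 99% CI limits were computed. The names `fit_pca_monitor`, `t2_and_q` and `is_valid` are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X, n_pc=2, pct=99.0):
    """Fit a PCA model and empirical T^2 / Q limits from training spectra."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                              # loadings, shape (n_vars, n_pc)
    lam = s[:n_pc] ** 2 / (X.shape[0] - 1)       # variance of the scores on each PC
    model = {"mean": mean, "P": P, "lam": lam}
    t2, q = t2_and_q(X, model)
    model["t2_lim"] = np.percentile(t2, pct)     # assumed: empirical percentile limit
    model["q_lim"] = np.percentile(q, pct)
    return model


def t2_and_q(X, model):
    """Return Hotelling's T^2 and Q-residual for each row of X."""
    Xc = np.atleast_2d(X) - model["mean"]
    T = Xc @ model["P"]                          # scores
    E = Xc - T @ model["P"].T                    # part not explained by the model
    t2 = np.sum(T ** 2 / model["lam"], axis=1)
    q = np.sum(E ** 2, axis=1)
    return t2, q


def is_valid(X, model):
    """True where a spectrum lies inside both limits."""
    t2, q = t2_and_q(X, model)
    return (t2 <= model["t2_lim"]) & (q <= model["q_lim"])
```

In an on-line setting, a spectrum for which `is_valid` returns `False` would be discarded and could trigger the feedback to the clinician described above.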
Online Probabilistic Diagnostics
Subsequent to verification of tissue Raman spectra quality, those qualified Raman spectra were immediately fed to probabilistic models for on-line in vivo diagnostics and pathology prediction. The GUI can instantly switch among different models including partial least squares-discriminant analysis (PLS-DA),6,20 PCA-linear discriminant analysis (LDA),26,28 ant colony optimization (ACO)-LDA,25 classification and regression trees (CART),39 support vector machine (SVM),17 adaptive boosting (AdaBoost),40 etc. for prospective classification at clinical endoscopic procedures. In this study, as an example, probabilistic PLS-DA was employed for gastric cancer diagnosis. PLS-DA employs the fundamental principle of PCA but further rotates the components by maximizing the covariance between the spectral variation and group affinity to obtain the diagnostically relevant variations rather than the most prominent variations in the spectral dataset.6 The system supports binary classification, one-against-all and one-against-one multiclass (i.e., benign, dysplasia, and cancer) probabilistic PLS-DA discriminatory analysis to predict the specific tissue pathologies.
Figure 3 shows the GUI developed for on-line biomedical spectroscopic processing and diagnostics. Specifically, we test the on-line framework in the stomach, one of the most challenging organs for spectroscopic diagnosis because of its many confounding factors (e.g., gastric juice, food debris, bleeding, and exudates). The in vivo mean Raman spectra acquired from 305 gastric patients [normal (n = 2465) and cancer (n = 283)] for algorithm development are shown in Fig. 4. The Raman spectra of gastric tissue show prominent Raman peaks at 875 cm⁻¹ (hydroxyproline), 936 cm⁻¹ (proteins), 1004 cm⁻¹ (ring breathing of phenylalanine), 1078 cm⁻¹ (lipids), 1265 cm⁻¹ (amide III of proteins), 1302 and 1335 cm⁻¹ (deformation modes of proteins and lipids), 1445 cm⁻¹ (proteins and lipids), 1618 cm⁻¹ (porphyrins), and 1652 cm⁻¹ (amide I of proteins), together with a further band assigned to lipids.5,28 Clearly, gastric tissue Raman spectra contain a large contribution from triglyceride (major peaks including 1078, 1302, 1445, and 1652 cm⁻¹) that likely reflects the interrogation of subcutaneous fat in the gastric wall.6,29 Additionally, we observed remarkable spectral differences at Raman peak positions (e.g., 875, 936, 1004, 1078, 1265, 1302, 1335, 1445, 1618, and 1652 cm⁻¹) between different tissue pathologies, reconfirming our preceding in vivo Raman studies.25–27 A detailed analysis and discussion of the Raman spectral characteristics of gastric carcinogenesis can be found elsewhere, covering major pathological features such as upregulation of mitotic and proteomic activity, increased DNA content, relative reduction in lipid, and the onset of angiogenesis leading to neovascularization in the tissue.26,34
The automatic outlier detection was realized for predictive on-line analysis using PCA with Hotelling’s T² and Q-residual statistics (99% CI). To keep the on-line diagnostics efficient, a two-component PCA model was built that captured the largest tissue spectral variations. These selected significant PCs () accounted for 38.71% (PC1: 30.33%, PC2: 8.38%) of the total variability in the dataset (n = 2748 Raman spectra), and the corresponding PC loadings are shown in Fig. 5.
Figure 6 shows the score scatter plots (i.e., PC1 versus PC2) for the normal (n = 2465) and cancer (n = 283) tissue spectra, illustrating the capability of the PC scores to separate cancer spectra from normal. The 99% CI limits of Hotelling’s T² and Q-residuals were calculated from the training dataset and fixed as thresholds for prospective on-line spectral validation. We then built probabilistic PLS-DA models for prediction of gastric cancer. The training database was randomly resampled multiple times () into learning (80%) and test (20%) sets. The generated PLS-DA models provided a predictive accuracy of 85.6% (95% CI: 82.9% to 88.2%) [sensitivity of 80.5% (95% CI: 71.4% to 89.6%) and specificity of 86.2% (95% CI: 83.6% to 88.7%)] for gastric cancer diagnosis. We subsequently tested the outlier detection and the probabilistic PLS-DA in 10 prospective gastric patients. PC score scatter plots (i.e., PC1 versus PC2) for the prospective normal () and cancer () tissue spectra are also shown in Fig. 6.
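The resampling procedure can be sketched as follows, using a compact PLS1 (NIPALS) regression on a 0/1 class label as the PLS-DA classifier. Several choices here are assumptions made for illustration: the number of latent components (2), the number of random splits (20), and the mapping of the regression output to a pseudo-probability by clipping to [0, 1] with a 0.5 decision threshold. The source does not specify how its probabilistic output is computed, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp=2):
    """PLS1 (NIPALS) regression of class labels y (0/1) on spectra X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, c = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # direction of maximal covariance with y
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        ck = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)        # deflate X and y
        yr = yr - ck * t
        W.append(w)
        P.append(p)
        c.append(ck)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    coef = W @ np.linalg.solve(P.T @ W, c)
    return x_mean, y_mean, coef


def pls_da_score(X, model):
    """Clipped regression output used as a pseudo-probability (assumption)."""
    x_mean, y_mean, coef = model
    return np.clip((X - x_mean) @ coef + y_mean, 0.0, 1.0)


def resample_performance(X, y, n_splits=20, test_frac=0.2, seed=0):
    """Mean accuracy, sensitivity and specificity over random 80/20 splits.

    Assumes every test split contains both classes.
    """
    rng = np.random.default_rng(seed)
    acc, sens, spec = [], [], []
    n_test = int(round(test_frac * len(y)))
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        test, learn = idx[:n_test], idx[n_test:]
        pred = pls_da_score(X[test], pls1_fit(X[learn], y[learn])) >= 0.5
        truth = y[test] == 1
        acc.append(np.mean(pred == truth))
        sens.append(np.mean(pred[truth]))
        spec.append(np.mean(~pred[~truth]))
    return np.mean(acc), np.mean(sens), np.mean(spec)
```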
Figure 7 shows the prospective scatter plot of Hotelling’s T² (38.71%) and Q-residuals (61.29%) with the 99% CI boundaries for the 105 spectra acquired from the prospective gastric patients. A large number of noncontact spectra lie outside the 99% CI and are therefore discarded in real time. The verified tissue Raman spectra largely fall inside the 99% CI of T² and Q-residuals, demonstrating that this on-line data analysis provides a rapid and highly efficient means of real-time validation of biomedical tissue spectra. The prospectively acquired spectra verified by the on-line outlier analysis are then fed to probabilistic PLS-DA for instant disease prediction, achieving a diagnostic accuracy of 80.0% (60/75) [sensitivity of 90.0% (27/30) and specificity of 73.3% (33/45)] for gastric cancer detection (Fig. 8), as confirmed by histopathological examination.
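The reported prospective accuracy, sensitivity and specificity can be cross-checked against the 75 spectra that passed the on-line outlier screening in this study. The short Python sketch below enumerates every split of those spectra into cancer and normal groups and keeps the counts whose rates round to the reported percentages. The helper name `consistent_counts` is hypothetical, and the check assumes the percentages were rounded to one decimal place.

```python
def consistent_counts(n_total, accuracy, sensitivity, specificity):
    """List (n_cancer, true_pos, n_normal, true_neg) matching the rounded rates."""
    matches = []
    for n_cancer in range(1, n_total):
        n_normal = n_total - n_cancer
        for tp in range(n_cancer + 1):
            if round(100 * tp / n_cancer, 1) != sensitivity:
                continue
            for tn in range(n_normal + 1):
                if (round(100 * tn / n_normal, 1) == specificity
                        and round(100 * (tp + tn) / n_total, 1) == accuracy):
                    matches.append((n_cancer, tp, n_normal, tn))
    return matches


print(consistent_counts(75, 80.0, 90.0, 73.3))   # [(30, 27, 45, 33)]
```

Only one combination satisfies all three figures: 27 of 30 cancer spectra and 33 of 45 normal spectra classified correctly, that is, 60 of 75 overall.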
The receiver operating characteristic (ROC) curves were further generated to evaluate the group separations. Figure 9 shows the mean of the ROC curves computed from each random splitting of the spectral database for retrospective prediction, as well as the ROC curve calculated for the prospective dataset prediction. The areas under the ROC curves for the retrospective and prospective datasets are 0.90 and 0.92, respectively, illustrating the robustness of the PLS-DA algorithm for in vivo gastric cancer diagnosis.
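For reference, the area under an ROC curve can be computed directly from the classifier outputs without drawing the curve, because it equals the probability that a randomly chosen cancer spectrum scores higher than a randomly chosen normal spectrum. The following pure-Python sketch uses this pairwise formulation. The function name `roc_auc` and the label convention (1 for cancer, 0 for normal) are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```

The pairwise form is quadratic in the number of spectra, which is acceptable for a sketch but would normally be replaced by a rank-based computation for large databases.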
The total processing time for all the procedures implemented (Fig. 2) was 0.13 s. The processing times for each step of the flow chart (Fig. 2) are given in Table 1. Free-running optical diagnosis and processing within 0.5 s can be achieved, which is critical for realizing real-time in vivo tissue diagnostics at endoscopy.
Table 1. Average processing time for the on-line biomedical Raman spectroscopic framework on a personal computer (64-bit i7 quad-core processor, 4 GB memory).
|Analyses|Computational time (ms)|
|Cosmic ray rejection|0.5|
|Laser response time|10|
|Probabilistic PLS-DA prediction|70|
|Total computation time|100 to 130|
Overall, this work demonstrates for the first time that the on-line biomedical diagnostic framework can move the biomedical Raman spectroscopy into real-time, on-line cancer detection and diagnosis during routine clinical endoscopic examination.
Histopathological examination of endoscopic biopsies for observing cytological and morphological abnormalities remains the current gold standard for precancer and cancer diagnosis in tissue. Several spectroscopic techniques including reflectance, fluorescence and Raman spectroscopy have demonstrated promising potential for realizing optical biopsies. Extensive research is currently in progress especially for translating Raman spectroscopic technique into real-time, clinical precancer and cancer diagnostics. However, specifically for prospective clinical endoscopic applications, fast data acquisition and short processing times (), straightforward on-line probabilistic diagnostics, real-time feedback to clinicians of spectrum quality and tissue classification are essential to making this optical biopsy technology a robust diagnostic tool in clinical settings. In this work, we have developed a novel on-line spectral diagnostic framework that implements fully synchronized spectral acquisition, automatic integration time setting, model-specific outlier analysis as well as organ-specific probabilistic diagnostics, pushing spectroscopic techniques into real-time cancer screening with an easy-to-use interface for routine clinical endoscopic applications.
We have presented a universal on-line framework for biomedical spectroscopy (Fig. 3) that integrates preprocessing and automatic outlier detection and allows instant switching of diagnostic models among different organs in the upper GI tract during gastroscopy. The universal platform supports different spectroscopic techniques (e.g., reflectance, fluorescence and Raman spectroscopy) with appropriate preprocessing methods and model development for each modality. We integrate the on-line diagnostic framework with the recently developed multimodal image-guided (WLR/NBI/AFI) Raman spectroscopic platform for early diagnosis and detection of precancer and cancer in the upper GI tract at endoscopy.5,28,29 The accumulation of tissue Raman spectra and automatic scaling of integration time with a predefined upper limit of 0.5 s allow instant acquisition of in vivo tissue spectra with improved SNR while preventing CCD signal saturation. This is especially important for endoscopic diagnostics, where the autofluorescence intensity varies significantly among different anatomical regions (e.g., antrum and body in the stomach, bronchi in the lung), likely because of distinct endogenous fluorophores in the tissue.
We have also introduced automated outlier detection for spectral verification in endoscopic applications, where probe handling variations can be significant and depend to a large extent on the clinician’s expertise. PCA with the related Hotelling’s T² and Q-residuals is a high-level metric for outlier detection,41 and we found it highly efficient and reliable in our on-line diagnostic platform at endoscopy. The PCA model was generated from the database and accounts for the largest common tissue variations (38.71% of total variance, PC1: 30.33%; PC2: 8.38%) (Fig. 5). The PCA model was further tested on 10 gastric patients for prospective automatic outlier detection during in vivo on-line Raman spectroscopic screening (Fig. 6). Of the total 105 gastric spectra acquired from the prospective gastric patients, 75 spectra that fit the developed PCA model well [T² and Q lie within the 99% CI (Fig. 7; blue and red circles)] were eligible for further classification. The remaining spectra (n = 30, Fig. 7; green triangles), from noncontact measurements or from sites contaminated by food debris or blood, show unusual disturbances that were poorly reconstructed by the PCA model. Large Q-residuals indicated the presence of spectral variations unexplainable by the model, owing to the many confounding factors (e.g., noncontact mode, contaminants, or unknown spectral variations) that frequently occur in in vivo clinical measurements.36 We note that tissue Raman spectra with Q-residuals marginally outside the CI [() in Fig. 7] were accepted following off-line analysis, because the newly introduced spectral variations could be due to tumor invasion, necrosis, tumor subtype, etc. Thus, following on-line analysis, it is advisable to review the spectra in off-line mode to assess possible new spectral variations. This study demonstrates the necessity of automated outlier diagnostics for tissue Raman spectral quality verification in real-time endoscopic applications.
Subsequent to on-line verification of spectrum quality, the spectra were fed to probabilistic PLS-DA algorithms for disease prediction. The PLS-DA modeling provided a predictive accuracy of 80.0% (60/75) [sensitivity of 90.0% (27/30) and specificity of 73.3% (33/45)] (Fig. 8) for cancer diagnosis in 10 prospective gastric patients, suggesting that Raman endoscopy integrated with the on-line diagnostic framework can serve as a diagnostic screening tool for real-time in vivo gastric cancer identification. These probabilistic diagnostic outcomes were presented to the endoscopist in real time within 0.5 s (Table 1), which is a clinically acceptable time at endoscopy. With the immediate auditory diagnostic feedback from the GUI, endoscopists can now perform routine point-wise scanning for targeted biopsies of high-risk tissue sites. The movie (Fig. 10) shows the on-line Raman endoscopic diagnostic procedure for real-time in vivo detection of gastric cancer during clinical endoscopic examination. Biomedical spectroscopic modalities can thus now function in free-running mode, opening the possibility of true prospective spectroscopic screening in clinical settings.
In summary, we have developed an on-line biomedical Raman spectroscopic framework that translates the cumbersome processing and multivariate analysis into an easy-to-use GUI for real-time, in vivo diagnosis of malignancies in internal organs. The efficacy of the on-line diagnostic framework integrated with multimodal image-guided clinical Raman spectroscopy was demonstrated in prospective gastric patients, illustrating the promising potential of moving biomedical Raman spectroscopy into real-time, on-line cancer detection and diagnosis during routine clinical endoscopic examination.
This research was supported by the National Medical Research Council and the Biomedical Research Council, Singapore.