Evaluating the identification of the extent of gastric cancer by over-1000 nm near-infrared hyperspectral imaging using surgical specimens

Abstract.
Significance: Determining the extent of gastric cancer (GC) is necessary for evaluating the gastrectomy margin. However, determining the extent of GC that is not exposed on the mucosal surface remains difficult. Near-infrared (NIR) light can penetrate mucosal tissues efficiently.
Aim: We investigated the ability of near-infrared hyperspectral imaging (NIR-HSI) to identify GC areas, including exposed and unexposed areas, using surgical specimens, and explored the characteristics of the identifiable GC.
Approach: Our study examined 10 patients with diagnosed GC who underwent surgery between 2020 and 2021. Specimen images were captured using NIR-HSI. For the specimens, the exposed area was defined as an area wherein the cancer was exposed on the surface, the unexposed area as an area wherein the cancer was present although the surface was covered by normal tissue, and the normal area as an area wherein the cancer was absent. We estimated the GC (including the exposed and unexposed areas) and normal areas using a support vector machine, which is a machine-learning method for classification. The prediction accuracy was evaluated for the GC region in each area and for the normal region. Additionally, the tumor thicknesses of the GC were pathologically measured and compared between the areas that were identifiable and unidentifiable using NIR-HSI.
Results: The average prediction accuracy was 77.2% for the GC regions (exposed and unexposed areas combined), 79.7% and 68.5% for the exposed and unexposed areas, respectively, and 79.7% for the normal regions. Additionally, the areas identified as cancerous had a tumor thickness of >2 mm.
Conclusions: NIR-HSI identified the GC regions at high rates. Both exposed and unexposed areas with tumor thicknesses of >2 mm were identified using NIR-HSI.


Introduction
Gastric cancer (GC) is a commonly diagnosed cancer worldwide and one of the leading causes of cancer-related mortality. 1 Gastrectomy is a curative treatment for GC. 2 The effect of gastric remnant volume on the postoperative quality of life and nutritional status has been recognized, and organ-sparing surgery is increasingly preferred. 3-5 The Japanese GC treatment guidelines (5th edition) recommend a proximal resection margin of at least 3 cm for tumors ≥T2 with an expansive growth pattern, whereas a proximal resection margin of at least 5 cm is required for tumors with an infiltrative growth pattern. 6 The larger margin for infiltrative growth patterns is attributed to extensive invasion of the submucosal or deeper layers. Hence, if the extent of the non-exposed cancer invasion could be identified, the stomach could be resected more appropriately. However, assessing the extent of the tumor and determining the appropriate resection margins during surgery remain difficult. Several methods, including intraoperative endoscopy and endoscopic tattooing or clipping, are used to identify the tumor extent and determine the appropriate surgical margins. 7-10 Nevertheless, the endoscopic determination of the extent of GC that is not exposed on the mucosal surface remains difficult (e.g., in the infiltrative growth pattern 10).
We previously reported that near-infrared (NIR) hyperspectral imaging (HSI) could be used to identify gastrointestinal stromal tumors located in the muscular layer and not exposed on the mucosa. 11,12 Light in the NIR region, ranging from 800 to 2500 nm, is useful for probing deep parts of tissues owing to its low absorption and scattering. The absorption spectra in the NIR region convey fingerprint data owing to the overtone or combined vibrations of chemical bonds. 13-15 This permits the investigation of the distribution of the chemical composition within a sample. 16 Hyperspectral imaging provides a three-dimensional dataset (two spatial dimensions and one spectral dimension) from which a spectral curve can be obtained at each pixel of the acquired images. 17 Machine-learning algorithms are used to analyze the spectral information of each pixel and to extract critical imaging data from several hyperspectral images. 18,19 Hyperspectral imaging has the potential for non-invasive, label-free diagnosis and surgical guidance. Hence, NIR-HSI may be able to identify the extent of GC that is not exposed on the mucosal surface.
Studies have reported the identification of GC using NIR-HSI, although the characteristics of the identifiable GC have not been considered. 19-21 In this study, we aimed to explore whether NIR-HSI can identify the GC regions, including both exposed and unexposed areas, using surgical specimens and to investigate the characteristics of the identifiable GC region.

Patients and Surgical Specimen
This study included 10 patients with clinically diagnosed GC who underwent gastrectomy between September 2020 and October 2021. The inclusion criteria were as follows: (i) clinical diagnosis of GC, (ii) age ≥ 20 years, and (iii) written informed consent. The exclusion criteria were as follows: (i) history of prior chemotherapy and (ii) presence of the hepatitis B virus surface antigen or hepatitis C virus antibody. Table 1 shows the characteristics of the patients and lesions. The median tumor size was 38 mm (range: 20 to 70 mm). The clinical T stage was T1 in 1 lesion, T2 in 3 lesions, and T3-4 in 6 lesions. The macroscopic type was 0-IIc in 1 lesion, type 2 in 4 lesions, and type 3 in 5 lesions.
The indications for the type of gastrectomy and the proximal resection margin were determined by the surgeon, primarily based on the Japanese GC treatment guidelines. 6 All of the surgical specimens were fixed in formalin, cut into 5- or 10-mm-thick slices, stained with hematoxylin and eosin, and evaluated by experienced pathologists. Clinical and pathological outcomes included the Borrmann classification of the tumor, tumor size, depth of invasion, tumor thickness, and histology based on differentiation. The extent of the GC was evaluated separately for exposed and unexposed areas. The exposed area was defined as an area where the cancer itself was pathologically exposed on the surface and not covered by any normal tissue. The unexposed (non-exposed) area was defined as an area where the cancer was present but not exposed on the surface, which was covered by normal tissue. The normal areas were defined as areas where the cancer was not present, as shown in Fig. 1.
This study was approved by the Institutional Review Board of the National Cancer Center, Japan (Approval No. 2015-339), and conformed to the provisions of the Declaration of Helsinki and the Epidemiological Study Guidelines issued by the Japan Ministry of Health, Labour and Welfare. All patients provided written informed consent prior to their inclusion in the study.

Near-Infrared Hyperspectral Image Capture and Data Preprocessing
An imaging system with a high-speed NIR hyperspectral camera (CompoVision, CV-N800HS; Sumitomo Electric Industries, Ltd., Osaka, Japan) was used to obtain the NIR-HSI images (wavelength: 1000 to 2350 nm, wavelength resolution: 6.3 nm, and depth resolution: 14 bit). The detector (NIR spectroscopic camera) captured the data values for each wavelength band at each pixel of one image line per scan. Three-dimensional HSI images (x-y-λ axes) were obtained by scanning multiple lines (by sliding the sample stage), thereby producing a virtual "data cube" for processing and analysis. 17 Each fresh specimen resected from the stomach was placed on the sliding stage, without trimming, prior to formalin fixation. Each NIR-HSI image was acquired once from the mucosal side under illumination from a halogen lamp (0.96 W/cm²). To analyze the HSI data, the images must be calibrated using dark noise and a white standard for each pixel (i, j). Each reflectance is expressed as follows:
$$R(i,j) = \frac{I_r(i,j) - I_d(i,j)}{I_w(i,j) - I_d(i,j)}, \tag{1}$$
where R(i, j) is a row vector of the reflectance spectrum of the obtained image and I_r(i, j), I_w(i, j), and I_d(i, j) are the row vectors of the raw, white standard, and dark noise data, respectively. Spectra with wavelengths >1400 nm were removed from the analysis because of the high absorption by water in those bands and the lower sensitivity of the NIR camera, unlike in transmission spectroscopy. 20 Additionally, in the 1300-nm spectra, reflectance rates of more than 70% were defined as highlights and those below 10% as shadows. Further, the pixels of necrosis were removed from the dataset and analysis. Normal, exposed, and unexposed GC areas were marked by a pathologist to create three regions delimited by the pixels of the boundary lines. Because the boundary lines were drawn freehand by the pathologist and were therefore susceptible to error, a 5-pixel margin around them was excluded when extracting the spectra from each region. NIR spectral measurements are affected by variance arising from non-specific scattering at the surface of the sample. 22 The standard normal variate (SNV) of the absorbance was used for the baseline correction of the spectrum to reduce this variance and is expressed as follows:
$$A(i,j) = -\log_{10} R(i,j), \tag{2}$$
$$Z(x) = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}, \tag{3}$$
where A(i, j) is a row vector of the absorbance spectrum, x is a row vector containing the original spectrum, mean(x) is the mean of x, std(x) is the standard deviation of x, and Z(x) is the SNV-transformed spectrum.
Table 1 note: SM, submucosa; MP, muscularis propria; SS, subserosa; SE, serosa exposed; tub1, well differentiated adenocarcinoma; tub2, moderately differentiated adenocarcinoma; por, poorly differentiated adenocarcinoma; sig, signet-ring cell carcinoma.
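The calibration and SNV steps described in this section can be sketched for a single pixel as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the usual dark/white calibration R = (I_r − I_d)/(I_w − I_d) and the apparent absorbance A = −log10 R, the function names are hypothetical, the four-band values are invented, and the choice of the sample standard deviation (n − 1) is an assumption because the text does not say which estimator was used.

```python
import math
import statistics


def reflectance(raw, white, dark):
    """Per-band reflectance R = (I_r - I_d) / (I_w - I_d) for one pixel."""
    return [(r - d) / (w - d) for r, w, d in zip(raw, white, dark)]


def absorbance(refl):
    """Apparent absorbance A = -log10(R); assumes R > 0 in every band."""
    return [-math.log10(r) for r in refl]


def snv(spectrum):
    """Standard normal variate: subtract the mean, divide by the std."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample std (n - 1): an assumption
    return [(x - mu) / sd for x in spectrum]


# Hypothetical 4-band pixel; the numbers are invented for illustration.
raw = [520.0, 610.0, 700.0, 430.0]
white = [1000.0, 1000.0, 1000.0, 1000.0]
dark = [20.0, 20.0, 20.0, 20.0]

refl = reflectance(raw, white, dark)
z = snv(absorbance(refl))
```

Because SNV removes both offset and scale, the base of the logarithm in the absorbance step does not change the SNV-transformed spectrum.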

Spectral Data Analysis
A support vector machine (SVM) was used to perform a two-class classification of the normal tissues and entire GC lesions, excluding the 5-pixel margin of the boundary line and the necrosis area [inside of the pink line in Fig. 3(B)]. To avoid overfitting, the training pixels were randomly extracted from each specimen, and their number was balanced at 200 px of normal tissue and 200 px of GC from the pathologically confirmed area. Further, leave-one-out cross-validation was employed, wherein each specimen is classified using a training dataset that excludes its own pixels (200 px each of normal and GC from the other nine specimens; 3600 spectra in total), because this procedure provides an approximately unbiased estimate of the generalization ability. 23 The algorithm solves the following optimization problem: 24
$$\text{minimize} \quad t(\{w_n\}, \xi) = \frac{1}{2}\sum_{n=1}^{k} \lVert w_n \rVert^2 + \frac{C}{m}\sum_{i=1}^{m} \xi_i, \tag{4}$$
$$\text{subject to} \quad \langle x_i, w_{y_i} \rangle - \langle x_i, w_n \rangle \geq b_i^n - \xi_i \quad (i = 1, \ldots, m), \qquad b_i^n = 1 - \delta_{y_i, n}, \tag{5}$$
where w_n is a weight vector, C is the cost, and ξ_i is the slack variable. Further, the decision function is expressed as
$$\arg\max_{n = 1, \ldots, k} \langle x_i, w_n \rangle. \tag{6}$$
The optimization problem was solved using the decomposition method. 25 This study used C = 1 and the RBF kernel [Eq. (7); not recoverable from the source text], with the optimal value of the hyperparameter σ² estimated by Eq. (8) [not recoverable from the source text].
The SNV-transformed spectra of the normal tissues, except for those of the test sample, were averaged. Further, the mean SNV-transformed spectra of the exposed and unexposed regions in the tumor of the test sample were calculated. The difference in the spectra was then obtained by comparing the spectra of the normal and tumor tissues.
Fig. 1 (a) Endoscopic view of gastric cancer. White line is the predicted unexposed area. Yellow line is the predicted exposed area. (b) Histopathology image schema. Cancerous and normal areas were distinguished by the pathologist. Cancerous areas were divided into exposed and unexposed areas, according to their definitions.
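The leave-one-specimen-out sampling described above can be sketched as follows. The sketch only illustrates how a balanced training set of 9 × (200 + 200) = 3600 spectra arises for each held-out specimen; it does not train an SVM. The function name, the data layout, the specimen labels "a" to "j", and the 500 pixels per class are all hypothetical.

```python
import random


def sample_training_pixels(pixels_by_specimen, held_out, n_per_class=200, seed=0):
    """Collect balanced training pixels from every specimen except `held_out`.

    pixels_by_specimen maps a specimen id to {"normal": [...], "gc": [...]},
    where each list holds pathology-confirmed pixel identifiers (or spectra).
    """
    rng = random.Random(seed)
    train = []
    for specimen in sorted(pixels_by_specimen):
        if specimen == held_out:
            continue  # the test specimen never contributes training pixels
        for label in ("normal", "gc"):
            chosen = rng.sample(pixels_by_specimen[specimen][label], n_per_class)
            train.extend((specimen, label, px) for px in chosen)
    return train


# Hypothetical data: 10 specimens, each with 500 labelled pixels per class.
specimens = {
    s: {"normal": list(range(500)), "gc": list(range(500, 1000))}
    for s in "abcdefghij"
}

# One training set per held-out specimen (10 folds in total).
folds = {s: sample_training_pixels(specimens, held_out=s) for s in specimens}
```

Holding out a whole specimen, rather than individual pixels, keeps spectra from the test specimen out of the training set, which is what makes the estimate of generalization across patients meaningful.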

Calculation of Prediction Accuracy
The coordinates of the classified pixels were compared with the line drawings made by a pathologist to evaluate the classification accuracy. The analysis was limited to the pathologically confirmed area (green area in Fig. 4), which comprises the cancer and the surrounding normal tissue. The pixels were classified into four groups: GC predicted as GC [true positive (TP)], GC predicted as normal [false negative (FN)], normal predicted as GC [false positive (FP)], and normal predicted as normal [true negative (TN)]. The false-positive rate and false-negative rate were calculated from the classified pixels. The specificity, sensitivity, and accuracy were expressed as follows:
$$\text{Specificity} = \frac{TN}{TN + FP},$$
$$\text{Sensitivity} = \frac{TP}{TP + FN},$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}.$$
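A minimal sketch of these metrics, using the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/total), is given here. The function names and the two sets of pixel counts are hypothetical. The second function assumes that an "average" accuracy means the mean of per-specimen values, which is an interpretation rather than something the text states explicitly; such a mean generally differs from the value computed on pooled pixel counts.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def mean_metric(per_specimen_counts, name):
    """Mean of one metric over specimens (assumed meaning of 'average')."""
    values = [classification_metrics(**c)[name] for c in per_specimen_counts]
    return sum(values) / len(values)


# Hypothetical pixel counts for two specimens (invented for illustration).
counts = [
    {"tp": 90, "fn": 10, "fp": 20, "tn": 80},
    {"tp": 10, "fn": 10, "fp": 5, "tn": 45},
]

mean_sensitivity = mean_metric(counts, "sensitivity")

# Pooling the pixels of all specimens first gives a different value.
pooled = classification_metrics(
    tp=sum(c["tp"] for c in counts),
    fn=sum(c["fn"] for c in counts),
    fp=sum(c["fp"] for c in counts),
    tn=sum(c["tn"] for c in counts),
)
```

In this toy case the per-specimen mean sensitivity is lower than the pooled one because the smaller specimen, which is classified less well, carries the same weight as the larger one.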

Comparison of Histopathologic Tumor Thickness
Histopathological tumor thicknesses were compared to determine the factors that contributed to the identified areas (TP pixels) and unidentified areas (FN pixels) of the GC using NIR-HSI. The criteria for the identified and unidentified areas of each specimen were as follows. If the sensitivity was >80%, the pathological sections in the identified areas could be accurately recognized, but the unidentified areas were difficult to recognize. Therefore, tumor thickness was measured only in the identified areas. However, when the sensitivity was <80%, it was possible to recognize pathological sections in the identified and unidentified areas; therefore, tumor thickness was measured in the identified and unidentified areas.

Endoscopic Images
The endoscopic images of the lesions and pictures of the 10 excised specimens are shown in Figs. 2(A) and 2(B), respectively. Four specimens [d, e, g, and i in Fig. 2(B)] had cancer exposed on the mucosal surface in all regions. Six specimens [a-c, f, h, and i in Fig. 2(B)] had unexposed cancer in some regions. Figure 3(A) shows the images of the 10 specimens captured using NIR-HSI. The GC and normal regions in the NIR-HSI images were marked by a pathologist [Fig. 3(B)]. The white lines represent the extent of the GC, and the yellow lines represent the border between the exposed and unexposed areas of the GC. To visualize the spectral regions contributing to the cancer classification, the difference between the SNV-transformed spectra of the normal tissues (leave-one-out training data) and the mean SNV-transformed spectra of the exposed and unexposed regions in the tumor of the test sample is shown in Fig. 3(C).
The results showed that the absorbance difference in the wavelength range of 1050 to 1100 nm and 1380 to 1400 nm was negative and in the wavelength range of 1250 to 1350 nm was positive, except for the unexposed and exposed areas of the specimens (f and g, respectively).
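The difference spectrum discussed here can be sketched as a band-wise subtraction of mean spectra. This is a hedged illustration only: the function names are hypothetical, the three-band SNV values are invented, and the sign convention (tumor mean minus normal mean) is an assumption, since the text does not state which spectrum is subtracted from which.

```python
def mean_spectrum(spectra):
    """Band-wise mean of equally long spectra."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]


def difference_spectrum(tumor_spectra, normal_spectra):
    """Band-wise (tumor mean - normal mean); sign convention is assumed."""
    tumor_mean = mean_spectrum(tumor_spectra)
    normal_mean = mean_spectrum(normal_spectra)
    return [t - n for t, n in zip(tumor_mean, normal_mean)]


# Hypothetical SNV-transformed spectra with three bands each.
normal = [[0.0, 1.0, -1.0], [0.2, 0.8, -1.0]]   # training specimens
exposed = [[-0.3, 1.4, -1.1], [-0.1, 1.2, -1.1]]  # held-out test specimen

diff = difference_spectrum(exposed, normal)
```

Under this convention, a negative entry means the tumor region absorbs less than normal tissue in that band after SNV, and a positive entry means it absorbs more.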

Identification Results of Gastric Cancer and Normal Regions using NIR-HSI
The spectra from the HSI images were analyzed using the SVM algorithm based on the training data, and the GC and normal regions were identified. Pixels predicted as GC were colored red, and pixels predicted as normal tissue were colored green. The color-coded predictions are shown in the upper images in Figs. 4(a)-4(j). The lower images in Figs. 4(a)-4(j) are merged with the boundary line drawn by the pathologist and also show the pixel areas used for calculating the prediction accuracy. The pixels (459,688 px) were classified as follows: TP: 160,693 px, FN: 35,517 px, FP: 55,567 px, and TN: 207,911 px. The specificity, sensitivity, and accuracy of 74.8%, 77.2%, and 79.7%, respectively, were evaluated from the classified pixels. Table 2 presents the results. In the exposed area, the pixels were classified as TP: 138,545 px and FN: 27,971 px, giving a sensitivity of 79.7%. In the unexposed area, the pixels were classified as TP: 22,148 px and FN: 7545 px, giving a sensitivity of 68.5%. The complete results are presented in Table 3.
Fig. 3 (C) The difference between the SNV-transformed spectra of normal tissues (leave-one-out training data) and the mean SNV-transformed spectra of the exposed and unexposed regions in the tumor of the test sample.
Table 2 Prediction results of the NIR-HSI analysis. The GC region [the inside of the white line of Fig. 2(b)] was defined as "positive," and the outside of the GC was defined as "negative" in the pathologically confirmed area. The 5 px margin of the white line and the necrosis area were excluded from the analysis.
Table 3 Prediction results in the exposed and unexposed areas during the NIR-HSI analysis. Among the pixels analyzed in Table 2, the TP and FN of the area inside the yellow line in Fig. 3(B) were defined as "exposed," and the TP and FN of the area between the white and yellow lines were defined as "unexposed."

Relationship between Tumor Thickness and Identifiable Area using NIR-HSI
Table 4 shows the tumor thickness in the GC region. NIR-HSI detected cancer areas with pathologic thicknesses of >2 mm regardless of whether the cancer was exposed on the mucosal surface (Fig. 5). However, thin-tumor areas, including those exposed on the mucosal surface, were not identified as cancerous (Fig. 6). Additionally, in the prediction results of the NIR-HSI analysis shown in Table 3, specimens g and f had the lowest sensitivity in the exposed and unexposed regions, respectively. As shown in Fig. 3(C), the spectral trends of both specimens differed from those of the other, highly sensitive specimens. This may be because the lesion tissue was thin (Table 4) or because the molecular structures of these samples differed from those of the other samples.

Discussion
This study evaluated the extent of the GC, including the unexposed areas that are difficult to recognize under white light, by machine learning using the NIR-HSI data. The findings of this study are novel because previous NIR-HSI studies on GC did not discuss the identification of cancerous areas under normal mucosa. This study identified GC areas as cancerous with sensitivities of 79.7% and 68.5% in the exposed and unexposed areas, respectively. Thus, the overall sensitivity was lower in the unexposed area than in the exposed area. Further, several FNs were detected in thin cancerous areas, which is possibly owing to the characteristics of NIR absorption spectroscopy that uses reflected light: (i) the thicker the normal mucosa covering the cancer, the more attenuated the characteristic absorption spectrum of the cancer and (ii) the thinner the cancer, the sparser the information in the absorption spectrum. As shown in Fig. 3(C), the absorbance difference from the normal-area spectra in the training data was smaller for the unexposed area than for the exposed area. Areas with tumor thicknesses of >2 mm were identified using NIR-HSI, even in unexposed areas (Table 4). However, areas with thin tumors remained unidentified in both the exposed and unexposed areas. This study shows that NIR-HSI can be used to recognize unexposed areas that are difficult to recognize under white light when the tumor thickness is >2 mm.
This study was limited by the small number of specimens. However, it classified the NIR-HSI-captured pixels (459,688 px in total) using the leave-one-out cross-validation procedure. Hence, we consider that the sample size was sufficient to demonstrate generalizability. The results for shallow-depth areas were inadequate because numerous specimens in the study had advanced cancer, and sufficient data on areas with shallow depths of cancer invasion were lacking. Consequently, shallow lesions, as represented by specimen (g) in Fig. 2(B), were not identified as cancerous, and more data on shallow lesions must be gathered. Optimizing the measurement conditions, such as the illumination method of the NIR-HSI system, specimen placement, and design focal length, may improve the detection performance. Additionally, although an SVM was adopted as the machine-learning method in this study, other methods such as neural network and principal correlation analysis have been proposed to identify lesions. The identification accuracy may be improved by investigating the optimal algorithm and parameters. Moreover, significantly increasing the number of specimens would broaden the range of options for popular deep learning techniques and allow for the exploration of more robust classifiers.
Furthermore, the imaging conditions in the stomach using an endoscope with NIR-HSI will differ from those of our study. For example, the postoperative specimen was imaged from the front in this study, which may not be possible with an endoscope in the stomach. Additionally, the absorption spectra before gastrectomy may differ because of blood flow and other factors. Hence, NIR-HSI devices for endoscopic use must be developed, and the optimization of the measurement conditions must be considered in the future.
Our group is currently developing an NIR-HSI device that can be used with endoscopes. We expect to perform an endoscopic NIR-HSI analysis in the near future. If this technology is used with an endoscope, it may be possible to intraoperatively or preoperatively identify the extent of the unexposed area and define a more appropriate line for the resection of the stomach.

Conclusion
This study accurately identified the exposed and unexposed GC areas with thicknesses >2 mm. In the future, we expect to develop an NIR-HSI system that will be used with endoscopes to enhance its practicality.

Disclosures
The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.
Toshihiro Takamatsu is an associate professor at Tokyo University of Science, Chiba, Japan, and a researcher at the National Cancer Center, Chiba, Japan. He received his BS degree in physics from Tokyo University of Science in 2010 and his MS degree and PhD in engineering from Tokyo Institute of Technology, Yokohama, Japan, in 2011 and 2014, respectively. He was a postdoctoral researcher at Kobe University, Hyogo, Japan, from 2014 to 2018.
Biographies of the other authors are not available.