Frameworks of wavelength selection in diffuse reflectance spectroscopy for tissue differentiation in orthopedic surgery

Abstract. Significance Wavelength selection from a large diffuse reflectance spectroscopy (DRS) dataset enables removal of spectral multicollinearity and thus leads to improved understanding of the feature domain. Feature selection (FS) frameworks are essential to discover the optimal wavelengths for tissue differentiation in DRS-based measurements, which can facilitate the development of compact multispectral optical systems with suitable illumination wavelengths for clinical translation. Aim The aim was to develop an FS methodology to determine wavelengths with optimal discriminative power for orthopedic applications, while providing the frameworks for adaptation to other clinical scenarios. Approach An ensemble framework for FS was developed, validated, and compared with frameworks incorporating conventional algorithms, including principal component analysis (PCA), linear discriminant analysis (LDA), and backward interval partial least squares (biPLS). Results Via the one-versus-rest binary classification approach, a feature subset of 10 wavelengths was selected from each framework yielding comparable balanced accuracy scores (PCA: 94.8±3.47%, LDA: 98.2±2.02%, biPLS: 95.8±3.04%, and ensemble: 95.8±3.16%) to those of using all features (100%) for cortical bone versus the rest class labels. One hundred percent balanced accuracy scores were generated for bone cement versus the rest. Different feature subsets achieving similar outcomes could be identified due to spectral multicollinearity. Conclusions Wavelength selection frameworks provide a means to explore domain knowledge and discover important contributors to classification in spectroscopy. The ensemble framework generated a model with improved interpretability and preserved physical interpretation, which serves as the basis to determine illumination wavelengths in optical instrumentation design.


Introduction
Surgical procedures involving osteotomies pose risks of breaching into critical neurovascular and musculoskeletal structures, often leading to operative morbidity or potential mortality.One related clinical situation involves total hip and knee arthroplasty (THA/TKA) where periprosthetic fracture is considered a major complication that requires revision.Studies found that 40% and 54% of primary THA and TKA patients, respectively, encountered at least one major complication, such as fracture or one minor complication, such as neurovasculature damage, or both. 1,2Furthermore, with population ageing leading to rapidly growing incidence of primary arthroplasty, the number of revision surgeries will continue to increase substantially due to limited implant survivorship.The advanced surgical complexity and higher rates of associated complications have become a prevalent concern that remains to be improved.Within orthopedics, incidence projections of revision THA/TKA (rTHA/rTKA) to 2030 in the United States alone have indicated an increase of 43% to 70% and 78% to 182%, respectively, relative to the 50,000 rTHA and 72,000 rTKA cases in 2014.In particular, the patient group aged >55 years is anticipated to undergo the highest increase, suggesting that multiple revision surgeries would become necessary in the coming decades.Some common causes for rTHA include instability (28%), aseptic loosening (24%), periprosthetic fracture (18%), and prosthetic joint infection (18%). 36][7] We have examined the possibility of optical integration to guide cement removalone of the most technically challenging and time-consuming steps for surgeons during rTHA/ rTKA surgery, especially removal of the distal cement plug.Choices of intraoperative image guidance modalities, such as endoscopy, fluoroscopy, or ultrasound, [8][9][10] are usually surgeon dependent, which can be impractical in the operating room.Navigation through the procedure therefore relies on the surgeon's experience and the safety measures in the surgical tool.On the other hand, extraction of well-bound bone cement can be undesirably invasive when creation of cortical windows is needed via trochanteric osteotomy to visualize the distal plug. 11,12][15] DRS, as a means of integrated optical guidance in complement to standard imaging modalities, has the potential to assist the orthopedic surgeon with (1) differentiating bone cement 16 from biological tissues; (2) informing the cement-to-bone interface as a safety measure to signal pre-breaching; and (3) informing the bone-to-tissue interface as an indicator to signal postbreaching during rTHA involving cemented implants.8][19][20] However, the widely known properties of multicollinearity, redundancy, and noise affect the accuracy performance due to overfitting of the training data, as well as deteriorating interpretability and explainability of the ML model. 21Feature selection (FS), namely to select an optimal subset of wavelengths to the classification or regression problem, becomes crucial for understanding the knowledge space, reducing data dimensionality, and finding accurate ML models.There has been considerable research into wavelength selection in spectroscopy [22][23][24][25] as well as generic FS algorithms, [26][27][28][29] for which novel approaches have been proposed to process spectral data.For example, a previous study by Gunaratne et al. 30 used multiclass Fisher's linear discriminant analysis (LDA) to both select features and classify tissue types in ovine joint tissue specimens for orthopedic applications, achieving 100% accuracy with full DRS data of 2048 wavelengths from 190 to 1081 nm, 90% with the selected 10 wavelengths, and 70% with the selected single wavelength.Fanjul-Vélez et al. 18 examined the use of spectral characteristics extraction and principal component analysis (PCA) as a dimensionality reduction technique on DRS measurements from ex vivo porcine specimens, demonstrating specificity and sensitivity values of >98%.Another study using FS by Mamouei et al. 31 compared different variations of partial least squares (PLS) with a genetic optimization algorithm to identify important spectral regions for predicting lactate concentrations in blood using optical sensing, resulting in higher accuracy scores with a reduced number of wavelengths.
The advantage of DRS lies in the possibility of instrument miniaturization and easier integration to surgical workflows.The choice of discrete narrowband light sources in the device can be inadequate in different applications without a selection methodology.In this work, we present four distinct FS frameworks to methodologically select an optimal subset of DRS wavelengths for tissue differentiation in bone-related surgeries, achieving comparable classification accuracy using substantially fewer wavelengths.The focus is on the tailored FS frameworks to select wavelength features, which will be implemented as the light source in the optical device for reduced instrument complexity to guide bone cement removal.The primary aims are thereby defined as the following: (1) exploring domain knowledge in the orthopedics-related DRS dataset, (2) determining an optimal subset of DRS wavelengths with sufficient discriminative power between tissue types by comparing four FS frameworks, and (3) developing a suitable wavelength selection methodology for adaptation to various clinical scenarios.

Materials and Methods
All ex vivo measurements were approved and in compliance with the regulations at Tyndall National Institute, Cork, Ireland."Wavelength," namely the spectral bin width defined by the spectrometer, is referred to as "feature.""Sample" refers to each DRS data entry.All data preprocessing and analysis were implemented in the python programming language version 3.9 (Python Software Foundation 32 ) built from open-source libraries, packages, and dependencies, including scikit-learn, 33 SciPy, 34 and SHapley Additive exPlanations (SHAP). 35,36

Specimen Preparation and DRS Data Acquisition
The ex vivo ovine tissue specimens included bone marrow, cartilage, cortical bone, muscle, and trabecular bone sourced from a local butcher shop, which were sacrificed 2 days prior to delivery and immediately refrigerated until 2 h before measurements.All measurements were acquired at room temperature upon delivery, where tissue specimens were unprocessed and kept moist using saline spray.Muscle specimens were approximately 22 × 23 × 8 cm [Fig. 1  and bone cement (PALACOS ® R+G pro, Heraeus Medical GmbH, Wehrheim, Germany) specimens were cured and hand-molded into approximately 6 × 5 × 4-cm blocks [Fig.1(c)] 1 day earlier.DRS measurements of multiple unique locations were collected from one specimen with >5 mm apart in grid layout.The total number of tissue specimens and DRS measurements are summarized in Table 1.
Figure 1(d) shows the extended-wavelength DRS (EWDRS) system, which employed a dual spectrometer detection configuration across the visible/near-infrared/short-wave infrared (VIS/NIR/SWIR) spectral range.The 355-to 1100-nm and 1100-to 1850-nm ranges were measured by QE Pro spectrometer (Ocean Insight B.V., Duiven, The Netherlands) and NIR Quest spectrometer (Ocean Insight B.V., Duiven, The Netherlands), respectively, with a tungstenhalogen broadband light source (HL-2000-HP, Ocean Insight B.V., Duiven, The Netherlands) of emission from 350 to 2400 nm.The two spectrometers were set to acquire five repeats consecutively during one acquisition of 2 s for a single measurement location.The fiber optic probe was a 1-to-4 fan-out bundle with equidistant source-detector separations of 0.63 mm (BF46LS01, 600-μm core, Low OH, Thorlabs, Munich, Germany).

Data Preprocessing
Each EWDRS data entry was obtained by averaging over the five repeats followed by intensity calibration and splicing the two spectra together using spline interpolation.The wavelength bin widths were ∼0.76 nm in the 355-to 1100-nm range and 1.6 nm in the 1100-to 1850-nm range per channel.All measurements were calibrated against a reflectance standard (FWS-99-01c, Avian Technologies LLC, New London, United States) and corrected for dark counts pre and post each experiment using the standard method S raw −S bkgd S ref −S bkgd .Savitzky-Golay (SG) data smoothing filter was applied with a frame size of 5 and polynomial order of 2 for denoising. 37,38The EWDRS dataset, containing numerical features and categorical labels, was structured into an m × n matrix, where m represented the number of DRS data entries and n represented the number of features.There were 5215 data entries and 1531 features subjected to further analysis.The data entries were treated as mutually independent entities.

Classification Models
Classification models were trained via the one-versus-rest (OVR) binary approaches.The positive class (one) was defined as bone cement, and the negative class (rest) was collectively defined for bone marrow, cartilage, cortical bone, and trabecular bone in the first scenario (boneCement versus rest).In the second scenario, the positive class was cortical bone, and the rest included bone marrow, cartilage, muscle, and trabecular bone (cortBone versus rest).Six models were included to compute balanced accuracy using all features: • logistic regression (LogReg) representing a baseline linear regression model used for classification, • LDA representing a baseline linear classification model, • random forest (RF) representing an ensemble bagging classifier, • k-nearest neighbors (KNNs) representing an instance-based classifier, • Gaussian Naïve Bayes (GNB) representing a probabilistic classifier, and • support vector machine (SVM) representing a maximum margin classifier.

The balanced accuracy is the arithmetic mean of sensitivity and specificity in binary classification
Table 1 Total number of tissue specimens and DRS measurements used in the analysis.Balanced accuracy ¼ where TP is the true positive, TN is the true negative, FP is the false positive, and FN is the false negative.TP represented the instances of bone cement and cortical bone being correctly identified in the two scenarios, respectively.Default parameters were used.The classification model generating the highest balanced accuracy was chosen as the classifier to evaluate the quality of FS.Balanced accuracy was treated as a metric for quality check, where features of lower scores were discarded.Globally, the percentage of holdout dataset was stratified 20%.Global cross-validation (CV) was stratified 10-fold with 10 repeats.Stratified sampling was used to preserve the same class distribution of features in the training, validation, and holdout datasets as the complete dataset.The six models were trained, validated, and tested using the complete EWDRS dataset.

Feature Selection
FS techniques were implemented via the OVR binary approach to achieve the highest discriminative power.An overview of FS process is shown in Fig. 2. Algorithms involving data transformation were incorporated with optimization techniques to permit FS using the original features.The final selection subset was pre-defined to contain 10 features.The following subsections described the implementation of PCA, LDA, and backward interval PLS (biPLS) into independent FS frameworks.0][41][42] An ensemble framework of FS 43 was subsequently formulated for comparison.

Feature selection frameworkprincipal component analysis
PCA is an unsupervised technique that projects the original data based on maximized variance onto a lower dimension via an orthogonal linear transformation.The resulted principal components (PCs) from PCA transformation, ranked by descending variance, might not bear discriminative power in the same descending order for the classification problem. 44Simulated annealing (SA), 45 a stochastic optimization algorithm, was therefore implemented to search for a set of PCs that approximated the global optimum of a user-defined cost function.LDA was selected as the cost function due to its simplicity.Regularization parameters in SA were tuned by trial and error for the data domain.To minimize computational cost, the first 30 out of 1530 PCs were initially selected followed by SA optimization with 600 iterations to further select 10 PCs that jointly produced the highest LDA classification accuracy.SA optimization would be inactivated if a 100% accuracy score was generated during the first iteration.The next step involved selecting peaks from the 10 eigenvectors plotted against features based on user-defined inclusion criteria, which comprised of removal of duplicates, features within 20 nm horizontally and below 20% normalized intensity vertically.The final 10 features were determined by ranking and removing collinearity that was identifying the most relevant feature to the class label by ranked ANOVA

Feature selection frameworkbackward interval partial least squares
biPLS is one variation of backward variable elimination PLS methods for FS.The basis is to compare the PLS root mean square error of CV (RMSECV) of the N-1 intervals with the baseline RMSECV and eliminate the feature interval whose removal yields the lowest RMSECV in a recursive way. 46The baseline RMSECV was calculated using all features.Optimization of PLS components was performed for each PLS regression with 30 total components to minimize computation.RMSECV calculation was iterated over the number of intervals, the size of inter- The most relevant feature to the class label was calculated by ranked PLS-variable importance in prediction scores. 47The flowchart of biPLS FS framework can be found in Fig. S2 in the Supplementary Material.

Feature selection frameworkensemble
Ensemble FS (EFS) is a learning framework that combines different FS algorithms to achieve improved outcomes by avoiding bias, merging advantages, and compensating disadvantages of individual FS methods.The EFS framework incorporated three univariate filtering methods, which were minimum redundancymaximum relevance (mRMR) 48 that measures the most correlation with a class; mutual information (MI) 49 that measures the amount of information gain of one variable (feature) given the other known variable (class label); and ReliefF 50,51 that measures conditional dependencies between the k-nearest neighbors (KNNs) of one feature in an instance-based manner.Selected features from the union of outputs by mRMR, MI, and ReliefF were input into Boruta-RF, 52,53 which is a wrapper method around RF that utilizes shadow features to compute feature importance.The final feature subset could be determined by (1) selecting the top 10 features from Boruta-RF rank based on the ranked ANOVA F-values, resulting in a narrow spectral band (ensemble SB) or (2) selecting the top 10 least collinear features out of the Boruta-RF rank, which was computed via the same collinearity removal process in the PCA FS framework (ensemble LC).SHAP, [54][55][56][57][58][59] one of the state of the art tools in ML explainability, was performed to explain and quantify individual contributions of selected features to model prediction base on cooperative game theory.The SHAP value was proposed by Lundberg and Lee as a unified measure to represent additive feature importance, which was the average outcomes of marginal contributions from individual features over all possible feature permutations. 36The SHAP algorithm in the EFS framework was based on ensembles of trees.There were two optional optimization steps, including partition of the training dataset and aggregation of the final selection.Partition involved segmenting the training dataset into multiple packets vertically by features or horizontally by samples, or both, 60 and aggregation involved searching features from all vertical partitions to form a final subset that satisfied user-defined criteria.In the EFS framework, vertical partition by features was adopted to divide the training dataset into three spectral ranges that were approximately the VIS range of 355 to 700 nm, the NIR range of 700 to 1000 nm, and the SWIR range of 1000 to 1850 nm.Horizontal partition was indirectly performed by iterating over different compositions of CV.Aggregation was achieved by incrementally adjoining the VIS, NIR, and SWIR ranges, resulting in the VIS range of 355 to 700 nm, the VIS/NIR range of 355 to 1000, and the VIS/NIR/SWIR range of 355 to 1850 nm.The flowchart of EFS framework is shown in Fig. 3.

Results
The classification models were benchmarked on the complete dataset as a reference of the achievable maximal accuracy, followed by performance comparison of the top 1 and 10 wavelength features selected by the four FS frameworks.Further investigations examined FS in different spectral ranges and the tradeoff between the number of selected features versus model accuracy.Model explainability was explored to understand individual feature contributions to classification prediction.

Classification Models
The six classification models were trained with CV on the train/validation and tested on the holdout datasets containing all features [Table 2].Two decimal places were retained for balanced accuracy.LDA was subsequently used to assess the quality of FS due to outperformance.

Ensemble Framework of Feature Selection
The final 10 wavelength features overlaid on the EWDRS spectra in the VIS, VIS/NIR, and VIS/NIR/SWIR ranges are shown in Fig. 5 with balanced accuracy scores in Table 3 (top) and (bottom).Optimal performance was discovered using ensemble LC and biPLS FS frameworks based on the high averaged accuracy over the two holdout sets for the top 1 and 10 features.The latter framework, however, required excessive computational effort.The final subset generated by ensemble SB collectively yielded a narrow spectral band, making its deployment comparable to using the top feature.rest, the model prediction was more likely to be bone cement when the DRS signal intensity increased for all the 10 wavelength features.For cortBone versus rest, a positive correlation was illustrated at wavelength features 1209 and 900 nm whilst wavelengths 1629 and 1850 nm displayed an inverse correlation with DRS signal intensity.

Discussion
The EWDRS dataset, created by measuring DRS in ovine specimens of tissue types commonly encountered in orthopedics-related surgery, was explored to achieve the primary aims.In this study, we implemented PCA, LDA, and PLS into discovery-orientated FS frameworks and constructed an EFS framework for comparison to understand domain knowledge and determine an optimal subset of wavelengths with high discriminative power for the positive class.The feature inclusion criterion of 20-or 30-nm separation enforced selection of unique features by matching the full width at half maximum (FWHM) of an average light-emitting diode (LED), which was further refined by sequential ranking and removal of high correlation.The moving-window approach also alleviated the effect of multicollinearity by separating wavelength features using interval partitioning and manual deletions of adjacent features.By applying multicollinearity reduction, the final selected wavelengths were sufficiently distinguishable by LED source(s) of certain limited bandwidths.On the other hand, multicollinearity offers the advantage of selecting multiple different feature subsets with similar classification outcomes, enabling flexibility in hardware design.This behavior was demonstrated by clusters of the selected features from all four FS frameworks.The disadvantage is also evident that it deteriorates model interpretability especially in models involving data transformation due to lost physical representations and less inferential model coefficients.The wavelengths of light sources selected by transformed coefficients became less reliable in practice.The EFS framework comprising of three univariate filters and one tree model was created to resolve the issue.The 10 wavelength features by the EFS framework offered comparable balanced accuracy to using all features, which could be reduced to 3 to 4 wavelength features.An optimized FS framework for subsequent deployment should be implemented with feature engineering to merge collinear features into one new feature, therefore decreasing computational burden and redundant outcomes.The tradeoff between the number of wavelength features and classification accuracy should also be considered to optimize instrument complexity and cost.Other conventional methods, such as first and second derivatives, could also provide decisive information for FS in spectroscopy.
The four FS frameworks considered correlations among the final 10 selected wavelengths.In general, prominent absorption peaks of biomarkers were identified, such as collagen, lipid, and water in the SWIR range as well as different forms of hemoglobin (Hb) in the VIS/NIR range.For both scenarios, absorption regions for lipids at 1210 nm, collagen at 1200, 1500, and 1725 nm, and water at 1440 nm were recognized with some contribution from Hb at 576 nm. 61The lack of tissue chromophores in the non-biological specimens contributed to the selection of wavelength features in the spectral region with stronger absorption from biological specimens by amplifying the difference in signal intensity.Furthermore, FS frameworks implementing PCA, LDA, and biPLS systematically selected one absorption peak of lipid as the most relevant features in cortBone versus rest, including 1211, 1188, and 1207 nm of <25 nm apart and statistically insignificant difference due to limited resolution of the DRS system [Table 3(top)].The fine resolution calculated by FS frameworks might not be precisely characterized by hardware because of the limited FWHM of an average LED.Larger FWHM widths could be preferred over laser diodes of narrower bandwidths.By encompassing more wavelengths, such as the band in ensemble SB covering 1200 to 1216 nm [Fig.5(c)], the balanced accuracy was improved from 88% to 94%.For boneCement versus rest, PCA and LDA FS frameworks both selected wavelengths at around 1460 nm corresponding to one absorption peak of water.The biPLS FS framework chose the top wavelength in the farther SWIR region at 1800 nm with no selection around 1460 in the final subset.Balanced accuracy was nevertheless comparable.Such observations demonstrated multicollinearity and implied drawbacks of utilizing core algorithms based on data transformation.The initial pool of all-relevant features generated by the algorithms might not translate to high classification scores in the original data space even with optimization techniques.Second, collinear features were removed from the pool by discarding adjacent wavelengths of similar discriminative strength, leading to a final subset that contained relevant features of minimal predictive power.Wavelength selection in Fig. 4 included relevant features corresponding to non-specific absorbers in biological tissues presumably due to the difference in signal intensity in the original data space.Features selected by the LDA FS framework covered the 800 to 1000 nm range, over which prominent tissue chromophores demonstrate broader absorption spectra.Moreover, LDA was used in both classification and FS, contributing to the highest balanced accuracy for the 10-wavelength classification with improved model performance in Table 3 (top).The final subset of wavelength features tended to fall in the SWIR region of 1100 to 1800 nm, especially for boneCement versus rest, with some selection from the VIS region of 400 to 600 nm [Figs.4 and 5], suggesting that the higher level of signal intensity in addition to the absorption signatures contributed greater predictive power than absorption features alone.During orthopedic surgery with less active bleeding, this behavior demonstrated utility of the SWIR region as the illumination wavelengths over which Hb exhibited minimal absorption and offered advantages to reducing the effect of blood contamination on DRS measurements.
In a similar vein, the final subset from ensemble LC included relevant features with minimal predictive power, rendering the order of predictive features within the rank crucial.In Figs.6(a)-6(c), the balanced accuracy curves reached the first plateau with inclusion of the first 3 to 4 features though the increase was statistically insignificant in Fig. 6(c).For cortBone versus rest, the highest balanced accuracy was reached in the VIS/NIR/SWIR range.Increase of the classification score was observed as more features in the NIR and SWIR ranges were added stepwise for evaluation.A comparable pattern was shown for boneCement versus rest with a final 100% balanced accuracy individually achieved in the three spectral ranges.The least collinear features however formed clusters at certain wavelengths in the VIS and VIS/NIR ranges for both scenarios [Figs.5(b) and 5(d)], suggesting that these spectral ranges contained fewer distinct features of high discriminative power.The EFS framework likewise selected spectral bands at maximized signal intensity inside the VIS (top) and VIS/NIR (middle) ranges for both scenarios in Figs.5(a) and 5(c).Contribution of Hb absorption at around 550 nm was only deemed important with multicollinearity removal as seen in the VIS and VIS/NIR ranges by comparing ensemble LC with ensemble SB.In the literature, Gunaratne et al. 30 concluded that wavelengths with the most discriminative power largely existed in the spectral range of 370 to 470 nm and 800 to 1010 nm where Hb, water, and lipid could be major contributors to classification, corresponding to the findings in our study.
Limitations of this work stemmed from dataset creation, which was designed for data mining and measured in a static laboratory environment with two detection fibers measuring different acquisition volumes.The exact results might not be generalizable to other situations, such as an ongoing surgery or a different experimental setup generating new DRS datasets from other species of different ages.Under such circumstances, rigorous data preprocessing and experimental setup will be required to standardize and normalize across different datasets.New features or class labels will also need to be engineered to address external environmental factors.It is nevertheless believed that the FS frameworks can be adapted to new scenarios of changed class labels, given that assumptions of the algorithms are met.For future directions, wavelengths selected from the EFS framework, including 1210 and 1460 nm, will be validated in ex vivo studies and subsequently implemented in the optical probe as the light sources for further translational experiments in in vivo subjects.An ML system with continual learning will be established using one-class classification to identify bone cement.The initial train/validation dataset of bone cement in various conditions will be collected including ageing, blood contamination, and hydration at body temperature.

Conclusions
The present work described four different FS frameworks to select important wavelength features for tissue differentiation in orthopedics.Three FS frameworks were constructed by implementing conventionally used algorithms, including PCA, LDA, and PLS, while the other was formulated via an ensemble approach.All frameworks generated comparable results and model performance.The EFS framework produced more interpretable models with efficiency.The final subset of 10 selected features contained wavelengths corresponding to prominent absorption peaks of major tissue chromophores, such as Hb, water, lipid, and collagen at around 600, 1200, 1400, and 1500 nm, respectively, and wavelengths at maximized signal intensity difference.FS results set the groundwork to choose adequate light source(s) in the optical device for bone cement removal guidance in rTHA surgery.In the future, the frameworks will be adapted to various clinical applications to facilitate the determination of important wavelengths and biomarker sensitivity.

Fig. 1
Fig. 1 Ex vivo ovine tissue specimens, including (a) muscle mass; (b) femur containing bone marrow, cartilage, cortical bone, and trabecular bone; and (c) bone cement specimen.The EWDRS system in panel (d) shows the fiber optic probe, connecting to a broadband light source and two spectrometers, in contact with one tissue specimen.White circle: fiber channel to light source, green circles: fiber channels to spectrometers, and black circle: not in use.
T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 7 ; 7 3 6

Fig. 2
Fig.2Overview of a typical FS workflow.

F
-values and sequentially discarding the most correlated feature to the most relevant feature based on Pearson correlation coefficients.The flowchart of PCA FS framework can be found in Fig. S1(a) in the Supplementary Material.

2. 4 . 2
Feature selection frameworklinear discriminant analysisLDA is a supervised technique that searches for a linear combination of features based on maximized class separation.To implement LDA in an FS workflow, a moving-window approach was utilized by setting a series of feature intervals of custom lengths, including 25, 50, 75, 100, 150, 200, and 300 nm.The initial feature subspace was constructed by selecting the top feature within the interval based on the ranked LDA coefficients of this interval.The computation process was iterated over all interval sizes and across the full feature domain.Duplicates and features within 30 nm of each other were discarded.The top 75% of initial feature subspace was chosen similarly by sequential removal of ranked feature collinearity, where the most relevant feature to the class label was determined by ranking LDA coefficients of the initial feature subspace.The final feature subset was calculated by performing SA optimization to search for the 10 features collectively producing the highest LDA classification accuracy.SA optimization was always activated with 2000 iterations to reach convergence.The flowchart of LDA FS framework can be found in Fig.S1(b) in the Supplementary Material.
vals, and the different CV shuffles of the training dataset across the full feature domain.The custom sizes of feature intervals included 20, 40, 60, 70, 80, and 95 nm.Five different CV shuffles of the training dataset were generated randomly and kept consistent.The retained feature intervals from each iteration received one vote.The features with the highest votes were deemed important.The final feature set was determined by searching the top 10 least collinear features.

Fig. 3
Fig. 3 EFS framework.Optional steps are labeled with asterisks, including training dataset partition, sequential removal of collinear features by ranking, and aggregation of results from all partitions.Acronyms: mRMR for minimum redundancymaximum relevance, MI for mutual information, Boruta-RF for Boruta random forest, SHAP for SHapley Additive exPlanations, and clf for classification.

3. 2
PCA, LDA, and biPLS Feature Selection Frameworks The final 10 selected wavelength features from FS frameworks of PCA, LDA, and biPLS overlaid on the EWDRS spectra for boneCement versus rest and cortBone versus rest are shown in Fig.4.Balanced accuracy scores are shown in Table3(top) with the most relevant features to classification and the average computation time to execute one framework.

Fig. 4
Fig. 4 The final 10 selected features from FS frameworks of PCA (top), LDA (middle), and biPLS (bottom) on the EWDRS spectra for boneCement versus rest (left) and cortBone versus rest (right).Standard deviations are shown in shaded error bands.The arrows highlight spectra of different class labels in subfigures for each scenario.Acronyms: boneCement for bone cement, boneMarrow for bone marrow, cortBone for cortical bone, traBone for trabecular bone, PCA for principal component analysis, LDA for linear discriminant analysis, and biPLS for backward interval partial least squares.

Figures 6 (
Fig. 5 The final 10 selected wavelength features from ensemble SB as spectral band (a,c) and ensemble LC as the least collinear features (b,d) for boneCement versus rest (a,b) and cortBone versus rest (c,d) in the VIS (top), VIS/NIR (middle), and VIS/NIR/SWIR (bottom) spectral ranges, respectively, on the EWDRS spectra.Standard deviations are shown in shaded error bands.Acronyms: boneCement for bone cement, boneMarrow for bone marrow, cortBone for cortical bone, and traBone for trabecular bone.

Fig. 6
Fig. 6 Balanced accuracy computed by LDA with CV plotted against sequential inclusion of the ranked least collinear wavelength features for boneCement versus rest (top x -axis) and cortBone versus rest (bottom x -axis) in panels (a) VIS, (b) VIS/NIR, and (c) VIS/NIR/SWIR ranges.The underscored features indicated the two wavelengths approaching the first plateau.Distribution of individual feature contributions on the prediction by SHAP analysis is shown for (d) boneCement versus rest and (e) cortBone versus rest.The 10 features were ranked on the vertical axis in descending order of their contributions to prediction.The horizontal axis illustrates the impact level of individual features of high or low value related to positive or negative prediction.Acronyms: LDA for linear discriminant analysis, CV for cross-validation, and SHAP for SHapley Additive exPlanations.

Table 2
Summary of the balanced accuracy scores on the holdout dataset generated by LogReg, LDA, RF, KNNs, GNB, and SVM and the average computation time to train one model.

Table 3
Summary of the balanced accuracy scores using the top 1 and 10 wavelength features generated from PCA, LDA, biPLS, and EFS frameworks for boneCement versus rest and cortBone versus rest, the averaged computation time for one entire framework (top), and the scores generated by EFS framework on the holdout dataset in the VIS and VIS/NIR ranges (bottom).Accuracy using the top 1 wavelength was not applicable for ensemble SB.