Saliva-based detection of COVID-19 infection in a real-world setting using reagent-free Raman spectroscopy and machine learning

Abstract. Significance: The primary method of COVID-19 detection is reverse transcription polymerase chain reaction (RT-PCR) testing. PCR test sensitivity may decrease as more variants of concern arise, and reagents may become less specific to the virus. Aim: We aimed to develop a reagent-free way to detect COVID-19 in a real-world setting with minimal constraints on sample acquisition. The machine learning (ML) models involved could be frequently updated to include spectral information about variants without needing to develop new reagents. Approach: We present a workflow for collecting, preparing, and imaging dried saliva supernatant droplets using a non-invasive, label-free technique, Raman spectroscopy, to detect changes in the molecular profile of saliva associated with COVID-19 infection. Results: We used an innovative multiple instance learning-based ML approach and droplet segmentation to analyze droplets. Despite the confounding factors present in a real-world setting, we discriminated between COVID-positive and COVID-negative individuals, yielding receiver operating characteristic (ROC) curves with an area under curve (AUC) of 0.8 in both males (79% sensitivity and 75% specificity) and females (84% sensitivity and 64% specificity). Taking the sex of the saliva donor into account increased the AUC by 5%. Conclusion: These findings may pave the way for new rapid Raman spectroscopic screening tools for COVID-19 and other infectious diseases.


Fig. S1.
A Raman spectrum from lipstick contamination in a saliva supernatant sample from a COVID-19-negative volunteer. Although the saliva donor was wearing a red lipstick from Max Factor, the Raman spectrum is remarkably similar to that of red Bourjois 15 (pure and not in saliva), which has peaks at or within one wavenumber of 747, 1183, 1229, 1266, 1366, 1492, and 1604 cm⁻¹.

Table S1.
Viral load data for the COVID-positive samples for which we were able to obtain these data from the testing centre.

Fig. S3.
Raman spectra from a) ammonium nitrate, b) potassium chloride and c) glucose. Spectra were taken using 785 nm excitation with a Renishaw InVia Raman spectrometer from solid compounds placed on an aluminum slide.

Fig. S4.
Raman spectra from a) bovine serum albumin, b) potassium phosphate and c) lactic acid. Spectra were taken using 785 nm excitation with a Renishaw InVia Raman spectrometer from solid compounds placed on an aluminum slide.

Fig. S5.
Raman spectra from a) sodium chloride, b) urea and c) potassium citrate. Spectra were taken using 785 nm excitation with a Renishaw InVia Raman spectrometer from solid compounds placed on an aluminum slide.

Fig. S6.
Raman spectra from a) bovine submaxillary mucin and b) human mucin I. Spectra were taken using 785 nm excitation with a Renishaw InVia Raman spectrometer from solid compounds placed on an aluminum slide.

Table S4.
Characteristics of spectra used in generation of predictive models in this study. "Spectra type" refers to whether all or a cropped region of the spectra was used in the model. "Crop" refers to spectra that have had the high-variance region below 1100 cm⁻¹ removed. "Region" refers to which region of the dried droplet the spectra were acquired from. In the column for sex, "M" refers to males and "F" refers to females. "People" refers to the number of volunteers and hence the number of saliva samples involved in each model.

Table S5.
Area under curve (AUC) values for receiver operating characteristic (ROC) curves generated using both MILES and MILDM in the study, produced for predictive models listed in Table S4. The mean AUC with the true labels differs from the AUC given in Table S4 because the AUCs in Table S4 were obtained using optimized hyperparameters.

Fig. S7.
Histograms plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). These models discriminate between Raman spectra from dried saliva droplets from males based on COVID status using spectra taken between 1100 and 1726 cm⁻¹. (A) Model 1 used spectra taken from the "edge" region of a dried droplet, (B) Model 2 used spectra from the "on crystal" region. Figures on the left side show results using multiple instance learning (MILES) and figures on the right side show results using multiple instance learning with discriminative mapping (MILDM).
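The comparison of true and random labels described in these histogram captions can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, `centroid_scores` is a hypothetical stand-in for the MILES and MILDM classifiers, and the 50/50 train/test split is not taken from the study. Only the number of runs (96) and the idea of comparing AUC distributions come from the caption.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):          # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def centroid_scores(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: project onto the difference of class means."""
    direction = X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0)
    return X_test @ direction

def auc_distribution(X, y, n_runs=96, shuffle_labels=False, seed=0):
    """Collect test-set AUCs over repeated random splits, optionally with shuffled labels."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_runs):
        labels = rng.permutation(y) if shuffle_labels else y
        idx = rng.permutation(len(y))
        half = len(y) // 2
        train, test = idx[:half], idx[half:]
        # skip degenerate splits that lack one of the two classes
        if len(set(labels[train])) < 2 or len(set(labels[test])) < 2:
            continue
        scores = centroid_scores(X[train], labels[train], X[test])
        aucs.append(roc_auc(labels[test], scores))
    return np.array(aucs)

# Synthetic data (assumption): 40 samples, 20 features, class 1 shifted upwards.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[y == 1] += 1.0

true_aucs = auc_distribution(X, y, shuffle_labels=False, seed=1)
random_aucs = auc_distribution(X, y, shuffle_labels=True, seed=2)
print(f"mean AUC, true labels:   {true_aucs.mean():.2f}")
print(f"mean AUC, random labels: {random_aucs.mean():.2f}")
```

If a model has learned a real association, the true-label AUCs should sit clearly above the random-label AUCs, which are expected to scatter around 0.5.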

Fig. S8.
Histograms plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). These models discriminate between Raman spectra from dried saliva droplets from males based on COVID status using spectra taken between 1100 and 1726 cm⁻¹. (A) Model 3 used spectra taken from the "edge" region of a dried droplet, (B) Model 4 used spectra from the "on crystal" region. Figures on the left side show results using multiple instance learning (MILES) and figures on the right side show results using multiple instance learning with discriminative mapping (MILDM).
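For readers unfamiliar with MILES, the bag embedding at its core can be sketched as follows. This is a generic illustration of the published MILES embedding (each bag is described by its maximum Gaussian similarity to every pooled training instance), not the study's implementation. Treating one volunteer's spectra as one bag, the value of `sigma`, and the synthetic data are all assumptions, and the sparse classifier that MILES trains on the embedded vectors is omitted.

```python
import numpy as np

def miles_embedding(bags, sigma=1.0):
    """Map each bag to a vector of its maximum similarity to every pooled instance."""
    instances = np.vstack(bags)  # all instances from all bags, shape (n_instances, n_features)
    embedded = np.empty((len(bags), len(instances)))
    for i, bag in enumerate(bags):
        # squared Euclidean distance from every pooled instance to every instance in this bag
        diff = instances[:, None, :] - np.asarray(bag)[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)  # shape (n_instances, n_in_bag)
        # the maximum similarity over the bag is attained at the closest instance
        embedded[i] = np.exp(-sq_dist.min(axis=1) / sigma ** 2)
    return embedded

# Synthetic bags (assumption): three "volunteers" with 8, 9 and 10 spectra of 5 features each.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(n, 5)) for n in (8, 9, 10)]
embedded = miles_embedding(bags, sigma=2.0)
print(embedded.shape)  # one row per bag, one column per pooled instance
```

Each row can then be passed to an ordinary classifier with one label per bag, which is what lets a single label per volunteer be used without labelling individual spectra.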

Fig. S9.
Histogram plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). Model 5 discriminated between Raman spectra from saliva samples based on COVID status using spectra from the "edge" region for both sexes taken between 602 and 1726 cm⁻¹. The figure on the left side shows results using multiple instance learning (MILES) and the figure on the right side shows results using multiple instance learning with discriminative mapping (MILDM).

Fig. S10.
Histograms plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). These models discriminate between Raman spectra from dried saliva droplets from both sexes based on COVID status using spectra taken between 1100 and 1726 cm⁻¹. (A) Model 6 used spectra taken from the "edge" region, (B) Model 7 used spectra from the "on crystal" region and (C) Model 8 used spectra from the "off crystal" region. Figures on the left side show results using multiple instance learning (MILES) and figures on the right side show results using multiple instance learning with discriminative mapping (MILDM).

Fig. S11.
Histograms plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). These models discriminate between Raman spectra from dried saliva droplets from COVID-negative volunteers based on sex at birth, using spectra taken between 1100 and 1726 cm⁻¹. (A) Model 9 used spectra taken from the "edge" region of a dried droplet from both sexes, (B) Model 10 used spectra from the "on crystal" region. Figures on the left side show results using multiple instance learning (MILES) and figures on the right side show results using multiple instance learning with discriminative mapping (MILDM).

Fig. S12.
Histograms plotting each area under curve (AUC) calculated from receiver operating characteristic (ROC) curves for classification models with random and true data labels. 96 classification models were run using random labels (orange bars) compared to true labels (blue bars). These models discriminate between Raman spectra from dried saliva droplets from COVID-negative volunteers based on whether symptoms were classed as respiratory or non-respiratory, using spectra taken between 1100 and 1726 cm⁻¹. (A) Model 11 used spectra taken from the "edge" region of a dried droplet from both sexes, (B) Model 12 used spectra from the "on crystal" region. Figures on the left side show results using multiple instance learning (MILES) and figures on the right side show results using multiple instance learning with discriminative mapping (MILDM).

Fig. S13.
Machine learning model discriminating between COVID-negative and COVID-positive saliva supernatant from males using "off crystal" Raman spectra from dried droplets. (A) Upper frame shows SNV-normalized, baseline-corrected Raman spectra from all volunteers. Mean COVID-negative spectra (n = 20, at least 8 spectra per volunteer) are shown in black and mean COVID-positive spectra (n = 15, at least 8 spectra per volunteer) are shown in red. Bottom frame shows the standardized Raman spectra, where each individual feature has zero mean and unit variance. (B) Receiver operating characteristic (ROC) curve for these models with sensitivity and specificity.
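The two scalings named in this caption, SNV normalization of each spectrum and standardization of each feature, can be sketched as below. This is a minimal illustration on synthetic data, not the study's pipeline: baseline correction is omitted, the order of cropping and SNV is an assumption, and the use of the sample standard deviation (`ddof=1`) in SNV is a common convention rather than something stated in the source.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def standardize_features(spectra):
    """Give each wavenumber channel (column) zero mean and unit variance across spectra."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0, keepdims=True)
    std = spectra.std(axis=0, keepdims=True)
    return (spectra - mean) / std

# Synthetic spectra (assumption): 12 spectra sampled at 200 points from 602 to 1726 cm^-1.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(602, 1726, 200)
raw = rng.random((12, wavenumbers.size)) + 1.0

# Crop to the 1100-1726 cm^-1 window used by most models in the captions.
keep = wavenumbers >= 1100
cropped_axis = wavenumbers[keep]
cropped = raw[:, keep]

snv_spectra = snv(cropped)                         # upper-frame style scaling
standardized = standardize_features(snv_spectra)   # bottom-frame style scaling
```

SNV removes per-spectrum offset and intensity differences, whereas feature standardization puts every wavenumber channel on the same scale so that strong and weak bands contribute comparably to a model.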

Fig. S14.
Machine learning model discriminating between COVID-negative and COVID-positive saliva supernatant from females using "off crystal" Raman spectra from dried droplets. (A) Upper frame shows SNV-normalized, baseline-corrected Raman spectra from all volunteers. Mean COVID-negative spectra (n = 18, at least 9 spectra per volunteer) are shown in black and mean COVID-positive spectra (n = 16, at least 9 spectra per volunteer) are shown in red. Bottom frame shows the standardized Raman spectra, where each individual feature has zero mean and unit variance. (B) Receiver operating characteristic (ROC) curve for these models with sensitivity and specificity.
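Sensitivity and specificity, as reported alongside the ROC curves, correspond to a single operating point on the curve. The sketch below shows how both follow from classifier scores at a chosen threshold. The labels, scores, and threshold are made-up illustrative values, not data from the study.

```python
import numpy as np

def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(predicted & labels)      # positives correctly called positive
    fn = np.sum(~predicted & labels)     # positives missed
    tn = np.sum(~predicted & ~labels)    # negatives correctly called negative
    fp = np.sum(predicted & ~labels)     # negatives wrongly called positive
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative values (assumption): 1 = COVID-positive, 0 = COVID-negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(sens, spec)  # 0.75 0.75
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 minus specificity traces out the full ROC curve.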

Fig. S15.
Machine learning model discriminating between female and male saliva supernatant from COVID-negative volunteers using (A-C) "edge" and (D-F) "on crystal" Raman spectra from dried droplets: (A, D) Upper frame shows SNV-normalized, baseline-corrected Raman spectra from all volunteers. Variance is shown by pale lines (variance of mean spectrum from each individual) and the main features used in model building are designated by dotted lines. Mean female spectra (n = 18, at least 9 spectra per volunteer) are shown in black and mean male spectra (n = 20, at least 9 spectra per volunteer) are shown in red. Bottom frame shows the standardized Raman spectra, where each individual feature has zero mean and unit variance. (B, E) Receiver operating characteristic (ROC) curve for these models with sensitivity and specificity. (C, F) List of features used in model building and their assignments as determined using compounds in model saliva and from literature.

Fig. S16.
Machine learning model discriminating between respiratory and non-respiratory saliva supernatant from volunteers using (A-B) "edge", (C-D) "on crystal" and (E-F) "off crystal" Raman spectra from dried droplets: (A, C, E) Upper frame shows SNV-normalized, baseline-corrected Raman spectra from all volunteers. Variance is shown by pale lines (variance of mean spectrum from each individual) and the main features used in model building are designated by dotted lines. Mean non-respiratory spectra (n = 23, at least 9 spectra per volunteer) are shown in black and mean respiratory spectra (n = 44 for edge, 43 for on crystal; at least 9 spectra per volunteer) are shown in red. Bottom frame shows the standardized Raman spectra, where each individual feature has zero mean and unit variance. (B, D, F) Receiver operating characteristic (ROC) curve for these models with sensitivity and specificity.