The pluripotency and ability to proliferate indefinitely make human embryonic stem cells (hESCs) attractive for many applications, including regenerative medicine, tissue engineering, drug screening, and even cancer treatments. Recently, the first clinical trials with hESC derivatives have started to evaluate safety and tolerance in patients with thoracic spinal cord injuries.1 Despite these advances, high-efficiency differentiation and/or enrichment to produce homogeneous populations of the desired cell types remains a challenge. This is of considerable concern since any contaminating undifferentiated hESCs have the potential to give rise to tumor-like growths after transplantation. Indeed, it is well known that undifferentiated hESCs form teratomas when injected into immuno-compromised mice.2 In addition, the presence of unwanted cell types may compromise the suitability of the transplants. For example, transplantation of skeletal muscle cells instead of cardiomyocytes into the left ventricular wall of the heart can cause harmful arrhythmias.3 These concerns highlight the immediate need for noninvasive techniques capable of phenotypic identification and sorting of hESC progeny. Such label-free techniques could deliver high-purity cell populations with a desired phenotype suitable for therapy.
Fluorescence- and magnetic-activated cell sorting (FACS, MACS) are the standard tools of choice in stem cell sorting applications. However, if the cells are to be used for therapeutic purposes, these techniques are restricted to cells that exhibit lineage-specific surface markers. For many cell types, such as cardiomyocytes (CMs), there are no appropriate surface markers; therefore, FACS and MACS are not suitable for providing viable cells as required for clinical applications.
A unique feature of Raman micro-spectroscopy (RMS) is the ability to measure biochemical properties and provide functional imaging of live cells without requiring labeling or sample preparation.4, 5, 6 Compared to lasers in the visible range, a near-infrared laser (e.g., 785 nm) with power below 150 mW allows measurement of Raman spectra from live cells with a reduced probability of inducing photodamage. Thus, cells can be investigated over extended periods of time under sterile physiological conditions (culture medium, temperature, etc.).7 RMS was used to discriminate between undifferentiated hESCs and hESC-derived CMs with 66% accuracy.8 More recently, it was also demonstrated that RMS can identify spectral markers that can be used for identification of CMs within highly heterogeneous cell populations as derived during differentiation of hESCs toward a cardiac phenotype.9 The Raman spectral markers consisted of bands attributed mainly to glycogen and myofibril proteins, enabling classification of hESC-derived CMs with >97% specificity and >96% sensitivity against the immuno-fluorescence staining gold standard (alpha-actinin).9
However, the main limitation of the RMS technique is the low efficiency of the Raman scattering effect. Acquisition times of 2 to 10 min per cell have been reported,8, 9 which are not practical for cell sorting applications. In this paper we investigated the feasibility of two measurement strategies that could dramatically reduce the acquisition time. These strategies are based on two key factors: the intracellular spatial distribution of the Raman spectral markers and the level of uncorrelated measurement noise in the Raman spectra. These factors affect both the measurement time and the prediction accuracy of RMS. The current analysis is performed on a database of Raman spectra collected from 50 CMs and 40 non-CMs derived from hESCs and maintained under physiological conditions during the measurements.9 The measurements were carried out on more than 20 cell culture flasks, and both CMs and non-CMs were measured for each flask.
The first objective of this paper was to carry out an analysis that quantifies the effect of noise in the Raman spectra of individual cells on the prediction accuracy for the cardiac phenotype. For this purpose, the principal component analysis (PCA) model described previously9 was modified to determine the effect of acquisition time for Raman spectra on the classification sensitivity and specificity. For each cell included in the model, the Raman spectrum was calculated by averaging point spectra measured by raster scanning the cell through the laser focus at 2 μm steps (total of 625 data points) with an integration time of 1 s per point. These Raman spectra had a high signal-to-noise ratio (SNR) but required a total acquisition time of 10 min per cell. However, for prediction purposes, the sorting time could be reduced considerably if the Raman spectra of the sorted cells were acquired at the lowest acquisition time (lowest SNR) which would still provide the required classification accuracy (e.g., 95% sensitivity/specificity). Such a model is valid in a cell sorting scenario, where the prediction model is constructed well in advance using a database of high SNR spectra, while the sorting is performed at a high speed on low SNR Raman spectra.
To evaluate the effect of the noise in the Raman spectra of sorted cells, a 5-fold cross-validation (CV) algorithm was applied to the modified PCA model: 80% of the Raman spectra (10 min per cell acquisition time) were used for building a PCA model, while the remaining 20% of the Raman spectra, corresponding to acquisition times of 1 to 10 s per cell, were used for prediction. Raman spectra corresponding to a typical CM and a non-CM are shown in Fig. 1a, alongside the spectrum corresponding to the Raman spectral marker of CMs identified by the PCA model.9 Figure 1b shows the Raman spectrum of the same CM but with various signal-to-noise ratio values to simulate acquisition times between 1 and 20 s per cell. The main Raman bands at 860, 938, 1084, and 1123 cm−1 can be visually identified even in the spectrum with the shortest acquisition time of ∼1 s per cell.
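The noise-degradation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used for Fig. 1b: it assumes that the uncorrelated noise on an intensity-normalised spectrum scales as the inverse square root of the acquisition time, and that the noise can be approximated as Gaussian. The function names, the toy spectrum, and the noise level of 0.01 are hypothetical.

```python
import math
import random


def noise_sd_at(noise_sd_full, t_full, t_short):
    """Noise standard deviation expected at a shorter acquisition time.

    Assumption: noise on a normalised spectrum scales as 1/sqrt(time).
    """
    if not 0 < t_short <= t_full:
        raise ValueError("t_short must be in (0, t_full]")
    return noise_sd_full * math.sqrt(t_full / t_short)


def simulate_short_acquisition(spectrum, noise_sd_full, t_full, t_short, rng):
    """Add uncorrelated Gaussian noise so the total noise matches t_short."""
    sd_short = noise_sd_at(noise_sd_full, t_full, t_short)
    # Noise already present adds in quadrature with the noise injected here.
    extra_var = max(0.0, sd_short ** 2 - noise_sd_full ** 2)
    extra_sd = math.sqrt(extra_var)
    return [y + rng.gauss(0.0, extra_sd) for y in spectrum]


rng = random.Random(0)                 # fixed seed for reproducibility
high_snr = [0.0, 0.2, 1.0, 0.2, 0.0]   # toy spectrum, not measured data
# Degrade a 600 s (10 min) spectrum to the noise level of a 6 s acquisition.
noisy = simulate_short_acquisition(high_snr, 0.01, 600.0, 6.0, rng)
```

In a cross-validation loop, a degradation of this kind would be applied only to the held-out 20% of cells, while the model is built on the unmodified high-SNR spectra.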
The CV was performed for various acquisition times. The specificity and sensitivity for discrimination of CMs at a target 100% specificity are shown in Fig. 2. The gray shaded areas represent the ± standard deviation of the sensitivity/specificity as calculated using different partitions for the CV. The specificity of 97.54% and sensitivity of 96.3%, depicted as horizontal lines, were obtained when the acquisition time of the Raman spectra for the prediction cells was similar to the acquisition time for the Raman spectra of the cells used for building the classification model. Despite imposing a highly specific regime with target 100% specificity, as required for regenerative medicine (to achieve cell populations with high levels of phenotypic purity), the spectral model shows a high resilience to noise down to an acquisition time of 6 s per cell; at shorter acquisition times, the average predicted specificity drops below the 95% mark. However, the prediction sensitivity remains mostly unchanged regardless of the noise level.
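As an illustration of how sensitivity and specificity can be evaluated at a chosen target specificity, the sketch below assumes that each cell is reduced to a single scalar score (for example, a principal component score) and that higher scores indicate CMs. Both the single-score simplification and the toy scores are assumptions made for this example and are not taken from the PCA model used in the paper.

```python
import math


def threshold_for_specificity(non_cm_scores, target):
    """Lowest observed score that rejects at least `target` of the non-CMs."""
    ranked = sorted(non_cm_scores)
    # Number of non-CMs that must fall at or below the threshold.
    k = max(1, math.ceil(target * len(ranked)))
    return ranked[k - 1]


def sensitivity_specificity(cm_scores, non_cm_scores, threshold):
    """Classify a cell as CM when its score exceeds the threshold."""
    true_pos = sum(s > threshold for s in cm_scores)
    true_neg = sum(s <= threshold for s in non_cm_scores)
    return true_pos / len(cm_scores), true_neg / len(non_cm_scores)


# Toy scores for illustration only.
non_cm = [0.1, 0.2, 0.3, 0.4]
cm = [0.35, 0.5, 0.6, 0.9]

strict = threshold_for_specificity(non_cm, 1.0)     # target 100% specificity
sens_strict, spec_strict = sensitivity_specificity(cm, non_cm, strict)

relaxed = threshold_for_specificity(non_cm, 0.75)   # target 75% specificity
sens_relaxed, spec_relaxed = sensitivity_specificity(cm, non_cm, relaxed)
```

In a cross-validation setting the threshold would be fixed on the training cells and then applied to the noisier prediction spectra, so the specificity achieved on the prediction cells can fall below the target.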
Apart from acquisition time, the intracellular distribution of the Raman spectral markers can also affect the predictive performance of the classification model. Ideally, the measurement system should be optimized such that the sampling volume matches the size of the cells (∼20 μm). However, such conditions are difficult to achieve in practice. A common approach is to use a low numerical aperture objective; however, since the depth of the laser focus becomes larger than the thickness of the cell, the laser power density interacting with the cell molecules decreases and the signal-to-noise ratio of the Raman spectra becomes lower. To avoid under-sampling when using high numerical aperture objectives, a common practice involves raster scanning the cells through the laser focus at step sizes similar to the laser spot (2 μm), requiring ∼600 points to ensure full sampling. However, if the Raman spectral markers are distributed inside the cells in regions larger than the laser spot, the number of steps and the acquisition time could be reduced.
A spectral angle mapping algorithm10 was used to determine the spatial distribution of the Raman spectral marker for individual CMs. The maps corresponding to the Raman spectral marker show three typical spatial distributions (Fig. 3). Approximately 50% of CMs showed a circular distribution of the Raman marker around a focal center. For the remaining CMs, the Raman spectral marker was either concentrated in one place at one extremity of the cell (∼30% of CMs) or split up into several smaller regions of higher concentration toward the edge of the cells. Point Raman spectra from selected positions inside the three cells are also shown in Fig. 3. Raman bands corresponding to proteins (853 cm−1, assigned to C–C in proline and the ring-breathing mode in tyrosine, and 936 cm−1, associated with C–C stretching of protein backbones) can be identified at cell positions where the alpha-actinin staining indicates a high concentration of myofibrils. Raman bands corresponding to DNA (O–P–O phosphodiester bonds at 788 cm−1) can also be identified in the point Raman spectra corresponding to the cell nuclei (DAPI staining). Certain cells also show regions for which the Raman spectra have strong bands associated with lipids (C=C stretching at 1659 cm−1, CH2 deformation at 1441 cm−1, CH2 twisting at 1303 cm−1); these cytoplasmic regions are likely to correspond to lipid bodies.
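The angle-mapping step can be illustrated with the standard spectral angle measure, in which each point spectrum is treated as a vector and compared with a reference spectrum (here, the Raman spectral marker). The sketch below assumes that the algorithm of Ref. 10 is of this general kind; the function names and the nested-list layout of the raster scan are hypothetical.

```python
import math


def spectral_angle(spectrum, reference):
    """Angle in radians between two spectra treated as vectors.

    Small angles mean the spectral shapes are similar, independent of
    overall intensity.
    """
    dot = sum(a * b for a, b in zip(spectrum, reference))
    norm = math.sqrt(sum(a * a for a in spectrum)) * math.sqrt(
        sum(b * b for b in reference)
    )
    if norm == 0.0:
        raise ValueError("spectra must not be all zeros")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def angle_map(raster, reference):
    """Spectral angle at every position; raster[row][col] is one spectrum."""
    return [[spectral_angle(s, reference) for s in row] for row in raster]
```

Positions with a small angle to the marker spectrum would then appear as the marker-rich regions of the cell in a map such as those in Fig. 3.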
Based on the Raman spectral marker distributions in Fig. 3, focusing the laser beam in a line across the cell and focusing the Raman scattered photons on the entrance-slit of the spectrometer6, 7, 8, 9, 10, 11 would significantly reduce acquisition time as no raster scanning of the cell or laser spot would be required. This approach is also better suited for cells passing continuously through micro-fluidic channels as in flow cytometry. To determine the feasibility of this method, laser-line sampling was simulated by selecting only 10 Raman spectra corresponding to a single row in the original raster scans (the equivalent acquisition time for such line Raman spectra is 10 s). The PCA model was modified and a 5-fold CV was carried out to evaluate the classification accuracy if each cell in the classification group were scanned only in a single line across the cell at a randomly chosen position. The CV model suggests that this sampling method would enable identification of CMs with 93.62±3.18% specificity and 90.36±3.22% sensitivity at a target specificity of 99%.
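The simulated laser-line sampling can be sketched as follows: one row of point spectra is taken from the raster scan at a randomly chosen position and averaged into a single line spectrum. The nested-list data layout, the function names, and the toy values are assumptions for illustration; averaging the row is one plausible way of combining the selected spectra and is not confirmed by the text.

```python
import random


def line_spectrum(raster, row):
    """Average the point spectra of one raster row into a line spectrum.

    raster[row][col] is a list of intensities (one point spectrum).
    """
    spectra = raster[row]
    n = len(spectra)
    # zip(*spectra) groups the intensities of each wavenumber channel.
    return [sum(channel) / n for channel in zip(*spectra)]


def random_line_spectrum(raster, rng):
    """Pick a row at random, as if the laser line crossed the cell there."""
    row = rng.randrange(len(raster))
    return row, line_spectrum(raster, row)


# Toy 2 x 2 raster with two-channel spectra, for illustration only.
raster = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
row, spectrum = random_line_spectrum(raster, random.Random(0))
```

Repeating the random row selection for each cell in the prediction group mimics cells crossing the laser line at uncontrolled positions.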
As it appears that glycogen could be used for discrimination of hESC-derived CMs, measurements have been carried out in the lower frequency spectral range, where glycogen has a strong band at 482 cm−1 while proteins show no contributions. The Raman spectral map of PC1 and the map corresponding to the 482 cm−1 band area are very similar in hESC-derived CMs (Figs. 4a, 4b, 4c). In addition, this band is not observed in the Raman spectra of non-CMs. Figure 4d shows that the 482 cm−1 band can be visually identified in a Raman spectrum measured in a line scan (cell scanned through the laser spot) in only 1 s. Considering that this band is isolated from the other major Raman bands of the cells, it may be used in a single-wavelength detection setup, which could greatly simplify and speed up the technique.
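A band-area measure of the kind used for the 482 cm−1 map can be sketched as below. The integration window (470 to 494 cm−1), the linear baseline drawn between the window edges, and the toy spectrum are assumptions chosen for illustration and are not specified in the text.

```python
def band_area(wavenumbers, intensities, lo, hi):
    """Baseline-corrected area of a band between lo and hi (cm^-1).

    A straight baseline is drawn between the first and last points in the
    window, and the remaining area is integrated with the trapezoid rule.
    """
    pts = [(x, y) for x, y in zip(wavenumbers, intensities) if lo <= x <= hi]
    if len(pts) < 2:
        raise ValueError("window must contain at least two points")
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    slope = (y1 - y0) / (x1 - x0)

    def corrected(x, y):
        return y - (y0 + slope * (x - x0))

    area = 0.0
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        area += 0.5 * (corrected(xa, ya) + corrected(xb, yb)) * (xb - xa)
    return area


# Toy spectrum with a peak at 482 cm^-1 on a flat background.
wn = [470.0, 476.0, 482.0, 488.0, 494.0]
with_band = [1.0, 1.0, 3.0, 1.0, 1.0]
without_band = [1.0, 1.0, 1.0, 1.0, 1.0]

area_cm_like = band_area(wn, with_band, 470.0, 494.0)
area_non_cm_like = band_area(wn, without_band, 470.0, 494.0)
```

Because the band is isolated, a single scalar of this kind could in principle stand in for the full spectrum in a single-wavelength detection scheme.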
The results in this paper suggest that high classification accuracy (>95% specificity and sensitivity) for hESC-derived cardiomyocytes can be achieved by single measurements of ∼5 s acquisition time per cell if the laser beam were focused to a ∼2 μm×20 μm line. This sampling model ensures detection of the Raman spectral markers specific to hESC-derived CMs and avoids the time-consuming raster scanning. In the future, this acquisition time could be reduced even further if higher power lasers were used. For example, continuous wave Ti:sapphire lasers with output power of >5 W at 785 nm are commercially available (e.g., 3900S Spectra-Physics, UK). The output power of such lasers is ∼20 times higher than in the current study; thus, the laser beam could be expanded into a line of 1 μm×800 μm while maintaining the same laser power density as in the current study. The focused laser line could cover ∼40 parallel sorting flow columns, enabling sorting speeds of ∼8 cells per s. Such sorting speeds compare favorably with the speeds of FACS during its developmental phase.12 In addition, the continuous flow of cells through the laser line would also provide an increase in sampling volume, particularly for CMs for which the molecular markers are concentrated in one or several smaller regions at the edge of the cells, thus increasing the discrimination accuracy.
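The throughput estimate above follows from simple arithmetic, reproduced here as a short check. The values are those quoted in the text; the assumptions are that each flow column is one cell diameter (∼20 μm) wide and that each column delivers one cell per acquisition period, ignoring gaps between cells. The variable names are illustrative.

```python
# Values quoted in the text.
line_width_um = 2.0        # current laser line width
line_length_um = 20.0      # current laser line length
acquisition_s = 5.0        # acquisition time per cell
power_gain = 20.0          # higher-power laser relative to current study
cell_size_um = 20.0        # approximate cell size, taken as column width
expanded_width_um = 1.0    # width of the expanded laser line

# Keeping the power density constant lets the illuminated area grow
# in proportion to the laser power.
current_area_um2 = line_width_um * line_length_um
expanded_area_um2 = current_area_um2 * power_gain
expanded_length_um = expanded_area_um2 / expanded_width_um

# Assumption: one flow column per cell diameter along the line,
# one cell per column per acquisition period.
columns = int(expanded_length_um // cell_size_um)
cells_per_second = columns / acquisition_s

print(expanded_length_um, columns, cells_per_second)
```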
The authors acknowledge the financial support from the Biotechnology and Biological Sciences Research Council UK (BB/G010285/1).