Molecular component distribution imaging of living cells by multivariate curve resolution analysis of space-resolved Raman spectra

Abstract. Label-free Raman microspectroscopy combined with multivariate curve resolution (MCR) analysis can be a powerful tool for studying a wide range of biomedical molecular systems. MCR with the alternating least squares technique (MCR-ALS), which retrieves pure component spectra from heavily overlapped spectra, has been successfully applied to in vivo, molecular-level analysis of living cells. The principles of the MCR-ALS analysis are reviewed with a model system of titanium oxide crystal polymorphs, followed by two examples of in vivo Raman imaging studies of living yeast cells (fission yeast and budding yeast). Owing to the non-negative matrix factorization algorithm used in the MCR-ALS analysis, the spectral information derived from this technique is directly amenable to physical and/or chemical interpretation. The corresponding concentration profiles provide the molecular component distribution images (MCDIs) that are vitally important for elucidating life at the molecular level, as stated by Schrödinger in his famous book, “What Is Life?” Without any a priori knowledge of spectral profiles, time- and space-resolved Raman measurements of a dividing fission yeast cell analyzed with MCR-ALS elucidate the dynamic changes of major cellular components (lipids, proteins, and polysaccharides) during the cell cycle. The MCR-ALS technique also resolves broadly overlapped OH stretch Raman bands of water, clearly indicating the existence of organelle-specific water structures in a living budding yeast cell.


Introduction
Raman microspectroscopy, that is, Raman spectroscopy under a microscope, is now widely used in molecular-level investigations in various fields of bioscience and biotechnology [1-5]. It is well established as a strategic analytical tool in these fields. By its nature, Raman spectroscopy requires no sample pretreatment such as dye labeling or genetic manipulation and is minimally invasive. It is therefore highly suitable for in vivo analysis of living cells. For example, it has high potential for minimally invasive screening of living cells in regenerative medicine, where the safety of re-using screened cells is the most crucial issue. With sub-μm spatial resolution, Raman microspectroscopy provides space-resolved information on molecular structures and their distribution inside the cell. In the past two decades, a number of publications have shown the successful application of Raman microspectroscopy to label-free molecular-level analysis of living cells and to the discrimination of cell types [6-23].
Although Raman spectra contain rich information on molecular structure, detailed interpretation of measured spectra is often difficult because of their complexity. Each Raman spectrum obtained from space-resolved mapping measurements is usually interpreted as a superposition of several spectral components of biomolecules, together with background and fluorescence contributions. To decompose such complicated spectra into tractable component spectra, a number of chemometric methods have been developed and applied to the analysis of Raman spectra and images. Cluster analysis (CA) is one of the widely used decomposition methods [9,24-28]. The CA method statistically analyzes the spectral variations to yield distinct subsets of similar spectra. It has been applied to diagnostic tissue discrimination [24] and subcellular structure imaging [9]. In the CA method, the number of clusters is important [27,28]. A small number of clusters may result in false allocation of some raw spectra, whereas a large number of clusters is often needed to achieve a relevant segmentation, which complicates the interpretation. Principal component analysis (PCA) is also widely used [25,27-31]. This method gives an orthogonal set of dominant spectral components, called principal components (PCs), as a result of matrix factorization. Each spectrum of the original data can be expressed as a linear combination of PCs. The PCA method has also proved useful for the construction of molecular images from decomposed spectral components [1,31,32] as well as for the discrimination of tissue or cell types [2,13,33,34]. However, the physical basis of this method is rather obscure: the PCs take both positive and negative values, so the decomposed spectral components have no clear physical meaning as they stand.
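The sign problem of orthogonal components noted above can be illustrated with a short sketch. It is not taken from the studies cited here: the two Gaussian bands, the random mixing coefficients, and the helper names `gaussian_band` and `first_two_components` are all hypothetical, and an uncentered SVD is used as a stand-in for PCA. Because the first singular vector of a positive data matrix has a single sign, any component orthogonal to it must take both positive and negative values.

```python
import numpy as np

def gaussian_band(x, center, width):
    """Single non-negative Gaussian band used as a stand-in spectrum (assumption)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def first_two_components(A):
    """Return the two leading left singular vectors of A (m x n), uncentered."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, 0], U[:, 1]

# Synthetic, purely illustrative data: two overlapping bands mixed at random.
x = np.arange(200.0)
W_true = np.column_stack([gaussian_band(x, 80, 12), gaussian_band(x, 110, 12)])
rng = np.random.default_rng(0)
H_true = rng.random((2, 300))          # non-negative mixing coefficients
A = W_true @ H_true                    # every measured "spectrum" is non-negative

u1, u2 = first_two_components(A)
# The second orthogonal component cannot be a physical spectrum:
# it has both positive and negative lobes.
has_mixed_signs = bool((u2 > 0).any() and (u2 < 0).any())
print("second component has mixed signs:", has_mixed_signs)
```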
Recently, the multivariate curve resolution-alternating least squares (MCR-ALS) method, also known as self-modeling curve resolution or non-negative matrix factorization [35], has been developed and applied to the spectral decomposition of Raman spectra [25,29,31,36-41]. It has also been exploited in many other analytical techniques, such as high-performance liquid chromatography [42,43], gas chromatography/mass spectrometry [44], UV-VIS [45], near-infrared [46,47], FT-IR [48,49], and fluorescence imaging [50]. In the MCR method, the experimental data are approximated by a linear combination of several spectral components [51,52]. The superposed spectral data sets are decomposed by ALS calculation under appropriate model constraints, such as non-negativity of the spectral profiles and their concentrations. Owing to these constraints, the method readily provides physically interpretable spectral components without any a priori information on the chemical components in the sample, such as a living cell. Using this advantage, we have successfully applied the MCR-ALS method to molecular component distribution imaging (MCDI) in living cells, whose raw spectra contain a number of unknown spectral components and are hard to interpret without a priori information. We have developed in-house software for the MCR-ALS analysis of large Raman spectral data sets.
In the following, we first show the capability of the MCR-ALS method by using a model system consisting of two crystal polymorphs of titanium oxide (TiO₂), anatase and rutile. MCDI of the TiO₂ polymorphs is successfully obtained. Then, we show the MCR-ALS analysis of the time- and space-resolved Raman spectra of a dividing fission yeast cell. Unexpected dynamic changes of major cellular molecular components (lipids, proteins, and polysaccharides) during the cell cycle have been elucidated. Finally, we show the results of a study of intracellular water in a living budding yeast cell. We have successfully resolved hitherto unknown organelle-specific water structures by the MCR-ALS method. Highly important and otherwise unobtainable MCDI information has thus been obtained for the two living cell systems in vivo.

Method
In MCR, the experimental data are approximated by a linear combination of several spectral components. In matrix form, this approximation can be written as A ≈ WH (1), where A is an m × n experimental data matrix of spectra acquired at different measuring points, m is the number of data points per spectrum along the wavenumber axis, and n is the number of spectra in the whole data set. A is decomposed into an m × k matrix W, whose columns represent pure component spectra, and a k × n matrix H, whose rows represent the intensity profiles of the corresponding spectral components. k is the number of underlying constituents, which should be set a priori by the user or estimated by singular value decomposition (SVD) or PCA. In an MCR analysis, W and H are usually obtained by iterative refinement with ALS so that the squared Frobenius norm ‖A − WH‖² is minimized under the non-negativity constraints W ≥ 0 and H ≥ 0. These constraints reflect the physical fact that Raman spectra and their concentration profiles must be non-negative. Unlike other factorization methods such as SVD and PCA, MCR-ALS does not require orthogonality of the components but only their non-negativity. As a result, MCR-ALS provides solutions that are more straightforward to interpret. In practice, the initialization method and any additional constraints should be determined in advance. Several initialization methods have been proposed [25,53-57]. For example, the initial guess of the matrix W or H can be set with random non-negative values, by an SVD-based method, or by SIMPLISMA. The number of components, k, is determined from a priori information on the sample species or is estimated from the number of dominant singular values in an SVD analysis. The initialization method should be chosen appropriately according to the variance of the spectral data set and/or the signal-to-noise ratio.
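The alternating scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the in-house software mentioned in the Introduction: the function name `mcr_als`, the use of unconstrained least squares followed by clipping to enforce non-negativity, the random initialization, and the synthetic two-band test data are all assumptions made for the example.

```python
import numpy as np

def mcr_als(A, k, n_iter=200, seed=0):
    """Minimal non-negative ALS sketch: A (m x n) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))                      # random non-negative initial guess
    for _ in range(n_iter):
        # Fix W, solve W H = A for H in the least-squares sense, clip negatives.
        H = np.linalg.lstsq(W, A, rcond=None)[0]
        H = np.clip(H, 0.0, None)
        # Fix H, solve H^T W^T = A^T for W, clip negatives.
        W = np.linalg.lstsq(H.T, A.T, rcond=None)[0].T
        W = np.clip(W, 0.0, None)
    return W, H

# Illustrative synthetic data (assumption): two well-separated Gaussian bands.
x = np.arange(200.0)
pure = np.column_stack([np.exp(-0.5 * ((x - 60) / 8) ** 2),
                        np.exp(-0.5 * ((x - 140) / 8) ** 2)])
conc = np.random.default_rng(1).random((2, 100))
A_demo = pure @ conc

W_fit, H_fit = mcr_als(A_demo, k=2)
rel_err = np.linalg.norm(A_demo - W_fit @ H_fit) / np.linalg.norm(A_demo)
print("relative residual:", rel_err)
```

Clipping after an unconstrained solve is the simplest way to impose W ≥ 0 and H ≥ 0; a dedicated non-negative least-squares solver would be a stricter alternative.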
To attain sufficient decomposition ability, further constraints on the ALS optimization, in addition to the non-negativity constraints, can be helpful [58,59]. When W is ill-conditioned or singular, that is, when W contains similar spectral components, H can be strongly affected by the noise of the raw data A even though the optimization of Eq. (1) is achieved. Additional constraint terms can therefore be applied to the ALS optimization: (‖A − WH‖² + ‖ΓH‖²) is minimized instead of ‖A − WH‖². In practice, an L2-norm (ridge regression) and/or L1-norm (lasso regression) penalty term can be used for this purpose. With L2-norm penalty term β, the following equations are solved: where I is a k × k identity matrix. L1-norm penalty term α can be applied as follows: where E is a k × k matrix all of whose elements are unity. These equations are iteratively solved to obtain the optimized matrices H and W, respectively. The L2-norm regularization can provide preferable solutions even when the WᵀW or HHᵀ matrix is singular, whereas the L1-norm regularization can provide sparser solutions. These L2- and L1-norm regularizations are effective in obtaining pure spectral decomposition and sparse MCDI, especially from complex sample species and Raman spectral data sets with a low signal-to-noise ratio.
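One common way to realize such penalties, consistent with the definitions of I and E given above, is to add βI and/or αE to the normal-equation matrices WᵀW and HHᵀ. The sketch below assumes exactly that form, namely (WᵀW + βI + αE)H = WᵀA and (HHᵀ + βI + αE)Wᵀ = HAᵀ; the precise equations, the scaling of α and β, and whether both penalties enter both updates are assumptions made for illustration. The function name `regularized_als_step` is hypothetical.

```python
import numpy as np

def regularized_als_step(A, W, alpha=0.0, beta=0.0):
    """One ALS sweep with assumed ridge (beta * I) and sparsity (alpha * E) terms.

    A: m x n data, W: m x k current spectra. Returns updated (W, H).
    With beta > 0 the k x k system matrices are positive definite,
    so the solve succeeds even if W^T W or H H^T is singular.
    """
    k = W.shape[1]
    I = np.eye(k)              # k x k identity (L2 / ridge term)
    E = np.ones((k, k))        # k x k all-ones matrix (L1 / sparsity term)
    penalty = beta * I + alpha * E

    # H update: (W^T W + beta I + alpha E) H = W^T A, then enforce H >= 0.
    H = np.linalg.solve(W.T @ W + penalty, W.T @ A)
    H = np.clip(H, 0.0, None)

    # W update: (H H^T + beta I + alpha E) W^T = H A^T, then enforce W >= 0.
    W_new = np.linalg.solve(H @ H.T + penalty, H @ A.T).T
    W_new = np.clip(W_new, 0.0, None)
    return W_new, H
```

Calling this function repeatedly, feeding the returned W back in, gives a regularized variant of the basic alternating scheme. With α = β = 0 it reduces to the unregularized normal equations and fails when W contains identical columns, which is the situation the ridge term is meant to handle.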

Example of TiO₂ Crystal Polymorph Discrimination
Here, as a model system, the discrimination of two TiO₂ crystal polymorphs, anatase and rutile, is performed using the MCR-ALS method. Anatase and rutile TiO₂ powders were mixed and placed on quartz cover slips. Space-resolved Raman spectra were obtained by using a Raman microspectroscopic system with 785-nm excitation. Raman spectra of the powder mixture were collected over a 20 × 20 μm region with a 0.25-μm scanning interval (80 × 80 points); hence, 6400 spectra were obtained. Each Raman spectrum consisted of 1340 wavenumber points corresponding to the 1340 elements of a charge-coupled device detector. Figure 1 shows representative Raman spectra. These spectra have a number of overlapped Raman bands and are interpreted as the superposition of the two intrinsic Raman spectra of anatase and rutile TiO₂ shown in Fig. 2.
In the present MCR-ALS analysis, the Raman spectra were combined into a 1340 × 6400 matrix. The number of pure spectral components was set to k = 2, and the initial guess of the 1340 × 2 matrix W was set with random numbers. In the ALS optimization, the constraints were set as follows: (1) W ≥ 0 and H ≥ 0, (2) the L2-norm penalty term in Eq. (2) was set to β = 0.002, and (3) the L1-norm penalty term in Eq. (6) was set to α = 0.002. These equations were iteratively solved with a non-negative matrix factorization algorithm, and in every iteration step, the column vectors of the matrix W were all normalized [60]. After 500 iteration cycles, by which point ‖A − WH‖² had converged to a sufficiently small constant value, the pure spectral components were obtained as shown in Fig. 3.
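The per-iteration normalization of the columns of W can be sketched as follows. The choice of the Euclidean norm and the compensating rescaling of the rows of H (so that the product WH is unchanged) are assumptions for illustration; the text states only that the column vectors of W were normalized. The helper name `normalize_columns` is hypothetical.

```python
import numpy as np

def normalize_columns(W, H):
    """Scale each column of W to unit Euclidean norm and rescale H to keep W @ H fixed."""
    norms = np.linalg.norm(W, axis=0)
    norms = np.where(norms > 0, norms, 1.0)   # leave all-zero columns untouched
    W_unit = W / norms                        # each column now has norm 1
    H_scaled = H * norms[:, None]             # row j of H absorbs the factor of column j
    return W_unit, H_scaled
```

Normalizing in this way removes the scale ambiguity between W and H, so that intensities in H for different components are expressed relative to unit-norm spectra.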
The MCR-ALS-based factorization successfully decomposes the raw spectral data set into two pure component spectra; that is, the spectra of components 1 and 2 are identical to the Raman spectra of anatase and rutile, respectively [Figs. 2 and 3(a)]. Based on this discrimination, each MCDI is constructed by rearranging the corresponding row vector of H into a two-dimensional image. As shown in Fig. 3(c), a clear distribution image has been obtained, providing qualitative and quantitative information about the TiO₂ mixture sample. As shown here, the MCR-ALS technique automatically resolves, without a priori spectral information, the observed set of space-resolved Raman spectra into physically interpretable spectra of the two polymorphs.
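Rearranging the rows of H into images is a simple reshape once the scan grid is known. The sketch below assumes a row-major raster scan, which is not stated in the text, and uses a random placeholder for H; the helper name `concentration_maps` is hypothetical. The grid size follows from the 20 μm field and the 0.25 μm step.

```python
import numpy as np

def concentration_maps(H, n_rows, n_cols):
    """Reshape H (k x n) into k images of n_rows x n_cols (row-major scan assumed)."""
    k, n = H.shape
    if n != n_rows * n_cols:
        raise ValueError("number of spectra does not match the scan grid")
    return H.reshape(k, n_rows, n_cols)

side = round(20 / 0.25)            # 80 points per side
n_spectra = side * side            # 6400 spectra in total

# Placeholder concentration profiles for k = 2 components (illustrative only).
H_demo = np.random.default_rng(0).random((2, n_spectra))
maps = concentration_maps(H_demo, side, side)
print(maps.shape)                  # (2, 80, 80)
```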

Analysis of Living Cells
Taking advantage of the physically interpretable factorization, the MCR-ALS technique can be effectively applied to molecular-level analysis of living cells. Living cells are highly complicated molecular systems and contain a large number of spectral components for which no a priori information is available. This is particularly the case with in vivo analysis. Furthermore, they contain many compounds that have similar molecular structures. Consequently, their Raman spectra tend to show many overlapped bands, which makes it difficult to analyze living-cell Raman spectra at a detailed molecular level. The MCR-ALS analysis has great advantages in overcoming these difficulties: (1) It does not need a priori information on the spectral and concentration profiles. (2) The decomposed spectra are ready for physical and/or chemical interpretation. (3) Sparseness constraints can be effectively used to achieve high-contrast MCDI.
The MCR-ALS analysis is capable of extracting dynamic information from living cells. Figure 4 shows the time-lapse MCDI of a single dividing fission yeast (Schizosaccharomyces pombe) cell [61]. Raman mapping measurements were performed at 600 to 800 points (depending on the image size) at an interval of 0.5 μm and at nine different times (1, 2, 4, 6, 6.5, 10, 14, 18, and 22 h after inoculation of the yeast cells into the medium) in the cell cycle. The resultant 6885 Raman spectra were assembled to construct one A matrix; that is, the two spatial dimensions and the one temporal dimension were combined into a single dimension. The ALS optimization was conducted with an SVD-based initialization (six SVD spectral components were used as the initial guess of the W matrix) and L1-norm regularization, yielding sparse solutions. The resulting six components are denoted 1 to 6, as given in Figs. 4(a) and 4(b). It should be noted that MCR-ALS optimization started from a random initialization failed to decompose the spectra of this complicated cell system. The initialization of W or H was thus a key step in avoiding a false local minimum rather than the global minimum.
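Combining the spatial and temporal dimensions amounts to concatenating the flattened maps of all time points along the spectrum index. The following sketch shows one way to do this and to cut the resulting concentration profiles back into per-time-point images. The array layout (ny, nx, m), the helper names, and in particular the use of absolute values to make the SVD-based initial guess non-negative are assumptions; the text does not say how the six SVD components were made non-negative.

```python
import numpy as np

def stack_maps(cubes):
    """Flatten a list of (ny, nx, m) Raman maps into one m x n_total matrix."""
    blocks = [c.reshape(-1, c.shape[-1]).T for c in cubes]   # each m x (ny * nx)
    sizes = [b.shape[1] for b in blocks]                     # spectra per time point
    return np.hstack(blocks), sizes

def split_profiles(H, sizes, shapes):
    """Cut H (k x n_total) back into per-time-point maps of shape (k, ny, nx)."""
    pieces = np.split(H, np.cumsum(sizes)[:-1], axis=1)
    return [p.reshape(p.shape[0], ny, nx) for p, (ny, nx) in zip(pieces, shapes)]

def svd_initial_guess(A, k):
    """Use the k leading left singular vectors as a starting W (abs() is an assumption)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return np.abs(U[:, :k])
```

Because all time points share one W, the same component index refers to the same spectrum in every image, which is what makes the time-lapse maps directly comparable.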
With the MCR-ALS method, background signals are easily separated; component 1 is interpreted as due to the background because it shows a featureless spectral profile [ Fig. 4 4 and 5] show time dependence that is totally different from that of lipids. The fact that we have two spectral components for proteins means that we have two groups of protein molecules with different structures (and hence different spectra) that show distinct time and space dependence during the cell cycle. We need additional information on those protein groupings in order to resolve this set of Raman spectra into more physically meaningful protein spectra. The origin of component 6 is still unclear, although the MCR-ALS analysis gives much less clear results without this component. In this way, the intrinsic spectra and MCDI obtained from MCR-ALS have elucidated unknown and unexpected molecular-level dynamics taking place during the process of cell division.
Another study with MCR-ALS shows the existence of organelle-specific water structures in a living budding yeast cell [62]. Water molecules inside a cell are believed to play critical roles in physiological processes, creating distinct structural and chemical properties as compared to bulk water [63]. Detailed structural information on intracellular water in living yeast cells (diploid Saccharomyces cerevisiae) has been obtained using Raman microspectroscopy, in which the OH stretch Raman band of water is sensitive to changes in the hydrogen-bonding network. In the following MCR-ALS analysis, 247 Raman spectra from a mapping measurement, ranging from 3100 to 3800 cm⁻¹ (572 wavenumber points), were used as the input. From an SVD analysis, the number of components was determined to be five. For the initial guess of W, the bulk water spectrum was used as the only "fixed" component, and the other spectral profiles were set randomly. The ALS iteration was performed with L2-norm regularization, which is known to be effective for MCR-ALS analysis of data sets that include similar component spectra.
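Two steps of this setup lend themselves to a short sketch: estimating the number of components from the singular values, and building an initial W in which one column is a known, fixed spectrum. The relative threshold used to count "dominant" singular values, the convention of placing the fixed spectrum in column 0, and the helper names are all assumptions for illustration; the text does not specify the criterion that led to five components.

```python
import numpy as np

def estimate_n_components(A, rel_tol=0.01):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def initial_guess_with_fixed(fixed_spectrum, k, seed=0):
    """Random non-negative W (m x k) whose first column is the fixed spectrum."""
    rng = np.random.default_rng(seed)
    W0 = rng.random((fixed_spectrum.size, k))
    W0[:, 0] = fixed_spectrum
    return W0

def reimpose_fixed(W, fixed_spectrum):
    """Restore the fixed column after each W update so that it never changes."""
    W = W.copy()
    W[:, 0] = fixed_spectrum
    return W
```

In an ALS loop, `reimpose_fixed` would be called after every W update, so that only the remaining k − 1 spectra and all rows of H are refined.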
The resultant five components are shown in Fig. 5. The five pseudocolor MCDIs are almost mutually exclusive, and the five spectral components show varying relative intensities of the OH stretch bands. Component 1 corresponds to bulk water, whose spectrum is fixed. Its spatial distribution shows high values outside the cell. Component 5 has a significantly lower intensity in the 3200 cm⁻¹ OH stretch region relative to the intensity in the 3400 cm⁻¹ region, indicating a lower proportion of stable hydrogen-bonding network in component 5 than in bulk water. From the analysis of the average spectra in the fingerprint region (data not shown here), components 2 to 5 have been indicated to originate from the cell wall, cytoplasm, nucleus, and lipid bodies, respectively. Thus, organelle-specific water structures in living yeast cells are successfully retrieved and elucidated by using the MCR-ALS method. It is well recognized that water is the most difficult molecule to study in living cells. Only the combination of Raman microspectroscopy and MCR-ALS can provide this unique information on organelle-specific water structures in a cell in vivo.

Conclusion
As Raman microspectroscopy has come into use in biomedical analysis, thorough interpretation of complicated spectra has often been beset with difficulties. Even a single cell has a complicated subcellular structure containing a large number of components. Various multivariate methods such as CA and PCA have thus been used in attempts to resolve the observed spectra into pure spectral components, but with only limited success. With appropriate constraints (non-negativity and L1-norm regularization for fission yeast; non-negativity and L2-norm regularization for water), the MCR-ALS method can provide physically sound spectra and high-contrast MCDI of living cells, as shown by the two examples described here.