*in vivo*fluorescence spectra.

## 1.

## Introduction

Medical diagnostic systems based on fluorescent imaging are envisioned to be noninvasive, easy to use, and cost effective. Fluorescent markers are injected into a patient, and bind specifically to targeted compounds, such as tumors.^{1} Several specific markers can be injected at once, and bind to different compounds or organs; that method is used to survey different biological processes or organs, such the evolution of carcinoma, or, for example, to measure blood flow. The region of interest is illuminated with near-infrared (NIR) light; an optimal wavelength range can be defined between 600 and
$900\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
, where tissue absorption is lower. The excitation wavelength is thus selected to ease the tissue penetration, and to optimally excite the injected markers. Finally, the emitted back fluorescence signal is measured and the fluorescent source is localized. So far, NIR fluorescence imaging is mainly used on small animals where some markers are available for injection, and where the layer of biological tissues to explore does not exceed a few centimeters. In such a case, a biological tissues intrinsic fluorescence, called autofluorescence,^{2} exists, but is insignificant compared to the fluorescent marker signal. To investigate thick media for medical diagnostic applications
$(\cong 4\phantom{\rule{0.3em}{0ex}}\mathrm{cm})$
, the autofluorescence must be considered. As the fluorescence signal becomes exponentially weak with the distance light travels, an auto-fluorescence signal remains constant and becomes a limiting factor. The analysis of a fluorescence signal impaired by autofluorescence may lead to a wrong localization of the markers; the signal must be preprocessed to remove autofluorescence.

Many algorithms to unmix fluorescence spectra and to filter autofluorescence have already been developed and tested on small animal examination equipments: methods such as nonlinear least squares,^{3} spectra subtraction,^{4} principal component analysis (PCA), independent component analysis, (ICA) and singular value decomposition (SVD) were proposed.^{5, 6, 7, 8} In particular, the ICA method requires sources to be statistically independent, which is not appropriate hypothesis for our fluorescence unmixing problem, and SVD considers orthogonal sources and allows negative values. Much *a priori* knowledge about the nature of the sources, such as sparsity or independence, is taken into account in these methods. But a principal that must be considered is nonnegativity. Many real-world data are nonnegative, as fluorescence data are, and the resulting unmixed fluorescence spectra have a physical meaning only when nonnegative.

In 1987, Henry^{9} raised the issue of the nonexistent nonnegativity constraint in the factor analysis algorithms (SVD, PCA, etc.). In light of this observation, many original algorithms were developed. One of them was the positive matrix factorization (PMF), developed by Paatero and Tapper^{10} in 1994, which uses alternative least squares (ALS) to minimize a chosen cost function. Expectation maximization^{11} (EM) also minimizes a cost function by the use of an auxiliary function. Finally, from those methods the nonnegative matrix factorization (NMF) was forged, notably investigated by Paaetero and Tapper, which gained popularity in 2001 through the works of Lee and Seung.^{12}

NMF is a useful matrix decomposition for multivariate data, that differs from the methods already cited (SVD, PCA) in that it forces the matrix factors to be nonnegative. Thanks to this distinctive feature, NMF plays a major role in a wide range of applications, such as in chemometrics (commonly known as self–modeling curve resolution^{13}), bioinformatics,^{14} neuroscience,^{15} text mining,^{16} or spectral analysis.^{17} In 2006, Gobinet
^{18} used the NMF decomposition on spectroscopic data to unmix several pure fluorescence spectra, and later, as a preprocessing step for Raman spectra analysis of biomedical samples.^{19} Since 2008, Xu
^{17} used and Xu and Rice^{20} have a multivariate curve resolution (MCR) method to separate different fluorescence spectra, including the autofluorescence, based on multispectral imaging data. The MCR model is also based on matrix factorization, and constrained to nonnegativity; Xu couple this method to the ALS optimization approach. Their method is probably the closest to our work, but sill differs from experimental setup used (selection of few emission wavelengths, and often several excitation wavelengths while, we use unique excitation wavelength and use the total emission spectrum from
$700\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}1000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
, thanks to an imaging spectrometer) to optimization method suggested (ALS versus multiplicative update rules). As blind source separation problems are often depicted and solved under a matrix factorization form, the result strongly depends on the *a priori* knowledge we introduce. Work on matrix factorization and spectroscopy run by Gobinet
^{18} inspired our work, but fluorescence spectroscopy introduces fluorescent markers and thus *a priori* knowledge that was not taken into account in Gobinet’s work. Xu
^{17} work on *in vivo* small-animal imaging is close to our problem, but still different from deep imaging problem where autofluorescence may have same intensity level than specific fluorescence signal. Finally, as already underlined, both teams use different resolution methods.

For *in vivo* fluorescence spectroscopy, the unmixing problem is referred to as a blind source separation problem since fluorescence spectra may vary according to the fluorescent dye biological environment. Fluorescence spectra to separate are also supposed statistically dependent, which filters out many methods (such as ICA). Finally, the NMF algorithm seems to be, in many ways, more suitable in blind positive spectra separation than other separation methods. We propose to test this method on spectroscopic data and to define a new regularized NMF algorithm that may better suit fluorescence spectroscopy data in particular cases.

First, we introduce the spectra unmixing problem: from a mixed fluorescence signal composed of a known number of sources, we want to obtain separated contributions for each fluorescence source; the NMF decomposition is proposed to unmix fluorescence spectra, and the method is explained. Several NMF algorithms exist, based on diverse criteria to minimize and optimization methods: we present in the second part a classical NMF algorithm that deals with multiplicative update rules.^{12} As with all blind source separation methods, it is impossible to find a unique NMF decomposition. In a second part we study the nonuniqueness issue and suggest regularizations and *a priori* knowledge considerations to refine the solution set. Two axes are examined: the influence of initialization of matrices and regularization on initialization. Both issues are studied on a breast simulation case; a regularized NMF algorithm that selects appropriate initialization or takes into account some *a priori* knowledge on the fluorescence spectra we want to unmix is presented; associate original update rules are also detailed. Finally, we ran our NMF algorithm on experimental *in vivo* data with success. Two fluorescent markers—ICG loaded into nanoparticules (ICG-LNP) and Alexa 750—were placed on mice to simulate marked targets. Adding the autofluorescence signal, the NMF algorithm achieved the separation of three overlapping fluorescent sources. A second example deals with detection of a multidepth target: the same marker (ICG-LNP) at different depths is placed on a mouse, and deeper markers are mixed up with autofluorescence signal. The results of both experiments are presented in the last section.

## 2.

## NMF

## 2.1.

### NMF and Spectroscopy

Let us consider a fluorescence spectrum $\mathbf{m}$ measured in a medium that for example contains two kinds of fluorescence sources: fluorescence spectra ${\mathbf{s}}_{1}$ and ${\mathbf{s}}_{2}$ may overlap, but have distinct emission peaks. Thus, the measured fluorescence $\mathbf{m}$ is a mixture of both sources of the medium; if we call ${a}_{1}$ and ${a}_{2}$ the amount of respectively spectra ${\mathbf{s}}_{1}$ and ${\mathbf{s}}_{2}$ in $\mathbf{m}$ , all those quantities being nonnegative, we can write

## Eq. 1

$$\begin{array}{ccc}\mathbf{m}={a}_{1}{\mathbf{s}}_{1}+{a}_{2}{\mathbf{s}}_{2}=& \left({a}_{1}\phantom{\rule{1em}{0ex}}{a}_{2}\right)& \left(\begin{array}{ccc}{s}_{1,1}& \dots & {s}_{1,{N}_{\lambda}}\\ {s}_{2,1}& \dots & {s}_{2,{N}_{\lambda}}\end{array}\right)=\mathbf{a}\mathsf{S}\\ (1\times {N}_{\lambda})& (1\times 2)& (2\times {N}_{\lambda})\end{array}.$$It can be easily generalized to a mixing model with $P$ distinct sources present in the medium. The fluorescence spectrum $\mathbf{m}$ is now a linear combination of the $P$ sources spectra ${s}_{i},i\u220a(1,P)$ , in quantities ${a}_{i},i\u220a(1,P)$ . Vector $\mathbf{a}=({a}_{1},\dots ,{a}_{P})$ is considered as the weight vector, and $S$ as the spectra matrix, both containing as much elements $P$ as fluorescent sources to separate. The following equation depicts the decomposition obtained for $P$ sources:

We now finally enlarge the example to a set $M$ of ${N}_{s}$ spectra acquired on a ${N}_{\lambda}$ wavelength range, for $P$ sources present in the medium. We have thus:

This reasoning led to a matrix factorization, that only deals with nonnegative components, which brings us close to the NMF definition. NMF proposes to find a couple of matrices
$(\mathsf{A},\mathsf{S})$
with nonnegatives coefficients, whose product approaches the best possible the initial non-negative data
$\mathsf{M}$
. Indeed, the classical NMF definition says^{12}

Given a nonnegative matrix $\mathsf{M}\u220a{\mathbb{R}}^{{N}_{s}\times {N}_{\lambda}}$ , find nonnegative matrix $\mathsf{A}\u220a{\mathbb{R}}^{{N}_{s}\times P}$ and $\mathsf{S}\u220a{\mathbb{R}}^{P\times {N}_{\lambda}}$ such that

where nonnegative matrices are matrices whose all factors are nonnegative.To find the particular matrices
$\mathsf{A}$
and
$\mathsf{S}$
that satisfy Eq. 4, two distinct steps must be considered. First, a distance between
$\mathsf{M}$
and model
$\mathsf{A}\mathsf{S}$
is defined. Then, in a second step, optimizing this distance under the nonnegativity constraint would lead to the a factorization solution. Several “distances” (the Eunclidean distance, the Kullback-Leibler divergence,^{10, 12} etc.) and different optimization methods (ALS, update rules) may be chosen to obtain the NMF decomposition.

## 2.2.

### NMF Algorithm

Many NMF algorithms have been proposed since Paatero and Tapper,^{10} but in 2001, NMF popularity increased after Lee and Seung published two original NMF algorithms,^{12} based on the use of multiplicative update rules to minimize square of the euclidean distance and the Kullback-Leibler divergance between and
$\mathsf{M}$
and
$\mathsf{A}\mathsf{S}$
.

## 2.2.1.

#### Cost function definition

Here, we chose our cost function $F$ as the square of the Euclidean distance between $\mathsf{M}$ and $\mathsf{A}\mathsf{S}$ (Refs. 12, 21), and defined as

## Eq. 5

$$F=\sum _{n=1}^{{N}_{s}}\sum _{\lambda =1}^{{N}_{\lambda}}{({m}_{s\lambda}-\sum _{p=1}^{P}{a}_{sp}{s}_{p\lambda})}^{2}={\Vert \mathsf{M}-\mathsf{A}\mathsf{S}\Vert}^{2}.$$## 2.2.2.

#### Optimization

The following optimization problem is thus considered:

### Problem 1

Find couple $(\mathsf{A},\mathsf{S})$ such as $(\mathsf{A},\mathsf{S})={\mathrm{argmin}}_{(\mathsf{A},\mathsf{S})\u2a7e0}{\Vert \mathsf{M}-\mathsf{A}\mathsf{S}\Vert}_{2}^{2}$ .

Many methods can be implemented to solve this problem. The gradient descent is probably the simplest, and most famous one, but it is also known for a slow convergence. Other faster methods, such as the conjugate gradient, are usually more complicated to implement.^{22} ALS is also commonly used in such problems. In 2001, Lee and Seung^{12} proposed multiplicative update rules to minimize
$F$
: it offers a good compromise between speed and ease of implementation to solve Problem.

### Theorem 1

The distance ${\Vert \mathsf{M}-\mathsf{A}\mathsf{S}\Vert}^{2}$ is nonincreasing under the update rules:

## Eq. 6

$${\mathsf{S}}_{p\lambda}\leftarrow {\mathsf{S}}_{p\lambda}\frac{{\left({\mathsf{A}}^{t}\mathsf{M}\right)}_{p\lambda}}{{\left({\mathsf{A}}^{t}\mathsf{A}\mathsf{S}\right)}_{p\lambda}}\phantom{\rule{1em}{0ex}}{\mathsf{A}}_{np}\leftarrow {\mathsf{A}}_{np}\frac{{\left(\mathsf{M}{\mathsf{S}}^{t}\right)}_{np}}{{\left(\mathsf{A}\mathsf{S}{\mathsf{S}}^{t}\right)}_{np}},$$^{12}

We became interested in these update rules precisely because of their ease of implementation and speed, for which they were initially created. Finally, they are easily convertible if some regularization is required. We thus defined original regularized update rules adapted to our fluorescence data which take into account *a priori* knowledge on the fluorescence spectra.

## 2.3.

### Nonuniqueness of the NMF Decomposition

The chosen cost function $F$ is not jointly convex in matrices $\mathsf{A}$ and $\mathsf{S}$ : there are numerous local minima to the function and nonuniqueness of the NMF factorization. Let us assume a factorization of $\mathsf{M}$ by the product of some matrices $\mathsf{A}$ and $\mathsf{S}$ exists. If we now consider any invertible matrix $\mathsf{T}$ of size $P\times P$ , then a new couple $(\mathsf{A},\mathsf{S})$ of solutions is easily found:

Without any constraint and *a priori* information brought to the optimization step, there is an infinity of factorizations of matrix
$\mathsf{M}$
. The solution range may nevertheless be imposing regularization to the algorithm, as for example non-negativity imposed to
$\mathsf{A}$
and
$\mathsf{S}$
coefficients.^{23}

Techniques to moderate ambiguity of factorization are required. Adding constraints to original cost function
$F$
and incorporating *a priori* information restrains the solutions set, and may be sufficient to solve the NMF problem uniquely.^{24} In next section, we propose to study breast simulations of fluorescence detection and autofluorescence removal by NMF, before to test our algorithm on *in vivo* data in last part of this paper. Studies will focus on initialization problem and use of the *a priori* on fluorescence sources.

## 3.

## Simulation Studies

We chose to run simulations based on breast data: many clinical experiments on this organ enabled us to design a computer breast model with realistic optical properties of tissues. A fake marked tumor is introduced to the model, and consistent modeling fluorescence acquisitions of the simulated breast are obtained, with modified depth of the marked tumor.

## 3.1.

### Definition of Simulated Matrices $\mathsf{A}$ and $\mathsf{S}$

The breast fluorescence acquisition model is composed of an homogeneous autofluorescence distribution and of a specific fluorescence contribution (to mimic the tumor pointed out by an injected fluorescent marker). Signal ratio between healthy tissue and tagged tumor depends on biomarkers. From bibliography, and from experience, ratios from 3 to 15 (Ref. 25) (for more specific-to-tumor markers) are usual, while new generation of activatable fluorescent markers reach ratios from 24 to 180 (Refs. 26, 27, 28, 29), depending on wavelength range, and conditions of experimentation (*ex vivo* or *in vivo*, and site of tumor). We chose for our simulation study a ratio between tumor and healthy tissue approximately equal to 10 when tumor is
$1\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$
deep in tissues.

The specific fluorescence part is the product of a weight vector
${\mathbf{A}}_{1}$
by a fluorescence spectrum
${\mathbf{S}}_{1}$
[see Fig. 1
]. *A priori*, the autofluorescence part is the product of the weight vector
${\mathbf{A}}_{2}$
by a fluorescence spectrum
${\mathbf{S}}_{2}$
[see Fig. 1]. Simulated spectra are Gaussian models chosen close to usually used and observed fluorescence spectra of autofluorescence and fluorescent markers (indocyanine green, for example). Finally the total simulated acquisition is obtained by adding the specific fluorescence and the autofluorescence parts [Fig. 1].

## 3.2.

### Contrast

We introduce straight away the contrast ${c}_{T,N}$ , measured between tumorous area $T$ and normal tissues area $N$ : it characterizes the improvement of tumor detection after autofluorescence removal, on simulation and experimental results. Average intensity of fluorescence signal is measured on both concerned regions of interest (ROIs); $\overline{T}$ and $\overline{N}$ are, respectively, the average intensities in photons per pixel of areas $T$ and $N$ defined in Fig. 1:

The closer to one the contrast value gets, the better will be the detection.## 3.3.

### Optical Parameters

Clinical studies allowed us to have measurement of breast tissues optical parameters at our disposal. In 2001, Cerussi conducted clinical measurements and obtained average reduced scattering and absorption coefficients of breast, depending on wavelength.^{30} From those results, we defined average simulation values of absorption coefficient
${\mu}_{a}$
(in inverse centimeters) and reduced scattering coefficient
${\mu}_{s}^{\prime}$
(in inverse centimeters) on wavelength range
$700\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}1000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
(see Fig. 2
).

## 3.4.

### Light Propagation

Propagation of light in turbid media has been extensively discussed. To simulate decrease of intensity emitted by fluorescent markers embedded in diffusive tissues, and to estimate evolution of contrast between specific fluorescence and autofluorescence depending on depth of fluorescent markers in tissues, classical diffusion approximation is considered. Thus, for homogeneous medium and continuous illumination, the photon density $\varphi $ (in $\mathrm{W}\phantom{\rule{0.2em}{0ex}}{\mathrm{m}}^{-2}$ ) satisfies the following derivative equation:

## Eq. 9

$$\Delta \varphi \left(r\right)-k{\left(\lambda \right)}^{2}\varphi \left(r\right)=-\frac{S\left(r\right)}{D\left(\lambda \right)}$$Wavelength-dependent absorption and reduced scattering coefficients of breast tissue are responsible for modification of markers emission spectrum. After fluorescent markers are excited, the emitted photons propagate back in tissues following Eq. 9 to finally reach detectors, as depicted in Fig. 3
. Thus, the emission spectrum of detected fluorescent markers
${S}_{d}\left(\lambda \right)$
varies from *ex vivo* fluorescence spectrum
$S\left(\lambda \right)$
:

## Eq. 11

$${S}_{d}\left(\lambda \right)\propto S\left(\lambda \right)\frac{\mathrm{exp}[-k\left(\lambda \right){r}_{\mathrm{md}}]}{4\pi D\left(\lambda \right){r}_{\mathrm{md}}},$$We can now simulate detected fluorescence signal, depending on depth $r$ of markers in breast tissues. Decreasing intensity and spectral distortions resulting from the emitted markers light travel in tissues are shown in Fig. 4 for markers moved from the surface to $10\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$ deep in breast tissues.

## 3.5.

### Source Number Determinacy

When running NMF on our spectroscopic data, we assume that the number of sources we are looking for is equal to the number of the different specific markers injected, plus one for the autofluorescence contribution. Distortion of fluorescence spectra could lead to an increase in the number of sources to unmix; in a case where a same marker is present at different depths in a medium, fluorescence spectrum emitted by the deepest markers appear distorted compared to the less deeply embedded markers.

When the number of sources to unmix is not empirically chosen, a method to define it is to compute the SVD of initial data $\mathsf{M}$ :

where matrix $\mathsf{U}$ contains spatial information of fluorescence sources, matrix $\mathsf{V}$ contains spectral information, and $\Sigma $ gives the ordered singular values. We thus assume our data can be expressed as a separable set of orthonormal spatial and wavelength components. By looking at the singular values, we can define number of sources in the mixed data by selecting the first nonnegligible singular values.We ran the NMF algorithm on our simulated breast data, and give the SVD of data for each depth of markers tested, from $0.1\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}10\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$ . The results are presented in Fig. 5 . While spectrum distortion should lead us to consider a third source in our unmixing model, the exponential loss of marker signal intensity does actually not allow time for spectra to become distorted, and SVD only finds two sources.

Since spectrum distortion is insignificant in front of intensity loss of deep embedded markers, looking for an average spectrum
$S$
for a same family of markers, but at different depths, is sufficient in diffusive optical imaging. In last section, an *in vivo* unmixing example will confirm that result.

## 4.

##
*A Priori* Information and Regularization

*A Priori*

Many hybrid NMF algorithms, most often dealing with sparsity and smoothness constraints, were developed in the last
$5\phantom{\rule{0.3em}{0ex}}\mathrm{y}$
, most of them trying to directly adapt from Lee and Seung’s multiplicative update rules.^{31, 32} Kim and Park^{33} became interested in sparse NMFs by
${L}_{1}$
-norm constraint term minimization; Cichocki
^{34} presented cost functions no longer based on the Kullback-Leibler divergence but on Csiszár’s
$\phi $
-divergence; while other approaches use alternative cost functions formulations.^{35, 36} We propose in this section to study the effect of initialization of matrices
$\mathsf{A}$
and
$\mathsf{S}$
on NMF solutions, and we present regularized NMF update rules, adapted to spectral imaging, that deal with the *a priori* knowledge on the fluorescence spectra considered.

## 4.1.

### Influence of Initialization on NMF Decomposition

Choice of initialization is once more fundamental and NMF decomposition directly depends on the initial guess on matrices $\mathsf{A}$ and $\mathsf{S}$ . In that part, we study the influence of initialization on our simulated example. Gaussian spectra similar to simulation spectra of matrix $\mathsf{S}$ are chosen to initialize the NMF algorithm. We observe the influence of wavelength translation of initialization spectra on the NMF decomposition. For this specific example, initialization spectra for matrix ${\mathsf{S}}_{0}$ are translated on a range of $100\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ , on both sides of simulation spectra, as depicted Fig. 6 .

To underline the dependence of NMF result to initialization, three cases are presented. For each case, resulting contrast for different depths of fluorescent markers in tissue (from $0.1\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}4\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$ ) is obtained.

First we observe healthy tissue/tumor contrast on raw data, without any unmixing processing: contrast and detection are naturally decreasing with depth [see Fig. 7 ]. Then NMF processing is applied on data but with random initialization (random nonnegative values for matrices $\mathsf{A}$ and $\mathsf{S}$ , 30 draws per depth): unmixing processing improves detection [see Fig. 7]. Finally, NMF algorithm with this time Gaussian initialization for $\mathsf{S}$ (Gaussian models are translated in a $100\text{-}\mathrm{nm}$ range) is tested: even with less appropriate initialization (in that case, when both simulation spectra are translated $50\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ up that simulation models for initialization), contrast is improved compared to both prior cases [see Fig. 7].

NMF processing without any *a priori* information still improves resulting contrast compared to results on raw data. Then, Gaussian initialization refines results.

But the initialization guess is not obvious. When the initialization spectrum is translated $50\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ from the expected spectra, this initialization leads to better result than if initialization is equal to simulated spectra. One explanation may be that we move away from the crossing area where fluorescence spectra overlap and where indeterminacy between autofluorescence and fluorescence of markers is high. Moreover, the obtained contrast is not symmetric on both sides of the 0-shifted spectra: the fluorescence of markers is spatially restraint compared to the autofluorescence background (see Fig. 1), and moving away to an initialization zone where the fluorescence marker spectrum does not clash with the autofluorescence spectrum emission is advantageous.

From that example, we show initialization choice is very sensitive, but necessary to improve marker detection. Even if an approximate initialization already improves contrast between tumor area and healthy tissue compared to the algorithm with random initialization (see Fig. 7), a more accurate initialization selection may push back the detection limits.

Such selection can be obtained with a multistart initialization step prior to the NMF algorithm we proposed earlier.

## 4.2.

### New Multiplicative Update Rules

As explained in previous section, *a priori* knowledge of fluorescence sources enables refining the range of solutions. By choosing appropriate initialization, detection of marked tumors can be improved. Another way to restrain the solutions set is to constrain the initial cost function
$F$
; we propose in this section to lightly modify the cost function, and find a new regularized algorithm to minimize updated cost function.

## 4.3.

###
*A Priori* Knowledge on the Fluorescence Sources

In the optical spectroscopy context, injected markers are known and could thus ease the unmixing problem. But whether it refers to the specific markers, or to the autofluorescence of biological tissues, we can actually not define a precise model of the fluorescence spectra. Spectra of the specific markers may vary from the *ex vivo* known spectra once injected *in vivo* and illuminated. In the *in vivo* medium, the markers may create new products that are able, in turn, to emit unknown fluorescence signals. Fluorescence spectra may also vary with the pH values in the medium, and their intensity may decline with time due to the photobleaching phenomenon. Even if chemical modifications appear on fluorescent molecules under illumination or due to the receiver medium components, we usually observe after that a constant emission spectrum (except in the case of distortion of spectra due to depth of tissue, explained later). Finally, the optical parameters of the biological tissues—diffusion and absorption—cause emissions fluorescence spectra to vary with the depth of the fluorescent source. This last problem was treated in simulation and presented in previous section. Regarding the autofluorescence of tissues, as for the specific markers, the illumination may be responsible for the creation of new products. Furthermore, autofluorescence spectrum may vary according to the pH of analyzed area,^{37} or according to the patient.^{38, 39, 40} But once measured, the *in vivo* autofluorescence spectrum usually not differs across the organism being observed. An example of mouse autofluorescence acquisition is given Fig. 8
, where normalized spectra remain constant across the mouse body. We assume this property should also be true for specific human area observed (prostate, breast, etc.).

For slightly different autofluorescence emission spectra, the average spectrum is sufficient for NMF decomposition (a similar case is that for distorted fluorescent markers with depth).

Even if fluorescence spectra may not be initially perfectly defined, we still have some *a priori* information concerning them, from *ex vivo* and empirical measurements. The emission wavelength range, and the shape of the expected spectra may be globally known and used to refine solutions set.

## 4.4.

### NMF Initialization Step

The initialization step uses *a priori* models to guide the solutions, but lets the algorithm, thanks to its blind specificity, to adapt solutions depending on the original experimental data. We would like the algorithm to take into account that piece of information: we thus define a new cost function to minimize, that directly depends on
${\mathsf{S}}_{0}$
.

## 4.5.

### New Cost Function and Regularized Update Rules

The new cost function ${F}_{2}$ is now defined as the sum of the square of the Euclidean distance between $\mathsf{M}$ and $\mathsf{A}\times \mathsf{S}$ plus a regularization term on the initialization ${\mathsf{S}}_{0}$ , weighted by a regularization vector $\mathbf{\alpha}$ of length $P$ :

## Eq. 13

$${F}_{2}=\sum _{n=1}^{{N}_{s}}\sum _{\lambda =1}^{{N}_{\lambda}}{({m}_{n\lambda}-\sum _{p=1}^{P}{a}_{\mathrm{np}}{s}_{p\lambda})}^{2}+\sum _{p=1}^{P}{\mathbf{\alpha}}_{p}\times \sum _{\lambda =1}^{{N}_{\lambda}}{({s}_{p\lambda}-{s}_{{0}_{p\lambda}})}^{2},$$## Eq. 14

$${F}_{2}={(\mathsf{M}-\mathsf{A}\mathsf{S})}^{t}(\mathsf{M}-\mathsf{A}\mathsf{S})+{(\mathsf{S}-{\mathsf{S}}_{0})}^{t}{\mathsf{D}}_{\mathbf{\alpha}}(\mathsf{S}-{\mathsf{S}}_{0}),$$The regularization term lets $\mathsf{S}$ vary from ${\mathsf{S}}_{0}$ , according to the confidence we have in the initialization. Then ${F}_{2}$ is minimized by alternatively updating matrices $\mathsf{A}$ and $\mathsf{S}$ , with respect to $(\mathsf{A},\mathsf{S})\u2a7e0$ . We propose to use original multiplicative regularized update rules, defined as follows.

### Theorem 2

The function ${F}_{2}$ is nonincreasing under the update rules:

## Eq. 15

$${\mathsf{S}}_{p\lambda}\leftarrow {\mathsf{S}}_{p\lambda}\frac{{({\mathsf{A}}^{t}\mathsf{M}+{\mathsf{D}}_{\mathbf{\alpha}}{\mathsf{S}}_{0})}_{p\lambda}}{{({\mathsf{A}}^{t}\mathsf{A}\mathsf{S}+{\mathsf{D}}_{\mathbf{\alpha}}\mathsf{S})}_{p\lambda}}\phantom{\rule{1em}{0ex}}{\mathsf{A}}_{\mathrm{np}}\leftarrow {\mathsf{A}}_{\mathrm{np}}\frac{{\left(\mathsf{M}{\mathsf{S}}^{t}\right)}_{\mathrm{np}}}{{\left(\mathsf{A}\mathsf{S}{\mathsf{S}}^{t}\right)}_{\mathrm{np}}},$$The value of
$\mathbf{\alpha}$
must be set according to the level of confidence in the *a priori* information on the initial spectra. We can also use different degree of confidence on the different spectra by using a vector
$\mathbf{\alpha}$
.

We prove Theorem 2 in the appendix{ label needed for app[@id='x0'] }, by using an auxiliary function as in Refs. 11, 12, and conserving the maximum of the mathematical notations of Lee and Seung in Ref. 12. To illustrate the efficiency of NMF on spectroscopic data, we propose now an example on *in vivo* experimental data, where up to three different fluorescence sources need to be unmixed.

## 5.

## Spectral Unmixing on Experimental Data

For *in vivo* experiments, an autofluorescence signal is necessarily measured. Then several specific markers may be used to simulate marked targets, such as tumors. In this section, we test NMF to unmix three overlapping different fluorescence sources, including the autofluorescence on mice.

To acquire spectrally resolved measurements, the animal was illuminated with a planar laser at $690\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ . The emitted back fluorescence signal was collected along a line of ${N}_{x}$ points by a spectrometer coupled with a charge-coupled device camera (Andor Technologies): a ${N}_{x}\times {N}_{\lambda}$ acquisition was measured (see Fig. 9 ). For this experiment, ${N}_{x}$ was equal to 250 and ${N}_{\lambda}$ to 1024, which corresponds to a wavelength range around $600\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}975\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ . A translation stage, covering ${N}_{y}$ steps, was then used to get a scanning of the whole animal: ${N}_{y}$ acquisitions were obtained, each of size ${N}_{x}\times {N}_{\lambda}$ . Before to run the NMF algorithm, mixed data of size $({N}_{y}\times {N}_{x}\times {N}_{\lambda})$ were reordered as a 2-D array of size $({N}_{y}\times {N}_{x},{N}_{\lambda})$ ( ${N}_{s}$ is defined in NMF equation is now equal to ${N}_{y}\times {N}_{x}$ ). Separately, a black and white picture of the animal may be taken, to superimpose fluorescence signal and environment image if necessary.

For each *in vivo* experiment, a precise protocol is defined. All protocols have in common the following cares. The animal procedure is in compliance with the guidelines of the European Union (regulation No. 86/609), taken in the French law (decree 87/848) regulating animal experimentation. All efforts are made to minimize animal suffering. The animal manipulation is performed with sterile techniques and approved by the Grenoble Animal Care and Use Committee (France) (registration number 20_iRTSV Léti-FNG-02). An adult female nude mouse (Janvier, Le Genest saint-isle, France) are used throughout the experiments. They are housed in approved facilities, at
$21\pm 1\phantom{\rule{0.2em}{0ex}}\xb0\mathrm{C}$
under diurnal lighting conditions. The mice arrive at the animal facility
$2\phantom{\rule{0.3em}{0ex}}\text{weeks}$
before the experiments start and had free access to food and water.

## 5.1.

### Multimarker Experiment

## 5.1.1.

#### Presentation

We present a first feasibility experiment on a mouse. Two glass capillary tubes respectively filled with
$5\phantom{\rule{0.3em}{0ex}}\mu \mathrm{l}$
of indocyanine green loaded into lipid nanoparticules^{41} (ICG-LNP) at
$0.35\phantom{\rule{0.3em}{0ex}}\mu \mathrm{mol}\u2215\mathrm{l}$
and
$5\phantom{\rule{0.3em}{0ex}}\mu \mathrm{l}$
of Alexa 750 at
$0.1\phantom{\rule{0.3em}{0ex}}\mu \mathrm{mol}\u2215\mathrm{l}$
were inserted subcutaneously to simulate marked targets [see Fig. 10
]. Three distinct fluorescent sources—autofluorescence, ICG-LNP, and Alexa 750—whose emission spectra are overlapping had to be unmixed. To draw a parallel between acquisitions with or without specific fluorescence, a first acquisition of the animal was run as a reference, without any specific markers inserted.

In this precise example, we chose $\alpha ={10}^{10}$ to constraint the autofluorescence spectrum to remain close to initialization and have a plausible profile, and $\alpha =0$ for ICG-LNP, and Alexa 750 spectra. The regularization term helped to smooth over aberrations due to the two components crosstalk on resulting spectra. We chose initialization from empirical results, but could not use multistart initialization: regularization on initialization asks for coherent initialization spectra, while multistart initialization may use translated spectra unconnected to real expected fluorescence spectra. The choice has to be made between different regularization methods, depending on application.

## 5.1.2.

#### Results

Figure 10 shows the reference acquisition of the mouse without any capillary: the autofluorescence is the only fluorescence signal measured. Autofluorescent areas may be attributed to some specific organs, known to emit natural fluorescence, such as the stomach, the liver, the intestine, or the kidneys of the animal, if we compare the autofluorescence scanning to the anatomy scheme of a mouse [Fig. 10]. In the same figure, we present the resulting scanning of the mouse with the two capillary tubes of ICG-LNP and Alexa 750 [Fig. 10], on which the regularized NMF algorithm is run.

Figure 11 presents the overlapping spectra that NMF successfully unmixed. The three normalized spectra of matrix $\mathsf{S}$ presented in Fig. 11 are weighted by coefficients of matrix $\mathsf{A}$ [Fig. 11]. The three resulting contributions of fluorescence are presented in Fig. 12 : the autofluorescence [Fig. 12], the ICG-LNP [Fig. 12], and the Alexa 750 [Fig. 12]. Three sources unmixing allowed improving detection for each marker, Alexa 750 and ICG LNP. Indeed, initial contrast values (measured on mixed data, cf. Fig. 10 between markers area and normal tissue area were equal to 0.2123 for ICG-LNP and 0.3631 for Alexa 750). After unmixing, contrast values respectively reached 0.6366 (ICG-LNP) and 0.6763 (Alexa 750) (see Figs. 12 and 12].

To conclude concerning this feasibility experiment, a last comparison is made, between the original autofluorescence scanning (without any capillary tube inserted to the animal) and the separated autofluorescence results obtained for both past experiments (see Fig. 13 ). First, a consistent intensity level is obtained after the NMF decomposition. Indeed, the intensity obtained after NMF decomposition for the autofluorescence part [Fig. 13] is close to the intensity observed on the reference autofluorescence acquisition [Fig. 13]. Finally, the principal autofluorescent area observed on the autofluorescence scanning [Fig. 13] are also present on autofluorescence contribution obtained after the NMF decomposition [Fig. 13].

## 5.2.

### Multidepth Experiment

In a second experiment, we proposed to test the NMF algorithm robustness on a precise case of the same fluorescent marker at two different depths in mouse tissues. In such a case, the deeper the markers are embedded in tissues, the more their fluorescence spectra are distorted and their intensity is exponentially decreased.

Two glass capillary tubes both filled with $5\phantom{\rule{0.3em}{0ex}}\mu \mathrm{l}$ of ICG-LNP (Ref. 41) at $5\phantom{\rule{0.3em}{0ex}}\mu \mathrm{mol}\u2215\mathrm{l}$ were inserted to simulate marked targets. The first tube was inserted subcutaneously (around $1\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ deep), while the second one was placed in the rectum of the animal (around $6\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ deep from the surface), as depicted in Fig. 14 . The same experimental setup was used to scan the mouse (see Fig. 9), and the obtained data are presented Fig. 14. The $6\text{-}\mathrm{mm}$ -deep marker signal was mixed with the autofluorescence signal and masked by the $1\text{-}\mathrm{mm}$ -deep intense marker signal, and the $5\phantom{\rule{0.3em}{0ex}}\mathrm{mm}$ -deep-difference was responsible for light distortion (emission peak translation) and loss of intensity between both emitted spectra, measured in the rectum or subcutaneously [see Fig. 14].

## 5.2.1.

#### Results

As for the previous experiment, we ran the NMF algorithm on the mixed data to obtain separated fluorescence contributions of autofluorescence and the specific ICG-LNP fluorescence. The results are presented Fig. 15 , with a unique average spectrum obtained in matrix $\mathsf{S}$ after NMF decomposition for the ICG-LNP despite the slightly different spectra emitted from the subcutaneous and rectum capillaries (see Fig. 15, ICG-LNP dotted line), and accurate unmixing results were obtained [see Figs. 15 and 15]. Deep markers that were originally mixed with autofluorescence signal are now easily detectable. Indeed, after autofluorescence removal, detection of $6\text{-}\mathrm{mm}$ -deep markers was considerably improved. The initial contrast value between the ICG-LNP rectum area and the normal tissue area was equal to 0.36 before unmixing [see Fig. 14], and reached 0.77 once the fluorescence contributions were separated [see Fig. 15]. Moreover, in the obtained ICG-LNP contribution, all the autofluorescence critical zones (the stomach or intestine, for example) were quenched.

## 6.

## Conclusions

We presented a blind positive source separation method applied to *in vivo* spectrally resolved data. Beyond the specific fluorescence signal of specific markers used in optical imaging, the autofluorescence of biological tissues was also detected in the wavelength range we used, and must be removed to achieve accurate detection results. This property increased with the depth of biological tissues explored, since the specific signal decreases exponentially while the autofluorescence signal remains constant. To remove autofluorescence and unmix different fluorescent markers whose fluorescence spectra are not perfectly known, a blind source separation method, the NMF, was chosen. By notably taking into account the nonnegativity of the data, it is particularly suitable for fluorescence imaging data. We briefly presented this classical method, from the choice of the cost function and the optimization step that minimizes it and leads to a nonunique NMF decomposition. Indeed, when using blind source separation, we estimated fluorescence sources without knowing the mixing process; without some *a priori* knowledge, it is not possible to uniquely estimate the sources. We presented studies of simulated breast data that underline interest of *a priori* information for initialization choice and additional constraints that may be applied to cost functions. An original regularized NMF algorithm that takes into account some prior knowledge about the fluorescence spectra was proposed. We did not discuss a method to chose the value of the regularization parameter, which is currently empirically chosen. Finally, our theory was successfully validated with *in vivo* experiments on mice. The aim of those experiments was to remove the autofluorescence contribution from the experimental data, and to unmix upto two specific markers, or the same fluorescent marker at different depths in tissues. NMF computed satisfactory results and enabled us to localize specific marker contributions initially lost in the autofluorescence signal.

As optical imaging tries to detect deeper and deeper embedded targets, NMF is a useful preprocessing step to remove unwanted autofluorescence and unmix different spectra of interest. By returning separated fluorescence contribution data, the method presents the possibility to perform accurate tomographic reconstructions and thus to confirm the 3-D position of marked tumors.

## Appendices

## { label needed for app[@id='x0'] }

#### Appendix: Theorem 1: Proofs of Convergence

In this section, we propose a proof of convergence of Theorem 2. Note that a proof for convergence of the update rule for
$\mathsf{A}$
was already given by Lee and Seung.^{12} We introduce the following definition:

### Definition 1

$G(s,{s}^{i})$ is an auxiliary function for ${F}_{2}\left(s\right)$ if the following conditions are satisfied:

## Eq. 16

$$G(s,{s}^{i})\u2a7e{F}_{2}\left(s\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}G(s,s)={F}_{2}\left(s\right).$$^{12}

### Lemma 1

If $G$ is an auxiliary function for ${F}_{2}$ , then ${F}_{2}$ is nonincreasing under the update:

Lemma 1 is illustrated in Fig. 16 .By defining an appropriate auxiliary function $G$ for ${F}_{2}$ , as defined earlier, the update rules presented in Theorem 1 simply follow from Lemma 1 [Eq. 17]. The auxiliary function $G$ is presented in the following lemma.

### Lemma 2

If $\mathsf{K}\left({s}^{i}\right)$ is the diagonal matrix

## Eq. 18

$${\mathsf{K}}_{ab}\left({s}^{i}\right)=\frac{{\delta}_{ab}{({\mathsf{A}}^{t}\mathsf{A}{s}^{i}+\mathbf{\alpha}\times {s}^{i})}_{a}}{{s}_{a}^{i}},$$### Proof of lemma 2

$G(s,{s}^{i})$ is an auxiliary function for ${F}_{2}\left(s\right)$ if the conditions defined in Eq. 16 are verified. If $G(s,s)={F}_{2}\left(s\right)$ is obvious, the second condition $G(s,{s}^{i})\u2a7e{F}_{2}\left(s\right)$ must be demonstrated.

Let $\nabla {F}_{2}$ be the gradient of ${F}_{2}$ and $\mathsf{Hess}\left({F}_{2}\right)$ the Hessian matrix. We obtain

## Eq. 21

$${F}_{2}\left(s\right)={F}_{2}\left({s}^{i}\right)+\Delta {s}^{t}\nabla {F}_{2}\left({s}^{i}\right)+\frac{1}{2}\Delta {s}^{t}\mathrm{Hess}\left({F}_{2}\right)\Delta s,$$## Eq. 22

$${F}_{2}\left({s}^{i}\right)=\frac{1}{2}{\Vert m-\mathsf{A}{s}^{i}\Vert}^{2}+\frac{\alpha}{2}{\Vert {s}^{i}-{s}_{0}^{i}\Vert}^{2}=\frac{1}{2}{(m-\mathsf{A}{s}^{i})}^{t}(m-\mathsf{A}{s}^{i})+\frac{\alpha}{2}{({s}^{i}-{s}_{0}^{i})}^{t}({s}^{i}-{s}_{0}^{i}),$$## Eq. 23

$$\nabla {F}_{2}\left({s}^{i}\right)={\mathsf{A}}^{t}(\mathsf{A}{s}^{i}-m)+\alpha ({s}^{i}-{s}_{0}^{i}),$$Thus,

## Eq. 25

$$G(s,{s}^{i})-{F}_{2}\left(s\right)={F}_{2}\left({s}^{i}\right)+\Delta {s}^{t}\nabla {F}_{2}\left({s}^{i}\right)+\frac{1}{2}\Delta {s}^{t}\mathsf{K}\left({s}^{i}\right)\Delta s-{F}_{2}\left({s}^{i}\right)-\Delta {s}^{t}\nabla {F}_{2}\left({s}^{i}\right)-\frac{1}{2}\Delta {s}^{t}\mathrm{Hess}\left({F}_{2}\right)\Delta s=\frac{1}{2}\Delta {s}^{t}[\mathsf{K}\left({s}^{i}\right)-\mathrm{Hess}\left({F}_{2}\right)]\Delta s=\frac{1}{2}\Delta {s}^{t}(\mathsf{K}\left({s}^{i}\right)-{\mathsf{A}}^{t}\mathsf{A}-\alpha \mathsf{I})\Delta s.$$Hence, the following equivalence is given

## Eq. 26

$$G(s,{s}^{i})\u2a7e{F}_{2}\left(s\right)\iff \Delta {s}^{t}[\mathsf{K}\left({s}^{i}\right)-{\mathsf{A}}^{t}\mathsf{A}-\alpha \mathsf{I}]\Delta s\u2a7e0.$$We can now demonstrate that $\Delta {s}^{t}[\mathsf{K}\left({s}^{i}\right)-{\mathsf{A}}^{t}\mathsf{A}-\alpha \mathsf{I}]\Delta s$ is positive:

## Eq. 27

$$\Delta {s}^{t}[\mathsf{K}\left({s}^{i}\right)-{\mathsf{A}}^{t}\mathsf{A}-\mathbf{\alpha}\mathsf{I}]\Delta s=\sum _{ab}\Delta {s}_{a}{[\mathsf{K}\left(s\right)-{\mathsf{A}}^{t}\mathsf{A}-\alpha \mathsf{I}]}_{ab}\Delta {s}_{b}=\sum _{ab}\Delta {s}_{a}\left[\frac{{\delta}_{ab}{({\mathsf{A}}^{t}\mathsf{A}{s}^{i}+\alpha {s}^{i})}_{a}}{{s}_{a}^{i}}\right]\Delta {s}_{b}-\sum _{ab}\Delta {s}_{a}\left[{({\mathsf{A}}^{t}\mathsf{A}+\alpha \mathsf{I})}_{ab}\right]\Delta {s}_{b}=\sum _{ab}\Delta {s}_{a}\left\{\frac{{\delta}_{ab}[\sum _{c}\sum _{d}\left({\mathsf{A}}_{da}{\mathsf{A}}_{dc}{s}_{c}^{i}\right)+\alpha {s}_{a}^{i}]}{{s}_{a}^{i}}\right\}\Delta {s}_{b}-\sum _{ab}\Delta {s}_{a}(\sum _{d}{\mathsf{A}}_{da}{\mathsf{A}}_{db}-\alpha {\delta}_{ab})\Delta {s}_{b}=\sum _{a,c,d}{\mathsf{A}}_{da}{\mathsf{A}}_{dc}(\Delta {s}_{a}\Delta {s}_{a}\frac{{s}_{c}^{i}}{{s}_{a}^{i}}-\Delta {s}_{a}\Delta {s}_{c})+\sum _{a}\Delta {s}_{a}(\frac{\alpha \times {s}_{a}^{i}}{{s}_{a}^{i}}-\alpha )\Delta {s}_{a}=\frac{1}{2}\sum _{a,c,d}{\mathsf{A}}_{da}{\mathsf{A}}_{dc}{[\Delta {s}_{a}{\left(\frac{{s}_{c}}{{s}_{a}}\right)}^{1\u22152}-\Delta {s}_{c}{\left(\frac{{s}_{a}}{{s}_{c}}\right)}^{1\u22152}]}^{2}\u2a7e0.$$### Proof of theorem 1

Lemma 1 gives ${s}^{i+1}={\mathrm{argmin}}_{s}G(s,{s}^{i})$ .

Since

## Eq. 28

$$\forall a,\frac{\delta G(s,{s}^{i})}{\delta {s}_{a}^{i}}={[\nabla {F}_{2}\left(s\right)]}_{a}+{\left[\mathsf{K}\left(s\right)\Delta s\right]}_{a}=0\Rightarrow \Delta s=-\mathsf{K}{\left(s\right)}^{-1}\nabla {F}_{2}\left(s\right),$$## References

*in vivo*optical imaging and measurement,” (2009). Google Scholar

*In vivo*fluorescence imaging with a multivariate curve resolution spectral unmixing technique,” J. Biomed. Opt., 14 (6), 064011 (2009). https://doi.org/10.1117/1.3258838 1083-3668 Google Scholar

*In vivo*optical imaging of integrin alphaV-beta3 in mice using multivalent or monovalent cRGD targeting vectors,” Molec. Cancer, 41 1 –9 (2007). Google Scholar

*In vivo*imaging of tumors with protease-activated near-infrared fluorescent probes,” Nat. Biotechnol., 17 375 –378 (1999). https://doi.org/10.1038/7933 1087-0156 Google Scholar

*In vivo*noninvasive optical imaging of receptor-mediated RGD internalization using self-quenched Cy5-labeled RAFT-c(-RGDfK-) (4),” Mol. Imaging, 6 43 –55 (2007). 1535-3508 Google Scholar

*in vivo*,” Progr. Biomed. Opt. Imaging, 10 (30), 1 –10 (2009). Google Scholar

*in vivo*fluorescent optical imaging," Journal of Biomedical Optics 15(5), 056009 (1 September 2010). https://doi.org/10.1117/1.3491796