## 1.

## Introduction

Recently, a new kind of feature extraction method, manifold learning algorithms, based on the local geometry analysis have drawn much attention. Unlike PCA, which aims to preserve the global Euclidean structure of the data space, manifold learning algorithms aim to preserve the inherent manifold structure. Locally linear embedding (LLE), Isomap, Laplacian eigenmap, and locality preserving projection (LPP)^{1} are the most important manifold learning algorithms. All of these four algorithms attempt to embed the original data into a submanifold by preserving the local neighborhood structure. They are based on the assumption that the manifold’s intrinsic geometry can be fully determined by the local metric and the neighborhood information. Different from LLE, Isomap, and Laplacian eigenmap, LPP is a linear algorithm. It is quite simple and easy to realize. LPP is originally derived by the discretely approximation of the Laplace Beltrami operator on compact Rimannian manifold. He ^{1} have applied LPP on face recognition problem. The application has shown the effectiveness of LPP in discovering the the intrinsic geometrical structure of the data.

Based on LPP, we propose a novel algorithm, called uncorrelated locality preserving projection (ULPP). It shares the same locality preserving character as LPP, but at the same time it requires the basis vectors to be statistically uncorrelated. Uncorrelated attributes are highly desirable in pattern analysis, because they contain minimum redundancy.^{2} However, the basis vectors of LPP are statistically correlated, then the extracted features contain redundant information. Moreover, the overlapped information can distort the distribution of the features and even dramatically degrade the performance of LPP. In this paper, we overcome this problem by putting an uncorrelated constraint on the computation of the basis vectors. This makes the proposed ULPP algorithm much more robust.

## 2.

## Locality Preserving Projection

LPP^{1} can be explained by spectral graph theory. Let us assume that the
$n$
-dimensional data set
$\{{\mathrm{x}}_{1},{\mathrm{x}}_{2},\cdots ,{\mathrm{x}}_{N}\}$
is distributed on a low-dimensional submanifold. We desire to find a set of
$d$
-dimensional
$(d<n)$
data points
$\{{\mathrm{y}}_{1},{\mathrm{y}}_{2},\cdots ,{\mathrm{y}}_{N}\}$
with the same local neighborhood structure as the data
$\{{\mathrm{x}}_{1},{\mathrm{x}}_{2},\cdots ,{\mathrm{x}}_{N}\}$
. In order to do so, we can construct a weighted graph
$\mathbb{G}=(\mathbb{V},\mathbb{E},W)$
, where
$\mathbb{V}$
is the set of all points;
$\mathbb{E}$
is the set of edges connecting the points; and
$W=\left({w}_{ij}\right)$
is a similarity matrix with weights characterizing the likelihood of two points. The criterion function of LPP is as follows:

## 1

$$\mathrm{min}\phantom{\rule{0.2em}{0ex}}\sum _{ij}{\parallel {\mathbf{y}}_{i}-{\mathbf{y}}_{j}\parallel}^{2}{w}_{ij}.$$To make the mapping (embedding) not defined only for the original data set, LPP defines a transformation. That is, ${\mathrm{y}}_{i}={\Phi}^{T}{\mathrm{x}}_{i}$ , where $\Phi =[{\varphi}_{1},{\varphi}_{2},\cdots ,{\varphi}_{d}]\u220a{\mathcal{R}}^{n\times d}$ . Following some algebraic steps, we see that

## 2

$$\frac{1}{2}\sum _{ij}{\parallel {\mathbf{y}}_{i}-{\mathbf{y}}_{j}\parallel}^{2}{w}_{ij}=\frac{1}{2}\sum _{ij}{({\Phi}^{T}{\mathbf{x}}_{i}-{\Phi}^{T}{\mathbf{x}}_{j})}^{T}({\Phi}^{T}{\mathbf{x}}_{i}-{\Phi}^{T}{\mathbf{x}}_{j}){w}_{ij}=\sum _{ki}{\varphi}_{k}^{T}{\mathbf{x}}_{i}{D}_{ii}{\mathbf{x}}_{i}^{T}{\varphi}_{k}-\sum _{kij}{\varphi}_{k}^{T}\mathrm{Diag}({\mathbf{x}}_{i},\cdots ,{\mathbf{x}}_{i}){w}_{ij}\mathrm{Diag}({\mathbf{x}}_{j}^{T},\cdots ,{\mathbf{x}}_{j}^{T}){\varphi}_{k}=\text{trace}\left({\Phi}^{T}X(D-W){X}^{T}\Phi \right),$$^{1}

In order to remove the arbitrary scaling factor in the embedding, LPP imposes a constraint as follows:

This constraint sets the mapping (embedding) scale and makes the vertices with high similarities to be mapped nearer to the origin. Finally, the minimization problem reduces to:## 4

$$\underset{\begin{array}{c}\Phi \\ {\Phi}^{T}XD{X}^{T}\Phi =I\end{array}}{\mathrm{argmin}}\phantom{\rule{0.3em}{0ex}}\text{trace}\left({\Phi}^{T}XL{X}^{T}\Phi \right).$$The transformation matrix $\Phi $ that minimizes the objective function can be obtained by solving the generalized eigenvalue problem:

Note that the Laplacian matrix can be viewed as the discretely approximation of the Laplace Beltrami operator on compact Rimannian manifold.^{1}

Based on the matrix $\Phi =[{\varphi}_{1},\cdots ,{\varphi}_{d}]$ , for any instance $\mathbf{u}={[{u}_{1},{u}_{2},\cdots ,{u}_{n}]}^{T}\u220a{\mathcal{R}}^{n}$ , we can define the following linear transform from ${\mathcal{R}}^{n}$ to ${\mathcal{R}}^{d}$ :

## 6

$$\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ \u22e5\\ {v}_{d}\end{array}\right]=\left[\begin{array}{c}{\varphi}_{1}^{T}\\ {\varphi}_{2}^{T}\\ \u22e5\\ {\varphi}_{d}^{T}\end{array}\right]\mathbf{u}.$$Equation 6 with the constraint 3 forms the LPP transformation. It is easy to obtain the following theorem.

**Theorem 1**: Any two features
${v}_{i}$
and
${v}_{j}$
$(j\ne i)$
in Eq. 6 are correlated.

**Proof**: Let the covariance matrix of the data
${\left\{{\mathrm{x}}_{i}\right\}}_{i=1}^{\mathrm{N}}$
be
$\Xi $
. It is easy to obtain the following equation:

## 3.

## Uncorrelated Locality Preserving Projection

The algorithmic procedure of uncorrelated locality preserving projection (ULPP) is stated below:

1. Perform PCA projection. We project the data set into a PCA subspace to make the matrix $XD{X}^{T}$ become nonsingular. Let the transformation matrix of PCA be ${\Phi}_{PCA}$ .

2. Construct the similarity of any two data points. For each sample ${x}_{i}$ , if ${x}_{j}$ is among the $m$ -nearest neighbors of ${x}_{i}$ , set the similarity ${w}_{ij}={w}_{ji}={\mathrm{x}}_{i}^{T}{\mathrm{x}}_{j}$ ; otherwise, set the similarity ${w}_{ij}={w}_{ji}=0$ . The similarity reflects the local structure of the data space.

3. Obtain the diagonal matrix $D$ and the Laplacian matrix $L$ .

4. Compute the uncorrelated locality preserving projections. Let ${S}_{L}=XL{X}^{T}$ , ${S}_{D}=XD{X}^{T}$ and $I=\mathrm{diag}(1,1,\cdots ,1)$ for concision. Let $\{{\varphi}_{1},{\varphi}_{2},\cdots ,{\varphi}_{k}\}$ be the uncorrelated locality preserving projections, thus we define

The uncorrelated locality preserving projections $\{{\varphi}_{1},{\varphi}_{2},\cdots ,{\varphi}_{k}\}$ can be iteratively computed as follows:

5. Perform the ULPP transformation. Let ${\Phi}^{*}=[{\varphi}_{1},{\varphi}_{2},\cdots ,{\varphi}_{d}]$ . The embedding is as follow:

where $\mathbf{v}$ is a $d$ -dimensional representation of $\mathbf{u}$ and ${\Phi}_{ULPP}={\Phi}_{PCA}{\Phi}^{*}$ .

## 3.1.

### Justification

In this subsection, we provide the justification of our algorithm.

According to the following theorem, we can compute the $k\text{-}\text{th}$ uncorrelated direction ${\varphi}_{k}$ as the eigenvector of ${P}^{\left(k\right)}$ [Eq. 12] associated with the the smallest eigenvalue of ${P}^{\left(k\right)}$ .

**Theorem 2**: The
$k\text{-}\text{th}$
uncorrelated direction
${\varphi}_{k}$
is the eigenvector corresponding to the smallest eigenvalue of the following eigenfunction:

**Proof**: Recall the minimization problem of LPP:

## 14

$$L\left({\varphi}_{k}\right)={\varphi}_{k}^{T}{S}_{L}{\varphi}_{k}-\lambda ({\varphi}_{k}^{T}{S}_{D}{\varphi}_{k}-1)-\sum _{i=1}^{k-1}{\gamma}_{i}{\varphi}_{k}^{T}\Xi {\varphi}_{i}.$$## 15

$$2{S}_{L}{\varphi}_{k}-2\lambda {S}_{D}{\varphi}_{k}-\sum _{i=1}^{k-1}{\gamma}_{i}\Xi {\varphi}_{i}=0.$$## 16

$$\lambda =\frac{{\varphi}_{k}^{T}{S}_{L}{\varphi}_{k}}{{\varphi}_{k}^{T}{S}_{D}{\varphi}_{k}}.$$Multiplying the left-hand side of Eq. 15, respectively, by ${\varphi}_{j}^{T}\Xi {S}_{D}^{-1}$ $(j=1,2,\cdots ,k-1)$ , we have

## 17

$$2\left[\begin{array}{c}{\varphi}_{1}^{T}\\ {\varphi}_{2}^{T}\\ \u22e5\\ {\varphi}_{k-1}^{T}\end{array}\right]\Xi {S}_{D}^{-1}{S}_{L}{\varphi}_{k}-\left[\begin{array}{c}{\varphi}_{1}^{T}\\ {\varphi}_{2}^{T}\\ \u22e5\\ {\varphi}_{k-1}^{T}\end{array}\right]\Xi {S}_{D}^{-1}\Xi \left[\begin{array}{c}{\varphi}_{1}\\ {\varphi}_{2}\\ \u22e5\\ {\varphi}_{k-1}\end{array}\right]\left[\begin{array}{c}{\gamma}_{1}\\ {\gamma}_{2}\\ \u22e5\\ {\gamma}_{k-1}\end{array}\right]=0.$$## 18

$${\Gamma}^{\left(k\right)}=2{\left({\Phi}^{\left(k\right)}\Xi {S}_{D}^{-1}\Xi {\left({\Phi}^{\left(k\right)}\right)}^{T}\right)}^{-1}{\Phi}^{\left(k\right)}\Xi {S}_{D}^{-1}{S}_{L}{\varphi}_{k}.$$## 19

$$2{S}_{L}{\varphi}_{k}-2\lambda {S}_{D}{\varphi}_{k}-\Xi {\left({\Phi}^{(k-1)}\right)}^{T}{\Gamma}^{\left(k\right)}=0,$$## 4.

## Experiment Results

## 4.1.

### Experiments on FERET Data Set

FERET^{3} is a very popular database. Face image variations in the FERET database include pose, illumination, facial expression, and aging. In our experiment, we use a subset of the FERET data set. The data set we selected contains 72 people and 6 images for each individual. All images are aligned by the centers of the eyes and mouth and then scaled with resolution
$112\times 92$
. We randomly select 3 images from each person for training, while the rest of the images of each individual are selected for testing. The experiments are repeated 20 times and the average accuracies of rank 1 to rank 3 are recorded and shown in Table 1. The results show ULPP gives the best performance. The recognition accuracy of ULPP increases from 85.90% with rank 1 to 91.07% with rank 3, which is about 10% higher than the recognition accuracy of LPP.

## Table 1

Recognition accuracies (%) comparison of FERET data set.

Eigenface | LPP | ULPP | |
---|---|---|---|

Rank 1 | 58.61 | 72.64 | 85.90 |

Rank 2 | 63.70 | 78.22 | 89.49 |

Rank 3 | 68.22 | 81.39 | 91.07 |

## 4.2.

### Experiments on AR Data Set

AR^{4} is also a popular database. Face image variations in the AR database include illumination, facial expression, and occlusion. In our experiment, 952 images (119 individuals, 8 images per person), which involve large variations in facial expression, are selected. We manually cropped the face portion of the images. All the cropped images are aligned at the centers of eyes and mouth and then normalized with resolution
$112\times 92$
. We randomly select 4 images from each person for training, while the rest of the images of each individual are selected for testing. The experiments are repeated 30 times and the average accuracies of rank 1 to rank 3 are recorded and shown in Table 2. From Table 2, it can be seen that ULPP outperforms the other two methods. The recognition accuracy of ULPP increases from 80.32% with rank 1 to 86.51% with rank 3, which is about 5% higher than the recognition accuracy of LPP.

## Table 2

Recognition accuracies (%) comparison on AR data set.

Eigenface | LPP | ULPP | |
---|---|---|---|

Rank 1 | 70.58 | 72.42 | 80.32 |

Rank 2 | 79.07 | 77.96 | 84.44 |

Rank 3 | 83.40 | 80.43 | 86.51 |

## 5.

## Conclusion

This paper presents a novel feature extraction method, called uncorrelated locality preserving projection (ULPP). ULPP can extract uncorrelated features which contain minimum redundancy and are highly desirable for many applications. In this paper, we present a theoretical study on ULPP and provide an implementation of it. The experiments on face image data show the superiority of ULPP over the other both PCA and LPP.

## Acknowledgment

This project is partially supported by National Natural Science Foundation of China (60502042) and 2010 Shanghai EXPO Special Project (2004BA908B07, 04DZ05807).

## References

*CVC Technical Report*#24 (June 1998). Google Scholar