## 1.

## Introduction

Face recognition (FR) has received increasing attention in recent years due to its potential applications. Many FR systems have been devised in the past few decades. One of the most popular techniques for FR is subspace analysis. Two of the most important and representative ones are principal component analysis (PCA)^{1} and linear discriminant analysis (LDA).^{2} Many other methods such as 2DPCA,^{3} 2DLDA,^{4} ICA,^{5}
${\left(2\mathrm{D}\right)}^{2}\mathrm{LDA}$
,^{6}
$\mathrm{BDPCA}+\mathrm{LDA}$
,^{7} and
$\mathrm{IPCA}-\mathrm{ICA}$
^{8} are all extended from them. LPP^{9} is also a subspace analysis method. As stated in,^{9} it performs better than LDA in terms of error rate. However, all methods [including locality preserving projections (LPP)] mentioned above are inefficient, because they identify a test face image by comparing it with all training images. To overcome this disadvantage, we propose to combine LPP with affinity propagation (AP)^{10} to give a new FR framework, APLPP, which uses much fewer training images, free of noise, to identify a test image. Thus APLPP is more computationally efficient than LPP.

The reason why we propose to combine AP with LPP is as follows: AP can detect a representative sample for each class. Therefore, we can use AP to detect a representative face image for each subject and use only the representative images rather than all training ones for identification. However, the original AP regards the negative Euclidean distance between two images as their measure of similarity, which is directly computed on the gray pixel values. As we stated in Ref. 11, such a definition of similarity is unreasonable. To solve this problem, we propose to compute the similarity on low-dimensional features derived from LPP, not only because LPP can be used for dimensionality reduction and best detect the essential face manifold structure, but because it can reduce the unwanted variations resulting from changes in lighting, facial expression, and pose. By using LPP on the high-dimensional face data, more efficient low-dimensional features can be obtained. Obviously, the similarity on these features should be better than that directly computed from the gray pixel values. In addition, it needs much less computational time, because the dimension has been reduced greatly. In view of this analysis, we propose to combine AP with LPP and make full use of their advantages.

The rest of this paper is as follows: AP and LPP are reviewed in Secs.
2, ^{3}, respectively. In Sec. 4, we summarize the APLPP method. Experimental results are given in Sec. 5. The final section gives the conclusions.

## 2.

## Affinity Propagation

AP^{10} is an efficient clustering method. It first builds a similarity matrix
$s$
, in which
$s(i,k)$
between data points
${x}_{i}$
and
${x}_{k}$
is their negative Euclidean distance. Before clustering, each point also needs to be assigned a number
$P\left(B\right)$
, which characterizes the prior knowledge of how good point
$B$
is as a representative example. The data points with larger values of
$P\left(B\right)$
are more likely to be chosen as representative examples. These values are referred to as *preferences*. In fact, all data points are equally suitable as representative examples; thus the preferences should be set to the same value, which can be varied to produce different numbers of clusters. Generally, that value is the median of
$s$
. After the construction of similarity matrix and the setting of preferences, two kinds of messages (responsibility, availability) are passed between data points. The responsibility
$r(i,k)$
, sent from the data point
${x}_{i}$
to the candidate representative example data point
${x}_{k}$
, reflects the accumulated evidence for how proper it would be for point
${x}_{k}$
to serve as the representative example for point
${x}_{i}$
. It is updated using the rule

## Eq. 1

$$r(i,k)\leftarrow s(i,k)-\underset{{k}^{\prime}:{k}^{\prime}\ne k}{\mathrm{max}}\{a(i,{k}^{\prime})+s(i,{k}^{\prime})\}.$$## Eq. 2

$$a(i,k)\leftarrow \mathrm{min}\{0,r(k,k)+\sum _{{i}^{\prime}:{i}^{\prime}\notin \{i,k\}}\phantom{\rule{0.2em}{0ex}}\mathrm{max}\{0,r({i}^{\prime},k)\}\}.$$Availabilities and responsibilities can be combined to recognize representative examples at any time during affinity propagation. For point ${x}_{i}$ , the $k$ that maximizes $a(i,k)+r(i,k)$ indicates that point ${x}_{k}$ serves as the representative example for point ${x}_{i}$ .

## 3.

## Locality Preserving Projections

LPP, or Laplacianface,^{9} is also a subspace analysis method for recognition. Here, it is formulated as follows: Suppose a set of face images
$\{{x}_{1},{x}_{2},\dots ,{x}_{N}\}$
. Let
$S$
be a similarity matrix defined on the data points. Laplacianface can be obtained by solving the minimization problem

## Eq. 3

$${W}_{\mathrm{opt}}=\mathrm{arg}\phantom{\rule{0.2em}{0ex}}\underset{W}{\mathrm{min}}\phantom{\rule{0.2em}{0ex}}\sum _{i=1}^{N}\sum _{j=1}^{N}{({W}^{T}{x}_{i}-{W}^{T}{x}_{j})}^{2}{S}_{ij}=\mathrm{arg}\phantom{\rule{0.2em}{0ex}}\underset{W}{\mathrm{min}}\phantom{\rule{0.2em}{0ex}}{W}^{T}XL{X}^{T}W$$Once the eigenvectors are computed, let $W=[{w}_{1},{w}_{2},\dots ,{w}_{k}]$ be the transformation matrix. Thus the low-dimensional feature can be obtained by using $Y={W}^{T}X$ .

## 4.

## Proposed Affinity Propagation Locality Preserving Projections

Suppose there are $N$ training images $\{{x}_{1},{x}_{2},\dots ,{x}_{N}\}$ belonging to $c$ different subjects. Firstly, select $n$ $(n>1)$ images per person (hence $nc$ in total) to form the training set, and use LPP on the training set to get the transformation matrix $W$ and the $nc$ corresponding reduced features. Secondly, use AP to cluster these $nc$ reduced features into $c$ different classes and obtain a representative reduced feature for each class. Thus there will be $c$ representative features ${A}_{1},{A}_{2},\dots ,{A}_{c}$ , such that each feature is assigned an identity ${\omega}_{i}$ $(1\le i\le c)$ , and each class has just one representative feature. The reason why these features should be clustered into $c$ different classes can be found in Ref. 11. Finally, convert the test face image $T$ into the low-dimensional feature $L$ using $L={W}^{T}\times T$ , and then identify it using a nearest-neighbor classifier (NNC) or AP. Thus two methods APLPP(NNC) and APLPP(AP) are generated. APLPP(NNC) uses NNC to find which representative face image is nearest to the test face image. In this case, Euclidean distance is used. On the other hand, APLPP(AP) uses AP again to cluster $c+1$ features ( $c$ representative features plus a test feature) into $c$ classes and find which representative features the test one belongs to. It should satisfy two conditions: (a) the number of classes after clustering must equal $c$ ; (b) the representative features achieved in the training stage must remain representative ones after clustering. How to achieve this goal? As described before, the points with larger preference are more likely to be chosen as representative examples, and each test feature $B$ is supposed to be clustered into one of these representative features; thus we assign large values (e.g., 1) to ${A}_{1},{A}_{2},\dots ,{A}_{c}$ and very small values (e.g., $-1000$ ) to $L$ , and then perform the clustering.

Since APLPP(NNC) and APLPP(AP) use only $c$ representative features for recognition, their recognition time mainly depends on the number of subjects. It increases linearly with $c$ , while the recognition time of LPP increases linearly with the number of training images $\left(nc\right)$ . Obviously, both our methods are more computationally efficient than LPP.

## 5.

## Experiments

In this section, experiments are described that validate the effectiveness of APLPP and compare it with LPP, using the extended YaleB face database.^{12} It contains around 64 near-frontal images for each of 38 distinct subjects. All images were preprocessed^{11} so that the final images have size
$32\times 32$
, with 256 gray levels per pixel. Different numbers of images per individual were taken to form the training set. The rest of the database was considered to be the testing set. The experiments were repeated 20 times, and the average accuracy was recorded. In general, the performance of the LPP and APLPP varies with the number of dimensions. We only show the best results obtained by them in Table 1. The values in parentheses denote the dimensions to attain top recognition rate. As can be seen, APLPP(AP) performs uniformly better than APLPP(NNC), followed by LPP. This means AP is more suitable for use as a classifier than NNC in these cases. Moreover, both of our methods improve the FR performance by large margins over LPP.

## Table 1

Performance comparison on extended YaleB database. Here n is the number of training face images per person.

Face recognitionapproach | Accuracy and (no. of dimensions needed) | ||||
---|---|---|---|---|---|

n=10 | 20 | 30 | 40 | 50 | |

LPP | 76.4602% (38) | 70.8313% (36) | 82.1154% (86) | 93.9597% (44) | 96.3521% (38) |

APLPP(NNC) | 76.674% (62) | 71.058% (60) | 82.2135% (38) | 94.8937% (38) | 96.5564% (38) |

APLPP(AP) | 79.8869% (62) | 72.3519% (80) | 82.5432% (38) | 96.0011% (38) | 98.1907% (38) |

The experiments reveal some interesting points:

1. The recognition rates of all methods generally increase with the number of training samples. The more training samples, the better performance can be achieved.

2. Both APLPP(NNC) and APLPP(AP) perform better then LPP by a large margin. The underlying reason is that AP can detect a good representative face image for each subject. The more training images per person, the better the representative face images that can be detected, and thus the better the performance attained. Another important reason is that APLPP uses only representative face images, free of noise, for identification, which means it avoids the effects of the noise.

3. Since the APLPP uses only the representative samples in the testing stage, APLPP is more computationally efficient than LPP. Moreover, the speed difference between APLPP and LPP would become more significant with increase of the number of training images per person.

## 6.

## Conclusions and Further Work

LPP has been proposed to be combined with AP to give a new FR framework. In contrast with LPP, which identifies a test image by comparing it with all training images, our proposed method uses only representative face images. We also apply the AP clustering method for identification. The new framework has been evaluated on the extended YaleB database. Comparative experimental results show the effectiveness of the new framework. Further work will include applying the proposed method to human gait recognition and digital character recognition.

## Acknowledgments

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This research has been supported by the National Natural Science Foundation of China (No. 60675023, No. 60602012) and the China 863 project (No. 2007AA01Z164).