## 1.

## Introduction

Face recognition (FR) has recently received extensive attention due to its broad applications. Many techniques have been developed over the past decades for FR. Subspace analysis is one of the most efficient techniques. The most popular subspace techniques are principal component analysis (PCA),^{1} linear discriminant analysis (LDA),^{2} and independent component analysis (ICA).^{3} Besides directly processing image appearance, subspace methods can also be combined with the Gabor feature^{4} to derive the Gabor-based methods.^{5, 6, 7} However, the recognition time of these methods depends heavily on the size of the training set because a test image is compared to all training images. Such a recognition scheme is not efficient, especially when the training set is too large. Thus, we propose integrating a novel clustering method, affinity propagation (AP),^{8} with LDA to form AP-LDA for FR. By using AP on the low-dimensional features obtained from LDA, a representative face image for each subject can be achieved. Thus, AP-LDA needs to use only representative face images rather than all training face images for recognition.

We combine AP with LDA for the following reasons: AP can cluster data points into different clusters and detect a representative example for each cluster. By using AP, we intend to achieve a representative face image for each subject. If so, we need to use only representative face images for recognition. However, directly using AP on gray pixel values is computationally expensive and inefficient. For example, the difference between two pixels on the same positions in two images of the same subject is obvious when the illumination is different. However, they should be considered to be similar to each other in fact. Thus, more discriminating and efficient low-dimensional features should be extracted before using AP for clustering. Therefore, LDA is adopted, as it can not only be used for dimensionality reduction but also extract discriminative features.

## 2.

## Review of the AP Clustering Method and the LDA Method

## 2.1.

### AP Clustering Method

AP^{8} first builds a similarity matrix
$s$
, in which
$s(i,k)$
between data points
${x}_{i}$
and
${x}_{k}$
is their negative Euclidean distance
$\left[s\right(i,k)=-\parallel {x}_{i}-{x}_{k}{\parallel}^{2}]$
. Before clustering, each data point also needs to be assigned a number
$P\left(B\right)$
, which describes the *a priori* knowledge of how good point
$B$
is as a representative example. The data points with larger values of
$P\left(B\right)$
are more likely to be chosen as representative examples. These values are referred to as *preferences*. In fact, the probability of each point being the representative example is the same; thus, the preferences should be set to the same value, which can be varied to produce different numbers of clusters. Generally, such a value takes the median of the
$s$
. After the construction of a similarity matrix and the setting of preferences, two kinds of messages (responsibility and availability) are passed between data points. The responsibility
$r(i,k)$
, sent from
${x}_{i}$
to candidate representative example
${x}_{k}$
, reflects the accumulated evidence for how proper it would be for
${x}_{k}$
to serve as the representative example for
${x}_{i}$
. It is updated using the rule:

## 1

$$r(r,k)\leftarrow s(i,k)-\underset{{k}^{\prime}s.t.{k}^{\prime}\ne k}{\mathrm{max}[a(i,{k}^{\prime})+s(i,{k}^{\prime})]}.$$The availability $a(i,k)$ , sent from candidate representative example ${x}_{k}$ to ${x}_{i}$ , reflects the accumulated evidence for how well-suited ${x}_{i}$ is to choose ${x}_{k}$ as its representative example. It is computed by the rule:

## 2

$$a(i,k)\leftarrow \mathrm{min}\{0,r(k,k)+\sum _{{i}^{\prime}s.t.{i}^{\prime}\notin \{i,k\}}[0,r({i}^{\prime},k)]\}.$$It is clear that availabilities and responsibilities can be combined to recognize representative examples at any time. For ${x}_{i}$ , the $k$ that maximizes $a(i,k)+(i,k)$ indicates that ${x}_{k}$ serves as the representative example for ${x}_{i}$ .

## 2.2.

### LDA Method

Suppose there is a set of
$N$
$d$
-dimensional samples
$\{{x}_{1},{x}_{2},\dots ,{x}_{N}\}$
belonging to
$c$
classes
$\{{X}_{1},{X}_{2},\dots ,{X}_{N}\}$
. LDA^{2} aims to find
${W}_{opt}$
that maximizes the ratio of the between-class scatter matrix to the within-class scatter matrix, i.e.,

## 3

$${W}_{opt}=\mathrm{arg}\phantom{\rule{0.2em}{0ex}}\underset{W}{\mathrm{max}}\frac{\mid {W}^{T}{S}_{B}W\mid}{\mid {W}^{T}{S}_{W}W\mid}=({w}_{1},{w}_{2},\dots ,{w}_{m}),$$## 5

$${S}_{W}=\sum _{i=1}^{c}\sum _{{x}_{k}\u220a{X}_{i}}({x}_{k}-{\mu}_{i}){({x}_{k}-{\mu}_{i})}^{T},$$Note that there are at most $c-1$ nonzero generalized eigenvalues; thus, the dimension of the reduced space is $c-1$ . At the same time, to make the within-class scatter matrix ${S}_{W}$ nonsingular, PCA is first used to reduce the dimension of the feature to $N-c$ , and then LDA is applied to reduce the dimension to $c-1$ .

## 3.

## Summary of AP-LDA

Suppose that there is a face data set of
$N$
face images belonging to
$c$
different subjects. First,
$n$
face images per person (hence,
$n\times c$
in total) are selected for training using LDA, obtaining
$n\times c$
corresponding
$c-1$
dimensional features. Second, AP is used to cluster these features into different clusters and obtain a representative feature for each cluster. Note that the number of clusters obtained by using AP may not equal the number of subjects, as the former is influenced by the values of preference,^{8} while the latter is fixed. To solve this problem, we repeatedly vary the value of the preferences until two such numbers equal to each other. Last, each test image is converted to a low-dimensional feature, which is then compared to
$c$
representative features and identified using a nearest-neighbor classifier. Since AP-LDA uses only
$c$
representative features for recognition, the recognition time of AP-LDA depends mainly on the number of subjects. It increases linearly with the
$c$
, while the recognition time of the Fisherface method increases with the training image size
$(n\times c)$
. Obviously, our AP-LDA method is more computationally efficient than the Fisherface method.

## 4.

## Experiments and Discussion

In this section, experiments on three benchmark face data sets [Yale^{9}; extended Yale^{10}; and Pose, Illumination, and Expression (PIE)^{11}] are carried out to show the effectiveness of our AP-LDA method and also to compare it to Fisherface. All face images are cropped based on the centers of eyes such that facial areas contain only the face. All cropped images are then normalized to the size of
$32\times 32$
, with 256 gray levels per pixel. In the following experiments, different numbers of images per subject were randomly selected for training, and the rest were used for testing. To minimize the possible misleading results, the final recognition rates were obtained by averaging the results over five random splits.

The Yale database contains 165 images with 11 different images for each of the 15 distinct subjects. Comparative results are summarized in Table 1. It is clear that AP-LDA obviously outperforms Fisherface when the number of training images per person is more than 3.

## Table 1

Comparative results on Yale database.

FR method | Number of training face images per person | |||||
---|---|---|---|---|---|---|

3 | 4 | 5 | 6 | 7 | 8 | |

Fisherface | 60.17% | 69.33% | 72.00% | 77.33% | 79.33% | 84.89% |

AP-LDA | 55.33% | 69.90% | 76.89% | 78.13% | 83.33% | 87.56% |

The Extended Yale database is extended from the Yale database. It contains approximately 64 near-frontal images for each of 38 distinct subjects. We randomly selected 20 images per person for our experiments. Table 2 presents the results. We can see that AP-LDA remarkably outperforms Fisherface in all cases.

## Table 2

Comparative results on subset from Extended Yale database.

FR method | Number of training face images per person | ||||||
---|---|---|---|---|---|---|---|

4 | 5 | 6 | 7 | 8 | 9 | 10 | |

Fisherface | 59.61% | 63.30% | 70.08% | 71.01% | 74.08% | 77.18% | 80.32% |

AP-LDA | 65.03% | 70.49% | 74.40% | 76.88% | 80.53% | 81.24% | 84.21% |

The CMU Pose, Illumination, and Expression (PIE) database contains 41,368 facial images of 68 individuals. We randomly chose 30 images per person for our experiments. The results are shown in Table 3. Again, AP-LDA consistently outperforms Fisherface.

## Table 3

Comparative results on subset from CMU PIE database.

FR method | Number of training face images per person | ||||||
---|---|---|---|---|---|---|---|

4 | 5 | 6 | 7 | 8 | 9 | 10 | |

Fisherface | 50.84% | 58.09% | 63.39% | 64.59% | 68.69% | 68.43% | 70.66% |

AP-LDA | 59.24% | 65.96% | 70.92% | 71.71% | 73.68% | 72.61% | 74.15% |

Comparative experimental results show that:

1. The recognition rates of AP-LDA increase with the increase in the number of training face images per person because the more training images per person, the better the representative face images that can be detected.

2. Although AP-LDA performs better than Fisherface on the Yale face data sets in most cases, the improvement is not significant. Moreover, it performs not as well as Fisherface when the number of training images per person is 3. This is because the AP method could not find good representative face images in these cases.

3. AP-LDA remarkably outperforms Fisherface on the Extended Yale and CMU PIE databases. In fact, for these two databases, we chose only a fraction of images for our experiments. We found that our AP-LDA method performed significantly better than Fisherface when all images were used. However, such improved performance was achieved at the expense of increase of computational cost.

## 5.

## Conclusions

We have introduced in this letter a novel AP-LDA method for face recognition. Unlike Fisherface, which uses all training face images for recognition, our AP-LDA method uses only representative face images. This makes AP-LDA more efficient. Experiments also indicate that AP-LDA outperforms Fisherface in terms of recognition rate.

## Acknowledgments

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This research has been supported by the National Natural Science Foundation of China (Grant Nos. 60675023 and 60602012).