Human face clustering is an important research direction in the field of optical image processing. It has been successfully used in many fields. For example, in face image retrieval, if we can automatically cluster different kinds of face images, then it will beneficial to discover the latent similarities among various face images. The computational cost of searching for an interested face image will also reduce considerably.
Currently, a lot of research has been proposed for face image processing. However, most of it focuses on learning a subspace for face classification (i.e., classify the face images in the learned subspace with a certain number of labeled samples). The most prominent approach is called the Fisherface.1 It employs the supervised method, linear discriminant analysis (LDA),2 for face classification.
There is little work dedicated to learning a subspace for face image clustering. One direct way is to employ the traditional clustering techniques (e.g., K-means3) for clustering. Nevertheless, it is difficult for K-means to achieve a satisfied accuracy because the vector representations of face images have very high dimensionality (commonly, the number of pixels of a image). Then, one of the most famous works, which is named -means, is proposed. It employs the unsupervised principle component analysis (PCA)2 technique to compute the famous Eigenface 1 and then applies the K-means approach for clustering. Although it has achieved prominent performances in many applications, -means seems unsuitable for face image clustering. It takes into account few considerations about the special characters and manifold structures of face images.4 Recently, Ding extended the traditional supervised LDA method for clustering.5 The method, which is called LDA-Km, combines LDA with K-means and learns the subspace and clusters alternately. It has been reported that LDA-Km performs better than the K-means and the -mean approaches.5 However, it also gives no considerations to the manifold structure and the spatial smooth character6 of face images. Additionally, LDA-Km applies the traditional ratio trace criterion, not the more prominent trace ratio criterion.7 The accuracy of LDA-Km is not as high as expected for face image clustering.
In this paper, we propose a new method, which is named as subspace clustering via trace ratio (SCTR) for face image clustering. First, it applies the trace ratio criterion, which has been shown more effective for discriminative subspace learning.7 Second, we employ the spatial smooth regularizer, which represents the particular character of images. Finally, we consider the manifold structure by intentionally adding the graph-smooth constraint. After computing the optimal clustering and learning the subspace alternately, we can finally derive the subspace and clustering results simultaneously. Experiments show that SCTR performs better than the state-of-the-art approaches.
Leaning a Subspace for Face Image Clustering
Assume are vector representations of face images, which are all of resolutions. Here for and . The number of clusters is predefined as . The purpose of our method is to find the subspace in which these face images can be optimally clustered. More concretely, under a given criterion, we plan to find the transformation matrix and simultaneously, the indicator matrix . Here, is the dimensionality of the subspace. if belong to the cluster, and otherwise, where is the number of samples in the cluster.
It has been shown7 that the trace ratio criterion is much more effective than the ratio trace, which is widely used in the traditional LDA approach and its variants. We employ the trace ratio criterion in our approach. Mathematically, assume are the embddings of (i.e., ). The within scatter matrix and the total-scatter matrix are defined as follows:is the mean of the samples in the ’th cluster, is the mean of all samples. The corresponding and are is an -dimensional column vector with all ones. The trace ratio criterion proposed in Ref. 7 is
We now consider the spatial character of images. Because an image represented in the plane is intrinsically a matrix. The pixels spatially close to each other may be correlated. Although we have pixels per image, this spatial correlation suggests the real number of freedom is far less. We employ a spatial regularizer (i.e., the Laplacian penalty) to constrain the coefficients to be spatially smooth.6 In brief, we define the matrices that yield the discrete approximation of the second-order derivation as follows:for . The discrete approximation for the two-dimensional Laplacian is a matrix is the identity matrix for . ⊗ represents the Kronecker product of two matrixes.
For a image vector , is proportional to the sum of squared differences between nearby grid points in that image. It can be used to measure the smoothness of an image on a lattice. Because each column vector of can be regarded as a basis image, we add to where is a balance parameter that controls the smoothness of the estimator.
Finally, as in most learning-based face image processing approaches1 we assume that the face images, which belong to the same cluster, are nearly lying on a low dimensional manifold. The graph Laplacian4 (i.e., ) is employed to represent this character. Intuitively, the cluster indicators of face images belonging to the same cluster should be identical. In other words, if we regard that the cluster indicators are the output of a function defined on all face images, this intuition indicates that the predefined function should be smooth on these manifolds. Mathematically, it is equal to minimize .
In summary, we define SCTR as the solution to the following problem:is also a balance parameter.
There are totally two different kinds of variables that should be optimized (i.e., and ). It is difficult to compute them simultaneously. We alternately optimize them.
Stage one: fixing and optimizing
In this situation, the optimization problem in Eq. 6 becomes
It has the same form as the problem proposed in Ref. 7, except for a little differences in the formulation of the numerator. Thus, we can directly employ the same technique. It is an iterative procedure. For our problem, in the ’th step, we solve a trace difference problemis the trace ratio value calculated from the previous projection matrix in the previous step. Please see Ref. 7 for more details.
Stage two: fixing and optimizing
In this case, because and have no relation with , the optimization problem in Eq. 6 is equivalent to
This problem can be effectively solved by spectral decomposition technique. In fact, the optimal to the problem in Eq. 8 is formed by the eigenvectors corresponding to the largest eigenvalues of .
There are two essential parameters (i.e., and ) in our method. They control the smoothness of estimator and the balance of two objectives. When , SCTR totally ignores the spatial character of the face image. When , SCTR tends to only choose a spatial smoothest basis and ignore the discriminative information. When , we ignore the smooth constraint on . If , SCTR only concerns the smoothness of .
Parameter determination is an essential task in most of the learning algorithms.6 Among various kinds of methods, grid search is probably the simplest and most widely used one for unsupervised learning. Because we have constrained and , we apply the grid search technique in this paper. More concretely, we set and by searching the grid .
There are another three problems that should be indicated here. (i) Because SCTR is an iterative approach, a direct way to initialize the iteration is to first compute by -means and then optimize the problems in Eqs. 7, 8 alternately. (ii) The optimal that maximizes Eq. 8 may not have the form of the indicator matrix that we specified in the previous part. We apply the discretization procedure in Ref. 8 to solve this problem. (iii) Because we have derived and , it is easy to compute the embedding and the cluster of a new image by using and directly.
We employ four different kinds of face image data sets for illustration. They are the Yale data, the ORL data, the Umist data, and the first 20 classes of Feret data. All these images are rescaled to resolutions.
We compare SCTR with K-means, -means, and LDA-Km. Two standard measurements—the accuracy (Acc) and normalized mutual information (NMI)—are used. The parameters and are determined by grid search and . We randomly initialize K-means and repeat for 100 times. The average values of these Accs are listed in Table 1. Figure 1 shows the corresponding mean of these NMIs.
Mean accuracy (Acc) results of different methods on four face image data sets.
As seen from Table 1 and Fig. 1, SCTR performs the best. It achieves both the highest mean Acc and the largest mean NMI in all cases. Because all the images are rescaled to be low resolution and the dimensionality is not so high, it seems that K-means and -means have the similar performances on these data sets. LDA-Km performs better than K-means and -means.
We have also done some experiments on the Yale data set with different parameters, which are set by the grid search. The Acc results versus different and are shown in Fig. 2. It seems that for Yale, smaller and are more suitable. The best parameter and .
This work is partly supported by NSF China (Grants No. 60673090 and No. 60721003), the Hunan Provincial Innovation Foundation for Postgraduate and the Graduate School of National University of Defense Technology.