Three-dimensional face pose estimation based on novel nonlinear discriminant representation

1 September 2006
Abstract
We investigate the appearance manifold of different face poses using manifold learning. The pose estimation problem is, however, exacerbated by changes in illumination, spatial scale, etc. In addition, manifold learning has some disadvantages. First, the discriminant ability of the low-dimensional subspaces obtained by manifold learning is often lower than that of traditional dimensionality reduction approaches. Second, manifold learning methods fail to remove redundancy, such as high-order correlation, among the original feature vectors. In this work, we propose a novel approach to address these problems. First, face images are transformed by Gabor filters to obtain a set of overcomplete feature vectors, which removes intrinsic redundancies within images and provides orientation-selective properties that enhance the differences among face poses. Second, supervised locality preserving projections (SLPPs) are proposed to reduce dimensionality and obtain a low-dimensional subspace that maximizes the between-class distance while minimizing the within-class distance. Finally, a support vector machine (SVM) classifier is applied to estimate face poses. The experimental results show that the proposed approach is effective and efficient.

1.

Introduction

Human face pose estimation has a variety of applications, such as face recognition, face tracking, and human-computer interaction (HCI). Because 3-D quantities can be recovered from 2-D data only imperfectly, estimating face poses from 2-D face images is a complex task. In addition, many factors exacerbate the problem, for example, illumination conditions, facial expressions, and spatial scale. More importantly, the appearance of the human head can change drastically across different viewing angles, mainly because of nonlinear deformations during in-depth rotations of the head (Ref. 1). Many different approaches have been proposed to solve this problem. Generally, the existing pose estimation methods can be broadly classified into two categories: feature-based (Ref. 2) and appearance-based (Refs. 3, 4) methods.

There are four major problems to be solved in the existing approaches mentioned before. The first is that the face region must be extracted from the whole image; it is very difficult to locate the face region in a side or profile face image. The second is that original face images are normalized manually; manual normalization is tedious and costly. The third is the difficulty of extracting face features accurately; it is especially harder to extract features from a side face image than from a frontal one. Last, face images with varying intrinsic features such as illumination, face pose, and facial expression are considered to constitute highly nonlinear manifolds in the high-dimensional observation space. Therefore, pose estimation systems using linear approaches [for example, principal components analysis (PCA)] will miss the subtleties of these manifolds. Manifold learning algorithms are better alternatives. However, the discriminant ability of the low-dimensional subspaces obtained by manifold learning is often lower than that of the subspaces obtained by traditional dimensionality reduction approaches. Furthermore, the original feature vectors may include high-order correlation, which cannot be removed by manifold learning algorithms. Therefore, a new approach based on manifold learning is proposed to address the four problems mentioned before. In our proposed approach, face images, without removing the background, are first transformed by Gabor filters. Then, a novel supervised locality preserving projection (SLPP) is proposed to project the Gabor-based data, including out-of-sample data, into a common low-dimensional subspace. For simplicity, the combination of Gabor filters (GF) and SLPP is abbreviated as GF+SLPP. Finally, a support vector machine (SVM) classifier is applied to estimate the face pose.

2.

Proposed Combination Approaches of Gabor Filters and the Supervised Locality Preserving Projection

Gabor filters are particularly appropriate for face pose estimation because they incorporate smoothing and can reduce sensitivity to spatial misalignment and illumination change. The Gabor wavelet transform (GWT) can also produce image representations that are locally normalized in intensity and decomposed in spatial frequency and orientation (Ref. 5). In addition, Gabor filters can enhance pose-specific face features. Moreover, Gabor filters transform the face images into the frequency domain, where information that is unnoticeable in the spatial domain becomes apparent. These transformed face images considerably improve the discriminant ability of SLPP.

In our studies, the system processes face images as follows. A set of Gabor kernels $h_{m,n}(x,y)$ is specified, and the original image $I(x,y)$ is convolved with those kernels at each pixel. The result is a set of 2-D coefficient arrays,

\[
W_{m,n}(x,y) = I(x,y) \ast h_{m,n}(x,y), \tag{1}
\]

where $W_{m,n}(x,y)$ is the convolution result corresponding to the Gabor kernel at scale $m$ and orientation $n$, and $\ast$ denotes the convolution operator.
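To make Eq. (1) concrete, the following Python sketch builds one Gabor kernel and convolves it with an image. The kernel parameterization (center frequency per scale, bandwidth tied to wavelength) is our illustrative assumption, since the letter does not report the exact filter-bank parameters.

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(scale_m, orient_n, n_orients=8):
    """Complex Gabor kernel h_{m,n}(x, y) at scale m and orientation n.
    Frequency and bandwidth settings are illustrative assumptions."""
    f = 0.25 / (np.sqrt(2) ** scale_m)            # center frequency decreases with scale
    theta = np.pi * orient_n / n_orients          # orientation angle
    sigma = 0.56 / f                              # Gaussian width tied to the wavelength
    half = int(np.ceil(3 * sigma))                # support covering +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.exp(2j * np.pi * f * xr)         # complex sinusoidal carrier
    return envelope * carrier

def gabor_response(image, m, n):
    """W_{m,n}(x, y) = I(x, y) * h_{m,n}(x, y)  [Eq. (1)]."""
    h = gabor_kernel(m, n)
    return fftconvolve(image, h, mode='same')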

Since the outputs $W_{m,n}(x,y)$ contain features at different localities, scales, and orientations, we concatenate all of them into a feature vector $X$. Without loss of generality, assume each output $W_{m,n}(x,y)$ is a column vector, constructed by concatenating the rows (or columns) of the output. Before the concatenation, each output $W_{m,n}(x,y)$ is down-sampled by a factor $\rho$ to reduce the dimensionality of the original vector space, and then normalized to zero mean and unit variance. Let $W_{m,n}^{(\rho)}$ denote a normalized output; the feature vector $X^{(\rho)}$ is then defined as

\[
X^{(\rho)} = \left[ W_{0,0}^{(\rho)t}, W_{0,1}^{(\rho)t}, \ldots, W_{4,7}^{(\rho)t} \right]^{t}, \tag{2}
\]

where $t$ is the transpose operator. The feature vector thus encompasses all the outputs $W_{m,n}(x,y)$ as important discriminating information.
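A minimal sketch of the feature-vector assembly in Eq. (2), assuming the gabor_response function above. Taking magnitude responses, stride-based down-sampling, and per-output zero-mean/unit-variance normalization are our reading of the description in the text.

def gabor_feature_vector(image, n_scales=5, n_orients=8, rho=4):
    """Build X^(rho): down-sample, normalize, and concatenate all
    Gabor outputs W_{m,n}  [Eq. (2)]."""
    parts = []
    for m in range(n_scales):
        for n in range(n_orients):
            w = np.abs(gabor_response(image, m, n))   # magnitude of W_{m,n}
            w = w[::rho, ::rho].ravel()               # down-sample by factor rho
            w = (w - w.mean()) / (w.std() + 1e-12)    # zero mean, unit variance
            parts.append(w)
    return np.concatenate(parts)                      # X^(rho) as one long vector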

After high-order information features are extracted by the Gabor filters, the immediate problem is to reduce the dimensionality and uncover the intrinsic low-dimensional manifold. In this work, we propose an SLPP approach.

LPP seeks a transformation $W$ to project high-dimensional input data $X = [x_1, x_2, \ldots, x_m]$ into a low-dimensional subspace $Y = [y_1, y_2, \ldots, y_m]$. The linear transformation $W$ can be obtained by minimizing the following objective function (Ref. 6):

\[
\min_{w} \sum_{i,j} \left( w^{T} x_i - w^{T} x_j \right)^{2} S_{ij}, \tag{3}
\]

where $S_{ij}$ evaluates the local structure of the data space. It can be defined as follows:

\[
S_{ij} =
\begin{cases}
\exp\!\left( -\dfrac{\| x_i - x_j \|^{2}}{t} \right) & \text{if } x_i \text{ and } x_j \text{ are close}, \\
0 & \text{otherwise},
\end{cases} \tag{4}
\]

where the parameter $t$ is a suitable constant. The minimization problem can be converted into the following generalized eigenvalue problem:

\[
X L X^{T} W = \lambda X D X^{T} W, \tag{5}
\]

where $D_{ii} = \sum_{j} S_{ji}$ is a diagonal matrix and $L = D - S$. For a more detailed derivation and justification of LPP, refer to Ref. 6.
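The sketch below implements Eqs. (3) to (5), using a k-nearest-neighbor notion of "close" and scipy's generalized symmetric eigensolver; the neighborhood size k, the heat-kernel parameter t, and the small ridge added for numerical stability are illustrative assumptions not specified in the letter.

from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, d=20, k=5, t=1.0):
    """Locality preserving projection.
    X: (D, m) data matrix with samples as columns.
    Returns W (D, d) minimizing sum_ij (w^T x_i - w^T x_j)^2 S_ij  [Eq. (3)]."""
    m = X.shape[1]
    sq = cdist(X.T, X.T, 'sqeuclidean')        # pairwise ||x_i - x_j||^2
    S = np.zeros((m, m))
    for i in range(m):                         # heat-kernel weights on kNN [Eq. (4)]
        nbrs = np.argsort(sq[i])[1:k + 1]      # k nearest neighbors, skipping self
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)                     # symmetrize the adjacency
    D = np.diag(S.sum(axis=1))                 # D_ii = sum_j S_ji
    L = D - S                                  # graph Laplacian
    A = X @ L @ X.T                            # X L X^T  [Eq. (5), left side]
    B = X @ D @ X.T + 1e-6 * np.eye(X.shape[0])  # X D X^T, ridge for stability
    vals, vecs = eigh(A, B)                    # generalized eigenproblem, ascending
    return vecs[:, :d]                         # d eigenvectors of smallest eigenvalue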

The $d$-dimensional data from LPP are further mapped into a $d'$-dimensional discriminant subspace through the linear discriminant analysis (LDA) algorithm. To minimize the intraclass distances while maximizing the interclass distances of the face manifold, the column vectors of the discriminant matrix $W'$ are the eigenvectors of $S_w^{-1} S_b$ associated with the largest eigenvalues,

\[
S_w^{-1} S_b W' = \lambda W', \tag{6}
\]

where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. The matrix $W'$ then projects vectors in the low-dimensional face subspace into the common discriminant subspace, which can be formulated as

\[
Z = W' Y = W' W X, \qquad Z \in \mathbb{R}^{d'}, \; Y \in \mathbb{R}^{d}, \; W' \in \mathbb{R}^{d' \times d}, \tag{7}
\]

where $Z$ encodes the classification information.
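A sketch of the supervising LDA stage in Eqs. (6) and (7), assuming the lpp function above; the pseudo-inverse of S_w and the class-size weighting of S_b are standard LDA choices that we assume here, since the letter gives only the eigenproblem.

def lda(Y, labels, d_prime):
    """Solve S_w^{-1} S_b w' = lambda w'  [Eq. (6)].
    Y: (d, m) LPP-reduced data, samples as columns. Returns W' as (d, d')."""
    d, m = Y.shape
    mu = Y.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mc = Yc.mean(axis=1, keepdims=True)
        Sw += (Yc - mc) @ (Yc - mc).T                 # within-class scatter
        Sb += Yc.shape[1] * (mc - mu) @ (mc - mu).T   # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)                    # largest eigenvalues first
    return vecs[:, order[:d_prime]].real

# Full SLPP projection Z = W' W X  [Eq. (7)], with columns of W, W' as directions:
# W = lpp(X, d=20); Y = W.T @ X; Wp = lda(Y, labels, d_prime=6); Z = Wp.T @ Y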

3.

Experimental Results

In this section, we manually selected two collections of face images from the JDL-PEAL face database (Ref. 7). Each collection includes 130 randomly selected subjects, each with seven face images at different poses and with varying intrinsic features such as illumination and expression. The first collection is used as the training set and the second as the testing set. In the first collection, all face images were resized to 24×18. Some samples are illustrated in Fig. 1. Before applying the proposed approach, several parameters must be fixed. First, for the Gabor filters, we chose five scales and eight orientations, and the down-sampling factor $\rho$ is set to 4. Second, the two reduced dimensions $d$ and $d'$ of the proposed method are fixed: $d$ is set to 20, and the reduced discriminant dimension $d'$ is generally no more than $L-1$, where $L$ denotes the number of face poses.

Fig. 1

Some samples of face images in JDL-PEAL face database.


We compared our proposed GF+SLPP algorithm with PCA+LDA, GF+PCA+LDA, and SLPP. For PCA+LDA, the algorithm is applied directly to the training set to obtain the subspace. For SLPP, we use the SLPP approach without Gabor filters to learn the subspace on the training set. For GF+PCA+LDA, the approach is similar to GF+SLPP, but the dimensionality reduction step is replaced by PCA+LDA.
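A minimal end-to-end evaluation sketch, assuming scikit-learn and the feature and projection functions above; the placeholder arrays X_train/X_test (GF feature vectors as columns) and y_train/y_test (pose labels), as well as the RBF kernel for the SVM, are our illustrative assumptions, since the letter does not report the SVM settings.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X_train, X_test: columns are Gabor feature vectors built with
# gabor_feature_vector; y_train, y_test: pose labels in
# {-45, -30, -15, 0, +15, +30, +45} degrees.
W = lpp(X_train, d=20)
Y = W.T @ X_train
Wp = lda(Y, y_train, d_prime=6)
Z_train = (Wp.T @ Y).T                     # samples as rows for scikit-learn
Z_test = (Wp.T @ (W.T @ X_test)).T

clf = SVC(kernel='rbf')                    # kernel choice is an assumption
clf.fit(Z_train, y_train)
print(accuracy_score(y_test, clf.predict(Z_test)))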

In the GF+SLPP approach, the reduced discriminant dimension $d'$ influences the performance of the proposed approach. It can be seen from Fig. 2 that as $d'$ increases, GF+SLPP achieves a higher accuracy rate.

Fig. 2

The influence of the reduced dimension $d'$ on the accuracy rate of face pose estimation.


The experimental results with the optimal reduced dimensions are listed in Table 1. It can be seen from Table 1 that the discriminant ability of the SLPP approach is better than that of the PCA+LDA approach, and that the GF+SLPP method achieves the best performance.

Table 1

The accuracy rate (percent) of the combination of dimensionality reduction and SVM classification, with $d = 20$ and $d' = 6$.

Face pose     −45 deg   −30 deg   −15 deg   0 deg   +15 deg   +30 deg   +45 deg
GF+SLPP       96.23     96.53     95.97     97.49   95.93     96.11     96.29
GF+PCA+LDA    64.14     65.76     69.85     73.52   68.59     66.35     64.62
SLPP          75.23     78.51     78.84     83.85   79.15     77.58     73.39
PCA+LDA       58.21     59.68     61.18     64.38   61.23     58.92     57.98

4.

Conclusions

We propose a combination of Gabor filters and supervised locality preserving projections for face pose estimation. Experimental results show that GF+SLPP achieves the best performance among all the compared approaches.

Acknowledgments

The research is sponsored by the Fundamental Project of the Committee of Science and Technology, Shanghai, under contract 03DZ14015.

References

1. B. Gokberk, L. Akarun, and E. Alpaydin, "Feature selection for pose invariant face recognition," Proc. ICPR, pp. 306–309 (2002).

2. N. Kruger, M. Potzsch, and C. von der Malsburg, "Determination of face position and pose with a learned representation based on labeled graphs," Image Vis. Comput. 15(10), 741–748 (1997).

3. H. Murase and S. K. Nayar, "Visual learning and recognition of 3-D objects from appearance," Int. J. Comput. Vis. 14(1), 5–24 (1995), doi:10.1007/BF01421486.

4. B. Raytchev, I. Yoda, and K. Sakaue, "Head pose estimation by nonlinear manifold learning," Proc. ICPR, pp. 462–466 (2004).

5. C. Liu, "Gabor-based kernel PCA with fractional power polynomial models for face recognition," IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 572–581 (2004), doi:10.1109/TPAMI.2004.1273927.

6. X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, "Face recognition using Laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell. 27(3), 328–340 (2005), doi:10.1109/TPAMI.2005.55.

7. W. Gao, B. Cao, S. Shan, D. Zhou, X. Zhang, and D. Zhao, "The CAS-PEAL large-scale Chinese face database and baseline evaluations," Technical Report on CAS-PEAL.

© 2006 Society of Photo-Optical Instrumentation Engineers (SPIE)

Xinliang Ge, Jie Yang, Tianhao Zhang, Huahua Wang, and Chunhua Du, "Three-dimensional face pose estimation based on novel nonlinear discriminant representation," Optical Engineering 45(9), 090503 (1 September 2006). https://doi.org/10.1117/1.2355524