19 August 2017 Learning toward practical head pose estimation
Head pose is useful information for many face-related tasks, such as face recognition, behavior analysis, and human–computer interfaces. Existing head pose estimation methods usually assume that the face images have been well aligned or that sufficient and precise training data are available. In practical applications, however, these assumptions often do not hold. This paper first investigates how the failure of these assumptions, i.e., misalignment of face images and uncertainty and undersampling of training data, affects the head pose estimation accuracy of state-of-the-art methods. A learning-based approach is then designed to make head pose estimation more robust to these factors. To cope with misalignment, instead of using hand-crafted features, it learns suitable features from a set of training data with a deep convolutional neural network (DCNN), such that the training data can be best classified into the correct head pose categories. To handle uncertainty and undersampling, it employs multivariate labeling distributions (MLDs) with dense sampling intervals to represent the head pose attributes of face images. The correlation between the features and the dense MLD representations of face images is approximated by a maximum entropy model, whose parameters are optimized on the given training data. To estimate the head pose of a face image, its MLD representation is first computed by the model from the features that the trained DCNN extracts from the image, and its head pose is then taken to be the one corresponding to the peak of its MLD. Evaluation experiments on the Pointing’04, FacePix, Multi-PIE, and CASIA-PEAL databases demonstrate the effectiveness and efficiency of the proposed method.
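The estimation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pose grid, the Gaussian form of the label distribution, the `sigma` value, and the function names `make_mld`, `maxent_predict`, and `decode_pose` are all assumptions made for illustration, and the DCNN feature extractor is replaced by a plain feature vector.

```python
import numpy as np

# Hypothetical pose grid in degrees; the paper's actual sampling intervals
# are not stated in the abstract.
YAWS = np.arange(-90, 91, 15)
PITCHES = np.arange(-30, 31, 15)
GRID = np.array([(y, p) for y in YAWS for p in PITCHES], dtype=float)  # shape (K, 2)


def make_mld(yaw, pitch, sigma=15.0):
    """Build a label distribution over GRID centred on the given pose.

    Assumption: an isotropic Gaussian over (yaw, pitch), normalised to sum to 1.
    """
    d2 = ((GRID - np.array([yaw, pitch], dtype=float)) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()


def maxent_predict(features, theta):
    """Maximum entropy model: p(k | x) proportional to exp(theta[k] . x).

    theta has shape (K, D); features has shape (D,).
    """
    logits = theta @ features
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def decode_pose(mld):
    """Return the (yaw, pitch) grid point at the peak of the distribution."""
    yaw, pitch = GRID[int(np.argmax(mld))]
    return float(yaw), float(pitch)
```

In this sketch, training would fit `theta` so that `maxent_predict` matches the target distributions produced by `make_mld` (for example, by minimising the Kullback-Leibler divergence between them); that step is omitted here.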
© 2017 Society of Photo-Optical Instrumentation Engineers (SPIE)
Gaoli Sang, Feixiang He, Rong Zhu, Shibin Xuan, "Learning toward practical head pose estimation," Optical Engineering 56(8), 083104 (19 August 2017). https://doi.org/10.1117/1.OE.56.8.083104
Received: 15 December 2016; Accepted: 24 July 2017; Published: 19 August 2017

