RGB-D dynamic facial dataset capture for visual speech recognition
27 November 2019
Naveed Ahmed
Proceedings Volume 11321, 2019 International Conference on Image and Video Processing, and Artificial Intelligence; 1132108 (2019) https://doi.org/10.1117/12.2538762
Event: The Second International Conference on Image, Video Processing and Artificial Intelligence, 2019, Shanghai, China
Abstract
We present a new comprehensive RGB-D dynamic facial dataset capture system that can be used for facial recognition, emotion recognition, or visual speech processing. Our facial dataset uses an RGB-D (Kinect) camera to record 20 individuals saying 20 common English words or phrases. Using Kinect facial tracking, we record not only the facial features but also the facial outline, RGB data, depth data, the mapping between RGB and depth data, facial animation units, facial shape units, and 2D and 3D representations of the face, along with the 3D head orientation. The captured RGB-D dynamic facial dataset can be employed in several applications. We demonstrate its effectiveness by presenting a new visual speech recognition system that employs three-dimensional spatial and temporal data of different facial feature points. The results demonstrate that our RGB-D dynamic facial dataset can be effectively employed in a visual speech recognition system.
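The abstract lists the channels recorded per frame but not how they are stored. As a rough illustration only, one captured frame and one spoken-word recording might be organized as below. Every class, field, and method name here is hypothetical and not taken from the paper or the Kinect SDK, and the pitch/yaw/roll ordering of the head orientation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class FaceFrame:
    """One captured frame. All field names are hypothetical."""
    rgb: List[List[Tuple[int, int, int]]]        # colour image, rows of (r, g, b)
    depth: List[List[int]]                       # depth image, rows of raw depth values
    depth_to_rgb: List[List[Tuple[int, int]]]    # per depth pixel: matching RGB pixel
    feature_points_2d: List[Point2D]             # tracked facial features in image space
    feature_points_3d: List[Point3D]             # the same features in 3D
    outline_2d: List[Point2D]                    # facial outline
    animation_units: List[float]                 # facial animation units
    shape_units: List[float]                     # facial shape units
    head_orientation: Point3D                    # assumed order: pitch, yaw, roll


@dataclass
class Recording:
    """One speaker saying one word or phrase (hypothetical layout)."""
    speaker_id: int
    phrase: str
    frames: List[FaceFrame] = field(default_factory=list)

    def trajectory(self, point_index: int) -> List[Point3D]:
        """3D position of a single feature point across all frames, in order."""
        return [f.feature_points_3d[point_index] for f in self.frames]
```

A per-point trajectory of this kind is one simple way to expose the three-dimensional spatial and temporal data of facial feature points that the abstract mentions as input to visual speech recognition; the paper's actual representation may differ.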
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Naveed Ahmed "RGB-D dynamic facial dataset capture for visual speech recognition", Proc. SPIE 11321, 2019 International Conference on Image and Video Processing, and Artificial Intelligence, 1132108 (27 November 2019); https://doi.org/10.1117/12.2538762
KEYWORDS
RGB color model
Speech recognition
Visualization
Facial recognition systems
Data acquisition
Image segmentation