The recent proliferation of images captured by digital cameras, and the resulting need for automatic annotation tools to index huge multimedia databases, has renewed interest in face detection and recognition technologies. After a brief review of the state of the art, the paper details a model-based face detection algorithm for color images, based on skin color and face shape properties. We compare a stand-alone model-based approach with a hybrid approach in which this algorithm is used as a pre-processor to provide candidate faces to a supervised SVM classifier. Experimental results are presented and discussed on two databases of 250 and 689 pictures, respectively. Finally, the application to a system that automatically annotates the photos of a personal collection is discussed from a human factors point of view.
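As a rough illustration of the skin color cue only, the sketch below marks skin-like pixels in an RGB image. The thresholds are a widely cited explicit RGB heuristic and are an assumption here; the abstract does not state the color model actually used, and the face shape test and the SVM stage are not reproduced. The names `is_skin`, `skin_mask`, and `skin_fraction` are hypothetical.

```python
def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel passes a simple explicit skin rule.

    The thresholds are an assumed heuristic, not the model from the paper.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15   # enough color spread (not gray)
        and abs(r - g) > 15                    # red clearly differs from green
        and r > g and r > b                    # red is the dominant channel
    )


def skin_mask(image):
    """Map an image (list of rows of (r, g, b) tuples) to a boolean mask."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


def skin_fraction(mask):
    """Fraction of pixels marked as skin; 0.0 for an empty mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    # Tiny 2x2 image: one skin-like pixel, three clearly non-skin pixels.
    image = [
        [(200, 120, 90), (0, 0, 255)],
        [(30, 30, 30), (255, 255, 255)],
    ]
    mask = skin_mask(image)
    print(mask, skin_fraction(mask))
```

In a pipeline of the kind described, connected regions of such a mask would be the candidates passed on to a shape test or a classifier.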
This paper presents an object detection framework applied to cinematographic post-processing of video sequences. Post-processing is done after production and before editing. At the beginning of each shot of a video, a slate (also called a clapperboard) is shown. The slate contains, notably, an electronic audio timecode that is necessary for audio-visual synchronization. The framework detects slates in video sequences for automatic indexing and post-processing and consists of five steps. The first two steps aim to drastically reduce the video data to be analyzed. They ensure a high recall rate but have low precision. The first step detects images at the beginning of a shot that possibly show a slate, while the second step searches these images for candidate regions with a color distribution similar to that of slates. The objective is not to miss any slate while eliminating long parts of video in which no slate appears. The third and fourth steps are statistical classification and pattern matching, which detect and precisely locate slates in candidate regions. These steps ensure a high recall rate and high precision. The objective is to detect slates with very few false alarms, minimizing interactive corrections. In the last step, electronic timecodes are read from slates to automate audio-visual synchronization. The presented slate detector has a recall rate of 89% and a precision of 97.5%. By temporal integration, much more than 89% of shots in dailies are detected. Timecode coherence analysis can raise the precision as well. Issues for future work are to accelerate the system to faster than real time and to extend the framework to several slate types.
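To make the reported figures concrete, the sketch below computes recall and precision from detection counts and shows why integrating over several frames of a shot can push shot-level recall above the per-image rate. The counts are invented for illustration, and the independence of per-frame detections is a simplifying assumption that the abstract does not make; `precision_recall` and `shot_level_recall` are hypothetical names.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def shot_level_recall(frame_recall, frames_with_slate):
    """Probability that a slate is found in at least one of several frames.

    Assumes (for illustration only) that detections in different frames
    are independent, which real video frames generally are not.
    """
    miss_all = (1.0 - frame_recall) ** frames_with_slate
    return 1.0 - miss_all


if __name__ == "__main__":
    # Invented counts, not taken from the paper's experiments.
    print(precision_recall(true_positives=90, false_positives=10, false_negatives=30))
    # Per-frame recall of 0.89 over three frames showing the slate.
    print(shot_level_recall(0.89, 3))
```

Under the independence assumption, even a few frames per shot leave very little chance of missing the slate entirely, which matches the direction of the claim about temporal integration.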
In this paper we propose a photo browsing system that uses image classification results in an error-tolerant manner. Images are hierarchically classified into indoor/outdoor and then further into city/landscape. We employ simple classifiers based on global color histograms, wavelet subband energies, and contour directions, with moderate recall rates of around 85%. This paper makes two contributions to cope with classification errors in the context of image browsing. The first contribution is a method to associate confidence measures with classification results. The second contribution is a browsing tool that does not reveal classification results to the user. Instead, browsing options are generated. These browsing options are thumbnails representing semantic topics such as indoor and outdoor. User studies showed that thumbnails and semantic topics are highly demanded features for a photo browsing tool. The thumbnails are representative images from the database with high confidence values. They are chosen according to context, such that they share class labels with the currently displayed images or the usage history.
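The thumbnail selection described above can be sketched as picking, for each semantic topic, the image whose classification carries the highest confidence. This is a minimal sketch under assumptions: the abstract does not define the confidence measure or the exact context rule, so confidences are taken as given numbers in [0, 1], and `pick_thumbnails` and `min_confidence` are hypothetical names.

```python
def pick_thumbnails(items, topics, min_confidence=0.0):
    """Choose one representative image per topic.

    items: iterable of (image_name, set_of_labels, confidence) tuples.
    topics: labels for which a browsing option should be offered.
    min_confidence: images below this value are never used as thumbnails
                    (an assumed knob, not described in the abstract).
    Returns a dict mapping topic -> image name; topics without a
    sufficiently confident image are left out.
    """
    wanted = set(topics)
    best = {}  # topic -> (image_name, confidence)
    for name, labels, confidence in items:
        if confidence < min_confidence:
            continue
        for label in labels & wanted:
            if label not in best or confidence > best[label][1]:
                best[label] = (name, confidence)
    return {label: name for label, (name, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical classification results for three photos.
    photos = [
        ("a.jpg", {"outdoor", "city"}, 0.90),
        ("b.jpg", {"outdoor", "landscape"}, 0.95),
        ("c.jpg", {"indoor"}, 0.60),
    ]
    print(pick_thumbnails(photos, ["indoor", "outdoor"]))
    print(pick_thumbnails(photos, ["indoor", "outdoor"], min_confidence=0.8))
```

Restricting `topics` to labels shared with the currently displayed images would be one way to approximate the context-based choice mentioned in the text.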
The estimation of the 2D shape of moving objects in a video image sequence is required for many applications, e.g. for the so-called content-based functionalities of ISO/MPEG-4, for object-based coding, and for automatic surveillance. Many real sequences are taken by a moving camera and show moving objects as well as their cast shadows. In this paper, an algorithm for 2D shape estimation of moving objects is presented that, for the first time, explicitly considers both a moving camera and moving cast shadows. The algorithm consists of five steps: estimation and compensation of any apparent camera motion; detection of any scene cuts; generation of a binary mask by detection of temporal signal changes after camera motion compensation; elimination of mask regions corresponding to moving cast shadows and uncovered background; and finally, adaptation of the mask to luminance edges of the current frame. For identification of moving cast shadows, three criteria evaluate static background edges, uniform change of illumination, and shadow penumbra. The proposed algorithm yields accurate segmentation results for sequences taken by a static or moving camera, in the absence and in the presence of moving cast shadows. Parts of this algorithm have been accepted for the informative part of the description of the forthcoming international standard ISO/MPEG-4.
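The third step, generating a binary mask from temporal signal changes, can be illustrated with a plain frame difference. This is a simplified sketch, not the paper's change detector: it assumes camera motion has already been compensated, uses a fixed global threshold chosen for illustration, and omits the shadow, uncovered-background, and edge-adaptation steps. `change_mask` and `threshold` are hypothetical names.

```python
def change_mask(previous, current, threshold=20):
    """Binary change mask from two grayscale frames of equal size.

    previous, current: lists of rows of luminance values (e.g. 0..255),
                       assumed to be already motion-compensated.
    threshold: absolute luminance difference above which a pixel is
               marked as changed (an assumed, illustrative value).
    Returns a list of rows with 1 for changed pixels and 0 otherwise.
    """
    same_shape = len(previous) == len(current) and all(
        len(p_row) == len(c_row) for p_row, c_row in zip(previous, current)
    )
    if not same_shape:
        raise ValueError("frames must have the same size")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(previous, current)
    ]


if __name__ == "__main__":
    prev_frame = [[10, 10], [10, 10]]
    curr_frame = [[10, 200], [12, 10]]
    # Only the top-right pixel changes by more than the threshold.
    print(change_mask(prev_frame, curr_frame))
```

A mask of this kind also responds to cast shadows and uncovered background, which is why the algorithm described above follows it with dedicated elimination steps.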