With the advent and proliferation of low-cost, high-performance digital video recorders, an increasing
number of personal home video clips are recorded and stored by consumers. Compared to image data, video
data is larger in size and richer in multimedia content. Efficient access to video content is therefore expected
to be more challenging than image mining. Previously, we developed a content-based image retrieval system and a
benchmarking framework for personal images. In this paper, we extend our personal image retrieval system to
include personal home video clips.
A possible initial solution to video mining is to represent video clips by a set of key frames extracted from
them, thus converting the problem into an image search problem. Here we report that a careful selection of key
frames may improve the retrieval accuracy. However, because video also has a temporal dimension, its key frame
representation is inherently limited. The use of temporal information can give a better representation of video
content at the semantic object and concept levels than an image-only representation.
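
The key frame idea above can be illustrated with a minimal sketch. The text does not specify how key frames are selected, so the rule below (keep a frame when its color histogram differs enough from the last key frame) is an assumption chosen only for illustration; the names `histogram_distance`, `select_key_frames`, and the `threshold` value are hypothetical.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two normalized histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_key_frames(histograms: List[Sequence[float]],
                      threshold: float = 0.5) -> List[int]:
    """Return indices of frames kept as key frames.

    Assumed rule (not taken from the paper): the first frame is always a
    key frame, and a later frame becomes a key frame when its histogram
    differs from the most recent key frame by more than `threshold`.
    """
    if not histograms:
        return []
    keys = [0]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[keys[-1]], histograms[i]) > threshold:
            keys.append(i)
    return keys


# Toy clip: two nearly identical frames, then a scene change.
clip = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
key_indices = select_key_frames(clip)
```

Once the key frames are chosen, each clip is reduced to a handful of images that an image retrieval system can index. The sketch also shows the limitation noted above: the order and motion between frames are discarded.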
In this paper, we propose a bottom-up framework that combines interest point tracking, image segmentation, and
motion-shape factorization to decompose the video into spatio-temporal regions. We show an example application
of activity concept detection using the trajectories extracted from the spatio-temporal regions. The proposed
approach shows good potential for concise representation and indexing of objects and their motion in real-life
It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures is only possible with a robust image retrieval system. This application is of high interest to Intel processor architects: it is highly compute-intensive and could motivate end users to upgrade their personal computers to the next generation of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many content-based image retrieval systems.<sup>1</sup> For example, a personal image database contains many pictures of people, but of a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places such as home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and the benchmarks need to differ from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for benchmarking such systems, as specified in<sup>2</sup>:
- data sets
- query tasks
- ground truth
- evaluation measures
- benchmarking events.
This paper proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
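
To make the relationship between query tasks, ground truth, and evaluation measures concrete, here is a minimal sketch. The text does not state which measures are used, so precision and recall at a cutoff `k` are assumed here as common examples; the function name and the toy image identifiers are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(ranked: List[str],
                          relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """Score the top-k results of one query against its ground truth.

    ranked   -- image identifiers returned by the system, best first
    relevant -- ground-truth identifiers for the query
    k        -- cutoff rank
    """
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy query: the system returns four images, three images are truly relevant.
ranked_results = ["img_a", "img_b", "img_c", "img_d"]
ground_truth = {"img_a", "img_c", "img_e"}
precision, recall = precision_recall_at_k(ranked_results, ground_truth, k=2)
```

In this toy query, one of the top two results is relevant, so precision is 1/2 and recall is 1/3. A benchmark would average such scores over all query tasks in the data set.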