With the advent and proliferation of low-cost, high-performance digital video recorder devices, an increasing
number of personal home video clips are recorded and stored by consumers. Compared to image data, video
data is larger in size and richer in multimedia content. Efficient access to video content is therefore expected to be more
challenging than image mining. Previously, we developed a content-based image retrieval system and a
benchmarking framework for personal images. In this paper, we extend our personal image retrieval system to
include personal home video clips.
A possible initial solution to video mining is to represent video clips by a set of key frames extracted from
them, thus converting the problem into an image search one. Here we report that a careful selection of key
frames can improve retrieval accuracy. However, because video also has a temporal dimension, its key frame
representation is inherently limited. The use of temporal information can give us a better representation of video
content at the semantic object and concept levels than an image-only representation.
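As a concrete illustration of key-frame selection, a minimal selector might keep a new frame whenever its grayscale histogram drifts far enough from the last selected key frame. The histogram size and threshold below are illustrative assumptions, not the selection criterion used in our system:

```python
import numpy as np

def _hist(frame, bins=32):
    """Normalized grayscale histogram of a frame (values in 0..255)."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def select_key_frames(frames, threshold=0.25):
    """Keep frame i as a key frame when the total-variation distance
    between its histogram and the last key frame's exceeds `threshold`."""
    keys = [0]  # always keep the first frame
    ref = _hist(frames[0])
    for i in range(1, len(frames)):
        h = _hist(frames[i])
        if np.abs(h - ref).sum() / 2 > threshold:  # TV distance in [0, 1]
            keys.append(i)
            ref = h
    return keys
```

A real selector would add shot-boundary handling and motion cues, but the thresholding structure is the same.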
In this paper we propose a bottom-up framework that combines interest point tracking, image segmentation, and
motion-shape factorization to decompose the video into spatio-temporal regions. We show an example application
of activity concept detection using the trajectories extracted from the spatio-temporal regions. The proposed
approach shows good potential for concise representation and indexing of objects and their motion in real-life
videos.
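To sketch how such region trajectories could feed an activity-concept detector, consider summarizing each trajectory (a sequence of region centroids over time) with simple motion descriptors and thresholding the mean speed. The descriptors and the threshold are illustrative assumptions, not our actual classifier:

```python
import numpy as np

def trajectory_features(centroids):
    """Summarize a trajectory of (x, y) centroids with net displacement,
    total path length, and mean per-frame speed."""
    pts = np.asarray(centroids, dtype=float)
    steps = np.diff(pts, axis=0)                 # per-frame motion vectors
    step_len = np.linalg.norm(steps, axis=1)     # per-frame speeds
    return {
        "net_displacement": float(np.linalg.norm(pts[-1] - pts[0])),
        "path_length": float(step_len.sum()),
        "mean_speed": float(step_len.mean()),
    }

def detect_activity(centroids, move_thresh=1.0):
    """Toy activity detector: 'moving' if mean speed exceeds the threshold."""
    feats = trajectory_features(centroids)
    return "moving" if feats["mean_speed"] > move_thresh else "static"
```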
It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures can only be achieved with a robust image retrieval system. This application is of high interest to Intel processor architects: it is highly compute intensive and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many content-based image retrieval systems.<sup>1</sup> For example, a personal image database contains many pictures of people, but of a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places such as home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and the benchmarks need to differ from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for the benchmarking of such systems, as specified in<sup>2</sup>:
- data sets
- query tasks
- ground truth
- evaluation measures
- benchmarking events.
This paper proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
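Among the components above, the evaluation measures typically reduce to per-query precision and recall against the ground truth. A minimal sketch (the set-based formulation is a common simplification; ranked measures such as average precision build on the same counts):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given the ids returned by the
    system and the ground-truth relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # correctly retrieved images
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```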
Using video analysis to detect hazardous events such as fire and smoke, impending threats, or suspicious behavior has spurred new research driven by security concerns. To make such detection reliable, researchers must overcome difficulties such as weighting classifications by the importance of their consequences, imbalance between positive and negative data, environmental factors, and variation in camera capabilities. This paper puts forward a general framework for hazardous event detection that includes spatio-temporal feature extraction, statistical classification for biased data, and calibration for environmental change. At the current stage of development, the framework works effectively for detecting hazardous events such as fire and smoke in video sequences.
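One standard way to handle the cost asymmetry and class imbalance described above is a cost-sensitive decision threshold: because missing a fire is far worse than a false alarm, the detector fires at a much lower posterior probability than 0.5. The cost values below are illustrative assumptions, and the score is assumed to be a calibrated posterior, not the paper's actual classifier output:

```python
def cost_sensitive_label(score, miss_cost=50.0, false_alarm_cost=1.0):
    """Flag a hazardous event when the expected cost of missing it exceeds
    the expected cost of a false alarm. `score` is a calibrated posterior
    P(event | features). The Bayes-optimal threshold is
    false_alarm_cost / (false_alarm_cost + miss_cost), i.e. well below 0.5
    when misses are expensive."""
    threshold = false_alarm_cost / (false_alarm_cost + miss_cost)
    return score > threshold
```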
Facilitating efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia databases and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) A versatile video retrieval mechanism that employs a hybrid approach, integrating a query-based (database) mechanism with content-based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports the spatio-temporal semantics of video objects, and also offers an improved mechanism for describing the visual content of videos through content-based analysis. (3) A query profiling database that records the `histories' of various clients' query activities; such profiles can be used to provide a default query template when a similar query is encountered by the same kind of user. An experimental prototype system is being developed on top of the existing VideoMAP prototype, using Java and VC++ on the PC platform.
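The priority-based concurrency control in point (1) can be sketched as a lock whose pending requests are granted in role-priority order. The role names mirror the user types above, but the priority values and the queueing policy are assumptions for illustration, not the VideoMAP implementation:

```python
import heapq

# Assumed priority ordering (lower value = served first); FIFO within a role.
ROLE_PRIORITY = {"VideoAdministrator": 0, "VideoProducer": 1,
                 "VideoEditor": 2, "VideoQueryClient": 3}

class PriorityLock:
    """Grant write access to a video record to the highest-priority waiter."""
    def __init__(self):
        self._waiting = []
        self._seq = 0  # tie-breaker preserving arrival order within a role

    def request(self, user, role):
        heapq.heappush(self._waiting, (ROLE_PRIORITY[role], self._seq, user))
        self._seq += 1

    def grant_next(self):
        """Pop and return the user next entitled to the lock, or None."""
        return heapq.heappop(self._waiting)[2] if self._waiting else None
```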
Content-based video retrieval is one of the important design issues in multimedia, depending mainly on visual and spatio-temporal characteristics. Until now, however, well-defined models for video retrieval have remained at a rudimentary stage. We propose a unified video retrieval model that simulates human perception. Given an arbitrary video, and considering the factors at work in human visual perception, we can find similar videos in a large video repository within a time limit. This kind of measurement simulates the rules of human judgement, so it can come close to the real need. Furthermore, by integrating relevance feedback, the results can be adjusted according to the user's preference. This learning strategy emphasizes the aspects the user cares about and embodies them in the next iteration of similarity computation. In this way, retrieval results can be optimized greatly.
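A common way to realize the feedback loop described above is Rocchio-style query reweighting: the query's feature vector moves toward the mean of videos the user marked relevant and away from those marked irrelevant. The weights below are the classic defaults, offered as a sketch rather than the paper's actual update rule:

```python
import numpy as np

def reweight_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style relevance feedback on feature vectors: shift the query
    toward relevant examples and away from irrelevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(np.asarray(irrelevant, dtype=float), axis=0)
    return q
```

Each feedback round re-runs the similarity search with the updated vector, so the aspects the user cares about gain weight iteratively.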