Retrieval in current multimedia databases is usually limited to browsing and searching based on low-level visual features and explicit textual descriptors. Semantic aspects of visual information are mainly described in full-text attributes or mapped onto specialized, application-specific description schemes. Query result lists are commonly represented by textual descriptions and single key frames. This approach works for text documents and images, but it is often insufficient to represent video content in a meaningful way. In this paper we present a multimedia retrieval framework focusing on video objects, which relies entirely on the MPEG-7 standard as its information base. It provides a content-based retrieval interface that uses hierarchical content-based video summaries to allow quick viewing and browsing of search results, even in bandwidth-limited Web applications. Additionally, the semantic meaning of video content can be annotated based on domain-specific ontologies, enabling a more targeted search for content. We discuss our experiences and results with these techniques.
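A hierarchical summary lets a bandwidth-limited client fetch only the coarse levels first and drill down on demand. The sketch below is a minimal illustration of that idea under assumptions: it is not the MPEG-7 HierarchicalSummary syntax, and the names `SummaryNode` and `frames_up_to_level` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryNode:
    """One node of a hypothetical hierarchical video summary."""
    key_frame: str                  # key-frame id or URL representing this segment
    start: float                    # segment start time in seconds
    end: float                      # segment end time in seconds
    children: List["SummaryNode"] = field(default_factory=list)


def frames_up_to_level(root: SummaryNode, max_level: int) -> List[str]:
    """Collect key frames breadth-first from level 0 (the root) down to max_level.

    A slow client can request a small max_level and refine later."""
    frames: List[str] = []
    level_nodes = [root]
    for _ in range(max_level + 1):
        if not level_nodes:
            break
        frames.extend(node.key_frame for node in level_nodes)
        level_nodes = [child for node in level_nodes for child in node.children]
    return frames
```

Breadth-first order means each extra level only adds frames, so a client can keep what it has already downloaded.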
This paper discusses the application of the Active Shape Model as a tool to compute the orientation of a specimen that has been imaged with confocal laser scanning microscopy (CLSM). The images capture patterns of gene expression, which must be submitted to and stored in a database. Having all of these patterns available will enable spatio-temporal mining of the image data and other linked data. The applications discussed in this paper focus on the zebrafish model system. The gene expression database employs a 3D digital atlas as a reference system in combination with an Active Shape Model. The Active Shape Model is currently applied to a few anatomical structures and will be scaled up to handle more structures in the future.
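Fitting an Active Shape Model includes estimating a pose (translation, rotation, scale) that aligns the model's mean shape with the landmarks found in the image, and the rotation part is what gives the specimen's orientation. The sketch below shows only that rotation estimate in 2D, as a least-squares (orthogonal Procrustes) fit between corresponding landmarks; the 2D setting, the given correspondences and the function names are assumptions, not the paper's 3D atlas pipeline.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _centered(points: List[Point]) -> List[Point]:
    """Remove translation by subtracting the centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def rotation_between(src: List[Point], dst: List[Point]) -> float:
    """Angle theta (radians) such that rotating `src` by theta best matches `dst`
    in the least-squares sense. Landmarks must correspond one-to-one.
    Uniform scale does not affect the result."""
    xs, ys = _centered(src), _centered(dst)
    dot = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(xs, ys))
    cross = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(xs, ys))
    return math.atan2(cross, dot)
```

With the model's mean shape as `src` and the detected specimen landmarks as `dst`, the returned angle is the specimen's in-plane orientation relative to the model.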
The latest research projects in the LIGIV laboratory concern the capture, processing, archiving and display of color images, taking into account the trichromatic nature of the Human Visual System (HVS). One of these projects addresses digital cinematographic film sequences of high resolution and high dynamic range. This project aims to optimize the use of content for post-production operators and for the end user. The studies presented in this paper address the use of metadata to optimize the consumption of video content on a device of the user's choice, independent of the equipment that captured the content. Optimizing consumption includes enhancing the quality of image reconstruction on a display. Another part of this project addresses the content-based adaptation of image display. The main focus is on Region of Interest (ROI) operations, based on the ROI concepts of MPEG-7. The aim of this second part is to characterize and ensure the display conditions even if the display device or display medium changes. This requires, first, the definition of a reference color space and, second, the definition of bi-directional color transformations for each peripheral device (camera, display, film recorder, etc.). The complicating factor is that different devices have different color gamuts, depending on the chromaticities of their primaries and the ambient illumination under which they are viewed. To match the displayed image to the intended appearance, all kinds of production metadata (camera specification, camera color primaries, lighting conditions) should be associated with the film material. Together, metadata and content form rich content. The author is assumed to specify conditions as is customary in the digital graphic arts. To control image pre-processing and post-processing, these specifications should be contained in the film's metadata. The specifications are related to ICC profiles but additionally need to account for mesopic viewing conditions.
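A bi-directional device transformation pairs a forward map (device values to the reference space) with its inverse. As a minimal illustration, the sketch below assumes an sRGB display as the device and CIE XYZ (D65) as the reference space, using the published sRGB matrix and transfer curve; real devices would be characterized by measured ICC profiles, and gamut mapping and mesopic adaptation are not handled here.

```python
from typing import List, Sequence

# Linear sRGB (D65) -> CIE XYZ, as published in IEC 61966-2-1.
M = [[0.4124, 0.3576, 0.1805],
     [0.2126, 0.7152, 0.0722],
     [0.0193, 0.1192, 0.9505]]


def _inv3(m: List[List[float]]) -> List[List[float]]:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


M_INV = _inv3(M)


def _mul(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def _decode(c: float) -> float:
    """sRGB-encoded value in [0, 1] -> linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _encode(c: float) -> float:
    """Linear light -> sRGB-encoded value (no clipping)."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def device_to_reference(rgb: Sequence[float]) -> List[float]:
    return _mul(M, [_decode(c) for c in rgb])


def reference_to_device(xyz: Sequence[float]) -> List[float]:
    return [_encode(c) for c in _mul(M_INV, xyz)]
```

Out-of-gamut XYZ values simply produce encoded values outside [0, 1]; a production pipeline would apply a gamut-mapping step there.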
Graph-based approaches succeed in producing meaningful regions in images that are to be stored as entities in a database for content-based retrieval. Despite tuning various parameters, the bottom-up approach produces too many segments for an Internet search and retrieval scheme. The top-down scheme can be adjusted for wide-area searches, but it has a high computational cost. In this work we combine the two approaches and retain the advantages of both. The key idea is to use the local approach to reduce the size of the problem that is fed to the normalized cut approach. Our algorithm runs in O(n log n) time.
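The abstract does not name its local method. Assuming a Felzenszwalb-Huttenlocher-style merging step (a common bottom-up graph segmentation), the sketch below shows how sorting edges and merging components with union-find yields coarse regions; those regions would then become the nodes of a much smaller graph for the normalized cut, which is not shown. On an image grid the number of edges is proportional to the number of pixels, so the sort makes this step O(n log n).

```python
from typing import List, Tuple


class _DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n   # Int(C): largest edge weight merged into component C

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int, w: float) -> None:
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = w        # edges arrive sorted, so w is the new maximum


def local_segmentation(n: int, edges: List[Tuple[int, int, float]], k: float) -> List[int]:
    """Merge nodes 0..n-1 along (u, v, weight) edges in increasing weight order.

    Two components merge when the edge weight does not exceed
    min(Int(C1) + k/|C1|, Int(C2) + k/|C2|); larger k gives larger regions.
    Returns one component label per node."""
    ds = _DisjointSet(n)
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        a, b = ds.find(u), ds.find(v)
        if a == b:
            continue
        if w <= min(ds.internal[a] + k / ds.size[a], ds.internal[b] + k / ds.size[b]):
            ds.union(a, b, w)
    return [ds.find(i) for i in range(n)]
```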
In this paper we present a formalism for query rewriting in the presence of data types with multiple representations, such as images. We show that the formalism is consistent and that it allows rewriting rules to be derived, and we argue that the algebraic level at which it is expressed is the appropriate level for image access systems distributed over the Internet, in which the internal details of the individual data repositories are not accessible from outside the local environment.
Peer-to-peer (P2P) networks are overlay networks that connect independent computers (also called nodes or peers). In contrast to client/server solutions, in a P2P network all nodes both offer services to and request services from other peers. P2P networks are very attractive in that they harness the computing power of many common desktop machines and require little administrative overhead. While the resulting computing power is impressive, efficiently looking up data is still the major challenge in P2P networks. Current work comprises fast lookup of one-dimensional values (Distributed Hash Tables, DHT) and retrieval of texts using a few keywords. However, the lookup of multimedia data in P2P networks is still addressed by very few groups. In this paper, we present experiments with efficient content-based image retrieval in a P2P environment, i.e., a P2P-CBIR system. The challenge in such systems is to limit the number of messages sent and to maximize the usefulness of each peer contacted in the query process. We achieve this by distributing peer data summaries over the network. Obviously, the data summaries have to be compact in order to limit the communication overhead. We propose a CBIR scheme based on a compact peer data summary that relies on cluster frequencies. To obtain the compact representation of a peer's collection, a global clustering of the data is efficiently calculated in a distributed manner. After that, each peer publishes how many of its images fall into each cluster. The querying peer then uses these cluster frequencies to contact only those peers that have the largest number of images in the cluster given by the query. In our paper we further detail the various challenges that the designers of such a P2P-CBIR system must meet, and we present experiments with varying degrees of data replication (duplicates of images) and of clustering quality within the network.
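The summary-and-routing idea can be sketched as follows, assuming the global cluster centroids have already been computed by the distributed clustering step (not shown), Euclidean distance between feature vectors, and that a query is routed by its single nearest cluster. The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_cluster(vec: Vector, centroids: List[Vector]) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def peer_summary(features: List[Vector], centroids: List[Vector]) -> List[int]:
    """Cluster-frequency summary a peer publishes: how many of its images fall in each cluster."""
    counts = [0] * len(centroids)
    for f in features:
        counts[nearest_cluster(f, centroids)] += 1
    return counts


def peers_to_contact(query: Vector, centroids: List[Vector],
                     summaries: Dict[str, List[int]], max_peers: int) -> List[str]:
    """Contact only the peers holding the most images in the query's cluster."""
    c = nearest_cluster(query, centroids)
    ranked = sorted(summaries, key=lambda peer: summaries[peer][c], reverse=True)
    return [peer for peer in ranked if summaries[peer][c] > 0][:max_peers]
```

Each summary is one integer per cluster, so its size is independent of how many images a peer stores.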
This paper describes how the web standards Synchronized Multimedia Integration Language (SMIL) and Scalable Vector Graphics (SVG) are used in teaching at the Vienna University of Technology. SMIL and SVG are used in courses on multimedia authoring. Didactically, the goal is to teach students how to use media objects and timing concepts to build interactive media applications. Additionally, SMIL is applied to generate multimedia content from a database using a content management system. The paper gives background information on the SMIL and SVG standards and sketches how teaching multimedia is organized at the Vienna University of Technology. Courses from the summer term 2003 are described and illustrated in two case studies. General design problems of SMIL-based presentations are modelled as patterns. Additionally, suggestions for improvement in the standards are given and shortcomings of existing user agents are summarized. Our conclusion is that SMIL and SVG are very well suited for teaching multimedia. Currently, the main problem is that all existing SMIL players lack some properties desired for teaching applications (stability, correctness, etc.).
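Generating SMIL from database content amounts to emitting timing containers around media references. The sketch below assumes hypothetical slide records (an image, an audio clip and a duration) coming from a content management system and builds a SMIL 2.0 document in which each record becomes a `par` (image and audio in parallel) inside one `seq`; the layout values are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SMIL_NS = "http://www.w3.org/2001/SMIL20/Language"


def _tag(name: str) -> str:
    return f"{{{SMIL_NS}}}{name}"


def build_smil(slides: List[Dict]) -> str:
    """slides: dicts with 'image', 'audio' and 'dur' (seconds); hypothetical record layout."""
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element(_tag("smil"))
    layout = ET.SubElement(ET.SubElement(smil, _tag("head")), _tag("layout"))
    ET.SubElement(layout, _tag("root-layout"), width="640", height="480")
    ET.SubElement(layout, _tag("region"), id="main", width="640", height="480")
    seq = ET.SubElement(ET.SubElement(smil, _tag("body")), _tag("seq"))
    for slide in slides:
        # Children of <par> play at the same time; <seq> plays the slides one after another.
        par = ET.SubElement(seq, _tag("par"), dur=f"{slide['dur']}s")
        ET.SubElement(par, _tag("img"), src=slide["image"], region="main")
        ET.SubElement(par, _tag("audio"), src=slide["audio"])
    return ET.tostring(smil, encoding="unicode")
```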
Web-based virtual tours have become desirable and much-demanded applications, yet they are challenging because of the nature of the Web application's running environment, such as limited bandwidth and no guarantee of high computation power on the client side. The image-based rendering approach has attractive advantages over the traditional 3D rendering approach in such Web applications. The traditional approach, such as VRML, requires a labor-intensive 3D modeling process as well as high bandwidth and computation power, especially for photo-realistic virtual scenes. QuickTime VR and IPIX, as examples of the image-based approach, use panoramic photos, so the virtual scenes can be generated directly from photos, skipping the modeling process. However, these image-based approaches may require special cameras or extra effort to take panoramic views, and they provide only a fixed-point look-around and zooming in and out rather than 'walking around', which is a very important feature for giving virtual tourists an immersive experience. The Web-based Virtual Tour using Tour into the Picture employs pseudo-3D geometry with an image-based rendering approach to give viewers the immersive experience of walking around the virtual space, using only a few conventional photographs.
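The pseudo-3D geometry in Tour into the Picture is derived from the photo's vanishing point. One ingredient can be sketched under simple assumptions: a level pinhole camera at a known height, with the horizon row passing through the vanishing point, lets each floor pixel be back-projected to a 3D point that a virtual camera can then walk past. The function and parameter names are hypothetical, and the full background-model construction is not shown.

```python
from typing import Tuple


def floor_point_to_3d(u: float, v: float, u0: float, v_horizon: float,
                      focal_px: float, camera_height: float) -> Tuple[float, float, float]:
    """Back-project image pixel (u, v) lying on the floor plane.

    Assumes a level pinhole camera with principal column u0, horizon row
    v_horizon (through the vanishing point), focal length in pixels, and
    image rows v increasing downward. Returns (x, y, z) in camera coordinates."""
    if v <= v_horizon:
        raise ValueError("pixel must lie below the horizon to hit the floor")
    z = focal_px * camera_height / (v - v_horizon)   # depth along the optical axis
    x = (u - u0) * z / focal_px                       # lateral offset
    return x, -camera_height, z                       # floor is camera_height below the camera
```

Rows closer to the horizon map to larger depths, which is why a single photo can support a short "walk" toward the vanishing point.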
In this paper we present a multi-user motion capture system in which users work from separate locations and interact in a common virtual environment. The system works well on low-end personal computers; it provides natural human/machine interaction thanks to the complete absence of markers and only weak constraints on users' clothing and ambient lighting. It is suitable for everyday use, where the great precision reached by complex commercial systems is not the principal requirement.
My contribution is a discussion of visual applications and how they can provide empirical and practical support to those who study social phenomena.
In these case studies, the interest is in analyzing how people perceive and react to urban physical and social changes.
- The first aim of using images is to visualize space and human behaviour, and to suggest a sociological approach to analysis. I refer to static visual data in two ways: traditional frontal pictures, with all the personal information they can give and contain about the author, and the more neutral 360° pictures.
- The second aim is to use video data to capture people's opinions, political positions and other personal characteristics involved in urban changes. At this phase of the research, the video recording is sometimes not disclosed, for many reasons, the main one being to preserve the spontaneity of the people interviewed. For this reason it is also possible to record audio and take pictures and later edit a video document joining these two different experiences together.
- The third and final aim is to find an appropriate method to publish all multimedia data online with the appropriate considerations.
The discussion mainly focuses on the methods and techniques needed to integrate the visual data of a sociological study into a didactic and educational approach.
I will present a few thesis works reflecting this direction.
We present methods and systems for authoring by linking: generating multimedia documents by creating richly typed links between component media assets. As an example, we describe our Sticky Video functionality in the MEERCAT system for linking photo and video media. We do so within the framework of a hierarchy of possible link-based authoring systems, ranging from manual to programmatic link creation and document authoring. We discuss mechanisms for accurately situating links between rich media components and for flexibly typing those links, to support both better human browsing and searching of information and automatic authoring. We describe issues in real-time distributed authoring and the use of metadata channels. In particular, we present the concept of authoring by meeting: the automatic creation of multimedia documents from business meetings.
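To make "situated, typed links" concrete, the sketch below models an anchor as a media asset plus an optional time span (for video) or rectangle (for photos), and a link as two anchors with a type label. This is a hypothetical data model for illustration, not MEERCAT's; the link-type vocabulary is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Anchor:
    asset: str                                           # media asset id
    time_span: Optional[Tuple[float, float]] = None      # (start, end) seconds, for video
    region: Optional[Tuple[int, int, int, int]] = None   # (x, y, w, h) pixels, for photos


@dataclass(frozen=True)
class Link:
    source: Anchor
    target: Anchor
    link_type: str    # e.g. "depicts" (hypothetical vocabulary)


def links_at(links: List[Link], asset: str, t: float) -> List[Link]:
    """Links whose source anchor is active in `asset` at playback time t."""
    return [link for link in links
            if link.source.asset == asset
            and link.source.time_span is not None
            and link.source.time_span[0] <= t <= link.source.time_span[1]]
```

A player can call `links_at` on every tick to show which photos are "stuck" to the current stretch of video, and the `link_type` field is what an automatic authoring step would filter on.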
Color image quality depends on many factors, such as the initial capture system and its color image processing, compression, transmission, the output device, media and associated viewing conditions. In this paper, we are primarily concerned with color image quality in relation to compression and transmission. We review the typical visual artifacts that occur due to high compression ratios and/or transmission errors. We discuss color image quality metrics and present no-reference artifact metrics for blockiness, blurriness, and colorfulness. We show that these metrics are highly correlated with experimental data collected through subjective experiments. We use them for no-reference video quality assessment in different compression and transmission scenarios and again obtain very good results. We conclude by discussing the important effects viewing conditions can have on image quality.
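The abstract does not give its colorfulness formula. As an assumed stand-in, the sketch below implements the widely cited Hasler and Süsstrunk colorfulness measure, which combines the spread and the mean of two opponent-color channels; it needs no reference image, which is what "no-reference" requires. Blockiness and blurriness metrics are not shown.

```python
import math
from typing import List, Tuple


def colorfulness(pixels: List[Tuple[float, float, float]]) -> float:
    """Hasler-Süsstrunk style colorfulness of (R, G, B) pixels in 0..255.

    rg = R - G and yb = (R + G)/2 - B are opponent channels; the score is
    sqrt(std_rg^2 + std_yb^2) + 0.3 * sqrt(mean_rg^2 + mean_yb^2)."""
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]

    def mean_std(xs: List[float]) -> Tuple[float, float]:
        m = sum(xs) / len(xs)
        return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

    m_rg, s_rg = mean_std(rg)
    m_yb, s_yb = mean_std(yb)
    return math.hypot(s_rg, s_yb) + 0.3 * math.hypot(m_rg, m_yb)
```

A neutral gray image scores zero, and heavy compression that washes out chroma lowers the score relative to the original.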
We propose a new image feature that merges color and shape information. This global feature, which we call color shape context, is a histogram that combines the spatial (shape) and color information of the image in one compact representation. The histogram codes the locality of color transitions in an image. Illumination-invariant derivatives are first computed to obtain the edges of the image, which constitute the shape information of our feature. These edges are used to obtain similarity-invariant (rigid) shape descriptors. The color transitions that take place on the edges are coded in an illumination-invariant way and are used as the color information. The color and shape information are combined in one multidimensional vector. The matching function of this feature is a metric, which allows existing indexing methods such as R-trees to be used for fast and efficient retrieval.
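The sketch below illustrates the general shape of such a feature under stated assumptions: edge points are binned log-polar style around their centroid, with distances normalized by the mean distance, and each bin is further split by a color-transition label; the L1 distance between histograms is a metric. The bin layout is hypothetical, the illumination-invariant edge and transition computation is assumed to happen upstream, and rotation normalization is omitted, so this sketch is only translation- and scale-invariant.

```python
import math
from typing import List, Sequence, Tuple

RADIAL_EDGES = (0.5, 1.0, 2.0)   # assumed bin edges for distance / mean distance
ANGULAR_BINS = 4                 # assumed angular resolution


def color_shape_histogram(edge_points: List[Tuple[float, float, int]], n_colors: int) -> List[float]:
    """edge_points: (x, y, t), where t in range(n_colors) is a quantized
    color-transition label. Returns a flat histogram summing to 1."""
    n = len(edge_points)
    cx = sum(p[0] for p in edge_points) / n
    cy = sum(p[1] for p in edge_points) / n
    dists = [math.hypot(x - cx, y - cy) for x, y, _ in edge_points]
    mean_d = sum(dists) / n or 1.0                    # scale normalization
    n_rad = len(RADIAL_EDGES) + 1
    hist = [0.0] * (n_rad * ANGULAR_BINS * n_colors)
    for (x, y, t), d in zip(edge_points, dists):
        r_bin = sum(d / mean_d >= e for e in RADIAL_EDGES)
        angle = math.atan2(y - cy, x - cx) + math.pi  # in [0, 2*pi]
        a_bin = int(angle / (2 * math.pi / ANGULAR_BINS)) % ANGULAR_BINS
        hist[(r_bin * ANGULAR_BINS + a_bin) * n_colors + t] += 1.0 / n
    return hist


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance is a true metric, so metric or spatial indexes can be used."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```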
The automatic prediction of perceived quality from image data in general, and the assessment of particular image characteristics or attributes that may need improvement in particular, is becoming an increasingly important part of intelligent imaging systems. The purpose of this paper is to propose that the color imaging community as a whole develop a software package, available on the Internet, to help users select which of these approaches is best suited to a given application. The ultimate goal of this project is to propose, and then implement, an open and unified color imaging system that provides a favourable context for the evaluation and analysis of color imaging processes. Many different methods for measuring the performance of a process have been proposed by different researchers. In this paper, we discuss the advantages and shortcomings of most of the main analysis criteria and performance measures currently used. The aim is not to establish a harsh competition between algorithms or processes, but rather to test and compare the efficiency of methodologies, first to highlight the strengths and weaknesses of a given algorithm or methodology on a given image type, and second to make these results publicly available. This paper focuses on two important unsolved problems. Why is it so difficult to select a color space that gives better results than another? Why is it so difficult to select an image quality metric that gives better results than another with respect to the judgment of the Human Visual System? Several methods used in color imaging and in image quality are therefore discussed. Proposals for content-based image measures and for means of developing a standard test suite are then presented. The above reference advocates an evaluation protocol based on an automated procedure. This is the ultimate goal of our proposal.
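An automated evaluation protocol of the kind proposed typically computes an objective metric for each processed image and measures how well those values agree with subjective judgments. The sketch below uses PSNR purely as an example metric (not one endorsed by the paper) and Pearson correlation as one possible agreement measure; rank correlation is a common alternative. Inputs are flat pixel lists for simplicity.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear correlation between metric values and subjective scores.
    Undefined (division by zero) if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Running the same protocol for each candidate metric or color space on the same image set gives directly comparable, publishable numbers.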
In this paper we describe a new system for storing annotated images in a large database and querying them through dynamic retrieval based on metadata. It is based on a three-tier architecture suitable for building a common gateway for accessing heterogeneous data. Metadata are extracted according to the XML schema of the documents and used for subsequent querying. We give an example on a database of astronomical and geographical images, but the method is quite general and can be applied to the more general case of large heterogeneous databases.
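Schema-driven metadata extraction followed by querying can be sketched with the standard library's ElementTree. The XML layout below is hypothetical (it is not the paper's schema), and the three-tier gateway is not shown; each `image` element is flattened into a dictionary that simple attribute queries can filter.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical annotated-image documents.
DOC = """<images>
  <image id="m31"><category>astronomical</category><object>M31</object><year>2001</year></image>
  <image id="alps"><category>geographical</category><region>Alps</region><year>1999</year></image>
</images>"""


def extract_metadata(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten each <image> element into {'id': ..., child-tag: child-text, ...}."""
    root = ET.fromstring(xml_text)
    return [{"id": img.get("id"), **{child.tag: child.text for child in img}}
            for img in root.iter("image")]


def query(records: List[Dict[str, Optional[str]]], **criteria: str) -> List[Optional[str]]:
    """Ids of records whose metadata match every given field exactly."""
    return [r["id"] for r in records if all(r.get(k) == v for k, v in criteria.items())]
```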
We propose a search & retrieval (S&R) tool that supports combining text search with content-based search for video and image content. This S&R system allows the formulation of complex queries that arbitrarily combine content-based and text-based query elements with logical operators. The system will be implemented as a client/server system. The entire S&R system is designed so that the client can be either a web application accessing the server over the Internet or a native client with local access to the server. The S&R tool is embedded into a system called MECiTV (Media Collaboration for iTV). Within MECiTV a complete authoring environment for iTV content will be developed. The proposed S&R tool will enable iTV authors and content producers to efficiently search for existing material in order to reduce the costs of iTV productions.
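Arbitrary combinations of text and content-based elements with logical operators map naturally onto a small expression tree. In the sketch below, the item layout (`text` annotation, color histogram `hist`) and the histogram-distance element are assumptions standing in for whatever descriptors the real system uses.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Text:
    """Text-based element: case-insensitive substring match on the annotation."""
    word: str

    def matches(self, item: Dict[str, Any]) -> bool:
        return self.word.lower() in item["text"].lower()


@dataclass
class ColorLike:
    """Content-based element: L1 histogram distance at most max_dist."""
    histogram: Tuple[float, ...]
    max_dist: float

    def matches(self, item: Dict[str, Any]) -> bool:
        dist = sum(abs(a - b) for a, b in zip(self.histogram, item["hist"]))
        return dist <= self.max_dist


@dataclass
class And:
    left: Any
    right: Any

    def matches(self, item: Dict[str, Any]) -> bool:
        return self.left.matches(item) and self.right.matches(item)


@dataclass
class Or:
    left: Any
    right: Any

    def matches(self, item: Dict[str, Any]) -> bool:
        return self.left.matches(item) or self.right.matches(item)


@dataclass
class Not:
    inner: Any

    def matches(self, item: Dict[str, Any]) -> bool:
        return not self.inner.matches(item)


def search(items: List[Dict[str, Any]], query: Any) -> List[Any]:
    return [it["id"] for it in items if query.matches(it)]
```

Because every node exposes the same `matches` method, a client can build the tree and a server can evaluate it without caring which elements are text-based and which are content-based.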
This paper describes a method for creating agents for locating images of specific categories such as sky, vegetation, fire, and smoke. The method uses only color information and is based on vector quantization to build a category-specific codebook from a set of training images. The method is shown to categorize images collected from several Internet web sites with a high success rate. The method can be used as an aid to image annotation or as a way to filter images in a content-based image retrieval system.
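Codebook-based categorization can be sketched as: train one small codebook per category on training colors, then assign an image to the category whose codebook quantizes its colors with the lowest average distortion. The codebook size, RGB space, and the plain k-means (Lloyd) training used here are assumptions; the paper's exact VQ training procedure is not specified in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

Color = Tuple[float, ...]


def train_codebook(colors: List[Color], size: int, iters: int = 10) -> List[Color]:
    """Plain k-means (Lloyd) on training colors, initialized from evenly spaced samples."""
    if len(colors) < size:
        raise ValueError("need at least `size` training colors")
    step = len(colors) // size
    codebook = [tuple(colors[i * step]) for i in range(size)]
    for _ in range(iters):
        groups: List[List[Color]] = [[] for _ in codebook]
        for v in colors:
            idx = min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
            groups[idx].append(v)
        # Move each code vector to the mean of its group; keep it if the group is empty.
        codebook = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else codebook[i]
                    for i, g in enumerate(groups)]
    return codebook


def distortion(colors: Sequence[Color], codebook: List[Color]) -> float:
    """Mean distance from each color to its nearest code vector."""
    return sum(min(math.dist(v, c) for c in codebook) for v in colors) / len(colors)


def categorize(colors: Sequence[Color], codebooks: Dict[str, List[Color]]) -> str:
    return min(codebooks, key=lambda name: distortion(colors, codebooks[name]))
```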
This paper provides an overview of Project RESCUE, which aims to enhance the mitigation capabilities of first responders in the event of a crisis by dramatically transforming their ability to collect, store, analyze, interpret, share and disseminate data. The multidisciplinary research agenda incorporates a variety of information technologies (networks, distributed systems, databases, image and video processing, and machine learning), together with subjective information obtained through social science. While the IT challenges focus on systems and algorithms to get the right information to the right person at the right time, social science provides the right context. Besides providing an overview of the nature of RESCUE research activities, the paper highlights challenges of particular interest to the Internet imaging community.
We present a system for the broadcast of hockey games over the Internet. The system allows users to experience the hockey game while it is in progress. Our system uses generic content description servers that acquire information from an external source, process it, and serve the processed data to client systems. Dynamic configuration of the servers allows us to use them in a variety of roles. For example, video information servers, like an MPEG-7 camera, produce XML documents that describe the motion of objects in the scene in addition to unprocessed video. Unlike an MPEG-7 camera, our video information servers interact with client systems and can change their behavior through dynamic configuration. In an alternate configuration, a content description server acts as a game server in our hockey broadcast system. The game server forms an environment model that encapsulates the state of the hockey game and serves data from the model to clients. We developed and tested our system using a 1/32-scale model of a hockey rink. Early results using data acquired at a real rink indicate that the system performs as expected.
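The "same server, different role" idea can be sketched as a server whose processing function is swapped at run time. Below, one role emits an XML motion description and the other keeps a simple environment model (each object's latest position). Tracked positions are assumed to be available already, and the XML element names are hypothetical, not MPEG-7 syntax.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List, Tuple

Tracks = Dict[str, List[Tuple[int, float, float]]]   # object id -> [(frame, x, y), ...]


class ContentDescriptionServer:
    """Hypothetical server whose processing role can be reconfigured while running."""

    def __init__(self, processor: Callable[[Tracks], Any]):
        self.processor = processor

    def configure(self, processor: Callable[[Tracks], Any]) -> None:
        self.processor = processor        # dynamic reconfiguration

    def serve(self, raw: Tracks) -> Any:
        return self.processor(raw)


def motion_description(tracks: Tracks) -> str:
    """Video-information role: XML describing each object's motion."""
    root = ET.Element("MotionDescription")
    for obj_id, samples in tracks.items():
        obj = ET.SubElement(root, "Object", id=str(obj_id))
        for frame, x, y in samples:
            ET.SubElement(obj, "Position", frame=str(frame), x=f"{x:.2f}", y=f"{y:.2f}")
    return ET.tostring(root, encoding="unicode")


def game_state(tracks: Tracks) -> Dict[str, Tuple[float, float]]:
    """Game-server role: a minimal environment model with each object's latest position."""
    return {obj_id: samples[-1][1:] for obj_id, samples in tracks.items()}
```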
Computer systems that can analyze complex and dynamic scenes play an essential role in video annotation. Scenes can be complex in that they contain many cluttered objects with different colors, shapes and sizes, and dynamic in that they contain multiple interacting moving objects and a constantly changing background. In reality, many scenes are complex, dynamic, and challenging for computers to describe. These scenes include sports games, air traffic, car traffic, street intersections, and cloud transformations. Our research addresses the challenge of building a descriptive computer system that analyzes scenes of hockey games, in which multiple moving players interact with each other against a background that constantly moves because of camera motion. Ultimately, such a computer system should be able to acquire reliable data by extracting the players' motion as trajectories, to query these data by analyzing their descriptive information, and to predict the motions of some hockey players based on the result of the query. Among these three major aspects of the system, we primarily focus on the visual information of the scenes, that is, on how to automatically acquire motion trajectories of hockey players from video. More precisely, we automatically analyze hockey scenes by estimating the parameters (i.e., pan, tilt, and zoom) of the broadcast cameras, tracking hockey players in those scenes, and constructing a visual description of the data by displaying the players' trajectories. Many technical problems in vision, such as fast and unpredictable player motions and rapid camera motions, make our challenge worth tackling. To the best of our knowledge, no automatic video annotation system for hockey has been developed in the past.
Although there are many obstacles to overcome, we hope that our efforts and accomplishments will establish the infrastructure of an automatic hockey annotation system and become a milestone for research in automatic video annotation in this domain.
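Because the ice surface is planar, each frame's camera state (pan, tilt, zoom) induces a 3x3 homography between image and rink coordinates; once that homography is known, tracked image positions become rink trajectories. The sketch assumes the per-frame homographies are already estimated (the hard part, not shown) and that each detection is the image point where a player touches the ice.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def apply_homography(H: Matrix, u: float, v: float) -> Tuple[float, float]:
    """Map image point (u, v) to rink-plane coordinates with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def trajectory_on_rink(detections: List[Tuple[float, float]],
                       homographies: List[Matrix]) -> List[Tuple[float, float]]:
    """Per-frame image detections of one player -> trajectory in rink coordinates.
    Camera motion is absorbed by using each frame's own homography."""
    return [apply_homography(H, u, v) for (u, v), H in zip(detections, homographies)]
```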
We propose a robust video segmentation algorithm for video summarization. Accurate shot boundary detection and segmentation of video into meaningful scenes are important parts of automatic video summarization. In this paper, we present a shot boundary detection method using audio and visual features defined in MPEG-7, a standard for multimedia content description. Using a Hidden Markov Model (HMM) classifier based on statistics of the audio and visual features, shot boundaries are detected accurately and over-segmentation, a common problem in automatic video segmentation, is reduced.
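The HMM decision can be illustrated with Viterbi decoding over a two-state model (0 = within a shot, 1 = boundary). As simplifying assumptions, the MPEG-7 audio and visual statistics are collapsed into one discrete observation per frame (0 = low frame difference, 1 = high), and the probabilities are hand-set rather than trained. Unlike per-frame thresholding, the transition probabilities let the decoder weigh neighbouring frames, which is one way to curb over-segmentation.

```python
import math
from typing import List, Sequence

START = [0.9, 0.1]                    # assumed initial probabilities
TRANS = [[0.9, 0.1], [0.5, 0.5]]      # assumed state transition probabilities
EMIT = [[0.95, 0.05], [0.1, 0.9]]     # assumed P(observation | state)


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]], emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path of a discrete HMM, computed in the log domain."""
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []                          # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            best = max(range(n), key=lambda p: score[p] + math.log(trans[p][s]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def shot_boundaries(obs: Sequence[int]) -> List[int]:
    """Frame indices decoded as boundary frames."""
    return [t for t, s in enumerate(viterbi(obs, START, TRANS, EMIT)) if s == 1]
```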
In theory, sensor systems scale well. If more data is required for a particular application, install another sensor. Problems begin to arise when the resulting enormous assemblage of loosely-coupled data must be perceived as information by a human being. Even worse, that human might wish to interact with some or all parts of the sensor system. More difficult still: there may be multiple human operators, all with different priorities, each desiring to perceive or control some or all of the system. Since it is possible to scale systems to the point at which there are far more sensors than users, we must assume that, without care, operator and client hardware overload is a near certainty. We must couple user requirements and architecture design at the outset or we face the probability of creating a system of maximum cost with minimal chance of fulfilling its users' requirements. The results of this work are encouraging: the largest installation of the LiveWave FirstView system is currently encoding and streaming in excess of 4000 frames per second.
In this contribution we present an interface for image processing algorithms that has recently been made available on the Internet (http://nibbler.uni-koblenz.de). First, we show its usefulness compared with other existing products. After describing its architecture, we present its main features: its particular user management, its image database, its interface, and its original quarantine system. Finally, we present the results of an evaluation performed by students of image processing.
Many difficulties of color image processing can be resolved using specific color spaces. The problem is the same for image databases: in which color space will a method be most effective? We present classical color spaces and a tool able to represent images in these spaces, in order to analyze which color space is most relevant for the studied images. Second, we introduce hybrid color spaces. The basic idea of hybrid color spaces is to combine several color components from different color spaces in order to increase the ability of the components to discriminate color data and to reduce the correlation between components. Whereas hybrid color spaces are generally computed from a single image, we propose an extension of the hybrid computation to generate a hybrid color space from an image database. The main idea is to treat a set of images as a single image and to perform statistical computations on this “virtual” image. Finally, we present a system able to manage hybrid color space generation on image sets, using the Icobra and ColorSpace tools.
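The "virtual image" step is simply pooling every pixel of every image before computing statistics. The sketch below does that, derives candidate components from a few classical spaces (RGB, normalized rg chromaticities, Ohta's I1I2I3), and then greedily picks weakly correlated components. The selection rule (start from the highest-variance component, then add the one least correlated with those already chosen) is a hypothetical stand-in; it omits the discrimination criterion, which would need labeled color classes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def _components(rgb: Pixel) -> Dict[str, float]:
    """Candidate components from several classical color spaces for one pixel."""
    r, g, b = rgb
    s = (r + g + b) or 1.0
    return {"R": r, "G": g, "B": b,
            "r": r / s, "g": g / s,                       # normalized chromaticities
            "I1": (r + g + b) / 3, "I2": (r - b) / 2, "I3": (2 * g - r - b) / 4}


def _corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else sxy / (sx * sy)


def _var(xs: Sequence[float]) -> float:
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def hybrid_space(images: List[List[Pixel]], k: int = 3) -> List[str]:
    """Pool all pixels into one virtual image, then greedily choose k components."""
    pixels = [p for img in images for p in img]          # the "virtual" image
    comps = [_components(p) for p in pixels]
    columns = {name: [c[name] for c in comps] for name in comps[0]}
    chosen = [max(columns, key=lambda name: _var(columns[name]))]
    while len(chosen) < k:
        rest = [name for name in columns if name not in chosen]
        chosen.append(min(rest, key=lambda name: max(abs(_corr(columns[name], columns[c]))
                                                      for c in chosen)))
    return chosen
```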
Stereo images provide an enhanced sense of presence and, compared with ordinary images, have been found to be operationally useful in tasks requiring remote manipulation or judgment of spatial relationships. A conventional stereo system with a single left-right pair needs twice the raw data of a monoscopic imaging system. As a result, increasing attention has been given to image compression methods. As an important part of stereo pair coding, disparity estimation influences the precision and efficiency of the coding system. Traditional disparity estimation methods for stereo pair coding mostly use fixed-size block matching (FSBM), but the disparity vectors estimated by this method are not very accurate. To find more accurate disparity vectors, adaptive-size block matching (ASBM) has been used in some stereo matching algorithms. Such algorithms select an appropriate window based on the image content, which improves the accuracy of the estimation. However, their main drawback is computational complexity, which prevents their application to stereo coding. In this paper, a novel hybrid block matching (HBM) disparity estimation algorithm is proposed, and on its basis a complete stereo coding scheme is introduced. In this scheme, conventional ASBM is improved and integrated with FSBM. The improved ASBM uses only the prediction error of the intensity to control the size of the matching window, which reduces complexity compared with traditional ASBM algorithms. Experimental results show that our HBM achieves more accurate disparity vectors than simple FSBM and reduces the complexity of traditional ASBM. Results also demonstrate that the proposed coding scheme provides a higher mean peak signal-to-noise ratio (PSNR), by about 0.7-1.2 dB, than a fixed-size blockwise coding algorithm.
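The interplay of fixed and adaptive block sizes can be sketched as follows: match a fixed-size block by minimizing the sum of absolute differences (SAD) over horizontal disparities, and split it into four sub-blocks only when the per-pixel matching error is too high. The SAD criterion, the quadtree split and the error threshold are assumptions illustrating error-controlled window sizing; this is not the paper's exact HBM. Images are lists of rows, and a disparity d means left pixel x corresponds to right pixel x - d.

```python
from typing import List, Tuple

Image = List[List[float]]


def sad(left: Image, right: Image, y: int, x: int, d: int, size: int) -> float:
    """Sum of absolute differences between a left block and the right block shifted by d."""
    return sum(abs(left[y + i][x + j] - right[y + i][x + j - d])
               for i in range(size) for j in range(size))


def match_block(left: Image, right: Image, y: int, x: int, size: int, max_d: int) -> Tuple[int, float]:
    """Fixed-size block matching (FSBM): best disparity and its SAD."""
    candidates = range(0, min(max_d, x) + 1)          # keep x - d inside the image
    best = min(candidates, key=lambda d: sad(left, right, y, x, d, size))
    return best, sad(left, right, y, x, best, size)


def hybrid_match(left: Image, right: Image, y: int, x: int, size: int, max_d: int,
                 err_per_px: float, min_size: int = 2) -> List[Tuple[int, int, int, int]]:
    """Keep the fixed-size result unless its per-pixel error exceeds err_per_px;
    then split into four sub-blocks and match each. Returns (y, x, size, disparity)."""
    d, err = match_block(left, right, y, x, size, max_d)
    if err / (size * size) <= err_per_px or size <= min_size:
        return [(y, x, size, d)]
    half = size // 2
    out: List[Tuple[int, int, int, int]] = []
    for dy in (0, half):
        for dx in (0, half):
            out += hybrid_match(left, right, y + dy, x + dx, half, max_d, err_per_px, min_size)
    return out
```

Only blocks that match poorly pay for extra searches, which is the intuition behind keeping complexity below a fully adaptive scheme.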
In this paper, a novel interactive voice response (IVR) system is proposed that differs clearly from traditional ones. Based on software operation and network control, the system depends only on software on the server where it resides and on the hardware of the network terminals on the user side, such as gateways (GW), personal gateways (PG), PCs and so on. The system transmits audio over the Internet to the network terminals using the Real-time Transport Protocol (RTP), and it controls the call flow using a finite state machine (FSM) driven by H.245 messages sent from the user side and by system control factors. Compared with other existing schemes, this IVR system offers several advantages, such as greatly reducing system cost, fully utilizing existing network resources and enhancing flexibility. The system can be deployed on any service server anywhere on the Internet and is even suitable for wireless applications based on packet-switched communication. The IVR system has been implemented and has passed system testing.
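An FSM driven by signalling messages can be expressed as a transition table keyed by (state, message). The message names below are real H.245 message types, but the states, the call flow and the actions are hypothetical simplifications for illustration; RTP streaming and timeouts ("system control factors") are only indicated in the action strings.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = auto()
    MEDIA_OPEN = auto()
    MENU = auto()
    CLOSED = auto()


# (current state, incoming H.245 message) -> (next state, action); flow is hypothetical.
TRANSITIONS: Dict[Tuple[State, str], Tuple[State, str]] = {
    (State.IDLE, "OpenLogicalChannel"): (State.MEDIA_OPEN, "ack channel, prepare RTP sender"),
    (State.MEDIA_OPEN, "OpenLogicalChannelAck"): (State.MENU, "play main menu over RTP"),
    (State.MENU, "UserInputIndication"): (State.MENU, "handle DTMF digit"),
    (State.MENU, "CloseLogicalChannel"): (State.CLOSED, "stop RTP stream"),
    (State.MENU, "EndSessionCommand"): (State.CLOSED, "release session"),
}


class IvrSession:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.log: List[str] = []

    def on_message(self, msg: str) -> State:
        """Advance the FSM; unexpected messages are logged and ignored."""
        key = (self.state, msg)
        if key not in TRANSITIONS:
            self.log.append(f"ignored {msg} in {self.state.name}")
            return self.state
        self.state, action = TRANSITIONS[key]
        self.log.append(action)
        return self.state
```

Keeping the flow in a table rather than in nested conditionals makes it easy to add menu levels or timeout events without touching the dispatch code.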
In recent years, content-based image retrieval has been the aim of many studies. Many systems have been introduced to achieve image indexing. One of the most common methods is to compute a segmentation and to extract different parameters from the regions. However, this segmentation step is based on low-level knowledge, without taking into account simple perceptual aspects of images, such as blur. When a photographer decides to focus only on some objects in a scene, he clearly regards these objects very differently from the rest of the scene: they do not carry the same amount of information. Image retrieval tools may therefore generally treat blurry regions as context rather than as information containers. Our idea is thus to focus the comparison between images on the non-blurry regions only, using this blur information as metadata. Our aim is to introduce different features and a machine learning approach to identify blur in scene images.
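As a simple, swapped-in example of a blur feature (not necessarily one of the paper's), the variance of the Laplacian is low in defocused regions because blur suppresses second derivatives. The sketch below computes it per block on a grayscale image given as a list of rows and keeps the sharp blocks; the block size and threshold are arbitrary assumptions that a learned classifier would replace.

```python
from typing import List, Tuple

Image = List[List[float]]


def laplacian_variance(block: Image) -> float:
    """Variance of the 4-neighbour Laplacian over the block's interior pixels.
    Low values suggest a blurry (defocused) region. Block must be at least 3x3."""
    h, w = len(block), len(block[0])
    vals = [block[y - 1][x] + block[y + 1][x] + block[y][x - 1] + block[y][x + 1] - 4 * block[y][x]
            for y in range(1, h - 1) for x in range(1, w - 1)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def sharp_blocks(image: Image, size: int, threshold: float) -> List[Tuple[int, int]]:
    """Top-left corners (row, col) of size x size blocks whose sharpness exceeds threshold."""
    out = []
    for by in range(0, len(image) - size + 1, size):
        for bx in range(0, len(image[0]) - size + 1, size):
            block = [row[bx:bx + size] for row in image[by:by + size]]
            if laplacian_variance(block) > threshold:
                out.append((by, bx))
    return out
```

Only the blocks returned by `sharp_blocks` would then feed the region comparison between images.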
In this paper we present the SIRBeC web site, designed and implemented by CNR - ITC for the Cultural Department of Lombardy in northern Italy. This site allows the cultural heritage of the Region to be consulted through texts and images. The main characteristics of the SIRBeC system are shown, with particular attention to the procedure that integrates geographic querying of georeferenced data with a standard textual query environment.
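Integrating a geographic query with a textual one can be as simple as intersecting the two filters. The sketch below uses a latitude/longitude bounding box as the geographic condition and a substring match as the textual one; the record layout and sample records are hypothetical, and a real system would use a spatial index and a full-text engine instead of linear scans.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Item:
    title: str
    lat: float
    lon: float


def in_bbox(item: Item, south: float, west: float, north: float, east: float) -> bool:
    return south <= item.lat <= north and west <= item.lon <= east


def combined_query(items: List[Item], text: Optional[str] = None,
                   bbox: Optional[Tuple[float, float, float, float]] = None) -> List[Item]:
    """Apply the geographic filter (south, west, north, east), then the text filter."""
    hits = items
    if bbox is not None:
        hits = [i for i in hits if in_bbox(i, *bbox)]
    if text:
        hits = [i for i in hits if text.lower() in i.title.lower()]
    return hits
```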
As part of the Learning Medical Imaging Knowledge project, we are developing a knowledge-based, machine learning and knowledge acquisition framework for systematic feature extraction and recognition of a range of lung diseases from High Resolution Computed Tomography (HRCT) images. This framework allows radiologists to remotely diagnose and share expert knowledge about lung HRCT interpretation, which is then used to develop a Computer Aided Diagnosis (CAD) system for lung disease. In this paper, we describe the knowledge acquisition system LMIK, which is Internet-based and platform-independent. LMIK uses the Internet to provide users with secure access to patient and research data and facilitates communication among highly qualified radiologists and researchers. It is currently used by five radiologists and over 20 researchers and has proved to be an invaluable research tool. Research is underway to develop computer algorithms for automatic diagnosis of lung diseases. In the future, these algorithms will be integrated into LMIK to equip it with CAD capabilities, to improve the diagnostic accuracy of radiologists and to extend the availability of expert clinical knowledge to wider communities.
This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content such as face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval using a proprietary XQuery-based search engine and is accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end users will be able to retrieve movie shots in the database that were produced in a specific year, that contain the face of a specific actor who says a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
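The example query (year, actor's face, spoken word, no motion activity) can be mimicked over a decomposition into shots. The real system runs XQuery over MPEG-7 descriptions; Python's standard library has no XQuery engine, so this sketch uses ElementTree over a heavily simplified, hypothetical XML layout that is not MPEG-7 syntax.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical, heavily simplified shot descriptions (not real MPEG-7 syntax).
SHOTS = """<Film year="1965">
  <Shot id="s1"><Face>ActorA</Face><Word>river</Word><Motion>0.0</Motion></Shot>
  <Shot id="s2"><Face>ActorA</Face><Word>river</Word><Motion>0.6</Motion></Shot>
  <Shot id="s3"><Face>ActorB</Face><Word>river</Word><Motion>0.0</Motion></Shot>
</Film>"""


def find_shots(xml_text: str, year: int, face: str, word: str,
               max_motion: float = 0.0) -> List[Optional[str]]:
    """Shots from a film of `year` showing `face`, containing spoken `word`,
    and with motion activity at most `max_motion`."""
    film = ET.fromstring(xml_text)
    if film.get("year") != str(year):
        return []
    return [shot.get("id") for shot in film.iter("Shot")
            if face in (f.text for f in shot.findall("Face"))
            and word in (w.text for w in shot.findall("Word"))
            and float(shot.findtext("Motion", "0")) <= max_motion]
```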
The paper describes an innovative image annotation tool for classifying image regions into one of seven classes (sky, skin, vegetation, snow, water, ground, and buildings) or as unknown. This tool could be productively applied in the management of large image and video databases where a considerable volume of images or frames must be indexed automatically. The annotation is performed by a classification system based on a multi-class Support Vector Machine. Experimental results on a test set of 200 images are reported and discussed.
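The abstract does not say how the multi-class SVM is decomposed or how "unknown" is decided. One common arrangement, assumed here, is one-vs-rest: each class has a binary linear decision function, the largest score wins, and the region is labeled unknown when no score exceeds a rejection threshold. The weights in the usage test are toy values, not trained models.

```python
from typing import Dict, List, Sequence, Tuple

CLASSES = ["sky", "skin", "vegetation", "snow", "water", "ground", "buildings"]
Model = Tuple[Sequence[float], float]   # (weights, bias) of one "class vs rest" linear SVM


def decision_values(features: Sequence[float], models: Dict[str, Model]) -> Dict[str, float]:
    """Signed margin w . x + b for every class (models assumed already trained)."""
    return {c: sum(w * f for w, f in zip(weights, features)) + bias
            for c, (weights, bias) in models.items()}


def classify_region(features: Sequence[float], models: Dict[str, Model],
                    reject_below: float = 0.0) -> str:
    """One-vs-rest decision: highest margin wins; 'unknown' if no classifier is confident."""
    scores = decision_values(features, models)
    best = max(scores, key=scores.get)
    return best if scores[best] > reject_below else "unknown"
```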
We present new interaction and visualization paradigms relying on free form surfaces for studying and exploring human anatomy. We propose an interface for building three-dimensional anatomical scenes incorporating 3D anatomical organ models, freely oriented slices and free form surfaces extracted from the Visible Human dataset. Compared with planar slices, free form surfaces make it possible to follow curved anatomical structures such as the aorta tree or the pelvis. In the present paper, we describe in detail 3D interaction techniques for creating free form surfaces. The interactive placement of surface boundary curves relies on the combination of an interactive slice navigator and a 3D visualization interface integrated within a single Java applet. Surface boundary curve control points are placed with the mouse at the desired locations within the selected slices. The corresponding boundary curves are displayed in the 3D visualization interface as fat 3D cubic spline curves, which provide immediate feedback. Boundary curves may be easily duplicated, translated and modified. The specified boundary curves are interpolated by Coons patches, yielding a perfectly smooth surface. That surface may be visualized in combination with semi-transparent organ models. It may also be flattened and shown in a separate window. The presented application is available online as a Java applet (http://visiblehuman.epfl.ch).
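A bilinearly blended Coons patch interpolates four boundary curves: it adds the two ruled surfaces spanned by opposite boundaries and subtracts the bilinear surface through the four corners. In the sketch below the boundary curves are arbitrary Python callables returning 3D points (standing in for the application's cubic splines), and they are assumed to meet at the corners.

```python
from typing import Callable, Tuple

Point3 = Tuple[float, float, float]
Curve = Callable[[float], Point3]


def coons_point(c0: Curve, c1: Curve, d0: Curve, d1: Curve, u: float, v: float) -> Point3:
    """Point S(u, v) of a bilinearly blended Coons patch, 0 <= u, v <= 1.

    c0(u), c1(u): boundaries at v = 0 and v = 1; d0(v), d1(v): boundaries at u = 0 and u = 1.
    S = (1-v) c0(u) + v c1(u) + (1-u) d0(v) + u d1(v) - bilinear(corners)."""
    p00, p10, p01, p11 = c0(0.0), c0(1.0), c1(0.0), c1(1.0)

    def comb(*terms: Tuple[float, Point3]) -> Point3:
        # Linear combination of (weight, point) pairs, coordinate by coordinate.
        return tuple(sum(w * p[k] for w, p in terms) for k in range(3))

    ruled = comb((1 - v, c0(u)), (v, c1(u)), (1 - u, d0(v)), (u, d1(v)))
    bilinear = comb(((1 - u) * (1 - v), p00), (u * (1 - v), p10),
                    ((1 - u) * v, p01), (u * v, p11))
    return tuple(r - b for r, b in zip(ruled, bilinear))
```

The patch reproduces its boundary curves exactly, so adjacent patches that share a boundary curve join without gaps.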