This PDF file contains the front matter associated with SPIE-IS&T Proceedings Volume 7255, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
We present a technique for genre-independent scene-change detection using audio and video features in a discriminative
support vector machine (SVM) framework. This work builds on our previous work by adding a
video feature based on the MPEG-7
"scalable color" descriptor. Adding this feature improves our detection rate over all genres by 5% to 15% for a fixed false positive rate of 10%. We also find that the genres that benefit the
most are those for which the previous audio-only approach was least effective.
The major drawback of interactive retrieval systems is the potential frustration of the user caused by
excessive labelling work. Active learning has proven to help solve this issue by carefully selecting the
examples to present to the user. In this context, the design of the user interface plays a critical role, since it
should invite the user to label the examples selected by the active learner.
This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been
validated using real-life IEEE PETS video surveillance data.
In particular, we investigated the most appropriate division of the display area between the retrieved video
frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account.
The flexibility of the interface relies on a scalable representation of the video content such as Motion JPEG
2000 in our implementation.
In this paper, we present an object-level video editing tool that provides automatic object removal and video summarization capabilities based on a user-selected object. The tool has three main modules: object selection, object detection, and background completion. In the object selection module, the user selects the object to be removed or used as a reference for video summarization. The object contour is computed using the Livewire algorithm. The object detection module uses a correlation technique to detect the object in all frames. In the background completion module, the background is filled using a novel and efficient algorithm that combines the advantages of texture synthesis algorithms and inpainting techniques. A detailed description of the tool is presented in this paper along with a variety of experimental results.
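The correlation-based detection step can be illustrated with a minimal sketch. The abstract does not specify which correlation variant is used; the example below assumes plain normalized cross-correlation (NCC) over a grayscale frame, with all names hypothetical:

```python
import numpy as np

def ncc_match(frame, template):
    """Locate a template in a frame via normalized cross-correlation.

    Returns the (row, col) of the top-left corner of the best match.
    Illustrative sketch only; the paper's exact correlation variant
    is not specified in the abstract.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            w = frame[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat window: correlation undefined, skip
            score = (w * t).sum() / denom
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Toy frame containing a small non-uniform patch at row 5, col 7
frame = np.zeros((20, 20))
patch = np.array([[1., 2., 1.], [2., 3., 2.], [1., 2., 1.]])
frame[5:8, 7:10] = patch
pos = ncc_match(frame, patch)
```

In the tool described above, this search would be repeated for each frame to track the selected object; a real implementation would use an FFT-based correlation for speed.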
Large collections of digital images are in use in many areas of commerce, government, academia, and medicine. Usually, the only way of searching these collections is by name or by browsing, which is impractical for large numbers of images. ImageSeeker aims to provide an improved technique for image searching. It focuses on extracting visual content from images and annotating it; it is based on the concept of CBIR (Content-Based Image Retrieval) and retrieves image content based on what it has learnt during prior training. When a user requests images of a certain object, all images containing that object are returned. ImageSeeker maintains high accuracy in matching results to the user's query. The system was tested on images of natural scenes, specifically sea, sand, grass, clouds, and sky.
This paper presents a novel algorithm for extracting regions of interest (ROIs) from images in an unsupervised way. It relies on the information provided by two computational models of bottom-up visual attention, encoded in the form of the image's salient points-of-attention (POAs) and areas-of-attention (AOAs). The proposed method combines these POAs and AOAs to generate binary masks that correspond to the ROIs within the image. First, each AOA is binarized through an adapted relaxation algorithm in which the histogram entropy of the AOA measurement is the stopping criterion of the iterative process. The AOAs are also smoothed with a Gaussian pyramid followed by interpolation. Next, the binary representation of the AOAs, the smoothed version of the AOAs, and the POAs are converted into a mask that covers the salient ROIs of the image. The proposed ROI extraction algorithm does not impose any constraints on the number or distribution of salient regions in the input image. Qualitative and quantitative results show that the proposed method performs very well on a wide range of images, whether natural or man-made, from simple images of objects against a homogeneous background to complex cluttered scenes.
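The abstract's relaxation procedure uses histogram entropy as a stopping criterion but is not spelled out. As a related, well-known illustration of entropy-driven binarization, the sketch below implements Kapur's maximum-entropy threshold (explicitly not the authors' algorithm): it picks the threshold that maximizes the sum of the within-class entropies of the two resulting histogram classes.

```python
import numpy as np

def kapur_threshold(img, bins=256):
    """Kapur's maximum-entropy threshold on values in [0, 1].

    Chooses the cut that maximizes H(background) + H(foreground),
    where H is the Shannon entropy of each class's normalized histogram.
    """
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    best_t, best_h = 1, -np.inf
    for t in range(1, bins):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 == 0 or p1 == 0:
            continue  # one class empty: entropy split undefined
        q0, q1 = p[:t] / p0, p[t:] / p1
        h0 = -(q0[q0 > 0] * np.log(q0[q0 > 0])).sum()
        h1 = -(q1[q1 > 0] * np.log(q1[q1 > 0])).sum()
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return edges[best_t]

# Toy "AOA map": dark background with one bright salient blob
img = np.zeros((32, 32))
img[10:20, 10:20] = 0.9
thr = kapur_threshold(img)
mask = img > thr  # binary ROI mask
```

On this toy map the threshold falls between the two modes, so the mask recovers exactly the 10x10 blob.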
Research on stereoscopic image and video processing has become a major trend in recent years. Measurement of visual quality is of fundamental importance for numerous stereoscopic image and video processing applications. The goal of quality assessment is to automatically assess the quality of images or videos in agreement with human quality judgments and to optimize stereoscopic image and video systems. Unfortunately, little is known about how humans perceive the quality of stereoscopic images. In this paper, we present the results of an extensive subjective quality assessment experiment in which a total of 400 distorted stereoscopic images were evaluated by about twenty human subjects. The stereoscopic image quality data obtained from 8,000 individual human quality judgments were used to build a database that can be exploited for understanding the perception of stereoscopic images and provide data for the design of objective assessment metrics. The experimental results indicate that the perceived quality of distorted stereoscopic images depends on both content and distortion type.
In the framework of multimedia applications, image quality may have different meanings and interpretations. In this paper, considering the quality of an image as the degree of adequacy to its function or goal within a specific application field, we provide an organized overview of image quality assessment methods, highlighting their applicability and limitations in different application domains. Three scenarios have been chosen, representing three typical applications with different degrees of constraints in their image workflow chains and requiring different image quality assessment methodologies.
The success of the bag-of-words approach for text has inspired the recent use of analogous strategies for global representation of images with local visual features.
Many applications have been proposed, including object detection, image annotation, query-by-example, relevance feedback, and clustering.
In this paper, we investigate the validity of the bag-of-words analogy for image representation and, more specifically,
local pattern selection for feature generation.
We propose a generalized document representation framework and apply it to the evaluation of two pattern selection strategies for images: dense sampling and point-of-interest detection.
We present empirical results that support our contention that text-based experimentation can provide useful insights into the effectiveness of image representations based on the bag-of-visual-words technique.
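The bag-of-visual-words pipeline named above can be sketched in a few steps: extract local descriptors (by dense sampling or point-of-interest detection), cluster them into a visual vocabulary, then represent each image as a histogram of visual-word occurrences. The sketch below uses random vectors as stand-in descriptors and a plain Lloyd's k-means (all names and parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means to build the visual vocabulary
    (illustrative; a real system would use a library implementation)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def bovw_histogram(descriptors, vocab):
    """Quantize local descriptors against the vocabulary and return
    a normalized visual-word histogram (the image representation)."""
    d = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d.argmin(1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

# Stand-in local descriptors; dense sampling would yield one per grid patch
train_desc = rng.normal(size=(200, 8))
vocab = kmeans(train_desc, k=16)
img_desc = rng.normal(size=(50, 8))
h = bovw_histogram(img_desc, vocab)
```

The resulting histogram plays the same role as a term-frequency vector in text retrieval, which is what makes the text-based experimentation above informative.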
This paper proposes an alternative to formal annotation for the representation of semantics, and presents an
extension to it capable of handling multimedia (text and images) documents. The article argues that meaning
is not a property of a document, but an outcome of a contextualized and situated process of interpretation. The
consequence of this position is that one should not try to represent the meaning of a document (the way
formal annotation does), but rather the context of the activity of which search is a part.
We present some general considerations on the representation and use of the context, and a simple example
of a technique to encode the context represented by the documents collected in the computer in which one
is working, and to use them to direct search. We present preliminary results showing that even this rather
simple-minded context representation can lead to considerable improvements over commercial search
engines for both text and images.
Although traditional content-based retrieval systems have been successfully employed in many multimedia applications, the need for explicit association of higher-level concepts with images has been a pressing demand from users. Much research has focused on reducing the semantic gap between visual features and the semantics of the image content. In this paper we present a mechanism that combines broad high-level concepts and low-level visual features within the framework of the QuickLook content-based image retrieval system. This system also implements a relevance feedback algorithm to learn the user's intended query from positive and negative image examples. With the relevance feedback mechanism, the retrieval process can be efficiently guided toward the semantic or pictorial contents of the images by providing the system with suitable examples. Qualitative experiments performed on a database of more than 46,000 photos downloaded from the Web show that the combination of semantic and low-level features, coupled with a relevance feedback algorithm, effectively improves the accuracy of image retrieval sessions.
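QuickLook's own feedback algorithm is not detailed in the abstract; as a generic illustration of learning from positive and negative examples, the classic Rocchio update is sketched below, with all parameter values assumed for the example:

```python
import numpy as np

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style relevance feedback: move the query feature vector
    toward the mean of the positive examples and away from the mean of
    the negatives. Weights alpha/beta/gamma are conventional defaults,
    not QuickLook's actual parameters."""
    q = alpha * np.asarray(query, dtype=float)
    if len(positives):
        q = q + beta * np.mean(positives, axis=0)
    if len(negatives):
        q = q - gamma * np.mean(negatives, axis=0)
    return q

# Toy 3-dimensional feature vectors
query = np.array([0.2, 0.8, 0.0])
pos = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]])
neg = np.array([[0.0, 0.0, 1.0]])
new_q = rocchio_update(query, pos, neg)
```

Iterating this update over several feedback rounds is what steers retrieval toward the semantic or pictorial content the user has in mind.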
In the context of the European Cantata project (ITEA project, 2006-2009), a complete Multi-Content Analysis framework was developed within Barco for the detection and analysis of compound images. The framework consists of a dataset, a Multi-Content Analysis (MCA) algorithm based on learning approaches, a ground truth, an evaluation module based on metrics, and a presentation module. The aim of the MCA methodology presented here is to classify the image content of computer screenshots into categories such as text, graphical user interface, medical images, and other complex images. The AdaBoost meta-algorithm was chosen, implemented, and optimized for the classification method, as it fitted the constraints (real-time operation and precision). A large dataset, separated into training and testing subsets, and its ground truth (in the ViPER metadata format) were collected and generated for the four categories. The outcome of the MCA is a cascade of strong classifiers trained and tested on the different subsets. The resulting framework and its optimizations (binary search, pre-computing of the features, pre-sorting) allow the classifiers to be re-trained as often as needed. The preliminary results are quite encouraging, with a low false positive rate and a true positive rate close to expectations. Re-injecting false negative examples from new testing subsets into the training phase resulted in better performance of the MCA.
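The core AdaBoost idea behind the cascade above can be shown in a compact sketch: weak learners (here, axis-aligned decision stumps) are trained in rounds on re-weighted data, and their weighted votes form a strong classifier. This is a textbook illustration, not Barco's optimized implementation:

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """AdaBoost with decision stumps. X: (n, d) features,
    y: labels in {-1, +1}. Returns a list of weighted stumps."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # example weights, re-adjusted each round
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):                 # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = max(err, 1e-12)              # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)     # up-weight misclassified examples
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1)
                for a, j, t, s in stumps)
    return np.sign(score)

# Toy 1-feature data: "text" vs "non-text" stand-ins
X = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = train_adaboost(X, y, rounds=3)
```

A cascade, as in the paper, chains several such strong classifiers so that easy negatives are rejected cheaply by the early stages.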
Volume visualization with random data access poses significant challenges. While tiling techniques lead to simple implementations, they are not well suited for cases where the goal is to access arbitrarily located subdimensional datasets (e.g., displaying a 'band' of several parallel lines of arbitrary orientation from a 2D image; being able to display an arbitrary 2D planar 'cut' from a 3D volume). Significant effort has been devoted to volumetric data compression, with most techniques proposing to tile volumes into cuboid subvolumes to enable random access. In this paper we show that, in cases where subdimensional
datasets are accessed, this leads to significant transmission inefficiency. As an alternative, we propose novel
server-client based data representation and retrieval methods that can be used for fast random access of low-dimensional data from high-dimensional datasets. In this paper, 2D experiments are shown, but the approach can be extended to higher-dimensional datasets. We use multiple redundant tilings of the image, where each tiling has a different orientation. We discuss the 2D rectangular tiling scheme and two main algorithmic components of such a 2D system, namely (i) a fast optimal search algorithm to determine which tiles should be retrieved for a given query and (ii) a mapping algorithm to enable efficient encoding without interpolation of rotated tiles. In exchange for increased server storage, we demonstrate that significant bandwidth reductions, by a factor of 2, can be achieved relative to conventional square tiling techniques. The transmission rate can be reduced even further by allowing more storage overhead. This method speeds up the random access procedure and saves memory on the client side. Here we use the 2D example of retrieving random lines (or sets of lines) from a 2D image. While our experiments are based on extracting 1D data from 2D datasets, the proposed method can be extended to 3D or higher dimensions. The associated basic concepts and analysis (namely, the extraction of 2D slices from 3D datasets) and a more detailed discussion of the 3D and higher-dimensional case will be presented in another paper. In this paper, we design a tiling method that locates the rotation centers at points on a square Cartesian grid and distributes the tile rotation angles uniformly around each rotation center. The angles of the tiles associated with each rotation center are the same. Various other tiling designs are also possible; their performance will be addressed in future work.
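One intuition behind the redundant-tiling design can be sketched simply: with tile orientations distributed uniformly, a query line is served by the tiling whose orientation is closest to the line's own angle, so the retrieved tiles align with the query instead of staircasing across it. The sketch below only illustrates this orientation-selection step; the uniform angle spacing and all names are our assumptions, not the paper's search algorithm:

```python
import numpy as np

def nearest_tiling(query_angle, n_tilings):
    """Pick the redundant tiling whose orientation is closest to the
    query line's angle. Assumes tilings oriented at k * pi / n_tilings
    for k = 0..n_tilings-1; line angles are equivalent modulo pi."""
    angles = np.arange(n_tilings) * np.pi / n_tilings
    diff = np.abs(angles - (query_angle % np.pi))
    diff = np.minimum(diff, np.pi - diff)  # angular distance modulo pi
    return int(diff.argmin())

# A query line at 50 degrees with 8 tilings (22.5 degree spacing):
# the 45-degree tiling (index 2) is the best match.
k = nearest_tiling(np.deg2rad(50), 8)
```

With more tilings the worst-case misalignment angle shrinks, which is the storage-for-bandwidth trade-off described above.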
In many collaborative research environments novel tools and techniques
allow researchers to generate data from experiments and observations
at a staggering rate. Researchers in these areas are now facing the
strong need for querying, sharing and exchanging these data in a
uniform and transparent fashion. However, due to the nature of the
various types of heterogeneous data and lack of local and global
database structures, standard data integration approaches fail or are
not applicable. A viable solution to this problem is the extensive use
of metadata. In this paper we present the model of an annotation
management system suitable for such research environments, and discuss
some aspects of its implementation. Annotations provide a rich linkage
structure between data items and between themselves, which translates into a
complex graph whose nodes are the annotations and the data. We
show how annotations are managed and used for data retrieval and
outline some of the query techniques used in the system.
Content-based image retrieval has been applied to many different biomedical applications [1]. In almost all cases, retrieval
involves a single query image of a particular modality, and the retrieved images are from that same modality. For example,
one system may retrieve color images from eye exams, while another retrieves fMRI images of the brain. Yet real
patients often have had tests from multiple modalities, and retrievals based on more than one modality could
provide information that single-modality searches miss. In this paper, we show medical image retrieval for two
different single modalities and propose a model for multimodal fusion that will lead to improved capabilities for
physicians and biomedical researchers. We describe a graphical user interface for multimodal retrieval that is being
tested by real biomedical researchers in several different fields.
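A common way to realize such a fusion model is weighted late fusion: each modality's retrieval scores are normalized to a comparable range and then combined linearly. The abstract only proposes a fusion model without specifying it, so the normalization scheme, weights, and modality names below are illustrative assumptions:

```python
import numpy as np

def late_fusion(scores_by_modality, weights):
    """Weighted late fusion: min-max normalize each modality's
    retrieval scores, then combine them linearly into one ranking."""
    fused = np.zeros(len(next(iter(scores_by_modality.values()))))
    for m, s in scores_by_modality.items():
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused += weights[m] * norm
    return fused

# Similarity scores for 4 candidate cases from two hypothetical
# modalities on incompatible scales (names are made up for the example)
scores = {"fundus": [0.9, 0.2, 0.5, 0.1],
          "fmri":   [10.0, 40.0, 35.0, 5.0]}
fused = late_fusion(scores, {"fundus": 0.5, "fmri": 0.5})
ranking = np.argsort(-fused)  # best candidates first
```

Here candidate 2 ranks first because it scores well in both modalities, even though neither modality alone ranks it highest, which is exactly the benefit multimodal retrieval aims for.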
Diagnosis accuracy in the medical field is mainly affected either by a lack of sufficient understanding of some diseases or by inter/intra-observer variability in diagnoses. We believe that mining large medical databases can help improve the current
status of disease understanding and decision making. In a previous study based on a binary description of hypointensity in the brain, it was shown that the shape of brain iron accumulation provides information additional to the shape-insensitive features, such as the total brain iron load, that are commonly used in clinics. This paper
proposes a novel, non-binary description of hypointensity in the brain based on principal component analysis. We compare the complementary and redundant information provided by the two
descriptions using Kendall's rank correlation coefficient in order to better understand the individual descriptions of iron accumulation in the brain and obtain a more robust and accurate search and retrieval system.
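Kendall's rank correlation, used above to compare the two descriptions, is the normalized difference between concordant and discordant pairs of rankings. A minimal sketch (the tau-a variant, without tie correction; `scipy.stats.kendalltau` provides a tie-corrected version):

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs divided by
    the total number of pairs n*(n-1)/2. No tie correction."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)

# Perfectly concordant rankings give tau = 1,
# fully reversed rankings give tau = -1.
tau = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
```

A tau near 1 between two feature descriptions indicates largely redundant information; a low tau indicates the descriptions rank cases differently and are therefore complementary, which is how it is used in the comparison above.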