We present here the results obtained by including a new image descriptor, which we call the prosemantic feature vector, within the framework of the QuickLook2 image retrieval system. By coupling the prosemantic features with the relevance feedback mechanism provided by QuickLook2, the user can move more rapidly and precisely through the feature space toward the intended goal. The prosemantic features are obtained by a two-step feature extraction process. In the first step, low-level features related to image structure and color distribution are extracted from the images. In the second step, these features are used as input to a bank of classifiers, each one trained to recognize a given semantic category, to produce score vectors. We evaluated the efficacy of the prosemantic features on search tasks over a dataset provided by the Fratelli Alinari Photo Archive.
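A minimal sketch of the two-step extraction follows, assuming a coarse color histogram plus a crude gradient statistic as low-level features and an SVM bank for the category classifiers; the category names and all implementation choices below are assumptions for illustration, not the paper's actual pipeline.

```python
# Sketch of a two-step prosemantic extractor (illustrative choices, not the paper's pipeline).
import numpy as np
from sklearn.svm import SVC

CATEGORIES = ["portrait", "landscape", "building", "vehicle"]  # hypothetical categories

def low_level_features(image):
    """Step 1: color distribution (coarse RGB histogram) plus a crude structure statistic."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(4, 4, 4), range=[(0, 255)] * 3)
    hist = hist.ravel() / max(hist.sum(), 1.0)
    gy, gx = np.gradient(image.mean(axis=2))
    return np.concatenate([hist, [np.abs(gx).mean(), np.abs(gy).mean()]])

class ProsemanticExtractor:
    """Step 2: a bank of per-category classifiers turning low-level features into a score vector."""
    def __init__(self):
        self.bank = {c: SVC(probability=True) for c in CATEGORIES}

    def fit(self, images, labels):
        X = np.array([low_level_features(im) for im in images])
        for c, clf in self.bank.items():
            clf.fit(X, [1 if lab == c else 0 for lab in labels])

    def transform(self, image):
        x = low_level_features(image).reshape(1, -1)
        return np.array([self.bank[c].predict_proba(x)[0, 1] for c in CATEGORIES])
```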
In this work we present a system that visualizes the results obtained from image search engines in such a way that users can conveniently browse the retrieved images. The way in which search results are presented allows the user to grasp the composition of the set of images "at a glance." To do so, images are grouped and positioned according to their distribution in a prosemantic feature space, which encodes information about their content at an abstraction level that lies between the visual and the semantic. The compactness of the feature space allows a fast analysis of the image distribution, so that all the computation can be performed in real time.
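As an illustration only, the grouping and placement step could be approximated in a few lines, with k-means in the prosemantic space for the groups and a 2-D projection for the positions; these specific algorithms are assumptions, not necessarily those used by the system.

```python
# Illustrative grouping/placement in the prosemantic space (k-means + PCA are assumptions).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def layout(prosemantic_vectors, n_groups=8):
    """Return one (group_id, x, y) tuple per image for on-screen placement."""
    km = KMeans(n_clusters=n_groups, n_init=10).fit(prosemantic_vectors)
    coords = PCA(n_components=2).fit_transform(prosemantic_vectors)
    return list(zip(km.labels_.tolist(), coords[:, 0], coords[:, 1]))
```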
We propose a method for the semi-automatic organization of photo albums. The method analyzes how different users organize their own pictures, with the goal of helping the user divide his or her pictures into groups characterized by a similar semantic content. The method is semi-automatic: the user starts assigning labels to the pictures, and unlabeled pictures are tagged with proposed labels; the user can accept the recommendation or make a correction. We use a suitable feature representation of the images to model the different classes that the users have collected. Then, we look for correspondences between the criteria used by the different users, which are integrated using boosting. A quantitative evaluation of the proposed approach is obtained by simulating the amount of user interaction needed to annotate the albums of a set of members of the Flickr® photo-sharing site.
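A minimal sketch of the interactive labeling loop is given below, with a nearest-neighbor proposer standing in for the actual model; the boosting-based integration of per-user criteria is not reproduced here, and all names are illustrative.

```python
# Sketch of the semi-automatic labeling loop (the nearest-neighbor proposer is a stand-in).
import numpy as np

def propose_label(labeled_feats, labels, x):
    """Propose for picture x the label of its nearest already-labeled picture."""
    d = np.linalg.norm(np.asarray(labeled_feats) - x, axis=1)
    return labels[int(d.argmin())]

def annotate(features, ask_user):
    """ask_user(index, proposal) returns the label confirmed or corrected by the user."""
    labels = [ask_user(0, None)]                      # the user labels the first picture
    for i in range(1, len(features)):
        proposal = propose_label(features[:i], labels, features[i])
        labels.append(ask_user(i, proposal))          # accept the recommendation or correct it
    return labels
```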
This paper proposes an alternative to formal annotation for the representation of semantics, and presents an extension of it capable of handling multimedia (text and image) documents. The article argues that meaning is not a property of a document, but the outcome of a contextualized and situated process of interpretation. The consequence of this position is that one should not try to represent the meaning of a document itself (the way formal annotation does), but rather the context of the activity of which search is part.
We present some general considerations on the representation and use of context, and a simple example of a technique for encoding the context represented by the documents collected on the computer on which one is working, and for using it to direct search. We show preliminary results indicating that even this rather simple-minded context representation can lead to considerable improvements over commercial search engines, for both text and images.
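A minimal sketch of one way such a local-document context could be used to direct search, assuming a tf-idf profile of the documents on the machine and a cosine re-ranking of the snippets returned by an external engine; both choices are illustrative, not the paper's exact technique.

```python
# Re-rank external search results against a context built from local documents (illustrative).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank(local_documents, result_snippets):
    """Order result snippets by similarity to the tf-idf profile of the local documents."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(local_documents + result_snippets)
    context = np.asarray(X[:len(local_documents)].mean(axis=0))   # 1 x vocabulary profile
    scores = cosine_similarity(context, X[len(local_documents):]).ravel()
    return [result_snippets[int(i)] for i in np.argsort(scores)[::-1]]
```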
In many collaborative research environments, novel tools and techniques allow researchers to generate data from experiments and observations at a staggering rate. Researchers in these areas now face a pressing need to query, share, and exchange these data in a uniform and transparent fashion. However, due to the nature of the various types of heterogeneous data and the lack of local and global database structures, standard data integration approaches fail or are not applicable. A viable solution to this problem is the extensive use of metadata. In this paper we present the model of an annotation management system suitable for such research environments, and discuss some aspects of its implementation. Annotations provide a rich linkage structure between data items and among themselves, which translates into a complex graph structure in which annotations and data are the nodes. We show how annotations are managed and used for data retrieval, and outline some of the query techniques used in the system.
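A minimal sketch of the kind of graph alluded to above, in which both data items and annotations are nodes and an annotation can point to data items or to other annotations; the class and field names are ours, not the system's model.

```python
# Illustrative graph of data items and annotations (names are assumptions, not the system's model).
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                                    # "data" or "annotation"
    content: str = ""
    targets: list = field(default_factory=list)  # ids of nodes this annotation refers to

class AnnotationGraph:
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.node_id] = node

    def annotations_of(self, node_id):
        """One simple retrieval step: all annotations that target a given node."""
        return [n for n in self.nodes.values()
                if n.kind == "annotation" and node_id in n.targets]
```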
Many evaluation techniques for content-based image retrieval are based on the availability of a ground truth, that is, on a "correct" categorization of images so that, say, if the query image is of category A, only the returned images in category A will be considered as "hits." Based on such a ground truth, standard information retrieval measures such as precision and recall are given and used to evaluate and compare retrieval algorithms. Accordingly, the assemblers of benchmarking databases go to a certain length to have their images categorized. The assumption of the existence of a ground truth is, in many respects, naive. It is well known that the categorization of the images depends on the a priori (from the point of view of such categorization) subdivision of the semantic field in which the images are placed (a trivial observation: a plant subdivision for a botanist is very different from that for a layperson). Even within a given semantic field, however, categorization by human subjects is subject to uncertainty, and it makes little statistical sense to consider the categorization given by one person as the unassailable ground truth. In this paper I propose two evaluation techniques that apply to the case in which the ground truth is subject to uncertainty. In this case, obviously, measures such as precision and recall will be subject to uncertainty as well. The paper explores the relation between the uncertainty in the ground truth and that in the most commonly used evaluation measures, so that the measurements done on a given system can preserve statistical significance.
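As an illustration of the general idea (not the specific measures derived in the paper), precision can be computed against each of several human categorizations and its spread across annotators reported, instead of a single value against one "ground truth."

```python
# Precision with an uncertain ground truth: one value per annotator, then mean and spread.
import numpy as np

def precision(retrieved_ids, relevant_ids):
    hits = sum(1 for i in retrieved_ids if i in relevant_ids)
    return hits / len(retrieved_ids) if retrieved_ids else 0.0

def uncertain_precision(retrieved_ids, categorizations, query_category):
    """categorizations: one dict per annotator mapping image id -> category label."""
    values = [precision(retrieved_ids,
                        {i for i, c in cat.items() if c == query_category})
              for cat in categorizations]
    return float(np.mean(values)), float(np.std(values))
```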
In this paper, we examine the problem of efficiently computing a class of aggregate functions on regions of space. We first formalize region-based aggregation for a large class of efficiently computable geometric aggregations. The idea is to represent the query object in terms of pre-defined objects combined with set operations, and to compute the aggregation from pre-computed aggregation values. We show that this formalization covers existing results about points and rectangular objects; since it is defined using set theory instead of object shapes, it can also be applied to polygons. Given a database D of polygonal regions, a tessellation T of the plane, and a query polygon q constructed from T, we prove that the aggregation of q can be calculated from the aggregations over triangles and lines constructed from the segments and vertices of q, which can be pre-computed. The query time complexity is O(k log n), where k is the size of the query polygon and n is the size of T.
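A minimal sketch of the decomposition idea: aggregates are pre-computed for pre-defined pieces, and a query region built from those pieces by set operations is answered by combining the stored values. The paper's actual decomposition uses triangles and lines derived from the query's segments and vertices; the "piece id" view below is a simplification for illustration.

```python
# Aggregation by decomposition over pre-computed pieces (simplified illustration).
def query_aggregate(precomputed, plus_pieces, minus_pieces=()):
    """precomputed: dict piece_id -> aggregate value (e.g., COUNT or SUM).
    The query region is expressed with set operations on pre-defined pieces: values of
    included pieces are added and those of removed pieces subtracted, which is valid for
    distributive aggregates over non-overlapping pieces."""
    return (sum(precomputed[p] for p in plus_pieces)
            - sum(precomputed[p] for p in minus_pieces))
```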
This paper argues in favor of a comprehensive model of image databases, which allows the inclusion of computer vision techniques into a formal query framework built on a rigorous database foundation. It attempts to give a first, very tentative direction that this framework could take. The main idea of the paper is that the correct way to create a database that relies on techniques as heterogeneous as those developed by computer vision researchers, without collapsing under the sheer weight of its own complexity, goes through the definition of abstract data types, and of suitable techniques to manipulate them in a query system without having to know anything of their implementation, that is, purely from a functional point of view.
In this paper we investigate the problem of integrating data sources with complex data types and complex data algebras, of the kind that is common in multimedia applications. We present the principles of mediation, and discuss why they are suitable for multimedia systems. Then, we describe the mediation system Metropolis, and the specific solutions adopted for it that make it extensible to multimedia. Finally, we discuss one of the extensions currently under study: the integration of homologous abstract data types at the algebra level.
In this paper we present a formalism for query rewriting in the presence of data types with multiple representations, such as images. We show that the formalism is consistent and that it allows rewriting rules to be derived, and we argue that the algebraic level at which it is expressed is the appropriate level for image access systems distributed over the Internet, in which the internal details of the individual data repositories are not accessible from outside the local environment.
The Biomedical Informatics Research Network (BIRN) is a wide-breadth project sponsored by the American National Institutes of Health (NIH) to promote the use of modern telecommunications for data exchange and collaboration in brain research.
The project is attempting to build a database and network infrastructure in which neuroscientists will post, query, and analyze raw data, processed data, and the results of the analysis.
The project is divided into two parts, which analyze mouse brain data and human brain data, respectively. In this phase of the project, the data are essentially anatomical, while in a future phase we foresee the introduction of functional data. One important source of raw data, both for the mouse and the human brain, is magnetic resonance imaging (MRI), which provides dense volumetric information about the density of the brain or (in the case of functional MRI) about brain activity. In the case of the mouse brain, these data are supplemented with images of brain slices and other histological measures.
One important technical problem that we are facing in BIRN is that of managing these volumetric data, processing them (possibly using tools available only remotely), storing the results of the analyses, and making them available to all the institutions participating in the project.
This paper describes the problems posed by the BIRN project, the importance of image data in these activities, and the challenges they pose. We will describe the shared environment that we are creating, and the facilities for storing, querying, remotely processing, and sharing the image data that constitute the bulk of the brain data that scientists are producing.
This paper presents a data model for images immersed in the World Wide Web, which derive their meaning from visual similarity, from the connection with the text of the pages that contain them, and from the link structure of the web. I model images on the web as a graph whose nodes are either text documents or images, and whose edges are links, labeled with measures of the relevance of one document toward the other. The paper briefly presents the features used to characterize the text and the visual aspect of the images, and then goes on to present a data algebra suitable for navigating and querying the database.
In this paper we describe the general architecture and specific aspects of a database system for multimedia (specifically, image) data. We analyze the conditions that must be met in order to achieve goals such as efficient schema design, query optimization, and extensibility. We assume that the engine uses the services of a traditional database (in this specific instance, a relational database), with the addition of user-defined indexing schemes. The paper presents general architectural issues and focuses on two problems: schema normalization using the computational properties of the features, and the definition of feature algebras, in particular the definition of a wavelet algebra for query expression.
This paper studies the relation between images and text in image databases. An analysis of this relation results in the definition of three distinct query modalities: (1) linguistic scenario: images are part of a whole that includes a self-contained linguistic discourse, and their meaning derives from their interaction with that discourse; a typical case of this scenario is constituted by images on the World Wide Web; (2) closed-world scenario: images are defined in a limited domain, and their meaning is anchored by conventions and norms in that domain; (3) user scenario: the linguistic discourse is provided by the user, as in highly interactive systems with relevance feedback. This paper deals with image databases of the first type. It shows how the relation between images and text can be inferred and exploited for search. The paper develops a similarity model in which the similarity between two images is given by both their visual similarity and the similarity of the attached words. Both the visual and the textual similarity can be manipulated by the user through the two windows of the interface.
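A minimal sketch of how such a combined measure might look, assuming a cosine measure for the visual part, a Jaccard measure over the attached words, and a user-adjustable weight; these specific choices are assumptions, not the paper's model.

```python
# Combined visual + textual similarity with a user-adjustable weight (illustrative choices).
import numpy as np

def combined_similarity(visual_a, visual_b, words_a, words_b, alpha=0.5):
    """alpha weights visual similarity against the similarity of the attached words."""
    visual_sim = float(np.dot(visual_a, visual_b) /
                       (np.linalg.norm(visual_a) * np.linalg.norm(visual_b) + 1e-12))
    wa, wb = set(words_a), set(words_b)
    text_sim = len(wa & wb) / (len(wa | wb) or 1)
    return alpha * visual_sim + (1 - alpha) * text_sim
```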
Web cameras are becoming more and more common on the Internet, and the technology is ready to make cameras a standard accessory of any computer. The development of applications, however, has not followed the explosive diffusion of the cameras. Problems in developing applications for remote web cameras come from the low image quality that they generally provide and from the fact that, unless the application runs locally, the image data are only available at a very low frame rate (typically between 10 sec/image and 30 min/image). New image analysis and processing techniques are needed to take advantage of the opportunity represented by web cameras. This paper presents some early considerations and techniques for dealing with what I call Very Low Rate Video (VLRV). Certain operations of fundamental importance in vision, such as motion detection, are impossible in VLRV, due to the large interval between consecutive images. Other operations, like color processing, are made difficult by the low quality and temporal instability of the images. The paper presents techniques to deal with different processes with different time constants, and tries to determine the limits of what is feasible using one web camera and using a whole collection of web cameras.
This paper presents some methodological observations on the measurement of performance in visual information retrieval systems. The paper identifies three different types of measures, two of which can be determined with methods inherited from the physical and social sciences, respectively. The third is more typical of the design and construction of complicated systems, since it allows us to measure the performance of individual modules before their insertion in a particular application. The paper presents some methodologies for this decontextualized evaluation, anchoring them to a case study: the evaluation of several subsystems of an image database.
Most current image processing systems work on color images, and color is a precious perceptual cue for determining image similarity. Working with color images, however, is not the same thing as working with images taking values in a 3D Euclidean space. Not only are color spaces bounded, but the characteristics of the observer endow the space with a 'perceptual' metric that in general does not correspond to the metric naturally inherited from R^3. This paper studies the problem of filtering color images abstractly. It begins by determining the properties that the color sum and color product operations must have so that the desirable properties of orthonormal bases will be preserved. The paper then defines a general scheme, based on the action of the additive group on the color space, by which operations that satisfy the required properties can be defined.
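As a sketch of how a group action can induce such operations (our notation, not the paper's), assume an invertible map φ from R^3 onto the bounded color space C and push the vector-space structure forward through it; with these operations C behaves like a vector space, so orthonormal expansions, and hence linear filtering, carry over.

```latex
% Sketch (assumed notation): operations induced on the bounded color space C by an
% invertible parameterization \varphi : \mathbb{R}^3 \to C.
a \oplus b = \varphi\bigl(\varphi^{-1}(a) + \varphi^{-1}(b)\bigr), \qquad
\lambda \otimes a = \varphi\bigl(\lambda\,\varphi^{-1}(a)\bigr), \qquad
a, b \in C,\ \lambda \in \mathbb{R}.
```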
This paper introduces a model of a spatio-temporal database, which we are developing to query interesting events in video sequences. The database we are designing pushes the state of the art in a number of fields, and there are many issues that are still awaiting a satisfactory solution. In this paper, we present our (albeit still partial) answers to some of these problems, and the future directions of our work. Our design is divided into two layers: a logbook, which operates as a short-term repository of unsummarized and unprocessed data, and a long-term spatio-temporal database, which stores and queries summarized data.
In this paper, we introduce our approach to multimedia database interfaces. Although we deal mainly with image databases, most of the ideas we present can be generalized to other types of data. We argue that, when dealing with complex data such as images, the problem of access must be redefined along different lines than for text databases. In multimedia databases, the semantics of the data is imprecise, and depends in part on the user's interpretation. This observation led us to consider the development of interfaces in which the user explores the database rather than querying it. In this paper, we give a brief justification of our position and present the exploratory interface we have developed for our image database El Nino.