In this work, we propose two improvements to the Gestalt Interest Points (GIP) algorithm for recognising the faces of people who have undergone significant weight change. The basic assumption is that some interest points contribute more to the description of such objects than others. We assume that certain interest points can be eliminated to make the whole method more efficient while retaining the classification results. To find out which Gestalt interest points can be eliminated, we performed experiments on the contrast and orientation of face features. Furthermore, we investigated the robustness of GIP against image rotation. The experiments show that our method is rotationally invariant and, in this practically relevant forensic domain, outperforms state-of-the-art methods such as SIFT, SURF, ORB and FREAK.
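The contrast-based pruning idea can be sketched as follows; the helper name and the simple threshold criterion are illustrative assumptions for this summary, not the actual GIP selection rule:

```python
import numpy as np

def keep_by_contrast(points, contrasts, threshold):
    """Keep only interest points whose local contrast exceeds a threshold.

    points    -- (N, 2) array of (x, y) coordinates
    contrasts -- (N,) array of contrast values, one per point
    threshold -- minimum contrast for a point to be retained
    """
    mask = contrasts >= threshold
    return points[mask], contrasts[mask]

# Toy data: five interest points with varying contrast.
pts = np.array([[10, 12], [40, 8], [22, 30], [5, 5], [60, 44]])
con = np.array([0.9, 0.2, 0.7, 0.1, 0.5])
kept, kept_con = keep_by_contrast(pts, con, threshold=0.5)
print(len(kept))  # → 3 points survive the 0.5 threshold
```

Dropping low-contrast points in this way shrinks the descriptor set, which is the efficiency gain the abstract refers to.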
The experiments described in this paper indicate that under certain conditions content-based features are not required for efficient user-centred image retrieval in small media collections. The importance of feature selection drops dramatically if classification is used for retrieval (e.g. with Support Vector Machines) and only limited user feedback is available. In this situation, simple image features and even random features perform as well as sophisticated signal-processing-based features (e.g. the content-based MPEG-7 image descriptors). Practically relevant applications of these findings are retrieval on mobile devices and in heterogeneous (e.g. ad hoc generated) media collections.
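The retrieval-by-classification setting can be illustrated with a minimal sketch; a nearest-centroid classifier stands in for the SVM mentioned above, and all names and data here are made up for the example:

```python
import numpy as np

def nearest_centroid_rank(features, pos_idx, neg_idx):
    """Rank a collection by classification instead of raw similarity.

    features -- (N, D) feature matrix, one row per image
    pos_idx  -- indices the user marked as relevant (positive feedback)
    neg_idx  -- indices the user marked as irrelevant (negative feedback)
    Returns image indices sorted from most to least 'relevant'.
    """
    pos_c = features[pos_idx].mean(axis=0)
    neg_c = features[neg_idx].mean(axis=0)
    # Score = distance to the negative centroid minus distance to the
    # positive one; larger means more likely relevant.
    score = (np.linalg.norm(features - neg_c, axis=1)
             - np.linalg.norm(features - pos_c, axis=1))
    return np.argsort(-score)

rng = np.random.default_rng(0)
# Two synthetic "classes" in a 4-D feature space.
relevant = rng.normal(1.0, 0.3, size=(5, 4))
other = rng.normal(-1.0, 0.3, size=(5, 4))
feats = np.vstack([relevant, other])
ranking = nearest_centroid_rank(feats, pos_idx=[0, 1], neg_idx=[5, 6])
print(sorted(ranking[:5]))  # → [0, 1, 2, 3, 4]: relevant images come first
```

With only two positive and two negative feedback examples, the classifier separates the toy collection regardless of which particular feature dimensions are used, which mirrors the abstract's claim about feature selection mattering little under limited feedback.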
This paper introduces a novel paradigm for integrated retrieval and browsing in content-based visual information retrieval systems. The proposed approach uses feature transformations and distance measures for content-based media access and similarity measurement. The first innovation is that distance space is visualised in a 3D user interface: 2D representations of media objects are shown on the image plane, while the floor plane is used to show their distance relationships. Queries can be defined interactively by browsing through the 3D space and selecting media objects as positive or negative examples. Each selection operation defines hyper-clusters that are used for querying, and triggers query execution and distance space adaptation in a background process. To help the user understand distance space, descriptions are visualised in diagrams and associated with media objects. Changes in distance space are visualised by tree-like graphs. Furthermore, the user can select subspaces of distance space and assign new distance metrics to them. This allows multiple similarity judgements to be handled in one retrieval process. The proposed components for visual data mining will be implemented in the visual information retrieval project VizIR. All VizIR components can be combined arbitrarily into sophisticated retrieval applications.
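The mapping from distance space to floor-plane coordinates could, for instance, be computed with classical multidimensional scaling; the abstract does not state which projection VizIR uses, so this is only an illustrative sketch:

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Embed objects in 'dims' dimensions from a pairwise distance matrix.

    Classical multidimensional scaling: double-centre the squared
    distances and take the eigenvectors of the largest eigenvalues.
    """
    n = dist.shape[0]
    d2 = dist ** 2
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ d2 @ j                    # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dims]    # largest first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Three media objects: 0 and 1 are similar, 2 is far from both.
d = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]])
coords = classical_mds(d)
# Embedded floor-plane distances reproduce the input distances.
print(round(float(np.linalg.norm(coords[0] - coords[1])), 3))  # → 1.0
```

Any projection of this kind only approximates high-dimensional distance relationships in general, which is presumably why the interface also visualises the descriptions and distance changes explicitly.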
This paper describes how parallel retrieval is implemented in the content-based visual information retrieval framework VizIR. Generally, two major use cases for parallelisation exist in visual retrieval systems: distributed querying and simultaneous multi-user querying. Distributed querying includes parallel query execution and querying multiple databases. Content-based querying is a two-step process: transformation of feature space into distance space using distance measures, and selection of result-set elements from distance space. Parallel distance measurement is implemented by sharing example media and query parameters between querying threads. In VizIR, parallelisation relies heavily on caching strategies. Querying multiple distributed databases is already supported by standard relational database management systems; the most relevant issues here are error handling and minimising network bandwidth consumption. Moreover, we describe strategies for distributed similarity measurement and content-based indexing. Simultaneous multi-user querying raises problems such as caching of query results and the use of relevance feedback and user preferences for query refinement. We propose a 'real' multi-user querying environment that allows users to collaborate in defining queries and to browse through result sets simultaneously. The proposed approach opens an entirely new field of applications for visual information retrieval systems.
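Parallel distance measurement with a query shared between worker threads can be sketched as follows; this is a minimal illustration, not the VizIR implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def distances_parallel(query, collection, workers=4):
    """Compute query-to-object distances with several threads.

    The query vector is shared read-only between all worker threads;
    each thread measures distances for one chunk of the collection.
    """
    chunks = np.array_split(collection, workers)

    def measure(chunk):
        return np.linalg.norm(chunk - query, axis=1)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(measure, chunks)
    return np.concatenate(list(parts))

rng = np.random.default_rng(1)
coll = rng.normal(size=(100, 8))        # 100 stored feature vectors
q = coll[17]                            # query with one stored object
dist = distances_parallel(q, coll)
print(int(np.argmin(dist)))             # → 17 (distance to itself is zero)
```

The second step of the querying process, result-set selection, then reduces to sorting or thresholding the merged distance array.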
Evaluation in visual information retrieval is usually performed by executing test queries and calculating recall and precision on predefined media collections with ground-truth information. This process is complex and time-consuming. For the evaluation of feature transformations (transformations of visual media objects to feature vectors) it would be desirable to have simpler methods available. In this paper we introduce an evaluation procedure for features that is based on statistical data analysis. The new idea is to use the existing visual MPEG-7 descriptors to judge the characteristics of novel feature transformations. The proposed procedure is divided into four steps: (1) feature extraction, (2) merging with MPEG-7 data and normalisation, (3) statistical data analysis and (4) visualisation and interpretation. Three types of statistical methods are used for evaluation: (1) description (moments, etc.), (2) identification of similarities (e.g. cluster analysis) and (3) identification of dependencies (e.g. factor analysis). Statistical analysis yields several benefits for feature redesign. The application of the suggested evaluation procedure and the advantages of the approach are shown in several examples.
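Steps (2) and (3) of the procedure can be sketched as a dependency check of a candidate feature against a reference descriptor; the function name and the synthetic data are illustrative assumptions:

```python
import numpy as np

def max_cross_correlation(new_feat, ref_feat):
    """Normalise, then test a new feature for dependencies on a reference.

    new_feat -- (N, D1) descriptions from the feature under evaluation
    ref_feat -- (N, D2) reference descriptions (e.g. an MPEG-7 descriptor)
    Returns the largest absolute correlation between any new dimension
    and any reference dimension. Values near 1 indicate the new feature
    mostly duplicates information the reference already captures.
    """
    def znorm(x):
        return (x - x.mean(axis=0)) / x.std(axis=0)
    a, b = znorm(new_feat), znorm(ref_feat)
    corr = a.T @ b / len(a)          # (D1, D2) cross-correlation matrix
    return float(np.abs(corr).max())

rng = np.random.default_rng(2)
ref = rng.normal(size=(200, 3))                               # reference
redundant = ref[:, :1] * 2.0 + 0.01 * rng.normal(size=(200, 1))
novel = rng.normal(size=(200, 1))
print(max_cross_correlation(redundant, ref) > 0.9)   # → True: redundant
print(max_cross_correlation(novel, ref) < 0.5)       # → True: adds information
```

A full factor analysis generalises this pairwise check, but even the simple correlation already indicates whether a candidate feature is worth keeping.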
Visual information retrieval (VIR) is a research area with more than 300 scientific publications every year. Technological progress makes surveys outdated within a short time. This paper briefly describes selected important advances in VIR in recent years and points out promising directions for future research. A software architecture for visual media handling is proposed that treats image and video content equally, which allows both types of media to be integrated in a single system. The major advances in feature design are sketched and new methods for semantic enrichment are proposed. Guidelines are formulated for the further development of feature extraction methods. The most relevant retrieval processes are described and an interactive method for visual mining is suggested that really puts "the human in the loop". For evaluation, the classic recall- and precision-based approach is discussed, as well as a new procedure based on MPEG-7 and statistical data analysis. Finally, an "ideal" architecture for VIR systems is outlined. The selection of VIR topics is subjective and represents the author's point of view. The intention is to provide a short but substantial introduction to the field of VIR.
This paper describes how the web standards Synchronized Multimedia Integration Language (SMIL) and Scalable Vector Graphics (SVG) are used in teaching at the Vienna University of Technology. SMIL and SVG are used in courses on multimedia authoring. Didactically, the goal is to teach students how to use media objects and timing concepts to build interactive media applications. Additionally, SMIL is applied to generate multimedia content from a database using a content management system. The paper gives background information on the SMIL and SVG standards and sketches how teaching multimedia is organized at the Vienna University of Technology. Courses from the summer term 2003 are described and illustrated in two case studies. General design problems of SMIL-based presentations are modelled as patterns. Additionally, suggestions for improvement in the standards are given and shortcomings of existing user agents are summarized. Our conclusion is that SMIL and SVG are very well suited for teaching multimedia. Currently, the main problem is that all existing SMIL players lack some properties desired for teaching applications (stability, correctness, etc.).
The study presented in this paper analyses, from a statistical point of view, descriptions extracted from visual content with MPEG-7 descriptors. Good descriptors should generate descriptions with high variance, a well-balanced cluster structure and high discriminative power, so that different media content can be distinguished. Statistical analysis reveals the quality of the description extraction algorithms used. This was not considered in the MPEG-7 design process, where optimising recall was the major goal. For the analysis, eight basic visual descriptors were applied to three media collections: the Brodatz dataset (monochrome textures), a selection of the Corel dataset (colour photos) and a set of coats-of-arms images (artificial colour images with few colour gradations). The results were analysed with four statistical methods: mean and variance of descriptor elements, distribution of elements, cluster analysis (hierarchical and topological) and factor analysis. The main results are: the best descriptors for combination are Color Layout, Dominant Color, Edge Histogram and Texture Browsing; the others are highly dependent on these. The colour histograms (Color Structure and Scalable Color) perform badly on monochrome input. Generally, all descriptors are highly redundant, and the application of complexity-reduction transformations could save up to 80% of storage and transmission capacity.
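The redundancy finding can be illustrated with a small factor-analysis-style sketch on synthetic data; the 95% variance threshold is an assumption for the example, not the criterion used in the study:

```python
import numpy as np

def compression_ratio(descriptions, keep_variance=0.95):
    """Fraction of descriptor elements needed to keep the given variance.

    A principal component analysis of the description matrix shows how
    many components retain 'keep_variance' of the total variance; highly
    redundant descriptors need far fewer components than elements.
    """
    x = descriptions - descriptions.mean(axis=0)
    vals = np.linalg.eigvalsh(np.cov(x.T))[::-1]       # descending
    cum = np.cumsum(vals) / vals.sum()
    needed = int(np.searchsorted(cum, keep_variance) + 1)
    return needed / descriptions.shape[1]

rng = np.random.default_rng(3)
# 10 latent factors expanded into a redundant 50-element description.
latent = rng.normal(size=(300, 10))
mixing = rng.normal(size=(10, 50))
desc = latent @ mixing + 0.01 * rng.normal(size=(300, 50))
print(compression_ratio(desc) <= 0.2)  # → True: ~80% of elements redundant
```

Here at most 10 of the 50 elements carry 95% of the variance, the same order of saving the abstract reports for the MPEG-7 descriptors.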
This paper describes how the handling of visual media objects is implemented in the visual information retrieval project VizIR. Essentially, four areas are concerned: media access, media representation in user interfaces, visualisation of media-related data and media transport over the network. The paper offers detailed technical descriptions of the solutions developed in VizIR for these areas. Unified media access for images and video is implemented through class MediaContent. This class contains methods to access the view on a media object at any point in time, as well as methods to change the colour model and read/write format parameters (size, length, frame rate). Based on this low-level API, class VisualCube allows random access to spatio-temporal areas of temporal media. Transformer classes allow visual objects to be modified in a simple but effective way. Visualisation of media objects is implemented in class MediaRenderer. Each MediaRenderer represents one media object and is responsible for every aspect of its visualisation. The paper presents examples of reasonable implementations of MediaRenderer classes. Visualisation of media-related data is strongly connected to MediaRenderer, which is to a large extent responsible for displaying visual panels created by other framework components. Finally, media object transport in VizIR is based on the Real-time Transport Protocol (for media objects) and XML messaging (for XML data).
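The described class interfaces might look roughly as follows; the actual VizIR classes are not shown in the abstract, so all signatures here are assumptions sketched in Python:

```python
import numpy as np

class MediaContent:
    """Unified access to images and video (illustrative sketch only)."""

    def __init__(self, frames):
        # (T, H, W, C) array; an image is simply a video with T == 1.
        self.frames = np.asarray(frames)

    def view_at(self, t):
        """Return the view on the media object at time index t."""
        return self.frames[min(t, len(self.frames) - 1)]

    def format_parameters(self):
        t, h, w, c = self.frames.shape
        return {"length": t, "size": (w, h), "channels": c}


class VisualCube:
    """Random access to a spatio-temporal block of a temporal medium."""

    def __init__(self, content, t0, t1, x0, x1, y0, y1):
        self.block = content.frames[t0:t1, y0:y1, x0:x1]


video = MediaContent(np.zeros((30, 120, 160, 3)))   # 30 dummy frames
cube = VisualCube(video, t0=5, t1=10, x0=20, x1=60, y0=10, y1=50)
print(video.format_parameters()["size"])   # → (160, 120)
print(cube.block.shape)                    # → (5, 40, 40, 3)
```

The point of such an interface is that retrieval components can treat a still image as a one-frame video and never branch on media type.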
The focus of this paper is similarity modeling. In the first part we revisit the underlying concepts of similarity modeling and sketch the currently most widely used VIR similarity model, Linear Weighted Merging (LWM). Motivated by its drawbacks, we introduce a new general similarity model called Logical Retrieval (LR) that offers more flexibility than LWM. In the second part we integrate into this environment the Feature Contrast Model (FCM), developed by psychologists to explain peculiarities of human similarity perception. FCM is integrated as a general method for distance measurement. The results show that, in the LR context, FCM performs better than metric-based distance measurement. Euclidean distance is used for comparison because it is employed in many VIR systems and rests on the questionable metric axioms. FCM minimizes the number of clusters in distance space and is therefore the ideal distance measure for LR. FCM allows a number of different parameterizations. The tests reveal that, on average, a symmetric, non-subtractive configuration that emphasizes common properties of visual objects performs best. Its major drawback compared to Euclidean distance is its poorer performance in terms of query execution time.
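A minimal sketch of an FCM-style distance over binary feature vectors, following Tversky's contrast formula; the parameter values below are illustrative, not the configuration tested in the paper:

```python
import numpy as np

def fcm_distance(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Feature Contrast Model as a distance over binary feature vectors.

    Tversky's similarity: theta * (common features)
                          - alpha * (features only in A)
                          - beta * (features only in B).
    With alpha == beta the measure is symmetric; choosing alpha and beta
    small relative to theta emphasises common properties, the kind of
    configuration the paper reports as performing best on average. The
    sign is flipped so that smaller values mean more similar.
    """
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    common = int(np.sum(a & b))
    only_a = int(np.sum(a & ~b))
    only_b = int(np.sum(~a & b))
    return -(theta * common - alpha * only_a - beta * only_b)

x = [1, 1, 1, 0, 0]
y = [1, 1, 0, 0, 0]   # shares two features with x
z = [0, 0, 0, 1, 1]   # shares none
print(fcm_distance(x, y))  # → -1.5
print(fcm_distance(x, z))  # → 2.5
```

Note that this measure deliberately violates the metric axioms the paper questions: with unequal alpha and beta it is asymmetric, and it does not satisfy the triangle inequality in general.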