In this paper, we explored the use of low fidelity Synthetic Environments (SE; i.e., a combination of simulation
techniques) for product design, and their usefulness in making design problems explicit. In particular, we were
interested in the influence of interactivity on user experience. For this purpose, an industrial design
case was taken: the innovation of an airplane galley. A virtual airplane was created in which an interactive model of the
galley was placed. First, three groups of participants explored the SE in different conditions: Participants explored the SE
interactively (Interactive condition), watched a recording (Passive Dynamic condition), or watched static images
(Passive Static condition). Afterwards, participants completed a questionnaire testing how accurately they had memorized
the spatial layout of the SE. The results revealed that an interactive SE does not necessarily lead participants to
memorize spatial layouts more accurately; rather, the effect of interactive learning depends on the participants'
Visual Spatial Ability (VSA). Consequently, this finding supports the use of interactive exploration of prototypes through
low fidelity SE in the product design cycle, provided the individual's characteristics are taken into account.
In this research, we consider the use of the inverse perspective transformation in video surveillance
applications that observe (and possibly influence) scenes consisting of moving and stationary objects; e.g., people
in a parking area. In previous research, objects were detected in video streams and identified as moving
or stationary. Subsequently, distance maps were generated by the Fast Exact Euclidean Distance (FEED)
transformation, which uses frame-to-frame information to generate distance maps for video frames in a fast
manner. From the resulting distance maps, different kinds of surveillance parameters can be derived. The
camera was placed above the scene, and hence, no inverse perspective transformation was needed. In this
work, we consider the case where the camera is placed at an arbitrary angle to the side of the scene,
which might be a more feasible placement than on top. It will be shown that an image taken by a camera
on the side can be converted easily and quickly into the image that would be taken by a camera on top. This allows
the previously developed methods to be used after converting each frame of a video stream, or only the objects of
interest detected in them.
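The side-to-top conversion described above can be sketched as a planar homography: a 3x3 matrix that maps ground-plane pixels in the side view to bird's-eye coordinates. Below is a minimal sketch in Python; the matrices are illustrative assumptions, since in practice the homography is estimated from known ground-plane correspondences (e.g., via camera calibration), which the text does not detail.

```python
# Minimal sketch of an inverse perspective mapping (IPM). A 3x3 homography H
# maps ground-plane pixels (x, y) in the side-view image to coordinates in a
# virtual top-view image. The matrices below are illustrative assumptions;
# in practice H is estimated from four known ground-plane correspondences.

def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (row-major nested lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]          # projective scale
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w    # top-view column
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w    # top-view row
    return u, v

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# A non-trivial bottom row models the perspective foreshortening introduced
# by a side-view camera; dividing by w undoes it.
TILT = [[1, 0, 0], [0, 1, 0], [0, 0.5, 1]]
```

Applied per frame, or only to the detected objects of interest, this yields top-view pixel coordinates on which the previously developed distance-map methods can operate unchanged.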
A distance transformation (DT) takes a binary image as input and generates a distance map image in which the
value of each pixel is its distance to a given set of object pixels in the binary image. In this research, DTs for
multi-class data (MCDTs) are developed, which generate both a distance map and a class map containing, for each
pixel the class of the closest object. Results indicate that the MCDT based on the Fast Exact Euclidean Distance
(FEED) method is a factor 2 to 4 faster than MCDTs based on exact or semi-exact Euclidean distance (ED)
transformations, and is only a factor 2 to 4 slower than the MCDT based on the crude city-block approximation
of the ED. In the second part of this research, the MCDTs were adapted such that they could be used for the
fast generation of distance and class maps for video sequences. The frames of the sequences contain a number of
fixed objects and a moving object, where each object has a separate label. Results show that the FEED based
version is a factor 2 to 3.5 faster than the fastest of all the other video-MCDTs which is based on the chamfer 3,4
distance measure. FEED is even a factor 3.5 to 10 faster than another fast exact ED transformation. With video
multi-class FEED, it will be possible to measure distances from a moving object to various identified stationary
objects with nearly the frame rate of a webcam. This will be very useful when the risk exists that objects move
outside surveillance limits.
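As a minimal illustration of what an MCDT computes (this is the definitional brute force, not the FEED algorithm itself), every background pixel records the exact Euclidean distance to, and the label of, its nearest object pixel:

```python
import math

def multiclass_distance_transform(labels):
    """Brute-force multi-class distance transform (definitional, O(n^2));
    FEED and the other MCDTs in the text compute the same two maps much
    faster. Input: 2D grid of labels, 0 = background, >0 = object class.
    Output: (distance map, class map); each background pixel holds the
    exact Euclidean distance to, and the class of, its nearest object."""
    h, w = len(labels), len(labels[0])
    objs = [(y, x, labels[y][x])
            for y in range(h) for x in range(w) if labels[y][x] != 0]
    dist = [[0.0] * w for _ in range(h)]
    cls = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != 0:          # object pixel: distance 0, own class
                cls[y][x] = labels[y][x]
                continue
            d2, c = min(((oy - y) ** 2 + (ox - x) ** 2, oc)
                        for oy, ox, oc in objs)
            dist[y][x] = math.sqrt(d2)
            cls[y][x] = c
    return dist, cls
```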
A breakthrough is needed in order to achieve substantial progress in the field of Content-Based Image Retrieval
(CBIR). This breakthrough can be brought about by: 1) optimizing user-system interaction, 2) combining the
wealth of techniques from text-based Information Retrieval with CBIR techniques, 3) exploiting human cognitive
characteristics, especially human color processing, and 4) conducting benchmarks with users for evaluating
new CBIR techniques. In this paper, these guidelines are illustrated by findings from our research conducted
over the last five years, which has led to the development of the online Multimedia for Art ReTrieval (M4ART)
system: http://www.m4art.org. The M4ART system follows the guidelines on all four issues and is assessed
on benchmarks using 5730 queries on a database of 30,000 images. Therefore, M4ART can be considered a
first step into a new era of CBIR.
A new application for VR has emerged: product development, in which several stakeholders (from engineers to end
users) use the same VR for development and communication purposes. Various characteristics among these stakeholders
vary considerably, which imposes potential constraints on the VR. The current paper discusses the influence of three
types of exploration of objects (i.e., none, passive, active) on one of these characteristics: the ability to form mental
representations or visuo-spatial ability (VSA). Through an experiment we found that all users benefit from exploring
objects. Moreover, people with low VSA (e.g., end users) benefit from an interactive exploration of objects, as opposed to
people with a medium or high VSA (e.g., engineers), who are not sensitive to the type of exploration. Hence, for VR
environments in which multiple stakeholders participate (e.g., for product development), differences among their
cognitive abilities (e.g., VSA) have to be taken into account to enable efficient usage of VR.
Human vigilance is limited; hence, automatic motion and distance detection is one of the central issues in video surveillance. Many aspects are of importance; this paper specifically addresses efficiency (achieving real-time performance), accuracy, and robustness against various noise factors. To obtain fully controlled test environments, an artificial development center for robot navigation is introduced in which several parameters can be set (e.g., number of objects, trajectories, and type and amount of noise). In the videos, for each successive frame, movement of stationary objects is detected and pixels of moving objects are located, from which moving objects are identified in a robust way. An Exact Euclidean Distance Map (E2DM) is utilized to determine accurately the distances between moving and stationary objects. Together with the determined distances between moving objects and the detected movement of stationary objects, this provides the input for detecting unwanted situations in the scene. Further, each intelligent object (e.g., a robot) is provided with its E2DM, allowing the object to plan its course of action. Timing results are specified for each program block of the processing chain for 20 different setups. Thus, the current paper presents extensive, experimentally controlled research on real-time, accurate, and robust motion detection for video surveillance, using E2DMs, which makes it a unique approach.
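The first stage of such a pipeline, locating pixels of moving objects, can be sketched with simple frame differencing between successive gray-scale frames. The threshold below is an illustrative assumption, not a value taken from the text:

```python
def detect_motion(prev_frame, curr_frame, threshold=10):
    """Mark pixels whose gray value changed by more than `threshold`
    between two frames (nested lists of intensities). A surveillance
    pipeline like the one described would then group such pixels into
    moving objects before computing E2DMs; the threshold value here is
    an illustrative assumption."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]
```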
Various texture analysis algorithms have been developed over the last decades. However, no computational model has
arisen that mimics human texture perception adequately. In 2000, Payne, Hepplewhite, and Stoneham and in
2005, Van Rikxoort, Van den Broek, and Schouten achieved mappings between humans and artificial classifiers
of respectively around 29% and 50%. In the current research, the work of Van Rikxoort et al. was replicated,
using the newly developed, online card sorting experimentation platform M-HinTS: http://eidetic.ai.ru.nl/M-HinTS/.
In two separate experiments, color and gray scale versions of 180 textures, drawn from the OuTex
and VisTex texture databases, were clustered by 34 subjects. The mutual agreement among these subjects was
51% and 52% for, respectively, the experiments with color and gray scale textures. The average agreement
between the k-means algorithm and the participants was 36%, where k-means approximated some participants
up to 60%. Since last year's results were not replicated, an additional data analysis was developed, which uses
the semantic labels available in the database. This analysis shows that semantics play an important role in
human texture clustering and once more illustrates the complexity of texture recognition. The current findings,
the introduction of M-HinTS, and the set of analyses discussed are the start of the next phase in unraveling human
texture perception.
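The agreement percentages above require a way to score how similarly two subjects (or a subject and k-means) clustered the same images. One standard pair-counting measure is sketched below; whether the study used exactly this measure is an assumption:

```python
from itertools import combinations

def clustering_agreement(labels_a, labels_b):
    """Pair-counting agreement (the Rand index) between two clusterings of
    the same items, each given as a list of cluster labels: the fraction
    of item pairs that both clusterings treat the same way (grouped
    together in both, or separated in both). Whether the study used this
    exact measure is an assumption; it is one standard choice."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
               for i, j in pairs)
    return same / len(pairs)
```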
In image and video analysis, distance maps are frequently used. They provide the (Euclidean) distance (ED) of background pixels to the nearest object pixel. Recently, the Fast Exact Euclidean Distance (FEED) transformation was launched. In this paper, we present the three-dimensional (3D) version of FEED. 3D-FEED is compared with four other methods for a wide range of 3D test images. 3D-FEED proved to be twice as fast as the fastest algorithm available. Moreover, it provides true exact EDs, where the other algorithms only approximate the ED. This unique algorithm makes the difference especially where time and precision are of importance.
The prototype of an online Multimedia for Art ReTrieval (M4ART) system is introduced, which provides access to the digitized collection of the National Gallery of the Netherlands (the Rijksmuseum). The current online system of the Rijksmuseum is text-based and requires expert knowledge concerning the work searched for; otherwise, it fails to retrieve it. M4ART extends this system with querying by an example image, which can be uploaded to the system or selected by browsing the collection. The global color distribution and (optionally) a set of texture features of the example image are extracted and compared with those of the images in the collection. Hence, the collection can be queried based on either text or content-based features. Moreover, the matching process of M4ART can be inspected. With the latter feature, M4ART not only integrates the means to inspect collections by both experts and laypersons in one system but also provides the means to let the user understand its working. These characteristics make M4ART a unique system to access, enhance, and retrieve the knowledge available in digitized art collections.
In an attempt to mimic human (colorful) texture classification by a clustering algorithm, three lines of research were pursued, using a test set of 180 texture images (both color and gray-scale versions) drawn from the OuTex and VisTex databases. First, a k-means algorithm was applied with three feature vectors, based on color/gray values, four texture features, and their combination. Second, 18 participants clustered the images using a newly developed card sorting program. The mutual agreement between the participants was 57% and 56%, and between the algorithm and the participants it was 47% and 45%, for respectively color and gray-scale texture images. Third, in a benchmark, 30 participants judged the algorithm's clusters of gray-scale textures as more homogeneous than those of colored textures. However, a high interpersonal variability was present for both the color and the gray-scale clusters. So, despite the promising results, it is questionable whether average human texture classification can be mimicked (if it exists at all).
In image and video analysis, distance maps are frequently used. They provide the (Euclidean) distance (ED) of background pixels to the nearest object pixel. In a naive implementation, each object pixel feeds its (exact) ED to each background pixel; the minimum of these values then denotes the ED to the closest object. Recently, the Fast Exact Euclidean Distance (FEED) transformation was launched, which was up to 2x faster than the fastest algorithms available. In this paper, additional improvements to the original FEED
algorithm are first discussed. Next, a timed version of FEED (tFEED) is presented, which generates distance maps for video sequences by merging partial maps. For each object in a video, a partial map can be calculated for different frames, where the partial map for fixed objects is calculated only once. In a newly developed, dynamic test environment for robot navigation purposes, tFEED proved to be up to 7x faster than applying FEED to each frame separately. It is up to 4x faster than the fastest ED algorithm available for video sequences and even 40% faster than generating city-block or chamfer distance maps for frames. Hence, tFEED is the first real-time algorithm for generating exact ED maps of video sequences.
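The merging step at the heart of tFEED can be illustrated in a few lines: given equal-shape partial distance maps, one per object, the combined map is the per-pixel minimum, which is why the expensive map for fixed objects can be computed once and re-used for every frame. A minimal sketch:

```python
def merge_partial_maps(partial_maps):
    """Combine per-object partial distance maps (equal-shape 2D lists)
    into one distance map by taking the per-pixel minimum. The partial
    map for all fixed objects can be computed once and re-used across
    frames; only moving objects need a fresh partial map per frame."""
    return [[min(vals) for vals in zip(*rows)] for rows in zip(*partial_maps)]
```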
We present the concept of intelligent Content-Based Image Retrieval (iCBIR), which incorporates knowledge concerning human cognition in system development. The present research focuses on the utilization of color categories (or focal colors) for CBIR purposes, in particular considered useful for query-by-heart purposes.
However, this research explores their potential use for query-by-example purposes. Their use was validated for the field of CBIR by two experiments (26 subjects; stimuli: 4 times the 216 W3C web-safe colors) and one question ("mention ten colors"). Based on the experimental results, a Color LookUp Table (CLUT) was defined. This
CLUT was used to segment the HSI color space into the 11 color categories. With that, a new color quantization method was introduced, making an 11-bin color histogram configuration possible. This was compared with three other histogram configurations of 64, 166, and 4096 bins. Combined with the intersection and the quadratic
distance measures, we defined seven color matching systems. An experimentally founded benchmark for CBIR systems was implemented (1680 queries were performed, measuring relevance and satisfaction). The 11-bin histogram configuration had average performance: a promising result, since it was a naive implementation
and is still a topic of development.
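The intersection measure mentioned above (Swain and Ballard's histogram intersection) can be sketched in a few lines. The 3-bin histograms in the test are purely illustrative; the described system compares 11-bin color-category histograms, and the normalization chosen here is one common variant:

```python
def histogram_intersection(query_hist, db_hist):
    """Histogram intersection similarity (after Swain & Ballard): sum the
    per-bin minima and normalise by the database histogram's total mass,
    giving a value in [0, 1]; 1.0 means the query histogram covers the
    database histogram completely."""
    return sum(min(q, d) for q, d in zip(query_hist, db_hist)) / sum(db_hist)
```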