In this paper, we explored the use of low fidelity Synthetic Environments (SE; i.e., a combination of simulation
techniques) for product design. We explored the usefulness of low fidelity SE to make design problems explicit. In
particular, we were interested in the influence of interactivity on user experience. For this purpose, an industrial design
case was taken: the innovation of an airplane galley. A virtual airplane was created in which an interactive model of the
galley was placed. First, three groups of participants explored the SE in different conditions: Participants explored the SE
interactively (Interactive condition), watched a recording (Passive Dynamic condition), or watched static images
(Passive Static condition). Afterwards, participants were tested with a questionnaire on how accurately they had
memorized the spatial layout of the SE. The results revealed that an interactive SE does not necessarily lead
participants to memorize spatial layouts more accurately. However, the effect of interactive learning depends on the
participants' Visual Spatial Ability (VSA). Consequently, this finding supports the use of interactive exploration of
prototypes through low fidelity SE in the product design cycle, provided the individual's characteristics are taken
into account.
In this research, we are considering the use of the inverse perspective transformation in video surveillance
applications that observe (and possibly influence) scenes consisting of moving and stationary objects; e.g., people
in a parking area. In previous research, objects were detected in video streams and identified as moving
or stationary. Subsequently, distance maps were generated by the Fast Exact Euclidean Distance (FEED)
transformation, which uses frame-to-frame information to generate distance maps for video frames in a fast
manner. From the resulting distance maps, different kinds of surveillance parameters can be derived. The
camera was placed above the scene, and hence, no inverse perspective transformation was needed. In this
work, the case is considered in which the camera is placed at an arbitrary angle at the side of the scene,
which might be a more feasible placement than at the top. It will be shown that an image taken by a camera
at the side can be converted easily and quickly into the image that would be taken by a camera at the top. This
allows the use of the previously developed methods after converting each frame of a video stream, or only the
objects of interest detected in it.
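The conversion itself is not detailed in the abstract; as an illustrative sketch (function names and point values are ours, not the paper's), a side view of a planar scene can be mapped to a top view with a 3x3 planar homography estimated from four known ground-plane correspondences:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 planar homography H with dst ~ H @ src, via the
    direct linear transform (DLT).

    src, dst: (4, 2) arrays of corresponding side-view / top-view points.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A (last right-singular vector)
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, x, y):
    """Map one side-view pixel to top-view (ground-plane) coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

With exactly four non-degenerate correspondences the fit is exact; in practice one would use more points and a least-squares fit, and warp whole frames (or only detected objects) rather than single points.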
A distance transformation (DT) takes a binary image as input and generates a distance map image in which the
value of each pixel is its distance to a given set of object pixels in the binary image. In this research, DTs for
multi-class data (MCDTs) are developed, which generate both a distance map and a class map containing, for each
pixel the class of the closest object. Results indicate that the MCDT based on the Fast Exact Euclidean Distance
(FEED) method is a factor 2 to 4 faster than MCDTs based on exact or semi-exact Euclidean distance (ED)
transformations, and is only a factor 2 to 4 slower than the MCDT based on the crude city-block approximation
of the ED. In the second part of this research, the MCDTs were adapted such that they could be used for the
fast generation of distance and class maps for video sequences. The frames of the sequences contain a number of
fixed objects and a moving object, where each object has a separate label. Results show that the FEED based
version is a factor 2 to 3.5 faster than the fastest of all the other video-MCDTs which is based on the chamfer 3,4
distance measure. FEED is even a factor 3.5 to 10 faster than another fast exact ED transformation. With video
multi-class FEED, it will be possible to measure distances from a moving object to various identified stationary
objects at nearly the frame rate of a webcam. This will be very useful when the risk exists that objects move
outside surveillance limits.
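As a point of reference for what an MCDT computes — not the FEED algorithm itself, which avoids exactly this brute force — a naive sketch returning both a distance map and a class map could look like this (function name and layout are ours):

```python
import numpy as np

def multiclass_distance_map(labels):
    """Brute-force multi-class Euclidean distance transform.

    labels: 2-D int array, 0 = background, >0 = object class label.
    Returns (dist, cls): for every pixel, the Euclidean distance to the
    nearest object pixel and the class label of that nearest pixel.
    """
    h, w = labels.shape
    ys, xs = np.nonzero(labels)            # coordinates of all object pixels
    gy, gx = np.mgrid[0:h, 0:w]
    # squared distance from every pixel to every object pixel: (h, w, n)
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    nearest = d2.argmin(axis=-1)           # index of the closest object pixel
    dist = np.sqrt(d2.min(axis=-1))
    cls = labels[ys[nearest], xs[nearest]]
    return dist, cls
```

This is O(pixels x object pixels) and only meant to make the two outputs concrete; the speed comparisons in the abstract are between far more efficient algorithms.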
A breakthrough is needed in order to achieve substantial progress in the field of Content-Based Image Retrieval
(CBIR). This breakthrough can be enforced by: 1) optimizing user-system interaction, 2) combining the
wealth of techniques from text-based Information Retrieval with CBIR techniques, 3) exploiting human cognitive
characteristics, especially human color processing, and 4) conducting benchmarks with users for evaluating
new CBIR techniques. In this paper, these guidelines are illustrated by findings from our research conducted
over the last five years, which has led to the development of the online Multimedia for Art ReTrieval (M4ART)
system: http://www.m4art.org. The M4ART system follows the guidelines on all four issues and is assessed
on benchmarks using 5730 queries on a database of 30,000 images. Therefore, M4ART can be considered a
first step into a new era of CBIR.
A new application for VR has emerged: product development, in which several stakeholders (from engineers to end
users) use the same VR for development and communication purposes. Various characteristics of these stakeholders
vary considerably, which imposes potential constraints on the VR. The current paper discusses the influence of three
types of exploration of objects (i.e., none, passive, active) on one of these characteristics: the ability to form mental
representations or visuo-spatial ability (VSA). Through an experiment we found that all users benefit from exploring
objects. Moreover, people with low VSA (e.g., end users) benefit from an interactive exploration of objects, as
opposed to people with a medium or high VSA (e.g., engineers), who are not sensitive to the type of exploration.
Hence, for VR
environments in which multiple stakeholders participate (e.g. for product development), differences among their
cognitive abilities (e.g., VSA) have to be taken into account to enable an efficient usage of VR.
Human vigilance is limited; hence, automatic motion and distance detection is one of the central issues in video surveillance. Many aspects are of importance here; this paper specifically addresses efficiency (achieving real-time performance), accuracy, and robustness against various noise factors. To obtain fully controlled test environments, an artificial development center for robot navigation is introduced in which several parameters can be set (e.g., the number of objects, trajectories, and the type and amount of noise). In the videos, for each subsequent frame, movement of stationary objects is detected and pixels of moving objects are located, from which moving objects are identified in a robust way. An Exact Euclidean Distance Map (E²DM) is utilized to determine accurately the distances between moving and stationary objects. Together with the determined distances between moving objects and the detected movement of stationary objects, this provides the input for detecting unwanted situations in the scene. Further, each intelligent object (e.g., a robot) is provided with its E²DM, allowing the object to plan its course of action. Timing results are specified for each program block of the processing chain for 20 different setups. Hence, the current paper presents extensive, experimentally controlled research on real-time, accurate, and robust motion detection for video surveillance, using E²DMs, which makes it a unique approach.
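As a toy illustration of the surveillance check described above (not the paper's E²DM-based implementation, which precomputes the distance map once per frame), the minimum clearance between a moving object and the stationary objects can be computed directly from their masks:

```python
import numpy as np

def min_clearance(stationary, moving):
    """Smallest Euclidean distance between any pixel of a moving object
    and any stationary-object pixel (both given as boolean masks)."""
    sp = np.argwhere(stationary).astype(float)
    mp = np.argwhere(moving).astype(float)
    # all pairwise distances between moving and stationary pixels
    d = np.sqrt(((mp[:, None, :] - sp[None, :, :]) ** 2).sum(-1))
    return d.min()

def violates_limit(stationary, moving, limit):
    """Flag an unwanted situation: the object is closer than `limit`."""
    return min_clearance(stationary, moving) < limit
```

In a distance-map formulation, the same clearance is simply the minimum of the stationary-object distance map over the moving object's pixels, which is what makes the E²DM approach fast per frame.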
Various texture analysis algorithms have been developed over the last decades. However, no computational model has
arisen that mimics human texture perception adequately. In 2000, Payne, Hepplewhite, and Stoneham and in
2005, Van Rikxoort, Van den Broek, and Schouten achieved mappings between humans and artificial classifiers
of around 29% and 50%, respectively. In the current research, the work of Van Rikxoort et al. was replicated,
using the newly developed, online card-sorting experimentation platform M-HinTS: http://eidetic.ai.ru.nl/M-HinTS/.
In two separate experiments, color and gray-scale versions of 180 textures, drawn from the OuTex
and VisTex texture databases, were clustered by 34 subjects. The mutual agreement among these subjects was
51% and 52% for the experiments with color and gray-scale textures, respectively. The average agreement
between the k-means algorithm and the participants was 36%, where k-means approximated some participants
up to 60%. Since last year's results were not replicated, an additional data analysis was developed, which uses
the semantic labels available in the database. This analysis shows that semantics play an important role in
human texture clustering and once more illustrates the complexity of texture recognition. The current findings,
the introduction of M-HinTS, and the set of analyses discussed are the start of a next phase in unraveling human
texture perception.
In image and video analysis, distance maps are frequently used. They provide the (Euclidean) distance (ED) of background pixels to the nearest object pixel. Recently, the Fast Exact Euclidean Distance (FEED) transformation was launched. In this paper, we present the three-dimensional (3D) version of FEED. 3D-FEED is compared with four other methods for a wide range of 3D test images. 3D-FEED proved to be twice as fast as the fastest algorithm available. Moreover, it provides true exact EDs, where other algorithms only approximate the ED. This makes the algorithm unique, especially where both time and precision are of importance.
The prototype of an online Multimedia for Art ReTrieval (M4ART) system is introduced, which provides access to the digitized collection of the National Gallery of the Netherlands (the Rijksmuseum). The current online system of the Rijksmuseum is text-based and requires expert knowledge concerning the work searched for; without it, retrieval fails. M4ART extends this system with querying by an example image, which can be uploaded to the system or selected by browsing the collection. The global color distribution and (optionally) a set of texture features of the example image are extracted and compared with those of the images in the collection. Hence, the collection can be queried based on either text or content-based features. Moreover, the matching process of M4ART can be inspected. With the latter feature, M4ART not only integrates the means for both experts and laypersons to inspect collections in one system, but also lets the user understand how it works. These characteristics make M4ART a unique system to access, enhance, and retrieve the knowledge available in digitized art collections.
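M4ART's actual feature extraction is only named above; a minimal sketch of query-by-example with a global color histogram and histogram intersection as an assumed similarity measure might look like this (all names and parameters are illustrative):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Global color distribution: a joint RGB histogram, L1-normalised.

    img: (H, W, 3) uint8 array.
    """
    q = (img.astype(int) // (256 // bins)).reshape(-1, 3)  # quantise channels
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    h = np.bincount(idx, minlength=bins ** 3).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical distributions."""
    return float(np.minimum(h1, h2).sum())

def rank_collection(query_img, collection):
    """Return collection indices sorted by similarity to the query image."""
    q = color_histogram(query_img)
    sims = [histogram_intersection(q, color_histogram(im)) for im in collection]
    return sorted(range(len(collection)), key=lambda i: -sims[i])
```

In a real system the collection histograms would of course be precomputed and indexed rather than extracted per query.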
In an attempt to mimic human (color) texture classification by a clustering algorithm, three lines of research were pursued, using as test set 180 texture images (both their color and gray-scale equivalents) drawn from the OuTex and VisTex databases. First, a k-means algorithm was applied with three feature vectors, based on color/gray values, four texture features, and their combination. Second, 18 participants clustered the images using a newly developed card-sorting program. The mutual agreement between the participants was 57% and 56%, and between the algorithm and the participants it was 47% and 45%, for color and gray-scale texture images, respectively. Third, in a benchmark, 30 participants judged the algorithm's clusters of gray-scale textures as more homogeneous than those of colored textures. However, a high interpersonal variability was present for both the color and the gray-scale clusters. So, despite the promising results, it is questionable whether average human texture classification can be mimicked (if it exists at all).
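The abstracts do not define how the agreement percentages were computed; a common stand-in for agreement between two clusterings is the Rand index, the fraction of item pairs on which both clusterings make the same same-cluster/different-cluster decision:

```python
from itertools import combinations

def pairwise_agreement(clustering_a, clustering_b):
    """Rand index between two clusterings of the same items.

    Inputs: one cluster label per item. Returns the fraction of item
    pairs that both clusterings either place together or keep apart."""
    pairs = list(combinations(range(len(clustering_a)), 2))
    same_a = [clustering_a[i] == clustering_a[j] for i, j in pairs]
    same_b = [clustering_b[i] == clustering_b[j] for i, j in pairs]
    agree = sum(a == b for a, b in zip(same_a, same_b))
    return agree / len(pairs)
```

Note this is only one plausible measure; the papers' percentages may be based on a different (e.g., best-matching-cluster) criterion.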
In image and video analysis, distance maps are frequently used. They provide the (Euclidean) distance (ED) of background pixels to the nearest object pixel. In a naive implementation, each object pixel feeds its (exact) ED to each background pixel; the minimum of these values then denotes the ED to the closest object. Recently, the Fast Exact Euclidean Distance (FEED) transformation was launched, which was up to 2x faster than the fastest algorithms available. In this paper, first, additional improvements to the original FEED algorithm are discussed. Next, a timed version of FEED (tFEED) is presented, which generates distance maps for video sequences by merging partial maps. For each object in a video, a partial map can be calculated for different frames, where the partial map for fixed objects is calculated only once. In a newly developed, dynamic test environment for robot navigation purposes, tFEED proved to be up to 7x faster than applying FEED to each frame separately. It is up to 4x faster than the fastest ED algorithm available for video sequences, and even 40% faster than generating city-block or chamfer distance maps for the frames. Hence, tFEED is the first real-time algorithm for generating exact ED maps of video sequences.
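The naive scheme and the partial-map merging can be sketched as follows (a didactic illustration, not the FEED/tFEED implementation):

```python
import numpy as np

def naive_edt(objects):
    """Naive exact Euclidean distance map: every object pixel 'feeds'
    its distance to every pixel; keep the minimum per pixel.

    objects: boolean mask, True = object pixel."""
    h, w = objects.shape
    dist = np.full((h, w), np.inf)
    gy, gx = np.mgrid[0:h, 0:w]
    for y, x in np.argwhere(objects):
        dist = np.minimum(dist, np.hypot(gy - y, gx - x))
    return dist

def merge_partial_maps(partial_maps):
    """Frame map from per-object partial maps: pixel-wise minimum.
    The partial map of a fixed object needs to be computed only once
    per sequence, which is the source of tFEED's per-frame savings."""
    return np.minimum.reduce(partial_maps)
```

Merging works because the distance to the nearest of several object sets is the minimum over the per-set distances.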
Neural networks have been successfully used to classify pixels in remotely sensed images. Especially backpropagation neural networks have been used for this purpose. As is the case with all classification methods, the obtained classification accuracy depends on the amount of spectral overlap between classes. In this paper, we study the new idea of using hierarchical neural networks to improve the classification accuracy. The basic idea is to use a first-level network to classify the easy pixels and then use one or more second-level networks for the more difficult pixels. First, a rather standard backpropagation neural network is trained using the training pixels of a ground truth set. Two ideas for selecting the difficult pixels are tested. The first is to take those pixels for which the value of the winning neuron is below a threshold value. The second is to select pixels from output classes which get a high contribution from wrong input classes. Both ideas improve the percentage of correctly classified pixels and the average percentage of correctly classified pixels per class.
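The first selection idea can be sketched independently of any concrete network: given per-pixel output activations, pixels whose winning neuron stays below a threshold are routed to a second-level network (names and values below are illustrative):

```python
import numpy as np

def route_pixels(activations, threshold):
    """Split pixels into 'easy' and 'difficult' by the winning-neuron
    criterion: a pixel is difficult when its maximum output activation
    falls below `threshold`.

    activations: (n_pixels, n_classes) first-level network outputs.
    Returns (easy_idx, easy_labels, hard_idx)."""
    winning = activations.max(axis=1)
    easy = winning >= threshold
    easy_idx = np.nonzero(easy)[0]
    hard_idx = np.nonzero(~easy)[0]
    easy_labels = activations[easy_idx].argmax(axis=1)
    return easy_idx, easy_labels, hard_idx
```

The hard pixels would then be classified by one or more specialized second-level networks.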
Imaging spectrometers acquire images in many narrow spectral bands. Because of the limited spatial resolution, the measured spectrum of a pixel is often a composition of a number of basic spectra. The purpose of fuzzy classification is to determine the presence and abundance of the basic spectra in a measured spectrum. Previous work demonstrated that a neural network could perform fuzzy classification. In this paper, we study a more realistic situation of 10 basic spectra using 12-band airborne data and of 16 basic spectra using 6-band LANDSAT data. Available for this study were images and sets of pixels that had been classified by inspection on the ground. For the LANDSAT case, the set was neither very pure nor very large. A method to purify and to expand data sets using image processing methods was therefore developed. Mixed-pixel training and testing sets were generated from each original and generated set using a linear mixture model, where a mixed pixel could have a contribution from up to three classes. For each of the training sets, a 1-hidden-layer backpropagation neural network was trained to do the fuzzy classification. Testing the networks showed that they performed up to 20% better than the developed AnaML method, which is a combination of two classical methods.
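The linear mixture model used to generate the mixed-pixel sets can be sketched as follows (our own function name; the Gaussian noise term is an assumption, not stated in the abstract):

```python
import numpy as np

def mix_pixel(endmembers, fractions, noise_sd=0.0, rng=None):
    """Linear mixture model: a mixed pixel's spectrum is the
    fraction-weighted sum of the basic (endmember) spectra.

    endmembers: (n_classes, n_bands) basic spectra.
    fractions:  (n_classes,) abundances, summing to one; for the sets
                in the text at most three entries would be non-zero."""
    fractions = np.asarray(fractions, float)
    assert abs(fractions.sum() - 1.0) < 1e-9, "fractions must sum to one"
    spectrum = fractions @ np.asarray(endmembers, float)
    if noise_sd > 0.0:
        if rng is None:
            rng = np.random.default_rng(0)
        spectrum = spectrum + rng.normal(0.0, noise_sd, spectrum.shape)
    return spectrum
```

A training set is then a collection of (spectrum, fractions) pairs, with the fractions serving as the fuzzy target outputs.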
Imaging spectrometers acquire images in many narrow spectral bands but have limited spatial resolution. Spectral mixture analysis (SMA) is used to determine the fractions of the ground cover categories (the endmembers) present in each pixel. In this paper, a new iterative SMA method is presented and tested using a 30-band MAIS image. The time needed for each iteration is independent of the number of bands; thus the method can be used for spectrometers with a large number of bands. Further, a new method, based on k-means clustering, for obtaining endmembers from image data is described and compared with existing methods. Using the developed methods, the available MAIS image was analyzed using 2 to 6 endmembers.
Within the instantaneous field of view of a scanning device, often more than one object is included, resulting in a pixel in which several characteristics are mixed. Classically, the proportions of the components of such a mixed pixel are estimated using a linear mixture model. In this paper, a new method is introduced for estimating the characteristics of these components, from which their proportions can be derived. Experiments with simulated data sets are conducted to compare the methods regarding their accuracy in estimating the proportions. In addition, it is determined how well the proposed method can estimate the characteristics of each component.
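For comparison, the classical linear-mixture estimate can be sketched as a least-squares fit of the proportions; the non-negativity and sum-to-one constraints used in practice are left out of this sketch:

```python
import numpy as np

def unmix(endmembers, spectrum):
    """Classical linear unmixing: least-squares estimate of the
    proportions p in  spectrum ~ p @ endmembers.

    endmembers: (n_classes, n_bands) component spectra.
    spectrum:   (n_bands,) measured mixed pixel."""
    E = np.asarray(endmembers, float)
    # solve E.T @ p = spectrum in the least-squares sense
    p, *_ = np.linalg.lstsq(E.T, np.asarray(spectrum, float), rcond=None)
    return p
```

With noise-free data and linearly independent component spectra, the true proportions are recovered exactly; with real data, constrained solvers are preferred.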
To provide a quantitative measure of the quality of a segmentation of an image, a 'true' segmentation must be known and the differences between the two segmentations must be transformed into one or more quality values. A method is described to generate a realistic satellite image and its true segmentation to sub-pixel level, using ground truth data and a real image. Quality measures are described which evaluate two kinds of errors: the splitting of a real field into more than one segment, and the merging of pixels from different fields into a single segment. Results for various segmentation methods are discussed.
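A minimal version of the two error counts (our own formulation; the paper's quality measures are likely more refined) could be:

```python
import numpy as np

def split_merge_errors(true_seg, test_seg):
    """Count the two error kinds from the text: true fields split over
    several segments, and segments merging pixels of several fields.

    true_seg, test_seg: integer label images of the same shape."""
    # set of (field, segment) label pairs that co-occur at some pixel
    pairs = set(zip(true_seg.ravel().tolist(), test_seg.ravel().tolist()))
    splits = sum(1 for f in set(true_seg.ravel().tolist())
                 if sum(1 for a, b in pairs if a == f) > 1)
    merges = sum(1 for s in set(test_seg.ravel().tolist())
                 if sum(1 for a, b in pairs if b == s) > 1)
    return splits, merges
```

A fuller measure would weight each error by the number (or area) of misassigned pixels rather than counting fields and segments.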
A hybrid segmentation method has been developed that integrates two segmentation methods, edge detection and region growing, in order to overcome the weaknesses of either method. The segmentation method involves the following steps: (i) filtering, (ii) edge detection and following, (iii) edge fragment linking, and (iv) region growing. In (ii), edge detection is carried out; the resulting edge magnitude values are thresholded, and a thinning operation is performed on the thresholded values in order to create one-pixel-thick edges. In (iii), the resulting edge fragments are linked together where possible by detecting one-pixel-wide gaps between edge fragments. By connecting the edge fragments, closed polygons are formed, dividing the image into a set of sub-images. Edge fragments not belonging to a closed polygon are pruned. In (iv), region growing is carried out within every polygon; regions are not allowed to grow outside the polygons. The region growing method used is best merge, which, per merging scan over the image, merges the pair with the lowest cost value. For merging remaining isolated pixels, context rules are defined. Results of the segmentation method are shown for the classification of a non-segmented Landsat-TM scene and its segmented counterpart by an artificial neural network. Moreover, the use of the segmentation for filtering SAR imagery is indicated.
Segmentation methods for images often have cost functions which evaluate the (dis)similarity between pixels or segments. Thresholds on cost values are then used to decide whether or not to grow, join, or split segments. The results for a given image depend critically on the selection of the threshold values. In remote sensing, a threshold that is too low will split up regions of constant ground cover, and a threshold that is too high will join adjacent regions of different ground cover. Optimal thresholds can be determined using different classes of methods: generating cost value distributions from the original image; obtaining statistical distributions from segmented images; or comparing a 'true' segmentation with the results of segmentation over a range of thresholds. Such a 'true' segmentation can be derived from human expert segmentations, or from maps obtained by ground surveys or by segmentation of higher-resolution images. Artificial images can also be generated, which has the advantage that the segmentation is known to sub-pixel level. Several methods for threshold determination are described for a hybrid segmentation method developed by us. Measures are described for the comparison of two segmentations. Results are evaluated using several (parts of) LANDSAT images and artificially generated images.