One of the major challenges in human behavior modeling for military applications is dealing with all the factors that can
influence behavior and performance. In a military context, behavior and performance are influenced by the task at hand
and by the internal (cognitive and physiological) and external (climate, terrain, threat, equipment, etc.) state. Modeling the
behavioral effects of all these factors in a centralized manner would lead to a complex rule-base that is difficult to
maintain or expand. To better cope with this complexity we have developed the Capability-based Human-performance
Architecture for Operational Simulation (CHAOS). CHAOS is a multi-agent system for human behavior modeling that is
based on pandemonium theory. Every agent in CHAOS represents a specific part of behavior, such as 'reaction to threat'
or 'performing a patrol task'. These agents compete for a limited set of resources that represent human
capabilities. By combining the element of competition with multiple limited resources, CHAOS allows us to model
stress, strain and multi-tasking in an intuitive manner. The CHAOS architecture is currently used in firefighter and
dismounted soldier simulations and has shown itself to be suitable for human behavior and performance modeling.
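The competition between behavior agents for limited capability resources can be illustrated with a minimal sketch. It is not the CHAOS implementation: the `Agent` fields, the capability names, the integer capacity units, and the greedy "most urgent agent first" arbitration rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """One behavior agent, e.g. 'reaction to threat' (hypothetical structure)."""
    name: str
    urgency: float  # how strongly the agent claims resources (assumed measure)
    needs: dict     # capability name -> units requested


def allocate(agents, capacity):
    """Grant capabilities to agents in order of decreasing urgency.

    An agent becomes active only if every one of its requests still fits in
    the remaining capacity; otherwise it is suppressed for this cycle.
    """
    remaining = dict(capacity)
    active, suppressed = [], []
    for agent in sorted(agents, key=lambda a: a.urgency, reverse=True):
        fits = all(remaining.get(cap, 0) >= units
                   for cap, units in agent.needs.items())
        if fits:
            for cap, units in agent.needs.items():
                remaining[cap] -= units
            active.append(agent.name)
        else:
            suppressed.append(agent.name)
    return active, suppressed, remaining


# Hypothetical capabilities, each with 100 units available.
capacity = {"vision": 100, "cognition": 100, "legs": 100}
agents = [
    Agent("patrol", urgency=0.4,
          needs={"vision": 50, "cognition": 30, "legs": 100}),
    Agent("react_to_threat", urgency=0.9,
          needs={"vision": 80, "cognition": 60}),
    Agent("radio_report", urgency=0.5, needs={"cognition": 40}),
]

active, suppressed, remaining = allocate(agents, capacity)
print("active:", active)
print("suppressed:", suppressed)
print("remaining:", remaining)
```

In this toy run the threat reaction and the radio report together exhaust the cognition budget, so the patrol task is suppressed. This is one simple way in which multi-tasking limits can emerge from competition over several limited resources.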
The choice of a colour space is of great importance for many computer vision algorithms (e.g. edge detection and object recognition). It induces the equivalence classes on which the actual algorithms operate. Since there are many colour spaces available, the problem is how to automatically select the weights with which the colour spaces are combined in order to produce the best result for a particular task. In this paper we propose a method to learn these weights, while exploiting the imperfect correlation between features computed in different colour spaces through the principle of diversification. As a result an optimal trade-off is achieved between repeatability and distinctiveness. The resulting weighting scheme will ensure maximal feature discrimination.
The method is experimentally verified for three feature detection tasks: skin colour detection, edge detection and corner detection. In all three tasks the method achieved an optimal trade-off between (colour) invariance (repeatability) and discriminative power (distinctiveness).
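The diversification principle can be illustrated for the simplest case of two colour spaces. The sketch below uses the textbook minimum-variance combination of two imperfectly correlated quantities. This is an assumption about what "diversification" could look like here: the abstract does not state the paper's actual objective, and the function names and example values are hypothetical.

```python
def min_variance_weights(sigma_a, sigma_b, rho):
    """Weights (w_a, w_b) with w_a + w_b = 1 that minimise the variance of
    w_a * A + w_b * B, where features A and B have standard deviations
    sigma_a and sigma_b and correlation coefficient rho.

    The weights are not constrained to be non-negative.
    """
    cov = rho * sigma_a * sigma_b
    denom = sigma_a ** 2 + sigma_b ** 2 - 2.0 * cov
    if denom == 0.0:
        raise ValueError("features are perfectly correlated with equal variance")
    w_a = (sigma_b ** 2 - cov) / denom
    return w_a, 1.0 - w_a


def combined_variance(w_a, w_b, sigma_a, sigma_b, rho):
    """Variance of the weighted sum w_a * A + w_b * B."""
    return ((w_a * sigma_a) ** 2 + (w_b * sigma_b) ** 2
            + 2.0 * w_a * w_b * rho * sigma_a * sigma_b)


# Hypothetical noise levels of the same feature in two colour spaces.
sigma_a, sigma_b, rho = 1.0, 2.0, 0.0
w_a, w_b = min_variance_weights(sigma_a, sigma_b, rho)
print("weights:", w_a, w_b)
print("combined variance:", combined_variance(w_a, w_b, sigma_a, sigma_b, rho))
```

With uncorrelated features the combined variance falls below that of the better single feature. That benefit of imperfect correlation is what a weighting scheme of this kind exploits.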
Many current video analysis systems fail to fully acknowledge the process that resulted in the acquisition of the video data, i.e. they do not consider the complete multimedia system that encompasses the several physical processes that lead to the captured video data. This multimedia system includes the physical process that created the appearance of the captured objects, the capturing of the data by the sensor (camera), and a model of the domain the video data belongs to. By modelling this complete multimedia system, a much more robust and theoretically sound approach to video analysis can be taken. In this paper we describe such a system for the detection, recognition and tracking of objects in videos. We introduce an extension of the mean shift tracking process, based on a detailed model of the video capturing process. This system is used for two applications in the soccer video domain: billboard recognition and tracking, and player tracking.
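For readers unfamiliar with mean shift, the basic iteration that the tracker builds on can be sketched as follows. This is the standard Gaussian-kernel mean shift on weighted 2D points, not the extension proposed in the paper. In a tracker the weights would typically come from a per-pixel target likelihood; here they are hypothetical constants.

```python
import math


def mean_shift(points, weights, start, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` towards a local mode of the weighted point density.

    Each iteration replaces the current position by the kernel-weighted
    mean of all points, using an isotropic Gaussian kernel.
    """
    x, y = start
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for (px, py), w in zip(points, weights):
            d2 = (px - x) ** 2 + (py - y) ** 2
            k = w * math.exp(-0.5 * d2 / bandwidth ** 2)
            sw += k
            sx += k * px
            sy += k * py
        if sw == 0.0:
            break  # no support under the kernel; keep the current estimate
        nx, ny = sx / sw, sy / sw
        shift = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if shift < tol:
            break
    return x, y


# Hypothetical target pixels arranged symmetrically around (5, 5).
points = [(4.0, 5.0), (6.0, 5.0), (5.0, 4.0), (5.0, 6.0)]
weights = [1.0, 1.0, 1.0, 1.0]
mode = mean_shift(points, weights, start=(4.5, 5.2), bandwidth=2.0)
print("estimated mode:", mode)
```

In tracking, the position found in one frame serves as the starting point for the next frame, so the window follows the object as long as the inter-frame motion stays within the kernel's reach.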
In this paper we present a system for the localisation and tracking of billboards in streamed soccer matches. The application area for this research is the delivery of customised content to end users. When international soccer matches are broadcast, the audience is very diverse, and advertisers would like to adapt the billboards to the different audiences. This can be achieved by replacing the billboards in the video stream. To make the system more robust, photometric invariant features are used; these colour features are less susceptible to changes in illumination. Sensor noise is dealt with through variable kernel density estimation.
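Variable kernel density estimation can be sketched in one dimension as follows. This is the generic sample-point estimator with a Gaussian kernel. Giving each sample its own bandwidth as a stand-in for a per-measurement noise estimate is an assumption, and the sample values are hypothetical.

```python
import math


def variable_kde(x, samples, bandwidths):
    """Sample-point variable-bandwidth Gaussian density estimate at x.

    Each sample x_i contributes a Gaussian with its own standard deviation
    h_i, so noisy measurements are spread out more than reliable ones.
    """
    if not samples or len(samples) != len(bandwidths):
        raise ValueError("need one positive bandwidth per sample")
    total = 0.0
    for xi, hi in zip(samples, bandwidths):
        u = (x - xi) / hi
        total += math.exp(-0.5 * u * u) / (hi * math.sqrt(2.0 * math.pi))
    return total / len(samples)


# Hypothetical colour-feature values: two reliable samples and one noisy one.
samples = [0.20, 0.25, 0.80]
bandwidths = [0.02, 0.02, 0.10]
for x in (0.2, 0.5, 0.8):
    print(x, variable_kde(x, samples, bandwidths))
```

Compared with a fixed bandwidth, the uncertain sample at 0.80 produces a low, broad bump instead of a sharp peak, which reduces the influence of unreliable measurements on the estimated distribution.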
Static multimedia on the Web can already hardly be structured manually. Although unavoidable and necessary, manual annotation of dynamic multimedia becomes even less feasible when multimedia quickly changes in complexity, i.e. in volume, modality, and usage context. The usage context could be set by learning or other purposes of the multimedia material. These multimedia dynamics call for categorisation systems that index, query and retrieve multimedia objects on the fly in a similar way as a human expert would. We present and demonstrate such a supervised dynamic multimedia object categorisation system. Our categorisation system comes about by continuously gauging it to a group of human experts who annotate raw multimedia for a certain domain ontology given a usage context. Thus, effectively, our system learns the categorisation behaviour of human experts. By inducing supervised multi-modal content and context-dependent potentials, our categorisation system associates field strengths of raw dynamic multimedia object categorisations with those human experts would assign. After a sufficiently long period of supervised machine learning we arrive at automated, robust and discriminative multimedia categorisation. We demonstrate the usefulness and effectiveness of our multimedia categorisation system in retrieving semantically meaningful soccer-video fragments, in particular by taking advantage of multimodal and domain-specific information and knowledge supplied by human experts.
We propose a multi-scale and multi-modal analysis and processing scheme for audio-video data. Using a non-linear scale-space technique, audio-video is analyzed and processed such that it is invariant under various imaging and hearing conditions. Degradations due to Lyapunov and structural instabilities are suppressed by this scale-space technique without destroying essential semantic relations. On the basis of an audio-video segmentation, its arrangements are quantified in terms of spatio-temporal inclusion relations and dynamic ordering relations by means of scaling connectivity relations. These relations impose a topological structure on top of the audio-video scale-space, inducing unimodal and multi-modal semantics. Our scheme is illustrated separately for video, audio and audio-video material, the latter pointing out the added value of integrating audio and video.
In this paper, we study computational models and techniques to combine textual and image features for the classification of images on the Internet. A framework is given to index images on the basis of textual, pictorial and composite information. The scheme makes use of weighted document terms and color invariant image features to obtain a high-dimensional similarity descriptor to be used as an index. Based on supervised learning, the k-nearest neighbor classifier is used to organize Internet images into semantically meaningful groups. Internet images are first classified into photographic and synthetic images. Photographic images are then further classified into portraits and non-portraits, and synthetic images into button and non-button images.
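The two-stage classification described above can be sketched with a plain k-nearest neighbor classifier. The two-dimensional feature vectors and training examples below are hypothetical toy values. The descriptor in the paper is high-dimensional and combines weighted document terms with color invariant image features, which this sketch does not attempt to reproduce.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Majority vote among the k training examples closest to `query`.

    `examples` is a list of (feature_vector, label) pairs; distance is
    Euclidean. Ties in the vote go to the label seen nearest first.
    """
    if k < 1 or k > len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_image(features, stage1, photo_stage, synthetic_stage, k=3):
    """First photographic vs. synthetic, then the matching sub-classifier."""
    top = knn_classify(features, stage1, k)
    second = photo_stage if top == "photographic" else synthetic_stage
    return top, knn_classify(features, second, k)


# Hypothetical 2D features, e.g. (colour richness, skin-colour fraction).
stage1 = [
    ((0.90, 0.10), "photographic"), ((0.80, 0.60), "photographic"),
    ((0.85, 0.30), "photographic"), ((0.10, 0.00), "synthetic"),
    ((0.20, 0.10), "synthetic"), ((0.15, 0.05), "synthetic"),
]
photo_stage = [
    ((0.90, 0.10), "non-portrait"), ((0.85, 0.15), "non-portrait"),
    ((0.80, 0.05), "non-portrait"), ((0.80, 0.60), "portrait"),
    ((0.85, 0.70), "portrait"), ((0.90, 0.65), "portrait"),
]
synthetic_stage = [
    ((0.10, 0.00), "button"), ((0.12, 0.03), "button"),
    ((0.15, 0.02), "button"), ((0.30, 0.10), "non-button"),
    ((0.35, 0.15), "non-button"), ((0.40, 0.10), "non-button"),
]

print(classify_image((0.82, 0.62), stage1, photo_stage, synthetic_stage))
print(classify_image((0.12, 0.02), stage1, photo_stage, synthetic_stage))
```

Splitting the decision into stages lets each sub-classifier be trained only on examples from its own branch, so portrait cues never have to compete with button cues.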