Static multimedia on the Web can already hardly be structured manually. Although unavoidable and necessary, manual annotation of dynamic multimedia becomes even less feasible when multimedia quickly changes in complexity, i.e. in volume, modality, and usage context. The usage context could be set by learning or other purposes of the multimedia material. This multimedia dynamics calls for categorisation systems that index, query and retrieve multimedia objects on the fly much as a human expert would. We present and demonstrate such a supervised dynamic multimedia object categorisation system. Our categorisation system comes about by continuously gauging it to a group of human experts who annotate raw multimedia for a certain domain ontology given a usage context. In effect, our system thus learns the categorisation behaviour of human experts. By inducing supervised multi-modal content and context-dependent potentials, our categorisation system associates field strengths of raw dynamic multimedia object categorisations with those human experts would assign. After a sufficiently long period of supervised machine learning we arrive at automated, robust and discriminative multimedia categorisation. We demonstrate the usefulness and effectiveness of our multimedia categorisation system in retrieving semantically meaningful soccer-video fragments, in particular by taking advantage of multimodal and domain-specific information and knowledge supplied by human experts.
We propose a multi-scale and multi-modal analysis and processing scheme for audio-video data. Using a non-linear scale-space technique, audio-video is analyzed and processed in a way that is invariant under various imaging and hearing conditions. Degradations due to Lyapunov and structural instabilities are suppressed by this scale-space technique without destroying essential semantic relations. On the basis of an audio-video segmentation, its arrangements are quantified in terms of spatio-temporal inclusion relations and dynamic ordering relations by means of scaling connectivity relations. These relations induce a topological structure on top of the audio-video scale-space, which in turn induces a unimodal and multi-modal semantics. Our scheme is illustrated separately for video, audio and audio-video material, the latter pointing out the added value of integrating audio and video.
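As a rough illustration of what a non-linear scale-space does, the sketch below applies Perona-Malik diffusion to a 1D signal. This is a swapped-in textbook technique, not the scheme proposed in the abstract; the function name `perona_malik_1d` and the parameters `steps`, `dt` and `kappa` are assumptions made for the example. Small fluctuations diffuse quickly while large jumps diffuse slowly, which is the sense in which noise can be suppressed without destroying the coarse structure.

```python
def perona_malik_1d(signal, steps=20, dt=0.25, kappa=1.0):
    """Explicit Perona-Malik diffusion on a 1D signal with zero-flux ends.

    Illustrative stand-in for a non-linear scale-space; `kappa` sets the
    contrast above which differences are treated as edges (assumed value).
    """
    u = [float(v) for v in signal]
    for _ in range(steps):
        # Flux between neighbours i and i+1. The conductance g tends to 0
        # for large differences, so strong edges diffuse slowly.
        flux = []
        for i in range(len(u) - 1):
            d = u[i + 1] - u[i]
            g = 1.0 / (1.0 + (d / kappa) ** 2)
            flux.append(g * d)
        # Conservative update: whatever leaves one sample enters its
        # neighbour, so the total of the signal is preserved.
        new_u = []
        for i in range(len(u)):
            right = flux[i] if i < len(flux) else 0.0
            left = flux[i - 1] if i > 0 else 0.0
            new_u.append(u[i] + dt * (right - left))
        u = new_u
    return u


noisy_step = [0.0, 0.3, -0.2, 0.1, 0.0, 10.0, 10.2, 9.8, 10.1, 10.0]
smoothed = perona_malik_1d(noisy_step)
```

With `dt` at most 0.5 each update is a convex combination of neighbouring samples, so the evolved signal stays within the range of the input.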
We present and demonstrate a mathematical, physical and logical framework for classifying images at various scales (dynamic and spatio-temporal resolutions) such that Internet requirements concerning, e.g., MPEG-7/21 standards and available bandwidth are met. The mathematical and physical framework hinges on the (de)categorification (simplification and abstraction) of the dynamics involved in image formation and the Internet requirements at various scales. Firstly, the dynamics is categorified by an initialization of physical fields, such as color models, subjected to a gauge group capturing various imaging conditions. A decategorification of those fields consists of joint (non-)local geometric and topological equivalences (symmetries or invariants). Secondly, categorifications of dynamic scale-space paradigms for these equivalences are derived, incorporating Internet requirements. These paradigms are set up to be robust to particular imaging conditions, to Lyapunov instabilities (noise) in image formation, and to structural instabilities due to, e.g., changes in Internet requirements. The logical framework consists of a decategorification of the various dynamic scale-space paradigms and their evolutions caused by changing Internet requirements in terms of (non-)local symmetries, conservation laws and curvatures. Simple examples of (de)categorifications of dynamic scale-space paradigms that take Internet requirements into account are presented.
The geometric and statistical physical concepts of dynamic scale-space paradigms are presented and juxtaposed with those of mathematical morphology. It turns out that the dynamic paradigms can be applied to, substantiate and even generalize the morphological techniques and paradigms. In particular, the importance of the dynamic scale-space concepts is pointed out in granulometry, by means of size densities or statistical morphological operations, and in morphological scale-space theories, by means of parabolic dilations and watersheds.
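For readers unfamiliar with granulometric size densities, the following minimal sketch computes one for a binary 1D signal using standard morphological openings with flat segments of increasing length. It shows only the classical morphological construction, not the dynamic scale-space generalization discussed in the abstract; the helper names `erode`, `dilate`, `opening` and `size_density` are assumptions made for the example.

```python
def erode(f, k):
    """Erosion by a flat segment of length k anchored at its left end."""
    n = len(f)
    # Windows that run past the end of the signal count as background.
    return [min(f[i:i + k]) if i + k <= n else 0 for i in range(n)]


def dilate(e, k):
    """Dilation by the reflected segment, restoring what erosion kept."""
    return [max(e[max(0, i - k + 1):i + 1]) for i in range(len(e))]


def opening(f, k):
    """Opening: keeps only runs of foreground at least k samples long."""
    return dilate(erode(f, k), k)


def size_density(f, max_size):
    """Foreground area removed when the segment grows from k to k + 1.

    Entry k - 1 is the area carried by runs of length exactly k.
    """
    areas = [sum(opening(f, k)) for k in range(1, max_size + 2)]
    return [areas[i] - areas[i + 1] for i in range(max_size)]


signal = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]  # runs of length 2, 4 and 1
density = size_density(signal, 4)
```

The density sums to the total foreground area once `max_size` reaches the longest run, so it acts as a histogram of the signal's content over size.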
A space curve, e.g., a parabolic line on a 2-dimensional surface in 3-dimensional Euclidean space, induces a plane curve under projective mapping. But 2-dimensional scalar input images of such an object are, normally, spatio-temporal slices through a luminance field caused by the interaction of an external field and that object. Consequently, the question arises how to obtain from those input images a consistent description of the space curve under projective transformations. By means of classical scale-space theory, algebraic invariance theory, and classical differential geometry, a new method of shape description for space curves from one or multiple views is proposed in terms of complete and irreducible sets of affine and projective differential geometric invariants. The method is based on implicitly defining connections for the observed curves that are highly correlated with the projected space curves. These projected curves are assumed to reveal themselves as coherent structures in the scale-space representation of the differential structure of the input images. Several applications to stereo, optic flow, texture analysis, and image matching are briefly indicated.
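To make the notion of an affine differential invariant concrete, the sketch below checks numerically the simplest classical building block for a plane curve x(t): the determinant det(x', x''), whose cube root is the equi-affine arc-length density. Under an affine map x -> A x + b it is multiplied by det A, so it is unchanged when det A = 1. This is a standard fact from affine differential geometry and not the complete invariant sets of the abstract; the function names, the test curve and the finite-difference step `h` are assumptions made for the example.

```python
def derivatives(curve, t, h=1e-2):
    """Central-difference estimates of the first and second derivative."""
    (x0, y0), (x1, y1), (x2, y2) = curve(t - h), curve(t), curve(t + h)
    d1 = ((x2 - x0) / (2 * h), (y2 - y0) / (2 * h))
    d2 = ((x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2)
    return d1, d2


def affine_density(curve, t):
    """det(x', x''): its cube root is the equi-affine arc-length density."""
    (dx, dy), (ddx, ddy) = derivatives(curve, t)
    return dx * ddy - dy * ddx


def transformed(curve, a, b):
    """Compose a curve with the affine map x -> a x + b (a is 2x2)."""
    def mapped(t):
        x, y = curve(t)
        return (a[0][0] * x + a[0][1] * y + b[0],
                a[1][0] * x + a[1][1] * y + b[1])
    return mapped


def parabola(t):
    return (t, t * t)


unimodular = [[2.0, 1.0], [1.0, 1.0]]   # det = 1
scaling = [[2.0, 0.0], [0.0, 3.0]]      # det = 6
offset = (3.0, -4.0)

before = affine_density(parabola, 0.7)
after_unimodular = affine_density(transformed(parabola, unimodular, offset), 0.7)
after_scaling = affine_density(transformed(parabola, scaling, offset), 0.7)
```

Here `before` and `after_unimodular` agree up to rounding, while `after_scaling` differs from `before` by the factor det A = 6, which is why such a quantity is called a relative invariant of the full affine group.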