The SVG (Scalable Vector Graphics) standard makes it possible to represent complex graphical scenes by a collection of vector-based primitives. In this work we are interested in heuristic techniques that bridge the gap between the vector graphics world and the raster world typical of digital photography. The SVG format could find useful application in mobile imaging devices, where typical camera capabilities must be matched with displays of limited color and size resolution.
Two different techniques have been applied: Data Dependent Triangulation (DDT) and Wavelet Based Triangulation (WBT). DDT replaces the input image with a set of triangles according to a specific cost function; the overall perceptual error is then minimized by choosing a suitable triangulation. WBT uses the multilevel wavelet transform to extract details from the input image. A triangulation is first computed at the coarsest level, introducing large triangles; the process is then iteratively refined according to the wavelet transform, increasing the number of small triangles in textured areas while keeping a fixed number of large triangles in smooth areas.
Both the DDT and WBT outputs are then processed by a polygonalization step, whose purpose is to merge triangles together, reducing the amount of redundancy in the resulting SVG files.
The proposed technique has been compared with other raster-to-vector methods, showing good performance. Experiments can be found on the SVG UniCT Group page: http://svg.dmi.unict.it/.
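As a rough illustration of the final serialization step, a triangulation can be written out as SVG `<polygon>` elements, one per triangle, each filled with a flat color (in the actual method the fill would come from the underlying raster pixels). The function name and the triangle representation below are illustrative assumptions, not the authors' code:

```python
def triangles_to_svg(triangles, width, height):
    """Serialize a list of filled triangles into a minimal SVG document.

    Each triangle is ((x1, y1), (x2, y2), (x3, y3), (r, g, b)); the color
    would be derived from the raster pixels the triangle covers.
    """
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" '
             'width="%d" height="%d">' % (width, height)]
    for (p1, p2, p3, rgb) in triangles:
        pts = " ".join("%g,%g" % p for p in (p1, p2, p3))
        parts.append('<polygon points="%s" fill="rgb(%d,%d,%d)"/>' % ((pts,) + rgb))
    parts.append('</svg>')
    return "\n".join(parts)

doc = triangles_to_svg([((0, 0), (8, 0), (0, 8), (200, 100, 50))], 8, 8)
```

The polygonalization step described above would reduce file size further by emitting one `<polygon>` per merged region instead of one per triangle.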
New technologies such as the mobile Internet and handheld computing devices offer everyone new opportunities to use medical image data regardless of time and place. However, the limited resources of a mobile environment pose a major problem for representing medical data visually in three dimensions. This paper proposes a server-side system for providing a medical visualization service on the web using SVG to counteract the limited-resource problem.
The web offers great promise for the delivery of anatomical information once we have solutions to some of the economic and bandwidth problems. The potential benefits include widespread access, essentially unlimited content, flexibility in presentation format, and an organizational framework for other web-based medical information. SVG is an API for graphical applications with an XML-based file format, powerful scripting, and event support. It allows for three types of graphic objects, vector graphic shapes, images, and text, with very compact expression. It can be used as a platform on which to build graphically rich applications and user interfaces such as a 3D medical data visualization system.
The web server is connected to a database of 3D anatomical organs reconstructed as polygon meshes from the Visible Human dataset using the Marching Cubes algorithm. The 3D anatomical organs are stored in the database at multiple resolutions. The web server accepts commands from web clients and processes them. The anatomical organs are represented in SVG, combined with each other or with planar slices, and become structured anatomical scenes, visualized together with symbolic knowledge and data. The resolution of the anatomical scenes is adapted to fit the screens of the web clients. The anatomical scenes generated by the web server are encoded on the fly by an encoding module and sent through the network; the data is received, decoded, and displayed on the clients, where users interact with the graphic interface. In particular, SVG provides a set of 15 filter types, which can be combined into a dataflow filter network applied to primitives. The paper explores the usability of these filters in visualizing anatomical organs for medical education.
The final goal of the present system is to make medical images available to professionals and students in medicine and related areas on handheld computing devices. The system provides them with more flexible access to, and visualization of, medical image data by using the power of the Web and SVG technology.
This paper presents a novel raster-to-vector technique for digital images based on advanced watershed decomposition coupled with ad-hoc heuristics devoted to obtaining high-quality rendering of digital photographs. The system is composed of two main steps: first, the image is partitioned into homogeneous and contiguous regions using watershed decomposition; then, a Scalable Vector Graphics (SVG) representation of these regions is obtained by ad-hoc chain-code building. The final result is an SVG file that can be used to transmit pictures over the Internet to different display systems (PCs, PDAs, cellular phones). Experimental results and comparisons demonstrate the effectiveness of the proposed method.
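The chain-code building step can be illustrated with a minimal Freeman 8-direction encoder over an ordered list of boundary pixels; the paper's actual region-to-SVG-path conversion is more elaborate, so this sketch is only an assumed simplification:

```python
# 8-connected Freeman chain-code directions: 0 = east, then counter-clockwise.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def chain_code(boundary):
    """Encode an ordered list of boundary pixel coordinates as a Freeman
    chain code: one direction symbol per step between consecutive pixels."""
    code = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        code.append(DIRS.index((x1 - x0, y1 - y0)))
    return code

# Closed boundary of a unit square, traced counter-clockwise (y grows down).
square = chain_code([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
```

A chain code like this compresses a region boundary to one symbol per step, from which an SVG path (`M … L … Z`) can be emitted directly.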
This paper argues in favor of a comprehensive model of image databases that allows the inclusion of computer vision techniques in a formal query framework on a rigorous database foundation. It attempts to give a first, very tentative direction that this framework could take. The main idea of the paper is that the correct way to create a database relying on techniques as heterogeneous as those developed by computer vision researchers, without collapsing under the sheer weight of its own complexity, is through the definition of abstract data types, together with suitable techniques to manipulate them in a query system without having to know anything about their implementation, that is, from a purely functional point of view.
This paper reports the results of a usability experiment that investigated visual query formulation on three dimensions: effectiveness, efficiency, and user satisfaction. Twenty-eight evaluation sessions were conducted to assess the extent to which query by visual example supports visual query formulation in a content-based image retrieval environment. To provide context and focus for the investigation, the study was segmented by image type, user group, and use function. The image type consisted of a set of abstract geometric device marks supplied by the UK Trademark Registry. Users were selected from the 14 UK Patent Information Network offices. The use function was limited to the retrieval of images by shape similarity. Two client interfaces were developed for comparison purposes: Trademark Image Browser Engine (TRIBE) and Shape Query Image Retrieval Systems Engine (SQUIRE).
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and the recent advances in emotion recognition from facial, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and advocate the use of probabilistic graphical models for fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.
When completely automated systems don't yield acceptable accuracy, many practical pattern recognition systems involve the human either at the beginning (pre-processing) or towards the end (handling rejects). We believe that it may be more useful to involve the human throughout the recognition process rather than just at the beginning or end. We describe a methodology of interactive visual recognition for human-centered low-throughput applications, Computer Assisted Visual InterActive Recognition (CAVIAR), and discuss the prospects of implementing CAVIAR over the Internet. The novelty of CAVIAR is image-based interaction through a domain-specific parameterized geometrical model, which reduces the semantic gap between humans and computers. The user may interact with the computer anytime that she considers its response unsatisfactory. The interaction improves the accuracy of the classification features by improving the fit of the computer-proposed model. The computer makes subsequent use of the parameters of the improved model to refine not only its own statistical model-fitting process, but also its internal classifier. The CAVIAR methodology was applied to implement a flower recognition system. The principal conclusions from the evaluation of the system include: 1) the average recognition time of the CAVIAR system is significantly shorter than that of the unaided human; 2) its accuracy is significantly higher than that of the unaided machine; 3) it can be initialized with as few as one training sample per class and still achieve high accuracy; and 4) it demonstrates a self-learning ability.
We have also implemented a Mobile CAVIAR system, in which a pocket PC, as a client, connects to a server through wireless communication. The motivation behind a mobile platform for CAVIAR is to apply the methodology in a human-centered pervasive environment, where the user can seamlessly interact with the system for classifying field data. Deploying CAVIAR on a networked mobile platform poses the challenge of classifying field images and programming under constraints of display size, network bandwidth, processor speed, and memory size. Editing of the computer-proposed model is performed on the handheld, while statistical model fitting and classification take place on the server. The possibility that the user can easily take several photos of the object poses an interesting information fusion problem. The advantage of the Internet is that the patterns identified by different users can be pooled together to benefit all peer users. When users identify patterns with CAVIAR in a networked setting, they also collect training samples and provide opportunities for machine learning from their intervention. CAVIAR implemented over the Internet provides a perfect test bed for, and extends, the concept of the Open Mind Initiative proposed by David Stork. Our experimental evaluation focuses on human time, machine and human accuracy, and machine learning. We devoted much effort to evaluating the use of our image-based user interface and to developing principles for the evaluation of interactive pattern recognition systems. The Internet architecture and the Mobile CAVIAR methodology have many applications; we are exploring the directions of teledermatology, face recognition, and education.
With the increase of multimedia content on the Internet, people need to handle large amounts of multimedia content on the web as well as in e-mail. Visual data mining is needed to find appropriate visual data within this large body of content. However, the editing processes that are common on the web affect the features of visual data and cause false retrievals in current visual mining systems. In this paper, we propose an improved visual mining method that detects and reduces image editing effects.
We propose a novel approach to retrieve similar images from image databases that works in the presence of significant illumination variations. The most common method to compensate for illumination changes is to perform color normalization. The existing approaches to color normalization tend to destroy image content, in that they map distinct color values to identical color values in the transformed color space; from a mathematical point of view, the normalization transformation is not reversible. In this paper we propose to use a reversible illumination normalization transformation. Thus, we are able to compensate for illumination changes without any reduction of content information. Since natural illumination changes affect different parts of an image by different amounts, we apply our transformation locally to sub-images. The basic idea is to divide an image into sub-images, normalize each one separately, and then project it into an n-dimensional reduced space using principal component analysis. This process yields a normalized texture representation as a set of n-vectors, so that finding similar images reduces to computing distances between sets of n-vectors. Results were compared with a leading image retrieval system.
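A minimal sketch of the local normalization idea, using a reversible zero-mean/unit-spread transform per sub-image as a stand-in for the paper's transformation, and omitting the PCA projection step entirely:

```python
def normalize_block(block):
    """Reversibly normalize a block of gray values to zero mean, unit spread.

    The (mean, spread) pair is returned alongside the normalized values, so
    the transform can be inverted: no content information is lost.
    """
    n = len(block)
    mean = sum(block) / n
    spread = (sum((v - mean) ** 2 for v in block) / n) ** 0.5 or 1.0
    return [(v - mean) / spread for v in block], (mean, spread)

def set_distance(blocks_a, blocks_b):
    """Sum of Euclidean distances between corresponding normalized blocks."""
    total = 0.0
    for a, b in zip(blocks_a, blocks_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return total

# The same block under a brightness shift normalizes to identical values.
na, _ = normalize_block([10, 20, 30, 40])
nb, _ = normalize_block([110, 120, 130, 140])
```

After normalizing each block, the distance between the two "images" above is zero, illustrating invariance to the (local) illumination change while the stored (mean, spread) pairs preserve invertibility.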
We consider the well-known problem of segmenting a color image into foreground and background pixels. Such a result can be obtained by segmenting the red, green, and blue channels directly. Alternatively, the result may be obtained through the transformation of the color image into other color spaces, such as HSV or normalized colors. The problem then is how to select the color space or color channel that produces the best segmentation result. Furthermore, if more than one channel is an equally good candidate, the next problem is how to combine the results. In this article, we investigate whether the principles of the formal model of diversification of Markowitz (1952) can be applied to solve the problem. We verify, in theory and in practice, that the proposed diversification model can be applied effectively to determine the most appropriate combination of color spaces for the application at hand.
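For two channels, the Markowitz minimum-variance weights have a simple closed form; the sketch below treats per-pixel foreground scores from two channels as the "assets" (an illustrative reading of the abstract, not necessarily the authors' exact formulation):

```python
def min_variance_weights(var1, var2, cov):
    """Markowitz minimum-variance weights for combining two 'assets'
    (here: per-pixel foreground scores from two color channels).
    Weights sum to 1; the noisier channel receives less weight."""
    denom = var1 + var2 - 2.0 * cov
    if denom == 0.0:
        return 0.5, 0.5
    w1 = (var2 - cov) / denom
    return w1, 1.0 - w1

def combine(scores1, scores2, w1, w2):
    """Weighted per-pixel combination of two channel segmentation scores."""
    return [w1 * a + w2 * b for a, b in zip(scores1, scores2)]

# Channel 1 is four times noisier than channel 2, uncorrelated errors.
w1, w2 = min_variance_weights(4.0, 1.0, 0.0)
```

With uncorrelated errors the weights are inversely proportional to the variances, so the example yields w1 = 0.2, w2 = 0.8: diversification automatically downweights the unreliable channel.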
JPEG2000 is a new image coding standard and system. A client-side cache model is defined in JPEG2000 Part 9 - Interactivity tools, APIs and protocols (JPIP). The JPIP standard does not define how the client side and the server side internally keep track of the client-side cache. In this paper we propose XML-based management of the JPIP client-side cache model. We propose XML because it is a standardized and easily exchangeable format, and since the JPIP cache model descriptors follow a hierarchical structure, they can be represented well in XML. Using the proposed approach, the client and/or server side can use the XML DOM API to easily add, delete, and update its view of the client cache. We define an XML DTD and Schema for the proposed XML-based client-side cache model management.
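A hypothetical sketch of such DOM-based management using Python's standard `xml.etree.ElementTree`; the element and attribute names (`cacheModel`, `dataBin`, `class`, `id`, `bytes`) are invented here for illustration, since the paper defines its own DTD and Schema:

```python
import xml.etree.ElementTree as ET

def new_cache_model():
    """Root of a hypothetical XML cache model document."""
    return ET.Element("cacheModel")

def update_bin(root, bin_class, bin_id, nbytes):
    """Add, or update in place, the byte count cached for one data-bin."""
    for b in root.findall("dataBin"):
        if b.get("class") == bin_class and b.get("id") == bin_id:
            b.set("bytes", str(nbytes))
            return b
    return ET.SubElement(root, "dataBin",
                         {"class": bin_class, "id": bin_id, "bytes": str(nbytes)})

def delete_bin(root, bin_class, bin_id):
    """Remove one data-bin entry; returns True if it existed."""
    for b in root.findall("dataBin"):
        if b.get("class") == bin_class and b.get("id") == bin_id:
            root.remove(b)
            return True
    return False

model = new_cache_model()
update_bin(model, "precinct", "0", 512)
update_bin(model, "precinct", "0", 1024)   # second call updates, no duplicate
```

The hierarchical cache state can then be serialized with `ET.tostring(model)` for exchange or validated against the proposed DTD/Schema.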
Within the framework of telemedicine, the sheer volume of images calls first for efficient lossless compression methods for storing the information. Furthermore, a multiresolution scheme including Region of Interest processing is an important feature for remote access to medical images. Moreover, securing sensitive data (e.g. metadata from DICOM images) constitutes one more expected functionality: indeed, the loss of IP packets could have tragic effects on a given diagnosis. For this purpose, we present in this paper an original scalable image compression technique (the LAR method) used in association with a channel coding method based on the Mojette Transform, so as to elaborate a hierarchical priority encoding system.
The LAR (Locally Adaptive Resolution) coder, based on a non-uniform subsampling of the image, is a multi-layered scheme that provides both lossless representation of the data and very low bit-rate encoded images. The Mojette transform technique produces multiple descriptions of the information elements at very low complexity. These descriptions are transmitted without adding any specific flow-regulation mechanism. This global system provides a solution for the secure transmission of medical images through low-bandwidth networks such as the Internet.
The vast amount of video sequences available in digital format presents considerable challenges for descriptor extraction and information retrieval. The dominant motion in a video scene is very important for characterizing video sequences, but it is costly to compute in the image domain, because recovering the optical flow between two consecutive frames is very time-demanding, as is the subsequent parameter estimation. In this paper we present a method to extract an affine description of the global motion of a video sequence using a robust estimator operating on compressed-domain data, where the motion vector field is already available. We then perform further analysis, isolating and describing the local motions parametrically, using mean shift analysis as a non-parametric clustering method. Applying our approach to real sequences, we take advantage of the extracted parametric description to perform video summarization using image mosaics.
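The global-motion step can be sketched as an ordinary least-squares affine fit over block positions and their compressed-domain motion vectors; the paper uses a robust estimator, which this simplified sketch omits (outlier rejection or iterative reweighting would be layered on top):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col and M[col][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_affine(points, vectors):
    """Least-squares affine motion fit u = a*x + b*y + c, v = d*x + e*y + f
    from block centers and their motion vectors, via the normal equations."""
    A = [[0.0] * 3 for _ in range(3)]
    bu, bv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(points, vectors):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                A[i][j] += row[i] * row[j]
            bu[i] += row[i] * u
            bv[i] += row[i] * v
    return solve3(A, bu), solve3(A, bv)

# A pure camera pan: every block moves by the same vector (2, 3).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
au, av = fit_affine(points, [(2.0, 3.0)] * 4)
```

For the pan above the fit recovers zero linear terms and the translation (2, 3); blocks whose residual against this global model is large would be handed to the mean shift clustering stage as candidate local motions.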
Many current video analysis systems fail to fully acknowledge the process that resulted in the acquisition of the video data, i.e. they do not consider the complete multimedia system encompassing the several physical processes that lead to the captured video data. This multimedia system includes the physical process that created the appearance of the captured objects, the capturing of the data by the sensor (camera), and a model of the domain the video data belongs to. By modelling this complete multimedia system, a much more robust and theoretically sound approach to video analysis can be taken. In this paper we describe such a system for the detection, recognition, and tracking of objects in videos. We introduce an extension of the mean shift tracking process, based on a detailed model of the video capturing process. The system is used for two applications in the soccer video domain: billboard recognition and tracking, and player tracking.
We propose a visual tracking system that uses RFID tags to identify objects. The system first identifies an object in front of the camera and pulls up data about the object from a database. The data includes a CAD model of the object, used for estimating its 3D motion relative to the camera, and a set of image features used for detecting the object in the initial image. The set of image features is generated from the CAD model by means of the AdaBoost algorithm and efficiently distinguishes the object in images from the background. Having identified the object, the system processes images using models that are specialized for the object in front of the camera.
The Internet in combination with digital presses has allowed the geographical distribution of printed-material manufacturing. An increasing number of printed pieces are customized for the recipient; when each printed piece is different, conventional proofing fails, because it is impossible to proof the entire print job. One frequent problem in automatically generated pieces is the readability of one page element on top of another: the color combination can be unreadable or can clash. I propose simple algorithms to automatically detect and correct color discriminability problems in variable-data printing.
Web-based virtual tours have become a desirable and in-demand application, yet a challenging one due to the nature of a web application's running environment, such as limited bandwidth and no guarantee of high computation power on the client side. The image-based rendering approach has attractive advantages over the traditional 3D rendering approach in such web applications. The traditional geometry-based approach, such as VRML, requires a labor-intensive 3D modeling process, high bandwidth, and high computation power, especially for photo-realistic virtual scenes. QuickTime VR and IPIX, as examples of the image-based approach, use panoramic photos, and the virtual scenes can be generated directly from photos, skipping the modeling process. However, QuickTime VR and IPIX not only may require special cameras or effort to take panoramic views, but also provide only fixed-point navigation (look-around and zooming in and out) rather than "walk-around", which is a very important feature for giving virtual tourists an immersive experience. Easy and Effective Virtual Tour constructs a virtual tour from several snapshots of conventional photos without special tools, builds a simple 3D space within each photo using a spidery mesh, and expands the virtual space by connecting the photos to each other through simple user intervention to specify correspondences. The expanded virtual space provides virtual tourists with free navigation and the immersive experience of walking around through the WWW.
With advances in, and the availability of, information and communication technology infrastructures in some nations and institutions, patients are now able to receive healthcare services from doctors and healthcare centers even when they are physically separated. The availability and transfer of patient data, which often include medical images for specialist opinion, is invaluable both to the patient and to the medical practitioner in a telemedicine session. Two existing approaches to telemedicine are real-time and store-and-forward. The real-time approach requires the availability or development of video-conferencing infrastructures, which are expensive, especially for most developing nations of the world, while store-and-forward allows data transmission between any hospitals with a computer and a landline telephone link, which is less expensive but introduces delays. We therefore propose a hybrid design of applications using a hypermedia database capable of harnessing the features of both real-time and store-and-forward, deployed over a wireless Virtual Private Network for the participating centers and healthcare providers.
Content-based image classification is a wide research field addressing the problem of categorizing images according to their content. A common way to approach content-based classification is through learning from examples: a given class of images is described by means of a suitable training set of data. The main drawback of this approach is that collecting data to build homogeneous training and validation sets is a tedious and time-consuming task, even though the Web can help by providing a potentially inexhaustible source of images. In this paper we present a system to automatically download images from the Web, together with a selection of techniques to prune the downloaded images according to some criteria. These techniques work as filters at various degrees of complexity: some are simple measurements, others are image classifiers themselves. We focus on two critical ones (monochrome vs. color images and photos vs. graphics), showing their effectiveness on a manually labeled validation set of data. We conclude the paper by analyzing the overall performance of the system with an a posteriori analysis of the results obtained in a few runs.
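One of the simple measurement-style filters, monochrome vs. color, could be as small as a per-pixel channel-agreement test; the tolerance and the 95% threshold below are illustrative assumptions, not the paper's values:

```python
def is_monochrome(pixels, tolerance=8):
    """Classify a list of (r, g, b) pixels as (near-)monochrome if, for
    almost every pixel, the three channel values agree within a tolerance
    (which absorbs compression noise on grayscale images saved as RGB)."""
    if not pixels:
        return True
    gray = sum(1 for (r, g, b) in pixels
               if max(r, g, b) - min(r, g, b) <= tolerance)
    return gray / len(pixels) > 0.95
```

A downloaded image failing this test would pass through to the next, more expensive filter (e.g. the photo-vs-graphics classifier).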
Ever since the advent of the Internet, there has been an immense growth in the amount of image data available on the World Wide Web. With such a magnitude of available images, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query-by-topic and query-by-example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It employs a focused crawler that allows the user not only to enter the keyword for the topic-based search but also to specify the scope in which to find the images. The second phase uses query-by-example specification to perform a low-level content-based image match, retrieving a smaller set of results closer to the example image. Information related to image features is automatically extracted from the query image by the image processing system, and a computationally inexpensive technique based on color features is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with the proposed hybrid search technique.
In content-based image retrieval, the comparison of a query image and each of the database images is defined by a similarity distance obtained from the two feature vectors involved. These feature vectors can be seen as sets of noisy indexes. Unlike text matching (which is exact), image matching is only approximate, leading to ranking methods: only images at the top ranks (within the scope) are returned as retrieval results. Image retrieval performance characterization has mainly been based on measures borrowed from probabilistic text retrieval, in the form of Precision-Recall or Precision-Scope graphs. However, these graphs offer an incomplete overview of the image retrieval system under study: essential information about how the success of the query is influenced by the size and type of the irrelevant images is missing. Due to the inexactness of the visual matching process, the effect of the irrelevant embedding, represented by the additional performance measure generality, plays an important role.
In general, such a performance graph is three-dimensional. By choosing appropriate scope values, a new two-dimensional performance graph, the Generality-Recall-Precision Graph, is proposed to replace the commonly used Precision-Recall Graph as the better choice for total-recall studies.
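The three measures can be computed from a full ranking of the database, a relevant set, and a scope; a minimal sketch (with generality taken, as described above, as the fraction of relevant items in the whole database):

```python
def retrieval_measures(ranked, relevant, scope):
    """Precision and recall at a given scope, plus the generality of the
    embedding: the fraction of relevant items in the whole ranked database."""
    returned = ranked[:scope]
    hits = sum(1 for item in returned if item in relevant)
    precision = hits / scope
    recall = hits / len(relevant)
    generality = len(relevant) / len(ranked)
    return precision, recall, generality

# 10-image database, 3 relevant images, scope of 4; 2 relevant in the top 4.
precision, recall, generality = retrieval_measures(list(range(10)), {0, 1, 5}, 4)
```

Sweeping the scope while varying the irrelevant embedding (and hence the generality) traces out the three-dimensional graph the abstract refers to.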
We propose a graphical indexing of images to be exposed on the Web. This is accomplished by "keypics", i.e. auxiliary, simplified pictures referring to the geometrical and/or semantic content of the indexed image. Keypics should not be rigidly standardized; they should be left free to evolve, to express nuances, and to stress details. A mathematical tool for dealing with such freedom already exists: Size Functions. We support the idea of keypics with some experiments on a dataset of 498 images.
Proteins are long chains of amino acids with a definite 3D conformation, and the shape of each protein is vital to its function. Since proteins are normally in solution, hydrodynamics (which describes the movement of solvent around a protein as a function of the molecule's shape and size) can be used to probe the size and shape of proteins and compare them with those derived from X-ray crystallography. The computation chain needed for these hydrodynamics calculations consists of several separate programs by different authors on various platforms and often requires 3D visualization of intermediate results. Due to this complexity, tools developed by a particular research group are not readily available for use by other groups, or even by non-experts within the same group. To alleviate this situation, and to foster the easy and wide distribution of computational tools worldwide, we developed a web-based interactive computational environment (WICE), including interactive 3D visualization, that can be used with any web browser. Java-based technologies were used to provide a platform-neutral, user-friendly solution: Java Server Pages (JSP), Java Servlets, Java Beans, JOGL (Java bindings for OpenGL), and Java Web Start were used to create a solution that simplifies the computing chain, allowing users to focus on their scientific research. WICE hides complexity from the user and provides robust and sophisticated visualization through a web browser.
This paper considers the deceptively simple question: Why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in the different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists, we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. The bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images, doing image metadata and file management the MP3 way, and considers the likelihood of success.
The human face conveys various kinds of information, such as feeling, age, physical condition, and so on. It is therefore important to acquire and reconstruct the facial pattern precisely for various kinds of applications. Our goal is to create an electronic cosmetics system with real-time tracking of the 3D facial pattern. A simple 3D virtual mirror interface system is proposed for the acquisition and reconstruction of the facial pattern, which enables real-time tracking of the 3D facial pattern. The movement of the lips, estimated using mean shift and a Kalman filter, is used to recognize the rotation and translation of the face. A modified Photometric Stereo Method is also proposed to acquire the 3D shape without being affected by missing data in shadow areas of the object. It is shown that the proposed system is more effective than traditional approaches for real-time acquisition, reconstruction, and tracking of the 3D facial pattern. We consider that this interface can be applied to electronic cosmetics systems that simulate make-up, plastic surgery, and color appearance under various illuminations.
In the modern digital broadcasting environment, broadcast content filtering can provide a useful function: a TV viewer can find or store personally desired scenes from programs on multiple channels, even while watching a program on another channel. To achieve this filtering in live broadcasts, real-time processing is essential. In this paper, a broadcast content filtering algorithm is proposed, and the system requirements for real-time content filtering are analyzed. To achieve real-time content processing, a buffer control algorithm is proposed as well. The usefulness of broadcast content filtering is demonstrated with experiments on a test-bed system.
This paper presents an innovative method that combines a feature-based approach with a holistic approach for three-dimensional face detection and localization. Salient face features, such as the eyes and nose, are detected through an analysis of the curvature of the surface. In a second stage, each triplet consisting of a candidate nose and two candidate eyes is processed by a PCA-based classifier, trained to discriminate between faces and non-faces. The method has been tested on about 150 3D faces acquired by a laser range scanner, with good results.
To address the limitations of existing keyword-based image search engines on the Internet, this paper presents a solution based on a visual-feature image search engine. First, with reference to the universal system design norms provided in MPEG-7, methods for image feature description, extraction, and indexing, together with efficient algorithms for image feature similarity measurement and fast retrieval, are investigated in depth, and a new representation combining wavelets with relative moments is given. Then, the advantages of artificial intelligence, data mining, and optimal information search strategies on the Internet are exploited to construct a prototype visual-feature image search engine. The experimental results show that the solution is reasonable and feasible.
A new MD (multiple description) video coding method, based on balanced multiwavelet image transformations, is proposed here. First, we apply the balanced multiwavelet transformation to the image; then, corresponding components of each sub-band are gathered together, so that the image is decomposed into 4 MDs. Applying this to every frame of a video sequence yields an MD video coding scheme. A practical MD coding scheme must satisfy two requirements: first, each description should carry the same amount of information; secondly, there must be dependence among the descriptions. Among commonly used multiwavelets, we find that only balanced multiwavelets satisfy these two requirements. Furthermore, based on the features of the CARDBAL2 multiwavelet and strict mathematical deductions, we also derive a way to estimate lost descriptions. The experimental results presented in this paper show that, even when 75% of the image data are lost, we can still recover a good-quality image, with a PSNR value of nearly 30 dB.
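The structure of such a scheme can be illustrated with a much simpler stand-in: splitting the pixel lattice into four polyphase components, each a balanced, mutually correlated description, and estimating a lost description from the surviving ones. This is only an analogy for the grouping of multiwavelet sub-band components; the actual CARDBAL2-based construction and estimation are not reproduced here:

```python
import numpy as np

def four_descriptions(img):
    """Split an image into 4 descriptions by polyphase sampling -- a simplified
    stand-in for grouping corresponding multiwavelet sub-band components.
    Each description carries an equal share of (correlated) information."""
    return [img[0::2, 0::2], img[0::2, 1::2], img[1::2, 0::2], img[1::2, 1::2]]

def merge(descs, lost=None):
    """Reassemble the image; a lost description is estimated from the surviving
    ones (here simply their mean, exploiting inter-description correlation)."""
    d = [x.astype(float) for x in descs]
    if lost is not None:
        d[lost] = sum(d[i] for i in range(4) if i != lost) / 3.0
    h, w = d[0].shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2], out[0::2, 1::2], out[1::2, 0::2], out[1::2, 1::2] = d
    return out
```

For smooth images the neighbouring-description estimate keeps the reconstruction error small, which is the same correlation property the paper exploits when up to three descriptions are lost.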
Universities, governmental administrations, photography agencies and many other companies or individuals need a framework to manage their multimedia documents and the copyright or authenticity attached to their images. We propose a web-based interface that supports several operations: storage, image navigation, copyright insertion, and authenticity verification. When the owner of a photograph wants to store the document and publish it on the Internet, he uses the interface to add his images and set the Internet sharing rules. The user can choose, for example, the watermarking method or the viewing resolution, setting the parameters visually so as to reach the best trade-off between quality and protection. We also propose an authenticity module that allows online verification of documents: any Internet user who knows the encoding key can verify whether a watermarked image has been altered or not. Finally, we give some practical examples of our system. In this study, we combine recent techniques in image protection and navigation into a complete scheme for managing published images, so that a single system provides both the security and the publication of the images.
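The keyed verification step can be sketched with a fragile watermark: a keyed MAC of the image content (with LSBs cleared) is embedded in the pixel LSBs, and anyone holding the key can recompute it to detect tampering. The paper's actual (selectable) watermarking methods are unspecified, so the scheme below is an illustrative assumption:

```python
import hashlib
import hmac

def _mac_bits(pixels, key, n):
    """First n bits of a keyed HMAC over the pixel values with LSBs cleared,
    so embedding the bits does not invalidate the MAC itself."""
    data = bytes(p & 0xFE for p in pixels)
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n)]

def embed(pixels, key):
    """Fragile keyed watermark (toy sketch): store the MAC bits in the LSBs."""
    bits = _mac_bits(pixels, key, len(pixels))
    return [(p & 0xFE) | b for p, b in zip(pixels, bits)]

def verify(pixels, key):
    """True iff the LSBs still match the keyed MAC, i.e. the image is unaltered."""
    return [p & 1 for p in pixels] == _mac_bits(pixels, key, len(pixels))
```

Any modification of the pixel values (other than by the key holder) breaks the MAC/LSB agreement, which is exactly the online check the authenticity module exposes.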
We propose an innovative approach to the selection of representative frames of a video shot for video summarization. By analyzing the differences between two consecutive frames of a video sequence, the algorithm determines the complexity of the sequence in terms of visual content changes. Three descriptors are used to express the frame's visual content: a color histogram, wavelet statistics and an edge direction histogram. Similarity measures are computed for each descriptor and combined to form a frame difference measure. The use of multiple descriptors provides a more precise representation, capturing even small variations in the frame sequence. This method can dynamically and rapidly select a variable number of key frames within each shot, and does not exhibit the complexity of existing methods based on clustering strategies. The method has been tested on various video segments of different genres (trailers, news, animation, etc.), and preliminary results show that the algorithm effectively summarizes the shots, capturing the most salient events in the sequences.
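The combination of per-descriptor dissimilarities into one frame difference, and its use for key-frame selection, can be sketched as follows. Only two illustrative cues (an intensity histogram and a mean-intensity gap) stand in for the paper's three descriptors, and the weights and threshold are assumptions:

```python
import numpy as np

def hist_diff(a, b, bins=16):
    """L1 distance between normalized intensity histograms, in [0, 1]
    (stand-in for the color-histogram descriptor)."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return 0.5 * np.abs(ha - hb).sum()

def frame_difference(a, b, weights=(0.5, 0.5)):
    """Weighted combination of per-descriptor dissimilarities. The paper
    combines color histogram, wavelet statistics and edge-direction
    histogram; here two simple cues illustrate the fusion."""
    cues = (hist_diff(a, b), abs(a.mean() - b.mean()) / 255.0)
    return sum(w * c for w, c in zip(weights, cues))

def key_frames(frames, thresh=0.3):
    """Emit a new key frame whenever the change since the last key frame
    exceeds thresh, so complex shots yield more key frames than static ones."""
    keys, last = [0], frames[0]
    for i, f in enumerate(frames[1:], 1):
        if frame_difference(last, f) > thresh:
            keys.append(i)
            last = f
    return keys
```

Because the number of emitted frames tracks the accumulated visual change, a static shot yields a single key frame while a dynamic one yields several, without any clustering pass.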
Video applications usually involve a large number of moving objects. Moving objects refer to semantic real-world entities that denote coherent spatial regions and can be automatically computed from the continuity of spatial low-level features, such as color and motion. Spatial and temporal relationships among these objects should be efficiently supported and retrieved within a video authoring tool. In this paper we present several spatial, temporal and spatio-temporal relationships of interest and propose an efficient indexing scheme, based on multidimensional spatial data structures, for video applications that involve objects. We thus emphasize analyzing and interpreting video object motions for advanced video applications. To realize this objective, the research in this field is subdivided into two main directions: (1) moving-object description at the low level, using spatio-temporal relationships to analyze and present video object motions; and (2) moving-object description at the semantic level: actions, events and interactions of moving objects.
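The multidimensional indexing idea can be illustrated with a toy spatio-temporal grid index: object observations are hashed by coarse (x, y, t) cells, and a box query visits only the overlapping cells. This is a deliberately simplified stand-in for the multidimensional spatial structures (e.g. R-tree variants) the abstract refers to, with invented class and parameter names:

```python
from collections import defaultdict

class STGridIndex:
    """Toy spatio-temporal index: hash object observations into coarse
    (x, y, t) grid cells, so box queries touch only overlapping cells."""

    def __init__(self, cell=10):
        self.cell = cell
        self.cells = defaultdict(set)

    def insert(self, obj_id, x, y, t):
        """Record that obj_id occupies position (x, y) at time t."""
        c = self.cell
        self.cells[(x // c, y // c, t // c)].add(obj_id)

    def query(self, x0, x1, y0, y1, t0, t1):
        """All objects recorded in grid cells overlapping the query box."""
        c = self.cell
        hits = set()
        for cx in range(x0 // c, x1 // c + 1):
            for cy in range(y0 // c, y1 // c + 1):
                for ct in range(t0 // c, t1 // c + 1):
                    hits |= self.cells[(cx, cy, ct)]
        return hits
```

Retrieved candidates would then be refined against the exact spatio-temporal relationships (before/after, overlaps, meets, etc.) that the paper enumerates.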