This paper describes data modeling for unstructured Diffusion Tensor Imaging (DTI) data. Data modeling is an essential first step of data preparation in any data management and data mining procedure. Conventional Entity-Relational (E-R) data modeling is lossy, irreproducible, and time-consuming, especially when dealing with unstructured image data associated with complex systems such as the human brain. We propose a methodological framework for more objective E-R data modeling with unlimited query support, achieved by eliminating the structured content-dependent metadata associated with the unstructured data. The proposed method is applied to DTI data and a minimal system is implemented accordingly. Supplemented with navigation, data fusion, and feature extraction modules, the proposed system provides a content-based support environment (C-BASE). Such an environment facilitates unlimited query support with a reproducible and efficient database schema. By switching between different modalities of data while confining the feature extractors within the object(s) of interest, we supply anatomically specific query results. The price of such a scheme is relatively large storage and, in some cases, high computational cost. The data modeling and its mathematical framework, the behind-the-scenes execution of queries, and the user interface of the system are presented in this paper.
Protein expression analysis has traditionally relied upon visual evaluation of the immunohistochemical reaction by a pathologist, who analyzes the grade of staining intensity and estimates the percentage of cells stained in the area of interest. This method is effective in experienced hands but has potential limitations in its reproducibility due to subjectivity between and within operators. These limitations are particularly pronounced in gray areas, where distinguishing weak from moderate protein expression can be clinically significant. Some research also suggests that sub-localization of the protein expression into different components, such as nuclei versus cytoplasm, may be of great importance. This distinction can be particularly difficult to quantify using manual methods. In this paper, we formulate the problem of quantitative protein expression analysis as an active learning classification problem, where a very small set of pre-sampled user data is used to understand expert evaluation. The confidence conveyed by the expert is mapped to derive an uncertainty region from which the supplemental learning data are selected. This is done by posing a structured query to the unknown data set. The newly identified samples are then added to the training set for incremental learning. The strength of our algorithm is measured by its ability to learn with minimum user interaction. Chroma analysis results on Tissue Micro-array (TMA) images are presented to demonstrate the user interaction and learning ability. The chroma analysis results are then processed to obtain quantitative results.
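The sample-selection step described above can be illustrated with a minimal uncertainty-sampling sketch. This is not the paper's actual algorithm: the nearest-class-mean classifier, the margin-based uncertainty measure, the synthetic data, and all names here are illustrative assumptions.

```python
import numpy as np

def class_means(X, y):
    # Mean feature vector per class label.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def uncertainty(X, means):
    # Margin between the distances to the two closest class means:
    # a small margin means the sample lies in the uncertainty region.
    d = np.stack([np.linalg.norm(X - m, axis=1) for m in means.values()], axis=1)
    d.sort(axis=1)
    return d[:, 1] - d[:, 0]

def select_queries(X_unlabeled, means, k):
    # Pose a "structured query": ask the expert to label the k
    # most ambiguous unlabeled samples.
    return np.argsort(uncertainty(X_unlabeled, means))[:k]

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(4, 1, (5, 2))])
y_train = np.array([0] * 5 + [1] * 5)
X_pool = rng.normal(2, 2, (100, 2))          # unlabeled pool
m = class_means(X_train, y_train)
queries = select_queries(X_pool, m, k=5)     # samples to show the expert
```

After the expert labels the queried samples, they would be appended to the training set and the loop repeated, which is the incremental learning step the abstract refers to.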
In this paper, we present a new online learning and classification algorithm and suggest its use for image segmentation. Our learning algorithm follows a variation of the Bayesian estimation procedure, which combines prior knowledge with knowledge learned from data. Our classification algorithm strictly follows a statistical classification procedure. The new online learning algorithm is simple to implement, robust to initial parameters, and has linear complexity. Experimental results using computer-generated data show that the proposed online learning algorithm can quickly learn the underlying structure of the data.
The proposed online learning algorithm is used to develop a novel image segmentation procedure based on the region growing and merging approach. First, region growing is carried out using the online learning algorithm. Then, a merging operation is performed to merge the small regions. Two merging methods are proposed. The first is based on statistical similarity and merges statistically similar, spatially adjacent regions. The second uses an information-based approach, merging small regions into their neighbouring larger regions. Extensive experimental results clearly show the efficacy of the proposed image segmentation method.
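The grow-then-merge idea can be sketched as follows. This is a generic seeded region-growing routine with an incremental (online) mean update and a statistical-similarity merge test; the thresholds, the 4-connectivity, and the specific statistics are assumptions for illustration, not the paper's Bayesian procedure.

```python
import numpy as np
from collections import deque

def grow_region(img, seed, thresh=10.0):
    # Region growing with an online mean update: a neighbouring pixel
    # joins the region if it is close to the region's running mean.
    h, w = img.shape
    labels = np.zeros((h, w), bool)
    labels[seed] = True
    q = deque([seed])
    mean, n = float(img[seed]), 1
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not labels[ny, nx]:
                v = float(img[ny, nx])
                if abs(v - mean) < thresh:
                    labels[ny, nx] = True
                    n += 1
                    mean += (v - mean) / n   # incremental (online) mean
                    q.append((ny, nx))
    return labels

def statistically_similar(img, r1, r2, tol=5.0):
    # First merging criterion: two regions with close mean
    # intensities are candidates for merging.
    return abs(img[r1].mean() - img[r2].mean()) < tol

# Synthetic image: a bright 10x10 square on a darker background.
img = np.full((20, 20), 100.0)
img[5:15, 5:15] = 200.0
region = grow_region(img, (10, 10), thresh=10.0)   # recovers the square
```

The second (information-based) merge criterion from the abstract would replace `statistically_similar` with a measure of the description cost of keeping a small region separate.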
In this paper, we propose PathSOM, an efficient similarity search system that combines Self-Organizing Maps (SOM) and Pathfinder Networks (PFNET). In the front end of the system, SOM is applied to cluster the original data vectors and construct a visual map of the data. The Pathfinder network then organizes the SOM map units in the form of a graph to yield a framework for an improved search for the best-matching map unit. The ability of the PathSOM approach to perform efficient searches is demonstrated on well-known data sets.
This paper describes a method for creating agents that locate images of specific categories such as sky, vegetation, fire, and smoke. The method uses only color information and is based on vector quantization, building a category-specific codebook from a set of training images. The method is shown to categorize images collected from several web sites on the Internet with a high success rate. It can be used as an aid to image annotation or as a way to filter images in a content-based image retrieval system.
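The codebook idea can be sketched with plain k-means over training pixel colors: an image whose pixels quantize to the category codebook with low error is likely to belong to that category. The codebook size, iteration count, and synthetic "sky"/"fire" color clouds below are illustrative assumptions.

```python
import numpy as np

def train_codebook(pixels, k=4, iters=20, seed=0):
    # Plain k-means over RGB training pixels: the centroids form
    # a category-specific color codebook.
    rng = np.random.default_rng(seed)
    code = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(pixels[:, None] - code[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                code[j] = pixels[assign == j].mean(axis=0)
    return code

def quantization_error(pixels, code):
    # Mean distance of an image's pixels to the nearest codeword:
    # low error means the image matches the category's color statistics.
    d = np.linalg.norm(pixels[:, None] - code[None], axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(1)
sky = rng.normal([120, 160, 220], 10, (500, 3))    # bluish "sky" pixels
fire = rng.normal([230, 90, 30], 10, (500, 3))     # reddish "fire" pixels
codebook = train_codebook(sky, k=4)                # sky-category codebook
```

A categorization agent would then threshold `quantization_error` on a new image's pixels against each category codebook.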
The goal of this paper is to introduce a novel clustering algorithm designed for grouping transcribed textual documents obtained from audio and video segments. Since audio transcripts are normally highly erroneous documents, one of the major challenges at the text processing stage is to reduce the negative impact of errors introduced at the speech recognition stage. Other difficulties come from the nature of conversational speech. We describe the main difficulties of spoken documents and suggest an approach that restricts their negative effects. We also present a clustering algorithm that groups transcripts on the basis of the informative closeness of documents. To carry out such partitioning, we give an intuitive definition of the informative field of a transcript and use it in our algorithm. To assess the informative closeness of the transcripts, we apply a Chi-square similarity measure, which is also described in the paper. Our experiments with the Chi-square similarity measure showed its robustness and high efficacy. In particular, the performance analysis carried out with respect to Chi-square and three other similarity measures, Cosine, Dice, and Jaccard, showed that Chi-square is more robust to the specific features of spoken documents.
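A common form of the Chi-square similarity over term-frequency vectors can be sketched as below. This is the standard symmetric chi-square distance mapped to a similarity score; the paper's exact formulation, and the toy term counts, are not taken from the source.

```python
import numpy as np

def chi_square_similarity(p, q, eps=1e-12):
    # Symmetric chi-square distance between normalized
    # term-frequency vectors, mapped to a similarity in [0, 1].
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    return 1.0 - 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# Toy term-frequency vectors over a shared four-term vocabulary.
doc_a = [4, 2, 0, 1]
doc_b = [3, 2, 1, 1]   # close to doc_a
doc_c = [0, 0, 5, 4]   # almost disjoint from doc_a
sim_ab = chi_square_similarity(doc_a, doc_b)
sim_ac = chi_square_similarity(doc_a, doc_c)
```

Identical distributions score 1, term-disjoint ones score 0, which makes the measure convenient as a pairwise input to a transcript clustering algorithm.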
In image similarity retrieval systems, color is one of the most widely used features. Users who are not well versed in the image domain characteristics may be more comfortable working with an image retrieval system that allows specification of a query in terms of keywords, thus eliminating the usual intimidation of dealing with very primitive features. In this paper, we present two approaches to automatic image annotation that find the rules underlying the links between the low-level features and the high-level concepts associated with images. One scheme uses global color image information and classification-tree-based techniques. Through this supervised learning approach, we are able to identify relationships between global color-based image features and some textual descriptors. In the second approach, using low-level image features that capture local color information, a k-means based clustering mechanism organizes images into clusters such that similar images are located in the same cluster. For each cluster, a set of rules is derived to capture the association between the localized color-based image features and the textual descriptors relevant to the cluster.
Humans tend to use high-level semantic concepts when querying and browsing multimedia databases; there is thus a need for systems that extract these concepts and make annotations available for the multimedia data. The system presented in this paper satisfies this need by automatically generating semantic concepts for images from their low-level visual features. The proposed system is built in two stages. First, an adaptation of k-means clustering using a non-Euclidean similarity metric is applied to discover the natural patterns of the data in the low-level feature space; the cluster prototype is designed to summarize the cluster in a manner suited to quick human comprehension of its components. Second, statistics measuring the variation within each cluster are used to derive a set of mappings between the most significant low-level features and the most frequent keywords of the corresponding cluster. The derived rules can be used further to capture the semantic content of, and index, new untagged images added to the image database. The attachment of semantic concepts to images also gives the system the advantage of handling queries expressed in terms of keywords, thus reducing the semantic gap between the user's conceptualization of a query and the query actually specified to the system. While the suggested scheme works with any kind of low-level features, our implementation and description of the system center on the use of image color information. Experiments using a 2100-image database are presented to show the efficacy of the proposed system.
In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, which is able to identify characters in TV shows by the fusion of audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. The second strategy consists of using face recognition and tracking to supplement speaker identification results. To evaluate our system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with this information database, we evaluated the performance of each of the analysis channels and their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall, while the second fusion strategy generates the best overall identification performance. The second strategy greatly outperforms either of the analysis channels in both precision and recall and is applicable to more general applications such as, in our case, identifying persons in TV programs.
In this paper, we propose an omni-face tracking system for video annotation, designed to find faces from arbitrary views in complex scenes. The face detector first locates potential faces in the input by performing skin-tone detection. The subsequent processing consists of two largely independent components, the frontal face module and the side-view face module, responsible for finding frontal-view and side-view faces, respectively. The frontal face module uses a region-based approach wherein regions of skin-tone pixels are analyzed for gross oval shape and the presence of facial features. In contrast, the side-view face module follows an edge-based approach to look for curves similar to a side-view profile. To extract the trajectories of faces, the temporal continuity between consecutive frames within video shots is exploited to speed up the tracking process. The main contribution of this work is the ability to find faces irrespective of their poses, whereas contemporary systems deal with frontal-view faces only. Information regarding human faces is encoded in XML format for semantic video content representation. The effectiveness of human faces for video annotation is demonstrated in a TV program classification system that categorizes the input video clip into predefined types. It is shown that the classification accuracy is improved significantly by the use of face information.
Currently, most approaches to object tracking operate in the spatial domain, using optical flow and depth, or use model-based methods that require decompressing the video sequences before further processing. The computation for decompression is expensive and ill-suited to real-time control. Although some researchers have performed object tracking in the compressed domain, they use only part of the DCT values of the I frames in a video sequence, which does not take full advantage of the information available in the compressed domain. In this paper, we consider a new method for object tracking that uses only the information supplied in the compressed domain by the MPEG encoder. The main scheme is to obtain the motion vectors of the P and B frames directly from the MPEG video without decompressing it, and then to cluster objects based on the motion vectors. In particular, camera motion is also taken into account, since the camera's motion can influence the objects' motion and the segmentation results dramatically. Experiments based on this method have been carried out on several videos. The results obtained indicate that tracking objects in the compressed domain is very promising.
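The compensate-then-cluster step can be sketched as follows. Motion vectors would come from the MPEG bitstream; here they are simulated, the camera motion is approximated by the median vector, and a deterministic two-cluster k-means stands in for the paper's clustering, all of which are illustrative assumptions.

```python
import numpy as np

def compensate_camera(mv):
    # Approximate global (camera) motion by the median motion vector;
    # subtracting it leaves the residual object motion.
    return mv - np.median(mv, axis=0)

def cluster_two(mv, iters=20):
    # Two-cluster k-means over residual motion vectors, with a
    # deterministic farthest-point initialization: macroblocks that
    # move coherently end up in the same cluster (a candidate object).
    c0 = mv[0]
    c1 = mv[np.argmax(np.linalg.norm(mv - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    assign = np.zeros(len(mv), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(mv[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(2):
            if np.any(assign == j):
                centers[j] = mv[assign == j].mean(axis=0)
    return assign

rng = np.random.default_rng(2)
pan = np.array([3.0, 0.0])                        # simulated camera pan
background = pan + rng.normal(0, 0.2, (80, 2))    # static blocks under the pan
moving = pan + np.array([0.0, 4.0]) + rng.normal(0, 0.2, (20, 2))
residual = compensate_camera(np.vstack([background, moving]))
labels = cluster_two(residual)                    # separates object from background
```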
In this paper, we pre-process the input data to remove noise before feeding them into the network and post-process the outputs before reporting the results. Different neural networks, such as back-propagation and radial basis networks with different architectures, are tested, and we choose the one with the best performance. The experiments show that the results of the neural network are similar to those given by experienced doctors and better than those of previous research, indicating that this approach is practical and beneficial to doctors compared with other existing methods.
This paper addresses the problem of building a model for text documents of interest. Specifically, it considers a scenario where a large collection of documents, for example, the result of a search on the Internet, using one of the popular search engines, is given. Each document is indexed by certain keywords or terms. It is assumed that the user has identified a subset of documents that fits the user's needs. The goal is to build a term association model for the documents of interest, so that it can be used either for refining the user search or exported to other search engines/agents for further search of documents of interest. The model built is in the form of a unate Boolean function of the terms or keywords used in the initial search of documents. The proposed document model building algorithm is based on a modified version of the pocket algorithm for perceptron learning and a mapping method for converting neurons into equivalent symbolic representations.
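The pocket algorithm underlying the model builder can be sketched as below: standard perceptron updates, plus a "pocket" that keeps the best weight vector seen so far by training accuracy. The toy data and the plain (unmodified) pocket rule are illustrative; the paper's modified version and its neuron-to-Boolean mapping are not reproduced here.

```python
import numpy as np

def pocket_perceptron(X, y, epochs=50, seed=0):
    # Perceptron learning with a pocket: after each mistake-driven
    # update, keep the weights if they beat the best accuracy so far.
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append bias term
    w = np.zeros(Xb.shape[1])
    pocket_w, pocket_acc = w.copy(), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            if np.sign(Xb[i] @ w) != y[i]:
                w = w + y[i] * Xb[i]             # standard perceptron update
                acc = np.mean(np.sign(Xb @ w) == y)
                if acc > pocket_acc:             # update the pocket
                    pocket_w, pocket_acc = w.copy(), acc
    return pocket_w, pocket_acc

# Toy relevance labels: a document is "of interest" (+1) only when
# both index terms are present, i.e. the unate function t1 AND t2.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w, acc = pocket_perceptron(X, y)
```

For a separable concept like this conjunction, the pocket weights reach perfect training accuracy; the symbolic-extraction step of the paper would then read a unate Boolean expression off the learned weights.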
Color is one of the most widely used features for image similarity retrieval. Most of the existing image similarity retrieval schemes employ either global or local color histogramming. In this paper, we explore the use of localized dominant hue and saturation values for color-based image similarity retrieval. This scheme results in a relatively compact representation of color images for similarity retrieval. Experimental results comparing the proposed representation with global and local color histogramming are presented to show the efficacy of the suggested scheme.
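One way to see why the representation is compact: instead of a full histogram per region, keep only the dominant (hue, saturation) cell. The bin counts, the synthetic pixel cloud, and the cell-center descriptor below are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def dominant_hue_sat(hsv_pixels, hue_bins=18, sat_bins=4):
    # Localized dominant color: the (hue, saturation) histogram cell
    # with the most pixels in a region; H in [0, 360), S in [0, 1].
    h, s = hsv_pixels[:, 0], hsv_pixels[:, 1]
    hist, hedges, sedges = np.histogram2d(
        h, s, bins=[hue_bins, sat_bins], range=[[0, 360], [0, 1]])
    hi, si = np.unravel_index(hist.argmax(), hist.shape)
    # Return the centers of the dominant cell: a two-number
    # descriptor per region instead of a full histogram.
    return (hedges[hi] + hedges[hi + 1]) / 2, (sedges[si] + sedges[si + 1]) / 2

rng = np.random.default_rng(3)
# Region dominated by a saturated blue-ish hue (~220 deg), plus noise.
blue = np.column_stack([rng.normal(220, 5, 300), rng.uniform(0.7, 0.9, 300)])
noise = np.column_stack([rng.uniform(0, 360, 50), rng.uniform(0, 1, 50)])
hue, sat = dominant_hue_sat(np.vstack([blue, noise]))
```

Image similarity can then be computed by comparing these per-region dominant values rather than full local histograms.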
Video content characterization is a challenging problem in video databases. The aim of such characterization is to generate indices that can describe a video clip in terms of objects and their actions in the clip. Generally, such indices are extracted by performing image analysis on the video clips. Many such indices can also be generated by analyzing the embedded audio information of video clips. Indices pertaining to context, scene emotion, and actors or characters present in a video clip appear especially suitable for generation via audio analysis techniques of keyword spotting, and speech and speaker recognition. In this paper, we examine the potential of speaker identification techniques for characterizing video clips in terms of actors present in them. We describe a three-stage processing system consisting of a shot boundary detection stage, an audio classification stage, and a speaker identification stage to determine the presence of different actors in isolated shots. Experimental results using the movie A Few Good Men are presented to show the efficacy of speaker identification for labeling video clips in terms of persons present in them.
Chroma-keying, or blue screen matting, is an important video editing operation. In blue screen matting, everything in the image that has the user-specified level for the blue channel is 'keyed' out and replaced by either another image or a color from a color generator. We develop a technique for blue screen matting that manipulates the image when the information for the image is available only in compressed form, such as a JPEG or MPEG bitstream. Specifically, for the compressed domain approach, we show that the matting process is a convolution operation; hence, we develop a DCT convolution theorem. The DCT convolution theorem can be used to show that the compressed domain approach proposed in this paper provides a significant reduction in computational complexity compared to previously developed approaches, which were mostly ad hoc techniques. The convolution theorem exploits the sparseness as well as the orthogonality of the data available in the DCT domain and thus yields an efficient algorithm for chroma-keying or blue screen matting. The algorithm extends concepts such as the alpha channel and premultiplied alpha to the DCT domain. The method is also extended to the MPEG video domain, wherein we explore the efficiency of the matting process when interframe coding is used.
Scanline algorithms are popular in computer graphics for complex geometric manipulations. The main characteristic of scanline algorithms is that a geometric transformation is decomposed into multipass transforms, with each pass operating only along row or column scanlines. This leads to the conversion of 2-D image manipulation problems into straightforward 1-D problems, resulting in simple and systematic methods. The goal of this work is to examine the scanline approach for manipulation of transform-compressed images without decompressing them. We show how scanline algorithms for rotation and projective mapping can be developed for JPEG/DCT images. The performance of the proposed scanline algorithms is evaluated with respect to quality, speed, and control and memory overhead.
The major problem facing video databases is that of content characterization of video clips once the cut boundaries have been determined. Current efforts in this direction are focused exclusively on the use of pictorial information, thereby neglecting an important supplementary source of content information, the embedded audio or sound track. Current research in audio processing can readily be applied to create many different video indices for use in Video On Demand (VOD), educational video indexing, sports video characterization, etc. MPEG is an emerging video and audio compression standard with rapidly increasing popularity in the multimedia industry, and compressed bit stream processing has gained good recognition among researchers. We have also demonstrated feature extraction in MPEG compressed video, implementing a majority of scene change detection schemes on compressed video. In this paper, we examine the potential of audio information for content characterization by demonstrating the extraction of widely used audio processing features directly from the compressed data stream and their application to video clip classification.
This paper examines the issue of direct extraction of low level features from compressed images. Specifically, we consider the detection of areas of interest and edges in images compressed using the discrete cosine transform (DCT). For interest areas, we show how a measure based on certain DCT coefficients of a block can provide an indication of underlying activity. For edges, we show using an ideal edge model how the relative values of different DCT coefficients of a block can be used to estimate the strength and orientation of an edge. Our experimental results indicate that coarse edge information from compressed images can be extracted up to 20 times faster than conventional edge detectors.
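The two measures described above can be sketched on an 8x8 block. The orthonormal DCT-II basis below matches the JPEG block transform; the AC-energy activity measure and the first-order-coefficient orientation rule are simplified illustrations of the paper's approach, and the test blocks are synthetic.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis, as used for JPEG 8x8 blocks.
    x = np.arange(n)
    D = np.cos((2 * x[None, :] + 1) * x[:, None] * np.pi / (2 * n))
    D *= np.sqrt(2.0 / n)
    D[0] /= np.sqrt(2.0)
    return D

def block_dct(block):
    D = dct_matrix(block.shape[0])
    return D @ block @ D.T

def activity(coeffs):
    # Interest-area measure: energy in the AC coefficients
    # (everything except the DC term).
    ac = coeffs.copy()
    ac[0, 0] = 0.0
    return (ac ** 2).sum()

def edge_orientation(coeffs):
    # Coarse orientation from the two first-order coefficients:
    # C[0,1] responds to horizontal variation (a vertical edge),
    # C[1,0] to vertical variation (a horizontal edge).
    return "vertical" if abs(coeffs[0, 1]) > abs(coeffs[1, 0]) else "horizontal"

flat = np.full((8, 8), 128.0)                               # uniform block
vedge = np.hstack([np.full((8, 4), 50.0), np.full((8, 4), 200.0)])  # vertical edge
```

Because these quantities come straight from the coefficients already stored in the bitstream, no inverse DCT is needed, which is the source of the speedup the abstract reports.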
Many applications require similarity retrieval of an image from a large collection of images. In such cases, image indexing becomes important for efficient organization and retrieval of images. This paper addresses this issue in the context of a database of signature images and describes a system for similarity retrieval and recognition of signature images. The proposed system uses a set of geometric and topological features to map a signature image into two strings of finite symbols. A local associative indexing scheme is then used on the strings to organize and search the signature database. The advantage of local associative indexing is that it is tolerant of missing features and allows queries even with partial signatures. The performance of the system has been tested, with promising results, on a signature database of 120 signatures.
In recent years, databases have evolved from storing pure textual information to storing multimedia information -- text, audio, video, and images. With such databases comes the need for a richer set of search keys that include keywords, shapes, sounds, examples, sketches, color, texture and motion. In this paper we address the problem of image retrieval where keys are object shapes or user sketches. In our scheme, shape features are extracted from each image as it is stored. The image is first segmented and points of high curvature are extracted. Regions surrounding the points of high curvature are used to compute feature values by comparing the regions with a number of references. The references themselves are picked out from the set of orthonormal wavelet basis vectors. An ordered set of distance measures between each local region and the wavelet references form a feature vector. When a user queries the database through a sketch, the feature vectors for high curvature points on the sketch are determined. An efficient nearest neighbor search then yields a set of images which contain objects that match the user's sketch closely. The process is completely automated. Initial experimental results are presented.
One of the challenging problems in video databases is the organization of video information. Segmenting a video into a number of clips and characterizing each clip has been suggested as one mechanism for organizing video information. This approach requires a suitable method to automatically locate cut points in a video. One way of finding such cut points is to determine the boundaries between consecutive camera shots. In this paper, we address this as a statistical hypothesis testing problem and present three tests to determine cut locations. All three tests can be applied directly to the compressed video. This avoids an unnecessary decompression-compression cycle, since it is common to store and transmit digital video in compressed form. As our experimental results indicate, the statistical approach permits accurate detection of scene changes induced through straight as well as optical cuts.
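The hypothesis-testing view can be sketched as follows: within a shot, successive-frame distances are treated as samples from a "no change" distribution, and a cut is declared when a distance is improbably large under that distribution. The histogram statistic, the mean-plus-k-sigma threshold, and the synthetic frames are illustrative assumptions, not the paper's three tests.

```python
import numpy as np

def histogram_distance(f1, f2, bins=16):
    # Normalized L1 distance between intensity histograms of
    # consecutive frames: the test statistic.
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    return np.abs(h1 - h2).sum() / f1.size

def detect_cuts(frames, k=5.0, warmup=5):
    # Declare a cut when a distance exceeds mean + k * std of the
    # distances observed so far (the null, within-shot distribution).
    dists = [histogram_distance(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    cuts = []
    for i in range(warmup, len(dists)):
        prior = np.array(dists[:i])
        if dists[i] > prior.mean() + k * prior.std() + 1e-9:
            cuts.append(i + 1)   # cut falls between frames i and i+1
    return cuts

rng = np.random.default_rng(4)
shot_a = [rng.normal(80, 10, (32, 32)).clip(0, 255) for _ in range(10)]
shot_b = [rng.normal(180, 10, (32, 32)).clip(0, 255) for _ in range(10)]
cuts = detect_cuts(shot_a + shot_b)   # should flag the boundary at frame 10
```

In the compressed-domain setting of the paper, the statistic would be computed from quantities available without full decompression (e.g. DC coefficients) rather than from decoded pixels.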
Many target tracking methods rely on the use of centroids as feature points. In several applications, such as the tracking of vehicles on roadways, the appropriate target paths are known a priori. Based on this observation, an efficient centroid computation method is presented and analyzed in this paper.
This paper presents a generalization of the correspondence approach of Sethi and Jain by extending the path coherence criterion to a high-dimensional vector space. This allows the same correspondence procedure to be used for a variety of tokens, including points, lines, planes, and regions. To demonstrate the generalized approach, we apply it to track lines and present experimental results.
The problem of determining the position and orientation of a mobile robot has been addressed by several researchers using sensors of different modalities, including video cameras. Invariably, the vision-based approaches to robot localization assume that the camera is mounted on the robot and that the robot's working environment contains prominent landmarks at known locations. In this paper, we propose a robot localization scheme in which the robot itself serves as the landmark for cameras positioned in the environment to cover the entire work area of the robot. Although the proposed approach is applicable to robots of any regular shape, we develop the solution to the localization problem by assuming a cylindrical shape for the robot. A complete mathematical analysis of the localization problem is given by extending the three-dimensional structure-from-rotational-motion approach to the present task. We also examine the implementation issues of the proposed approach and present experimental results to show its effectiveness.