Many surveillance and security monitoring videos are long and of low quality. Moreover, reviewing these videos and extracting anomalous events from them is a lengthy and manually intensive process. In this paper, we present two efficient saliency-based anomaly detection algorithms for detecting anomalous events in low-quality videos. The events’ start times and durations are saved in a video summary for later review. The video summary is very short: for example, we have summarized a 14-minute video into a 16-second video summary. Extensive evaluations clearly demonstrated the feasibility of the two algorithms. A user-friendly software tool has also been developed to help human operators review and confirm those events.
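As a minimal sketch of how event start times and durations might be derived, the snippet below assumes that a detector has already produced one anomaly score per frame. The function name `extract_events`, the score list, and the fixed threshold are hypothetical and are not taken from the paper.

```python
def extract_events(scores, fps, threshold):
    """Turn per-frame anomaly scores into (start_time_s, duration_s) events.

    Assumption: a frame is anomalous when its score exceeds `threshold`;
    consecutive anomalous frames form one event.
    """
    events = []
    start = None  # index of the first frame of the event currently open
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            events.append((start / fps, (i - start) / fps))
            start = None
    if start is not None:  # event still open at the end of the video
        events.append((start / fps, (len(scores) - start) / fps))
    return events


def summary_duration(events):
    """Total length in seconds of a summary that keeps only the events."""
    return sum(duration for _, duration in events)
```

Summing the event durations gives the length of the resulting summary, which is the quantity the 14-minute to 16-second example refers to.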
Diabetic retinopathy (DR) is a consequence of diabetes and is the leading cause of blindness among 18- to 65-year-old adults. Regular screening is critical to early detection and treatment of DR. Computer-aided diagnosis has the potential of improving the practice in DR screening or diagnosis. An automated and unsupervised approach for retrieving clinically relevant images from a set of previously diagnosed fundus camera images for improving the efficiency of screening and diagnosis of DR is presented. Considering that DR lesions are often localized, we propose a multiclass multiple-instance framework for the retrieval task. Considering the special visual properties of DR images, we develop a feature space of a modified color correlogram appended with statistics of steerable Gaussian filter responses selected by fast radial symmetric transform points. Experiments with real DR images collected from five different datasets demonstrate that the proposed approach is able to outperform existing methods.
With the advent of progressive format display and broadcast technologies, video deinterlacing has become an important video-processing technique. Numerous approaches exist in the literature to accomplish deinterlacing. While most earlier methods were simple linear filtering-based approaches, the emergence of faster computing technologies and even dedicated video-processing hardware in display units has allowed higher quality but also more computationally intense deinterlacing algorithms to become practical. Most modern approaches analyze motion and content in video to select different deinterlacing methods for various spatiotemporal regions. We introduce a family of deinterlacers that employs spectral residue to choose between and weight control grid interpolation based spatial and temporal deinterlacing methods. The proposed approaches perform better than the prior state-of-the-art based on peak signal-to-noise ratio, other visual quality metrics, and simple perception-based subjective evaluations conducted by human viewers. We further study the advantages of using soft and hard decision thresholds on the visual performance.
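The following sketch illustrates one way a spectral-residue map could weight spatial and temporal estimates. It assumes that "spectral residue" refers to the well-known spectral residual saliency computation (log-amplitude spectrum minus its local average, recombined with the original phase), and it assumes that salient regions favor the spatial estimate. Both assumptions, and the names `spectral_residual_saliency` and `blend_fields`, are illustrative and not confirmed by the abstract. The usual post-smoothing of the saliency map is omitted.

```python
import numpy as np


def _box3(a):
    """3x3 mean filter with circular wrap-around (adequate for a spectrum)."""
    shifts = (-1, 0, 1)
    return sum(np.roll(np.roll(a, dy, 0), dx, 1) for dy in shifts for dx in shifts) / 9.0


def spectral_residual_saliency(img, eps=1e-8):
    """Saliency map in [0, 1] computed from the spectral residual of `img`."""
    f = np.fft.fft2(np.asarray(img, dtype=float))
    log_amp = np.log(np.abs(f) + eps)
    phase = np.angle(f)
    residual = log_amp - _box3(log_amp)  # the part of the spectrum that is "unexpected"
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + eps)


def blend_fields(spatial, temporal, sal, threshold=0.5, soft=True):
    """Combine spatially and temporally interpolated estimates of missing lines.

    soft=True  : saliency is used directly as a per-pixel weight.
    soft=False : hard decision, spatial where sal > threshold, temporal elsewhere.
    """
    w = sal if soft else (sal > threshold).astype(float)
    return w * spatial + (1.0 - w) * temporal
```

The `soft` flag mirrors the soft versus hard decision thresholds mentioned above: a soft decision mixes the two estimates, while a hard decision selects exactly one per pixel.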
We propose a method for defect detection based on the sign information of Walsh-Hadamard Transform (WHT) coefficients. The core of the proposed algorithm involves only three steps, all of which can be implemented very efficiently: applying the forward WHT, taking the sign of the transform coefficients, and taking an inverse WHT using only the sign information. Our implementation takes only 7 milliseconds for a 512 × 512 image on a PC platform. As a result, the proposed method is more efficient than the PHase Only Transform (PHOT) method and other methods in the literature. In addition, the proposed approach can detect defects of varying shapes by combining the 2-dimensional and 1-dimensional WHT, and it can detect defects in images with strong object boundaries by utilizing a reference image. The proposed algorithm is robust to different background image patterns and varying illumination conditions. We evaluated the proposed method both visually and quantitatively and obtained good results on images from various defect detection applications.
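The three core steps can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, image sides are assumed to be powers of two, and the final squaring and z-score threshold are assumed post-processing choices that the abstract does not specify. The 1-dimensional WHT combination and the reference-image variant are not shown.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    if n < 1 or n & (n - 1):
        raise ValueError("length along the last axis must be a power of two")
    h = 1
    while h < n:
        # Butterfly: pair element j with element j + h inside blocks of 2h.
        y = x.reshape(x.shape[:-1] + (n // (2 * h), 2, h))
        top, bottom = y[..., 0, :], y[..., 1, :]
        x = np.stack([top + bottom, top - bottom], axis=-2).reshape(x.shape)
        h *= 2
    return x


def wht2(img):
    """Separable 2-D WHT: transform along rows, then along columns."""
    return fwht(fwht(img).T).T


def sign_only_map(img):
    """Forward WHT, keep only coefficient signs, inverse WHT, then square."""
    signs = np.sign(wht2(img))
    recon = wht2(signs) / signs.size  # the unnormalized WHT is its own inverse up to 1/N
    return recon ** 2


def detect_defects(img, k=3.0):
    """Flag pixels whose response exceeds the mean by k standard deviations (assumed rule)."""
    m = sign_only_map(img)
    z = (m - m.mean()) / (m.std() + 1e-12)
    return z > k
```

Discarding the magnitudes suppresses the few large coefficients that encode a regular background, so an isolated irregularity dominates the reconstruction.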
Diabetic retinopathy (DR) is a vision-threatening complication of diabetes mellitus, a medical condition that is on the rise globally. Unfortunately, many patients are unaware of this complication because of the absence of symptoms. Regular screening for DR is necessary to detect the condition in time for treatment. Content-based image retrieval using archived and diagnosed fundus (retinal) camera DR images can improve the efficiency of DR screening. This content-based image retrieval study focuses on two DR clinical findings, microaneurysm and neovascularization, which are clinical signs of non-proliferative and proliferative diabetic retinopathy, respectively. The authors propose a multi-class multiple-instance image retrieval framework that deploys a modified color correlogram and statistics of steerable Gaussian filter responses for retrieving clinically relevant images from a database of DR fundus images.
We propose a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories: microaneurysm (MA), neovascularization (NV), and normal. It then retrieves DR images that are clinically relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes, and it maps every bag to a point in a new multi-class bag feature space. Finally, a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
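The bag mapping and the retrieval stage can be sketched as follows. The abstract does not give the form of the nonlinear mapping, so this sketch assumes a common choice in which each coordinate of the bag feature is the minimum distance from any instance in the bag to one prototype. Prototype learning and the SVM classifier are not shown, and the names `bag_to_point` and `retrieve_top_k` are hypothetical.

```python
import numpy as np


def bag_to_point(bag, prototypes):
    """Map a bag of instance features (n, d) to a point with one coordinate per prototype.

    Assumed mapping: coordinate j is the smallest Euclidean distance between
    any instance in the bag and prototype j.
    """
    bag = np.asarray(bag, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    dist = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)  # (n, p)
    return dist.min(axis=0)


def retrieve_top_k(query_point, query_label, db_points, db_labels, k=3):
    """Indices of the k nearest database images that share the query's predicted label."""
    db_points = np.asarray(db_points, dtype=float)
    db_labels = np.asarray(db_labels)
    candidates = np.flatnonzero(db_labels == query_label)
    dist = np.linalg.norm(db_points[candidates] - np.asarray(query_point, dtype=float), axis=1)
    order = np.argsort(dist, kind="stable")[:k]
    return candidates[order].tolist()
```

Restricting the candidates to the predicted label before ranking follows the two-stage description above: classification narrows the search, and nearest-neighbor distance in the bag feature space orders the results.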
In the visual tracking domain, Particle Filtering (PF) can become quite inefficient when applied to a high-dimensional state space. Rao-Blackwellisation has been shown to be an effective method for reducing the size of the state space by marginalizing out some of the variables analytically. In this paper, building on our previous work, we propose a Rao-Blackwellised Particle Filter (RBPF) tracking algorithm with an adaptive system noise model. Experiments using both simulated and real data show that the proposed RBPF algorithm with adaptive noise variance performs significantly better than the conventional Particle Filter tracking algorithm. The improvements manifest in three aspects: increased estimation accuracy, reduced variance of the estimates, and fewer particles needed to achieve the same level of accuracy.
One of the major challenges facing current media management systems and the related applications is the so-called “semantic gap” between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.
We propose a framework for event detection and summary generation in football broadcast video. First, we formulate summarization as a play detection problem, with a play being defined as the most basic segment of time during which the ball is being played. Then we propose both deterministic and probabilistic approaches to the detection of the plays. The detected plays are concatenated to generate a compact, time-compressed summary of the original video. Such a summary is complete in the sense that it contains every meaningful action of the underlying game, and it also serves as a much better starting point for higher-level summarization and other analyses than the original video does. Based on the summary, we also propose an audio-based hierarchical summarization method. Experimental results show that the proposed methods work very well on consumer-grade platforms.
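The concatenation step can be illustrated with a small sketch. It assumes that play detection has already produced (start, end) times in seconds. The function names and the merge rule for overlapping detections are hypothetical and are not described in the abstract.

```python
def merge_plays(plays, gap=0.0):
    """Sort detected plays and merge any that overlap or lie within `gap` seconds."""
    merged = []
    for start, end in sorted(plays):
        if merged and start <= merged[-1][1] + gap:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summary_stats(plays, video_length):
    """Return (summary length in seconds, fraction of the original video retained)."""
    length = sum(end - start for start, end in merge_plays(plays))
    return length, length / video_length
```

The merged segment list is what a player would step through to show the time-compressed summary.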
A hybrid algorithm is proposed for very low bit-rate video compression. The algorithm uses a new wavelet-based coder for intraframe compression and the DCT for interframe compression. The wavelet coding technique, known as OBTWC (Overlapped Block Transform Wavelet Coder), consists of three steps. First, a set of overlapped block transforms is used to transform the image data into 8 × 8 blocks in the frequency domain. Second, a mapping is performed to convert the transformed image into a multiresolution representation that resembles the zero-tree wavelet transform. Third, the multiresolution representation is coded by a conventional dyadic wavelet coder, which basically truncates the high-frequency content in a very efficient manner. Our proposed method essentially combines the advantages of both block transform and wavelet coding techniques while eliminating their respective weaknesses. Simulation results show that the coder achieves a compression ratio of more than 300:1 at a frame rate of 10 frames per second.
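The second step, mapping block-transform coefficients into a multiresolution layout, can be sketched as below. For simplicity the sketch substitutes a plain non-overlapped 8 × 8 DCT for the overlapped block transform, and it uses a straightforward regrouping in which coefficient (u, v) of every block is gathered into its own subband. Both are assumptions made for illustration, and the function names are hypothetical. The dyadic wavelet coding step is not shown.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def block_transform(img, n=8):
    """Apply an n x n DCT to each non-overlapping block; returns shape (Bh, Bw, n, n)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    if h % n or w % n:
        raise ValueError("image sides must be multiples of the block size")
    c = dct_matrix(n)
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    return c @ blocks @ c.T


def to_subbands(coef):
    """Regroup so that coefficient (u, v) of block (i, j) lands at (u*Bh + i, v*Bw + j)."""
    bh, bw, n, _ = coef.shape
    return coef.transpose(2, 0, 3, 1).reshape(n * bh, n * bw)
```

After regrouping, the top-left Bh × Bw region holds the DC terms of all blocks, which form a low-resolution version of the image, and higher frequencies sit progressively further from that corner, much like the subbands of a wavelet decomposition.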
A new wavelet-based image coder is proposed for SAR image compression. The coding technique, known as OBTWC (Overlapped Block Transform Wavelet Coder), consists of three steps. First, a set of overlapped block transforms is used to transform the image data into 8 × 8 blocks in the frequency domain. Second, a mapping is performed to convert the transformed image into a multiresolution representation that resembles the zero-tree wavelet transform. Third, the multiresolution representation is coded by a conventional dyadic wavelet coder, which basically truncates the high-frequency content in a very efficient manner. Our proposed method essentially combines the advantages of both block transform and wavelet coding techniques while eliminating their respective weaknesses. The image compression algorithm was applied to SAR images supplied by the Air Force, the Army, and NASA. The compression performance in terms of Peak Signal-to-Noise Ratio is better than that of a commercial wavelet coder on the market.
This paper presents an empirical evaluation of a number of recently developed Automatic Target Recognition algorithms for Forward-Looking InfraRed (FLIR) imagery using a large database of real second-generation FLIR images. The algorithms evaluated are based on convolutional neural networks (CNN), principal component analysis (PCA), linear discriminant analysis (LDA), learning vector quantization (LVQ), and modular neural networks (MNN). Two model-based algorithms, using Hausdorff metric-based matching and geometric hashing, are also evaluated. A hierarchical pose estimation system using CNN plus either PCA or LDA, developed by the authors, is also evaluated using the same data set.