KEYWORDS: Video, Image segmentation, Video surveillance, Video acceleration, Convolution, Picosecond phenomena, Neural networks, Video compression, Optical flow, Image processing algorithms and systems
We propose a three-dimensional video segmentation method using deep convolutional neural networks. The algorithm utilizes the local gradient computed at each pixel location, together with a global boundary map acquired through deep learning, to generate initial pixel groups by traversing from low- to high-gradient regions. A local clustering method is then employed to refine these initial pixel groups. The refined subvolumes in the homogeneous regions of the video are selected as initial seeds and iteratively combined with adjacent groups based on intensity similarities. The volume growth is terminated at the color boundaries of the video. The oversegments obtained from the above steps are then merged hierarchically by a multivariate approach, yielding a final segmentation map for each frame. The results show that our proposed methodology compares favorably, both qualitatively and quantitatively, in segmentation quality and computational efficiency with the latest state-of-the-art techniques on the video segmentation benchmark dataset.
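The low-to-high-gradient traversal that produces the initial pixel groups can be sketched in a few lines. This is an illustrative toy version on a single grayscale frame, not the authors' implementation: pixels are visited in increasing gradient order, inherit the label of an already-labelled 4-neighbour, and otherwise seed a new region in a flat area.

```python
import numpy as np

def grow_regions(frame):
    """Traverse pixels from low to high gradient; inherit the label of an
    already-labelled 4-neighbour, or start a new region in a flat area.
    A compact stand-in for the gradient-guided region growth (sketch only)."""
    gy, gx = np.gradient(frame.astype(float))
    grad = np.hypot(gx, gy)
    h, w = frame.shape
    order = np.argsort(grad, axis=None, kind="stable")  # low-gradient pixels first
    labels = np.zeros((h, w), dtype=int)
    next_label = 1
    for flat in order:
        r, c = divmod(int(flat), w)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc]:
                labels[r, c] = labels[nr, nc]           # join an existing region
                break
        else:
            labels[r, c] = next_label                   # new seed in a flat area
            next_label += 1
    return labels
```

On a frame with two flat halves, the two homogeneous regions grow outward and meet at the intensity boundary, where growth terminates.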
Aerial images acquired by multiple sensors provide comprehensive and diverse information about materials and objects within a surveyed area. The current use of pretrained deep convolutional neural networks (DCNNs) is usually constrained to three-band images (i.e., RGB) obtained from a single optical sensor. Additional spectral bands from a multiple-sensor setup introduce challenges for the use of DCNNs. We fuse the RGB feature information obtained from a deep learning framework with light detection and ranging (LiDAR) features to obtain semantic labeling. Specifically, we propose a decision-level multisensor fusion technique for semantic labeling of very-high-resolution optical imagery and LiDAR data. Our approach first obtains initial probabilistic predictions from two different sources: one from a pretrained neural network fine-tuned on a three-band optical image, and another from a probabilistic classifier trained on LiDAR data. These two predictions are then combined as the unary potential in a higher-order conditional random field (CRF) framework, which resolves fusion ambiguities by exploiting spatial–contextual information. We utilize graph cut to efficiently infer the final semantic labeling for our proposed higher-order CRF framework. Experiments performed on three benchmarking multisensor datasets demonstrate the performance advantages of our proposed method.
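The decision-level combination of the two probabilistic predictions into a unary potential can be sketched as follows. The mixing weight `w` is a hypothetical parameter, and the per-pixel argmin shown here is a degenerate stand-in for inference; the paper's higher-order CRF terms and graph-cut inference are omitted.

```python
import numpy as np

def fused_unary(p_optical, p_lidar, w=0.5):
    """Blend per-pixel class posteriors from the optical DCNN branch and
    the LiDAR classifier, then take the negative log of the mixture as
    the unary potential. w is an assumed mixing weight (sketch only)."""
    eps = 1e-12
    mixed = w * p_optical + (1.0 - w) * p_lidar   # shape (H, W, n_classes)
    return -np.log(mixed + eps)

def map_labels(unary):
    """Degenerate inference with no pairwise/higher-order terms:
    pick the minimum-cost class independently at each pixel."""
    return unary.argmin(axis=-1)
```

Where the two sensors agree the blended posterior is sharp; where they disagree, the full CRF would use spatial context to break the tie.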
In this paper, we explore the use of two machine learning algorithms for the land cover classification of multisensor remotely sensed images: (a) random forest for structured labels and (b) a fully convolutional neural network. In the random forest algorithm, individual decision trees are trained on features obtained from image patches and the corresponding patch labels. Structural information present in the image patches improves the classification performance compared to utilizing pixel features alone. The random forest method was trained and evaluated on the ISPRS Vaihingen dataset, which consists of true orthophoto (TOP: near-IR, R, G) and digital surface model (DSM) data. The method achieves an overall accuracy of 86.3% on the test dataset. We also show qualitative results on a SAR image. In addition, we employ a fully convolutional network (FCN) framework to perform pixel-wise classification of the above multisensor image. The TOP and DSM data have individual convolutional layers, with features fused before the fully convolutional layers. When evaluated on the Vaihingen dataset, the network achieves an overall classification accuracy of 88%.
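The per-patch feature extraction that feeds the random forest can be sketched as below. The window size is illustrative, the features are simply the raw TOP and DSM values of each patch concatenated, and the forest itself (and its structured patch labels) is omitted; this is not the authors' exact feature set.

```python
import numpy as np

def patch_features(top, dsm, size=3):
    """Slide a size x size window over the co-registered TOP image and DSM
    and emit one feature vector per patch: pixel values from both sensors,
    flattened and concatenated. These vectors, paired with per-patch label
    grids, would train a random forest for structured labels (sketch only)."""
    h, w = top.shape
    feats = []
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            top_patch = top[r:r + size, c:c + size].ravel()
            dsm_patch = dsm[r:r + size, c:c + size].ravel()
            feats.append(np.concatenate([top_patch, dsm_patch]))
    return np.asarray(feats)
```

Each row of the result is one training sample; the analogous FCN design instead learns the fusion, with separate convolutional streams for TOP and DSM merged before the final layers.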
We propose an unsupervised algorithm that utilizes information derived from spectral, gradient, and textural attributes for spatially segmenting multi/hyperspectral remotely sensed imagery. Our methodology commences by determining the magnitude of spectral intensity variations across the input scene, using a multiband gradient detection scheme optimized for handling remotely sensed image data. The resultant gradient map is employed in a dynamic region growth process that is initiated at pixel locations with small gradient magnitudes and concluded at sites with large gradient magnitudes, yielding a map comprising an initial set of regions. This region map is combined with several co-occurrence matrix-derived textural descriptors, along with intensity and gradient features, in a multivariate analysis-based region merging procedure that fuses regions with similar characteristics to yield the final segmentation output. Our approach was tested on several multi/hyperspectral datasets, and the results show a favorable performance in comparison with state-of-the-art techniques.
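The co-occurrence matrix-derived textural descriptors used in the merging stage can be illustrated with a minimal grey-level co-occurrence matrix (GLCM) for one quantized band and a single displacement. The number of levels and the displacement are assumptions; the paper does not specify which Haralick-style descriptors are used, so contrast and energy serve as representative examples.

```python
import numpy as np

def glcm_features(band, levels=8, dx=1, dy=0):
    """Build a grey-level co-occurrence matrix for one band quantized to
    `levels` grey levels and displacement (dx, dy), then return two classic
    texture descriptors: contrast and energy (illustrative sketch only)."""
    q = np.floor(band.astype(float) / (band.max() + 1e-9) * levels)
    q = np.clip(q, 0, levels - 1).astype(int)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for r in range(h - dy):
        for c in range(w - dx):
            glcm[q[r, c], q[r + dy, c + dx]] += 1   # count co-occurring pairs
    glcm /= glcm.sum()                               # normalize to probabilities
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()           # high for busy texture
    energy = (glcm ** 2).sum()                       # high for uniform texture
    return contrast, energy
```

A perfectly uniform region gives zero contrast and maximal energy, which is exactly the signature that lets the merging stage fuse homogeneous neighbours.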
This paper proposes a two-stage algorithm for streaming video segmentation. In the first stage, shot boundaries are detected within a window of frames by comparing the dissimilarity between 2-D segmentations of each frame. In the second stage, the 2-D segments are propagated across the window of frames in both the spatial and temporal directions. The window is moved across the video to find all shot transitions and obtain spatio-temporal segments simultaneously. As opposed to techniques that operate on the entire video, the proposed approach consumes significantly less memory and enables segmentation of lengthy videos. We tested our segmentation-based shot detection method on the TRECVID 2007 video dataset and compared it with a block-based technique. Cut detection results on the TRECVID 2007 dataset indicate that our algorithm is comparable to the best of the block-based methods. The streaming video segmentation routine also achieves promising results on a challenging video segmentation benchmark database.
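The windowed cut detection can be sketched as follows. The paper compares dissimilarities between per-frame 2-D segmentations; in this toy version, grey-level histograms stand in for those segment descriptions, and the window size and threshold are assumed values.

```python
import numpy as np

def shot_boundaries(frames, window=5, thresh=0.5):
    """Process the stream one window at a time (never the whole video),
    flagging a cut wherever the dissimilarity between consecutive frames'
    descriptions spikes. Histograms stand in for 2-D segmentations here."""
    def hist(f):
        h, _ = np.histogram(f, bins=16, range=(0, 256))
        return h / h.sum()
    cuts = []
    for start in range(0, len(frames) - 1, window):   # sliding window
        chunk = frames[start:start + window + 1]
        for i in range(len(chunk) - 1):
            # Total-variation distance between normalized histograms.
            d = 0.5 * np.abs(hist(chunk[i]) - hist(chunk[i + 1])).sum()
            if d > thresh:
                cuts.append(start + i + 1)
    return cuts
```

Because only one window of frames is held in memory at a time, the same loop structure supports the streaming propagation of segments across window boundaries.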
In this paper, we present an Edge Directed Super Resolution (EDSR) technique for grayscale and color images. The proposed algorithm is a multiple-pass iterative algorithm aimed at producing better-defined images with sharper edges. The basic premise behind this algorithm is interpolating along the edge direction, thus reducing the blur introduced by traditional interpolation techniques, which in some instances operate across the edge. To this end, horizontal and vertical gradients, derived from the input reference image resized to the target resolution, are utilized to generate an edge direction map, which in turn is quantized into four discrete directions. The process then utilizes multiple input images, shifted by sub-pixel amounts, to yield a single higher-resolution image. A cross-correlation-based registration approach determines the relative shifts between the frames. In the case of color images, the edge-directed super-resolution algorithm is applied to the L* channel of the L*a*b* color space, since most of the edge information is concentrated in that channel. The two color-difference channels, a* and b*, are resized to a higher resolution using a conventional bicubic interpolation approach. The algorithm was applied to grayscale and color images and showed favorable results on a wide variety of datasets, ranging from printing to surveillance to regular consumer photography.
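The quantization of the edge direction map into four discrete directions can be sketched as below. This computes the gradient orientation and bins it at 45° intervals; the interpolation step then proceeds perpendicular to the gradient, i.e., along the edge. The registration and multi-frame fusion stages are omitted, and the binning convention is an assumption.

```python
import numpy as np

def quantize_edge_directions(image):
    """Gradient orientation per pixel, quantized to four 45-degree bins
    (0: 0 deg, 1: 45 deg, 2: 90 deg, 3: 135 deg). The edge runs
    perpendicular to the gradient, so interpolation follows the
    orthogonal direction of each bin (illustrative sketch only)."""
    gy, gx = np.gradient(image.astype(float))
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0   # orientation in [0, 180)
    return np.round(theta / 45.0).astype(int) % 4
```

A horizontal step edge yields a vertical gradient (bin 2) and a vertical step edge a horizontal gradient (bin 0), so the map distinguishes the four cases the interpolator needs.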
Measurement of glucose concentration is important for the diagnosis and treatment of diabetes mellitus and other medical conditions. This paper describes a novel image-processing-based approach for measuring glucose concentration. A fluid drop (patient sample) is placed on a thin-film slide. Glucose present in the sample reacts with reagents on the slide to produce a color dye. The color intensity of the dye formed varies with the glucose concentration level. Current methods use spectrophotometry to determine the glucose level of the sample. Our proposed algorithm uses an image of the slide, captured at a specific wavelength, to automatically determine the glucose concentration. The algorithm consists of two phases: training and testing. The training datasets consist of images at different concentration levels. The dye-occupied image region is first segmented using a Hough-based technique, and an intensity-based feature is then calculated from the segmented region. Subsequently, a mathematical model describing the relationship between the generated feature values and the given concentrations is obtained. During testing, the dye region of a test slide image is segmented, followed by feature extraction; these two initial steps are the same as in training. In the final step, however, the algorithm uses the model (feature vs. concentration) obtained from training and the feature generated from the test image to predict the unknown concentration. The performance of the image-based analysis was compared with that of a standard glucose analyzer.
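The train-then-predict structure of the feature-vs.-concentration model can be sketched as below. The paper only states that a mathematical model is fit, so the polynomial form (and its degree) is an assumption; the Hough-based segmentation and feature extraction steps are represented here only by their scalar output.

```python
import numpy as np

def fit_model(features, concentrations, degree=2):
    """Training phase: fit a polynomial mapping the intensity-based dye
    feature to the known glucose concentrations. The polynomial form is
    an assumed model; the paper does not specify the functional form."""
    return np.polyfit(features, concentrations, degree)

def predict(model, feature):
    """Testing phase: evaluate the trained model on a new slide's feature
    to estimate the unknown concentration."""
    return float(np.polyval(model, feature))
```

Given calibration slides whose features vary monotonically with concentration, a test slide's feature then maps directly to a concentration estimate.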