In this paper, we propose two classes of algorithms for modeling camera motion in video sequences. The first class applies in situations where there is no camera translation and the camera motion can be adequately modeled by zoom, pan, and rotation parameters. The second class is more general: it applies when the camera undergoes translational motion as well as rotation, zoom, and pan. These algorithms are then applied to two problems: predictive video coding and three-dimensional (3-D) scene reconstruction. In predictive coding, the 3-D camera motion is estimated from a sequence of frames and used to predict the apparent change in successive frames. In 3-D scene reconstruction, a 3-D object is `scanned' by translation/rotation motion of the camera along a pre-specified path. The information in successive frames is then used to recover depth at a few select locations. The depth and intensity information at these locations is then used to recover the intensity at intermediate points on the scanning path. Experimental results on both applications are shown.
This paper describes an algorithm which, starting from the video signal, permits the user to recover the useful information and to remove the false information introduced by the movements or vibrations of the video sensor. Provided that some hypotheses on the exposure time and on the vibration spectrum are satisfied, a video sensor subjected to vibrations sees a static, or slowly moving, scene as a succession of frames which are identical, except that each one is shifted with respect to the previous one. The electronic stabilization process finds the shift between the current frame and the previous one and separates the component due to vibration from valid movements of the target or the sensor. A stable output image can thus be obtained by shifting the output picture by the estimated amounts in the opposite direction. The electronic stabilization process finds application both for TV cameras and IR sensors.
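The shift-and-compensate loop this abstract describes can be sketched as follows. The abstract does not name the shift-estimation method, so phase correlation is used here purely as an illustrative assumption, and `estimate_shift` and `stabilize` are hypothetical names:

```python
import numpy as np

def estimate_shift(ref, cur):
    # Phase correlation: the peak of the inverse FFT of the normalized
    # cross-power spectrum marks the translation between the two frames.
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(cur)
    cps = F1 * np.conj(F2)
    cps /= np.abs(cps) + 1e-12
    corr = np.abs(np.fft.ifft2(cps))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around peak locations to signed shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

def stabilize(cur, dy, dx):
    # Shift the frame back by the estimated amount (circular, for simplicity).
    return np.roll(cur, (dy, dx), axis=(0, 1))
```

In a real stabilizer the compensating shift would crop or pad rather than wrap, and a motion model would distinguish deliberate panning from vibration.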
A contrast-enhancement algorithm has been developed that avoids the often `flat,' noisy appearance produced by techniques that rely solely on the quantization properties of the input image. The algorithm is based on the theory of intensity-dependent spatial summation (IDS), first reported by Cornsweet and Yellott (1985). The initial intent of their paper was to demonstrate the theory's ability to replicate certain well-known effects in human vision, such as Mach bands, Ricco's Law, and Weber's Law. Implementations of the model as an image enhancement routine have predominantly emphasized its ability to enhance edges. We have modified the IDS model to produce enhancement not only at the edges but throughout the entire image profile. Enhancement is achieved by minimizing the size of the point-spread functions generated over the input. Dramatic contrast gain is realized while limiting the amount of blur introduced to the output. In contrast to histogram equalization, which may collapse quantization levels while balancing their distribution, and to IDS, which drives the quantization levels of the image towards a uniform level (except at the edges), our algorithm preserves the original brightness gradations as it expands them. Thus, the shape-from-shading cues so important to the perception of form and contour are both preserved and enhanced.
We consider the sampling of band-limited spatio-temporal signals subject to the time-sequential (TS) constraint that only one spatial sample can be taken at a given time. Using the powerful techniques of lattice theory, we develop a new unifying theory linking TS sampling with generalized multidimensional sampling. The theory, which applies to arbitrary spatial and spectral supports in any dimension, includes tight fundamental performance bounds, which are asymptotically achievable. Simple graphical design algorithms are provided for lattice sampling patterns minimizing the sampling rates, and illustrated by examples.
As a time-sequential and Bayesian front-end for image sequence processing, we consider the square root information (SRI) realization of the Kalman filter. The computational complexity of the filter due to the dimension of the problem -- the size of the state vector is on the order of the number of pixels in the image frame -- is decreased drastically using a reduced-order approximation exploiting the natural spatial locality in the random field specifications. The actual computation for the reduced-order SRI filter is performed by an iterative and distributed algorithm for the unitary transformation steps, providing a potentially faster alternative to the common QR factorization-based methods. For space-time estimation problems, near-optimal solutions can be obtained in a small number of iterations (e.g., fewer than 10), and each iteration can be performed in a finely parallel manner over the image frame, an attractive feature for a dedicated hardware implementation.
In this paper we address the following two problems: (1) restoration of noisy and blurred progressively scanned image sequences, and (2) simultaneous restoration and de-interlacing of noisy and blurred interlaced image sequences. De-interlacing refers to the conversion of an interlaced image sequence to a progressive one. We first formulate a Kalman filtering algorithm for image sequence restoration, and then apply this algorithm to the simultaneous de-interlacing and restoration problem. To use Kalman filtering for image sequence restoration effectively, the temporal information contained in the image sequence needs to be incorporated into the problem formulation. One method of quantifying the new information inherent in an image sequence is to estimate the motion between successive frames of the sequence. We use this method, and incorporate the motion information into the Kalman filter through the observation equation. To restore frame k of an image sequence, an autoregressive (AR) model for the kth frame is used, along with observations coming from both frame k and a previously restored and motion-compensated frame k-1.
This paper proposes a novel image restoration approach by a neural network with hierarchical clustered architecture (NNHCA). The method is motivated by a universally accepted concept in digital image processing, that natural image formation is a local process, and accordingly treats image restoration as a globally coordinated local process. The image restoration based on the local model is realized by NNHCA, one of the recently emerged neural networks with sophisticated architectures. In the restoration application, NNHCA consists of four processing levels. The zeroth level represents individual processing units. The first level simulates the optimization process which governs local restoration. The second level acts as a link for information exchange between local clusters in the first level. The third level coordinates the complete process.
This paper is concerned with the development of two-dimensional (2-D) adaptive filtering using the block diagonal least mean squares (BDLMS) method. In this adaptive filtering scheme, the image is scanned and processed block by block in a diagonal fashion, and the filter weights are adjusted once per block rather than once per pixel. As a result, this method takes into account the correlations of the pixels along both the vertical and horizontal directions. The properties of this 2-D adaptive filter, including convergence behavior, adaptation speed, and adaptation accuracy, have been studied. Simulation results are presented which indicate the effectiveness of the 2-D BDLMS method when used for image enhancement and estimation applications.
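The once-per-block weight update can be illustrated with a minimal 2-D block LMS sketch. This is an assumption-laden simplification: the blocks are visited in row-major order here rather than the paper's diagonal scan, and `bdlms_filter` with its parameters is a hypothetical construction, not the paper's algorithm:

```python
import numpy as np

def bdlms_filter(x, d, block=8, mu=0.01, taps=3):
    # x: input image, d: desired (training) image.
    # One taps-by-taps weight matrix, updated once per block (block LMS),
    # with the gradient averaged over all pixels of the block.
    H, W = x.shape
    w = np.zeros((taps, taps))
    w[taps // 2, taps // 2] = 1.0          # start from the identity filter
    pad = taps // 2
    xp = np.pad(x, pad, mode='edge')
    y = np.zeros_like(x, dtype=float)
    for bi in range(0, H, block):          # NOTE: row-major block order here;
        for bj in range(0, W, block):      # the paper scans blocks diagonally
            grad = np.zeros_like(w)
            for i in range(bi, min(bi + block, H)):
                for j in range(bj, min(bj + block, W)):
                    patch = xp[i:i + taps, j:j + taps]
                    y[i, j] = np.sum(w * patch)
                    e = d[i, j] - y[i, j]
                    grad += e * patch
            w += mu * grad / (block * block)   # single update per block
    return y, w
```

Updating once per block amortizes the weight adaptation over the block's pixels, which is the source of the method's reduced per-pixel cost.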
In direct methods of contrast enhancement, a contrast measure is first defined, which is then modified by a mapping function to generate the pixel value of the enhanced image. Various mapping functions, such as the square root function, the exponential function, etc., have been introduced for the contrast measure modification. However, these functions do not produce satisfactory contrast enhancement results and are usually sensitive to noise and digitization effects. In addition, they are computationally complex to implement. In this paper, we propose a simple polynomial mapping function for the contrast measure modification. The polynomial function is easy to implement on digital computers and provides very satisfactory contrast enhancement. We also introduce an adaptive clipping technique to control the dynamic range of the output image.
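The define-measure / modify / reconstruct loop of a direct method can be sketched as below. The Weber-like contrast measure, the coefficients in `poly`, and the plain range clipping are all illustrative assumptions; the paper's actual polynomial and adaptive clipping technique are not reproduced here:

```python
import numpy as np

def local_mean(img, k=3):
    # k x k box mean with edge padding.
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def enhance(img, poly=(1.0, 0.6)):
    m = local_mean(img)
    c = np.abs(img - m) / (img + m + 1e-6)                  # contrast measure
    ce = np.clip(poly[0] * c + poly[1] * c ** 2, 0, 0.99)   # polynomial mapping
    # Invert the measure to get the enhanced pixel value.
    out = np.where(img >= m, m * (1 + ce) / (1 - ce), m * (1 - ce) / (1 + ce))
    return np.clip(out, 0, 255)                             # simple clipping
```

Because the polynomial maps each contrast value to a larger one, pixels above their local mean are pushed up and pixels below it are pushed down, while flat regions (zero contrast) are left untouched.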
This paper presents algorithms for simultaneous image geometric manipulation and enhancement. Single filters are developed for combined image translation, resizing, rotation, and enhancement of both single channel and multichannel/color images. Single filters have advantages for real time implementation and for minimizing artifacts produced by sequential operations on the images. A filter structure is proposed for image sharpening, softening, blurring, and edge emphasis filtering. With the appropriate choice of the parameters, it is shown that the proposed structure can implement unsharp masking, local statistical filtering, and edge emphasis filtering. Image translation, resizing, and rotation geometric operations are combined with the structure in developing single filters. Closed form expressions are developed for real time implementation of the filters through lookup tables with different types of interpolations and for both linear and nonlinear enhancement kernels. A multichannel generalization of the algorithm is presented for simultaneous color image geometric manipulation and enhancement. Schemes are proposed for adapting the degree of enhancement based on local image information. The algorithm is applied to a number of electronic imaging applications and the evaluation results are presented.
A novel concept for characterizing the performance of cone-beam computerized tomography systems has been developed and its application to the evaluation of cone-beam algorithms is described. The new method consists of three steps. The first step is to determine the edge spread function (ESF) from an analysis of the edge profile for a simple reconstructed object like a uniform sphere or cylinder. The edge response is calculated for sets of points where linearity and spatial invariance hold. The second step is to calculate the line spread function (LSF) from the ESF, and the third step is to calculate the modulation transfer function (MTF) from the LSF. This method is amenable to automatic processing and is more useful for predicting the performance of cone-beam algorithms and systems than other image quality measures that bear no relation to spatial resolution.
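The three-step ESF → LSF → MTF pipeline can be sketched numerically. The tanh edge profile below is a synthetic stand-in for a measured edge through a reconstructed sphere or cylinder:

```python
import numpy as np

# Step 1 (stand-in): an edge spread function sampled across a reconstructed edge.
x = np.linspace(-5, 5, 256)
esf = 0.5 * (1 + np.tanh(x / 0.8))

# Step 2: the LSF is the derivative of the ESF.
lsf = np.gradient(esf, x)

# Step 3: the MTF is the magnitude of the Fourier transform of the LSF,
# normalized so that MTF(0) = 1.
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]
```

A narrower LSF (a sharper edge in the reconstruction) yields an MTF that stays high out to larger spatial frequencies, which is exactly the resolution information the abstract argues other quality measures lack.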
We consider the computational complexity of block transform coding and tradeoffs among computation, bit rate, and distortion. In particular, we illustrate a method of coding that allows decompression time to be traded with bit rate under a fixed quality criterion, or allows quality to be traded for speed at a fixed average bit rate. We provide a brief analysis of the entropy-coded infinite uniform quantizer that leads to a simple bit allocation for transform coefficients. Finally, we consider the computational cost of transform coding for both the discrete cosine transform (DCT) and the Karhunen-Loeve transform (KLT). In general, a computation-rate-distortion surface can be used to select the appropriate size transform and the quantization matrix for a given bandwidth/CPU channel.
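The paper's allocation is derived from its analysis of the entropy-coded infinite uniform quantizer; as a stand-in, the classical high-rate log-variance rule conveys the idea of giving more bits to higher-variance coefficients (the function name and inputs are illustrative):

```python
import numpy as np

def allocate_bits(variances, avg_rate):
    # Classical high-rate rule: b_i = B + 0.5*log2(var_i / geometric mean var).
    v = np.asarray(variances, dtype=float)
    geo = np.exp(np.mean(np.log(v)))          # geometric mean of the variances
    b = avg_rate + 0.5 * np.log2(v / geo)
    return np.maximum(b, 0.0)                 # clip away negative rates
```

When no coefficient is clipped to zero, the allocation preserves the target average rate exactly while splitting it according to coefficient variance.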
In this research, we present a new approach to still image data compression. The first step is to decompose an image into small blocks of the same size via the full wavelet transform (FWT), where each block corresponds to a particular frequency band while each transform coefficient in these blocks corresponds to a local spatial region in the original image. The space-frequency energy compaction property of the FWT is demonstrated: most energy is concentrated either in low frequency blocks or in transform coefficients associated with spatial regions of strong variation, such as edges or textures. Image compression can be achieved by effectively using this energy compaction property. The second step is bit allocation and quantization. The block consisting of the lowest frequency components is quantized with 6 bits under a Gaussian density assumption. For coefficients in the remaining blocks, we propose a bit assignment scheme based on the block and position energies of the FWT coefficients. They are then quantized under either a Laplacian or a Gaussian density model, depending on the number of quantization levels. The relationship between the proposed method and other popular image compression methods such as DCT, PWT (pyramidal wavelet transform), and SBC (subband coding) is also discussed.
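A depth-2 full (packet) Haar decomposition illustrates the energy-compaction claim. Orthonormal Haar filters stand in here for whatever filter pair the paper actually uses, and `haar_split`/`fwt` are illustrative names:

```python
import numpy as np

def haar_split(b):
    # One orthonormal Haar step on rows, then columns -> LL, LH, HL, HH.
    L = (b[:, 0::2] + b[:, 1::2]) / np.sqrt(2)
    H = (b[:, 0::2] - b[:, 1::2]) / np.sqrt(2)
    return [(L[0::2] + L[1::2]) / np.sqrt(2), (L[0::2] - L[1::2]) / np.sqrt(2),
            (H[0::2] + H[1::2]) / np.sqrt(2), (H[0::2] - H[1::2]) / np.sqrt(2)]

def fwt(b, levels):
    # Full wavelet transform: recurse into *all* four subbands, unlike the
    # pyramidal transform, which recurses only into LL.
    if levels == 0:
        return [b]
    out = []
    for s in haar_split(b):
        out += fwt(s, levels - 1)
    return out
```

Because the steps are orthonormal, total energy is preserved across the blocks, so the fraction of energy landing in the lowest-frequency block directly measures the compaction available to the quantizer.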
By its very nature, multimedia includes images, text and audio stored in digital format. Image compression is an enabling technology essential to overcoming two bottlenecks: cost of storage and bus speed limitation. Storing 10 seconds of high resolution RGB (640 X 480) motion video (30 frames/sec) requires 277 MBytes and a bus speed of 28 MBytes/sec (which cannot be handled by a standard bus). With high quality JPEG baseline compression the storage and bus requirements are reduced to 12 MBytes of storage and a bus speed of 1.2 MBytes/sec. Moreover, since consumer video and photography products (e.g., digital still video cameras, camcorders, TV) will increasingly use digital (and therefore compressed) images because of quality, accessibility, and the ease of adding features, compressed images may become the bridge between the multimedia computer and consumer products. The image compression challenge can be met by implementing the discrete cosine transform (DCT)-based image compression algorithm defined by the JPEG baseline standard. Using the JPEG baseline algorithm, an image can be compressed by a factor of about 24:1 without noticeable degradation in image quality. Because motion video is compressed frame by frame (or field by field), system cost is minimized (no frame or field memories and interframe operations are required) and each frame can be edited independently. Since JPEG is an international standard, the compressed files generated by this solution can be readily interchanged with other users and processed by standard software packages. This paper describes a multimedia image compression board utilizing Zoran's 040 JPEG Image Compression chip set. The board includes digitization, video decoding and compression. While the original video is sent to the display (`video in a window'), it is also compressed and transferred to the computer bus for storage. During playback, the system receives the compressed sequence from the bus and displays it on the screen.
In order to achieve high compression ratios while maintaining visual quality, a new approach combining Peano scanning and vector quantization is proposed. The Peano scan technique clusters highly correlated pixels, and a vector quantization scheme is developed to exploit interblock correlation. An efficient scan technique serves as a useful preprocessing unit in image compression. In this work, Peano curves are used locally to transform the original image into one containing clustered bands of correlated data. The Peano scan results in a much lower mean absolute difference between neighboring blocks than the raster scan (almost a 50% reduction), i.e., better reordering/clustering is achieved. A codebook is formed using the LBG algorithm. The clustered image is subdivided into blocks of 4 X 4 pixels and vector quantized using the codebook. The pattern of indices of codevectors chosen for successive blocks is exploited: due to the clustering achieved, the same codevector is frequently used to encode a series of successive blocks, and transmitting the same codevector index for each block is inefficient. A method of assigning additional bits to encode the repeat pattern is proposed, which causes an overall reduction in bit rate. Another technique employs dynamic partitioning of the codebook into a large passive part and a smaller active part. If the mean squared difference between successive blocks is below a predetermined threshold, only the active part is searched. This leads to a lowering of bit rate and search time. The combined approach exploits interblock correlation in an image using an efficient scan technique, and leads to lower bit rates than conventional VQ methods at comparable visual quality.
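The benefit of a space-filling scan over a raster scan can be demonstrated with a small sketch. The abstract uses Peano curves; the Hilbert curve below is a closely related space-filling curve, used here only because its index-to-coordinate map (`d2xy`, the standard bit-twiddling construction) is compact:

```python
import numpy as np

def d2xy(n, d):
    # Map index d along a Hilbert curve to (x, y) on an n x n grid
    # (n a power of two).
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

n = 16
img = np.add.outer(np.arange(n, dtype=float), np.arange(n, dtype=float))
raster = img.ravel()                                   # row-by-row scan
hilbert = np.array([img[d2xy(n, d)] for d in range(n * n)])
mad = lambda s: np.abs(np.diff(s)).mean()              # successive-difference MAD
```

Because consecutive curve positions are always spatially adjacent, the scan never takes the large jump a raster scan makes at the end of each row, so successive samples stay correlated, which is exactly what the subsequent VQ stage exploits.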
In this paper, we propose a new encoding method called the predictive sub-codebook searching (PSCS) algorithm for vector quantization of images. This algorithm not only allows searching a portion of the codebook to find the minimum-distortion codeword of an input vector, but also permits sending the index of the codeword within a sub-codebook, which requires a shorter bit-length representation than indexing the whole codebook. Computer simulations using real images show that the PSCS algorithm is very efficient, since it requires the fewest computations among several existing fast encoding algorithms, and there is also about an 8 to 21 percent bit-rate reduction depending on the nature of the images.
Differential pulse code modulation (DPCM) is a widely used technique for both lossy and lossless compression of images. In this paper, the effect of using a nonlinear predictor based on artificial neural networks (ANN) for a DPCM encoder is investigated. The ANN predictor uses a 3-layer perceptron model with 3 input nodes, 30 hidden nodes, and 1 output node. The back-propagation learning algorithm is used for the training of the network. Simulation results are presented to compare the performance of the proposed ANN-based nonlinear predictor with that of a global linear predictor as well as an optimized minimum-mean-squared-error (MMSE) linear predictor. Preliminary computer simulations demonstrate that for a typical test image, the zeroth-order entropy of the differential (error) image can be reduced by more than 15% compared to the case where optimum linear predictors are employed. Some future research directions are also discussed.
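The entropy comparison behind the reported reduction can be illustrated with the simplest fixed predictor (the previous pixel in the row). The paper's ANN predictor, a 3-30-1 perceptron over three causal neighbors, is not reproduced here; this sketch only shows how prediction shrinks the zeroth-order entropy of the residual:

```python
import numpy as np

def entropy(values):
    # Zeroth-order entropy in bits per symbol.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def dpcm_residual(img):
    # Fixed linear predictor: the previous pixel in the row
    # (zero prediction for the first column).
    pred = np.zeros_like(img)
    pred[:, 1:] = img[:, :-1]
    return img - pred
```

A better (e.g. nonlinear) predictor concentrates the residual distribution further, lowering the entropy bound on the lossless bit rate, which is the effect the abstract quantifies at more than 15%.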
We introduce a new technology for lossy compression of continuous-tone images and video sequences. The Normal Forms representation is based on picture decomposition in the geometric domain and is composed of two parts: a background and superimposed elements (normal forms) capturing small-scale picture details. This representation provides a very high compression ratio with minimal or no visual degradation; it allows image processing operations to be applied directly to compressed data or during the compression process; and it is inherently extendable to video sequence compression.
The semivariogram of the image is estimated and used to estimate the zone of influence. If r is the zone of influence of the semivariogram function of the picture, then we sample the picture using a square sampling design with side length equal to a X r, where a is a factor less than one. A kriging estimator is used to estimate the pixels not included in the sample. Under the assumption that the process is wide-sense stationary, this kriging estimator is best linear unbiased. The mean square error is calculated, and we prove that as a goes to zero the decompressed image converges to the original with probability one. The algorithm is lossy, and tests using popular images are presented. Comparisons of this algorithm with other popular image compression algorithms are made; the comparisons include compression ratio and a measure of distance between the original image and the decompressed image.
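The first step, estimating the empirical semivariogram, can be sketched as follows (horizontal lags only; the kriging estimator itself is not reproduced, and `semivariogram` is an illustrative name):

```python
import numpy as np

def semivariogram(img, max_lag):
    # Empirical semivariogram along the horizontal direction:
    # gamma(h) = E[(z(x+h) - z(x))^2] / 2 for lags h = 1..max_lag.
    return np.array([0.5 * np.mean((img[:, h:] - img[:, :-h]) ** 2)
                     for h in range(1, max_lag + 1)])
```

The zone of influence r is then read off as the lag at which gamma(h) levels off near the image variance (the sill); sampling at spacing a X r with a < 1 keeps every unsampled pixel within the correlated neighborhood of some sample.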
Stereo images are used in many applications including autonomous vehicle navigation, geoscience, and machine vision. The storage and transmission of stereo images involve large amounts of data. Therefore, efficient coding techniques are required to encode the two correlated stereo image pairs at low bit rates. This paper presents two techniques to compress stereo image data. The proposed schemes are based on subband coding and disparity compensation (DC). DC, which is similar to motion compensation, exploits the redundancy that exists between the two stereo images. In the first method, the two images are decomposed into four subbands using two-tap filters. The lowest band of the left image is coded using block DPCM, and DC is applied to code the lowest band of the right image. The high bands are directly quantized and PCM coded. The coding is adaptive in the sense that different quantizers are selected based on the activity of each block. In the second method, disparity compensated interpolative predictive coding, DC is applied to the original images and proceeds in a different way. To find the differential signals, the pixels in each block of the right image are divided into two groups, P1 and P2. P1 pixels are predicted by the corresponding pixels in the matching block, and P2 pixels are predicted interpolatively from the decoded P1 pixels. Finally, a method for the lossless coding of the disparity information is presented. The proposed lossless scheme reduces the overhead information and the search effort.
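The block-matching core of disparity compensation, directly analogous to motion estimation, can be sketched as below. SAD matching over horizontal candidates is an illustrative choice, and the function name and search range are assumptions:

```python
import numpy as np

def disparity_for_block(left, right, i, j, bs, max_disp):
    # Find the horizontal disparity minimizing the sum of absolute
    # differences (SAD) between a right-image block and candidate
    # blocks in the left image.
    blk = right[i:i + bs, j:j + bs]
    best_d, best_cost = 0, np.inf
    for d in range(0, min(max_disp, left.shape[1] - bs - j) + 1):
        cost = np.abs(left[i:i + bs, j + d:j + d + bs] - blk).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Coding the right image then amounts to transmitting, per block, the disparity plus the (small) residual against the disparity-shifted left block, which is where the interblock redundancy is removed.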
In an effort to remove the soldier, to the extent possible, from harm's way, some of the Army's future combat vehicles will be teleoperable. Video or forward looking infrared (FLIR) sensors will be mounted on the vehicle, and imagery of the surrounding scene will be transmitted to a control station by way of a radio frequency (RF) link. However, visual imagery is a large user of bandwidth, and bandwidth on the battlefield is very limited. Therefore, a means must be found of achieving a low-data-rate transmission. We have developed a system that accomplishes this by using two distinct techniques. First, a 25:1 bandwidth reduction ratio is achieved by compressing the transmitted image using a combination of the discrete cosine transform and Huffman encoding. Second, a 90:1 bandwidth reduction ratio is achieved by transmitting only one frame every 3 seconds rather than the usual 30 frames per second. The intervening 3 seconds are filled with 89 synthetically created frames (synthetic optic flow) which appear very much like those that would have been transmitted using full bandwidth transmission. The result of these two steps is an overall bandwidth reduction ratio of 25 X 90 equals 2250:1.
This paper focuses on video coding for real-time applications and its transmission over synchronous or asynchronous networks, with special interest in ATM (asynchronous transfer mode) switching and multiplexing. Video coders compress the information of a sequence, producing a bit stream whose rate, in real-time applications, should be controlled to fit the transmission resources. In this work we analyze the main control proposals from a general and comprehensive point of view, pointing out the observed variables, the prediction mechanisms, and the action and periodicity of the control. The control action is to switch between different quantization modes; in practice, the coder must decide on a mode before coding a unit, so a prediction of the data volume to be generated is helpful to avoid ungraceful degradation. The tradeoff in the periodicity of the action is the other key point. We propose a general strategy, compatible with the MPEG standards, namely shaping the evolution of the real buffer: we observe the real buffer and impose a desired level variation. The buffer shaping unit can be a frame, slice, or macroblock. A method for predicting the data volume to be produced in a unit is proposed. Using the proposed approach we study constant-quality, variable-bit-rate coding, well suited for asynchronous networks. We compare the buffer evolution and the quality of coding under different strategies. In ATM environments the policing functions are taken into account as restrictions to be respected, and methods to adapt the coding to known policing functions are proposed.
An efficient image compression scheme is presented. The scheme is based on the principles and ideas reflected in the specification and development of the SCAN language and is mainly intended for the compression of 2-D digital images. SCAN is a special-purpose context-free language which accesses the data of a 2-D array sequentially, describing and generating a wide range of accessing algorithms from a short set of simple ones. The SCAN language has been implemented in C and runs efficiently on IBM PCs. The compression scheme presented here uses the algorithmic description of a 2-D image and applies a region searching/coding criterion for the classification and compression of the image data based on color or gray level. Note that each SCAN letter or word accesses the image data in a different order (or sequence); thus the application of a variety of SCAN words associated with the coding criteria will produce various compressed versions of the same image. The compressed versions are compared in memory size, and the best of them can be used for the image transmission. In this paper the compression model and image compression results are provided for a variety of images.
The new science of Autosophy explains `self-assembling structures,' such as crystals or living trees, in mathematical terms. This research has produced a mathematical theory of `learning' and a new `information theory' which permits the growing of self-assembling data networks in a computer memory, similar to the growing of `data crystals' or `data trees,' without data processing or programming. Self-growing Autosophy networks yield real-time `lossless' image compression in which the transmission bandwidth is independent of screen size, resolution, or scanning rates. The systems contain a peculiar self-growing omni-dimensional image library in which many image fragments are stored in mathematical hyperspace. For transmission, each input image is broken into fragments of various sizes by comparing it, like a jigsaw puzzle, with the largest matching fragments in the library. The fragments are identified with a `pattern address' and a `location address' and transmitted as `superpixels.' Each superpixel may represent any size part of the image, from single pixels to entire screen images. For storage compression each image is reduced to a single output code to the computer. The image information is stored in a mathematical hyperspace to yield many orders of magnitude of lossless image compression.
In the image restoration process, the Wiener filter method, derived from the minimum mean square error criterion, is probably the most popular. In this method the constant Γ, which is an a priori representation of the signal-to-noise ratio for the complete image plane, is supplied by the user and adjusted by trial and error. In a previous paper an estimation process for Γ was introduced. An expression for Γ[i, j], which is an a priori representation of the signal-to-noise ratio for the pixel [i, j], was derived assuming that two degraded images of the same object are provided. The expression depends on the correlation between the two degraded images and the point spread function involved in blurring the original image. The estimate for Γ was obtained by taking a statistical average of the values of Γ[i, j]. However, it may not always be possible to have the second image, in which case this process of estimating Γ cannot be used. In this paper, a new algorithm is proposed to construct the second image based on Lagrange's interpolation technique, so that the above method of estimating Γ can still be used when the second image is not available.
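The role of the user-supplied constant Γ can be seen in a minimal frequency-domain Wiener sketch (a circular-convolution blur model; the paper's procedure for estimating Γ from two degraded images is not reproduced, and `gamma` here is simply the scalar regularization constant):

```python
import numpy as np

def wiener_deconvolve(blurred, psf, gamma):
    # Frequency-domain Wiener filter: conj(H) / (|H|^2 + gamma).
    # gamma is the user-supplied a priori constant the abstract calls Gamma;
    # larger values suppress noise amplification, smaller values sharpen.
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + gamma)
    return np.real(np.fft.ifft2(W * G))
```

Choosing Γ by trial and error is exactly the tuning burden the estimation procedure in the abstract is designed to remove.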
A Bayesian approach for segmentation of three-dimensional (3-D) magnetic resonance imaging (MRI) data of the human brain is presented. Connectivity and smoothness constraints are imposed on the segmentation in 3 dimensions. The resulting segmentation is suitable for 3-D display and for volumetric analysis of structures. The algorithm is based on the maximum a posteriori probability (MAP) criterion, where a 3-D Gibbs random field (GRF) is used to model the a priori probability distribution of the segmentation. The proposed method can be applied to a spatial sequence of 2-D images (cross-sections through a volume), as well as 3-D sampled data. We discuss the optimization methods for obtaining the MAP estimate. Experimental results obtained using clinical data are included.
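Maximizing the MAP criterion exactly is intractable; a common deterministic approximation is iterated conditional modes (ICM), sketched here with a Gaussian likelihood and a Potts-type Gibbs prior. This is a 2-D illustration only: it omits the through-slice (3-D) connectivity the paper imposes, and the paper's own optimization methods may differ:

```python
import numpy as np

def icm_segment(img, means, beta, iters=5):
    # Iterated conditional modes: greedy pixelwise minimization of
    # (data cost) + beta * (number of disagreeing 4-neighbors).
    labels = np.abs(img[..., None] - np.asarray(means)).argmin(-1)  # ML init
    H, W = img.shape
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                costs = []
                for k, m in enumerate(means):
                    data = (img[i, j] - m) ** 2          # Gaussian likelihood
                    nb = [labels[i + di, j + dj] != k    # Potts prior term
                          for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= i + di < H and 0 <= j + dj < W]
                    costs.append(data + beta * sum(nb))
                labels[i, j] = int(np.argmin(costs))
    return labels
```

The smoothness weight beta plays the role of the GRF prior: it lets the spatial context overrule an intensity observation that, taken alone, would be classified into the wrong tissue class.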