A powerful automated license plate recognition system is presented, which is able to read license numbers of cars, even under non-ideal circumstances. At the front-end of the system, there is a high-speed shutter camera and a frame grabber that delivers the digitized images of cars passing by. In a license plate segmentation step, the approximate positions of the four corner points of the plates are indicated. Due to the perspective view, these corner points may not correspond to a rectangle. By means of resampling, a rectangular license plate with a fixed size of 180 × 40 pixels is reconstructed. After image enhancement steps, the characters are approximately segmented, based on the properties of a vertical projection of the license plate. Next, the separate characters are normalized with respect to contrast, intensity and size. Each character image is projected, using the Karhunen-Loeve (KL) transform, onto a low-dimensional space that contains the relevant information to distinguish it from other characters. A problem with this transformation arises when the character is not properly segmented. We solved that problem by comparing the inverse KL transformed result with the original character. If they differ significantly, this may indicate a major segmentation error, which we can then correct. This leads to a much improved segmentation and thus a transformation that holds the information needed for the classification. The KL transformed characters can be classified by several methods. We obtained good results by classifying the transformed characters with the help of the Euclidean distance. A misclassification rate of 0.4% was achieved with a rejection rate of 13%. Further development of the system, for which a number of recommendations are given, is expected to increase the system performance.
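The segmentation check described in this abstract (project a character with the KL transform, apply the inverse transform, and compare with the original) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a mean vector and an orthonormal KL basis have already been estimated from training characters, and the function names, the toy data and the threshold `max_error` are all hypothetical.

```python
import math


def kl_project(x, mean, basis):
    """Project x onto the KL subspace spanned by the orthonormal rows of basis."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(bi * ci for bi, ci in zip(b, centered)) for b in basis]


def kl_reconstruct(coeffs, mean, basis):
    """Inverse KL transform: map subspace coefficients back to pixel space."""
    recon = list(mean)
    for coeff, b in zip(coeffs, basis):
        for i, bi in enumerate(b):
            recon[i] += coeff * bi
    return recon


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def reconstruction_error(x, mean, basis):
    """Distance between x and its projection-then-reconstruction."""
    coeffs = kl_project(x, mean, basis)
    return euclidean(x, kl_reconstruct(coeffs, mean, basis))


def classify_or_flag(x, mean, basis, prototypes, max_error):
    """Return the nearest class label, or None if x should be re-segmented.

    max_error is a hypothetical threshold on the reconstruction error.
    """
    if reconstruction_error(x, mean, basis) > max_error:
        return None  # large residual: possible segmentation error
    coeffs = kl_project(x, mean, basis)
    return min(prototypes, key=lambda label: euclidean(coeffs, prototypes[label]))


# Toy data (assumed): 4-pixel "characters" and a 2-dimensional KL subspace.
MEAN = [0.0, 0.0, 0.0, 0.0]
BASIS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
PROTOTYPES = {"A": [3.0, 4.0], "B": [-3.0, 0.0]}

well_segmented = [3.0, 4.0, 0.0, 0.0]   # lies inside the subspace
badly_segmented = [3.0, 4.0, 5.0, 0.0]  # has energy outside the subspace
```

A character that lies in the subspace is reconstructed almost exactly and is classified by Euclidean distance, while one with substantial energy outside the subspace leaves a large residual and is flagged.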
Robust reconstruction of coherent speckle images from non-imaged laser speckle patterns in the aperture plane of an optical system requires adequate sampling of the speckle intensity at the focal plane. Although detector size cannot be changed dynamically in the course of an experiment to achieve the necessary sampling in every frame, a measure of speckle size could be used to accept or reject individual frames in post-processing software to improve the final reconstructed image. This paper investigates the use of a speckle size metric to gauge the integrity of speckle sampling in each frame of a series of coherent speckle images. Frames containing inadequate sampling are sorted out of the final reconstructed image. The quality of the final recovery for a variety of targets and imaging conditions is compared for sorted and non-sorted reconstructions.
A high-speed 3D imaging system has been developed using multiple independent CCD cameras with sequentially triggered acquisition and individual field storage capability. The system described here utilizes sixteen independent cameras. A stereo alignment and triggering scheme arranges the cameras into two angularly separated banks of eight cameras each. By simultaneously triggering correlated stereo pairs, an eight-frame sequence of stereo images is captured. The delays can be individually adjusted to yield a greater number of acquired frames during more rapid segments of the event, and the individual integration periods may be adjusted to ensure adequate radiometric response while minimizing image blur. Representation of the data as a 3D sequence introduces the issue of independent camera coordinate registration with the real scene. A discussion of the forward and inverse transform operator for the digital data is provided along with a description of the acquisition system.
During the past decade 3D image processing has become a key component in biological research, mainly due to two different developments. The first is based on an optical instrument, the so-called confocal laser scanning microscope, allowing optical sectioning of the biological specimen. The second is a biological preparatory method, the so-called FISH technique (Fluorescence In Situ Hybridization), allowing labeling of certain cellular and sub-cellular compartments with highly specific fluorescent dyes. Both methods make it possible to investigate the 3D biological framework within cells and nuclei. Image acquisition with confocal laser scanning microscopy must deal with different limits of resolution along and across the optical axis. Although lateral resolution is about 0.7 times better than in non-confocal arrangements, axial resolution is more than 3 to 4 times poorer than lateral resolution (depending on the pinhole size). For 3D reconstruction it is desirable to improve axial resolution in order to provide nearly identical image information across the 3D specimen space. This presentation will give an overview of some of the most popular restoration and deblurring algorithms used in 3D image microscopy. After 3D image restoration, segmentation of certain details of the cell structure is usually the next step in image processing. We compared two different kinds of algorithms for segmentation of chromosome territories in interphase cell nuclei. One is based on Mathematical Morphology, the other on Split & Merge methods. The segmented image regions provided the basis for chromosome domain reconstruction as well as for regional localization for subsequent quantitative measurements. As a result, the chromatin density within certain chromosome domains as well as some terminal DNA sequences (telomere signals) could be measured.
A laboratory system for multiple-point, closed-loop surface temperature control has been developed to test control algorithms for manufacturing applications such as rapid prototyping laminate manufacture and rapid thermal processing of semiconductor wafers. Accurate surface temperature measurements with high spatial resolution are required to provide a signal for feedback control. Image acquisition and conversion to temperature must be fast enough to operate in conjunction with the control method used. Optical methods are a natural fit, since many spatially distributed measurements can be obtained rapidly with a minimum of complexity. The operation of thermochromic liquid crystals in a real-time control loop is described. Every 200 msec, the temperature at each of 196 zones must be obtained. Two frames and multiple pixels per zone are averaged to reduce statistical uncertainty in the reading. An in situ calibrator was developed and used to study the errors inherent in this system, including errors due to hysteresis and uneven surface lighting. The statistical uncertainty in the measured temperature due to the calibrator uncertainty varied with temperature, but was less than 0.03 °C over most of the range. The uncertainty in the in situ temperature reading was larger because fewer pixels could be averaged. This uncertainty also depended upon temperature, and was below 0.2 °C over a 7.5 °C range. Additional errors were investigated. Hysteresis, or path-dependent effects in the temperature response of the liquid crystals, was seen to depend upon the maximum temperature reached, even when the maximum temperature was within the usable range of the liquid crystals. Taking the crystals to temperatures above the usable range led to larger hysteresis errors, as much as 0.2 °C, while errors below 0.1 °C were seen for a smaller temperature excursion.
Uneven lighting had a large effect on the saturation and intensity measured, and a significantly smaller effect on the hue. However, the small effect on the hue caused an error around 0.1 °C, so uneven lighting should be avoided if possible. Also, thresholding (determining if crystals are reading valid temperatures) is complicated by variations in lighting intensity over the surface to be measured.
Proc. SPIE 3460, Aperture design and numerical reconstruction technique requirements for high-resolution imaging of neutrons in inertial confinement fusion (ICF), 0000 (1 October 1998); https://doi.org/10.1117/12.323219
In Inertial Confinement Fusion (ICF) experiments, radiation from the compressed core is increasingly reabsorbed. For the largest experiments, the only radiation to escape is the 14 MeV fusion neutrons, to which we must turn to learn of the physical processes taking place. The most important parameters are the shape and the size of the compressed core, and this involves imaging the neutrons produced by the fusion reactions. The penumbral technique is ideally suited to neutron imaging, and the feasibility of this technique has been demonstrated at the Lawrence Livermore National Laboratory in the United States. At the Phebus laser facility in France, this method has been used to image compressed ICF cores with diameters of 150 micrometers yielding approximately 10^9 neutrons, and the overall spatial resolution obtained in the reconstructed source was approximately 100 micrometers. On the Laser Megajoule project, which is the equivalent of the National Ignition Facility in the United States, the spatial resolution required to diagnose high-convergence targets is 10 micrometers. We wish first to obtain a spatial resolution of 30 micrometers to image sources with a diameter ≤ 100 micrometers at a neutron yield in the range of 10^11 to 10^14 neutrons. A collaborative experimental program with the Laboratory for Laser Energetics at the University of Rochester is planned to this end. At the same time, there is a research program in collaboration with Laval University concerning coded aperture designs and the associated reconstruction techniques. In this article we first review the basic requirements of such imagery and the concept of the penumbral imaging technique. Then we concentrate on the aperture design criteria and on the quantity of information necessary to achieve high spatial resolution. Finally, we survey the reconstruction techniques used, followed by results and a comparative evaluation of those methods.
Robust geometric distortion characterization and correction methods have been developed and validated for DSA applications. A compact and efficient numerical parameterization technique has been used for characterizing the 3D geometric image distortion of a clinical DSA system. A 3rd-order polynomial was found to be largely sufficient for the parameterization of global image distortion for all rotational angles. One advantage of the new technique is that it is compact, in the sense that it does not require a large look-up table of distortion parameters. The new method shows potential for using less regularly distributed calibration points for a complete 3D image distortion characterization of DSA systems. Furthermore, two different schemes based on pixel mapping for correcting image distortion have been successfully demonstrated, yielding satisfactory results.
Quantitative estimation of tissue labeling depends heavily on the efficiency of the image segmentation technique. In this paper, an encoder-segmented neural network (ESNN) is proposed to improve the efficiency of image segmentation. The features are ranked according to the encoder indicators, by which the insignificant feature vectors are eliminated from the original feature vectors and the important feature vectors are re-organized as the encoded feature vectors for the subsequent clustering. The developed ESNN can improve the existing FCM algorithm in feature extraction and in the selection of the number of clusters. This method was successfully applied to automatic labeling of tissue in brain MRIs. Examples of the results are also presented for diagnosis of the brain using MR images.
In this article we present an implementation of a watershed algorithm on a multi-FPGA architecture. This implementation is based on a hierarchical FIFO (H-FIFO), with a separate FIFO for each gray level. The gray-scale value of a pixel is taken as the altitude of the point, so that the image is viewed as a relief. We proceed by a flooding step, as if the relief were immersed in a lake. The water rises, and where the waters of two different catchment basins meet, we construct a separator, or `watershed'. This approach is data dependent, hence the processing time differs from image to image. The H-FIFO is used to guarantee the nature of the immersion, which requires two types of priority: all the points of altitude `n' are processed before any point of altitude `n + 1', and within an altitude the water propagates with a constant velocity in all directions from the source. The operator needs two images as input: an original image or its gradient, and a marker image. A classic way to construct the marker image is to build an image of minimal regions. Each minimal region has its own unique label. This label is the color of the water and is used to detect whether two different waters touch each other. The algorithm first fills the hierarchical FIFO with the uncolored neighbors of all the regions. Next it fetches the first pixel from the first non-empty FIFO and processes it: the pixel takes the color of its neighbor, and all its neighbors that are not already in the H-FIFO are put in their corresponding FIFO. The process is over when the H-FIFO is empty. The result is a segmented and labeled image.
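The flooding procedure described in this abstract can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the multi-FPGA implementation: it uses 4-connectivity, only propagates labels (no explicit separator pixels are drawn), and the function name `watershed_hfifo` and the toy images are hypothetical.

```python
from collections import deque


def watershed_hfifo(image, markers, levels=256):
    """Flood `image` from labeled `markers` using one FIFO per gray level.

    image:   2D list of ints in [0, levels), read as altitude.
    markers: 2D list of ints, 0 meaning "not yet colored".
    Returns a 2D list of labels with every reachable pixel colored.
    """
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in markers]
    queues = [deque() for _ in range(levels)]      # the hierarchical FIFO
    queued = [[False] * cols for _ in range(rows)]

    def neighbors(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    def enqueue_uncolored_neighbors(r, c):
        for nr, nc in neighbors(r, c):
            if labels[nr][nc] == 0 and not queued[nr][nc]:
                queued[nr][nc] = True
                queues[image[nr][nc]].append((nr, nc))

    # Initialization: queue the uncolored neighbors of every marker region.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != 0:
                enqueue_uncolored_neighbors(r, c)

    # Flooding: always serve the lowest non-empty level, FIFO within a level.
    while True:
        fifo = next((q for q in queues if q), None)
        if fifo is None:
            break  # the H-FIFO is empty
        r, c = fifo.popleft()
        for nr, nc in neighbors(r, c):
            if labels[nr][nc] != 0:
                labels[r][c] = labels[nr][nc]  # take the neighbor's color
                break
        enqueue_uncolored_neighbors(r, c)
    return labels


# Toy relief (assumed): two basins separated by a ridge of altitude 5.
RELIEF = [[0, 1, 5, 1, 0]]
MARKERS = [[1, 0, 0, 0, 2]]
```

Scanning the list of queues from the lowest level on every fetch keeps the sketch short; a hardware or production implementation would track the current level instead.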
Segmentation algorithms are fast and simple techniques used to obtain an image representation at different resolution levels, so they are widely used for image compression. Neither floating-point calculations nor large amounts of memory are required, so these algorithms can be easily implemented in relatively cheap and simple real-time systems. The proposed algorithm divides an image into rectangular blocks, which may overlap. The width and height of these blocks are set independently and can take optimal values from a preset range. Blocks are filled with the mean value of the pixels from the original image, and their sizes are increased until the mean square error value for the block is smaller than the preset value. Next, a hardware implementation in a single FPGA device is proposed. The paper also presents results obtained during off-line image compression. These results show better quality (in terms of PSNR) of restored images compared with the standard QuadTree algorithm. Simulations show that the proposed hardware architecture can process a standard monochrome CIF image at over 30 frames per second while preserving low cost and high quality.
This paper addresses the problem of robust 2D image motion estimation in natural environments. We develop an adaptive tracking-region selection and optical-flow estimation technique. The strategy of adaptive region selection locates reliable tracking regions and makes their motion estimation more reliable and computationally efficient. The multi-stage estimation procedure makes it possible to discriminate between good and poor estimation areas, which maximizes the quality of the final motion estimation. Furthermore, the model fitting stage further reduces the estimation error and provides a more compact and flexible motion field representation that is better suited for high-level vision processing. We demonstrate the performance of our techniques on both synthetic and natural image sequences.
The three-step search (TSS) has played a key role in real-time video encoding because of its light computational complexity, regularity of search rule, and reasonable performance for reduced computation. Many modified TSS algorithms have been studied with the aim of reducing the amount of computation or improving the quality of the image predicted with the obtained motion vector. This paper explains a new concept of hierarchical search in motion estimation that further reduces computational complexity and gives better error performance than conventional modified TSS algorithms. The structure of the proposed algorithm is similar to that of the conventional TSS algorithm. The proposed algorithm, however, has a different search precision for each step. It will be shown that the proposed algorithm is very efficient in terms of computational speed-up and has improved error performance over the conventional modified TSS algorithms. Our proposed algorithm will be useful in software-based real-time video coding and low bit rate video coding.
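For reference, the conventional three-step search that this abstract takes as its starting point can be sketched as below. This is a minimal sketch of the conventional TSS only; it does not implement the per-step precision changes proposed in the paper. The function names, the sum-of-absolute-differences (SAD) cost and the step sizes 4, 2, 1 are assumptions chosen for illustration.

```python
def make_block_cost(cur, ref, bx, by, n):
    """Return cost(dx, dy): SAD between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
            return float("inf")  # candidate falls outside the reference frame
        return sum(
            abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
            for y in range(n)
            for x in range(n)
        )

    return cost


def three_step_search(cost, first_step=4):
    """Conventional TSS: test 9 points around the current best displacement,
    move to the cheapest one, halve the step, and repeat until the step is 1.

    Returns (dx, dy, cost_at_best)."""
    best_dx, best_dy = 0, 0
    best_cost = cost(0, 0)
    step = first_step
    while step >= 1:
        center_x, center_y = best_dx, best_dy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                dx, dy = center_x + ox, center_y + oy
                candidate = cost(dx, dy)
                if candidate < best_cost:
                    best_cost, best_dx, best_dy = candidate, dx, dy
        step //= 2
    return best_dx, best_dy, best_cost
```

With steps of 4, 2 and 1 the search covers displacements up to ±7 while evaluating at most 27 candidates (including repeated centers) instead of the 225 of a full search. Because each step is greedy, the result is only guaranteed to be the true minimum when the cost surface has a single minimum.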
In previous work we introduced the concept of a parameter-dependent connected component of gray-scale images that takes into account both the gray values of the pixels and the differences of the gray values of the neighboring pixels. This concept is a convenient tool to analyze or understand images at a higher level than the pixel level. In this paper, we describe an algorithm for finding the parameter-dependent components for a given image. We discuss different strategies used in the algorithm and analyze their effects through the experimental results. Since the proposed algorithm is independent of the formation of the images, it can be used for the analysis of many types of images. The experimental results show that for some appropriate values of the parameters, the objects of an image may be represented by its parameter-dependent components reasonably well. Thus, the proposed algorithm provides us with the possibility of analyzing images further at the component level.
A new clustering technique is developed for segmentation of partially overlapped thin objects. The technique is based on an enhanced Voronoi diagram which partitions random data into clusters where intra-class members possess features of close similarity. An important aspect of this study consists of introducing predicting directional vectors, reminiscent of the first and second principal components, in order to achieve better partitioning of data clusters. Computer implementations of this new partitioning scheme illustrate superior partitioning performance over the standard Voronoi approach. It is shown that the new scheme minimizes the error in data classification. A mathematical framework is provided in support of this new clustering method. Experimental results on partitioning glass fibers are presented to illustrate application of the technique to object segmentation.
Optimal shape modeling of character classes is crucial for achieving high performance in the recognition of mixed-font, hand-written and/or poor-quality text. A novel scheme is presented in this regard, focusing on constructing structural models that can be hierarchically examined. These models utilize a certain `well-thought' set of shape primitives. They are simplified enough to ignore the inter-class variations in font type or writing style, yet retain enough detail for discrimination between the samples of similar classes. Thus the number of models required per class can be kept minimal without sacrificing recognition accuracy. In this connection, a flexible multi-stage matching scheme exploiting the proposed modeling is also described. This leads to a system which is robust against various distortions and degradations, including those related to cases of touching and broken characters. Finally, we present some examples and test results as a proof of concept demonstrating the validity and the robustness of the approach.
Edges in digital imagery can be identified from the zero-crossings of Laplacian of Gaussian (LOG) filtered images. Time- or frequency-sampled LOG filters have been developed for the detection and localization of edges in digital image data. The image is decomposed into overlapping subblocks and processed in the transform domain. Adaptive algorithms are developed to minimize spurious edge classifications. In order to achieve accurate and efficient implementations, the discrete symmetric cosine transform of the input data is employed in conjunction with adaptive filters. The adaptive selection of the filter coefficients is based on the gradient criterion. For instance, in the case of the frequency-sampled LOG filter, the filter parameter is systematically varied to force the rejection of false or weak edges. In addition, the proposed algorithms easily extend to higher dimensions. This is useful where 3D medical image data containing edge information has been corrupted by noise. This paper employs isotropic and non-isotropic filters to track edges in such images.
We investigate a novel method for the retrieval of an arbitrary amplitude-object which is illuminated from the far-field and sampled through a stratified random medium of unknown statistics. The setup includes two observation paths, a CCD-based imaging system and a multiaperture interferometer placed in a plane conjugate to the entrance pupil of the imaging system. The interferometric baselines are arranged in closed loops to make the closure phase insensitive to random refractive fluctuations. The method may be beneficial to applications such as surveillance, speckle interferometry and biomedical imaging.
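The reason closed loops of baselines are useful can be illustrated numerically. The sketch below assumes a simplified model that is not taken from the paper: the random medium adds an unknown phase error to each aperture, so the phase measured on baseline (i, j) is the object phase plus the difference of the two aperture errors. Summed around a closed triangle, those errors cancel. All names and numerical values are hypothetical.

```python
import math
import random


def wrap(phase):
    """Wrap a phase in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def measured_baseline_phase(object_phase, error_i, error_j):
    """Phase seen on baseline (i, j) under per-aperture phase errors."""
    return wrap(object_phase + error_i - error_j)


def closure_phase(phase_12, phase_23, phase_31):
    """Sum of the baseline phases around the closed loop 1 -> 2 -> 3 -> 1."""
    return wrap(phase_12 + phase_23 + phase_31)


def demo(seed=0):
    """Return (closure phase of the object, closure phase as measured)."""
    rng = random.Random(seed)
    true_12, true_23, true_31 = 0.4, -1.1, 0.9  # assumed object phases
    e1, e2, e3 = (rng.uniform(-math.pi, math.pi) for _ in range(3))
    m12 = measured_baseline_phase(true_12, e1, e2)
    m23 = measured_baseline_phase(true_23, e2, e3)
    m31 = measured_baseline_phase(true_31, e3, e1)
    return (
        closure_phase(true_12, true_23, true_31),
        closure_phase(m12, m23, m31),
    )
```

Each aperture error enters the loop once with a plus sign and once with a minus sign, so the two values returned by `demo` agree up to floating-point rounding for any seed, even though the individual baseline phases are badly corrupted.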
This paper presents a method for fast surface matching. The algorithm handles all six degrees of freedom and is based on the curvature of a surface. Two surfaces are sampled at discrete points and represented as sets of 3D vertices. The sampling rate is assumed to be at least twice the Nyquist frequency. Steps in the surface lead to a curvature value higher than a threshold; the related vertices are marked and excluded from any further calculation. The Gaussian curvature of the two surfaces is computed. Then a certain number of feature points are extracted from the surfaces. These feature points are connected to create triangles. Similar triangles found in both surfaces are compared. If they match, the rotation between the two triangles can be computed. A transformation histogram determines the rotation with the highest probability, and a subsequent displacement calculation specifies the displacement between the triangles with the best likelihood. Only the displacements between the triangles contributing to the calculated orientation vote for the correct displacement. The exact matching is done by a least-squares optimization procedure that considers only the triangles consistent with the initial transformation and possessing the same parameters, such as size and form, in both surfaces. The proposed method is applicable to range images without any edges or known reference points, as it is based on inherent features of free-form surfaces.
As the performance of systems for surveillance, reconnaissance, target detection, target recognition and target identification increases in competition with the increased skill in reduction of IR signatures, there has been an increasing demand for analyzing and predicting the spatial properties of targets and backgrounds. The temporal variations of spatial properties, measured as texture, for objects and backgrounds are of vital importance for target detection and assessment of signature reduction methods. One important question to be answered is: how does the texture of objects and backgrounds vary as a function of environment parameters, e.g. weather? If that question could be answered, one important part of the problem of performing signature forecasts could be solved. In an attempt to predict the dependences between spatiotemporal IR signatures and weather parameters, the diurnal time series of different texture measures for different areas in a natural background scene have been measured and related to different weather parameters, e.g. incidence, temperature and humidity. Examples of covariations between texture measures and weather parameters will be given in the paper.
In this paper, a multilevel Ising search method for human face detection is proposed to speed up the search. In order to utilize the information obtained from previously searched points, the Ising model is adopted to represent the candidates of `face' positions and is combined with the scale-invariant human face detection method. In the face detection, the distance from the mean vector of the `face' class in discriminant space represents the likelihood of a face. By integrating the measured distance into the energy function of the Ising model as the external magnetic field, the search space is narrowed down effectively (the candidates of `face' are reduced). By incorporating color information of the face region in the external magnetic field, the `face' candidates can be reduced further. In the multilevel Ising search, face candidates (spins) with different resolutions are represented in a pyramidal structure and a coarse-to-fine strategy is taken. We demonstrate that the proposed multilevel Ising search method can effectively reduce the search space and can detect human faces correctly.
This paper presents diagonal forms of matrices representing symmetric convolution which is the underlying form of convolution for discrete trigonometric transforms. Symmetric convolution is identically equivalent to linear convolution for appropriately zero-padded sequences. These diagonal forms provide an alternate derivation of the symmetric convolution-multiplication property of the discrete trigonometric transforms. Derived in this manner, the symmetric convolution-multiplication property extends easily to multiple dimensions, and generalizes to multidimensional asymmetric sequences. The symmetric convolution of multidimensional asymmetric sequences can then be accomplished by taking the product of the trigonometric transforms of the sequences and then applying an inverse transform to the result. An example is given of how this theory can be used for applying a 2D FIR filter with nonlinear phase which models atmospheric turbulence.
Snakes are active contours that minimize an energy function. Sandwich snakes are formed by two snakes, one inside and the other outside the contour that one is looking for. They have the same number of particles, which are connected in one-to-one correspondence. At the global minimum the two snakes have the same position.
In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a non-invasive technique. The study of the retinal vessels is very important both for the study of local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. The method is based on the integration of ocular fundus image sequences. The procedure can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique for image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of additive Gaussian noise, as well as on real fundus images, are reported.
Machine vision and image processing techniques have become increasingly important for the fruit industry, especially when applied to quality inspection and defect sorting applications. However, automating the defect sorting process is still a challenging project due to the complexity of the process. One of the biggest difficulties in automated machine vision inspection of fruit defects is how to distinguish the stem-end (stem cavity) and calyx (bloom bottom) from true defects such as bruises, insect damage, and blemishes. Traditional mechanical, image processing, and structured lighting methods have proved unable to solve this problem due to their limitations in accuracy, speed, and so on. In this paper, a novel method is developed based on dual-wavelength infrared imaging using both near-infrared and mid-infrared cameras. This method enables a quick and accurate discrimination between true defects and stem-ends/calyxes. The results obtained are significant for automated apple defect detection and sorting.
An important feature of the human vision system is the ability of selective visual attention. The stimulus that reaches the primate retina is processed in two different cortical pathways: one is specialized for object vision (`What') and the other for spatial vision (`Where'). In this way, the visual system is able to recognize objects independently of where they appear in the visual field. There are two major theories to explain human visual attention. According to the Object-Based theory, there is a limit on the number of isolated objects that can be perceived simultaneously; according to the Space-Based theory, there is a limit on the spatial areas from which information can be taken up. This paper deals with the Object-Based theory, which states that perception of the visual world occurs in two stages. The scene is segmented into isolated objects by region growing techniques in the pre-attentive stage. Invariant features (moments) are extracted and used as input to an Artificial Neural Network giving the probable object location (`Where'). In the focal stage, particular objects are analyzed in detail through another neural network that performs the object recognition (`What'). The number of analyzed objects is based on a top-down process that produces a consistent scene interpretation. Visual attention makes possible the development of more efficient and flexible interfaces between low-level sensory information and high-level processes.
The transmission of multi-scale digital map information via the wireless link of a personal digital assistant (PDA) system is investigated in this work. We consider a digital map representation of the vector model consisting of 29 layers, in which the road layer plays the most important role. Based on the street segment length, the road layer is classified into different scales. A multi-scale map database can be constructed by adding the classification information without modifying the original database. Unlike the conventional digital map service, where the retrieved map data is first generated as a bitmap image, compressed at the server side and then transmitted to the remote client via a wired link, we propose a new approach that can overcome the narrow bandwidth of the wireless channel. The basic idea is to transmit the map drawing commands rather than the rendered bitmap data, assuming that the PDA has sufficient computational power to render the map at the client side. Preliminary experiments have been done to verify the effectiveness of the proposed scheme. It is demonstrated that acceptable transmission over an 8 Kbps wireless channel can be achieved.
Fast motion-compensated frame interpolation (FMCI) schemes for the decoder of a block-based video codec operating at low bit rates are examined in this paper. The main objective is to improve the video quality by increasing the frame rate without a substantial increase in the computational complexity. Two FMCI schemes are proposed depending on the motion vector mapping strategy, i.e. the non-deformable and the deformable block-based FMCI schemes. They provide a trade-off between computational complexity and visual performance. With the proposed schemes, the decoder can perform frame interpolation using motion information received from the encoder. The complexity of FMCI is reduced since no additional motion search in the decoder is needed, as is required by standard MCI. It has been observed from experimental results that the visual quality of coded low-bit-rate video is significantly improved at the expense of a small increase in decoder complexity.
High quality video compression is necessary for reduction of transmission bandwidth and in archiving applications. We propose a compression scheme which, depending on the available bandwidth, can vary from lossless compression to lossy compression, but always with guaranteed quality. In the case of lossless compression, the customer receives the original content without any loss. Even the lower compression ratios obtained with lossless compression can represent significant savings in the communication bandwidth. In the case of lossy compression, the maximum error between the recovered and the original video is mathematically bounded. The amount of compression achieved is a function of the error bounds. Furthermore, errors are statistically independent of the video content, and thus guaranteed not to create any type of artifacts. The recovered video therefore has the same quality, visually indistinguishable from the original, at all times and under all motion conditions.
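The abstract does not say how the error bound is enforced. As a rough illustration of what a mathematically bounded maximum error can look like, the sketch below uses a generic uniform quantizer with bins of width 2*delta + 1, a common device in near-lossless coding. It is an assumption for illustration, not the authors' scheme, and the names `quantize`, `reconstruct` and `delta` are hypothetical.

```python
def quantize(value, delta):
    """Map an integer sample to a bin index; each bin is 2*delta + 1 wide."""
    step = 2 * delta + 1
    # Offsetting by delta centres every bin on a multiple of step.
    return (value + delta) // step


def reconstruct(index, delta):
    """Return the centre of the bin, which is at most delta away from any member."""
    return index * (2 * delta + 1)


# Hypothetical 8-bit sample values and an error bound of 2 grey levels.
samples = [0, 1, 2, 3, 17, 128, 254, 255]
delta = 2
recovered = [reconstruct(quantize(s, delta), delta) for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
```

With `delta = 0` the bins have width one and the mapping is lossless; larger values of `delta` trade a wider error bound for fewer distinct symbols to code, which matches the statement that the amount of compression is a function of the error bounds.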
In a pseudo-color (color-mapped) image, pixel values represent indices that point to color values in a look-up table. Well-known linear predictive schemes, such as JPEG and CALIC, perform poorly when used with pseudo-color images, while universal compressors, such as Gzip, Pkzip and Compress, yield better compression gain. Recently, Burrows and Wheeler introduced the Block Sorting Lossless Data Compression Algorithm (BWA). The BWA has received considerable attention. It achieves compression rates as good as context-based methods, such as PPM, but at execution speeds closer to Ziv-Lempel techniques. The BWA is mainly composed of a block-sorting transformation, known as the Burrows-Wheeler Transformation (BWT), followed by Move-To-Front (MTF) coding. In this paper, we introduce a new block transformation, the Linear Order Transformation (LOT). We delineate its relationship to the BWT and show that the LOT is faster than the BWT. We then show that when an MTF coder is employed after the LOT, the compression gain obtained for pseudo-color images is better than that of well-known compression techniques such as GIF, JPEG, CALIC, Gzip, LZW (Unix Compress) and the BWA.
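For readers unfamiliar with the two stages named above, the following is a minimal textbook sketch of the Burrows-Wheeler transform followed by Move-To-Front coding. It works on short strings rather than image index arrays, uses the naive sorted-rotations construction, and assumes a hypothetical end-of-string marker `eos`. The LOT is not described in the abstract and is not implemented here.

```python
def bwt(text, eos="\0"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    assert eos not in text, "the end marker must not occur in the input"
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, eos="\0"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]


def move_to_front(data):
    """Replace each symbol by its position in a list that is reordered on the fly."""
    alphabet = sorted(set(data))
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)
        # Move the symbol just seen to the front so repeats code as zeros.
        alphabet.insert(0, alphabet.pop(idx))
    return out


transformed = bwt("banana")
ranks = move_to_front(transformed)
```

The block sort groups identical symbols into runs, and the MTF stage turns those runs into small integers dominated by zeros, which a subsequent entropy coder can represent cheaply.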
Conventional transform coding schemes such as JPEG process the spectrum signal block by block because this is simple to manipulate; however, they do not consider the similarity between different spectra. The proposed method devises a translation function that reorganizes the individual spectrum data to generate global spectra according to their frequency band. These bands are highly similar to one another. Our algorithm analyzes the similarity of the different spectrum bands to reduce the bit rate for transmission or storage. Simulations carried out on many different natural images demonstrate that the proposed method improves performance compared with other existing transform coding schemes, especially at very low bit rates (below 0.25 bpp).
At low bit rates, the bit budget for I-frame coding in H.263+ can be too high to be practical. A hybrid DCT/wavelet transform based I-frame coding is proposed in this work as a solution to the rate control problem. This new coder is compatible with the H.263+ bit stream syntax, and aims at an R-D optimized performance with a reasonable amount of computational complexity. By employing fast estimation of the coding efficiency with a rate-distortion model and performing an R-D based rate allocation, the hybrid coding scheme achieves higher coding gain at low bit rates.
A new entropy codec, which can recover quickly from the loss of synchronization caused by transmission errors, is proposed and applied to wireless image transmission in this research. This entropy codec is based on the Huffman code, with a careful choice of the assignment of 1's and 0's to each branch of the Huffman tree. The design satisfies the suffix-rich property, i.e. the number of times a codeword is a suffix of other codewords is maximized. After the Huffman coding tree is constructed, the source can be coded by using the traditional Huffman code. Thus, this coder introduces no overhead and does not sacrifice coding efficiency. Statistically, the decoder can automatically recover the lost synchronization with the shortest error propagation length. Experimental results show that fast synchronization recovery reduces quality degradation in the reconstructed image while maintaining the same coding efficiency.
The volume of medical image data is expected to increase dramatically in the next decade due to the extensive use of radiological images for medical diagnosis. The economics of distributing medical images dictate that data compression is essential. Although lossy image compression exists, medical images must be recorded and transmitted losslessly before they reach the users, to avoid wrong diagnoses caused by lost image data. Therefore, a low complexity, high performance lossless compression scheme that can approach the theoretical bound and operate in near real-time is needed. In this paper, we propose a hybrid image coder to compress digitized medical images without any data loss. The hybrid coder consists of two key components: an embedded wavelet coder and a lossless run-length coder. In this system, the medical image is first compressed with the lossy wavelet coder, and the residual image between the original and the compressed one is further compressed with the run-length coder. Several optimization schemes have been used in these coders to increase the coding performance. It is shown that the proposed algorithm achieves a higher compression ratio than run-length entropy coders such as arithmetic, Huffman and Lempel-Ziv coders.
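The lossy-plus-residual structure described above can be sketched in a few lines. This is only an illustration of why the combination is lossless: the embedded wavelet coder is replaced here by a coarse quantizer as a stand-in, the approximation is passed around uncompressed, and all function names and the `step` parameter are hypothetical.

```python
def lossy_approx(pixels, step=8):
    """Stand-in for the lossy stage: coarse quantization, NOT a wavelet coder."""
    return [(p // step) * step for p in pixels]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def encode(pixels, step=8):
    """Return the lossy approximation and the run-length coded residual."""
    approx = lossy_approx(pixels, step)
    residual = [p - a for p, a in zip(pixels, approx)]
    return approx, run_length_encode(residual)


def decode(approx, residual_runs):
    """Adding the residual back to the approximation restores the input exactly."""
    residual = run_length_decode(residual_runs)
    return [a + r for a, r in zip(approx, residual)]


row = [16, 16, 16, 17, 17, 40, 40, 40, 40, 43]  # hypothetical scan line
approx, runs = encode(row)
restored = decode(approx, runs)
```

Because the residual is stored exactly, the quality of the lossy stage affects only the compression ratio and never the correctness of the reconstruction.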
This paper addresses the problems of how to exploit the space and frequency properties of the wavelet coefficients, and how to design a wavelet packet coder that is optimal in the rate-distortion sense. From the localization properties of wavelets, the best quantizer for a wavelet coefficient is expected to match its local characteristics, i.e., to be adaptive in both the space and frequency domains. Previous image coders tended to design quantizers at a band or class level, which limited their performance because the localization properties of wavelets are then difficult to exploit. In contrast with previous coders, we introduce a new image coding framework in which the compaction properties in the frequency domain are exploited through the selection of wavelet packets, and the compaction properties in the space domain are exploited with tree-structured wavelet representations. For each wavelet coefficient, its model is estimated from the quantized causal neighborhood. The optimal quantizer is therefore spatially varying and rate sensitive, and the optimization problem is no longer a joint optimization problem as in the SFQ-like coders. The simulation results demonstrate that the performance of the proposed coder is competitive with, and often superior to, that of state-of-the-art zerotree-based coding schemes.
The Joint Photographic Experts Group (JPEG) within the ISO international standards organization is defining a new standard for still image compression--JPEG-2000. This paper describes the Wavelet Trellis Coded Quantization (WTCQ) algorithm submitted by SAIC and The University of Arizona to the JPEG-2000 standardization activity. WTCQ is the basis of the current Verification Model being used by JPEG participants to conduct algorithm experiments. The outcomes from these experiments will lead to the ultimate specification of the JPEG-2000 algorithm. Prior to describing WTCQ and its subsequent evolution into the initial JPEG-2000 VM, a brief overview of the objectives of JPEG-2000 and the process by which it is being developed is presented.
We extend the work of Sherwood and Zeger to progressive video coding for noisy channels. By utilizing a 3D extension of the set partitioning in hierarchical trees (SPIHT) algorithm, we cascade the resulting 3D SPIHT video coder with a rate-compatible punctured convolutional channel coder for transmission of video over a binary symmetric channel. Progressive coding is achieved by increasing the target rate of the 3D embedded SPIHT video coder as the channel condition improves. The performance of our proposed coding system is acceptable at low transmission rate and bad channel conditions. Its low complexity makes it suitable for emerging applications such as video over wireless channels.
A computationally efficient postprocessing technique to reduce compression artifacts in low-bit-rate video coding is proposed in this research. We first formulate the artifact reduction problem as a robust estimation problem. Under this framework, the artifact-free image is obtained by minimizing a cost function that accounts for smoothness constraints as well as image fidelity. Instead of using the traditional approach that applies a gradient descent search for optimization, a set of nonlinear filters is proposed to approximate the global minimum, reducing the computational complexity so that real-time postprocessing is possible. We have performed experiments on the H.263 codec and observed that the proposed method is effective in reducing severe blocking and ringing artifacts, while maintaining a low complexity and a low memory bandwidth.
A new scheme to search perceptually significant wavelet coefficients for effective digital watermark casting is proposed in this research. An adaptive method is developed to determine significant subbands and select a number of significant coefficients in these subbands. Experimental results show that the cast watermark can be successfully retrieved after various attacks including signal processing, geometric processing, noise adding, JPEG and wavelet-based compression methods.
An enhancement to a previously developed Karhunen-Loeve/discrete cosine transform-based multispectral bandwidth compression technique is presented. This enhancement is achieved via addition of a spectral screening module prior to the spectral decorrelation process. The objective of the spectral screening module is to identify a set of unique spectral signatures in a block of multispectral data to be used in the subsequent spectral decorrelation module. The number of unique signatures found will depend on the desired spectral angle separation, irrespective of their frequency of occurrence. This set of unique spectral signatures, instead of the signature of each and every point of the block of data, will be used to construct the spectral covariance matrix and the resulting Karhunen-Loeve spectral transformation matrix that is used to spectrally decorrelate the multispectral images. The significance of this modification is that the covariance matrix so constructed will not be entirely based on the statistical significance of the individual spectra in the block but rather on the uniqueness of the individual spectra. Without this added spectral screening feature, small objects and ground features would likely be manifested in the low eigen planes mixed with all of the noise present in the scene. Since these lower eigen planes are coded via the subsequent JPEG compression module at a much lower bit rate, the fidelity of these small objects will be severely impacted by the compression-induced error. However, the addition of the proposed spectral screening module will relegate these small objects into the higher eigen planes and hence will greatly enhance preservation of their fidelities in the compression process. This modification alleviates the need to update the covariance matrix frequently over small sub-blocks, resulting in a reduced overhead bit requirement and a much simpler implementation task.
This paper introduces two image partition boundary coding models that are composed solely of binary decisions. Because of their simplified decision structure, the models can take advantage of various accelerating schemes for binary arithmetic coding. The number of decisions necessary to describe a partition using either model varies between one and two per pixel location and is proportional to partition complexity. The first model is a binary decomposition of Steve Tate's neighboring edge model. The decomposition employs boundary connectivity constraints to reduce the number of model parameters. The constraints also reduce the number of descriptive decisions to just over one per pixel for typical partitions. A theoretical zero order entropy bound of 1.6 bits per pixel also results. The second model represents a partition as a sequence of strokes. A stroke consists of one or two three-way chains. Chain termination is accomplished without redundant boundary traversal by using a special termination decision at encounters with previously drawn chains. Chain initiation decisions are also conditioned on previously drawn edge patterns. Chain direction decisions are conditioned via a boundary state machine. The paper compares object based boundary coding and pixel based coding, placing the new coders into the latter category. A technique for determining the appropriate application domain of pixel based codes is developed. The new coding models are placed into context with previous pixel based work by the development of a new categorization of image partition representations. Four representations are defined: the map coloring, the edge map, the outline map, and the perimeter map. Experiments compare the new methods with other pixel based methods and with a canonical object based method.
Simple filters used to restore blurred images require knowledge of the point spread function (PSF) of the blurring system. Unfortunately such knowledge is usually not available when the blur is caused by relative motion between the camera and the scene. Various methods addressing this problem were developed in the last four decades. These methods can be divided into two types: direct methods whereby the restoration process is performed in a one step fashion, and indirect methods whereby the restoration process is performed by an iterative technique. Direct methods usually require identification of the PSF as a first step, and then use it to restore the blurred image with a simple filter. Lately, a new direct method was developed. As a result of this development, direct restoration methods (given only a single blurred image) are studied and compared in this paper for a variety of motion types. Various criteria such as quality of restoration, sensitivity to noise and computation requirements are considered.
We propose a perceptual image compression method based on the wavelet transform. This method differs from conventional wavelet coding schemes in that Human Visual System characteristics are used in the quantization steps. Rather than the amplitudes of the wavelet coefficients, the contrasts at each resolution are coded. The resulting compression scheme is able to distribute the visual error uniformly over the whole image, thus minimizing visual artifacts at low bit rates. Experimental results are given to show the superior visual performance of the new method in comparison with conventional wavelet coders.
In this research, we examine the problem of real-time video streaming over the Internet by introducing an adaptive least-mean-squares (LMS) bandwidth controller that adjusts the amount of video data uploaded to the network so that packet loss is minimized in the face of network congestion. The adaptive LMS bandwidth controller, which resides at the client end, sends a feedback signal to the server regarding the available bandwidth that can be supported by the network at a specified packet loss rate. The available bandwidth is continuously updated as network conditions change. Simulation results are provided to demonstrate the superior performance of the proposed LMS bandwidth controller.
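The abstract gives no details of the controller's inputs or update rule. As background only, the sketch below shows the textbook LMS weight update used as a one-step-ahead predictor of a measured sequence. The mapping to available bandwidth, the tap count `taps`, the step size `mu` and the normalized input values are all assumptions made for illustration.

```python
def lms_predict(samples, taps=2, mu=0.1):
    """One-step-ahead LMS prediction; returns (predictions, final weights)."""
    weights = [0.0] * taps
    predictions = []
    for n in range(taps, len(samples)):
        window = samples[n - taps:n][::-1]  # most recent sample first
        estimate = sum(w * x for w, x in zip(weights, window))
        error = samples[n] - estimate
        # Standard LMS update: step along the input, scaled by the error.
        weights = [w + mu * error * x for w, x in zip(weights, window)]
        predictions.append(estimate)
    return predictions, weights


# Hypothetical normalized bandwidth measurements held constant at 1.0.
measured = [1.0] * 120
predictions, weights = lms_predict(measured)
```

For this constant input the prediction error shrinks by a factor of 0.8 per step, so the estimate settles on the measured value. A real controller would also have to choose `mu` small enough to remain stable for the signal power it actually observes.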
A visual pattern-based image compression technique is presented, in which 4 X 4 image blocks are classified into perceptually significant 'shade' and 'edge' classes. The proposed technique attempts to make use of neighboring blocks to encode a shade or an edge block by exploiting Human Visual System characteristics. To reduce the correlation present in the shade regions of an image, the mean intensity of a shade block is predicted from the neighboring shade blocks, and the error mean is computed. The error mean of a block is then encoded by choosing an appropriate quantizer based on its predicted mean. The quantizer has been designed after a careful study of the distribution of the error mean of shade blocks in test images, based on Weber's law, to maximize the compression ratio without introducing any visible error. Higher dimension shade blocks (8 X 8 and 16 X 16) are also formed by merging adjacent shade blocks, which further reduces the inter-block correlation. An edge block is assumed to contain two uniform intensity regions (low and high intensity) separated by a transition region. Hence, an edge block can be encoded by coding its edge pattern, low or high intensity and gradient. In order to reduce the inter-block correlation, the edge pattern and mean intensity (low or high) are predicted. The mean intensity of error is encoded by using an appropriate quantizer. As a result, this technique achieves higher compression ratios than other visual pattern-based techniques, at very low computational complexity.
This paper investigates the compactness aspect of two transforms, namely the DCT and the WHT. We define a parameter called the Activity Index (AI) of the image, which is the ratio of the first derivative energy in the edges to the total first derivative energy of the image. The Sobel edge operator is used to obtain the edges in the image, and the first derivative energy in these edges is computed. It is demonstrated that the activity index provides a good measure of the relative compression which the two transforms would give, and can therefore help in choosing the transform for better compression of an image. Computations show that the WHT performs better for images having a higher AI (close to 1), whereas if the AI is small (less than 0.5) the DCT's performance is superior. If the AI is close to 0.5, both transforms give more or less the same compression. The algorithm is tested on a variety of natural as well as robotic types of images.
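One possible reading of the Activity Index definition is sketched below. The abstract does not state how edge pixels are selected or how the first derivative energy is measured, so this version assumes the squared Sobel gradient magnitude as the energy and a hypothetical threshold `edge_threshold` on that energy to mark edge pixels.

```python
def sobel_energy(img):
    """Squared Sobel gradient magnitude at each interior pixel (borders stay 0)."""
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            energy[y][x] = float(gx * gx + gy * gy)
    return energy


def activity_index(img, edge_threshold):
    """Ratio of gradient energy at edge pixels to total gradient energy."""
    energy = [e for row in sobel_energy(img) for e in row]
    total = sum(energy)
    if total == 0:
        return 0.0  # a perfectly flat image has no derivative energy at all
    edge = sum(e for e in energy if e >= edge_threshold)
    return edge / total


# Hypothetical 6 x 6 test images: a sharp vertical step and a flat field.
step_image = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
flat_image = [[50] * 6 for _ in range(6)]
ai_step = activity_index(step_image, edge_threshold=10000.0)
ai_flat = activity_index(flat_image, edge_threshold=10000.0)
```

Under this reading, an image whose gradient energy sits entirely on sharp edges scores close to 1, the regime in which the abstract reports that the WHT performs better.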
Block-matching motion estimation algorithms (BMAs) are widely used to eliminate temporal redundancies in video coding. BMAs implicitly assume that the motion within each block is uniform. This assumption does not always hold when the fixed block size does not match the real objects in an image. The block effect then becomes noticeable and the quality of the prediction suffers. In this paper, a block-classified motion estimation algorithm is presented. The proposed algorithm classifies the frame into stationary and moving object blocks. The object blocks are then adaptively segmented into different regions according to their motion and edge characteristics. The proposed method can estimate the edge blocks accurately. Experimental results show that this scheme performs better, in terms of both objective and subjective measures, than the full search and variable block-size quadtree segmentation motion estimation algorithms.
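For context, the full search baseline that the abstract compares against can be sketched as follows. This is the conventional fixed-block BMA with a sum of absolute differences (SAD) cost, not the proposed block-classified algorithm; the frame contents, block size and search range are made up for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and its displaced candidate."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in the window; return ((dx, dy), cost) of the best."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + n <= h
                      and 0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Hypothetical 8 x 8 frames: the current frame is the reference shifted right by one.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]
vector, cost = full_search(cur, ref, bx=2, by=2, n=4, search_range=2)
```

Every pixel of the block is assigned the single vector found, which is exactly the uniform-motion assumption the abstract identifies as the weakness of fixed-size blocks.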
A complexity and visual quality analysis of several fast motion estimation (ME) algorithms for the emerging MPEG-4 standard was performed as a basis for HW/SW partitioning for VLSI implementation of a portable multimedia terminal. While the computational complexity of ME in previously standardized video coding schemes was predictable over time, the support of arbitrarily shaped visual objects (VOs), various coding options within MPEG-4, and content-dependent complexity (caused e.g. by summation truncation for SAD) now introduce content-dependent (and therefore time-dependent) computational requirements, which cannot be determined analytically. Therefore a new time-dependent complexity analysis method, based on statistical analysis of memory access bandwidth, arithmetic and control instruction counts utilized by a real processor, was developed and applied. Fast ME algorithms can be classified into search area subsampling, pel decimation, feature matching, adaptive hierarchical ME and simplified distance criteria. Several specific implementations of algorithms belonging to these classes are compared in terms of complexity and PSNR to ME algorithms for arbitrarily shaped and rectangular VOs. It is shown that the average macroblock (MB) computational complexity per arbitrarily shaped P-VOP (video object plane) exhibits a significant variation over time for the different motion estimation algorithms. These results indicate that theoretical estimations and the number of MBs per VOP are of limited applicability as approximations of computational complexity over time, which is required e.g. for average system load specification (in contrast to worst case specification), for real-time processor task scheduling, and for Quality of Service guarantees of several VOs.
In this work, we present a postprocessing technique applied to a 3D graphic model of a lower resolution to obtain a visually more pleasant representation. Our method is an improved version of the Butterfly subdivision scheme developed by Zorin et al. Our main contribution is to exploit the flatness information of local areas of a 3D graphic model for adaptive refinement. Consequently, we can avoid unnecessary subdivision in regions which are relatively flat. The proposed new algorithm not only reduces the computational complexity but also saves the storage space. With the hierarchical mesh compression method developed by Li and Kuo as the baseline coding method, we show that the postprocessing technique can greatly improve the visual quality of the decoded 3D graphic model.
A wavelet-based image codec compresses an image with three major steps: discrete wavelet transform, quantization and entropy coding. There are many variants in each step. In this research, we consider a versatile software development system called the wavelet compression research platform (WCRP). WCRP provides a framework to host components of all compression steps. For each compression stage, multiple components are developed and they are contained in WCRP. They include a selection of floating-point and integer filter sets, different transform strategies, a set of quantizers and two different arithmetic coders. A codec can be easily formed by picking up components in different stages. WCRP provides an excellent tool to test the performance of various image codec designs. In addition, WCRP is an extensible system, i.e., new components available in the future can be easily incorporated and quickly tested. It makes the development of new algorithms much easier. WCRP has been used in developing a family of new quantization algorithms that are based on the concept of Binary Description of multi-level wavelet coding objects. These quantization schemes can serve different applications, such as progressive fidelity coding, lossless coding and low complexity coding. Both progressive fidelity coding and lossless coding performance of our codec are among the best in its class. A codec of low implementational complexity is made possible by our memory-scalable quantization scheme.
Compression of a noisy source is usually a two stage problem, involving the operations of estimation (denoising) and quantization. A survey of literature on this problem reveals that for the squared error distortion measure, the best possible compression strategy is to subject the noisy source to an optimal estimator followed by an optimal quantizer for the estimate. What we present in this paper is a simple but sub-optimal vector quantization (VQ) strategy that combines estimation and compression in one efficient step. The idea is to train a VQ on pairs of noisy and clean images. When presented with a noisy image, our VQ-based system estimates the noise variance and then performs joint denoising and compression. Simulations performed on images corrupted by additive, white, Gaussian noise show significant denoising at various bit rates. Results also indicate that our system is robust enough to handle a wide range of noise variations, while designed for a particular noise variance.
In this paper, two schemes for optimizing the JPEG quantizer are investigated. The first scheme starts from a given quantization table and executes a sub-optimal search for quantization table parameters. Those parameters, when changed by a pre-specified incremental step size, result in the best move towards decreasing (increasing) the compressed file size, with a minimal increase (maximum decrease) of error. This procedure is repeated until a pre-specified file size is reached. The second scheme is based on performing a mapping from the JPEG default quantization table to an optimized one. This mapping adapts the default quantization table to the statistics of the DCT coefficients of the image at hand. The superiority of these two optimized JPEG quantizers is established in terms of the visual quality of their reconstructed images, through comparative visual image quality experiments involving 20 human subjects. These optimized quantizers were also demonstrated to result in a higher level of machine image quality, by improving the accuracy of cheque amount reading applications. Experimental results indicate that the second optimization scheme yields a higher level of visual image quality, while requiring a fraction of the processing time used by the first scheme.
We propose to minimize a cost function, which depends on the values of the input signal to a linear time-invariant system, to reach an optimal estimate of this input signal. This cost function is the square of the error signal between the output and the convolution of the estimated input with the blurring system. The cost function is minimized using an optimization technique that requires an initial estimate of the input signal. The Van Cittert deconvolution method provides this initial estimate. A singular value decomposition technique is used to estimate the improved input signal.
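The Van Cittert step that supplies the initial estimate can be sketched as the classical iteration x_{k+1} = x_k + beta * (y - h * x_k), started from x_0 = y, where y is the observed output and h the blurring system. The sketch below is a one-dimensional illustration with a made-up kernel and signal; the relaxation factor `beta`, the iteration count and the zero-padded boundary handling are assumptions, and the subsequent SVD-based refinement is not shown.

```python
def convolve_same(signal, kernel):
    """Zero-padded convolution returning as many samples as the input signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, hk in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += hk * signal[j]
        out.append(acc)
    return out


def van_cittert(observed, kernel, iterations=50, beta=1.0):
    """Iteratively add back the part of the observation the estimate fails to explain."""
    estimate = list(observed)  # x_0 = y
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        estimate = [e + beta * (o - r)
                    for e, o, r in zip(estimate, observed, reblurred)]
    return estimate


def squared_error(a, b):
    """The cost named in the abstract, evaluated between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


kernel = [0.25, 0.5, 0.25]  # hypothetical symmetric blur
truth = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
observed = convolve_same(truth, kernel)
initial_estimate = van_cittert(observed, kernel)
```

The iteration converges for this kernel because its frequency response lies between 0 and 1. For other blurs, `beta` generally has to be chosen to keep the iteration stable, and noise in the observation is amplified as the iteration count grows.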
A new super resolution algorithm is proposed which can provide bandwidth extension of noisy images in a small number of iterations and is potentially capable of real time operation. Attempts to restore band-limited images frequently introduce ringing artifacts. Methods designed to reduce this ringing often suppress sharp features in the scene. In images with a well-defined background intensity, super resolution techniques involving a positive constraint are effective in suppressing this ringing and providing a high degree of bandwidth extension. Problems arise, however, in a general image where no such well-defined background exists. The first stage of the algorithm reported here addresses these problems by computing an effective background. Features that need to be enhanced then exist as blurred deviations from this background. The background is computed from the first and second differentials of the image with respect to further blurring. It has been possible to suppress ringing artifacts, resulting in bandwidth extension, by comparing the calculated background with the known original blurred image. An iterative procedure based on Gerchberg's error energy reduction technique has produced good results. Computer calculations applied to both synthetic images and real millimeter wave images show that the algorithm is effective, efficient and largely immune to noise.
This paper demonstrates results of wavelet-based restoration for scenes with pixel-scale features and various degrees of smoothness. The model of choice is the so-called C/D/C system model that represents the image acquisition process by accounting for system blur, for the effects of aliasing, and for additive noise. Wavelet domain modeling discretizes both the image acquisition kernel and the representations of scenes and images. In this way the image restoration problem is formulated as a discrete least squares problem in the wavelet domain. The treatment of noise is related to the singular values of the image acquisition kernel. We show that pixel-scale features can be restored exactly in the absence of noise, for various degrees of smoothness. Results are similar in the presence of noise, except for some noise amplification and ringing artifacts that we control with an automated choice of a restoration parameter. This paper extends work in wavelet-based restoration, and builds on research in C/D/C model-based restoration.