This paper presents a general procedure for determining the optimal MPEG coding strategy in terms of the selection of macroblock coding modes and quantizer scales. The two processes of coding mode decision and rate control are intimately related to each other and should be determined jointly in order to achieve optimal coding performance. We formulate the constrained optimization problem and present solutions based upon rate-distortion characteristics, or R(D) curves, for all the macroblocks that compose the picture being coded. Distortion of the entire picture is assumed to be decomposable and expressible as a function of individual macroblock distortions, with this being the objective function to minimize. The determination of the optimal solution is complicated by the MPEG differential encoding of motion vectors and dc coefficients, which introduces dependencies that carry over from macroblock to macroblock for a duration equal to the slice length. Once the upper bound in performance is calculated, it can be used to assess how well practical sub-optimum methods perform.
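As an illustration of the kind of Lagrangian selection such an R(D) formulation implies, the following minimal Python sketch picks a (mode, quantizer scale) pair per macroblock from precomputed R(D) tables. It deliberately ignores the differential MV/dc dependencies discussed above and assumes independent macroblocks; the table entries and the multiplier value are illustrative, not the paper's data.

```python
# Hedged sketch: independent-macroblock Lagrangian selection of
# (mode, quantizer scale) from per-macroblock R(D) tables.
# The MV/dc dependencies described in the abstract are ignored here.

def select_modes(rd_tables, lam):
    """rd_tables[i] is a list of (rate, distortion, mode, qscale) options
    for macroblock i; lam is the Lagrange multiplier trading rate against
    distortion. Returns the chosen options and the totals."""
    choices, total_rate, total_dist = [], 0.0, 0.0
    for options in rd_tables:
        # Minimize J = D + lambda * R independently for each macroblock.
        best = min(options, key=lambda o: o[1] + lam * o[0])
        choices.append(best)
        total_rate += best[0]
        total_dist += best[1]
    return choices, total_rate, total_dist

# Illustrative use: two macroblocks, two candidate operating points each.
tables = [
    [(120, 40.0, "intra", 8), (60, 95.0, "inter", 16)],
    [(200, 10.0, "intra", 4), (90, 55.0, "inter", 10)],
]
print(select_modes(tables, lam=0.5))
```

In practice the multiplier would be adjusted (for example by bisection) until the total rate meets the bit budget for the picture.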
We propose a two-layer image sequence coding algorithm based on residual block matching using fractal approximation. First, the motion compensation (MC) error signal is encoded by the discrete cosine transform (DCT). The motion vector and DCT coefficients are transmitted as the first layer, and the residual signal of MC/DCT is encoded by fractal approximation and transmitted as the second layer. The second layer is encoded by the matching block selected from a dynamic residual pool. The reconstructed MC error image is used as a dynamic residual signal, which corresponds to the domain pool of conventional fractal coding. Computer simulations comparing the proposed method with DCT-based methods show that the performance improvement achieved by the proposed method is significant.
In this paper, we propose a method to recover good quality pictures from channel errors in the transmission of coded video sequences. This work is essentially an extension of our earlier work to video sequence coding. The transmitted information in most video coding standards is mainly composed of motion vectors (MV) and motion-compensated prediction errors (MCPE). The compressed data are generally transmitted in binary form through a noisy channel, and channel errors in this bit stream result in objectionable degradations in consecutive reconstructed frames. Many studies have addressed concealing the effects of channel errors on the reconstructed images, but they did not consider recovery of the actual binary data and instead relied on replacement and/or interpolation techniques to make errors less visible to an observer. To obtain a simple and powerful method that recovers the MV and MCPE components of the video sequence separately from errors, it is necessary to take full advantage of both the source and channel characteristics. The proposed method exploits the dominance of single-bit errors in a received bit string by using a parity bit for error detection. It also exploits the high pixel correlation of typical images by using a side-match criterion to select the best fit among candidate replacements.
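A minimal Python sketch of a side-match test of the kind described follows, assuming blocks are available as 2D NumPy arrays and that the candidate list (for example, one candidate per single-bit flip that restores the parity check) is supplied by the caller; the block geometry and cost definition are assumptions, not the authors' exact formulation.

```python
import numpy as np

def _edge_ssd(a, b):
    """Sum of squared differences between two pixel rows/columns."""
    return float(np.sum((a.astype(float) - b.astype(float)) ** 2))

def side_match_cost(candidate, top, bottom, left, right):
    """Mismatch between the candidate block's border pixels and the
    adjacent pixels of its (correctly received) neighbours."""
    cost = 0.0
    if top is not None:
        cost += _edge_ssd(candidate[0, :], top[-1, :])
    if bottom is not None:
        cost += _edge_ssd(candidate[-1, :], bottom[0, :])
    if left is not None:
        cost += _edge_ssd(candidate[:, 0], left[:, -1])
    if right is not None:
        cost += _edge_ssd(candidate[:, -1], right[:, 0])
    return cost

def best_fit(candidates, top, bottom, left, right):
    """Pick the candidate replacement with the smallest side-match cost."""
    return min(candidates,
               key=lambda c: side_match_cost(c, top, bottom, left, right))
```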
Image compression based on quantizing the image in the discrete cosine transform (DCT) domain can generate blocky artifacts in the output image. It is possible to reduce these artifacts and RMS error by adjusting measures of block edginess and image roughness, while restricting the DCT coefficient values to values that would have been quantized to those of the compressed image. This paper presents a fast algorithm to replace our gradient search method for RMS error reduction and image smoothing after adjustment of DCT coefficient amplitude.
At this time, almost all (de facto) video coding standards are DCT-based. It would be wrong, though, to think that DCT is the only practical way to reach a reasonable compression ratio for a reasonable codec complexity. In this paper, a description of a complete hybrid codec based on OLA (Optimal Level Allocation) and HVS (Human Visual System) based classification is given. Then a performance comparison is made between this codec and an MPEG-2 like codec.
ISO/IEC MPEG-2 Test Model 5 (TM5) describes a rate control method which consists of three steps: bit allocation, rate control and modulation. There are basically two problems with the TM5 rate control. First, the quantization parameter for a macroblock is fully dependent upon the channel buffer fullness. Hence, macroblocks in a picture may not all be treated equally because of variations in buffer fullness, which may result in nonuniform picture quality. Secondly, the TM5 rate control does not handle scene changes properly because the target bit rate for a picture is determined based only on the information obtained from encoding of the previous pictures. This paper presents a rate control approach which addresses these two problems associated with the TM5 rate control. A single quantization parameter is used for each picture, which guarantees that all the macroblocks in a picture are treated equally. To address the impact of scene changes on picture quality, we propose to code the first scheduled P picture after a scene change as an I picture and the I picture in the following group of pictures as a P picture. The simulation results demonstrate that a significant improvement is obtained using the proposed rate control.
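A minimal sketch of the picture-type reassignment rule described above, assuming a scene-change flag is available per picture; the GOP layout and the handling of the following GOP's I picture are illustrative and left to the caller's scheduler.

```python
def reassign_picture_types(gop_types, scene_change_at):
    """gop_types: scheduled coding types for one GOP, e.g. list('IBBPBBPBBPBB').
    scene_change_at: index of the first picture after the detected scene change.
    The first scheduled P picture at or after the scene change is coded as an
    I picture (re-anchoring prediction); the I picture of the following GOP
    would then be coded as a P picture by the caller's GOP scheduler."""
    types = list(gop_types)
    for i in range(scene_change_at, len(types)):
        if types[i] == "P":
            types[i] = "I"
            break
    return types

# Illustrative use: scene change detected at picture index 5.
print("".join(reassign_picture_types(list("IBBPBBPBBPBB"), scene_change_at=5)))
```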
This paper presents a performance evaluation of loop filtering in a generic hybrid video coding algorithm using the projection onto convex sets (POCS) method. This study will be conducted in terms of both objective and subjective image quality metrics as well as compression gain.
The subject of the extraction of a dedicated fast playback stream from a normal play MPEG encoded stream is important for recording applications where fast playback is supported by such a device. The most important issue is the selection of which coefficients or codewords to retain for the fast playback stream. In this paper several codeword extraction methods of varying complexity, ranging from optimal extraction methods to a zonal extraction method, are evaluated. The range of possible solutions makes it possible to trade off performance against complexity. The newly developed selection method, based on a Lagrangian cost minimization per block in combination with a feedback rate control, yields an attractive performance-complexity combination.
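A hedged sketch of a per-block Lagrangian retention rule in this spirit, assuming that for each nonzero coefficient we know the bits needed to keep it and the distortion incurred by dropping it; these inputs, and the outer feedback loop that would adjust the multiplier to meet the fast-playback rate, are assumptions for illustration.

```python
def retain_coefficients(block_coeffs, lam):
    """block_coeffs: list of (bits_to_keep, distortion_if_dropped) for the
    nonzero coefficients of one block. A coefficient is retained when the
    distortion it saves outweighs its Lagrangian bit cost.
    Returns the indices of coefficients to keep for the fast playback stream."""
    keep = []
    for idx, (bits, dist_if_dropped) in enumerate(block_coeffs):
        if dist_if_dropped > lam * bits:
            keep.append(idx)
    return keep

# Illustrative use: a feedback rate control would raise lam when the
# extracted stream overshoots its budget and lower it otherwise.
print(retain_coefficients([(10, 3.0), (6, 900.0), (4, 1.5)], lam=2.0))
```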
This paper presents a symmetry measurement based on the correlation coefficient. This symmetry measurement is used to locate the center line of faces and, afterward, to decide whether the face view is frontal or not. A 483-face image database obtained from the U.S. Army was used to test the algorithm. Though the performance of the algorithm is limited to 87%, this is due to the wide range of variations present in the database used to test our algorithm. Under more constrained conditions, such as uniform illumination, this technique can be a powerful tool in facial feature extraction. Regarding its computational requirements, though this algorithm is very expensive, three independent optimizations are presented, two of which are successfully implemented and tested.
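A minimal sketch of a correlation-coefficient symmetry score of this kind, assuming a grayscale image and an exhaustive scan over candidate center columns; the strip width and the exhaustive scan are assumptions (the paper's optimizations are not reproduced here).

```python
import numpy as np

def symmetry_score(image, col, half_width):
    """Correlation coefficient between the strip left of `col` and the
    mirrored strip right of it; higher values mean stronger symmetry."""
    left = image[:, col - half_width:col].astype(float)
    right = image[:, col + 1:col + 1 + half_width][:, ::-1].astype(float)
    return np.corrcoef(left.ravel(), right.ravel())[0, 1]

def find_center_line(image, half_width=32):
    """Exhaustive search for the column maximizing the symmetry score."""
    cols = range(half_width, image.shape[1] - half_width - 1)
    return max(cols, key=lambda c: symmetry_score(image, c, half_width))
```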
An approach for obtaining three-dimensional object shapes from a series of silhouette images is presented. A two-dimensional image sequence of an object placed on a turning table is taken by a video camera at regular time intervals to obtain silhouette images from multiple viewpoints. A pillar is obtained by sweeping each silhouette along a line parallel to the viewing direction. The intersection of the pillars from all the viewpoints surrounding the object is sampled to give a set of volume data. The segmentation and modeling of potted plants are also studied as an application in computer graphics. The set of volume data is segmented into the three parts of a potted plant, i.e., the pot, the stems and the leaves. Then, the pot and the stems are modeled by frustums, and each of the leaves is approximated by two Bezier surface patches.
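A hedged sketch of the silhouette-intersection (volume carving) step on a voxel grid follows, assuming an orthographic projection, known turntable angles, and a unit working volume; the paper's actual calibration and sampling details are not reproduced.

```python
import numpy as np

def carve(silhouettes, angles, n=64):
    """silhouettes: list of HxW boolean masks captured as the turntable
    rotates by the corresponding angle (radians); orthographic projection
    assumed. Returns an n x n x n boolean occupancy grid sampling the
    intersection of the silhouette-swept pillars."""
    xs = np.linspace(-1.0, 1.0, n)
    X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
    occupied = np.ones((n, n, n), dtype=bool)
    for mask, a in zip(silhouettes, angles):
        H, W = mask.shape
        # Rotate voxel coordinates into the camera frame for this view.
        u = np.cos(a) * X + np.sin(a) * Y      # horizontal image axis
        v = Z                                   # vertical image axis
        cols = np.clip(((u + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
        rows = np.clip(((1 - (v + 1) / 2) * (H - 1)).astype(int), 0, H - 1)
        occupied &= mask[rows, cols]            # keep voxels inside every silhouette
    return occupied
```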
This paper describes a computationally efficient 3D object surface matching algorithm. In the proposed method, object and model surfaces are scaled to lie within a unit cube in 3D space. They are then sliced along the magnitude axis, and the resulting object and model surface cross sections are represented in binary image format. The centroids of the cross sections of an unknown object and of the models of different shapes are computed in their respective binary images. The resulting cross sections are translated to the origin of the spatial plane using the centroids. Major and minor axes of the plane cross sections are aligned with the coordinate axes of the spatial plane. Matching of the aligned cross sections is done in the direction of the gradient of the cross section boundary by computing the shape deformation as the Euclidean distance between the object boundary points and the corresponding points in the model cross section boundary. The shape deformation distances measured in different cross sections are averaged, and the minimum average shape deformation distance is used to identify the model best matching the object of unknown classification.
A spectral approach to distribution-free classification is presented. The discriminant function is based on generalized unconditional tests. The main steps of the algorithm are: (1) finding a set of deadlock generalized tests, (2) computing a local discriminant function for each such test, and (3) performing the actual classification of the observed pattern into a class. The spectral algorithms involve computation of the Walsh and Reed-Muller (conjunctive) spectra using fast algorithms.
A method to determine the positions and rotational angles of 3D objects in stereo images is proposed in this paper. First, range data of edge points of a left image are calculated by a stereo matching method. Next, a three-dimensional model is rotated, translated and projected onto a 2D plane, and the edges of the projected image are compared with those of the left image. The space transformation parameters which give the maximum matching ratio are searched for by a genetic algorithm (GA). In the searching process of the proposed method, a set of space transformation parameters is regarded as the chromosome of an individual, and a randomly generated population is evolved according to GA rules. The principle of the method and several experimental results are described.
A system combining the BackPropagation Network (BPN) and the Distributed Associative Memory (DAM) for 2D pattern recognition is proposed. In the system, a sequence of image processes and transformations, including a complex transform, Laplacian, and Fourier transform, is used for invariant feature extraction. Two modified neural networks are proposed for pattern recognition: (1) the DAM combined with the BPN, and (2) the BPN improved by the DAM. In the DAM combined with the BPN, fine training is provided by the BPN to take the pattern variations within each class into the training procedure. Experimental results indicate that this improved DAM has higher recognition rates than a traditional DAM. In the BPN improved by the DAM, the weights of the first layer use the memory matrix of the DAM as initial values. This network is compared with the BPN. Experimental results show that this network not only has a slightly higher recognition rate, but also requires less training time than a BPN. Finally, the system is also tested with noisy patterns. According to the experimental results, the system maintains a high recognition rate even on noisy images.
We propose an effective segmentation and recognition algorithm for range images. The proposed recognition system based on the hidden Markov model (HMM) and back-propagation (BP) algorithm consists of three parts: segmentation, feature extraction, and object recognition. For object classification using the BP algorithm we use 3D moments, and for surface matching using the HMM we employ 3D features such as surface area, surface type and line lengths. Computer simulation results show that the proposed system can be successfully applied to segmentation and recognition of range images.
A pyramid classifier is proposed for large-vocabulary Chinese characters, which first uses low-resolution features to roughly classify the input character, and then uses higher-resolution features to make finer classifications stage by stage. In addition to the rule-based preclassification, there are three stages to achieve recognition. The number of candidate categories is reduced step by step. We use one thousand categories of Chinese characters for experiments. Simulation results show that this classifier can recognize the input character with 93.1% and 90% accuracy on the training set and the test set respectively.
The paper presents an algorithm that combines both intra-frame and inter-frame information to reconstruct macroblocks lost to imperfect communication channels when decoding an MPEG bitstream. The algorithm is a POCS-based (Projection Onto Convex Sets) iterative restoration algorithm incorporating both the temporal and spatial constraints derived from a set of analyses performed on the picture sequence. Often the use of temporal information in the restoration process is complicated by scene changes or large random motion. To reliably utilize the temporal information, we formulate a series of tests to determine its usefulness. In addition, the tests yield a temporal constraint if the temporal information is deemed good. Along with the spatial constraints described in [1], the temporal constraint is used in the proposed iterative restoration algorithm.
A simple spatial-domain multiresolution scheme is presented for preserving multiple regions of interest (ROIs) in images. User-selected ROIs are maintained at high (original) resolution while peripheral areas are degraded. The presented method is based on the well-known MIP texture mapping algorithm used extensively in computer graphics. Most ROI schemes concentrate on preserving a single foveal region, usually attempting to match the visual acuity of the human visual system (HVS). The multiple ROI scheme presented here offers three variants of peripheral degradation, including linear and nonlinear resolution mapping, as well as a mapping matching HVS acuity. Degradation of image pixels is carried out relative to each ROI. A simple criterion is used to determine screen pixel membership in given image ROIs. Results suggest that the proposed multiple ROI representation scheme may be suitable for gaze-contingent displays as well as for encoding sparse images while optimizing compression and visual fidelity.
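A minimal sketch of the per-pixel degradation-level idea follows, using the linear variant: each pixel is assigned a MIP level that grows with its distance to the nearest ROI center. The distance-to-center criterion, the level cap, and the falloff constant are assumptions for illustration; the HVS-matched variant would replace the linear mapping.

```python
import numpy as np

def level_map(shape, roi_centers, max_level=4, falloff=64.0):
    """Return, for every screen pixel, the MIP level to sample:
    0 (full resolution) at an ROI center, increasing linearly with the
    distance to the nearest ROI center at `falloff` pixels per level,
    capped at max_level for the far periphery."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    dist = np.full(shape, np.inf)
    for cy, cx in roi_centers:
        dist = np.minimum(dist, np.hypot(ys - cy, xs - cx))
    return np.clip(dist / falloff, 0, max_level).astype(int)

# Illustrative use: two ROIs in a 480x640 display.
levels = level_map((480, 640), roi_centers=[(120, 200), (300, 500)])
```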
In this paper, an adaptive noise filter implemented in the Wavelet Transform (WT) domain is proposed. This filter smoothes noise while preserving edges as much as possible by taking advantage of the different characteristics of signal and noise in the WT domain: (1) the shape of signal histograms in the WT domain approaches a Gaussian distribution with zero mean and a variance that increases as the scale increases; (2) white noise in the spatial domain remains white in the WT domain, with variance decreasing proportionally to the scale; (3) signal and noise that are uncorrelated in the spatial domain remain uncorrelated in the WT domain; (4) the signal-to-noise ratio (SNR) increases as the scale increases in the WT domain. Based on these analyses, we derive a simple form of the 2D Minimum Mean Square Error (MMSE) estimation algorithm in the WT domain that is applicable to nonstationary image models. All the nonstationary image statistical parameters needed for the filter can be estimated from the noisy image, and no a priori information about the original image is required. A comparison demonstrates that the method in the WT domain provides better improvement in SNR and better subjective impression than the same method in the spatial domain.
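A hedged sketch of a local MMSE (Wiener-style) shrinkage applied to one detail subband, consistent with observation (1) that the signal mean is zero; the window size is an assumption, the subband would come from any 2D wavelet decomposition (e.g. PyWavelets), and the per-subband noise variance is taken as known (per observation (2) it shrinks with scale).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mmse_shrink(band, noise_var, win=7):
    """Local MMSE estimate of one detail subband: the signal mean is
    assumed zero, and the local variance of the noisy coefficients is
    estimated in a win x win window around each coefficient."""
    local_var = uniform_filter(band.astype(float) ** 2, size=win)
    signal_var = np.maximum(local_var - noise_var, 0.0)
    # Classic LLMMSE gain: sigma_s^2 / (sigma_s^2 + sigma_n^2)
    return band * signal_var / (signal_var + noise_var)
```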
This paper proposes a novel blocking artifact reduction method based on the notion that blocking artifacts are present in images due to heavy accuracy loss of transform coefficients in the quantization process. We define the block boundary discontinuity measure as the sum of the squared differences of pixel values along the block boundary. The proposed method corrects the selected transform coefficients so that the resulting image has minimum block boundary discontinuity. It does not specify a transform domain in which the correction should take place; therefore an appropriate transform domain can be selected at the user's discretion. In the experiments, the scheme is applied to DCT-based compressed images to show its performance.
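A minimal sketch of the stated discontinuity measure, assuming 8x8 tiles and summing squared pixel differences across every horizontal and vertical block boundary; the coefficient-correction step that minimizes this quantity is not shown.

```python
import numpy as np

def boundary_discontinuity(img, block=8):
    """Sum of squared differences of pixel values across all block
    boundaries of an image partitioned into block x block tiles."""
    img = img.astype(float)
    d = 0.0
    for r in range(block, img.shape[0], block):      # horizontal boundaries
        d += np.sum((img[r, :] - img[r - 1, :]) ** 2)
    for c in range(block, img.shape[1], block):      # vertical boundaries
        d += np.sum((img[:, c] - img[:, c - 1]) ** 2)
    return d
```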
In this paper, we present a novel filter algorithm that is more capable of removing impulse noise than some of the common noise removal filters. The philosophy of the new algorithm is based on a pixel identification concept. Rather than processing every pixel in a digital image, this new algorithm intelligently interrogates a subimage region to determine which are the 'corrupted' pixels within the subimage. With this knowledge, only the 'corrupted' pixels are filtered, whereas the 'uncorrupted' pixels are left untouched. Extensive testing of the algorithm over a hundred noisy images shows that the new algorithm exhibits three major characteristics. First, it removes impulse noise better visually and achieves the smallest mean-square error compared with the median filter, averaging filter and sigma filter. Second, the smoothing effect is minimal, so edge and line sharpness is retained. Third, the new algorithm is consistently faster than the median filter in all our test cases. In its current form, the new filter algorithm performs well with impulse noise.
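A hedged sketch of the identify-then-filter idea: flag likely impulse pixels and replace only those, leaving the rest untouched. The specific detector used here (deviation from the local median above a threshold) is an assumption for illustration, not the authors' detection rule.

```python
import numpy as np
from scipy.ndimage import median_filter

def selective_impulse_filter(img, threshold=40):
    """Flag pixels that deviate strongly from their 3x3 median as
    'corrupted' and replace only those; 'uncorrupted' pixels are kept."""
    med = median_filter(img, size=3)
    corrupted = np.abs(img.astype(int) - med.astype(int)) > threshold
    out = img.copy()
    out[corrupted] = med[corrupted]
    return out
```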
An efficient spectral algorithm for representing any Boolean function as a linear combination of positive Boolean functions (PBFs) is proposed. A processor architecture realizing this algorithm with varying levels of parallelism is suggested. The algorithm finds not only the truth tables of the PBFs, but also their minimal disjunctive normal form representations. This makes it possible to incorporate previously proposed efficient stack filtering designs into the construction of architectures for threshold Boolean filters.
In this paper we describe a new method for image size reduction and magnification. The method is based on a concept of circular apertures. In image reduction, a circular region in the original image is mapped to a single pixel in the reduced image. The average intensity of the pixels in the circle is assigned to be the intensity of the resulting pixel. In image magnification, a single pixel in the original image is projected to a circular region in the enlarged image. Weighted averaging is used to determine the intensity values for pixels in the resulting image at places where circles overlap. We have addressed 4 basic reduction/magnification scales in the paper. For each scale we have studied two aperture sizes. Higher scales can be obtained by repeatedly applying the basic ones. We have compared the results produced by the proposed method with the results produced by the commonly used resampling/zero-order hold method and found the proposed method gives results that are superior in both visual comparison and quantitative analysis.
In this paper, we provide a novel method based on higher-order statistics (cumulants) to identify nonminimum-phase point spread functions, which enlarges the possible distribution types of the image formation field. In our method, we cast the blur identification problem as an ARMA model parameter identification problem, but we consider the image model as a realization of a colored signal instead of a zero-mean white Gaussian signal, which enlarges the range of admissible image types. For the colored-input ARMA model, the contributions of the bicepstrum of the ARMA model lie only along the axes and the 45-degree lines, so we extract the linear parts of the cumulants of the blurred image for analysis, in which we use higher-order statistical techniques to estimate the ARMA parameters. Experiments are presented in this paper.
We propose a new, efficient 2D object-based coding method for very low bit rate video compression based on affine motion compensation with triangular patches under connectivity constraints. We then compare this approach with a 3D object-based method using a flexible wireframe model of a head-and-shoulders scene. The two approaches are compared in terms of the resulting bitrates, peak signal-to-noise ratio (PSNR), visual image quality, and execution time. We show that 2D object-based approaches with affine transformations and triangular mesh models can simulate all capabilities of 3D object-based approaches using wireframe models under the orthographic projection, at a fraction of the computational cost. Moreover, 2D object-based methods provide greater flexibility in modeling arbitrary input scenes in comparison to 3D object-based methods.
Vector subband coding (VSC) has been shown to be a promising technique for very low bit rate (VLBR) video coding. The good performance of VSC is achieved because the vector filter bank used in VSC preserves intra-vector correlation while reducing inter-band and inter-vector correlations. Application of VSC to VLBR video coding with intra-frame coding only has been reported previously. In this paper, we describe an inter-frame VSC scheme for VLBR video coding. To reduce inter-frame redundancy, the most popular technique is motion compensated prediction. This technique is effective in pixel-based coding schemes such as the ones used in various video coding standards. However, it is not very efficient when used in vector-based coding schemes. Therefore, instead of predicting pixel values and coding the prediction difference, the proposed new technique uses the vectors in the vector subbands of the previous frame to predict the vectors in the corresponding vector subbands of the current frame. It is shown that such an inter-frame coding scheme promises good performance for VLBR video coding.
In video coding at high compression rates, e.g., in very low bit rate coding, every transmitted bit carries a significant amount of information that is related either to motion parameters or to the intensity residual. As demonstrated in the SIM-3 coding scheme, a more precise motion model leads to improved quality of coded images when compared with the H.261 coding standard. In this paper, we present some of our recent results on the modeling and estimation of motion for the compression and post-processing of typical videophone ('head-and-shoulders') image sequences. We describe a block-based motion estimation that attempts to optimize the overall bit budget for intensity residual, motion and overhead information. We compare simulation results for this scheme with full-search block matching in the context of H.261 coding. Then, we discuss a region-based motion estimation that exploits segmentation maps obtained from an MDL-based (minimum description length) algorithm. We compare experimentally several algorithms for the compression of such maps. Finally, we describe motion-compensated interpolation that takes into account pixel acceleration. We show experimentally a major performance improvement of the constant-acceleration model over the usual constant-velocity models. This is a very promising technique for post-processing in the receiver to improve the reconstruction of frames dropped in the transmitter.
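A minimal sketch of the constant-acceleration displacement model used when interpolating a dropped frame; the per-block velocity and acceleration estimates are assumed to be available from the transmitted motion fields, and the forward-warping step that applies the displacement is omitted.

```python
def interpolated_displacement(v, a, t):
    """Displacement of a block at normalized time t in (0, 1) between two
    transmitted frames under the constant-acceleration model
        d(t) = v * t + 0.5 * a * t**2,
    with v the estimated velocity and a the estimated acceleration
    (pixels per frame interval). The usual constant-velocity model is the
    special case a = 0."""
    dx = v[0] * t + 0.5 * a[0] * t ** 2
    dy = v[1] * t + 0.5 * a[1] * t ** 2
    return dx, dy

# Illustrative use: a block moving 4 px/frame horizontally while decelerating.
print(interpolated_displacement(v=(4.0, 0.0), a=(-1.0, 0.0), t=0.5))
```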
In this paper, we discuss issues related to analysis and synthesis of facial images using speech information. An approach to speaker independent acoustic-assisted image coding and animation is studied. A perceptually based sliding window encoder is proposed. It utilizes the high rate (or oversampled) acoustic viseme sequence from the audio domain for image domain viseme interpolation and smoothing. The image domain visemes in our approach are dynamically constructed from a set of basic visemes. The look-ahead and look-back moving interpolations in the proposed approach provide an effective way to compensate the mismatch between auditory and visual perceptions.
Most of the existing video coding algorithms produce highly visible artifacts in the reconstructed images as the bit-rate is lowered. These artifacts are due to the information loss caused by the quantization process. Since these algorithms treat decoding as simply the inverse process of encoding, these artifacts are inevitable. In this paper, we propose an encoder/decoder paradigm in which both the encoder and decoder solve an estimation problem based on the available bitstream and prior knowledge about the source image and video. The proposed technique makes use of a priori information about the original image through a nonstationary Gauss-Markov model. Utilizing this model, a maximum a posteriori (MAP) estimate is obtained iteratively using mean field annealing. The fidelity to the data is preserved by projecting the image onto a constraint set defined by the quantizer at each iteration. The performance of the proposed algorithm is demonstrated on an H.261-type video codec. It is shown to be effective in improving the reconstructed image quality considerably while reducing the bit-rate.
Poster Session I: Image Processing Algorithms and Implementations
The successful judgement of color harmony primarily depends on identifying features related to human pleasure. In this paper, a new color feature, the color linguistic distribution (CLD), is proposed based on a designed 1D image scale of 'CHEERFUL-SILENT'. This linguistic feature space is designed to be consistent with the color differences of practical color vision. The CLD is described by a distance-based color linguistic quantization (DCLQ) algorithm and is capable of indicating the fashion trends in Taiwan. Also, the grade of harmony can be measured based on the similarity of CLDs. Experiments on quantitative color harmony judgement demonstrate that the results based on the CIE1976-LUV and CIE1976-LAB color spaces achieve better consistency with questionnaire-based harmony judgements than the hue-dominated method.
Multi-foreground-background (MFsBs) images involve histogram-interlaced and spatially neighboring foregrounds and backgrounds. In this paper, the concept of a 'dummy background' is proposed to represent the 'perceived' background instead of multiple backgrounds. The dummy background also corresponds to scaling the morphological distance between foregrounds and backgrounds, which improves the spatial-association capability of traditional segmentation algorithms. Experimental results demonstrate that better segmentation is accomplished with less time consumption.
We propose here a simple way to synthesize a shift-, rotation- and limited-size-invariant correlation filter, making use of the idea of synthetic discriminant functions (SDF). The SDF is synthesized by superimposing four second-order circular harmonics of a training reference pattern in four different sizes. Computer simulation experiments have shown that the filter is indeed shift invariant, fully rotation invariant, and size invariant over a size range from 1 to 1.75. The invariant range can be increased if more training patterns are used.
This paper presents an auto-recognition scheme for printed Thai characters. The structure of Thai characters is very different from that of English characters. Thus, we summarize some properties of Thai characters for auto-recognition, and propose an algorithm to segment each character from the printed character image. Our experimental system achieves recognition rates of more than ninety percent.
The method for obtaining optimum exposure factors of tube voltage and mAs value (the product of the tube current and the exposure time) is illustrated. For this purpose, attenuation curves of the energy absorbed in emulsion layers and characteristic curves of film, in which the absorbed energy is used as the input instead of exposure, are derived. In addition, the knowledge of the minimum perceptible contrast and the optimum film density is required.
In this paper, a comparison of the performance of three types of correlation filters: the linear high-pass circular harmonic filter (LHPCHF), the ideal high-pass circular harmonic filter (IHPCHF), and the wavelet transform circular harmonic filter (WTCHF), is presented. These filters, which combine a 2D symmetrical edge detection filter and a circular harmonic filter (CHF), are used to perform shift- and rotation-invariant optical pattern recognition.
A fuzzy reasoning processor for the autofocusing operation of a camera has been developed. The automatic focus is performed by evaluating the object distance and luminance. The object distance is measured by the beams of infrared light. The adequate contrast of object contours is evaluated by the image luminance. The proposed fuzzy reasoning processor can efficiently determine a good focusing point and its computation power can reach 3.75 million fuzzy logic inferences per second at a system clock of 30 MHz.
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor', designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200K gates.
In this paper we use bit-serial 'local sorting' to perform threshold Boolean filtering, based on running processing without ordering the input data. The proposed architecture is simple and suitable for realization. It is shown that the introduced homogeneous generalized threshold Boolean filters can be represented as a threshold Boolean filter on the appended input signal window and can be computed with the same architectures. Homogeneous generalized threshold Boolean filters are also represented as linear combinations of homogeneous generalized stack filters.
This paper presents a VLSI implementation of the lossless block-based predictive Rice codec (BPRC). The BPRC uses an adaptive predictive coding algorithm to remove the redundancy in the image and codes the residue using an entropy coder. This algorithm adapts well to local image statistics. The codec chip will encode 4- to 16-bit pixels at a 10 Mpixels/sec input rate, and decode at a 10 Mpixels/sec output rate. For images of normal size it requires little support circuitry, only input data formatting and output data deformatting. Large images can be supported with external FIFOs.
To improve the reconstructed image quality with a given number of sampling points, nonuniform sampling is desired, which adapts the sampling density to the local bandwidth of the signal. We determine optimal sampling positions and interpolate from nonuniform samples through the use of a coordinate mapping which converts nonuniform samples into points on a regular sampling lattice. We then introduce a nonuniform sampling scheme which embeds the samples in a generally deformed mesh structure that can be easily mapped to a regular sampling lattice. The optimal samples, or the mesh, are generated by minimizing the interpolation error. The numerical difficulty associated with dealing with nonuniform samples is circumvented by mapping all the operations to the master domain, where the samples are uniformly distributed. With this scheme, in order to maintain the mesh topology, unnecessary nodes are usually allocated in large but smooth regions. For improved sampling efficiency, a hierarchical nonuniform sampling scheme is also developed, which embeds the samples in a generalized quadtree structure. Compared to its nonhierarchical counterpart, this scheme can reduce the number of samples significantly under the same visual quality constraint.
Image processing operations are computationally intensive and are usually solved by parallel algorithms implemented either on dedicated parallel computing devices or on a network of workstations. With the increasing computing power of the Personal Computer (PC) and better networking facilities, it is now feasible to perform distributed computing on a PC network. As PCs offer a favorable cost/performance ratio, this provides a cost-effective solution for solving image processing problems. In this paper, we describe how to implement distributed image processing algorithms on a peer-to-peer PC network under the Windows NT operating system, and the efficiency of such algorithms is discussed.
In many coastal oceans of the world, the flora and fauna are under stress. In some areas, seagrasses, coral reefs, fish stocks, and marine mammals are disappearing at a rate great enough to capture the attention of, and in some cases, provoke action by local, national, and international governing bodies. The governmental concern and consequent action is most generally rooted in the economic consequences of the collapse of coastal ecosystems. In the United States, for example, some experts believe that the rapid decline of coral reef communities within coastal waters is irreversible. If correct, the economic impact on the local fisheries and tourism industries would be significant. Most scientists and government policy makers agree that remedial action is in order. The ability to make effective management decisions is hampered, however, by the convolution of the potential causes of the decline and by the lack of historical or even contemporary data quantifying the standing stock of the natural resource of concern. Without resource assessment, neither policy decisions intended to respond to ecological crises nor those intended to provide long-term management of coastal resources can be prudently made. This contribution presents a methodology designed to assess the standing stock of immobile coastal resources (e.g., seagrasses and corals) at high spatial resolution utilizing a suite of optical instrumentation operating from unmanned underwater vehicles (UUVs) which exploits the multi-spectral albedo and fluorescence signatures of the flora and fauna.
A very common task in image processing is the segmentation of the image into areas that are uniform in the sense of their features. Various applications can benefit even from partial segmentation, which is performed without the need for physical or semantic knowledge. Several segmentation methods exist, but none is applicable to all tasks. We use color and perceptual texture information to segment color images. Perceptual texture features are features that can be qualified in simple descriptions by humans. Color information is represented in a perceptual way, using hue, value and saturation. These feature values are represented by histograms that integrate texture information around a small area. Segmentation and classification are obtained by comparing the histograms of classes with the histograms of the area around the pixel being classified. We built a small application that uses remote sensing imagery and allows a user to interactively segment a Landsat TM-5 image using color and texture information. The steps and intermediate results of the classification are shown. The results are visually good, and the segmentation using color and texture information is more coherent than that using only color.
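A hedged sketch of the histogram-comparison classification step for a single feature channel (e.g., hue or a texture measure); the window size, bin count, and L1 histogram distance are assumptions for illustration, and the full method would combine several such channels.

```python
import numpy as np

def local_histogram(feature_img, y, x, win=15, bins=32, rng=(0, 255)):
    """Normalized histogram of one feature in a win x win window
    centred on pixel (y, x)."""
    h = win // 2
    patch = feature_img[max(0, y - h):y + h + 1, max(0, x - h):x + h + 1]
    hist, _ = np.histogram(patch, bins=bins, range=rng)
    return hist / max(hist.sum(), 1)

def classify_pixel(feature_img, y, x, class_hists):
    """Assign the class whose stored histogram is closest (L1 distance)
    to the local histogram around the pixel."""
    h = local_histogram(feature_img, y, x)
    return min(class_hists, key=lambda c: np.abs(h - class_hists[c]).sum())

# Illustrative use: class_hists maps a class label to a reference histogram
# built from user-selected training regions of the Landsat image.
```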
In this paper, we propose a matching algorithm for radical-based on-line Chinese character recognition. The major effort of this paper is to demonstrate recognition procedures for subcharacters, such as radicals and residual subcharacters, and for nonradical characters. Since a Chinese character could have a front radical, a rear radical or neither of them, the matching algorithm should be able to handle all these conditions. Furthermore, instead of picking out the front/rear radical strokes from the input character before the matching process takes place, our matching algorithm determines how many strokes the front/rear radical should have during the matching process; it thus enjoys the property of flexibility. After the radical type and the number of strokes of the radical are determined, the residual subcharacter can be picked out and submitted for matching again. By sequentially recognizing the types of front/rear radicals and the types of residual subcharacters, we can determine what the input characters are.
Many digital signal processing and image coding systems implement the linear predictor with rounding. Usually, the linear predictors are obtained by solving the Yule-Walker equations or doing something equivalent. The predictors obtained in this way are not necessarily the true minimum mean square error predictors once the effect of rounding is taken into account. In this paper, we address the issue of finding the true minimum mean square error rounded linear predictor. Experimental results show that when the prediction results are rounded, this true MMSE linear predictor can outperform the conventional one, which does not consider the effect of rounding, very significantly for data with low prediction errors.
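A hedged first-order illustration of the distinction being drawn: the conventional coefficient ignores rounding, while a direct search minimizes the error of the *rounded* prediction. The grid search and first-order restriction are assumptions for illustration; the paper's actual optimization procedure is not reproduced here.

```python
import numpy as np

def conventional_predictor(x):
    """Conventional first-order coefficient (least-squares / Yule-Walker
    style), derived without regard to rounding of the prediction."""
    x = np.asarray(x, dtype=float)
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def rounded_mse(x, a):
    """Mean squared error when the rounded prediction round(a * x[n-1]) is used."""
    x = np.asarray(x, dtype=float)
    pred = np.round(a * x[:-1])
    return np.mean((x[1:] - pred) ** 2)

def best_rounded_predictor(x, grid=None):
    """Direct search over a coefficient grid for the predictor that is
    optimal once rounding of the prediction is taken into account."""
    if grid is None:
        grid = np.linspace(0.5, 1.5, 1001)
    return min(grid, key=lambda a: rounded_mse(x, a))
```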
Future multimedia applications involving images and video will require technologies enabling users to manipulate image and video data as flexibly as traditional text and numerical data. However, vast amounts of image and video data mandate the use of image compression, which makes direct manipulation and editing of image data difficult. To explore the maximum synergistic relationships between image manipulation and compression, we extend our prior study of transform-domain image manipulation techniques to more complicated image operations such as rotation, shearing, and line-wise special effects. We propose to extract the individual image rows (columns) first and then apply the previously proposed transform-domain filtering and scaling techniques. The transform-domain rotation and line-wise operations can be accomplished by calculating the summation of products of nonzero transform coefficients and some precalculated special matrices. The overall computational complexity depends on the compression rate of the input images. For highly-compressed images, the transform-domain technique provides great potential for improving the computation speed.
In computer vision, estimating the 3D shapes of objects from a 2D view is an important problem. One means of solving it is to use knowledge about the objects to interpret the 2D view. This article presents a method that acquires shape data of objects from industrial sketches using knowledge about the objects and constructs 3D models. Knowledge that includes uncertainty is used in the image processing. The processing is performed using the concept of likelihoods, since conclusions drawn from ambiguous knowledge are themselves ambiguous. Three-dimensional data of the objects are acquired from the industrial sketches, and object models are then constructed using bicubic Bezier surfaces.
The transmission capacity of a discrete multitone (DMT) modulation system for Taiwan's subscriber loops is evaluated in this study. Based on the characteristics of Taiwan's local loops, the achievable transmission capacity is estimated at 1.544 Mb/s and 6 Mb/s. Simulation results also show what percentage of users in Taiwan could receive 1.544 Mb/s or 6 Mb/s asymmetric digital subscriber line (ADSL) services. Self far-end crosstalk (FEXT) and additive white Gaussian noise (AWGN) are considered the dominant noise sources in this work.
A new coding scheme for partially computer-rendered image sequences will be presented. It is specifically suited for heterogeneous data sets containing symbolic and pixel-based image descriptions, which are used by an electronic set system at the receiver site for the synthesis and mixture of transmitted image sequences. The different types of data sets and their particular properties regarding data compression are explained. Finally, results are given comparing the new coding scheme with traditional MPEG2 coding based on typical test sequences.
In this paper, we introduce a pyramid architecture that we are currently constructing for computer vision applications. The architectural features of the system include its linear array interconnections, its reconfigurable architecture, and its design without a top-down control mechanism. The system is targeted at real-time image processing applications, so high-performance processors are used in its construction. It has three processing layers for three different stages of image processing, and each layer has direct access to the image source. The architectural properties of the system, its control mechanism, and its expected performance are outlined in the following sections.
This paper presents a new systolic architecture to realize the encoder of the full-search vector quantization (VQ) for high-speed applications. The architecture possesses the features of regularity and modularity, and is thus very suitable for VLSI implementation. For a codebook of size N and dimension k, the VQ encoder has area complexity of O(N), time complexity of O(k), and I/O bandwidth of O(k). It reaches a compromise between hardware cost and speed requirement as compared to existing systolic/regular VQ encoders. At the current state of VLSI technology, the proposed system can easily be realized in a single chip for most practical applications. In addition, it provides flexibility in changing the codebook contents and extending the codebook size, where the latter is achieved simply by cascading some identical basic chips. With 0.8 micrometers CMOS technology to implement the proposed VQ encoder for N equals 256 and k equals 16, the die size required is about 5 X 8.5 mm2 and the processing speed is up to 100 M samples per second. These features show that the proposed architecture is attractive for use in high-speed image/video applications.
A major problem with a VQ based image compression scheme is its codebook search complexity. Recently a Predictive Residual Vector Quantizer (PRVQ) was proposed in Ref. 8. This scheme has a very low search complexity and its performance is very close to that of the Predictive Vector Quantizer (PVQ). This paper presents a new VQ scheme called Variable-Rate PRVQ (VR-PRVQ) which is designed by imposing a constraint on the output entropy of the PRVQ. The proposed VR-PRVQ is found to give an excellent rate-distortion performance and clearly outperforms the state-of-the-art image compression algorithm developed by Joint Photographic Experts Group (JPEG).
This paper presents a new FSVQ scheme called Finite-State Residual Vector Quantization (FSRVQ) in which each state uses a Residual Vector Quantizer (RVQ) to encode the input vector. Furthermore, a novel tree-structured competitive neural network is proposed to jointly design the next-state and the state-RVQ codebooks for the proposed FSRVQ. Joint optimization of the next-state function and the state-RVQ codebooks eliminates a large number of redundant states in the conventional FSVQ design; consequently, the memory requirements are substantially reduced in the proposed FSRVQ scheme. The proposed FSRVQ can be designed for high bit rates due to its very low memory requirements and low search complexity of the state-RVQs. Simulation results show that the proposed FSRVQ scheme outperforms the conventional FSVQ schemes both in terms of memory requirements and perceptual quality of the reconstructed image. The proposed FSRVQ scheme also outperforms JPEG (current standard for still image compression) at low bit rates.
A family of multirate representations of a given signal is defined for data compression. This family of multirate signals is constructed by polynomial interpolation of directly decimated versions of the signal. The signal interpolated from a decimated version is used to predict the higher-resolution signal, and the prediction error is the difference between the interpolated lower-resolution signal and the higher-resolution one. This kind of signal representation can be called interpolation-compensated signal prediction. A multiresolution interpolative DPCM is then proposed to represent the prediction errors with a hierarchical multirate structure. This structure possesses the advantages of both the pyramid structure and the DPCM structure.
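A minimal 1D sketch of interpolation-compensated prediction, assuming decimation by two and linear interpolation; the paper's actual polynomial interpolators and multi-level structure may differ.

import numpy as np

def interpolation_prediction_error(x):
    """One level of interpolation-compensated prediction for a 1D signal (even length
    assumed): coarse = decimated signal, error = x - interpolation(coarse)."""
    coarse = x[::2]                                  # direct decimation by 2
    n = np.arange(len(x))
    interp = np.interp(n, n[::2], coarse)            # linear interpolation back to full rate
    error = x - interp                               # prediction error (detail layer)
    return coarse, error

def reconstruct(coarse, error):
    """Exact reconstruction: interpolate the coarse layer and add back the error."""
    n = np.arange(len(error))
    return np.interp(n, n[::2], coarse) + error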
The exhaustive search process imposes a heavy computational burden and therefore increases the complexity of a fractal image coding system. This is the main drawback of employing fractals in practical image compression applications. In this paper, an image compression scheme based on fractal block coding and a simplified finite-state algorithm is proposed. In the finite-state algorithm, which has been successfully employed in the vector quantization (VQ) technique, the state codebook (equivalent to the domain pool in fractal image coding) is determined by a specific next-state function. In this research, we use the position of the range block to decide its domain pool. A confined domain pool is therefore limited to the neighboring region of the range block, so the search process is simplified and faster. In the computer simulations, we consider two partition types: a single-level (8 X 8 blocks) and a two-level (8 X 8 and 4 X 4 blocks) condition. The simulation results show that the proposed scheme greatly reduces the computational complexity and improves the system performance.
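A rough sketch of a position-confined domain search for one range block: only domain blocks inside a window around the range block are tested, with a least-squares contrast/offset fit. Block sizes, window radius, and the search step are illustrative assumptions, not the paper's exact settings.

import numpy as np

def encode_range_block(img, r0, c0, B=8, radius=16, step=4):
    """Match an 8x8 range block against 16x16 domain blocks drawn only from a
    window around the range block (position-confined domain pool)."""
    rng = img[r0:r0 + B, c0:c0 + B].astype(float)
    best = None
    H, W = img.shape
    for dr in range(max(0, r0 - radius), min(H - 2 * B, r0 + radius) + 1, step):
        for dc in range(max(0, c0 - radius), min(W - 2 * B, c0 + radius) + 1, step):
            dom = img[dr:dr + 2 * B, dc:dc + 2 * B].astype(float)
            dom = dom.reshape(B, 2, B, 2).mean(axis=(1, 3))   # shrink domain block to range size
            d, r = dom.ravel(), rng.ravel()
            var = d.var()
            # least-squares scale s and offset o minimizing ||rng - (s*dom + o)||^2
            s = 0.0 if var == 0 else np.cov(d, r, bias=True)[0, 1] / var
            o = r.mean() - s * d.mean()
            err = np.sum((r - (s * d + o)) ** 2)
            if best is None or err < best[0]:
                best = (err, dr, dc, s, o)
    return best  # (error, domain row, domain col, scale, offset)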
The wavelet transform has recently attracted notable attention for a variety of signal processing and image coding applications, since it is expected to provide a unified interpretation of transform coding, hierarchical coding, and subband coding, all of which have previously been studied separately. The wavelet transform is also expected to be more advantageous than other image coding schemes because the wavelet coefficients represent the features of an image localized in both the spatial and frequency domains. In transform coding or subband coding, efficiency is generally maximized by allocating bits to each decomposed band signal in proportion to the relative importance of the information it carries. This technique is known as the optimum bit allocation algorithm (OBA). However, OBA should not be applied directly to wavelet coding, because it does not exploit the spatial local information represented by each wavelet coefficient. The purpose of this work is to develop a quantization scheme which maintains the significant spatial information locally represented by the wavelet coefficients. Preserving only the selected coefficients which represent visually significant features and discarding the others is expected to keep image quality high, since the significant features are retained even at a low bit rate. In this respect, we propose two image data compression techniques employing selective preservation of wavelet coefficients. Section 2 gives a brief description of the wavelet transform, including the construction of wavelet basis functions, feature extraction with wavelets, and the multi-resolution property of wavelets. In Section 3, the first technique is proposed, where resolution-dependent thresholding is introduced to classify wavelet coefficients as significant or insignificant. In Section 4, the second technique is proposed, where better performance is achieved by further classifying significant coefficients using the multiresolution property of wavelets. Finally, summary and conclusions are provided in Section 5.
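A minimal sketch of the first idea, assuming a one-level separable Haar decomposition and a per-band scaling of the threshold; the actual wavelet and the resolution-dependent thresholds of the paper may differ.

import numpy as np

def haar2d(x):
    """One level of a separable 2D Haar transform (x has even dimensions)."""
    a = (x[:, 0::2] + x[:, 1::2]) / 2.0
    d = (x[:, 0::2] - x[:, 1::2]) / 2.0
    LL = (a[0::2, :] + a[1::2, :]) / 2.0
    LH = (a[0::2, :] - a[1::2, :]) / 2.0
    HL = (d[0::2, :] + d[1::2, :]) / 2.0
    HH = (d[0::2, :] - d[1::2, :]) / 2.0
    return LL, LH, HL, HH

def threshold_by_resolution(detail_bands, base_t):
    """Keep only 'significant' detail coefficients; the threshold grows with the band
    index as a stand-in for resolution-dependent thresholding (assumed rule)."""
    out = []
    for level, band in enumerate(detail_bands):       # bands ordered by resolution level
        t = base_t * (2 ** level)
        out.append(np.where(np.abs(band) >= t, band, 0.0))
    return out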
In this paper, we propose an image coding method which basically consists of the application of vector quantization and appropriate post-processing in order to minimize the mosaic effect. We have to store a monographic image database. We use Hierarchical Multirate Vector Quantization (HMVQ) to encode the database at a low bit rate with tolerable SNR. HMVQ is a suitable algorithm for separating blocks of different contrast in each image. This division allows selective filtering at reconstruction to minimize the mosaic effect: each block is filtered according to its characteristics. The gap between contiguous blocks is minimized for low-contrast subimages, while edge blocks retain image details.
The goal of this paper is to describe a new fully-reversible image transform specifically designed for efficient (pseudo-critical) coding while preserving a psychovisual Fourier domain description. There is now strong evidence for the presence of directional and angular sensitivity in the cells of the human visual cortex, and the representation proposed here has as its main objective to respect this human-like filter bank. The decomposition is performed using a discrete Radon transform for the angular patches and by splitting each projection with a 1D spline wavelet for the radial part. Consequently, the whole algorithm is performed in the spatial domain. Finally, we show that the transform is well-suited both for psychovisual quantization and for channel-adapted coding.
In this paper, we investigate the compression behavior of each processing step of the JPEG baseline image coder. The two main objectives of this research are to provide a better understanding of the JPEG system and to provide a guideline for improving the performance of any JPEG-like image coder. For performance evaluation, we use the estimated entropy as the performance measure. The key results of this paper are: (1) Generally, the psychovisually-weighted quantization plays a dominant role in the overall system performance. (2) The compression gain provided by the entropy coding procedure is also significant. Since there is a gap between the estimated entropies and the actual coding rates, a more efficient entropy coding procedure that reflects the signal statistics should improve the system performance. (3) The common concept of the optimal transform is variance-based, which requires a zonal selection of the transform coefficients. Since JPEG adopts thresholding quantization, the ordinary discussion of an optimal transform is not appropriate; a truly optimal transform should take the transform and its subsequent operations into account. In consequence, to improve the overall system performance it is most effective to focus on the quantization and entropy coding procedures.
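As an illustration of entropy-based measurement of a JPEG-like pipeline, a sketch that quantizes 8x8 DCT blocks with a step matrix Q and reports the zeroth-order entropy of the resulting coefficients. This simplified setup is our assumption, not the paper's exact measurement procedure.

import numpy as np

def dct_matrix(N=8):
    """Orthonormal type-II DCT matrix."""
    n, k = np.meshgrid(np.arange(N), np.arange(N))
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

def estimated_entropy(symbols):
    """Zeroth-order entropy (bits/symbol) of an array of integer symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def block_dct_entropy(img, Q):
    """Quantize every 8x8 DCT block with step matrix Q and report the entropy
    of the quantized coefficients (one aggregate figure, for illustration)."""
    C = dct_matrix()
    H, W = img.shape
    coeffs = []
    for r in range(0, H - H % 8, 8):
        for c in range(0, W - W % 8, 8):
            blk = img[r:r + 8, c:c + 8].astype(float) - 128.0
            F = C @ blk @ C.T
            coeffs.append(np.round(F / Q).astype(int).ravel())
    return estimated_entropy(np.concatenate(coeffs))

# example call (hypothetical flat step size): block_dct_entropy(img, np.full((8, 8), 16.0))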
Fractal image compression is a relatively new and very promising technique for still image compression. However, it is not widely applied due to its very time consuming encoding procedure. In this research, we focus on speeding up this procedure by introducing three schemes: dimensionality reduction, energy-based classification, and tree search. We have developed an algorithm that combines these three schemes together and achieves a speed-up factor of 175 at the expense of only 0.6 dB degradation in PSNR relative to the unmodified exhaustive search for a typical image encoded with 0.44 bpp.
This paper reports on a comprehensive subjective evaluation of different waveform coding algorithms for monochrome still pictures. In order to obtain reliable and relevant results about the coding efficiency in the sense of a rate-distortion criterion, i.e. mean opinion scores (MOS) of observers vs. bit rate, the coding algorithms are optimized as far as possible by maintaining the subjective quality of coded pictures. Test pictures with various bit rates are generated from several source pictures. The psychophysical picture quality experiments are carried out for the whole set of test pictures. Based on the experimental results, different coding algorithms are quantitatively compared with each other. The coding methods investigated include the following stand-alone methods: DPCM, vector quantization (VQ), discrete cosine transform (DCT) coding, recursive block coding (RBC), subband coding (SBC), wavelet transform coding (WTC), Laplacian pyramid coding, Cortex transform coding, and combined methods: ISO standard JPEG, wavelet transform with run-length coding and variable length coding, DCT with pyramid vector quantization (PVQ) and subband transform with PVQ.
The removal of perceptual redundancy from image signals has been considered a promising approach to maintaining high image quality at low bit rates, and has recently become an important area of research. In this paper, a perceptually tuned discrete cosine transform (DCT) coder for gray-scale images is presented, where a just-noticeable distortion (JND) profile is measured as the perceptual redundancy inherent in an image. The JND profile provides each signal being coded with a visibility threshold of distortion, below which reconstruction errors are rendered imperceptible. By exploiting basic characteristics of human visual perception, the JND profile is derived from analyzing local properties of the image signals. According to the sensitivity of human visual perception to spatial frequency, a distortion allocation algorithm is applied to each block to screen out perceptually unimportant coefficients (PUCs) and, simultaneously, to determine the quantizer step sizes of perceptually important coefficients (PICs). Simulation results show that high visual quality can be obtained at low bit rates and that, for a given bit rate, the visual quality of images compressed by the proposed coder is more acceptable than that of images compressed by the ISO-JPEG coder.
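A minimal sketch of JND-based screening and quantizer-step assignment for one DCT block. The per-coefficient thresholds are placeholders for the measured JND profile, and the step rule (2 x JND, which keeps the maximum quantization error at or below the threshold) is our own assumption.

import numpy as np

def jnd_quantize_block(F, jnd):
    """F: 8x8 DCT coefficients; jnd: 8x8 visibility thresholds for this block.
    Coefficients below their threshold are dropped (PUCs); the rest (PICs) are
    quantized with a step tied to the threshold so errors stay sub-threshold."""
    important = np.abs(F) >= jnd
    step = 2.0 * jnd                      # assumed rule: step = 2*JND
    q = np.zeros_like(F, dtype=int)
    q[important] = np.round(F[important] / step[important]).astype(int)
    return q, important

def dequantize_block(q, jnd):
    """Inverse mapping back to coefficient values."""
    return q * (2.0 * jnd)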
Vector quantization has been applied to low-bit-rate speech and image compression. One of the most serious problems for vector quantization is the high computational complexity of searching for the closest codeword in the codebook design and encoding processes. To overcome this problem, a fast algorithm, under the assumption that distortion is measured by the squared Euclidean distance, is proposed to search for the codeword closest to an input vector. Using the means and variances of the codewords, the algorithm can reject many codewords that cannot be candidates for the closest codeword to the input vector and hence save a great deal of computation time. Experimental results confirm the effectiveness of the proposed method.
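A sketch of this kind of rejection test, using the standard bound ||x - c||^2 >= k(m_x - m_c)^2 + k(s_x - s_c)^2 for k-dimensional vectors with component means m and standard deviations s; the visiting order and overall structure are illustrative choices, not necessarily the paper's exact algorithm.

import numpy as np

def fast_nearest_codeword(x, codebook):
    """Full-search result with mean/std rejection: a codeword whose mean or std
    deviates too much from the input's cannot beat the current best distance."""
    k = len(x)
    means = codebook.mean(axis=1)
    stds = codebook.std(axis=1)
    mx, sx = x.mean(), x.std()
    order = np.argsort(np.abs(means - mx))            # visit most promising codewords first
    best_i, best_d = -1, np.inf
    for i in order:
        lower = k * (mx - means[i]) ** 2 + k * (sx - stds[i]) ** 2
        if lower >= best_d:
            continue                                   # rejected without computing the distance
        d = np.sum((x - codebook[i]) ** 2)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d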
To transmit facsimile images through a very low bit-rate channel such as the half-rate mobile channel, a very efficient coding scheme for data compression is required. Lossy coding is expected to achieve more data reduction than conventional lossless coding schemes. This paper discusses approximate representation of scanned character patterns for data reduction. First, the quality of character patterns is considered in terms of the size of the patterns. Based on this consideration, the attributes of scanned character patterns and the quality associated with them are assumed. To preserve quality under approximation, a character pattern is described by a set of strokes in a tree data structure.
A new image compression algorithm based on adaptive vector quantization is presented. A novel, efficient on-line codebook refining mechanism, called the 'Gold-Washing' (GW) mechanism, including the GW algorithm which works on a dynamic codebook called the GW codebook, is presented and implemented. This mechanism is universal, so it is suitable for any type of input data source, and adaptive, so no source statistics need to be transmitted. The asymptotic optimality of the GW mechanism has been proven not only for memoryless (i.i.d.) sources but also for stationary, ergodic sources. The efficiency and time complexity of the GW mechanism are analyzed. Based on this mechanism, an efficient hybrid adaptive vector quantizer is designed for image coding applications; it incorporates other coding techniques such as a basic VQ with a large auxiliary codebook, called the universal-mother (UM) codebook, as a new codeword generator, quadtree-based hierarchical decomposition, and classification. Experimental results show that the performance of our image compression algorithm is competitive with, and even better than, that of JPEG and other coding algorithms, especially in low bit rate applications. Coded results with bit rates of 0.120-0.150 bits per pixel and acceptable image quality can be achieved.
This paper proposes a new ADPCM method for image coding, called directional ADPCM, which can remove more redundancy from image signals than conventional ADPCM. Conventional ADPCM calculates the two-dimensional prediction coefficients from the correlation functions by solving the Yule-Walker equation. In practice, the correlation functions are estimated over a block of data as an approximation of the true correlation functions. However, the block size is limited by the error accumulation effect during packet transmission, and using small blocks may yield unreliable prediction coefficients. Therefore, we develop the directional ADPCM system to overcome this problem and to obtain better prediction results. Our directional ADPCM uses fan-shaped filters to obtain the energy distribution in four directions and then determines the four directional prediction coefficients. All the fan-shaped filters are designed using the singular value decomposition (SVD) method, the two-dimensional Hilbert transform technique, and the frequency weighting concept. In the experiments, we show that the MSE of directional ADPCM is lower than that of conventional ADPCM.
Among image coding techniques, vector quantization (VQ) has been considered an effective method for coding images at low bit rates. The side-match finite-state vector quantizer (SMVQ) exploits the correlations between neighboring blocks (vectors) to avoid large gray-level transitions across block boundaries. In this paper, an improved SMVQ technique named two-pass side-match finite-state vector quantization (TPSMVQ) is proposed. In TPSMVQ, the size of the state codebook in the first pass is decided by the variances of the neighboring blocks. In the second pass, we re-encode those blocks from the first pass whose variances are greater than a threshold. Moreover, not only the left and upper blocks but also the lower and right blocks are used to construct the state codebook. In our experimental results, the improvement of the second pass is up to 1.5 dB in PSNR over the first pass. In comparison to ordinary SMVQ, the improvement is up to 1.54 dB at nearly the same bit rate.
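A sketch of the side-match criterion used to rank codewords by boundary continuity with the already-coded left and upper neighbors; the variance-based codebook sizing and the second pass are omitted, and the shapes used here are assumptions.

import numpy as np

def side_match_scores(codebook, upper_block, left_block):
    """For each candidate codeword (B x B), measure how well its top row and left
    column continue the bottom row of the upper neighbor and the right column of
    the left neighbor; a smaller score means a smoother block boundary."""
    top_target = upper_block[-1, :]       # bottom row of the block above
    left_target = left_block[:, -1]       # right column of the block to the left
    scores = []
    for cw in codebook:                   # codebook: array of shape (N, B, B)
        s = np.sum((cw[0, :] - top_target) ** 2) + np.sum((cw[:, 0] - left_target) ** 2)
        scores.append(s)
    return np.array(scores)

def state_codebook(codebook, upper_block, left_block, state_size):
    """Pick the state_size best side-matching codewords as the state codebook."""
    idx = np.argsort(side_match_scores(codebook, upper_block, left_block))[:state_size]
    return idx, codebook[idx]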
An improved minimum distortion encoding method called predictive mean search (PMS) algorithm is proposed for image vector quantization in this paper. With the proposed method, the minimum distortion codeword can be found by searching only a portion of the codebook and its relative address to the origin of search is sent instead of absolute address. Two schemes are proposed to transmit the relative address of the minimum distortion codeword. Simulation results show that a significant reduction in both computations and bit-rates is achieved by using the proposed methods.
In this paper, reversible subband coding of images is proposed. The high-band signals of conventional one-dimensional reversible filter banks are extrapolative or interpolative prediction error signals whose number of levels becomes four times that of the input signals. We therefore design reversible filter banks that generate interpolative prediction error signals whose number of levels becomes only twice that of the input signals. We also design nonseparable filter banks. It is shown that the proposed methods perform better than conventional reversible subband coding and that the separable method with a low-pass filter produces little aliasing in the reduced images.
A full-fledged Visual Pattern Image Coding system is developed. For high compression ratios, Uniform Patterns are merged by quadtree merging. For high visual quality, a new set of 2 X 2 Edge Patterns is designed for near-perfect reconstruction. By a classification scheme of 11 groups or a threshold on gradient magnitude, the performance profile can be adapted to a wide variety of applications.
In this paper, a new 2D split-and-merge algorithm (2DSM) for image coding is devised. An image is modelled as a 2.5-dimensional surface and approximated by a surface formed of triangular patches. The algorithm iteratively improves the approximated image by splitting and merging the triangles in order to drive the error under a specified bound. In addition, a new optimal triangulation for image data approximation is proposed. The algorithm is successfully applied to coding monochrome images using the Interpolative Vector Quantization (IVQ) technique. Simulation results show that the proposed method can achieve a 2.8 dB improvement in the approximated image and a 0.68 dB improvement in the decoded image at a bit rate lower than current schemes. Moreover, excellent visual quality of the reconstruction is observed.
This paper describes a method of coding image sequences based on global/local motion information. The proposed method first estimates global motion parameters and local motion vectors. Segmentation is then performed with a hierarchical clustering scheme and a quadtree algorithm in order to divide the image into background and target regions. Finally, image coding is done by assigning more bits to the target region and fewer bits to the background, so that the target region can be reconstructed with high quality. Simulations show that the suggested algorithm performs well, especially when the background changes and the target region is small compared with the background.
In this paper, we present two algorithms for intra- and inter-frame image coding via vector quantization (VQ). (1) Intraframe image coding using wavelet VQ (WVQ): here we carried out several experiments based on a combination of wavelet pyramid coding and VQ. (2) Interframe image coding: here two modifications are proposed. In the first, we identify the moving vectors in each frame using a block matching technique; for each moving vector, a prediction is estimated by searching for the direction of minimum distortion in the previous reconstructed frame, and the differential vectors are then quantized via a simple modified VQ (thresholded VQ) to achieve high compression. The second modification uses a moving-mask technique to determine the moving object over several successive frames of an image sequence; two codebooks are then developed, one for the background (non-moving) vectors and the other for the moving object vectors.
The paper presents an overview of architectures for VLSI implementations of video compression schemes as specified by the standardization committees of the ITU and ISO, focusing on programmable architectures. Programmable video signal processors are classified and specified as homogeneous and heterogeneous processor architectures. Architectures are presented for design examples reported in the literature. Heterogeneous processors outperform homogeneous processors because they adapt to the requirements of special subtasks with dedicated modules; the majority of heterogeneous processors incorporate dedicated modules for highly regular, high-performance subtasks such as DCT and block matching. By normalization to a fictive 1.0 micron CMOS process, typical linear relationships between silicon area and throughput rate have been determined for the different architectural styles. This relationship indicates a figure of merit for silicon efficiency.
A new algorithm is developed for reducing the bit rate required for motion vectors. This algorithm is a generalization of block matching motion estimation in which the search region is represented as a codebook of motion vectors. The new algorithm, called macro motion vector quantization (MMVQ), generalizes our earlier MVQ by coding a group of motion vectors. The codebook is a set of macro motion vectors which represent the block locations of the small neighboring blocks in the previous frame. We develop an iterative design algorithm for the codebook. Our experiments show that the variances of the displaced frame differences (DFDs) are reduced significantly compared to the block matching algorithm (BMA) with the macroblock size.
Motion estimation is a key issue in video coding. In very low bitrate applications, the side information for the motion field represents a significant portion of the total bitrate. This paper presents a joint motion estimation, segmentation, and coding technique which tries to reduce the segmentation and motion side information while providing a prediction error similar to or smaller than that of more classical motion estimation techniques. The main application in mind is a region-based coding approach in which the consecutive frames of the video are divided into regions having similar motion vectors and simple shapes that are easy to encode.
This paper proposes a reliability metric for motion vectors, and applies it to the block matching method using hierarchical images to reduce estimation errors. First, the proposed reliability metric for motion vectors is derived and its properties are discussed. In order to evaluate the usefulness of the proposed reliability metric, a calculation example for an actual image is shown. Then, the derived reliability metric is applied to the block matching method using hierarchical images as a weighting function. Finally, several experimental results by the proposed estimation method are shown for a verification of the proposed method.
Motion vector (MV) estimation plays an important role in motion compensated video coding. In this research, we first examine a stochastic MV model which enables us to exploit the strong correlation of MVs in both spatial and temporal domains in a given image sequence. Then, a new fast stochastic block matching algorithm (SBMA) is proposed. The basic idea is to select a set of good MV candidates and choose from them the one which satisfies a certain spatio-temporal correlation rule. The proposed algorithm reduces matching operations to about 2% of that of the full block matching algorithm (FBMA) with only 2% increase of the sum of absolute difference (SAD) in motion compensated residuals. The excellent performance of the new algorithm is supported by extensive experimental results.
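A sketch of candidate-based matching in this spirit: motion vectors of spatial and temporal neighbors are tested by SAD and the winner is refined locally. The candidate set and the +/-1 refinement range are our assumptions, not the paper's spatio-temporal selection rule.

import numpy as np

def sad(cur, ref, r, c, mv, B=16):
    """Sum of absolute differences for block (r, c) of cur against ref shifted by mv."""
    dr, dc = mv
    H, W = ref.shape
    if not (0 <= r + dr <= H - B and 0 <= c + dc <= W - B):
        return np.inf
    return np.abs(cur[r:r + B, c:c + B].astype(int)
                  - ref[r + dr:r + dr + B, c + dc:c + dc + B].astype(int)).sum()

def candidate_block_match(cur, ref, r, c, neighbor_mvs, B=16, refine=1):
    """neighbor_mvs: MVs of the left/upper spatial neighbors and the co-located block
    of the previous frame.  Only candidates (plus (0,0)) and a small refinement around
    the winner are evaluated, instead of a full search."""
    candidates = [(0, 0)] + list(neighbor_mvs)
    dr0, dc0 = min(candidates, key=lambda mv: sad(cur, ref, r, c, mv, B))
    refined = [(dr0 + i, dc0 + j) for i in range(-refine, refine + 1)
                                   for j in range(-refine, refine + 1)]
    return min(refined, key=lambda mv: sad(cur, ref, r, c, mv, B))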
Recently, a variable block size (VBS) motion estimation technique has been employed to improve the performance of motion compensated transform coding (MCTC). This technique allows larger blocks to be used when smaller blocks provide little gain, saving bit rate, especially for areas containing more complex motion. However, the use of VBS motion estimation raises a new optimization issue for motion compensated coding (MCC), since an increased bit rate must be allocated to the VBS motion vectors; that is, the rate allocation between motion vector encoding and displaced frame difference (DFD) coding becomes an important issue. Hence, in this paper, a rate-distortion (R-D) optimization between hierarchical VBS motion estimation and DFD coding is described. First, to make the R-D search feasible, the hierarchical VBS motion structures are grouped into two-level model structures and an efficient R-D search method is proposed. Next, a solution for the control of the VBS motion information, based on the Lagrange multiplier method, is introduced. Intensive computer simulation employing the MCTC technique shows that an overall improvement of up to 1.0 dB, compared to fixed block size motion estimation, is obtained.
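A minimal sketch of a Lagrangian mode decision between one 16x16 motion vector and four 8x8 vectors, using the cost J = D + lambda*R; the distortion and rate inputs are placeholders, and the two-level grouping of the paper is not reproduced here.

def lagrangian_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def choose_block_mode(d16, r16, d8_list, r8_list, lam):
    """Pick 'single' (one 16x16 MV) or 'split' (four 8x8 MVs) by Lagrangian cost.
    d16/r16: distortion and motion-vector rate of the 16x16 mode;
    d8_list/r8_list: the four per-quadrant values for the split mode."""
    j_single = lagrangian_cost(d16, r16, lam)
    j_split = sum(lagrangian_cost(d, r, lam) for d, r in zip(d8_list, r8_list))
    return ('single', j_single) if j_single <= j_split else ('split', j_split)

# example: choose_block_mode(1200.0, 12, [200.0, 250.0, 300.0, 220.0], [10, 9, 11, 10], lam=20.0)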
A method for generating a sequence of synthetic images from two different image sequences is described. The method is composed of two major processes: object tracking and image synthesis. The object tracking process consists of two phases. The first phase determines an initial contour which roughly approximates the shape of the object, and the second phase extracts an accurate contour from the initial one using an active contour model. The initial contour of an object in the first frame is specified manually, and those in the following frames are determined by referring to the contour extracted in the previous frame. The contour extracted in the current frame is deformed by detecting nonrigid object movements, and the resulting shape is used as the initial contour for the active contour model in the next frame. The motion parameters representing camera motion are also estimated by calculating optical flows between successive frames.
This paper addresses the problem of optimally positioning a contour separating two moving regions using snake concepts. After a brief review of classical snake methodology, an alternative approach is proposed, based on a reconstruction criterion for the regions delimited by the curve and on parametric modeling of both the region textures and the boundaries. A generic adaptive-step gradient algorithm is formulated for solving the curve evolution problem, independently of the models used. The method is then applied more specifically to motion boundary localization, where the texture of mobile regions is reconstructed by motion compensation using polynomial motion models. The generic optimization algorithm is applied to motion frontiers defined by B-spline curves. A detailed implementation of the method in this particular case is described, and considerations about its behavior are given. Finally, some experimental results are reported, attesting to the interest of the proposed approach.
In this investigation, motion analysis is carried out on image sequence using region-based feature matching. A Two-Pass search algorithm is devised for motion estimation. Each image is partitioned into a number of blocks. The movement of each block is determined by looking for the corresponding block in the previous frame of the image sequence. In the first pass, fractal dimension is estimated in the neighborhoods of each block. The coarse position of the corresponding block in the previous frame is identified based on the similarity of that parameter. This position is then regarded as the center of the search space in the second pass, which employs grey level Exhaustive search to determine the fine position of the block. The algorithm has been tested on a sequence of 16 images. The performance of the Two-Pass search is found to be much better than the grey level Three-Step search and comparable to the grey level Exhaustive search.
The structure extraction task is analyzed. Co-occurrence matrices (CMs) are a popular basis for this goal. We show that binary preparation of an arbitrary texture preserves its structure; this transformation decreases the computation time of the analysis and the required memory by dozens of times. A number of features for detecting displacement vectors on binarized images are compared. We suggest using the CM elements jointly as a unified feature for this goal, and we show that it is a stable detector for noisy images and simpler than the well-known chi-squared and kappa statistics.
In this paper, we consider the projected rotation group, which consists of projection and rotations in 3D, and give some invariant feature extractors. Based on the theory of Lie algebras, the representation of the projected rotation group is obtained, and it is shown that the basis of the representation can be an orthonormal basis. With this orthonormal basis we construct quasi moments, which are a kind of weighted moment. It is also shown that the quasi moments are closed within their orders under the projected rotation group. Some computer simulation experiments for 3D motion analysis with the quasi moments are given.
Gabor filters are of particular interest to the computer vision community because the profiles of two-dimensional Gabor functions have been shown to closely approximate the receptive field profiles of particular simple cells in the visual cortex of certain mammals. However, even a few values for the parameters of the Gabor function generate a large number of filters, which makes practical implementation impossible, and the process of adjusting the parameters of these functions to obtain the 'best' set is not straightforward. In this paper we describe a new, reliable, and systematic method for setting up Gabor filters for texture classification. Texture is an intrinsic property of images and is thus an important feature for computer vision. Gabor filters are used to extract features from local neighborhoods of the texture images and have been tuned for the classification of, initially, naturally occurring textures from Brodatz's album and then of different grades of ceramic filters used in molten metal filtration.
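A sketch of a small Gabor filter bank with mean/standard-deviation features per filter response; the frequency/orientation grid and the sigma rule below are illustrative assumptions, not the tuned parameter set of the paper.

import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, sigma, size=15):
    """Real part of a 2D Gabor function: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, freqs=(0.1, 0.2, 0.4), thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Concatenate (mean, std) of each filter response into one texture feature vector."""
    feats = []
    for f in freqs:
        for t in thetas:
            resp = convolve2d(img.astype(float), gabor_kernel(f, t, sigma=1.0 / (2 * f)),
                              mode='same', boundary='symm')
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)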
In this paper, an approach for hierarchical representation of graphic image objects in vector form is proposed, based on decomposing the object skeleton into a number of components consisting of concatenations of branches and loops. To build this representation, an image vectorization method based on object skeletonization and its iterative tracing is proposed. At each iteration, skeleton components are extracted and concatenated with the already-extracted components; concatenations are built by taking into account the spatial relevance of the regions represented by the various skeleton subsets. A data structure is built which allows information about the object components to be stored compactly. The method can be applied to recognizing graphic images, particularly cartographic, engineering drawing, and flow-chart images.
In this paper, a generalized zero crossing (GZC) theorem is proposed. The GZC theorem places far fewer constraints on filters, so that filter design can be flexible. It is then shown that ramp models can be effectively approximated by step models. Based on the GZC theorem, a difference-of-exponential (DoE) operator is proposed. It is shown both theoretically and experimentally that the new operator is computationally efficient and that its edge detection performance is higher than that of the Laplacian-of-Gaussian (LoG) operator.
This paper proposes an efficient technique of shape feature extraction based on mathematical morphology. A new shape complexity index for preclassification in machine-printed Chinese Character Recognition (CCR) is also proposed. For characters represented in different fonts/sizes or in a low-resolution environment, a more stable local feature such as shape structure is preferred for character recognition. Morphological valley extraction filters are applied to extract the protrusive strokes from the four sides of an input Chinese character; the number of extracted local strokes reflects the shape complexity of each side. These shape features are encoded as corresponding shape complexity indices. Based on the shape complexity index, the database can be classified into 16 groups prior to the recognition procedures. Incorporating shape feature analysis reclaims several characters from misrecognized character sets and results in an average improvement of 3.3% in the recognition rate of an existing recognition system. In addition to enhancing recognition performance, the extracted stroke information can be further analyzed to classify its stroke type. Therefore, the combination of extracted strokes from each side provides a means for database clustering based on radical or subword components. This is one of the best solutions for recognizing high-complexity character sets such as Chinese characters, which are divided into more than 200 different categories and consist of more than 13,000 characters.
This paper presents a newly developed digital color image processor (CIP) for video cameras. The CIP accepts digitized image signals from a one-chip color CCD sensor, performs luminance and chrominance signal processing, and outputs NTSC Y/C and digital CCIR601 signals. In addition, the algorithmically developed auto-focus mechanism that will be integrated with the CIP is described.
This paper presents a two-step self-organizing method for a transformation that can elastically map one subject's MR image, called the input image, to a standard reference MR image. Linear scaling and transformation are first applied to grossly match the input image to the reference image. The linearly scaled input image is then divided into several smaller cubes of equal volume. A local correspondence is used to estimate the best matching position by moving individual cubes of the input image within a search neighborhood of the reference image. Based on this local correspondence, coarse displacement vectors for each cube are determined from the position difference between the original and the new cube centers. The estimated vectors provide a complete transformation that matches the entire input image to the reference image. As the process is repeated, a better transformation is obtained that improves the matching. This algorithm has been tested on simulations of 3D deformed images and found to be successful for 3D inter-subject registration of MR images.
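A 2D sketch (for brevity) of the local block-matching step: each block of the input image is shifted within a small window of the reference image and the best shift is kept as its coarse displacement vector. The similarity metric, block size, and window size are assumptions rather than the paper's settings.

import numpy as np

def local_displacements(input_img, ref_img, B=16, search=4):
    """For each BxB block of input_img, find the shift (within +/- search pixels)
    that best matches ref_img, giving a coarse displacement vector per block."""
    H, W = input_img.shape
    field = np.zeros((H // B, W // B, 2), dtype=int)
    for bi, r in enumerate(range(0, H - B + 1, B)):
        for bj, c in enumerate(range(0, W - B + 1, B)):
            blk = input_img[r:r + B, c:c + B].astype(float)
            best, best_d = (0, 0), np.inf
            for dr in range(-search, search + 1):
                for dc in range(-search, search + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr <= H - B and 0 <= cc <= W - B:
                        d = np.sum((blk - ref_img[rr:rr + B, cc:cc + B]) ** 2)
                        if d < best_d:
                            best, best_d = (dr, dc), d
            field[bi, bj] = best
    return field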
The assessment of bone age is an important task in pediatric radiology. It provides very important information for the treatment and prediction of skeletal growth in a developing child. Various computerized algorithms for automatically assessing skeletal growth have been reported, most of which attempt to analyze phalangeal growth. The most fundamental step in these automatic measurement methods is the image segmentation that separates bone from soft tissue and background. These automatic segmentation methods for hand radiographs can roughly be categorized into two main approaches: edge-based and region-based. This paper presents a region-based carpal-bone segmentation approach organized into four stages: contrast enhancement, moment-preserving thresholding, morphological processing, and region-growing labeling.
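A minimal sketch of the final region-growing labeling stage, applied to a binary image produced by the earlier thresholding and morphology steps: a stack-based flood fill assigns one label per connected foreground region. This is illustrative only, not the paper's implementation.

import numpy as np

def region_growing_labels(binary):
    """Label 4-connected foreground regions of a binary image by region growing."""
    labels = np.zeros(binary.shape, dtype=int)
    next_label = 0
    H, W = binary.shape
    for r in range(H):
        for c in range(W):
            if binary[r, c] and labels[r, c] == 0:
                next_label += 1
                stack = [(r, c)]
                labels[r, c] = next_label
                while stack:                          # grow the region from the seed pixel
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
    return labels, next_label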
In this paper, we develop an algorithm for obtaining the maximum a posteriori (MAP) estimate of the displacement vector field (DVF) from two consecutive image frames of an image sequence acquired under quantum-limited conditions. The estimation of the DVF has applications in temporal filtering, object tracking and frame registration in low-light-level image sequences as well as low-dose clinical x-ray image sequences. The quantum-limited effect is modeled as an undesirable, Poisson-distributed, signal-dependent noise artifact. The specification of priors for the DVF allows a smoothness constraint for the vector field. In addition, discontinuities and areas corresponding to occlusions which are present in the field are taken into account through the introduction of both a line process and an occlusion process for neighboring vectors. A Bayesian formulation is used in this paper to estimate the DVF and a block component algorithm is employed in obtaining a solution. Several experiments involving a phantom sequence show the effectiveness of this estimator in obtaining the DVF under severe quantum noise conditions.
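A much-simplified sketch of the estimation idea, assuming a Poisson observation model and a quadratic smoothness prior over neighboring vectors; the line and occlusion processes and the block-component iteration of the paper are omitted, and all parameter values and names are illustrative.

    import numpy as np

    def map_cost(obs_block, pred_block, d, neighbor_mean, lam=0.1, eps=1e-6):
        """Negative log posterior for one candidate displacement d of a block.

        obs_block  : observed (Poisson-distributed) counts in the current frame
        pred_block : previous-frame block displaced by d (the Poisson mean)
        """
        pred = np.maximum(pred_block, eps)
        data_term = np.sum(pred - obs_block * np.log(pred))       # Poisson NLL
        prior_term = lam * np.sum((np.asarray(d) - neighbor_mean) ** 2)
        return data_term + prior_term

    def best_displacement(prev, curr, y, x, b=8, search=3, neighbor_mean=(0.0, 0.0)):
        """Exhaustive search for the MAP displacement of one block."""
        obs = curr[y:y+b, x:x+b]
        best, best_d = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy and yy + b <= prev.shape[0] and 0 <= xx and xx + b <= prev.shape[1]:
                    cost = map_cost(obs, prev[yy:yy+b, xx:xx+b], (dy, dx),
                                    np.asarray(neighbor_mean))
                    if cost < best:
                        best, best_d = cost, (dy, dx)
        return best_d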
Lossless compression techniques are essential in the archival and communication of medical images. However, there has been limited recent progress in lossless image coding. Available algorithms are either too complicated for fast implementation or suitable only for certain specific types of images. In this paper, a new Segmentation-based Lossless Image Coding (SLIC) scheme is proposed, which is based on a simple but efficient region-growing procedure. This embedded procedure produces an adaptive scanning pattern for an image with the help of a discontinuity index map that requires very few bits. Along with this scanning pattern, the high correlation among image pixels is exploited by the method, and thereby an error image with a very small dynamic range is generated. Both the error image data and the discontinuity index map are then JBIG encoded. In comparison with direct coding by JBIG, JPEG, adaptive Lempel-Ziv, and 2D Burg prediction plus Huffman error coding methods, the SLIC method performed at least 7% better on ten 8-bit and 10-bit high-resolution digitized medical images.
Keywords: segmentation, region growing, lossless compression, image coding, medical image
We present in this paper a study of medical image compression based on an adaptive quantization scheme capable of preserving clinically useful structures that appear in the given images. We believe that how accurately a compression algorithm preserves these structures is a good measure of image quality after compression, since many image-based diagnoses rely on the position and appearance of certain structures. With wavelet decomposition, we are able to investigate the image features at different scale levels that correspond to certain characteristics of biomedical structures contained in the medical images. An adaptive quantization algorithm based on clustering with spatial constraints is then applied to the high-frequency subbands. The adaptive quantization enables us to selectively preserve the image features at various scales so that the desired details of clinically useful structures are preserved during compression, even at a low bit rate. Preliminary results based on real medical images suggest that this clustering-based adaptive quantization, combined with wavelet decomposition, is very promising for medical image compression with structure-preserving capability.
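A minimal sketch of the two ingredients, using a one-level 2D Haar decomposition and a simple scalar k-means (Lloyd) quantizer applied to one high-frequency subband; the spatial-constraint clustering and the multi-level structure of the paper are not reproduced, and all names and parameters are illustrative.

    import numpy as np

    def haar2d(img):
        """One-level 2D Haar decomposition into LL, LH, HL, HH subbands."""
        a = (img[0::2, :] + img[1::2, :]) / 2.0
        d = (img[0::2, :] - img[1::2, :]) / 2.0
        ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
        lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
        hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
        hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
        return ll, lh, hl, hh

    def lloyd_quantize(coeffs, k=4, iters=20):
        """Scalar k-means quantizer: map each coefficient to its cluster centroid."""
        x = coeffs.ravel()
        centers = np.linspace(x.min(), x.max(), k)
        for _ in range(iters):
            idx = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            for j in range(k):
                if np.any(idx == j):
                    centers[j] = x[idx == j].mean()
        return centers[idx].reshape(coeffs.shape)

    img = np.random.rand(256, 256)
    ll, lh, hl, hh = haar2d(img)
    hh_q = lloyd_quantize(hh)   # adaptive quantization of a high-frequency subband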
Future large visual information systems (such as image databases and video servers) require effective and efficient methods for indexing, accessing, and manipulating images based on visual content. This paper focuses on the automatic extraction of low-level visual features such as texture, color, and shape. Continuing our prior work on compressed video manipulation, we also propose to explore the possibility of deriving visual features directly from the compressed domain, such as the DCT and wavelet transform domains. By stressing low-level features, we hope to achieve generic techniques applicable to a broad range of applications. By exploring compressed-domain content extractability, we hope to reduce the computational complexity. We also propose a quad-tree based data structure to bind various signal features. Integrated feature maps are proposed to improve the overall effectiveness of the feature-based image query system. Current technical progress and system prototypes are also described. Part of the prototype work has been integrated into the Multimedia/VOD testbed in the Advanced Image Lab of Columbia University.
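One way such compressed-domain features can be read off, sketched under simple assumptions: block-DCT coefficients (as in JPEG/MPEG intra blocks) are grouped into a few frequency bands whose energies serve as a crude texture descriptor. The 8x8 block size and the band grouping are illustrative; they are not claimed to be the authors' exact feature set.

    import numpy as np

    def dct_matrix(n=8):
        """Orthonormal type-II DCT matrix."""
        k = np.arange(n)[:, None]
        x = np.arange(n)[None, :]
        c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
        c[0, :] *= 1.0 / np.sqrt(2.0)
        return c * np.sqrt(2.0 / n)

    def block_texture_features(img, n=8):
        """Per-block energies of low/mid/high DCT frequency bands."""
        C = dct_matrix(n)
        H, W = img.shape
        feats = []
        u = np.add.outer(np.arange(n), np.arange(n))   # crude radial frequency index
        bands = [(1, 4), (4, 8), (8, 2 * n)]           # skip the DC term (u == 0)
        for y in range(0, H - n + 1, n):
            for x in range(0, W - n + 1, n):
                d = C @ img[y:y+n, x:x+n] @ C.T        # 2D DCT of the block
                feats.append([np.sum(d[(u >= lo) & (u < hi)] ** 2) for lo, hi in bands])
        return np.array(feats)

    features = block_texture_features(np.random.rand(64, 64))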
A system for offering visual information to N-ISDN video phones is proposed. The details of the key components of this system, namely the N-ISDN-LAN gateways, the interactive service control, and the H.261 coded-video editor, are described. The features of this system are that it is flexible and that the same system can also be applied to LAN environments.
Very low bit rate video coding has received considerable attention in academia and industry in terms of both coding algorithms and standards activities. In addition to the earlier ITU-T efforts on H.320 standardization for video conferencing from 64 kbps to 1.544 Mbps in the ISDN environment, the ITU-T/SG15 has formed an expert group on low bitrate coding (LBC) for visual telephony below 64 kbps. The ITU-T/SG15/LBC work consists of two phases: near-term and long-term. The near-term standard H.32P/N, based on existing compression technologies, mainly addresses the issues related to visual telephony below 28.8 kbps, the V.34 modem rate used in the existing Public Switched Telephone Network (PSTN). H.32P/N will be technically frozen in January '95. The long-term standard H.32P/L, relying on fundamentally new compression technologies with much improved performance, will address video telephony in both PSTN and mobile environments. The ISO SC29/WG11, after its highly visible and successful MPEG 1/2 work, is starting to focus on the next-generation audiovisual multimedia coding standard MPEG 4. With the recent change of direction, MPEG 4 intends to provide an audiovisual coding standard allowing for interactivity, high compression, and/or universal accessibility, with a high degree of flexibility and extensibility. This paper briefly summarizes these on-going standards activities undertaken by ITU-T/LBC and ISO/MPEG 4 as of December 1994.
This paper describes picture information transmission for portable multimedia terminals. The radio links used in portable multimedia terminals have narrower channel capacity and higher transmission error rates than wired links such as those used in ISDN. To transmit multimedia information of satisfactory quality over radio links, robustness against radio link errors must be improved, because picture deterioration is much more apparent than audio deterioration. First, the effects of transmission errors on picture quality are analyzed using the H.261 coding system used for ISDN picture communication. Second, the relationship among bit error rate, terminal velocity, and picture quality is analyzed and the deterioration mechanisms of picture quality are discussed. Three techniques for improving picture quality against radio link errors are proposed.
The paper gives an overview of trends in mobile communications, currently one of the fastest-growing markets. The demand for new and extended services is a driving force for the development of future mobile networks. The provision of video communications in particular is a great challenge. The current status of MPEG and ITU-T/SG15 activities concerning mobile communications is described and related to current projects. In the second part, some aspects of the design of video codecs for mobile video telephony are outlined. Based on the coming H.26P standard, an advanced video codec is described and its suitability for mobile networks is demonstrated by simulation results.
This paper aims at presenting the major steps towards the elaboration of an optimum control for video transmission on ATM networks. The paper puts forward the gain in statistical multiplexing to demonstrate that transmitting at variable rates on asynchronous multiplexing links is more efficient than transmitting at constant rates on synchronous links. Optimum coding and transmission require characterizing the video sources of information as entropy generators and developing entropy rate-distortion functions for the coder and the transmission channel. Quantizers and VLCs on the coding side, and traffic and queues on the transmission multiplexing side, each lead to performance functions expressing quality in terms of entropy rate: respectively, the PSNR as a function of the output data rate and the cell losses as a function of the network load. The main advantage of transmitting on variable bit rate channels is to favor the generation of image sequences at constant subjective quality on the coding side, and the saving of transmission bandwidth through a gain in statistical multiplexing on the network side. Mirror control algorithms can be implemented at the coding end and in the multiplexing nodes to optimally manage the rate-distortion functions.
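The statistical multiplexing argument can be illustrated numerically. The sketch below, under the assumption of independent bursty (lognormal) VBR sources, compares the sum of per-source peak rates (what constant-rate allocation would reserve) with the bandwidth needed to keep the aggregate overflow probability below a target; the source model and numbers are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    n_sources, n_slots = 30, 20000

    # illustrative bursty VBR sources (rates in Mbit/s)
    rates = rng.lognormal(mean=0.0, sigma=0.6, size=(n_sources, n_slots))

    peak_allocation = rates.max(axis=1).sum()          # per-source peak reservation
    aggregate = rates.sum(axis=0)
    vbr_allocation = np.quantile(aggregate, 0.999)     # capacity for ~1e-3 overflow prob.

    gain = peak_allocation / vbr_allocation
    print(f"peak-rate allocation : {peak_allocation:.1f}")
    print(f"VBR allocation       : {vbr_allocation:.1f}")
    print(f"statistical multiplexing gain ~ {gain:.2f}")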
This paper presents an exploratory view of multimedia processing and transport issues for the wireless personal terminal scenario. In this scenario, portable multimedia computing devices are used to access video/voice/data information and communication services over a broadband wireless network infrastructure. System architecture considerations are discussed, leading to the identification of a specific approach based on a unified wired + wireless ATM network, a general-purpose CPU based portable terminal, and a new software architecture for efficient media handling and quality-of-service (QoS) support. The recently proposed 'wireless ATM' network concept is outlined, and the associated transport interface at the terminal is characterized in terms of available service types (ABR, VBR, CBR) and QoS. A specific MPEG video delivery application with VBR ATM transport and software decoding is then examined in further detail. Recognizing that software video decoding at the personal terminal represents a major performance bottleneck in this system, the concept of MPEG encoder quantizer control with joint VBR bit-rate and decoder computation constraints is introduced. Experimental results are given for software decoding of VBR MPEG video with both VBR usage parameter control (UPC) and decoder CPU constraints at the encoder, demonstrating improvements in delivered video quality relative to the conventional case without such controls.
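A toy sketch of the joint-constraint idea: the encoder raises the quantizer scale whenever either the projected bit count or the projected decoder computation for the frame exceeds its budget, and relaxes it when there is headroom. The cost models, thresholds, and function name are invented placeholders, not the paper's control law.

    def control_quantizer(mquant, projected_bits, projected_cycles,
                          bit_budget, cycle_budget, step=1, qmin=1, qmax=31):
        """Adjust an MPEG quantizer scale under joint bit-rate and CPU constraints."""
        if projected_bits > bit_budget or projected_cycles > cycle_budget:
            mquant = min(qmax, mquant + step)    # coarser quantization: fewer bits, cheaper decode
        elif projected_bits < 0.8 * bit_budget and projected_cycles < 0.8 * cycle_budget:
            mquant = max(qmin, mquant - step)    # headroom left: refine quantization
        return mquant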
Wireless channel impairments pose many challenges to real-time visual communications. In this paper, we describe a real-time software-based wireless visual communications simulation platform which can be used for performance evaluation in real time. This simulation platform consists of two personal computers serving as hosts. The major components of each PC host include a real-time programmable video codec, a wireless channel simulator, and a network interface for data transport between the two hosts. The three major components are interfaced in real time to show the interaction of various wireless channels and video coding algorithms. The programmable features of these components allow users to evaluate the performance of user-controlled wireless channel effects without physically carrying out experiments that are limited in scope, time-consuming, and costly. Using this simulation platform as a testbed, we have experimented with several wireless channel effects including Rayleigh fading, antenna diversity, channel filtering, symbol timing, modulation, and packet loss.
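A minimal sketch of the Rayleigh-fading component of such a simulator, assuming a flat (frequency-nonselective) channel, QPSK modulation, and coherent detection with perfect channel knowledge; antenna diversity, channel filtering, and symbol-timing effects listed in the paper are not modeled here.

    import numpy as np

    rng = np.random.default_rng(1)
    n_sym, snr_db = 100000, 15

    # QPSK symbols (Gray mapped, unit energy)
    bits = rng.integers(0, 2, size=(n_sym, 2))
    sym = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

    # flat Rayleigh fading: h ~ CN(0, 1), plus AWGN
    h = (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym)) / np.sqrt(2)
    noise_std = np.sqrt(10 ** (-snr_db / 10) / 2)
    noise = noise_std * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))
    r = h * sym + noise

    # coherent detection with perfect channel state information
    eq = r / h
    bits_hat = np.stack([(eq.real > 0).astype(int), (eq.imag > 0).astype(int)], axis=1)
    ber = np.mean(bits_hat != bits)
    print(f"BER over flat Rayleigh fading at {snr_db} dB SNR: {ber:.4f}")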
Poster Session III: Image Filtering, Fractal, Wavelets, and Others
If a point on an object passes over two or more photoreceptors during image acquisition, a blur will occur. Under these conditions, an object or scene is said to move fast relative to the camera's ability to capture the motion. In this work, we consider the iterative restoration of images blurred by distinct, fast-moving objects in the frames of a (video) image sequence. Even in the simplest case of fast object motion, the degradation is spatially variant with respect to the image scene. Rather than segmenting the image into regions where the degradation can be considered space invariant, we allow the blur to vary at each pixel and perform iterative restoration. Our approach requires complete knowledge of the blur point spread function (PSF) to restore the scene. The blur of a fast-moving object in a single frame is underspecified. With the appropriate assumptions, an estimate of the blur PSF can be specified to within a constant scaling factor using motion information provided by a displacement vector field (DVF). A robust iterative restoration approach is followed which allows for the incorporation of prior knowledge of the scene structure into the algorithm to facilitate the restoration of difficult scenes. A bilinear approximation to the continuous PSF derived from the motion estimate is proposed to obtain results for real and synthetic sequences. We found this approach suitable for restoring motion degradations in a wide range of digital video applications. The results of this work reinforce the well-known flexibility of the iterative approach to restoration and its application as an off-line image sequence restoration method.
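A small sketch of how a blur kernel can be built from a displacement vector, distributing equal weights along the straight motion path and splitting each sample bilinearly among the four nearest kernel taps; the sampling density and kernel size here are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    def motion_blur_psf(dx, dy, size=15, samples=64):
        """Approximate PSF for linear motion of (dx, dy) pixels during exposure."""
        psf = np.zeros((size, size))
        c = size // 2
        for t in np.linspace(0.0, 1.0, samples):
            x, y = c + t * dx, c + t * dy          # position along the motion path
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            fx, fy = x - x0, y - y0
            for (yy, xx, w) in ((y0, x0, (1-fx)*(1-fy)), (y0, x0+1, fx*(1-fy)),
                                (y0+1, x0, (1-fx)*fy), (y0+1, x0+1, fx*fy)):
                if 0 <= yy < size and 0 <= xx < size:
                    psf[yy, xx] += w               # bilinear splat of this sample
        return psf / psf.sum()

    psf = motion_blur_psf(dx=4.5, dy=-2.0)   # e.g. from a displacement vector field estimate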
The construction of filters or filter operators is an important problem in image processing. In this paper, we describe how one can construct a filter operator based on the distance transformation for grey-scale images. This operator is similar to max/min filters for grey-scale images. We call this operator the quasi-median filter. It is not a filter in the morphological sense, like the median filter, but it possesses filtering (mean) properties.
Recently, MPEG-4 has been formed to study very-low-bit-rate (VLBR) video coding for applications in videotelephony. In this paper, we propose a possible postprocessing technique for VLBR coding. In videophone applications, temporal subsampling is a simple technique which can be combined with other compression schemes to achieve a very large compression ratio, so as to satisfy the VLBR requirement. As a result, however, object motions tend to be jerky and disturbing to the human eye. To smooth out object motions, we propose a postprocessing technique, motion compensated temporal interpolation (MCTI), to increase the instantaneous decoder frame rate. In MCTI, block-based exhaustive motion search is used to establish temporal association between two reconstructed frames. Both forward and backward searches are used to account properly for uncovered and newly covered areas. With MCTI, we show that one or more frames can be interpolated with acceptable visual quality. After showing the feasibility of MCTI, we propose a fast algorithm, FMCTI, with reduced computation requirements and negligible performance degradation.
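A compact sketch of the MCTI idea under simplifying assumptions: a forward block-based exhaustive search between two reconstructed frames, then a mid-point frame interpolated by averaging each matched block pair along its motion trajectory. The backward search, uncovered-area handling, and the FMCTI speedup are omitted, and all sizes are illustrative.

    import numpy as np

    def mcti_midframe(f0, f1, b=16, search=8):
        """Interpolate the frame halfway between f0 and f1 by block matching."""
        H, W = f0.shape
        mid = np.zeros_like(f0, dtype=float)
        for y in range(0, H - b + 1, b):
            for x in range(0, W - b + 1, b):
                blk = f0[y:y+b, x:x+b].astype(float)
                best, bdy, bdx = np.inf, 0, 0
                for dy in range(-search, search + 1):        # exhaustive forward search
                    for dx in range(-search, search + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy and yy + b <= H and 0 <= xx and xx + b <= W:
                            sad = np.abs(blk - f1[yy:yy+b, xx:xx+b]).sum()
                            if sad < best:
                                best, bdy, bdx = sad, dy, dx
                # place the averaged block halfway along the motion trajectory;
                # holes left by diverging vectors would need filling in a full MCTI
                my = min(max(y + bdy // 2, 0), H - b)
                mx = min(max(x + bdx // 2, 0), W - b)
                mid[my:my+b, mx:mx+b] = 0.5 * (blk + f1[y+bdy:y+bdy+b, x+bdx:x+bdx+b])
        return mid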
Previous works on subband-related signal processing were mainly dedicated to the applications of subband systems and to the formulation of multirate filter banks. Only very limited results can be found that treat the statistical properties of random signals inside a multirate filter bank. In this paper, such a theoretical study is performed from the statistical viewpoint. Our main interest lies in how a multirate structure interacts with a random signal. The key statistical properties examined are stationarity, autocorrelation, cross-correlation, power spectral density, and spectral flatness measure. Exact explicit expressions are obtained. These results have their counterparts in a fullband system; however, inside a multirate structure or a subband system, the aliasing effect caused by decimation should be taken into account. In a multirate system, stationarity is not preserved when an upsampling (or expanding) operation is encountered. Furthermore, the equivalent filtering operation is nonlinear. A test example of an AR-1 process is included for demonstration. From this example, an interesting phenomenon is observed. When the correlation coefficient of the AR-1 process is close to 1, the lowpassed signal is not, in any sense, a rough replica of the source. This example justifies the significance and necessity of a theoretical analysis of subband systems from a statistical viewpoint. We believe that stochastic signal processing applications of a subband structure, such as estimation, detection, recognition, etc., will benefit from studies of this nature.
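The AR-1 example can be reproduced in a few lines under simple assumptions: an AR(1) source with correlation coefficient close to 1 is passed through a two-tap Haar-like lowpass filter and decimated by 2, and the sample autocorrelations of the source and the subband signal are compared. The filter and the statistics computed below are illustrative; the exact filter bank of the paper is not reproduced.

    import numpy as np

    rng = np.random.default_rng(2)
    rho, n = 0.95, 100000

    # AR(1) source: x[k] = rho * x[k-1] + w[k]
    w = rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = rho * x[k - 1] + w[k]

    # two-tap lowpass (Haar analysis filter), then decimation by 2
    lp = (x[0::2] + x[1::2]) / np.sqrt(2)

    def autocorr(sig, max_lag=5):
        """Normalized sample autocorrelation up to max_lag."""
        sig = sig - sig.mean()
        return np.array([np.mean(sig[:len(sig)-l] * sig[l:])
                         for l in range(max_lag + 1)]) / np.var(sig)

    print("source autocorrelation :", np.round(autocorr(x), 3))
    print("subband autocorrelation:", np.round(autocorr(lp), 3))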
Three dimensional (3D) object reconstruction from a series of cross-sectional images has found many applications, such as computer vision and medical imaging. In this paper, we propose a wavelet-based interpolation for 3D reconstruction. In this scheme, a contour signal of the object of interest is decomposed using multiresolution wavelet bases. The length of a 'filled' contour is first estimated from the lengths at the coarsest scale of the two adjacent slices, then refined by the lengths at the finer scales. The interslice contour estimate is obtained by the inverse wavelet transform. A series of CT liver images is used to test the performance of our method. Experiments show that our method can obtain satisfactory reconstructed surfaces. The advantages of our method are (i) no need for feature matching, which is time-consuming and often produces false matches, and (ii) the availability of fast algorithms for the wavelet transforms. Thus, our method is not only reliable for practical images but also computationally efficient.
A computational model for the estimation of image motion from a sequence of images obtained by an imaging sensor is developed through mathematical modeling. The paper is divided into three parts which describe the evolution of the model of image motion. An ideal model of image motion is developed in the first part. This model is unsolvable because some parameters and boundary conditions are unknown. Hence, a reformulated model of image motion is derived in the second part. Although this model is solvable, it is ill-posed, mainly due to the differentiation of the noise-contaminated image irradiance function. The consequence is that the solution estimated by this model is unstable. Thus, the reformulated model is remedied and transformed into a realistic model of image motion, which is discussed in the third part. The results from simulations demonstrate that this realistic model of image motion gives correct and reliable estimates.
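As a generic illustration of the kind of regularized formulation such a 'realistic' model leads to (the paper's own formulation is not reproduced here), the sketch below runs a few Horn-Schunck-style iterations: image gradients drive the brightness-constancy term, while a smoothness term stabilizes the otherwise ill-posed differentiation. All parameters are illustrative.

    import numpy as np

    def horn_schunck(f0, f1, alpha=10.0, iters=100):
        """Horn-Schunck-style regularized motion estimation between two frames."""
        fx = np.gradient(f0, axis=1)          # spatial derivatives
        fy = np.gradient(f0, axis=0)
        ft = f1 - f0                          # temporal derivative
        u = np.zeros_like(f0)
        v = np.zeros_like(f0)
        avg = lambda a: 0.25 * (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                                np.roll(a, 1, 1) + np.roll(a, -1, 1))
        for _ in range(iters):
            ub, vb = avg(u), avg(v)           # neighborhood averages (smoothness)
            num = fx * ub + fy * vb + ft
            den = alpha ** 2 + fx ** 2 + fy ** 2
            u = ub - fx * num / den
            v = vb - fy * num / den
        return u, v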
In this paper, we describe an approach to detecting and tracking certain feature points in the mouth region in a talking-head sequence. These feature points are interconnected in a polygonal mesh so that the detection and tracking of these points is based on information not only at these points but also in the surrounding elements. The detection of the nodes in an initial frame is accomplished by a feature detection algorithm. The tracking of these nodes in successive frames is obtained by deforming the mesh so that, when one mesh is warped to the other, the image patterns over corresponding elements in the two meshes match each other. This is accomplished by a modified Newton algorithm which iteratively minimizes the error between the two images after mesh-based warping. The numerical calculation involved in the optimization approach is simplified by using the concept of master elements and shape functions from the finite element method. This algorithm has been applied to a SIF-resolution sequence, which contains fairly rapid mouth movement. Our simulation results show that this algorithm can locate and track the feature points in the mouth region quite accurately.
Lately, new types of man-machine interfaces for lightening the burden on users have been studied vigorously. One important example is a system that lets users give instructions in a noncontact way. In such systems, users give instructions to a virtual environment by gestures observed through stereo cameras. However, previously proposed systems place several restrictions on the environment in which they are used. Here, we note two of them: the arrangement of the stereo cameras and the background behind the user. Both are very important in practical applications. Thus, we propose a new system for giving instructions to a virtual environment in a noncontact way through stereo cameras while relaxing these two restrictions. First, we outline the proposed system, and then we describe a new algorithm for estimating the 3D motion of the user's hand and the position of the head. Experiments on estimating the user's hand motion are carried out and the results are shown.
Detection and tracking of facial features without using any head-mounted devices may be required in various future visual communication applications, such as teleconferencing and virtual reality. In this paper, we propose an automatic method of face feature detection using a method called edge pixel counting. Instead of utilizing color or gray-scale information of the facial image, the proposed edge pixel counting method utilizes edge information to estimate the face feature positions such as eyes, nose and mouth in the first frame of a moving facial image sequence, using a variable-size face feature template. For the remaining frames, feature tracking is carried out alternately using a method called deformable template matching and edge pixel counting. One main advantage of using edge pixel counting in feature tracking is that it does not require a high inter-frame correlation around the feature areas, as is required in template matching. Some experimental results are shown to demonstrate the effectiveness of the proposed method.
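A rough sketch of the edge-pixel-counting idea, assuming a simple gradient-magnitude edge map and a fixed rectangular window: the count of edge pixels inside a sliding window scores candidate locations of an edge-rich feature such as an eye or the mouth. The edge detector, window size, and scoring are illustrative stand-ins for the paper's variable-size template.

    import numpy as np

    def edge_map(img, thresh=0.2):
        """Binary edge map from a simple finite-difference gradient magnitude."""
        gx = np.zeros_like(img)
        gy = np.zeros_like(img)
        gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
        gy[1:-1, :] = img[2:, :] - img[:-2, :]
        return np.hypot(gx, gy) > thresh

    def best_window(edges, wh=12, ww=24):
        """Locate the window containing the most edge pixels (edge pixel counting)."""
        H, W = edges.shape
        counts = edges.astype(int)
        best, pos = -1, (0, 0)
        for y in range(0, H - wh):
            for x in range(0, W - ww):
                c = counts[y:y+wh, x:x+ww].sum()
                if c > best:
                    best, pos = c, (y, x)
        return pos, best

    face = np.random.rand(96, 96)          # placeholder for a face image
    pos, score = best_window(edge_map(face))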
In this paper, we describe five algorithms for finding the medial axis transform (MAT) of 2D regions: Danielsson's algorithm, Rosenfeld and Pfaltz's algorithm, the interpolation/extrapolation algorithm, the Newton-and-march algorithm, and the grid edge interpolation algorithm. The Rosenfeld and Pfaltz, Danielsson, and interpolation/extrapolation methods are based on the maximal disc criterion. Whether a grid point (i,j) with distance amplitudes (a,b) to the boundary of the region is an MA point is decided by its grid neighbors: if the discrete circle associated with the grid point (i,j) is not contained in any of the 8 discrete circles associated with its neighbors, then it is an MA point. The Newton-and-march and grid edge interpolation methods are based on the equal-distance criterion. Given the boundary of a region, we compute the distance transform of the discretized region as a preprocessing step. With every grid point we associate the index of a nearest edge or concave vertex, and the direction and distance to that edge or concave vertex. The main purpose of these steps is to solve the proximity problem. A system of equations is generated and Newton's method is used to trace the MAT. If we add one more equation, such as the equation of a grid line, then instead of marching along the MAT step by step we can find the MA points square by square under some assumptions; this is the idea of the grid edge interpolation method.
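A minimal sketch in the spirit of the preprocessing step above: a two-pass city-block (chamfer) distance transform of a binary region, followed by a crude medial-axis test that keeps local maxima of the distance map. The local-maximum test is a simplification standing in for the maximal disc criterion, and the example region and sizes are illustrative.

    import numpy as np

    def distance_transform(region):
        """Two-pass city-block (chamfer) distance transform of a binary region."""
        H, W = region.shape
        d = np.where(region, float(H + W), 0.0)
        for y in range(H):                      # forward pass
            for x in range(W):
                if region[y, x]:
                    if y > 0: d[y, x] = min(d[y, x], d[y-1, x] + 1)
                    if x > 0: d[y, x] = min(d[y, x], d[y, x-1] + 1)
        for y in range(H - 1, -1, -1):          # backward pass
            for x in range(W - 1, -1, -1):
                if region[y, x]:
                    if y < H - 1: d[y, x] = min(d[y, x], d[y+1, x] + 1)
                    if x < W - 1: d[y, x] = min(d[y, x], d[y, x+1] + 1)
        return d

    def medial_axis_points(d):
        """Keep points whose distance is a local maximum over the 4-neighborhood."""
        up    = np.roll(d,  1, axis=0); down  = np.roll(d, -1, axis=0)
        left  = np.roll(d,  1, axis=1); right = np.roll(d, -1, axis=1)
        return (d > 0) & (d >= up) & (d >= down) & (d >= left) & (d >= right)

    region = np.zeros((40, 40), dtype=bool)
    region[5:35, 10:30] = True                  # illustrative rectangular region
    ma = medial_axis_points(distance_transform(region))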