Temporal coherence is important in retargeted video. Even a mild temporal inconsistency between successive frames can result in visually annoying artifacts, such as jittering or flickering. To reduce these artifacts, we assign a common seam to each group of consecutive frames (GoF). The video sequence is divided into GoFs, and a representative frame is formed for each GoF. Then, both the spatial saliency in each representative frame and the temporal coherence among consecutive representative frames are considered to determine a common seam for each GoF. Since a fixed seam of the representative frame is applied to all frames in a GoF, there are no jittering or flickering artifacts within the GoF. Also, since the temporal coherence between two consecutive representative frames (i.e., two consecutive GoFs) is taken into account, the transition between two consecutive GoFs is smooth.
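The GoF idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the representative frame is taken to be the average of per-frame gradient energy maps, the temporal term is a simple penalty on the distance to the previous GoF's seam, and all function names and the `weight` parameter are hypothetical.

```python
def energy(frame):
    """Horizontal gradient magnitude, used here as a simple saliency proxy."""
    h, w = len(frame), len(frame[0])
    return [[abs(frame[y][min(x + 1, w - 1)] - frame[y][max(x - 1, 0)])
             for x in range(w)] for y in range(h)]


def representative_energy(frames):
    """Average the energy maps of all frames in one GoF (assumed choice)."""
    maps = [energy(f) for f in frames]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]


def vertical_seam(cost):
    """Minimum-cost 8-connected vertical seam found by dynamic programming."""
    h, w = len(cost), len(cost[0])
    acc = [row[:] for row in cost]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            acc[y][x] += min(acc[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the last row.
    x = min(range(w), key=lambda i: acc[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: acc[y - 1][i])
        seam.append(x)
    return seam[::-1]


def common_seam(frames, prev_seam=None, weight=1.0):
    """One seam for the whole GoF; optionally pulled toward the previous seam."""
    cost = representative_energy(frames)
    if prev_seam is not None:
        # Assumed temporal term: penalize horizontal distance to the
        # seam chosen for the preceding GoF.
        cost = [[c + weight * abs(x - prev_seam[y]) for x, c in enumerate(row)]
                for y, row in enumerate(cost)]
    return vertical_seam(cost)


def remove_seam(frame, seam):
    """Delete the seam pixel from every row of a frame."""
    return [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]
```

Applying `remove_seam` with the same seam to every frame of the GoF is what rules out jitter inside the group. The `prev_seam` penalty is one simple way to keep neighboring GoFs from jumping apart.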
Frequency features of stereo images are investigated in the discrete Fourier transform (DFT) domain by characterizing the phase and magnitude properties that originate from the horizontal disparities in the stereo images. Also, well-known DFT properties, including conjugate symmetry, are used to identify the essential frequency components of stereo images. Our investigation reveals that the DFT of stereo images has useful properties that can be used to prioritize the DFT coefficients for compact representation and compression.
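As a hedged illustration of the kind of property involved, consider the idealized case (an assumption made here, not a model stated in the abstract) in which the right image is the left image circularly shifted by a single global horizontal disparity $d$ on an $N \times M$ grid. The DFT shift theorem and the conjugate symmetry of real-valued images then give:

```latex
% Assumed model: one global horizontal disparity d, circular shift.
I_R(x, y) = I_L\big((x - d) \bmod N,\; y\big)
\;\Longrightarrow\;
F_R(u, v) = F_L(u, v)\, e^{-j 2\pi u d / N},
\qquad
|F_R(u, v)| = |F_L(u, v)| .

% Conjugate symmetry for any real-valued image I with DFT F:
F\big((N - u) \bmod N,\; (M - v) \bmod M\big) = F^{*}(u, v) .
```

Under this simplification the disparity appears only as a phase term that is linear in the horizontal frequency $u$, while the magnitudes of the two views coincide, and conjugate symmetry means roughly half of the coefficients are redundant. Real stereo pairs have spatially varying disparity and occlusions, so these relations hold only approximately.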
An interpolation method for line-pruned images is presented. The goal is to reduce the time complexity of the previous NEDI-8 (new edge-directed interpolation) method. The idea of our solution is to use NEDI-4 selectively, applying NEDI-4 to edge pixels and a linear filter to all other pixels. For the linear filter, the interpolation coefficients for each pruned line are adaptively determined.
Although solutions to particular attacks against the quantization index modulation (QIM) method, such as valumetric scaling, have been proposed, no attempt has been made to address a combination of common attacks, including valumetric scaling, constant change, additive white Gaussian noise (AWGN), and compression. Such composite attacks are more realistic but also more difficult to counter. In this work, we tackle composite attacks in the framework of QIM. We seek the solution in the DC values of image blocks. By embedding watermark bits into DC values, QIM becomes generically robust against AWGN and most image compression attacks. To undo the alterations caused by valumetric scaling and constant change, pilot reference signals are embedded into selected image blocks, so that the parameters of the affine transform model of the attacks can be estimated directly from the watermarked images. Four versions of block-wise QIM algorithms with pilot signals are proposed. Experimental results show their robustness against severe composite attacks, including valumetric scaling, constant change, AWGN, and JPEG compression.
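The interplay between QIM and pilot signals can be illustrated with a small Python sketch. It assumes plain scalar QIM on block DC values, an attack of the form `received = gain * sent + offset`, and a least-squares fit on known pilot values. The function names, the step size, and the pilot values are hypothetical and do not reproduce any of the four proposed algorithms.

```python
def qim_embed(value, bit, step):
    """Quantize to the lattice for `bit`: multiples of step, shifted by step/2 for bit 1."""
    offset = bit * step / 2.0
    return round((value - offset) / step) * step + offset


def qim_extract(value, step):
    """Pick the bit whose lattice point is closest to the received value."""
    d0 = abs(value - qim_embed(value, 0, step))
    d1 = abs(value - qim_embed(value, 1, step))
    return 0 if d0 <= d1 else 1


def estimate_affine(sent, received):
    """Least-squares fit of received = gain * sent + offset from pilot pairs."""
    n = len(sent)
    mean_s = sum(sent) / n
    mean_r = sum(received) / n
    var_s = sum((s - mean_s) ** 2 for s in sent)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sent, received))
    gain = cov / var_s
    return gain, mean_r - gain * mean_s


def undo_affine(values, gain, offset):
    """Invert the estimated scaling and constant change before QIM decoding."""
    return [(v - offset) / gain for v in values]


if __name__ == "__main__":
    step = 8.0
    bits = [1, 0, 1, 1]
    dc_values = [100.3, 131.7, 90.2, 150.9]   # made-up block DC values
    marked = [qim_embed(v, b, step) for v, b in zip(dc_values, bits)]

    pilots = [64.0, 128.0, 192.0]             # made-up pilot DC values

    def attack(v):
        # Valumetric scaling by 0.8 plus a constant change of +12.
        return 0.8 * v + 12.0

    gain, offset = estimate_affine(pilots, [attack(p) for p in pilots])
    restored = undo_affine([attack(v) for v in marked], gain, offset)
    print([qim_extract(v, step) for v in restored])
```

Without the inversion step the scaled values no longer sit on the original quantization lattice, which is why plain QIM is fragile to valumetric scaling.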
Images included in documents usually provide information that may not be readily expressible in words. For example, academic articles with similar pictures may be of interest to researchers. We deal with the problem of extracting images from digital documents. Given a digital document, the optimal block size is first determined by finding the best fit of the horizontally projected gray-level pattern to a set of orthogonal basis vectors. Because a block of the optimal size is supposed to contain sufficient information to identify text regions, the proposed method is independent of the font size of the text lines. The blocks obtained with the optimal block size are classified as image, text, or background blocks. This block classification result, in turn, is used as the initial configuration for blockwise document segmentation. The blockwise segmentation method is based on the maximum a posteriori (MAP) framework with a deterministic relaxation algorithm. After the blockwise segmentation, each boundary block in the image region is further divided into four subblocks, and the class labels of these subblocks are updated. The subdivision and class updating steps are executed recursively until a pixel-level segmentation is obtained. Experimental results show that the proposed image extraction method yields an error rate of 2.9% for 232 documents in the Oulu database.
Object segmentation and extraction play an important role in computer vision and recognition problems. Unfortunately, with current computing technologies, fully automatic object segmentation is not possible, and human intervention is needed to outline the rough boundary of the object to be segmented. The goal of this paper is to make object extraction automatic after the first semi-automatic segmentation. That is, once a semantically meaningful object such as a house or a human body is extracted from the image under human guidance, an image manipulation technique is applied. There is no noticeable difference between the original and the manipulated images. However, the signature embedded by the image manipulation can be detected automatically and used to differentiate the object from the background. The manipulated images, which are called automatic-object-extractible images, can be used to provide training images with the same object but various backgrounds.
A novel image quality assessment method using the edge histogram descriptor (EHD) of MPEG-7 is presented. Neither additional data nor fragile watermarking is needed for the quality assessment and the image content authentication. Also, the method does not need access to the original image as a reference. Only the EHD metadata of the original image and the received (noisy or altered) image are required. The peak signal-to-noise ratio (PSNR) or the mean-square error (MSE) is obtained by comparing the EHD extracted from the received image with that of the original image attached as metadata. It is then used to assess the level of image degradation and to detect any illicit modification of the image. Experimental results show that the PSNRs calculated from the two EHDs are similar to those calculated from pixel-to-pixel comparisons of the original and received images. This implies that one can use the EHD, instead of the image data, to calculate the PSNR for image assessment. Also, since the EHD extracted from the received image changes with alterations of the image content, one can also use the proposed method for image authentication.
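For reference, the standard MSE and PSNR definitions can be written as below. The sketch applies them to any two equal-length sequences, whether pixel values or histogram bins. How the authors map EHD differences to a PSNR value is not stated in the abstract, so the `peak` parameter and the direct use of these formulas on descriptors are assumptions.

```python
import math


def mse(a, b):
    """Mean-square error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

For 8-bit pixel data `peak` is 255. For descriptor bins it would be the largest value a bin can take, which depends on how the bins are quantized.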
In this paper, we propose a region-based image retrieval system using the EHD (Edge Histogram Descriptor) and CLD (Color Layout Descriptor) of MPEG-7. The combined descriptor can efficiently describe edge and color features in terms of sub-image regions. That is, the basic unit for the selection of the region-of-interest (ROI) in the image is the sub-image block of the EHD, which corresponds to 16 (i.e., 4x4) non-overlapping image blocks in the image space. This implies that, to have a one-to-one region correspondence between EHD and CLD, we need to take an 8x8 inverse DCT (IDCT) for the CLD. Experimental results show that the proposed scheme can be used for ROI-based retrieval of MPEG-7 indexed images.
The main difficulty in segmenting a cell image occurs when red blood cells touch the leukocyte. The similar brightness of the touching red blood cells and the leukocyte makes the separation of the cytoplasm from the red blood cells quite difficult. Conventional approaches search for the concavities created by the contact of two round boundaries and use them as the two points to be connected for the separation. Here, we exploit the fact that the boundary of a leukocyte normally has a round shape and that only a small portion of it is disconnected by the touching red blood cells. Specifically, at an initial central point of the nucleus in the leukocyte, we can generate the largest possible circle that covers a circular portion of the composite of the nucleus and cytoplasm areas. Then, by perturbing the initial central point and selecting only those central points that do not cross the boundary, we can cover most of the interior regions of the nucleus and the cytoplasm, separating the leukocyte from the touching red blood cells.
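The circle-growing step can be sketched as follows. The sketch simplifies heavily: it assumes a binary mask in which 1 marks pixels inside the detected boundary, grows an integer-radius disk at each candidate center until it would leave the mask, and takes the union of the disks. The function names and the mask representation are hypothetical, and the handling of the disconnected boundary portion is not modeled.

```python
def largest_radius(mask, cy, cx):
    """Largest integer radius whose disk around (cy, cx) stays inside the mask.

    Returns -1 if the center itself lies outside the mask.
    """
    h, w = len(mask), len(mask[0])
    if not mask[cy][cx]:
        return -1
    r = 0
    while True:
        nr = r + 1
        for y in range(cy - nr, cy + nr + 1):
            for x in range(cx - nr, cx + nr + 1):
                if (y - cy) ** 2 + (x - cx) ** 2 <= nr * nr:
                    outside = not (0 <= y < h and 0 <= x < w)
                    if outside or not mask[y][x]:
                        return r   # the next larger disk would cross the boundary
        r = nr


def cover(mask, centers):
    """Union of the maximal disks grown from the valid candidate centers."""
    covered = set()
    for cy, cx in centers:
        r = largest_radius(mask, cy, cx)
        if r < 0:
            continue               # discard centers that fall outside the region
        for y in range(cy - r, cy + r + 1):
            for x in range(cx - r, cx + r + 1):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    covered.add((y, x))
    return covered
```

Perturbing the initial center amounts to passing several nearby `(cy, cx)` pairs in `centers`. Each additional disk fills in part of the region that the first circle could not reach.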
A digital video watermarking algorithm using 3D-DCT and intra-cubic correlation is proposed. More specifically, we divide the video sequence into non-overlapping image cubes and take a 3D-DCT of each image cube. Coefficients in the frequency domain of each cube are randomly selected and partitioned into two equal sets. Then, by referring to the user-defined logo, a small value is added to the coefficients in one set while the same amount is subtracted from those in the other set. By taking the difference of the mean values of the two sets in each cube, we can extract the watermark bit embedded into the cube. By collecting all watermark bits and visually inspecting the recovered logo image, one can assert the copyright of the video. Experimental results show that we can extract over 90% of the binary logo image under various attacks such as MPEG compression, frame-rate changes, format conversion, and frame skipping.
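The two-set embedding and mean-difference extraction can be sketched for a single cube as follows. The sketch assumes the 3D-DCT has already been computed and that `coeffs` is the flattened coefficient list of one cube. The key-seeded selection, the function names, and the values of `n_selected` and `delta` are hypothetical choices for illustration.

```python
import random


def split_sets(n_coeffs, n_selected, key):
    """Pseudo-randomly pick coefficient indices and split them into two equal sets."""
    rng = random.Random(key)              # the key makes the selection repeatable
    idx = rng.sample(range(n_coeffs), n_selected)
    half = n_selected // 2
    return idx[:half], idx[half:2 * half]


def embed_bit(coeffs, bit, key, n_selected, delta):
    """Add delta to one set and subtract it from the other; the bit chooses which."""
    set_a, set_b = split_sets(len(coeffs), n_selected, key)
    sign = 1.0 if bit else -1.0
    out = list(coeffs)
    for i in set_a:
        out[i] += sign * delta
    for i in set_b:
        out[i] -= sign * delta
    return out


def extract_bit(coeffs, key, n_selected):
    """Recover the bit from the sign of the difference of the two set means."""
    set_a, set_b = split_sets(len(coeffs), n_selected, key)
    mean_a = sum(coeffs[i] for i in set_a) / len(set_a)
    mean_b = sum(coeffs[i] for i in set_b) / len(set_b)
    return 1 if mean_a > mean_b else 0
```

Embedding shifts the mean difference by `2 * delta`, so extraction is reliable only when that shift dominates the natural difference between the two sets. In practice this favors selecting many coefficients per cube so that the two means are close before embedding.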
The block-based image segmentation method is known to alleviate the over-segmentation problem of morphological segmentation methods. In this paper, we improve on previous block-based MAP segmentation methods. First, to reduce the execution time, we reduce the number of undecided blocks. That is, as the block size is reduced, we define new monotone regions from the undecided blocks to decrease the number of undecided blocks and to overcome the under-segmentation problem. Second, to improve the segmentation accuracy, we adopt two different block sizes. For the texture block clustering process, we use a large block size. In contrast, for monotone and edge block classification, it is more efficient to use a small block size. The proposed segmentation method is applied to natural images with monotone and texture regions. Experimental results show that the proposed method yields large segments for texture regions while also picking up some detailed monotone regions to overcome the under-segmentation problem.
In this paper, we propose a new image feature extraction method for MPEG compressed video. To minimize the MPEG decoding process, we use only the DC values of the Y, Cr, and Cb components of each macroblock. We then obtain a feature vector from the decoded DC values of the Y, Cr, and Cb components for all macroblocks in an I frame. The feature vector consists of histograms for various colors, luminance, and edge types. In obtaining the histograms for the color and luminance features, we consider the ratio of contributing pure colors and luminance to the chroma DC values for each macroblock, and we update all contributing color and/or luminance histograms accordingly. Otherwise, if the macroblock is classified as an edge block, we update the corresponding edge type histogram. To demonstrate the performance of the proposed feature extraction method, we apply it to a scene change detection problem.
In this paper, we propose a new image feature extraction algorithm in the compression domain. To minimize the decompression process, the proposed feature extraction algorithm only parses the compressed bit stream. Then, by just decoding dct_dc_size in the MPEG-2 bit stream, we can determine whether there is any abrupt brightness change between two DCT blocks. According to the Huffman table for the MPEG-2 encoder, as the difference of the dc values between two succeeding DCT blocks increases, it yields longer coded bits. That is, the length of the coded dc value grows with the brightness change between two succeeding DCT blocks. Therefore, one can detect an edge feature between DCT blocks by just decoding the information regarding the number of bits assigned to the difference of the dc values. To demonstrate the usefulness of the proposed feature extraction method, we apply the detected edge features to find scene changes in the MPEG-2 compressed bit stream.
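The relation between the dc difference and dct_dc_size can be sketched as below. In MPEG-2 the size field gives the number of bits used for the dc differential, which equals the bit length of its magnitude. A real parser would read this field directly from the bit stream, whereas the sketch computes it from already known dc values purely to show the relationship. The function `edge_flags` and the threshold `min_size` are hypothetical.

```python
def dct_dc_size(dc_diff):
    """Number of bits needed for the magnitude of the dc differential (0 for no change)."""
    return abs(dc_diff).bit_length()


def edge_flags(dc_values, min_size=6):
    """Flag an edge between succeeding blocks whose dc differential needs many bits."""
    return [dct_dc_size(cur - prev) >= min_size
            for prev, cur in zip(dc_values, dc_values[1:])]
```

Because the size is a bit length, it grows with the logarithm of the brightness change, so a threshold on `dct_dc_size` acts as a coarse power-of-two threshold on the dc difference.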
This paper presents a comparative study of three deterministic unsupervised image segmentation algorithms. All three algorithms use a Markov random field (MRF) and try to obtain an approximate solution to the maximum likelihood or the maximum a posteriori estimate. Although the three algorithms are based on the same stochastic image models, they adopt different ways of incorporating model parameter estimation into the iterative region label updating procedure. The differences among the three algorithms are identified, and their convergence properties are compared both analytically and experimentally.
We propose an improved segmentation algorithm to extract an object from a forward-looking infrared (FLIR) image. The observed FLIR images are considered to be made up of three stochastic models. The first model accounts for the noise component, which is assumed to be independent Gaussian. In the second model, the labeling of the two regions (i.e., the object and the background) obeys a Gibbs random field (GRF). Finally, we adopt a population parameter to represent the ratio of the size of the object to that of the background. The population parameter eases the tendency to produce similar-sized segmentations. Having established the stochastic models, we use maximum a posteriori (MAP) estimation to determine the region labels. The MAP criterion is optimized by a deterministic relaxation method that converges quickly to a local maximum.
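A deterministic relaxation of this kind can be sketched with iterated conditional modes (ICM), which is swapped in here as a well-known stand-in because the abstract does not name the exact update rule. The sketch assumes known class means, a shared noise standard deviation `sigma`, a Potts-style neighborhood penalty `beta` for the GRF, and class priors `priors` playing the role of the population parameter. All names and parameter values are hypothetical.

```python
import math


def icm_segment(image, means, sigma, beta, priors, n_iter=5):
    """Approximate MAP labeling by iterated conditional modes (local optimum only)."""
    h, w = len(image), len(image[0])
    k = len(means)
    # Initial labels: nearest class mean (maximum likelihood per pixel).
    labels = [[min(range(k), key=lambda c: (image[y][x] - means[c]) ** 2)
               for x in range(w)] for y in range(h)]
    for _ in range(n_iter):
        changed = False
        for y in range(h):
            for x in range(w):
                neigh = [labels[j][i]
                         for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                         if 0 <= j < h and 0 <= i < w]

                def energy(c):
                    # Gaussian noise term for the observed intensity.
                    data = (image[y][x] - means[c]) ** 2 / (2.0 * sigma ** 2)
                    # Gibbs prior: penalize disagreement with the 4-neighbors.
                    smooth = beta * sum(1 for n in neigh if n != c)
                    # Population term: unequal priors allow unequal region sizes.
                    return data + smooth - math.log(priors[c])

                best = min(range(k), key=energy)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break                  # converged to a local minimum of the energy
    return labels
```

With equal priors the energy has no term that favors a small object over a large background. Setting a smaller prior for the object class is one simple way to encode that the object occupies a small fraction of the image.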