The discrete cosine transform (DCT) and discrete wavelet transform (DWT) used in image compression are implemented as separable 1-D transforms applied to the rows and columns of images. The major shortcoming of the separable transform is that it cannot represent anisotropic edges in the image sparsely. Recently, various lifting-based directional transforms1, 2, 3 have been proposed to overcome this problem. In the traditional lifting scheme, the transform matrix is factorized into elementary matrices, each corresponding to the operation of adding a multiple of one row/column to another, where the augend and the addend must lie in the same column/row.4, 5 In the directional lifting transform, by contrast, lifting steps can be performed along edges of arbitrary direction; thus, it approximates edges better and represents images more sparsely (see Fig. 1).
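To make the separable structure concrete, the sketch below applies a 1-D transform to the rows and then to the columns of a small image. A toy Haar-like average/difference pair stands in for the 8-point DCT or the 9/7 wavelet filters; the image values are illustrative.

```python
def transform_1d(v):
    # Illustrative 1-D transform: pairwise averages followed by pairwise
    # differences (Haar-like), standing in for a DCT or wavelet filter.
    avgs = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    diffs = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return avgs + diffs

def separable_2d(img):
    # Rows first...
    rows = [transform_1d(r) for r in img]
    # ...then columns of the row-transformed image.
    out = [transform_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*out)]

img = [[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]]
res = separable_2d(img)
```

Because every row of this toy image is identical, all vertical detail coefficients come out zero; only edges aligned with the rows or columns compress this well, which is exactly the isotropy limitation the directional transforms address.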
However, note that to transform along edges of arbitrary direction, fractional pixels are used in adaptive directional lifting, and their values are interpolated from neighboring integer pixels. Denote the number of taps of the interpolation filter as $T$; then the computation of the directional transform is about $T$ times that of the original one.1, 2 This high computational cost is a serious drawback in practice, especially for DCT, which has various fast approximations6 well suited to portable devices and real-time processing, such as satellite onboard image compression. In this letter, we propose a fast directional DCT (FDDCT) without interpolation. Compared to the original lifting-based directional DCT (DDCT),2 the computation of FDDCT decreases by 80%, and the peak signal-to-noise ratios (PSNR) of the compressed images are higher.
Direction Modes without Interpolation
It is known that, in DDCT, interpolation of the fractional pixels is the main source of the large amount of computation. If only integer pixels are used to approximate edges, interpolation can be avoided. In this paper, we propose five direction modes to achieve this. These modes are shown in Fig. 2; no fractional pixels are involved, and the size of the transform matrix is $8\times 8$. The corresponding directional angles range from that of mode (a) to that of mode (e) (see Fig. 2).
These direction modes resemble the prediction modes used in the intraprediction process of the recent video-coding standard H.264.7 For example, modes (a) and (b) correspond to the diagonal down-right and vertical-right predictions in H.264, respectively; the only difference is that polylines are used in our modes to avoid interpolation. It has also been found that five modes are enough to approximate most edge directions in images well.1, 2 Once the direction modes of all blocks are selected, the transform can be performed along the direction lines defined by the modes (see Fig. 3), and all pixels on a direction line are integer ones.
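One plausible way to enumerate the integer-pixel direction lines of the diagonal mode inside an $8\times 8$ block is sketched below. The exact geometry of the modes follows Fig. 2, which is not reproduced here, so this reading (each line collects the pixels on one 45-deg diagonal) is an assumption for illustration.

```python
def diagonal_lines(n=8):
    """Enumerate the integer-pixel lines of a 45-degree diagonal mode
    inside an n x n block: a plausible reading of mode (a) in Fig. 2,
    grouping pixels (r, c) by the constant offset c - r."""
    lines = {}
    for r in range(n):
        for c in range(n):
            lines.setdefault(c - r, []).append((r, c))
    # Order the lines from the lower-left corner to the upper-right.
    return [lines[k] for k in sorted(lines)]

lines = diagonal_lines(8)
```

Note that the lines near the block corners contain fewer than eight pixels; this is precisely why short direction lines must be extended into adjacent blocks, as discussed next.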
Lifting Steps across Adjacent Blocks
Because the direction modes are defined on individual image blocks, there may exist direction lines consisting of fewer than eight integer pixels, on which the eight-point fast DCT cannot be performed. Extending such a short direction line into the adjacent block is a natural way to solve the problem. However, if the direction modes of adjacent blocks differ, some pixels will be occupied by more than one direction line (see Fig. 3). That is to say, lifting steps in several DCTs will modify the values of these pixels simultaneously. Thus, the DCTs on these direction lines across blocks must be modified to preserve perfect reconstruction.
It is known that the DCT can be factored into lifting steps,6 and the DCT matrix can be expressed as a product of elementary matrices of the form
$$S_{i,j}(c) = I_N + c\,\mathbf{e}_i\mathbf{e}_j^{T}, \qquad i \neq j,$$
where $I_N$ is the identity matrix of size $N$, $\mathbf{e}_i$ is the $i$'th unit vector in $N$-dimensional Euclidean space, and $c$, a real number, is the lifting coefficient. Thus, the lifting matrix is a triangular matrix, and in every lifting step only one element of the input vector is modified.
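The key property of such an elementary lifting step is that it is trivially invertible: subtracting the same multiple undoes the addition, with no matrix inversion needed. A minimal sketch (vector and coefficient values are illustrative):

```python
def lift_step(x, i, j, c):
    """Apply the elementary matrix I + c * e_i e_j^T to vector x:
    only element i changes, by c times element j (i != j)."""
    y = list(x)
    y[i] += c * y[j]
    return y

def inverse_lift_step(y, i, j, c):
    """The inverse is I - c * e_i e_j^T: subtract the same multiple."""
    x = list(y)
    x[i] -= c * x[j]
    return x

x = [4.0, 1.0, 3.0, 2.0]
y = lift_step(x, 0, 2, 0.5)            # only y[0] differs from x
assert y[1:] == x[1:]
assert inverse_lift_step(y, 0, 2, 0.5) == x   # perfect reconstruction
```

This is why lifting-based transforms remain perfectly reconstructible even when the lifting coefficients are rounded for fixed-point implementation.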
Because all lifting steps are similar, we consider only one lifting procedure. Take the pixel $p$ in Fig. 3, for example, which is occupied by four direction lines, and let $q_1, q_2, q_3, q_4$ denote the pixels on these lines that would each lift $p$ with the coefficient $c$. To avoid modifying $p$ four times with a multiple of each $q_k$, we change the lifting procedure to
$$p' = p + \sum_{k=1}^{4} c_k q_k, \qquad c_k = \frac{c}{4},$$
where $p'$ is the output and the $c_k$ are the scaling coefficients. The corresponding inverse transform can be expressed as
$$p = p' - \sum_{k=1}^{4} c_k q_k.$$
Similar to the interpolation in Ref. 2, the mean of the pixels on different direction lines is used to predict the target pixel, which often approximates the target better than using a single pixel on one direction line. Furthermore, it should be noted that the transform on an extended direction line is performed between adjacent blocks. Therefore, compared to DCT, which is performed on every block separately, FDDCT can partly remove the correlation between blocks.
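The modified lifting step above can be sketched directly; the pixel values and the coefficient below are illustrative, not taken from the paper. The shared pixel is lifted once from the mean of the pixels on the direction lines that occupy it, and the inverse subtracts the same prediction, so perfect reconstruction is preserved.

```python
def shared_lift(p, qs, c):
    """Modified lifting at a pixel p occupied by several direction
    lines: predict p once from the mean of the pixels qs on those
    lines (scaling coefficients c / len(qs)), instead of lifting p
    once per line."""
    return p + c * sum(qs) / len(qs)

def shared_unlift(p2, qs, c):
    """Inverse step: subtract the same mean-based prediction."""
    return p2 - c * sum(qs) / len(qs)

p, qs, c = 10.0, [8.0, 12.0, 9.0, 11.0], -0.5
p2 = shared_lift(p, qs, c)      # p is modified once, not four times
assert shared_unlift(p2, qs, c) == p   # perfect reconstruction
```

Because the prediction averages over the neighboring lines, a single outlier pixel disturbs the lifted value less than it would in a single-line lifting step.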
Finally, it should be noted that the direction modes of the blocks need to be selected before FDDCT is applied. As stated above, the FDDCT coefficients of a block are influenced only by its adjacent blocks, which is similar to DDCT. Thus, the same dynamic programming algorithm as in Ref. 2 is used to select the direction modes in FDDCT to minimize the rate-distortion cost.
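Since a block's cost depends only on its own mode and those of its neighbors, the mode selection has the structure of a shortest-path problem. The sketch below is a generic Viterbi-style dynamic program over a single row of blocks; the actual algorithm in Ref. 2 operates on image blocks with real rate-distortion costs, so the cost tables here (`costs`, `pair_costs`) are illustrative assumptions.

```python
def select_modes(costs, pair_costs):
    """Viterbi-style dynamic program over a row of blocks.

    costs[b][m]      : rate-distortion cost of giving block b mode m.
    pair_costs[a][m] : extra boundary cost when adjacent blocks use
                       modes a and m (from the lifting steps across
                       blocks). Both tables are illustrative.
    Returns the mode sequence minimizing the total cost.
    """
    n_blocks, n_modes = len(costs), len(costs[0])
    best = list(costs[0])                      # best cost ending in mode m
    back = [[0] * n_modes for _ in range(n_blocks)]
    for b in range(1, n_blocks):
        new = []
        for m in range(n_modes):
            prev = min(range(n_modes),
                       key=lambda a: best[a] + pair_costs[a][m])
            back[b][m] = prev
            new.append(best[prev] + pair_costs[prev][m] + costs[b][m])
        best = new
    # Trace back the optimal mode sequence.
    m = min(range(n_modes), key=lambda mm: best[mm])
    modes = [m]
    for b in range(n_blocks - 1, 0, -1):
        m = back[b][m]
        modes.append(m)
    return modes[::-1]
```

For example, with `costs = [[1, 5], [5, 1], [1, 5]]` and zero boundary costs, the program picks the cheapest mode per block, `[0, 1, 0]`; a large mode-change penalty in `pair_costs` instead forces a uniform mode sequence.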
Extensive experiments have been conducted to evaluate the performance of FDDCT. We report results for four common gray-scale images with different characteristics: Lena, Barbara, Boat, and Photographer. FDDCT is compared to DCT, DWT (the CDF 9/7 wavelet used in JPEG2000), and DDCT. The set partition coding we developed in Ref. 8, which is suitable for both block transforms and subband transforms, is used to code the coefficients of FDDCT, DCT, and DWT. The direction mode of a block is first predicted from the modes of the already coded neighboring blocks and then coded by arithmetic coding.
Figure 4 shows that the PSNRs of images compressed with FDDCT are close to those obtained with DWT, and that FDDCT achieves a consistent coding gain over DCT. The compression results of the DDCT-based method in Ref. 2 are also given in Fig. 4; its PSNRs are lower than those of the proposed FDDCT-based method. Part of the decompressed Barbara is shown in Fig. 5. The edges in the image compressed by FDDCT are less disturbed, and block artifacts are nearly eliminated. The reason is that, in FDDCT, many direction lines extend into adjacent blocks, so the blocks are not transformed individually; this helps reduce block artifacts in the compressed images.
Finally, we consider the computational complexity of FDDCT. If the direction modes of all blocks in the image are the same, then no special lifting steps are needed, and the computation of FDDCT equals that of DCT. In practice, however, different blocks generally have different direction modes, so the computation depends on the edges in the image; in other words, the computation of FDDCT is image dependent.
Here, we perform the transforms on many standard gray-scale images on a Pentium IV. The average execution times of DCT, DWT, and FDDCT are given in Table 1. The execution time of FDDCT is directly proportional to the image size; it is moderately higher than that of DCT and about one-third that of DWT. Compared to DDCT, whose computation is eight times that of DCT (an eight-tap interpolation filter is used), the computation decreases by 80%.
Execution time (in seconds) of the different transforms.
By choosing direction modes that consist only of integer pixels, a new directional DCT for image compression, called FDDCT, is proposed in this paper. Because the interpolations of DDCT are avoided, the computation of FDDCT is greatly reduced compared to the original DDCT. Furthermore, the use of interblock information in FDDCT yields fewer block artifacts and better rate-distortion performance in the compressed images.
The authors are grateful to the anonymous reviewers for their comments, which have helped us to improve this work. This study is supported by the National Natural Science Foundation of China (Grant No. 10601068).