1 February 2010 Fast directional discrete cosine transform for image compression
Author Affiliations +
A fast directional discrete cosine transform (FDDCT) is proposed for efficient representation of anisotropic edges in images. The transform is performed on the predefined direction lines similar to the intraprediction mode in H.264. Comparing to the directional discrete cosine transform (DDCT) now available, no interpolation is needed in FDDCT; thus, the amount of computation decreases by 80%. Simulation results indicate that the peak signal-to-noise ratios of images compressed using FDDCT are >1 db higher than those using DDCT.
Chen, Wang, and Cheng: Fast directional discrete cosine transform for image compression



Discrete cosine transform (DCT) and discrete wavelet transform (DWT) used in image compression are implemented by separable 1-D transforms in the rows and columns of images. The major shortcoming of the separable transform is that it cannot represent the anisotropic edges in the image sparsely. Recently, various lifting-based directional transforms1, 2, 3 are proposed to overcome the problem. In the traditional lifting scheme, the transform matrix is factorized into elementary matrices, which correspond to the operation of adding a multiple of a row/column to another row/column, and the augend and the addend must be in the same column/row.4, 5 While in the directional lifting transform, lifting steps can be performed along edges with any directions; thus, it can approximate edges better and represent images more sparsely (see Fig. 1).

Fig. 1

Exemplified elementary matrix operation: (a) nondirectional and (b) directional, where the circles are the pixels and squares are half-pixels.


However, it is noted that to transform along edges with arbitrary directions, the fractional pixels are used in adaptive directional lifting and the their values are interpolated from neighboring integer pixels. Denote the number of taps of the interpolation filter as L , then the computation of direction transform is about L times higher than the original one.1, 2 The high computation cost is a serious drawback in practice, especially for DCT, which has various faster approximations6 that are quite suitable to portable devices and real-time processing, such as satellite onboard image compression. In this letter, we proposed a fast directional DCT (FDDCT) without interpolation. Compared to the original lifting-based directional DCT (DDCT),2 the computation of FDDCT decreases by 80%, and the peak signal-to-noise ratios (PSNR) of compressed images are higher.


Proposed FDDCT


Direction Modes without Interpolation

It is known that, in DDCT, interpolation on the fractional pixels is the main reason that causes large amounts of computation. If we can use only the integer pixels to approximate edges, then interpolation will be avoided. In this paper, we propose to use five direction modes to achieve this. These modes are shown in Fig. 2, where no fractional pixels exist and the size of transform matrix is 8×8 . The corresponding directional angles are form 45deg [mode (a), see Fig. 2] to 45deg [mode (e), see Fig. 2].

Fig. 2

Five direction modes in row FDDCT. The circles are integer pixels, and the dashed lines are direction lines.


These direction modes are like the prediction modes used in the intraprediction process in the recent video-coding standard H.264.7 For example, modes (a) and (b) correspond to the diagonal down-right and the vertical-right prediction in H.264, separately, and the only difference is that polylines are used to avoid interpolating in our modes. It also find that five modes are enough to approximate well most directions of edges in images.1, 2 Supposing that the direction modes of all blocks are selected, the transform can be performed along the direction lines defined in the direction modes (see Fig. 3), and the pixels on the direction line are all integer ones.

Fig. 3

Direction line across blocks.



Lifting Steps across Adjacent Blocks

Because the direction modes are defined on the individual image blocks, there may exist direction lines consisting of <8 integer pixels, where eight-point fast DCT cannot be performed. Extending the short direction line to the adjacent block is a natural way to solve the problem. But we find that, if the direction modes of adjacent blocks are different, then some pixels will be occupied by more than one direction lines (see Fig. 3). That is to say, lifting steps in several DCTs will modify the value of these pixels simultaneously. Thus, the DCTs on these direction lines across blocks must be modified to make the transform perfect reconstruction.

It is known that DCT can be factored into lifting steps,6 and the DCT matrix can be expressed as


where I is the identity matrix of size M , en is the n ’th unit vector in M -dimensional Euclidean space and c , a real number, is the lifting coefficient. Thus, the lifting matrix Ti,j(c) is a triangular matrix, and in every lifting step only one element in the input vector is modified.

Because all lifting steps are similar, we just think of one lifting procedure. Take the pixels a , bi , i=0,,3 in Fig. 3, for example, where a is occupied by four direction lines. To avoid modifying a four times with the multiple of bi , we change the lifting procedure as


where a1 is output and wi are the scaling coefficients. The corresponding inverse transform can be expressed as


Obviously, it is perfect reconstruction. In the lifting procedure [Eq. 2], the mean of pixels on different direction lines is used to predict the target pixel, which often approximates the target better than using a single pixel on one direction line. Furthermore, it should be noted that the transform on the extended direction line is performed between adjacent blocks. Therefore, comparing to DCT, which performs on every block separately, FDDCT can partly remove the correlation between blocks.

At last, it must be reminded that the direction modes of blocks need to be selected before FDDCT. As stated above, the FDDCT coefficients of a block are only influenced by its adjacent blocks, which is similar with DDCT. Thus, the same dynamic programming algorithm in Ref. 2 is used to select the direction modes in FDDCT to minimize the rate-distortion cost.


Simulation Results

Extensive experiments have been conducted to evaluate the performance of FDDCT. We report the experimental results of four common gray images: Lena, Barbara, Boat, and Photographer, which have different characteristics. FDDCT is compared to DCT and DWT (CDF9/7 used in JPEG2000) and DDCT in our experiments. The set partition coding developed by us in Ref. 8 is used to code the coefficients of FDDCT, DCT, and DWT, which is suitable for both the block transform and the subband transform. The direction mode of a block is first predicted from the coded modes of coded neighboring blocks and then is coded by the arithmetic coding.

Figure 4 shows that PSNRs of images compressed with FDDCT are close to those with DWT. And compared to DCT, the coding gain of FDDCT ranges from 0.4to1.5dB . The compression results of DDCT-based method in Ref. 2 are also given in Fig. 4, whose PSNRs are > 1dB lower than the proposed FDDCT based one. The decompressed Barbara is partly shown in Fig. 5. It shows that the edges in the image compressed by FDDCT are less disturbed, and the block artifacts are nearly eliminated by FDDCT. The reason should be that, in FDDCT, many direction lines are extended to the adjacent blocks; thus, the blocks are not transformed individually. It is helpful to reduce block artifacts in the compressed images.

Fig. 4

PSNR versus bit-rate curves of different transforms.


Fig. 5

Parts of Barbara compressed with DCT and FDDCT (bit rate: 0.5bitspixel ): (a) is compressed with DCT and (b) is compressed with FDDCT.


Finally, we give the computational complexity of FDDCT. If the direction modes of all blocks in the image are same, then no special lifting steps are needed; thus, the computation of FDDCT is the same as that of DCT. But, in practice, different blocks generally have different direction modes. As a result, the computation depends on the edges in the image. In other words, the computation of FDDCT is image dependent.

Here, we perform several transforms on many standard gray-scale images on a 2.93-GHz Pentium IV. The average executing times of DCT, DWT, and FDDCT are given in Table 1. It shows that the time of FDDCT is in direct proportion to the size of the image, which is 1.4 times than that of DCT and one-third of DWT. Comparing to DDCT, whose computation is eight times than that of DCT (eight-tap interpolation is used), the computation decreases by 80% .

Table 1

Executing time (second) of different transforms.

512×512 0.0065470.0086720.032030
1024×1024 0.0301600.0389000.129220
2048×2048 0.1318700.1799530.557810



By choosing the direction modes consisting of integer pixels, a new directional DCT called FDDCT for image compression is proposed in this paper. Because interpolations in DDCT are avoided, the computation of FDDCT decreases 80% compared to original DDCT. Furthermore, the use of interblock information in FDDCT derives less block artifacts and better rate-distortion performance in the compressed images.


The authors are grateful to the anonymous reviewers for their comments, which have helped us to improve this work. This study is supported by the National Natural Science Foundation of China (Grant No. 10601068).


1.  W. Ding, F. Wu, X. Wu, S. Li, and H. Li, “Adaptive directional lifting-based wavelet transform for image coding,” IEEE Trans. Image Process.1057-7149 16(2), 416–427 (2007). 10.1109/TIP.2006.888341 Google Scholar

2.  H. Xu, J. Xu, and F. Wu, “Lifting-based directional DCT-like transform for image coding,” IEEE Trans. Circuits Syst. Video Technol.1051-8215 17(10), 1325–1335 (2007). 10.1109/TCSVT.2007.903552 Google Scholar

3.  L. Zhang, D. Wang, and A. Vincent, “Adaptive zero-tree structure for curved wavelet image coding,” Opt. Eng.0091-3286 45, 27–30 (2006). Google Scholar

4.  D. L. Liang, L. Cheng, and Z. H. Zhang, “General construction of wavelet filters via a lifting scheme and its application in image coding,” Opt. Eng.0091-3286 42, 1949–1955 (2003). 10.1117/1.1580153 Google Scholar

5.  Y.-J. Chen and K. S. Amaratunga, “M-channel lifting factorization of perfect reconstruction filter banks and reversible M-band wavelet transforms,” IEEE Trans. Circuits Syst. Video Technol.1051-8215 50(12), 963–976 (2003). Google Scholar

6.  J. Liang and T. D. Tran, “Fast multiplierless approximations of the DCT with the lifting scheme,” IEEE Trans. Signal Process.1053-587X 49(12), 3032–3044 (2001). 10.1109/78.969511 Google Scholar

7. Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 ISO/IEC 14496-10 (AVC) (Mar. 2005). Google Scholar

8.  B. Chen, L. Cheng, and H. Wang, “LBT based low complexity image compression method,” in Proc. of Int. Conf. on Pattern Recognition, Hong Kong, Vol. 1, pp. 941–944, IEEE Computer Society, Hong Kong (2006). Google Scholar

Bo Chen, Hongxia Wang, Lizhi Cheng, "Fast directional discrete cosine transform for image compression," Optical Engineering 49(2), 020501 (1 February 2010). https://doi.org/10.1117/1.3306634


Wavelet transforms in a JPEG-like image coder
Proceedings of SPIE (September 16 1994)
Construction Of Low Noise Correlation Filters
Proceedings of SPIE (October 15 1986)

Back to Top