The discrete cosine transform (DCT) and discrete wavelet transform (DWT) used in image compression are implemented as separable 1-D transforms applied to the rows and columns of images. The major shortcoming of the separable transform is that it cannot represent anisotropic edges in the image sparsely. Recently, various lifting-based directional transforms1, 2, 3 have been proposed to overcome this problem. In the traditional lifting scheme, the transform matrix is factorized into elementary matrices, each corresponding to the operation of adding a multiple of one row/column to another, where the augend and the addend must lie in the same column/row.4, 5 In the directional lifting transform, by contrast, lifting steps can be performed along edges of arbitrary direction; thus, it approximates edges better and represents images more sparsely (see Fig. 1).
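To make the separable structure concrete, the sketch below applies a 1-D transform to the rows and then to the columns of a small image. A toy Haar-like average/difference pair stands in for the 8-point DCT or the 9/7 wavelet filters; the image values are illustrative.

```python
def transform_1d(v):
    # Illustrative 1-D transform: pairwise averages followed by pairwise
    # differences (Haar-like), standing in for a DCT or wavelet filter.
    avgs = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    diffs = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return avgs + diffs

def separable_2d(img):
    # Rows first...
    rows = [transform_1d(r) for r in img]
    # ...then columns of the row-transformed image.
    out = [transform_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*out)]

img = [[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]]
res = separable_2d(img)
```

Because every row of this toy image is identical, all vertical detail coefficients come out zero; only edges aligned with the rows or columns compress this well, which is exactly the isotropy limitation the directional transforms address.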
However, note that to transform along edges of arbitrary direction, fractional pixels are used in adaptive directional lifting, and their values are interpolated from neighboring integer pixels. Denote the number of taps of the interpolation filter as $T$; then the computation of the directional transform is about $T$ times that of the original one.1, 2 This high computational cost is a serious drawback in practice, especially for DCT, which has various fast approximations6 well suited to portable devices and real-time processing, such as satellite onboard image compression. In this letter, we propose a fast directional DCT (FDDCT) without interpolation. Compared to the original lifting-based directional DCT (DDCT),2 the computation of FDDCT decreases by 80%, and the peak signal-to-noise ratios (PSNR) of the compressed images are higher.
Direction Modes without Interpolation
It is known that, in DDCT, interpolation of the fractional pixels is the main source of the large amount of computation. If only integer pixels are used to approximate edges, interpolation can be avoided. In this paper, we propose five direction modes to achieve this. These modes are shown in Fig. 2; no fractional pixels are involved, and the size of the transform matrix is $8\times 8$. The corresponding directional angles range from that of mode (a) to that of mode (e) (see Fig. 2).
These direction modes resemble the prediction modes used in the intraprediction process of the recent video-coding standard H.264.7 For example, modes (a) and (b) correspond to the diagonal down-right and vertical-right predictions in H.264, respectively; the only difference is that polylines are used in our modes to avoid interpolation. It has also been found that five modes are enough to approximate most edge directions in images well.1, 2 Once the direction modes of all blocks are selected, the transform can be performed along the direction lines defined by the modes (see Fig. 3), and all pixels on a direction line are integer ones.
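One plausible way to enumerate the integer-pixel direction lines of the diagonal mode inside an $8\times 8$ block is sketched below. The exact geometry of the modes follows Fig. 2, which is not reproduced here, so this reading (each line collects the pixels on one 45-deg diagonal) is an assumption for illustration.

```python
def diagonal_lines(n=8):
    """Enumerate the integer-pixel lines of a 45-degree diagonal mode
    inside an n x n block: a plausible reading of mode (a) in Fig. 2,
    grouping pixels (r, c) by the constant offset c - r."""
    lines = {}
    for r in range(n):
        for c in range(n):
            lines.setdefault(c - r, []).append((r, c))
    # Order the lines from the lower-left corner to the upper-right.
    return [lines[k] for k in sorted(lines)]

lines = diagonal_lines(8)
```

Note that the lines near the block corners contain fewer than eight pixels; this is precisely why short direction lines must be extended into adjacent blocks, as discussed next.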
Lifting Steps across Adjacent Blocks
Because the direction modes are defined on individual image blocks, there may exist direction lines consisting of fewer than eight integer pixels, on which the eight-point fast DCT cannot be performed. Extending such a short direction line into the adjacent block is a natural way to solve the problem. However, if the direction modes of adjacent blocks differ, some pixels will be occupied by more than one direction line (see Fig. 3). That is to say, lifting steps in several DCTs will modify the values of these pixels simultaneously. Thus, the DCTs on these direction lines across blocks must be modified to preserve perfect reconstruction.
It is known that the DCT can be factored into lifting steps,6 and the DCT matrix can be expressed as a product of elementary matrices of the form
$$S_{i,j}(c) = I_N + c\,\mathbf{e}_i\mathbf{e}_j^{T}, \qquad i \neq j,$$
where $I_N$ is the identity matrix of size $N$, $\mathbf{e}_i$ is the $i$'th unit vector in $N$-dimensional Euclidean space, and $c$, a real number, is the lifting coefficient. Thus, the lifting matrix is a triangular matrix, and in every lifting step only one element of the input vector is modified.
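The key property of such an elementary lifting step is that it is trivially invertible: subtracting the same multiple undoes the addition, with no matrix inversion needed. A minimal sketch (vector and coefficient values are illustrative):

```python
def lift_step(x, i, j, c):
    """Apply the elementary matrix I + c * e_i e_j^T to vector x:
    only element i changes, by c times element j (i != j)."""
    y = list(x)
    y[i] += c * y[j]
    return y

def inverse_lift_step(y, i, j, c):
    """The inverse is I - c * e_i e_j^T: subtract the same multiple."""
    x = list(y)
    x[i] -= c * x[j]
    return x

x = [4.0, 1.0, 3.0, 2.0]
y = lift_step(x, 0, 2, 0.5)            # only y[0] differs from x
assert y[1:] == x[1:]
assert inverse_lift_step(y, 0, 2, 0.5) == x   # perfect reconstruction
```

This is why lifting-based transforms remain perfectly reconstructible even when the lifting coefficients are rounded for fixed-point implementation.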
Because all lifting steps are similar, we consider only one lifting procedure. Take the pixel $p$ in Fig. 3, for example, which is occupied by four direction lines, and let $q_1, q_2, q_3, q_4$ denote the pixels on these lines that would each lift $p$ with the coefficient $c$. To avoid modifying $p$ four times with a multiple of each $q_k$, we change the lifting procedure to
$$p' = p + \sum_{k=1}^{4} c_k q_k, \qquad c_k = \frac{c}{4},$$
where $p'$ is the output and the $c_k$ are the scaling coefficients. The corresponding inverse transform can be expressed as
$$p = p' - \sum_{k=1}^{4} c_k q_k.$$
Similar to the interpolation in Ref. 2, the mean of the pixels on different direction lines is used to predict the target pixel, which often approximates the target better than using a single pixel on one direction line. Furthermore, it should be noted that the transform on an extended direction line is performed between adjacent blocks. Therefore, compared to DCT, which is performed on every block separately, FDDCT can partly remove the correlation between blocks.
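The modified lifting step above can be sketched directly; the pixel values and the coefficient below are illustrative, not taken from the paper. The shared pixel is lifted once from the mean of the pixels on the direction lines that occupy it, and the inverse subtracts the same prediction, so perfect reconstruction is preserved.

```python
def shared_lift(p, qs, c):
    """Modified lifting at a pixel p occupied by several direction
    lines: predict p once from the mean of the pixels qs on those
    lines (scaling coefficients c / len(qs)), instead of lifting p
    once per line."""
    return p + c * sum(qs) / len(qs)

def shared_unlift(p2, qs, c):
    """Inverse step: subtract the same mean-based prediction."""
    return p2 - c * sum(qs) / len(qs)

p, qs, c = 10.0, [8.0, 12.0, 9.0, 11.0], -0.5
p2 = shared_lift(p, qs, c)      # p is modified once, not four times
assert shared_unlift(p2, qs, c) == p   # perfect reconstruction
```

Because the prediction averages over the neighboring lines, a single outlier pixel disturbs the lifted value less than it would in a single-line lifting step.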
Finally, it should be noted that the direction modes of the blocks need to be selected before FDDCT is applied. As stated above, the FDDCT coefficients of a block are influenced only by its adjacent blocks, which is similar to DDCT. Thus, the same dynamic programming algorithm as in Ref. 2 is used to select the direction modes in FDDCT to minimize the rate-distortion cost.
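Since a block's cost depends only on its own mode and those of its neighbors, the mode selection has the structure of a shortest-path problem. The sketch below is a generic Viterbi-style dynamic program over a single row of blocks; the actual algorithm in Ref. 2 operates on image blocks with real rate-distortion costs, so the cost tables here (`costs`, `pair_costs`) are illustrative assumptions.

```python
def select_modes(costs, pair_costs):
    """Viterbi-style dynamic program over a row of blocks.

    costs[b][m]      : rate-distortion cost of giving block b mode m.
    pair_costs[a][m] : extra boundary cost when adjacent blocks use
                       modes a and m (from the lifting steps across
                       blocks). Both tables are illustrative.
    Returns the mode sequence minimizing the total cost.
    """
    n_blocks, n_modes = len(costs), len(costs[0])
    best = list(costs[0])                      # best cost ending in mode m
    back = [[0] * n_modes for _ in range(n_blocks)]
    for b in range(1, n_blocks):
        new = []
        for m in range(n_modes):
            prev = min(range(n_modes),
                       key=lambda a: best[a] + pair_costs[a][m])
            back[b][m] = prev
            new.append(best[prev] + pair_costs[prev][m] + costs[b][m])
        best = new
    # Trace back the optimal mode sequence.
    m = min(range(n_modes), key=lambda mm: best[mm])
    modes = [m]
    for b in range(n_blocks - 1, 0, -1):
        m = back[b][m]
        modes.append(m)
    return modes[::-1]
```

For example, with `costs = [[1, 5], [5, 1], [1, 5]]` and zero boundary costs, the program picks the cheapest mode per block, `[0, 1, 0]`; a large mode-change penalty in `pair_costs` instead forces a uniform mode sequence.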
Extensive experiments have been conducted to evaluate the performance of FDDCT. We report results for four common gray-scale images with different characteristics: Lena, Barbara, Boat, and Photographer. FDDCT is compared to DCT, DWT (the CDF 9/7 wavelet used in JPEG2000), and DDCT. The set partition coding we developed in Ref. 8, which is suitable for both block transforms and subband transforms, is used to code the coefficients of FDDCT, DCT, and DWT. The direction mode of a block is first predicted from the modes of the already coded neighboring blocks and then coded by arithmetic coding.
Figure 4 shows that the PSNRs of images compressed with FDDCT are close to those obtained with DWT, and that FDDCT achieves a consistent coding gain over DCT. The compression results of the DDCT-based method in Ref. 2 are also given in Fig. 4; its PSNRs are lower than those of the proposed FDDCT-based method. Part of the decompressed Barbara is shown in Fig. 5. The edges in the image compressed by FDDCT are less disturbed, and block artifacts are nearly eliminated. The reason is that, in FDDCT, many direction lines extend into adjacent blocks, so the blocks are not transformed individually; this helps reduce block artifacts in the compressed images.
Finally, we consider the computational complexity of FDDCT. If the direction modes of all blocks in the image are the same, then no special lifting steps are needed, and the computation of FDDCT equals that of DCT. In practice, however, different blocks generally have different direction modes, so the computation depends on the edges in the image; in other words, the computation of FDDCT is image dependent.
Here, we perform the transforms on many standard gray-scale images on a Pentium IV. The average execution times of DCT, DWT, and FDDCT are given in Table 1. The execution time of FDDCT is directly proportional to the image size; it is moderately higher than that of DCT and about one-third that of DWT. Compared to DDCT, whose computation is eight times that of DCT (an eight-tap interpolation filter is used), the computation decreases by 80%.
Execution time (in seconds) of the different transforms.
By choosing direction modes that consist only of integer pixels, a new directional DCT for image compression, called FDDCT, is proposed in this paper. Because the interpolations of DDCT are avoided, the computation of FDDCT is greatly reduced compared to the original DDCT. Furthermore, the use of interblock information in FDDCT yields fewer block artifacts and better rate-distortion performance in the compressed images.
The authors are grateful to the anonymous reviewers for their comments, which have helped us to improve this work. This study is supported by the National Natural Science Foundation of China (Grant No. 10601068).