## 1. Introduction

Because prevalent video compression techniques coarsely quantize the block-based discrete cosine transform (DCT) coefficients, neighboring blocks can show discontinuities along their borders, which are particularly noticeable at low bit rates. Postprocessing schemes are designed to reduce these blocking artifacts and thus improve the subjective quality of the video. Many deblocking methods have been proposed. They can be roughly classified into three categories according to their operating domain: the spatial domain,^{1, 2, 3, 4} the DCT domain,^{5, 6} and the wavelet transform domain.^{7} Algorithms operating in the spatial domain are usually simple, but their results are not very satisfactory. Algorithms operating in the DCT or wavelet domain yield better results, but the transform itself is complex and not easy to implement in hardware. Many methods use prior knowledge of the quantization parameters,^{2, 4, 5, 6} but deblocking methods that do not need this knowledge are more versatile in practical applications.

In this letter, we propose an adaptive postprocessing algorithm that does not require quantization parameters and that preserves object edges and image details while significantly reducing blocking artifacts. The proposed method is based on the simple but effective discrete Hadamard transform (DHT), so its computational complexity is quite low. Furthermore, the algorithm implicitly exploits some cues of the human visual system (HVS), which improves the visual quality.

## 2. Deblocking Algorithm

Figure 1 shows a flowchart of the proposed deblocking algorithm. The algorithm takes the decoded $YCbCr$ sequences as input. To preserve object edges, an edge detection module first acquires the edge information. The local activity of each block is then calculated using a DHT and is used to adaptively control the size of a low-pass filter in the DHT domain. Finally, an inverse discrete Hadamard transform (IDHT) produces the output.

### 2.1. Adaptive Edge Detection

Adaptive edge detection consists of two steps, direct current (DC) image generation and edge detection of the DC image. First, the input frame is divided into $4\times 4$ nonoverlapping blocks. The mean value of every $4\times 4$ block is calculated and the DC image is formed as the 2-D array of the mean values. Second, the Sobel operator is employed to differentiate the edges and the monotone areas in the DC image. The edge pixel in the DC image is then identified with an adaptive threshold^{6} given by

$$T=\sqrt{3}\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\nabla X(i,j)\Big/\left(N_r N_c\right)\right]. \tag{1}$$

### 2.2. DHT

Here we adopt a sequency-ordered $4\times 4$ Hadamard matrix:

$$\mathsf{H}=\frac{1}{2}\begin{pmatrix}1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1\end{pmatrix}. \tag{2}$$

The DHT is computed exactly in integer arithmetic, thus avoiding the inverse transform mismatch problems of the DCT and significantly reducing computational complexity. The IDHT matrix is identical to Eq. 2, so a single module can serve as both the transform and the inverse transform when implemented in hardware.
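The self-inverse property can be checked numerically. The following NumPy sketch applies the matrix of Eq. 2 to a $4\times 4$ block and recovers the block with the same matrix. The separable 2-D form `H @ X @ H` is an assumption (the usual row-column construction), and the function name `dht4x4` is illustrative only.

```python
import numpy as np

# Sequency-ordered 4x4 Hadamard matrix of Eq. 2, including the 1/2 factor.
H = 0.5 * np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
])

def dht4x4(block):
    """Separable 2-D Hadamard transform of a 4x4 block (assumed form H X H)."""
    return H @ np.asarray(block, dtype=float) @ H

block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = dht4x4(block)
# H is symmetric and orthogonal, so applying the same transform again inverts it.
restored = dht4x4(coeffs)
```

With this scaling, the coefficient at $(0,0)$ equals the sum of the 16 samples divided by 4.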

### 2.3. Block Activity

If we let $H_{m,n}(u,v)$ be the $4\times 4$ DHT coefficients of the block with top-left point $(m,n)$, the value of activity can be calculated as

$$\mathrm{Act}=\frac{\sum_{u=0}^{3}\sum_{v=0}^{3}\left|H_{m,n}(u,v)\bullet \mathrm{Mask}(u,v)\right|}{H_{m,n}(0,0)}-1.0, \tag{3}$$

$$\mathrm{Mask}=\begin{pmatrix}\mathrm{a}_{0} & \mathrm{a}_{1} & \mathrm{a}_{2} & \mathrm{a}_{3}\\ \mathrm{a}_{1} & \mathrm{a}_{2} & \mathrm{a}_{3} & \mathrm{a}_{4}\\ \mathrm{a}_{2} & \mathrm{a}_{3} & \mathrm{a}_{4} & \mathrm{a}_{5}\\ \mathrm{a}_{3} & \mathrm{a}_{4} & \mathrm{a}_{5} & \mathrm{a}_{6}\end{pmatrix}. \tag{4}$$^{8}
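A minimal sketch of Eqs. 3 and 4 in Python follows. The mask weights $\mathrm{a}_0,\ldots,\mathrm{a}_6$ are not listed in this text, so the all-ones values below are placeholders chosen only for illustration. The zero-DC guard and the names `block_activity`, `A`, and `MASK` are likewise assumptions, and the 2-D transform is again taken to be `H @ X @ H`.

```python
import numpy as np

H = 0.5 * np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
])

# Placeholder weights a_0..a_6 (hypothetical; the actual values are not given here).
A = np.ones(7)

# Mask(u, v) = a_{u+v}, which reproduces the anti-diagonal layout of Eq. 4.
u, v = np.indices((4, 4))
MASK = A[u + v]

def block_activity(block, mask=MASK, eps=1e-12):
    """Eq. 3 for one 4x4 pixel block.

    The eps test avoids dividing by a zero DC coefficient; it is an added
    safeguard, not part of Eq. 3.
    """
    coeffs = H @ np.asarray(block, dtype=float) @ H
    dc = coeffs[0, 0]
    if abs(dc) < eps:
        return 0.0
    return float(np.abs(coeffs * mask).sum() / dc - 1.0)

flat = np.full((4, 4), 128.0)                             # perfectly smooth block
textured = flat + 20.0 * np.array([[1, -1, 1, -1]] * 4)   # alternating columns
```

With $\mathrm{a}_0=1$, a perfectly flat block has only a DC coefficient, so its activity is zero, while the block with alternating columns has a positive activity.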

### 2.4. Adaptive Filter

Blocking artifacts are most noticeable in smooth regions, whereas image details and object edges must be preserved. Motivated by this, the adaptive filter uses a variable-size window of $(2h+1)\times (2h+1)$ and is mathematically formulated as

$$\widehat{H}_{m,n}(u,v)=\frac{1}{W}\sum_{k=-h}^{h}\sum_{l=-h}^{h}\omega_{k,l}H_{m+k,n+l}(u,v), \tag{5}$$

$$\omega_{k,l}=\begin{cases}3.0 & (k,l)=(0,0)\\ 1.0 & \mathrm{otherwise}.\end{cases} \tag{7}$$

$$h=\begin{cases}3 & \mathrm{Act}<T/250\\ 2 & T/250\le \mathrm{Act}\le T/50\\ 1 & \mathrm{Act}>T/50.\end{cases} \tag{8}$$

To avoid overfiltering a block centered at $(m,n)$ in texture area, its neighboring block located at $(m+k,n+l)$ is excluded from the filtering operation if Eq. 9 is satisfied.
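The window-size rule of Eq. 8 and the weighted average of Eqs. 5 and 7 can be sketched in Python as follows. Several points are assumptions rather than statements from this letter: the normalizer $W$ is taken to be the sum of the weights actually used, $T$ is taken to be the threshold of Eq. 1 because the same symbol appears, and the exclusion test is represented by a caller-supplied boolean array `keep` rather than computed. The function names are illustrative.

```python
import numpy as np

def window_half_size(act, T):
    """Eq. 8: smoother blocks (lower activity) get a larger window."""
    if act < T / 250:
        return 3
    if act <= T / 50:
        return 2
    return 1

def filter_coefficients(neigh, keep=None):
    """Eq. 5 with the weights of Eq. 7.

    neigh: array of shape (2h+1, 2h+1, 4, 4); neigh[k+h, l+h] holds the
           DHT coefficients H_{m+k,n+l}.
    keep:  optional boolean (2h+1, 2h+1) array; False marks a neighbor that
           the exclusion test rejected (the test itself is not modeled here).
    W is assumed to be the sum of the weights of the blocks actually used.
    """
    neigh = np.asarray(neigh, dtype=float)
    size = neigh.shape[0]
    h = size // 2
    w = np.ones((size, size))
    w[h, h] = 3.0                       # center block weighted 3.0, others 1.0
    if keep is not None:
        keep = np.asarray(keep, dtype=bool).copy()
        keep[h, h] = True               # assumption: the center always contributes
        w = w * keep
    W = w.sum()
    # Weighted sum over the (k, l) axes, leaving the 4x4 coefficient axes.
    return np.tensordot(w, neigh, axes=([0, 1], [0, 1])) / W

T = 100.0
h = window_half_size(0.1, T)            # 0.1 < T/250 = 0.4
neigh = np.ones((2 * h + 1, 2 * h + 1, 4, 4))
smoothed = filter_coefficients(neigh)
```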

Note that $\eta$ is set to be 0.1 empirically in this paper.

## 3. Experimental Results

The proposed algorithm was applied to video sequences compressed by the SVC codec from Microsoft Research Asia, which covers all testing points of core experiment 1 (Ref. 9). The Microsoft SVC coding scheme is based on block-based motion-compensated temporal filtering followed by 2-D spatial wavelet decomposition.

The input is “$\mathrm{Foreman}\_352\times 288\_15\_96$,” that is, the “Foreman” sequence decoded at an image size of $352\times 288$, a frame rate of $15\,\mathrm{frames/s}$, and a bit rate of $96\,\mathrm{kbits/s}$. To evaluate the performance of the proposed algorithm, it is compared with three existing methods (Refs. 1, 3, 7) that do not use prior knowledge of quantization parameters. Because some of the reference methods were designed for postprocessing still images, only the $Y$ components are compared for fairness. Their postprocessed images are given in Fig. 2b, 2c, 2d. Figure 2 shows that the proposed method outperforms the compared methods, removing blocking artifacts effectively while retaining edge sharpness. This validates the adaptive filtering process in the DHT domain.

In the preceding experiments, our method takes $0.7\,\mathrm{s}$ per frame on average, and the methods in Refs. 1 and 3 each take no more than $1\,\mathrm{s}$, while the wavelet-based method in Ref. 7 consumes $14\,\mathrm{min}$. To validate the simplicity of the DHT, we also used a video with an image size of $1920\times 1080$ as input. The DCT-based method in Ref. 5 takes about $12\,\mathrm{min}$ per frame, while our method takes no more than $5\,\mathrm{s}$ on average. All experiments were run on a $1.8\text{-}\mathrm{GHz}$ Pentium PC. The evaluated algorithms were not optimized for real-time applications, so the timing data given here show only that the proposed method may be closer to practical applications from the viewpoint of hardware simplicity.

Table 1 gives the peak SNR (PSNR)-$Y$ results for an objective quality comparison. Although PSNR is not a good measure for evaluating such techniques, the proposed approach achieves a higher PSNR than the wavelet-based method in Ref. 7.

## Table 1

PSNR-$Y$ comparison in decibels.

“Foreman” Frame | Decoded Video | Pixel (Ref. 1) | Wavelet (Ref. 7) | H.263 (Ref. 3) | Proposed |
---|---|---|---|---|---|
8 | 32.1991 | 32.3428 | 30.6064 | 32.0186 | 30.9718 |
15 | 32.6702 | 32.8170 | 30.8215 | 32.4060 | 31.1641 |
21 | 32.0794 | 32.3156 | 30.8771 | 31.9253 | 31.1429 |
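For reference, PSNR-$Y$ figures of this kind are conventionally computed from the mean squared error between the luma planes of the original and the processed frame. The sketch below assumes 8-bit samples (peak value 255). The function name and the synthetic frames are illustrative and do not reproduce the values in Table 1.

```python
import numpy as np

def psnr_y(reference, test, peak=255.0):
    """PSNR in decibels between two luma planes of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")             # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

# Synthetic 352x288 luma plane with small additive noise, for demonstration only.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(288, 352))
noisy = np.clip(original + rng.integers(-3, 4, size=original.shape), 0, 255)
value_db = psnr_y(original, noisy)
```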

Because human eyes are the final judges of video quality, we conducted a subjective test of some deblocking results according to the double stimulus continuous quality scale method suggested by ITU-R BT.500-10 (Ref. 10). The mean opinion scores (MOS) were rescaled to a range of 0 to 100. The difference mean opinion scores (DMOS) were calculated as the difference between the scores of the original video and the test video, so a lower DMOS indicates quality closer to the original. The DMOS of the method in Ref. 3 and of the proposed method are compared in Table 2, which shows that the subjective rating of the proposed method is significantly better.

## Table 2

DMOS comparison.

Sequence | Decoded Video | H.263 (Ref. 3) | Proposed |
---|---|---|---|
$\mathrm{Foreman}\_352\times 288\_15\_96$ | 38.02 | 37.41 | 26.02 |
$\mathrm{Crew}\_352\times 288\_15\_128$ | 40.59 | 40.33 | 29.97 |
$\mathrm{Carphone}\_352\times 288\_30\_72$ | 35.47 | 34.29 | 28.17 |
$\mathrm{Hall}\_\mathrm{monitor}\_352\times 288\_30\_48$ | 39.13 | 38.02 | 30.34 |
$\mathrm{Akiyo}\_352\times 288\_30\_36$ | 34.51 | 32.84 | 29.86 |
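To put Table 2 in perspective, the following sketch computes how much each method lowers the DMOS relative to the unprocessed decoded video, averaged over the five sequences. It uses only the values printed in the table; the variable names are illustrative.

```python
# (decoded video, H.263 filter of Ref. 3, proposed) DMOS values copied from Table 2.
dmos = {
    "Foreman":      (38.02, 37.41, 26.02),
    "Crew":         (40.59, 40.33, 29.97),
    "Carphone":     (35.47, 34.29, 28.17),
    "Hall_monitor": (39.13, 38.02, 30.34),
    "Akiyo":        (34.51, 32.84, 29.86),
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average DMOS reduction with respect to the decoded video (larger is better).
gain_h263 = mean(d - r for d, r, _ in dmos.values())
gain_proposed = mean(d - p for d, _, p in dmos.values())

print(f"mean DMOS reduction, H.263 filter: {gain_h263:.2f}")
print(f"mean DMOS reduction, proposed:     {gain_proposed:.2f}")
```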

## 4. Conclusions

A postprocessing algorithm for blocking artifact removal in the DHT domain was proposed. The algorithm removes blocking artifacts effectively while preserving image details and object edges. It is a versatile method that does not require prior knowledge of quantization parameters and has low computational complexity. Because the basic operation unit of the method is a $4\times 4$ block and the DHT is inherently simple and computationally efficient, the algorithm is easy to implement in hardware and promising for real-time video postprocessing in handheld devices.

## Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 60502034 and the Shanghai Rising-Star Program under Grant No. 05QMX1435.