## 1.

## Introduction

With advances in mobile communication devices, people can now communicate while looking at each other’s faces. This technology, also referred to as videoconferencing, transmits images to a display system so that users can see each other while talking, as shown in Fig. 1(a). Many market analysts predict that the number of subscribers to image communication services will grow exponentially every year because of lower mobile device prices and aggressive marketing by communication companies, as shown in Fig. 1(b). As image communication services come into wide use, consumers demand high-quality services. Although image communication services already exist over third-generation (3G) wireless networks, such as high-speed downlink packet access (HSDPA), limited bandwidth still prevents high-quality communication (the maximum downloading and uploading speeds are 14.4 and 5.76 Mbps, respectively). Consequently, more research is required to overcome the limited bandwidth of current communication systems and achieve high-quality image reconstruction in mobile devices. In terms of image processing, the core technologies for high-quality image reconstruction are face hallucination and compression artifact reduction.

Face hallucination, which is also referred to as face super-resolution (SR), is very important for image communications because consumers are mainly interested in facial regions, as shown in Fig. 2. A number of face hallucination methods have been proposed in recent years. Among them, learning-based methods have received much attention because they can achieve a high magnification factor and produce good SR results compared with other methods. Baker and Kanade^{1}^{,}^{2} first introduced a face hallucination method that infers the high-frequency components from a parent structure with the help of a training set. Wang and Tang^{3} presented a principal component analysis (PCA)-based face hallucination algorithm to globally infer the high-resolution face image. Liu et al.^{4} developed a two-step statistical modeling approach that integrates a global model and a local model corresponding to the common and specific face characteristics, respectively. Although Liu et al.’s method^{4} requires complicated probabilistic models, the two-step approach has become increasingly popular since then. Recently, a novel face hallucination method based on position-patches has been proposed. The position-patch based method hallucinates each high-resolution (HR) image patch using the image patches at the same position in the training images.^{5}^{–}^{7} Thus, it saves computational time and produces high-quality SR results compared with manifold learning-based methods.

With respect to compression artifact reduction, several compression artifacts inevitably occur because lossy compression techniques such as H.264 or MPEG-4 discard high-frequency components (the representative artifact is the blocking artifact). They seriously degrade the picture quality and are annoying to viewers of the reconstructed images, as shown in Fig. 3.^{8}^{,}^{9} Accordingly, compression artifact reduction is also very important for image communications. Blocking artifacts appear as grid noise along the block boundaries because each block is transformed and quantized independently, without considering inter-block correlations. Many studies have been conducted to reduce blocking artifacts in compressed images. Among them, image restoration techniques are commonly used to reduce blocking artifacts and recover the original image;^{10} projection-onto-convex-sets (POCS)-based methods are representative of such techniques. In the POCS-based methods, prior information is represented as convex sets for reconstruction, and blocking artifacts are reduced by iterative procedures.^{11} POCS-based methods are very effective in reducing blocking artifacts because smoothness constraints are easy to impose around block boundaries. Total variation (TV)-based methods are also actively studied for image deblocking.^{12}^{,}^{13} TV provides an effective criterion for image restoration, and thus can be successfully used as prior information for image deblocking. Alter et al.^{13} proposed a constrained TV minimization method to reduce blocking artifacts without removing perceptual features. With TV minimization, edge information is effectively preserved while blocking artifacts are reduced. Moreover, a field of experts (FoE) prior was successfully applied to image deblocking.^{10} In this method, the image deblocking problem was solved by maximum a posteriori (MAP) estimation based on the FoE prior.

Both technologies are associated with inverse problems in image processing. In this article, we provide an outline of recent studies on face hallucination and compression artifact reduction.

The rest of this article is organized as follows. In Sec. 2, we describe inverse problems in image processing. In Sec. 3, we explain recent research trends and results related to face hallucination, and in Sec. 4, those related to compression artifact reduction. In Sec. 5, we discuss practical considerations and possible solutions for implementing the two technologies in mobile applications. Finally, conclusions are drawn in Sec. 6.

## 2.

## Inverse Problems in Image Processing

Inverse problems involve estimating parameters or data from inadequate observations; the observations are often noisy and contain incomplete information about the target parameter or data because of physical limitations of the measurement devices. Because the indirect observations lack sufficient information, solutions to inverse problems are usually nonunique and difficult to obtain. That is, they are ill-posed problems, and thus additional reconstruction techniques are required to solve them, including machine learning, Bayesian inference, convex optimization, and sparse representation.^{14}^{–}^{16}

Indeed, many problems in image processing can be represented as inverse problems. They are modeled by relating the observed image $g(r)$ to the unknown original image $f(r)$. A general form for the relation is as follows:^{14}

Figure 4 shows the observation model in image processing which can be formulated as inverse problems. In image processing, there are many inverse problems such as image denoising, image SR, image deblurring, image decompression, and so on. Above all, we inevitably meet several inverse problems in image communications because transmission bandwidth is strictly limited in a mobile communication environment. Consequently, image sequences are compressed and transmitted using lossy compression techniques such as H.264 and MPEG-4, and thus, undesired image distortions also occur because of compression artifacts resulting from lossy compression techniques. In this article, we deal with two representative inverse problems in image processing: face hallucination and image deblocking.
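
To make the notion of an observation model concrete, the following minimal sketch simulates a low-resolution observation $g$ from an original image $f$. It assumes a commonly used linear degradation (block averaging as blur plus decimation, followed by additive Gaussian noise); this is an illustrative assumption, not necessarily the model of Ref. 14. The function name `degrade` and its parameters are hypothetical.

```python
import numpy as np

def degrade(hr, factor=4, noise_std=2.0, seed=0):
    """Simulate an observation g from an original image f (assumed model).

    Assumption: g = H f + n, where H averages each factor x factor block
    (blur + decimation) and n is zero-mean Gaussian noise.
    """
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor            # crop so blocks tile exactly
    blocks = hr[:h, :w].astype(float).reshape(h // factor, factor,
                                              w // factor, factor)
    lr = blocks.mean(axis=(1, 3))                    # average every block
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_std, lr.shape)  # additive noise n
```

Recovering `hr` from the output is ill-posed: many different HR images map to the same LR observation, which is why prior knowledge is needed.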

## 3.

## Face Hallucination

Since the concept of face hallucination was introduced by Baker and Kanade,^{1}^{,}^{2} a number of related face hallucination methods have been proposed during the past decade. In general, there are two classes of SR techniques: multiframe SR (from input images only) and single-frame SR (from other training images). From a methodological viewpoint, SR methods can be broadly divided into interpolation-based,^{17}^{,}^{18} reconstruction-based,^{19}^{–}^{24} and learning-based^{3}^{,}^{6}^{,}^{7}^{,}^{25}^{–}^{30} methods.

First, the basic interpolation methods include nearest-neighbor, bilinear, and bicubic interpolation.^{17}^{,}^{18} Given one low-resolution (LR) image, they use only the original pixel and several surrounding pixels to estimate the missing pixels. They are simple and fast and give acceptable results when the interpolation factor is small. However, when the interpolation factor is large, the performance is poor because the high-frequency information is missing. Second, reconstruction-based methods first build an observation model that connects the original HR image and the observed LR image, simulating the process by which an LR image is obtained from an HR image. There are many reconstruction-based methods, such as POCS,^{19} the MAP method,^{20} the iterative back-projection method,^{21}^{,}^{22} the regularization method,^{23} and the mixed method.^{24} All of them need some local prior assumptions and can reduce blur and saw-tooth effects to a certain extent. Because the prior knowledge is limited, however, the information provided by the LR images may not be sufficient to recover the HR image. Third, learning-based methods have received much attention in recent years because they can achieve a high magnification factor and produce good SR results compared with other methods. The basic idea is to find the nearest neighbors of each test-image patch among the patches of the training image set, and to construct the optimal coefficients that approximate the HR image using the learned prior knowledge. In this article, we focus on learning-based face hallucination methods and introduce some representative works and our research results.

## 3.1.

### Example-Based Image SR

In 2001, example-based image SR was proposed by Freeman et al. Its core idea was to learn the fine details from HR images of training datasets, and use the learned relationships between LR and HR to predict fine details of a test image. Above all, Freeman et al. employed a nonparametric patch-based prior along with the Markov random field (MRF) model to generate the desired HR images. A large dataset of HR and LR patch pairs was generated and used for seeking the nearest neighbors of the LR input patches. The selected HR patch neighbors were treated as the candidates for the target HR patch. The block diagram of the method is shown in Fig. 5. As shown in the figure, the key procedure of this method is to predict the missing high frequencies using the training datasets.

## 3.2.

### Neighbor-Embedding Based Image SR

In 2004, Chang et al. proposed a novel method for solving single-image SR problems. In this method, given an LR image as input, a set of training examples was used to recover its HR counterpart. While this formulation resembled other learning-based methods for SR, this method was inspired by manifold learning-based methods, particularly locally linear embedding (LLE). More specifically, small image patches in the LR and HR images were assumed to form manifolds with similar local geometry in two distinct feature spaces. Multiple nearest neighbors were then selected in the LR feature space, and the SR image was reconstructed from the corresponding HR patches of those nearest neighbors. Since then, this method has been extensively applied to image SR problems, including face hallucination.
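
The neighbor-embedding idea can be sketched for a single patch as follows. The code selects the `k` nearest LR training patches, computes LLE-style reconstruction weights that sum to one, and applies the same weights to the paired HR patches. The function name, the array layout, and the small regularization term `reg` are assumptions made for this illustration; feature extraction and the averaging of overlapping patches are omitted.

```python
import numpy as np

def neighbor_embedding_patch(x_lr, train_lr, train_hr, k=5, reg=1e-6):
    """Hallucinate one HR patch from its k nearest LR training patches.

    x_lr:     (d_l,)   LR input patch (flattened)
    train_lr: (M, d_l) LR training patches
    train_hr: (M, d_h) paired HR training patches
    """
    dists = np.sum((train_lr - x_lr) ** 2, axis=1)
    idx = np.argsort(dists)[:k]                  # k nearest neighbors in LR space
    diff = train_lr[idx] - x_lr                  # (k, d_l) local differences
    gram = diff @ diff.T                         # local Gram matrix
    # Assumed regularization so the linear system is always solvable.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    w /= w.sum()                                 # enforce sum-to-one constraint
    return w @ train_hr[idx]                     # same weights in HR space
```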

## 3.3.

### PCA-Based Face Hallucination

In 2005, a new face hallucination method using eigen-transformation was proposed by Wang et al. In contrast to conventional methods based on probabilistic models, this method viewed face hallucination as a transformation between different image styles. PCA was used to fit the input face image as a linear combination of the LR face images in the training dataset. The HR image was rendered by replacing the LR training images with the HR ones, while retaining the same combination coefficients. Since face images are well structured and have similar appearances, they span a small subset of the high-dimensional image space. In the work of Penev and Sirovich,^{31} face images were shown to be well reconstructed by a PCA representation with 300 to 500 dimensions. The system diagram of this method is shown in Fig. 6. As shown in the figure, this method first employed PCA to extract as much useful information as possible from an LR face image, and then rendered an HR face image by eigen-transformation.
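
The core of the eigen-transformation, namely fitting the input as a linear combination of LR training faces and reusing the coefficients on the HR training faces, can be sketched as below. This is a simplified illustration: it keeps all principal components by using a pseudo-inverse, whereas the method described above works in a truncated PCA subspace. The function name and array layout are assumptions.

```python
import numpy as np

def eigentransform(x_lr, train_lr, train_hr):
    """Render an HR face from an LR face by reusing combination coefficients.

    x_lr:     (d_l,)   LR input face (flattened)
    train_lr: (d_l, M) LR training faces, one per column
    train_hr: (d_h, M) paired HR training faces, one per column
    """
    mean_lr = train_lr.mean(axis=1)
    mean_hr = train_hr.mean(axis=1)
    L = train_lr - mean_lr[:, None]              # centered LR training set
    H = train_hr - mean_hr[:, None]              # centered HR training set
    # Minimum-norm coefficients c such that L @ c best fits x_lr - mean_lr.
    c = np.linalg.pinv(L, 1e-10) @ (x_lr - mean_lr)
    return H @ c + mean_hr                       # same coefficients, HR faces
```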

## 3.4.

### Sparse Coding Based Face Hallucination

In 2008, a new approach to single-image SR based on sparse signal representation was proposed by Yang et al. This method was motivated by the observation from image statistics that image patches can be well represented as a sparse linear combination of elements from an appropriately chosen overcomplete dictionary. They found a sparse representation for each patch of the LR input, and then used the coefficients of this representation to generate the HR output. Theoretical results from compressed sensing suggested that, under mild conditions, the sparse representation could be correctly recovered from the down-sampled signals. By jointly training two dictionaries for the LR and HR image patches, they enforced the similarity of the sparse representations of each LR and HR patch pair with respect to their own dictionaries. Therefore, the sparse representation of an LR patch could be applied to the HR patch dictionary to reconstruct the SR image. The learned dictionary pair was a more compact representation of the patch pairs than in previous approaches, which simply sampled a large number of image patch pairs; this reduced the computational cost effectively.

## 3.5.

### Position-Patch Based Face Hallucination

In 2010, a novel face hallucination approach was proposed by Ma et al. In contrast to most of the conventional methods based on probabilistic models or manifold learning, the position-patch based method hallucinated each HR image patch using the image patches at the same position in the training images. The optimal weights of the LR training position-patches were estimated, and the hallucinated patches were reconstructed by applying the same weights to the HR training position-patches. The final SR face image was formed by integrating the hallucinated patches. The method saved computational time and produced high-quality SR results compared with conventional manifold learning-based methods. The position-patch based face hallucination method is briefly described in Algorithm 1.

## Algorithm 1

Position-patch based face hallucination.^{5}

Step 1: Denote the input LR image, LR training images, and HR training images in overlapping patches as $\{{X}_{L}^{P}(i,j)\}_{p=1}^{N}$, $\{{Y}_{L}^{mP}(i,j)\}_{p=1}^{N}$, and $\{{Y}_{H}^{mP}(i,j)\}_{p=1}^{N}$, respectively, for $m=1,2,\dots ,M$.

Step 2: For each patch ${X}_{L}^{P}(i,j)$:

(a) Compute the reconstruction weights $w(i,j)$ by least square estimation.

(b) Synthesize the HR patch ${X}_{H}^{P}(i,j)$.

Step 3: Concatenate and integrate the hallucinated HR patches to form a facial image, which is the target HR facial image $\{{X}_{H}^{P}(i,j)\}_{p=1}^{N}$.
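
Step 2 of Algorithm 1 can be sketched for one patch position as follows. The sketch uses plain, unconstrained least squares via `numpy.linalg.lstsq`, which is an assumption: the cited method may impose additional constraints on the weights. The function name and array layout are hypothetical, and Step 3 (averaging the overlapping patches) is omitted.

```python
import numpy as np

def hallucinate_patch(x_lr, train_lr, train_hr):
    """Position-patch hallucination for a single patch position (i, j).

    x_lr:     (d_l,)   LR input patch at position (i, j)
    train_lr: (d_l, M) LR training patches at the same position
    train_hr: (d_h, M) HR training patches at the same position
    """
    # Step 2(a): weights that best reconstruct the LR patch (least squares).
    w, *_ = np.linalg.lstsq(train_lr, x_lr, rcond=None)
    # Step 2(b): apply the same weights to the HR training patches.
    return train_hr @ w, w
```

Because every patch position is handled independently, the loop over positions is straightforward to run in parallel.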

## 3.6.

### Convex-Optimization-Based Face Hallucination

Inspired by the position-patch based face hallucination method, a new convex optimization based face hallucination method has been proposed. The position-patch based method employs least square estimation to obtain the optimal weights for face hallucination; however, least square estimation can provide biased solutions when the number of training position-patches is much larger than the dimension of the patch. To overcome this problem, we use constrained convex optimization instead of least square estimation to obtain the optimal weights for face hallucination. The optimal weights ($w$) are computed by solving the following convex optimization problem:

## (3)

$$\underset{w}{\mathrm{min}}{\Vert w\Vert}_{1}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Vert {X}_{L}^{P}-{Y}_{L}^{P}\cdot w\Vert}_{2}^{2}\le \epsilon ,$$

By Eqs. (3) and (4), we can get more stable reconstruction weights for face hallucination because the ${l}_{1}$-norm is better suited to this problem: it allows each patch to be approximated with a smaller subset of training patches, whereas the ${l}_{2}$-norm provides nonzero weights for all patches.

Figure 7 shows the face hallucination results by bicubic interpolation, example-based image SR,^{25} neighbor-embedding based image SR,^{26} position-patch based face hallucination,^{5} and convex optimization based face hallucination.^{7} We performed experiments on the CMU-PIE face database, which contains 41,368 images obtained from 68 subjects. We took the frontal face images with 21 different illumination conditions. Thus, the total number of images was 1,428. Among them, 630 images of 30 subjects were used in the training stage, and the rest were used in the synthesis stage. In the neighbor-embedding method, the HR patch size of ${{Y}_{H}}^{m}$ was $12\times 12$ pixels, while the corresponding LR patch size of ${{Y}_{L}}^{m}$ was $3\times 3$ pixels. In addition, the number of neighbor patches used for reconstruction was 5. The size of the image patches in the position-patch and convex optimization methods was $3\times 3$ pixels. The size of the LR images for training and synthesis was $25\times 25$ pixels, while that of the hallucinated results was $100\times 100$ pixels. That is, the interpolation factor was 4. As shown in the figure, learning-based methods generally produce better face hallucination results than traditional bicubic interpolation. In particular, the results of Refs. 25 and 26 are somewhat blurred and contain some artifacts, whereas those of Refs. 5 and 7 look more natural. Further examination of the results reveals that Ref. 7 is more effective than Ref. 5 in preserving the edges and image details in the nose and mouth areas.
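
The sparsity-promoting effect of the ${l}_{1}$-norm can be illustrated with a small solver. Note the substitution: instead of the constrained problem of Eq. (3), the sketch solves the closely related penalized form $\min_{w}\frac{1}{2}{\Vert x-Yw\Vert}_{2}^{2}+\lambda {\Vert w\Vert}_{1}$ with the iterative shrinkage-thresholding algorithm (ISTA). The solver choice, the function name, and the parameter `lam` are assumptions for illustration and are not taken from Ref. 7.

```python
import numpy as np

def l1_weights(x, Y, lam=0.1, n_iter=500):
    """Sparse reconstruction weights via ISTA (penalized form, assumed).

    x: (d,)   LR input patch
    Y: (d, M) LR training position-patches, one per column
    """
    step = 1.0 / (np.linalg.norm(Y, 2) ** 2)     # 1 / Lipschitz constant
    w = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        grad = Y.T @ (Y @ w - x)                 # gradient of the data term
        z = w - step * grad                      # gradient step
        # Soft-thresholding: shrinks small weights to exactly zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w
```

A larger `lam` yields fewer nonzero weights, so each patch is approximated by a smaller subset of training patches.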

For a more quantitative evaluation, the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values of the face hallucination results are provided in Table 1. The SSIM complements the PSNR by giving an indication of image quality based on known characteristics of the human visual system.^{32} The PSNR is given in dB. As shown in the table, our method achieves the best hallucination performance in terms of both the PSNR and SSIM; the bold numbers represent the best PSNR and SSIM values.
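
For reference, the PSNR of an estimate with respect to a reference image can be computed with the standard definition, as in the sketch below. It assumes 8-bit images (peak value 255); the function name is hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)              # mean squared error
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```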

## Table 1

Average PSNR and SSIM values of different methods.

Measure | Bicubic | Example-based (Ref. 25) | Neighbor embedding (Ref. 26) | Position-patch (Ref. 5) | Convex optimization (Ref. 7) |
---|---|---|---|---|---|
PSNR (dB) | 24.5388 | 26.0954 | 26.3758 | 28.1613 | **28.2437** |
SSIM | 0.7278 | 0.7544 | 0.7444 | 0.8146 | **0.8178** |

## 4.

## Compression Artifact Reduction

Block-based discrete cosine transform (BDCT) has been widely used in image and video compression because of its energy compaction property and relative ease of implementation.^{33}^{–}^{36} Thus, BDCT has been adopted in most image/video compression standards, including JPEG (joint photographic experts group) and MPEG (motion picture experts group). However, BDCT has a major drawback, which is usually referred to as blocking artifacts. Blocking artifacts appear as grid noise along the block boundaries because each block is transformed and quantized independently, without considering inter-block correlations. Usually, the lower the bit rate is, the more serious the blocking artifacts are.

## 4.1.

### Main Techniques for Image Deblocking

There are two main techniques for dealing with blocking artifacts: in-loop filtering and postprocessing. In-loop filters operate within the coding loop, whereas postprocessing methods are applied after the decoder and make use of the decoded parameters. Table 2 lists the deblocking filters employed by current video coding standards.^{37} As listed in the table, in-loop filters are optional or absent in most standards because they require changes to the encoder structure. Thus, postprocessing methods are promising solutions to this problem, and comparable results have been achieved by researchers.

## Table 2

Deblocking filters for video coding standards.^{37}

Standard | Deblocking filter |
---|---|
H.261 | Optional in-loop filter |
MPEG-1 | None |
MPEG-2 | None, post-processing often used |
H.263 | None |
MPEG-4 | Optional in-loop filter, post-processing suggested |
H.264 | Mandatory in-loop filter, post-processing suggested |

## 4.2.

### Postprocessing Methods For Image Deblocking

Since the early 1980s, postprocessing of low bit-rate BDCT coded images has received considerable research attention. Postprocessing methods are classified into three main groups: filtering-based, denoising-based, and restoration-based methods.^{10}

First, some researchers viewed the distortions around the block boundaries as spatial, high-frequency components. Thus, many filtering-based methods have been proposed to reduce them. In 1984, Lim and Reeve^{38} first applied low-pass filtering to the pixels along the boundary to remove the blocking artifacts. Then, in 1986, Ramamurthi and Gersho^{39} proposed a nonlinear space-variant filter to perform filtering in parallel with the edges. Since then, many filtering-based methods have been presented, and the representative work is the adaptive deblocking filter, which has been used in the H.264/MPEG-4 advanced video coding (AVC) standards to reduce the distortions.^{40}

Second, some researchers viewed deblocking as a denoising problem. They proposed some efficient noise models and some deblocking methods based on the wavelet technique. In 1997, Xiong et al.^{41} exploited cross-scale correlation by the overcomplete wavelet transform, and used the thresholds to reduce the distortions. In 2004, Liew and Yan^{34} made a theoretical analysis of the blocking artifacts, and used the three-scale overcomplete wavelet scheme to reduce them.

Third, many researchers viewed deblocking as a restoration problem, and proposed restoration-based deblocking methods. The POCS-based method was a representative restoration-based approach to deblocking.^{42} In the POCS-based methods, prior information was represented as convex sets for reconstruction, and blocking artifacts were reduced by iterative procedures. The POCS-based methods were very effective for reducing blocking artifacts because smoothness constraints were easy to impose around block boundaries. In 2003, Kim et al.^{11} proposed a new smoothness constraint set (SCS) and an improved quantization constraint set (QCS) to improve the performance of the POCS-based methods. Furthermore, TV-based methods were actively studied for image deblocking. TV provided an effective criterion for image restoration, and thus could be successfully used as prior information for image deblocking.^{13}^{,}^{43} In 2004, Alter et al. proposed a constrained TV minimization method to reduce blocking artifacts without removing perceptual features. In 2010, a human visual system (HVS)-based TV method using a new weighted regularization parameter was proposed by Do et al.^{44} In 2007, an FoE prior^{45}^{,}^{46} was successfully applied to image deblocking by Sun and Cham.^{10} In this method, the image deblocking problem was solved by MAP estimation based on the FoE prior. In addition, they employed the narrow quantization constraint set (NQCS) for further PSNR gain.^{47} Consequently, this method achieved a high PSNR gain and produced state-of-the-art deblocking results.

## 4.3.

### Sparse Representation Based Image Deblocking

Recently, sparse representation has been actively studied to solve various restoration problems in image processing.^{48}^{–}^{52} Some researchers have made significant contributions to image denoising, restoration, and SR using sparse representation. Sparse representation assumes that original signals can be accurately represented as a combination of a few elementary signals called atoms.^{50}^{,}^{53} Thus, it has proven very effective for image restoration tasks. Inspired by recent results on sparse representation, we proposed a novel deblocking method based on sparse representation.^{48} To remove blocking artifacts, we obtain a general dictionary that can effectively describe the content of an image from a set of training images using the K-singular value decomposition (K-SVD) algorithm. Then, the error threshold that orthogonal matching pursuit (OMP) needs in order to apply the dictionary to image deblocking is automatically estimated from the quality of the compressed image. Our deblocking method comprises two main procedures: generation of a deblocking dictionary using the K-SVD algorithm, and image deblocking with the deblocking dictionary. That is, the deblocking dictionary is generated in the training stage, and blocking artifact reduction is performed in the testing stage.
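
The role of the error threshold in OMP can be seen in the following minimal sketch, which greedily adds atoms until the residual norm falls below a tolerance `tol`. It is a plain, unoptimized OMP rather than the batch-OMP variant, and the function name and parameters are assumptions for illustration.

```python
import numpy as np

def omp(D, y, tol, max_atoms=None):
    """Error-constrained OMP: stop once ||y - D @ theta||_2 <= tol.

    D: (n, K) dictionary with unit-norm columns (atoms)
    y: (n,)   signal, e.g., a flattened image patch
    """
    n, K = D.shape
    max_atoms = n if max_atoms is None else max_atoms
    theta = np.zeros(K)
    support, coef = [], np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()
    while np.linalg.norm(residual) > tol and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if k in support:                             # no further progress
            break
        support.append(k)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        theta[support] = coef
    return theta
```

A larger tolerance stops the pursuit earlier and discards more of the fine-scale content, which is why the threshold must match the level of compression noise.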

## 4.3.1.

#### Deblocking dictionary design using K-SVD algorithm

In the training stage, image patches are selected to generate a dictionary for image deblocking. From the image patches, a deblocking dictionary is trained by the K-SVD algorithm. Here, the batch-OMP method is used to solve the optimization problem.^{54} The K-SVD algorithm is an iterative method for generating an overcomplete dictionary that fits the training examples well. It is simple and designed to be a direct generalization of the K-means algorithm.^{52}^{–}^{56} In general, it alternates between sparse coding and dictionary update during training.

Let $\overline{\mathbf{X}}=[{x}_{1},\dots ,{x}_{P}]$ be an $n\times P$ matrix of $P$ training patches, each containing $n$ pixels, used to train an overcomplete dictionary $\mathbf{D}$ of size $n\times K$ with $P\gg K$ and $K>n$. For generating $\mathbf{D}$, the objective function of the K-SVD algorithm is defined as follows:^{55}^{,}^{57}

## (5)

$$\underset{\mathbf{D},\mathbf{\Theta}}{\mathrm{min}}{\Vert \overline{\mathbf{X}}-\mathbf{D}\cdot \mathbf{\Theta}\Vert}_{\mathrm{F}}^{2}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Vert {\theta}_{i}\Vert}_{0}\le S,$$

## Algorithm 2

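Dictionary generation by the K-SVD algorithm.

Step 1: Initialize a dictionary $\mathbf{D}$ (an overcomplete DCT dictionary).

Step 2: Repeat $n$ times ($n$: number of training iterations):

(a) Sparse coding stage: compute ${\theta}_{i}$ using OMP for $i=1,2,\dots ,P$:

$$\underset{\mathbf{D},\mathbf{\Theta}}{\mathrm{min}}{\Vert \overline{\mathbf{X}}-\mathbf{D}\cdot \mathbf{\Theta}\Vert}_{\mathrm{F}}^{2}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Vert {\theta}_{i}\Vert}_{0}\le S$$

(b) Dictionary update stage: update the dictionary atom ${d}_{k}$ and coefficient ${\theta}_{k}$ for $k=1,2,\dots ,K$:

(b-1) Obtain the set of all indices corresponding to the training patches that use ${d}_{k}$ and ${\theta}_{k}$.

(b-2) Compute the matrix of residuals ${E}_{k}$: ${E}_{k}=\overline{\mathbf{X}}-\sum_{j\ne k}{d}_{j}\cdot {\theta}_{j}$.

(b-3) Restrict ${E}_{k}$ by selecting only the columns corresponding to those elements that initially used ${d}_{k}$ in their representation, and obtain ${E}_{k}^{R}$.

(b-4) Apply SVD decomposition ${E}_{k}^{R}=U\Delta {V}^{T}$, and update ${d}_{k}={u}_{1}$, ${\theta}_{k}^{R}=\Delta (1,1)\cdot {v}_{1}$, where $\Delta (1,1)$ is the largest singular value of ${E}_{k}^{R}$, and ${u}_{1}$ and ${v}_{1}$ are the corresponding left and right singular vectors, respectively.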
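
The dictionary update stage of Algorithm 2 (steps b-1 to b-4) can be sketched for a single atom as follows. The function name and array layout are assumptions; the coefficient matrix is stored with one row per atom, so row `k` holds the coefficients of atom `k` for all patches.

```python
import numpy as np

def update_atom(X, D, Theta, k):
    """Update atom k and its coefficients in place (K-SVD steps b-1 to b-4).

    X:     (n, P) training patches, one per column
    D:     (n, K) dictionary, one atom per column
    Theta: (K, P) sparse coefficients
    """
    users = np.nonzero(Theta[k])[0]              # b-1: patches that use atom k
    if users.size == 0:
        return                                   # unused atom: leave unchanged
    # b-2 and b-3: residual without atom k, restricted to those patches.
    E = X[:, users] - D @ Theta[:, users] + np.outer(D[:, k], Theta[k, users])
    # b-4: best rank-1 approximation of the restricted residual.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                            # new unit-norm atom
    Theta[k, users] = s[0] * Vt[0]               # new coefficients
```

Restricting the update to the patches that already use the atom keeps the sparsity pattern of the coefficients unchanged.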

## 4.3.2.

#### Automatic estimation of error threshold

The deblocking dictionary $\mathbf{D}$ is employed to reduce blocking artifacts. The objective function for image deblocking is as follows:

## (6)

$$\underset{\mathbf{\Theta}}{\mathrm{min}}{\Vert \mathbf{\Theta}\Vert}_{1}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Vert \mathbf{Y}-\mathbf{D}\cdot \mathbf{\Theta}\Vert}_{2}\le T,$$

First, the standard deviation of the quantization noise, ${\sigma}_{N}$, is estimated as shown in Fig. 8. Since the blocking artifacts mostly occur around the block boundaries, ${\sigma}_{N}$ is computed from the intensity difference *Diff* between two boundary pixels on both sides of a boundary between two blocks as follows:

Here, *Diff* is the absolute value of one-half the intensity difference between two pixels, ${s}_{1}$ and ${s}_{2}$, i.e., $|I({s}_{1})-I({s}_{2})|/2$. In computing *Diff*, only horizontal or vertical block discontinuities are considered, as mentioned in Ref. 34. In the figure, pixels ${s}_{1}$ and ${s}_{2}$ belong to ${\text{Block}}_{1}$ and ${\text{Block}}_{2}$, respectively; $I(s)$ is the intensity of a pixel $s$. Accordingly, we compute ${\sigma}_{N}$ of the compressed blocky image from *Diff*.
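
As an illustration of how *Diff* can be collected, the sketch below gathers one-half the absolute intensity difference for every pair of pixels that face each other across a block boundary. It assumes $8\times 8$ blocks aligned with the image origin; the function name is hypothetical, and the mapping from these values to ${\sigma}_{N}$ is not reproduced in this sketch.

```python
import numpy as np

def boundary_diffs(img, block=8):
    """Return Diff = |I(s1) - I(s2)| / 2 for all pixel pairs across boundaries."""
    img = np.asarray(img, dtype=float)
    cols = np.arange(block, img.shape[1], block)   # first column of each block
    rows = np.arange(block, img.shape[0], block)   # first row of each block
    # Pairs across vertical boundaries (horizontal discontinuities).
    horiz = np.abs(img[:, cols - 1] - img[:, cols]) / 2.0
    # Pairs across horizontal boundaries (vertical discontinuities).
    vert = np.abs(img[rows - 1, :] - img[rows, :]) / 2.0
    return np.concatenate([horiz.ravel(), vert.ravel()])
```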

Then, $T$ is computed based on ${\sigma}_{N}$. In the previous works for image denoising,^{34}^{,}^{53}^{,}^{57} ${T}_{\mathrm{old}}$ is obtained by the following equation:

## (8)

$${T}_{\mathrm{old}}=C\cdot {\sigma}_{N}.$$

Here, the noise gain $C$ is set to 1.15. In the JPEG coding standard, the most important parameter is the quality $q$, which takes a value between 0 and 100. The higher $q$ is, the less the image is degraded by compression; however, when $q$ is high, the resulting file size is large. For image deblocking, we found through various experiments that ${T}_{\mathrm{old}}$ fits well only when $q$ is 10. In other cases, ${T}_{\mathrm{old}}$ obtained by Eq. (8) does not follow the distribution of the error threshold $T$ of Eq. (6). Instead, we found that ${T}_{\mathrm{new}}/{T}_{\mathrm{old}}$ follows a nonlinear distribution according to a given quality $q$, as shown in Fig. 9. Thus, we modify Eq. (8) as follows:

## (9)

$${T}_{\mathrm{new}}={T}_{\mathrm{old}}\cdot \left(\frac{a}{q+b}+c\right)=C\cdot {\sigma}_{N}\cdot \left(\frac{a}{q+b}+c\right),$$

As shown in Fig. 10, six typical images were used for the tests: *Barbara*, *Lena*, *Boat*, *Peppers*, *Baboon*, and *Fruits*, each of size $512\times 512$ pixels. In the training stage, a total of 91 natural images provided by Yang et al.^{51} were used to generate a general dictionary. The dictionary size and all parameters, including $C$, $a$, $b$, and $c$ of Eqs. (8) and (9), were determined on the training data set. The dictionary was trained using K-SVD from 100,000 randomly sampled image patches of $8\times 8$ pixels each. Thus, the size of the training data was $64\times 100{,}000$ pixels. We performed the experiments for $q$ up to 20 because the blocking effects mainly occur when $q$ is between 0 and 20.^{36} A dictionary with 512 atoms was used in our experiments. Figure 11 shows the dictionary generated from the training data. Figures 12 and 13 show the JPEG compressed images and their deblocked results for the *Barbara* and *Baboon* images, respectively, at different quality values, i.e., $q$ is 1, 5, 10, 15, or 20. It can be observed that the lower $q$ is, the more blocking artifacts occur along block boundaries in the compressed images. This is because the transform coefficients of the blocks are quantized independently in BDCT-based image compression. As can be seen in (a)-(e) of the figures, the blocking artifacts seriously degrade the picture quality, and they are remarkably reduced as the quality increases. In the figures, (f)-(j) show the results of blocking artifact reduction by the proposed method. It can be observed that the proposed method suppresses the blocking artifacts efficiently and improves the picture quality, especially along block boundaries where the block discontinuities are severe.
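
Equation (9) translates directly into a small helper. In the sketch below, the function name is hypothetical, and the values of `a`, `b`, and `c` must be supplied by the caller because the fitted values are not listed in the text; only the noise gain default of 1.15 is taken from the article.

```python
def error_threshold(sigma_n, q, a, b, c, noise_gain=1.15):
    """Quality-adaptive OMP error threshold T_new following Eq. (9).

    sigma_n:    estimated standard deviation of the quantization noise
    q:          JPEG quality
    a, b, c:    fitting parameters (caller-supplied, hypothetical values)
    noise_gain: the noise gain C
    """
    t_old = noise_gain * sigma_n                 # T_old = C * sigma_N
    return t_old * (a / (q + b) + c)             # T_new = T_old * (a/(q+b) + c)
```

With `a = 0` and `c = 1`, the helper reduces to the original threshold ${T}_{\mathrm{old}}$.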

To provide a more reliable performance evaluation, we compare our method with a state-of-the-art method based on the FoE prior.^{10} It has been reported that this method achieves the best deblocked results in terms of PSNR. As evaluation metrics, the PSNR and SSIM are used to measure the quality of the estimated images. To simulate various types of BDCT compression, three quantization tables, usually denoted as Q1, Q2, and Q3, have been commonly used by many researchers.^{10}^{,}^{34} The Q1, Q2, and Q3 tables correspond to medium to high compression levels, similar to what can be obtained by using JPEG with $q=11$, $q=9$, and $q=5$, respectively.^{9} Accordingly, in our experiments, these values of $q$ are used instead of the quantization tables when the performance of our method is evaluated because our method is based on the quality information. Table 3 lists the PSNR and SSIM values of the deblocked results obtained by the FoE prior-based method and ours. In the FoE prior-based method,^{10} the FoE prior captures the statistics of natural images, and thus has been effectively employed for image denoising and inpainting.^{45}^{,}^{46} The FoE prior has also been successfully applied to deblocking of BDCT compressed images.^{10} We obtained the corresponding software for evaluation at http://www.cs.brown.edu/~dqsun/research/software.html. In the experiments, the FoE filter size is $5\times 5$ pixels and the maximum number of iterations is 200. In the FoE prior-based method,^{10} the narrow quantization constraint set (NQCS)^{47} has been used to obtain a higher PSNR gain in the deblocked results, and thus we also report the PSNR values improved by NQCS (see the 7th column). Combined with the NQCS method,^{47} our method generally achieves the best PSNR and SSIM results on the test images. In the table, the bold numbers represent the best PSNR and SSIM values for each image at each quality.

## Table 3

Performance evaluation results from test images using the proposed and FoE prior-based methods.^{a}

Image | Quality | Metric | JPEG | FoE-based method (Ref. 10) | Our method | Ours+NQCS (Ref. 47) |
---|---|---|---|---|---|---|
Barbara | q=11 | PSNR | 26.0311 | 26.7018 | 26.8194 | **26.9108** |
Barbara | q=11 | SSIM | 0.7761 | 0.7998 | 0.7966 | **0.8081** |
Barbara | q=9 | PSNR | 25.5054 | 26.2071 | 26.3213 | **26.4032** |
Barbara | q=9 | SSIM | 0.7466 | 0.7780 | 0.7732 | **0.7846** |
Barbara | q=5 | PSNR | 24.0165 | 24.4042 | 24.981 | **25.0092** |
Barbara | q=5 | SSIM | 0.6579 | 0.6751 | 0.7121 | **0.7172** |
Lena | q=11 | PSNR | 30.7633 | 31.9666 | 31.9513 | **31.9696** |
Lena | q=11 | SSIM | 0.8271 | 0.8626 | 0.8627 | **0.8641** |
Lena | q=9 | PSNR | 29.9766 | **31.3018** | 31.2704 | 31.2902 |
Lena | q=9 | SSIM | 0.8069 | 0.8515 | 0.8506 | **0.8521** |
Lena | q=5 | PSNR | 27.319 | 27.7019 | 28.8602 | **28.8650** |
Lena | q=5 | SSIM | 0.7394 | 0.7620 | 0.8065 | **0.8070** |
Boat | q=11 | PSNR | 28.4561 | **29.4076** | 29.3438 | 29.3956 |
Boat | q=11 | SSIM | 0.77 | 0.7979 | 0.7937 | **0.7997** |
Boat | q=9 | PSNR | 27.7544 | **28.7647** | 28.6988 | 28.7522 |
Boat | q=9 | SSIM | 0.7441 | 0.7789 | 0.7721 | **0.7794** |
Boat | q=5 | PSNR | 25.4801 | 25.8192 | 26.6330 | **26.6683** |
Boat | q=5 | SSIM | 0.6514 | 0.6708 | 0.6976 | **0.7030** |
Peppers | q=11 | PSNR | 30.7451 | **32.0341** | 31.8679 | 31.8787 |
Peppers | q=11 | SSIM | 0.7951 | **0.8356** | 0.8324 | 0.8328 |
Peppers | q=9 | PSNR | 30.011 | **31.4735** | 31.3072 | 31.3119 |
Peppers | q=9 | SSIM | 0.7761 | **0.8276** | 0.8239 | 0.8238 |
Peppers | q=5 | PSNR | 27.4385 | 27.8965 | **29.1197** | 29.1138 |
Peppers | q=5 | SSIM | 0.7074 | 0.7339 | **0.7864** | 0.7854 |
Baboon | q=11 | PSNR | 24.5851 | 24.9784 | 25.0323 | **25.0877** |
Baboon | q=11 | SSIM | 0.6891 | 0.6833 | 0.6789 | **0.6949** |
Baboon | q=9 | PSNR | 24.048 | 24.4971 | 24.5315 | **24.5883** |
Baboon | q=9 | SSIM | 0.6535 | 0.6517 | 0.6424 | **0.6604** |
Baboon | q=5 | PSNR | 22.3936 | 22.5909 | 23.0026 | **23.0460** |
Baboon | q=5 | SSIM | 0.5245 | 0.5356 | 0.5217 | **0.5389** |
Fruits | q=11 | PSNR | 30.1973 | **31.4000** | 31.3322 | 31.3977 |
Fruits | q=11 | SSIM | 0.7961 | 0.8391 | 0.8378 | **0.8414** |
Fruits | q=9 | PSNR | 29.4625 | 30.7641 | 30.7147 | **30.7725** |
Fruits | q=9 | SSIM | 0.7758 | 0.8275 | 0.8262 | **0.8294** |
Fruits | q=5 | PSNR | 27.0479 | 27.5133 | 28.5934 | **28.623** |
Fruits | q=5 | SSIM | 0.7043 | 0.7297 | 0.7819 | **0.7829** |

^{a}In the FoE prior-based method,^{10} the results combined with the NQCS method^{47} are reported. The bold numbers represent the best PSNR and SSIM values of each image at each quality. The unit of PSNR is dB.
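
One way to read Table 3 is to compare the PSNR of the "Ours+NQCS" column with that of the FoE-based column at the lowest and highest compression levels. The short sketch below does this arithmetic on values copied from the table. The dictionary layout and the helper name `mean_gain` are assumptions made for illustration only.

```python
# PSNR values (dB) copied from Table 3: (FoE-based method, Ours+NQCS).
PSNR_TABLE = {
    "q=11": {
        "Barbara": (26.7018, 26.9108), "Lena": (31.9666, 31.9696),
        "Boat": (29.4076, 29.3956), "Peppers": (32.0341, 31.8787),
        "Baboon": (24.9784, 25.0877), "Fruits": (31.4000, 31.3977),
    },
    "q=5": {
        "Barbara": (24.4042, 25.0092), "Lena": (27.7019, 28.8650),
        "Boat": (25.8192, 26.6683), "Peppers": (27.8965, 29.1138),
        "Baboon": (22.5909, 23.0460), "Fruits": (27.5133, 28.623),
    },
}


def mean_gain(quality):
    """Average PSNR difference (Ours+NQCS minus FoE) over the six images."""
    rows = PSNR_TABLE[quality]
    return sum(ours - foe for foe, ours in rows.values()) / len(rows)


for quality in PSNR_TABLE:
    print(quality, round(mean_gain(quality), 4))
```

On these numbers the two methods are nearly tied on average at $q=11$, while at $q=5$ the gap widens in favor of the proposed method for every test image. This suggests that the benefit is largest at the strongest compression level tested.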

## 5.

## Practical Considerations for Mobile Applications

Currently, high-end mobile phones, usually referred to as smartphones, support multiple radio standards and a rich suite of applications, including advanced radio, audio, video, and graphics processing. Using multiple chips such as a baseband processor and an application processor, they provide more advanced computing ability and connectivity than contemporary feature phones. Moreover, new functionalities are expected to be added to smartphones at an increasing rate; however, increases in battery capacity have not matched increases in functionality.^{58}^{–}^{62} In fact, battery capacities have not been growing by more than 10% every year, whereas the number of features and applications has been growing much faster.^{59} Thus, the need for low power and high performance is growing at a significantly higher rate. As listed in Table 4, the present workload of a 3.5 G smartphone amounts to nearly 100 giga operations per second (GOPS). This workload increases at a steady rate, roughly by an order of magnitude every 5 years. The workload is partitioned into application processing, radio processing, media processing, and 3D graphics. Among them, about 60% of the workload is used for radio and application processing. More than 30% of the workload is assigned to media processing, including functions such as display processing, camera processing, video decoding, and video encoding. Here, video encoding requires the largest number of operations, i.e., 17 GOPS. Within the workload for media processing, 10 GOPS is available, and thus two new functions (e.g., face hallucination and image deblocking) can be realized using it. Recently, a multicore architecture for mobile applications has been proposed to support a workload of 100 GOPS at 1 W.^{58} We believe the multicore architecture can be effectively employed for implementing the new functions.
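
The gap between workload growth and battery growth can be made concrete with a few lines of arithmetic. The sketch below uses only the figures quoted above (a tenfold workload increase every 5 years, at most 10% battery growth per year, and 100 GOPS at 1 W). The variable and function names are hypothetical, and the "efficiency gap" assumes that battery life is to stay constant, which is an assumption of this illustration rather than a claim of the cited studies.

```python
WORKLOAD_FACTOR_PER_5_YEARS = 10.0   # "an order of magnitude every 5 years"
BATTERY_GROWTH_PER_YEAR_MAX = 0.10   # "not more than 10% every year"


def annual_rate(factor, years):
    """Compound annual growth rate that yields `factor` after `years`."""
    return factor ** (1.0 / years) - 1.0


workload_annual = annual_rate(WORKLOAD_FACTOR_PER_5_YEARS, 5)
battery_factor_5_years = (1.0 + BATTERY_GROWTH_PER_YEAR_MAX) ** 5

# Assumed reading: to keep battery life constant, operations per unit
# of energy must improve by at least this factor every 5 years.
efficiency_gap = WORKLOAD_FACTOR_PER_5_YEARS / battery_factor_5_years

# Energy budget per operation for 100 GOPS at 1 W, in picojoules.
energy_per_op_pj = 1.0 / 100e9 * 1e12

print(f"workload growth per year: {workload_annual:.1%}")
print(f"battery growth over 5 years: x{battery_factor_5_years:.2f}")
print(f"required efficiency gain over 5 years: x{efficiency_gap:.1f}")
print(f"energy per operation: {energy_per_op_pj:.0f} pJ")
```

Under these assumptions the workload grows by roughly 58% per year, while the battery grows by at most a factor of about 1.6 in 5 years. That leaves about a sixfold efficiency improvement to be found in the hardware architecture and algorithms.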

## Table 4

Mobile phone trends in 5-year intervals.^{58}

Year | 1995 | 2000 | 2005 | 2010 | 2015 |
---|---|---|---|---|---|
Cellular generation | 2 G | 2.5-3 G | 3.5 G | Pre-4 G | 4 G |
Cellular standards | GSM | GPRS, UMTS | HSPA | HSPA, LTE | LTE, LTE-A |
Downlink bitrate (Mb/s) | 0.01 | 0.1 | 1 | 10 | 100 |
Battery capacity (Wh) | 1 | 2 | 3 | 4 | 5 |
Phone CPU clock (MHz) | 20 | 100 | 200 | 500 | 1000 |
Phone CPU power (W) | 0.05 | 0.05 | 0.1 | 0.2 | 0.3 |
Workload (GOPS) | 0.1 | 1 | 10 | 100 | 1000 |
Number of programmable cores | 1 | 2 | 4 | 8 | 16 |

Another way to implement the new functions is to use graphics processing unit (GPU)-based parallelization. Fortunately, because of the strong computational locality of video processing algorithms, video processing is highly amenable to parallel processing. Such locality makes it possible to divide video processing tasks into smaller, weakly interacting pieces for parallel computing.^{63} GPU-based parallelization can drastically reduce the processing time, and thus effective parallel architectures and programming can also be used to implement the new functions for mobile applications.
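
The idea of dividing an image operation into weakly interacting pieces can be sketched on the CPU. The example below is not GPU code and is not the method of Ref. 63. It uses a thread pool from the Python standard library as a stand-in for parallel hardware, and a $3\times 3$ mean filter as a stand-in for a local image operation. The function names and the strip height are hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor


def box_filter_rows(image, row_start, row_end):
    """3x3 mean filter for rows [row_start, row_end), clamping at the borders.
    Each output pixel reads only its immediate neighbors (locality)."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = []
        for c in range(width):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    total += image[rr][cc]
            row.append(total / 9.0)
        out.append(row)
    return out


def box_filter_tiled(image, strip_height=4, workers=4):
    """Split the image into horizontal strips and filter them concurrently.
    Strips interact only through one read-only halo row on each side."""
    height = len(image)
    bounds = [(r, min(r + strip_height, height))
              for r in range(0, height, strip_height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(lambda b: box_filter_rows(image, *b), bounds))
    # pool.map preserves the order of `bounds`, so strips stack back in place.
    return [row for strip in strips for row in strip]


frame = [[float((3 * r + 5 * c) % 17) for c in range(12)] for r in range(10)]
whole = box_filter_rows(frame, 0, len(frame))
tiled = box_filter_tiled(frame)
print(tiled == whole)
```

Because every strip needs only one extra row from each neighbor, the strips can be processed in any order or at the same time and still reproduce the whole-frame result. In CPython the threads mainly illustrate the decomposition; an actual speedup would require hardware-parallel execution such as a GPU or multiple cores running native code.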

## 6.

## Conclusions

In this article, we presented two core technologies for high-quality image communications from the point of view of image processing: face hallucination and compression artifact reduction. These technologies are closely related to inverse problems in image processing, and thus we have described recent studies and our related research results that deal with the inverse problems effectively. When image data are transmitted over mobile communication networks, data loss inevitably occurs in the high frequency components of images because of lossy compression techniques. Thus, the quality of facial regions (i.e., the main interest in image communications) is reduced and several compression artifacts inevitably occur. We have demonstrated that convex optimization and sparse representation can be effectively employed for solving the inverse problems and achieving high-quality image communications. In addition, to implement the technologies in actual mobile devices, power management is a critical issue because of the limited capacity of batteries. Therefore, this article has also discussed practical considerations and possible solutions for implementing the two technologies in mobile applications.

Nowadays, displays of many different sizes, including mobile displays, have come into wide use. They face the same problems of high-quality image reconstruction. We believe the two technologies can be effectively employed for enhancing image quality in various displays.

## Acknowledgments

The authors would like to thank all the anonymous reviewers for their valuable comments and useful suggestions on this paper. This work was supported by the National Natural Science Foundation of China (Nos. 61050110144, 60803097, 60972148, 60971128, 60970066, 61072106, 61075041, 61003198, 61001206, and 61077009), the National Research Foundation for the Doctoral Program of Higher Education of China (Nos. 200807010003 and 20100203120005), the National Science and Technology Ministry of China (Nos. 9140A07011810DZ0107 and 9140A07021010DZ0131), the Key Project of Ministry of Education of China (No. 108115), and the Fundamental Research Funds for the Central Universities (Nos. JY10000902001, K50510020001, and JY10000902045).

## References

## Biography

**Cheolkon Jung** received the BS, MS, and PhD degrees in electronic engineering from Sungkyunkwan University, Republic of Korea, in 1995, 1997, and 2002, respectively. He is currently a professor at Xidian University, China. His main research interests include computer vision, pattern recognition, image and video processing, multimedia content analysis and management, and 3D TV.

**Licheng Jiao** received the BS degree from Shanghai Jiao Tong University, China, in 1982, and the MS and PhD degrees from Xian Jiao Tong University, China, in 1984 and 1990, respectively. From 1990 to 1991, he was a postdoctoral fellow in the National Key Lab for Radar Signal Processing at Xidian University, China. Since 1992, he has been with the School of Electronic Engineering at Xidian University, China, where he is currently a distinguished professor. He is the dean of the School of Electronic Engineering and the Institute of Intelligent Information Processing at Xidian University, China. His current research interests include signal and image processing, nonlinear circuit and systems theory, learning theory and algorithms, computational vision, computational neuroscience, optimization problems, wavelet theory, and data mining.

**Bing Liu** received the BS degree in electronic engineering from Henan Polytechnic University, China, in 2009. He is currently pursuing the MS degree at Xidian University, China. His research interests include image processing and machine learning.