Single-image superresolution based on local regression and nonlocal self-similarity

Abstract. The challenge of learning-based superresolution (SR) is to predict the relationships between low-resolution (LR) patches and their corresponding high-resolution (HR) patches. By learning such relationships from external training images, the existing learning-based SR approaches are often affected by the relevance between the training data and the LR input image. Therefore, we propose a single-image SR method that learns the LR-HR relations from the given LR image itself instead of any external images. Both the local regression model and nonlocal patch redundancy are exploited in the proposed method. The local regression model is employed to derive the mapping functions between self-LR-HR example patches, and the nonlocal self-similarity gives rise to a high-order derivative estimation of the derived mapping function. Moreover, to fully exploit the multiscale similarities inside the LR input image, we accumulate the previous reconstruction results and their corresponding LR versions as additional example patches for the subsequent estimation process, and adopt a gradual magnification scheme to achieve the desired zooming size step by step. Extensive experiments on benchmark images have validated the effectiveness of the proposed method. Compared to other state-of-the-art SR approaches, the proposed method provides photorealistic HR images with sharp edges.


Introduction
Image superresolution (SR) aims to generate a highresolution (HR) image from a single image or a set of lowresolution (LR) images.With the fast emergence of high-definition displays at consumer electronics market, such as Nexus 7 (1920 × 1200) and iPad3 (2048 × 1536), there is a great demand to super-resolve existing LR images, so that they can be enjoyably viewed on such readily available HR devices without any visual artifacts.Therefore, the SR problem has gained popularity in image processing community in recent years.Ever since the pioneer work proposed by Huang and Tsai in 1984, 1 many improvements have been made in this particular research field.Generally, SR methods that are implemented in spatial domain can be roughly categorized into three classes: the interpolation-based methods, [2][3][4][5][6][7][8] the reconstruction-based methods, [9][10][11] and the learning-based methods, [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]28,29,30 each of which has its own distinctive advantages, prior assumptions, and requirements for additional information. For a coprehensive literature survey, the interested reader is referred to Refs. 31 and 32.The simplest interpolation-based SR methods, such as bicubic and Lanczos, use linear interpolation kernels to predict unknown pixels in a higher grid.Because these linear kernels are usually derived under the assumption that natural images are either spatially smooth or band-limited, the interpolationbased SR methods tend to produce smooth HR images with several visual artifacts, such as ringing, aliasing, blocking, and blurry.2,4 In recent years, some sophisticated methods, [2][3][4][5][6][7][8] in which the kernel coefficients are locally adaptive, have been proposed to reduce these unwanted artifacts.For instance, the regression model used in Refs.5-7 and the adaptive normalized convolution technique used in Ref. 8 are very helpful to approximate local structures and obtain somewhat sharp edges.
The reconstruction-based SR methods regard the SR problem as an inverse imaging procedure.Moreover, considering that far more pixels than the number given need to be determined, this inverse problem is intrinsically ill-posed.To cope with this, certain priors are incorporated.For example, the gradient profile prior 10 and the soft edge smooth prior 11 are commonly used for yielding sharp edges.Besides the prior knowledge, the embedded back-projection policy guarantees the consistency between the resultant HR and the LR input images.In general, methods in this category perform better than the interpolation-based methods, but it is still difficult to recover high-frequency (HF) details.Besides, Lin and Shum 33 have validated that only under a 2× magnification, can such reconstruction-based methods effectively generate fine details, which, of course, greatly limits their applications.
To exceed the limits of the reconstruction-based SR approaches, the learning-based SR methods aim at estimating the HF details that are not explicitly found in the LR input images.In the pioneer work proposed by Freeman et al., 13 these lost HF details are learned from a set of universal LR-HR patch pairs.Several extensions [14][15][16][17][18][19][20][21] have been proposed thus far to better predict the correspondences between LR-HR patches from varieties of training images.Although their successful SR capabilities have been substantiated by extensive experimental results, several researchers 10,[22][23][24] have pointed out that false HF details are very likely to be introduced if the LR input image is incompatible with the training images.Therefore, many self-learning algorithms [22][23][24][25][26][27]29 have been proposed recently to use the LR input image itself as the only source of LR-HR example patches. Suc approaches are all based on the observation that the small patches tend to redundantly repeat themselves many times within the original image scale 34 as well as across different image scales.[22][23][24]27,35 Compared to SR approaches using an external database, these self-learning approaches are able to exploit more relevant examples.22,24 In this paper, we propose a self-learning SR method by taking advantage of local structural information and nonlocal patch redundancy (self-similarity) property [22][23][24]27,35 in natural images.First, a set of self LR-HR patch pairs is formed by splitting the LR input image and its smoothened version into overlapped patches.Based on these example patch pairs, a regression model is established to derive the mapping functions that characterize relationships between the LR-HR patches.By exploiting the multiscale patch redundancy in LR image, we successfully obtain the high-order derivation estimation as well as the approximation error of each LR-HR mapping function.Moreover, since cross-scale selfsimilarity holds better for small scaling factors, 23 we adopt a gradual up-sampling scheme, and accumulate the previous reconstructed HR images and their corresponding LR versions for succeeding estimation.
Contributions of this paper are twofold: 1. Reliable SR reconstruction.In our scheme, both local structural information and nonlocal self-similarity are intimately related in a unified framework.With the aid of the patch redundancy from multiple image scales, we give a closed-form solution to a high-order derivative estimation of the LR-HR mapping function.2. Adaptive structure preservation.Since the mapping function, from low to high resolution grid, is particularly specified for each patch in the initial LR image, the local structure is therefore implicitly preserved.Besides, the high-order derivative estimation also leads to the preservation of complex structures, such as edge orientations (first-order derivatives) and curvatures (second-order derivatives).
The remainder of this paper is organized as follows.The next section gives a brief overview of related work on local regression model and nonlocal similarity in self-learning SR approaches, respectively.Section 3 details the proposed algorithm.Experimental results and comparisons are demonstrated in Sec. 4. Finally, Sec. 5 concludes this paper.

Local Regression Model
The local regression model relates two dependent measurements, y i and x i , via an unspecified regression function fð•Þ (1) in which ε i is the estimation error and ideally ε i → 0 under the smoothness assumption.One typical application for this model is to predict the function value fðxÞ at any point x from the sample pairs ðx i ; y i Þ i ¼ 1; • • • ; J. Without any specific assumptions on the samples or regularization terms, this prediction model is usually solved by an N-term Taylor series, which approximates the observation y i as follows: where f 0 ð•Þ and f ðNÞ ð•Þ denote the first and Nth derivatives of the regression function.Because Eq. ( 2) is derived from a local signal representation of the regression model, it is reasonable to estimate the unknown coefficients ff ðnÞ ðxÞg N n¼0 by using the samples in the vicinity.With respect to this notion, Eq. ( 2) is reformulated as a weighted least-squares optimization problem where J is the number of samples that are near x, and Eq. ( 3) is then expressed in the following matrix form: The function diagð•Þ defines a diagonal operator that for a vector A ¼ ½a 1 ; a 2 ; • • • ; a J T .The solution of Eq. ( 4) is easily given by the weighted least-squares estimation b ¼ ðX T WXÞ −1 X T Wy: (5) The approaches in Refs.5-7 and 28 have validated the effectiveness of the local regression model for recovering complex structures in the SR problem by achieving higherorder derivative approximations (N ≥ 1), 5 i.e., the first-order derivatives correspond to edge orientations and the secondorder derivatives are related to curvatures.

Nonlocal Similarity in Self-Learning
27,35 In particular, Glasner et al. 22 have unified the reconstruction-based SR and the self-learning SR in the same framework, where the same-scale patch recurrence forms the constraints used in reconstruction-based SR and the across-scale patch recurrence generates exemplar LR-HR patch pairs used in the learning-based SR.By confirming that the local self-similarity assumption holds better for small zooming factors, Freedman and Fattal 23 have performed multiple magnification steps of small zooming factors to achieve the desired magnification size, wherein the example patch pairs were extracted from the LR input image and its smoothed version.Both Glasner's and Freedman's works have confirmed the effectiveness of the internal patch recurrence in selflearning SR approaches.Further, Zontak and Irani 24 have shown that the image statistic that is derived from the internal patch recurrence is more predictive and powerful than the statistic derived from the external images.Accordingly, this powerful internal statistic is adopted by several selflearning SR approaches [25][26][27] as an effective regularization term.For example, by introducing the nonlocal similarity constraint, the nonlocal autoregression model (NARM) 25 has ensured the incoherence and sparsity of representation dictionary and consequently made the sparse representation model more suitable for the SR problem.Moreover, under the maximum a posteriori probability framework, Zhang et al. 26 have successfully assembled both the nonlocal prior that is obtained from the self-similarity in the same scale and the local prior that is derived from a specific regression model 6 to regularize the ill-posed SR problem.Later on, they have extended to multiscale self-similarity for a better image detail synthesis. 27Proposed Approach The proposed approach is based on a local regression model and the multiscale nonlocal self-similarity, which will be described respectively in Secs.3.1 and 3.2.Further, an adaptive patch processing strategy is provided in Sec.3.3 to reduce the time consumption.
In the following, we use matrices I and H to represent the LR input and the HR output images; L n is the smoothed version of I, and L nþ1 is an enlarged version of I.The bolded lowercase p and q represent the column vectors of two s × s image patches that are extracted from I and H, respectively; m and n denote the column vectors of two s × s image patches that are extracted from L n and L nþ1 , respectively.Without specific notification, we assume that p, q, m, and n are related.That is, m is the most similar patch to n among all other patches in image L n ; patches mðnÞ and pðqÞ have the same coordinate for the center pixel, as presented by the dashed lines in Fig. 1.Given that image I has more HF details than L n , fm; pg constitutes the self-exemplar LR-HR patch pairs in the proposed method.In addition, because L nþ1 lacks the HF details in H, fn; qg is another group of LR-HR patch pairs.

Local Patch Representation Based on a Local Regression Model
As mentioned above, the task of learning-based SR approaches is to find the relations between LR-HR patches.In the proposed method, this relation is interpreted as an unspecified mapping function that associates each LR-HR patch pair fn; qg in the following way q ¼ fðnÞ.The assumption that m and n are similar patches can also be interpreted as that m is "near" n in terms of geometric layout.Therefore, as in Sec.2.1, this mapping function f can also be estimated via Taylor series where ⊙ denotes the Hadamard product (also known as element-wise product).By neglecting noise and blurry in LR image, we come to p ¼ fðmÞ (which is a very basic assumption in several self-learning based SR approaches, such as Refs.23 and 28), Eq. ( 6) is therefore reformulated as where r m represents the residual error.Although m and n are both taken from smooth images, it is agreed that n comparably contains more HF details than m does, 23 and therefore n − m leaves only the HF information. 13,23,28In view of this, Eq. ( 7), in fact, generates HR patches by imposing HF details to the related LR patches, which is actually the basic scheme of learning-based SR approaches.From Eq. ( 7), we have to infer f 0 ðmÞ and f 0 0 ðmÞ first so as to obtain each HR patch q.It is worth noting that for many existing SR approaches, only the first-order derivative is inferred.For example, in Refs.13 and 23, the first-order derivative is assumed to be a constant, whereas in Ref. 28, this first-order derivative is estimated from an external database.However, it is generally agreed that only using the firstorder derivative is insufficient to recover complex structures, such as curvatures; moreover, using external images is highly probable to introduce unfaithful image details.In the next subsection, we will present how to extend the first-order derivative estimation to the second order by exploiting the multiscale self-similarity property in a given LR image, and also provide a closed-form solution to this second-order estimation.

Derivative Estimation Based on Multiscale Local
Self-Similarity According to nonlocal patch redundancy, each patch is very likely to have many "neighbors," which are geometrically similar to it but are spatially far from it.Suppose m i is a similar patch to m in image L n , as in Eq. ( 6), fðm i Þ can be approximated as follows: The column vector p i is m i 's paired HR patch in image I, and the column vector ξ i also refers to the approximation error as r m does in Eq. ( 7).As in Sec.2.1, by incorporating the Jmost similar patches fm i g J i¼1 and their paired HR patches fp i g J i¼1 , Eq. ( 8) can be transformed into the following optimization problem: where w h ðm i − mÞ ¼ expð−km i − mk 2 2 ∕2h 2 Þ measures the similarity between patches m and m i .Similar to Eq. (3), the above optimization problem could be easily solved by the weighted least-squares estimation that b ¼ ½f 0 ðmÞ; f 0 0 ðmÞ T ¼ ðX T WXÞ −1 X T Wy as shown in Eq. (5).We should remind the reader that unlike Eq. ( 3), in which all the variables are scalars, the variables inside k • k 2 2 in Eq. ( 9) are all in the column vector forms.Thus, we should redefine the specific formation of X, W, and y by taking account of the vector forms and the Hadamard product in Eq. ( 9).In the proposed method, these three variables are devised as follows: where 1 is an s 2 × 1 column vector with all elements equal to one, and y is of size Js 2 × 1 that cascades each column vector After plugging the least-squares solution of Eq. ( 9) back into Eq.( 8), the residual error ξ i is easily obtained.Further, by simply averaging all the ξ i i ¼ 1; • • • J can the approximation error r m in Eq. ( 7) be also obtained.In summary, we not only provide a closed-form solution of the derivative estimation, but also approximate the residual error that is caused by omitting higher-order derivative terms.
Intuitively, for the derivative estimation to be reliable, the number of independent equations in Eq. ( 9) must exceed the number of unknown derivative coefficients.With the assumption that f 0 ðmÞ and f 0 0 ðmÞ are not constant within the patch m (it is a reasonable assumption, for example, in an edge patch, the pixels across this edge are very likely to have larger first-order derivative than pixels along it), the number of unknown derivative coefficients is then related to the patch size and the derivative order.For instance, if a patch contains s × s pixels and a second-order derivative estimation is required, then the derivative coefficients are up to 2s 2 , implying that the demanded number of equations or the number of similar patches m i should be above 2s 2 .Naturally, one question arises: are we able to find such enough similar patches in a single image?A statistical analysis made in Ref. 22 shows that the smooth patches recur much more frequently than the detailed patches do.However, in our case, we have to find the same patch numbers to ensure a reliable second-order derivative estimation, no matter whether these patches are highly detailed or not.Recently, Zontak and Irani 24 have shown that the patches with different gradient magnitudes need to search at different distance to ensure the same number of similar patches, and have also provided an empirical function.According to this function, we surprisingly find that the search region should be even beyond the image size for some highly textured patches so as to get a reliable second-order derivative estimation.This reveals that for highly textured patches, search for their similar patches in the same image scale is very insufficient.
To overcome this insufficient patch problem, we extend the same-scale local search to multiple scales.More specifically, based on the gradual magnification scheme, not only the down-scaled images of the current image resolution, but also all the images from the previous estimates, including the initial input image, are employed to find the similar patches.Take a two-iteration estimate for example.In the first iteration, we respectively search the d2ðs 2 þ 1Þ∕3e similar patches in L 0 n and its two down-scaled images to estimate f 0 ðmÞ and f 0 0 ðmÞ (these two down-scaled images of L 0 n are obtained by down sampling with the bicubic interpolation under scale 0.9 sd , sd ¼ 1, 2).After that, the intermediate HR image L 1 is generated by merging all the reconstructed HR patches according to Eq. ( 7), with a fusion of the overlapping regions between the adjacent patches.In the second iteration, we search the dðs 2 þ 1Þ∕2e similar patches, respectively, in L 0 n , L 1 n and the two down-scaled versions of L 1 n to estimate the second derivatives.Subsequently, we obtain the second intermediate HR image L 2 , and generate the HR output image by resizing L 2 to the desired size using the bicubic interpolation.We can see that for the derivative estimation in higher resolution levels, the required number of similar patches with respect to each image in the training set is reduced because more images in lower resolutions can be utilized.
Although incorporating multiscale images guarantees a reliable derivative estimation, the additional exhaustive search for similar patches in other images would inevitably increase time complexity.Therefore, similar patches in coarser scales and lower resolutions are selected at the "relative" coordinates.For instance, in the second iteration process of the previous example, if the similar patch m i found in L 1 n is centered at ðj; kÞ, then the additional similar patch m ds i in the down-scaled image of L 1 n is centered at (bj × 0.9 sd cbk × 0.9 sd c), where sd ¼ 1; 2 denotes the downscale level, and the similar patch m ls i in the lower-resolution image L 0 n is centered at (bj∕tcbk∕tc), where t > 1 denotes the gradual up-scaling factor.

Computational Speed-Ups
In Sec.3.1, we translate the estimation of the relations between LR-HR patches to the estimation of the regression function derivatives by leveraging a local regression model.In Sec.3.2, we further adopt the nonlocal similarity across multiple image scales to provide a closed-form solution.
In this subsection, an adaptive patch processing strategy is introduced to reduce the time consumption.
Intuitively, there is no need to estimate the high-order derivatives for smooth regions since their intensity values are nearly constant.Moreover, since natural images generally contain much more smooth regions than the detailed ones, 22,28,36 it is efficient to process different patch types (smoothed or detailed) in different ways.To discriminate the textured areas from the flat areas, an effective pixel classification criterion 37 TM is adopted, which provides a quantitative assessment of image geometric structures using the structure tensor and the intensity standard deviation of each patch.More specifically, TM ¼ ½ðs Algorithm 1 The proposed method.
Input: LR image I, patch size s × s, overlapping step o, search region size K × K , gradual up-scaling factor t , and desired up-scaling factor r .
Output: HR image H.
(2) Iteration: (e) For each patch m in Q iter : if its center pixel's TM is lower than T 2 , assign f 0 ðmÞ ¼ 0, f 0 0 ðmÞ ¼ 0, and r m ¼ 0; if its center pixel's TM is larger than T 1 , search a total number of dð2s 2 þ 2Þ∕ðiter þ 3Þe × ðiter þ 3Þ K-NN neighbors in L iter n , the two down-scaled versions of L iter n , and the lower-resolution images fL i n g iter−1 i¼0 , after that, these similar patches and their paired HR patches are used to compute f 0 ðmÞ, f 0 0 ðmÞ, and ξ i according to Eqs. (10), (5), and (8), and r m is obtained by averaging all ξ i s; if its center pixel's TM is in between, only f 0 ðmÞ and ξ i are estimated with a total number of dðs 2 þ 1Þ∕ðiter þ 2Þe × ðiter þ 2Þ similar patches in L iter n , its one down-scaled image, and the lower-resolution images fL i n g iter−1 i¼0 , assign f 0 0 ðmÞ ¼ 0, and r m is obtained by averaging all ξ i s.
(f) For each patch n in R iter , search its nearest neighbor m in Q iter , generate HR patch q using Eq.(7).Current HR image H temp is obtained by merging all q s.(g) Image estimate update: End for where TM ði; jÞ is the TM value of the pixel at location ði; jÞ; T 1 and T 2 are the bin values that correspond, respectively, to 70% and 30% in the cumulative TM histogram; c 1 , c 2 , and c 3 respectively represent the hard-detailed, medium-detailed, and smooth regions.
After the classification, an adaptive derivative order estimation scheme is implemented: for hard-detailed regions, we estimate their second-order derivatives to recover complex structures; for medium-detailed regions, a first-order derivative estimation is sufficient; for smooth regions, we simply paste the results from the previous HR estimation.Experiments show that such an adaptive patch processing strategy greatly reduces the time consumption of secondorder scheme, which estimates the second-order derivative for both hard-and medium-textured regions, to 0.625 times without any obvious visual quality compromise.Algorithm 1 describes the entire SR process of the proposed method.

Experimental Results and Analysis
To evaluate the effectiveness of the proposed method, experiments on 14 standard images (shown in Fig. 2) are conducted.Several state-of-the-art SR approaches, including Glasner's method, 22 Fattal's method, 10 Freedman's method, 23 the GPR method, 29 the SC-based method, 19 Shan's method, 9 Kim's method, 21 the ANR method, 30 and the NARM method, 25 are selected as comparison baselines.The results of the first three approaches are straightly downloaded from http://www.cs.huji .ac.il/~raananf/projects/lss_upscale/sup_images/index.html and for the rest six approaches we run the source codes that are available at their authors' homepage.
Two quantitative measures, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index, 38 are adopted to evaluate the objective performance.A high PSNR score indicates that the reconstructed HR image contains little distortion, and a SSIM value near 1 implies that the resultant HR image has a very similar structure to the ground truth.It is worth noting that due to the gradual magnification scheme, any minor structure distortions and estimation errors may propagate and accumulate over several iterations. 28hus, the HR image obtained by the proposed method is inconsistent with the initial LR image.To alleviate inconsistency, a simple back-projection policy 19 is adopted to postprocess the reconstructed HR image.Unless otherwise specified, the presented SR results are postprocessed by the back-projection.

Experimental Configuration
In the following experiments, we straightly use the images shown in Fig. 2(b) as the LR input images, while the other LR input images are generated by down sampling (using the bicubic interpolator) the images shown in Fig. 2(a).In the proposed method, the patch size is 5 × 5 and four pixels are overlapped between the adjacent patches.To reduce the computational complexity, similar patches are found within the 13 × 13 searching region.The parameter h in Eq. ( 9), which is used to compute the similarity, is fixed to 10.14.The gradual upscaling factor is set to 1.5.The smooth image L iter n is generated by convolving with a Gaussian filter with a standard deviation of 0.65.The upscaled image L iter nþ1 is obtained by bicubic interpolation.

Experimental Results
For color images, we only super resolve their luminance channel since human eyes are more sensitive to illuminant changes.Accordingly, we compare the quantitative difference only on the luminance channel between the ground truth and the HR output image.Table 1 summarizes the quantitative comparisons on images shown in Fig. 2(a).Among those listed approaches, ANR, 30 Kim's method, 21 and SC-based 19 are all trainingbased SR approaches, whereas GPR, 29 Shan's method, 9 NRAM, 25 and the proposed approach are training-free.It is worth noting that Kim's method not only uses external dataset as ANR 30 and SC-based 19 do, but also postprocesses the resultant image by imposing an image edge prior.As in Table 1, Kim's method on average gives the best PSNR and SSIM scores.However, even for the image Butterfly, where Kim's method presents nearly the highest gain in both PSNR and SSIM over the proposed method, we observe no apparent visual difference between these two methods (see Fig. 3 for details), despite that there are some jaggies in our method.Nevertheless, such jaggy effects are in fact introduced by the back-projection process, as illustrated in Figs.3(b) and 3(c).Since jaggies are highly probable to cause the degradation of the objective performance of the proposed method, we believe that using certain de-jagged techniques could further improve our method, a most recent one of which can be found in Ref. 39.
In general, among all the training-free SR approaches, our method gives the best performance in terms of PSNR and SSIM.As for the other training-based SR approaches, we have to point out that they use thousands of training images to assure that enough relevant information can be utilized in the SR reconstruction.However, by using a single image only, the proposed method is able to reach a comparable objective performance, which validates that the relevant example LR-HR patch pairs can be found by fully exploiting the multiscale patch similarity.
For visual comparison, we carry out 3× magnification on LR images obtained from Fig. 2(a) and 4× magnification on images in Fig. 2(b).Since the pretrained dictionaries in the SC-based 19 approach are only for a zoom factor of 2, this approach is excluded in the following visual comparisons.In the first 3× magnification case, the proposed method is compared with the GPR method, 29 Shan's method, 9 Kim's method, 21 the ANR method, 30 and the NARM method. 25igure 4 shows the experimental results on images Barbara, Girl, Fence, and Hat, in which the local regions of interest (ROIs) in blue boxes are presented at the lower corner for providing a better comparison.Based on a universal dictionary, which is pretrained from thousands of images, the ANR method 30 appears to generate natural-looking results.Nevertheless, it is clumsy in preserving sharp edges, e.g., the blurry eyelashes in image Girl, due to the fact that a universal dictionary fails to accurately represent the complex details.Leveraged with a generic image prior model, Kim's method can effectively preserve some tiny details (see the ROI in image Girl), but it is prone to create ringing artifacts along some complicated edges, e.g., the characters in the image Hat.As for the NARM 25 and Shan's methods, 9 they are both somewhat prone to smooth HF details.The GPR method 29 seems to have difficulty in synthesizing faithful details, i.e., the characters in image Hat are badly recovered.By contrast, the proposed algorithm is able to provide sharper edges and more photorealistic details than other methods.For example, in images Barbara and Fence, the details shown in our ROIs are more distinctive in comparison with other methods; in images Girl and Hat, very fine details, Table 1 Comparisons on peak signal-to-noise ratio (PSNR) values (dB) and structural similarity (SSIM) values (PSNR/SSIM) for 2× magnification.
ANR 30 GPR 29 Kim 21 The proposed SC-based 19 NARM 25 Shan 9 such as eyelashes and the contour of characters, are well preserved.
In addition, the objective comparisons of the HR results under 3× magnification are listed in Table 2, where our result appears not as superior as in the case of 2× magnification.However, we also find that our method, in fact, provides the best appealing visual details, i.e., the ROIs in Fig. 4(d) are consistently more vivid than those in Fig. 4(b), although the latter gives the almost highest PSNR and SSIM values; Hat image is poorly reconstructed by GPR but it has higher PSNR value than ours.This paradox suggests that the objective quality degradation of our method can be only attributed to the inconsistency caused by the gradual magnification scheme.Although this inconsistence can be alleviated by the back-projection policy, Freedman and Fattal 23 have further claimed that such inconsistency should be alleviated at an early stage.Their claim also explains why our method gives superior quantitative performance in 2× magnification case but fails to do so in 3× magnification case, since in 2× magnification, only two iterations are required in the proposed method.
To further inspect the effectiveness of our method, we continue to compare with more algorithms in Figs.5-7 under a 4× magnification.Since the code of the NARM 25 method is only devised for zoom factors of 2 and 3, we also exclude this method in the subsequent visual comparisons.Among all the reconstructed HR images shown in Figs.5-7, Freedman's method 23 tends to blur HF details since it simply assumes that the first-order derivative of the mapping function is a constant value.Although Fattal's method 10 applies imposed edge statistics to generate sharp edges, it produces staircase artifacts along edges as shown in Fig. 6(g).Glasner's method 22 avoids these unwanted staircase artifacts and provides somewhat pleasant results.Nevertheless, this particular method fails to reproduce plausible details, such as the character "9" in Fig. 7(f).On the contrary, as reflected in all the ROIs, the proposed method outperforms the others by providing more reasonable details.Although there are some jaggies in our results, such as Fig. 7(e), we should point out that they are mainly introduced by the simple back-projection process.As reflected in Fig. 8(b), prior to the back-projection, our result presents more distinguishable details.Although Fig. 8(b) presents a kind of overestimation, we in fact cause this artifact on purpose since the subsequent back-projection step weakens the details.Moreover, quantitative comparisons are also made.Since no ground truth is available in 4× magnification, we adopt a no-reference image assessment natural image quality evaluator (NIQE) 40 to indicate the quality of reconstructed HR image.NIQE reflects how the perceived image deviates from the statistical regularities calculated from natural images, a small value of which suggests a better image quality.From all the results in Figs.5-7, we see that our method provides the lowest NIQE values.
To sum up, the advantages of the proposed method are attributed to the combination of the local regression model and the multiscale patch redundancy property: (1) we estimate the higher-order derivative of the mapping function, so that complex details such as curvature can be well preserved and enhanced; (2) exploiting the self multiscale redundancy helps to find more related patches than using an external database, without introducing false details.

Discussion on Down-Sampling Method
As illustrated in Ref. 7, the down-sampling method also affects the SR algorithms, because in the synthesis experiments, the LR input images are usually obtained by down sampling the original image.Therefore, if a proper downsampling method is applied, the LR image will have fewer artifacts, which lead to a better reconstructed HR image.In Table 3, the influences of four down-sampling methods on the proposed method, namely, "Nearest," "Bilinear," "Bicubic," and "Lanczos," are quantified in terms of PSNR and SSIM, using the test images in Fig. 2(a) under a 3× magnification.Note that we do not postprocess the SR results (back-projection) in this case, since we want to examine how the down-sampling method influences our algorithm.Surprisingly, the listed numerical results seem to be a violation of the claim in Ref. 7, since the simple bilinear method outperforms others.However, we notice that the SR results on images down sampled by bicubic are always sharper than the results by bilinear method, which, we believe, causes the degradation of objective performance of the bicubic method since overestimation may occur in some highly textured regions.However, we also found in our experiments that after the back-projection, the average PSNR gain obtained by bicubic over bilinear is higher than 4 dB.Kim's method, 21 the proposed approach, NARM, 25 GPR, 29 and Shan's method, 9 respectively.

Discussion on the Derivative Order
Here, we conduct an empirical study on the effect of derivative orders on the recovery of complex details.Four different derivative order settings in the nonsmooth patches are adopted for comparison, such as the zero-order estimation, the first-order estimation, the second-order estimation, and the adaptive order estimation used in the proposed method.The zero-order derivative estimation refers to assigning and r m ¼ 0 in Eq. ( 7), and a similar notion goes for the first-order derivative estimation by assigning f 0 0 ð•Þ ¼ 0. As indicated in Fig. 9, higher-order derivative estimation helps to produce fine textural details.
ANR 30 GPR 29 Kim 21 The proposed NARM 25 Shan 9 For example, the zero-order estimation causes very severe block effects.On the other hand, the other three derivative estimation types successfully restore the complex details.Moreover, achieving second-order derivative estimation even preserves the subtle structure changes in kid's eyelashes, as shown in Figs.9(c) and 9(d).Although there are no apparent visual differences between Figs. 9(c) and 9(d), the adaptive estimation strategy runs 1.6× faster than estimating second-order derivative on all detailed patches.
To sum up, the adaptive order strategy not only synthesizes complex details, but is also time efficient.

Discussion on Parameters
For the proposed method, it is important to choose proper parameters so as to obtain good SR performance.In general, there are six parameters to be determined: the patch size, the overlapping step, the deviation of Gaussian filter, the gradual  30 GPR, 29 Kim's method, 21 Shan's method, 9 the proposed approach, Glasner's method, 22 Fattal's method, 10 and Freedman's method, 23 respectively.GPR, 29 Kim's method, 21 Shan's method, 9 the proposed approach, Glasner's method, 22 Fattal's method, 10 and Freedman's method, 23 respectively.The numbers in brackets denote NIQE 40 values.
scaling factor, the size of search region, and the smoothing parameter h in Eq. ( 9) to compute similarity.We found that our algorithm is insensitive to the last three parameters in a reasonable range.In the following, we will elaborate the influences of the first three parameters, respectively.

Influence of patch size and overlapping step
Intuitively, using a large patch size tends to produce a smooth result and using a small patch size may introduce unwanted artifacts in flat areas.In the proposed method, the patch size also determines the number of derivative coefficients that need to be estimated.For example, if the patch size is a little large, say 9 × 9, for a second-order derivative estimation, we have to find at least 162 similar patches in multiscale versions of the input image.Finding such a large number of patches with high similarity is unrealistic.In addition, the ones that are not the first few nearest patches are very likely to be quite different from the target patch.In light of this, patch size cannot be very large in the proposed method.Therefore, we only discuss the effects caused by different patch sizes of 3 × 3, 5 × 5, and 7 × 7, and show their reconstructed results on image Girl and Hat under a 2× magnification in Fig. 10.
As shown in Fig. 10, the freckles in image Girl are well preserved by using patch size 3 × 3, whereas the other two patch size settings are prone to blurring such details.Nevertheless, some unwanted artifacts are introduced in image Hat by using patch size 3 × 3, such as the black dots along the hat edge, while the results obtained by other patch sizes are more pleasant.Besides, we have found that applying the patch size of 7 × 7 in our approach greatly increases the running time, which takes 10 times more than what is taken by running with 3 × 3 patches.Therefore, to balance the reconstruction quality and the time efficiency, we suggest using the patch size of 5 × 5 in the proposed method.In addition, when patch size is fixed, a larger overlapping step leads to a better SR result.To this end, we use an overlapping step of four in the proposed method, as suggested in the SC-based SR 19 method.

Influence of deviation of Gaussian filter
In the proposed method, the Gaussian filter is used to generate the blurred image L n .A large deviation of the Gaussian filter leads to a blurrier L n , and consequently presents a more obvious HF enhancement according to Eq. (7).A supported example is shown in Fig. 11, in which a higher Gaussian derivation is prone to overestimation.However, considering that the subsequent back-projection will weaken the image details, we recommend a large Gaussian derivation in the first SR process.

Conclusion
In this paper, we present a new learning-based SR method for single image.Without any external training database, the proposed method establishes the relations between self LR-HR patches by leveraging the local regression model and the multiscale nonlocal patch redundancy.To better guarantee the cross-scale similarity, a gradual up-scaling scheme is also adopted, and previous estimates are accumulated for the next derivative estimation.In addition, an adaptive patch-based processing strategy is employed to reduce time consumption.Extensive experimental results, further, show how this proposed method improves over many existing learning-based SR approaches.In general, benefiting from a closed-form solution to high-order derivative estimation, the proposed method yields more photorealistic HR images with sharp details.
In the future, we plan to extend the proposed method in the following ways.(1) Adaptive patch size.As illustrated in Sec.4.5.1, a large patch size is appropriate in smooth regions and a small patch size helps to recover subtle image details.Therefore, applying these two complementary properties in a combined adaptive patch size policy will further improve the proposed method.(2) Suitable patch features.Several works have pointed out that the way to represent the image patches also affects SR algorithms.In the proposed method, the patch itself is directly used as a basic feature.Other feature descriptors can also be employed, such as contrast-normalized patch and the patch derivative.(3) Contaminated image.Currently, the proposed method could only deal with clean images, and directly applying it to contaminated images  would either introduce unwanted artifacts, e.g., the noise amplification effect in noisy images, or no significant improvement in blurry images.For noisy images, the assumption in Eq. ( 6) that p ¼ fðmÞ is no longer valid; for blurry images, the blur kernel used in generating L n should be refined. 41][43]

Fig. 1
Fig. 1 Patch relations in the proposed method.
in which si;1 and si;2 are the eigen values of the structure tensor computed from each image patch, and ζ i is the second moment of a gray-level cumulative histogram of that patch.By analyzing the cumulative TM histogram, three regions with different texture magnitudes are obtained: ði; jÞ ∈ 8 < :

( 3 )
Interpolation: Use bicubic interpolation to resize image L T to the desired size for the final HR image H. Journal of Electronic Imaging 033014-5 May∕Jun 2014 • Vol.23(3) Hu and Luo: Single-image superresolution based on local regression and nonlocal self-similarity

Fig. 2
Fig. 2 Test images used in our experiments.(a) Original HR images used in 2× and 3× magnification case.(b) LR images used in 4× magnification case.

Fig. 8
Fig. 8 Comparison of images.(a) after back-projection and (b) before back-projection.

Fig. 9
Fig. 9 The SR results (before back-projection) obtained by different derivative orders.(a) Zero order.(b) First order.(c) Adaptive order.(d) Second order.Note that we make the overestimate in (b)-(d) on purpose since the subsequent back-projection step weakens the details.

Fig. 11
Fig. 11 The SR results (before back-projection) obtained by different Gaussian deviations: (a) 0.45, (b) 0.5, (c) 0.55, (d) 0.6, and (e) 0.65. 2 ðx 2 − xÞ; • • • ; w h ðx J − xÞ T g, and • • • ; y J T , b ¼ ½fðxÞ; f 0 ðxÞ; • • • ; f ðNÞ ðxÞ T , W ¼ diagf½w h ðx 1 − xÞ; w h Calculate L iter n by convolving L iter with a Gaussian filter.(b) Calculate TM for L iter n , and obtain thresholds T 1 and T 2 from the cumulative TM histogram.(c) Calculate L iter nþ1 by upscaling L iter with the bicubic interpolation under the t× magnification factor.(d) Partition L iter and L iter n into s × s image patches in raster-scan order to construct the HR and LR training set P iter ¼ fpg and Q iter ¼ fmg, respectively; partition L iter nþ1 into s × s image patches with o pixels overlapped in raster-scan order to construct the LR test patch set R iter ¼ fng.

Table 3
Comparisons of different down-sampling methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).