Recent experiments show that deep super-resolution (SR) networks with error feedback mechanisms achieve better accuracy than purely feed-forward networks. However, such error-feedback networks use one-step mappings, which makes training difficult at large scale factors. We first propose a stage-progressive back-projection network that reconstructs images progressively by dividing the entire back-projection stage into several levels. This division converts one-step large-scale sampling into multiple samplings with a moderate scale factor: at each level, we perform low-resolution-to-high-resolution and high-resolution-to-low-resolution mappings with a scale factor of 2. To enhance the effect of dense connections and further improve performance, we propose the unit-progressive back-projection network, which builds progressive projection units that avoid one-step mappings with large scale factors. Additionally, we recommend a subpixel convolutional layer and its inverse transform as the mapping layer, since each output pixel is then computed from a larger set of input pixels. Extensive quantitative and qualitative evaluations on benchmark datasets show that our algorithms substantially improve performance on large-scale SR tasks.
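The subpixel rearrangement underlying such a mapping layer, and the way a large factor such as x8 decomposes into three x2 steps, can be sketched in NumPy. This is a minimal illustration of the standard pixel-shuffle (depth-to-space) operation and its inverse; the function names and the single-image (C, H, W) layout are assumptions for the sketch, not the papers' implementation:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: rearrange a (C*r*r, H, W) array into (C, H*r, W*r)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

def pixel_unshuffle(x, r):
    """Space-to-depth: exact inverse of pixel_shuffle (the 'inverse transform')."""
    c, hr, wr = x.shape
    h, w = hr // r, wr // r
    x = x.reshape(c, h, r, w, r)
    x = x.transpose(0, 2, 4, 1, 3)   # (C, r, r, H, W)
    return x.reshape(c * r * r, h, w)

# A x8 upsampling split into three moderate x2 steps, as in the
# stage-progressive idea: each step trades 4 channels for a 2x2 spatial block.
z = np.zeros((64, 4, 4))
for _ in range(3):
    z = pixel_shuffle(z, 2)          # (16, 8, 8) -> (4, 16, 16) -> (1, 32, 32)
```

In a real network each x2 step would be preceded by convolutions that produce the extra channels; here only the loss-free rearrangement is shown.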
The loss function plays an important role in training models for single-image super-resolution. Most convolutional neural network-based models adopt conventional pixel-wise loss functions and make impressive advances in peak signal-to-noise ratio and structural similarity index. However, these losses tend to find the average of plausible solutions, which leads to overly smoothed SR results with low perceptual quality. We propose a loss function that combines a statistics loss with semantic priors and a quality assessment loss to produce a high-resolution (HR) image with high visual quality while maintaining natural image statistics as perceived by human observers. Our statistics loss measures the similarity of deep feature distributions within different semantic blocks and helps preserve natural internal statistics during image restoration. Additionally, we introduce into our loss function a no-reference quality metric focused on several aspects of human perceptual preference, namely lighting, tone, and sharpness, to provide a more visually compelling approximation of human visual perception for perceptual image super-resolution. Experiments demonstrate that our loss function effectively guides the network to generate images of high perceptual quality while limiting structural distortion in single-image super-resolution.
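As a toy illustration of how a statistics term could complement a pixel-wise term, the sketch below compares the per-block mean and variance of feature maps. The moment matching, the boolean semantic masks, and the weight `w_stats` are hypothetical stand-ins for the deep feature distributions, semantic priors, and balancing described above, and the no-reference quality term is omitted:

```python
import numpy as np

def stats_loss(feat_sr, feat_hr, masks):
    """Compare first and second moments of feature distributions per semantic block.

    feat_sr, feat_hr: (C, H, W) feature maps from some feature extractor.
    masks: iterable of (H, W) boolean masks, one per semantic block.
    """
    terms = []
    for m in masks:
        fs, fh = feat_sr[:, m], feat_hr[:, m]   # (C, N) features inside the block
        terms.append(np.mean((fs.mean(1) - fh.mean(1)) ** 2)
                     + np.mean((fs.var(1) - fh.var(1)) ** 2))
    return float(np.mean(terms))

def combined_loss(sr, hr, feat_sr, feat_hr, masks, w_stats=0.1):
    """Pixel-wise L1 plus the weighted statistics term (quality term omitted)."""
    return float(np.mean(np.abs(sr - hr))) + w_stats * stats_loss(feat_sr, feat_hr, masks)
```

Matching moments per semantic block, rather than per pixel, is one simple way to reward natural internal statistics without forcing the averaged, over-smoothed solution a pure pixel-wise loss prefers.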