Deep convolutional neural networks (CNNs) have achieved considerable success in image denoising. However, previous CNN denoisers have been constrained by rigid convolution kernels that treat every spatial location equally. To fully exploit local differences, we propose a kernel prediction network that examines each pixel region and predicts a unique pixel-wise kernel. Several further optimizations are designed to gather sufficient information for the single-image denoising task. We adopt dilated residual blocks to view the local pixel region at varying receptive fields. Kernel fusion then assembles the information from these different scopes and generates an accurate kernel for each pixel. Instead of applying the predicted kernels to the original image, we construct a compressed feature map as a substitute so that more relevant local features are collected. Experiments demonstrate that our network achieves favorable results compared with state-of-the-art methods and is adequate for practical applications.
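The core operation of a kernel prediction network, applying a distinct predicted kernel at every pixel instead of one shared kernel, can be sketched as follows. This is a minimal single-channel illustration with hypothetical names (`apply_pixelwise_kernels`), not the paper's implementation; the kernels here would come from the prediction branch.

```python
def apply_pixelwise_kernels(feat, kernels, k=3):
    """Apply a distinct k x k kernel at each pixel (kernel-prediction step).

    feat    : H x W nested lists (single-channel feature map)
    kernels : H x W nested lists of flattened k*k weights, one per pixel
    """
    h, w = len(feat), len(feat[0])
    r = k // 2
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, (dy, dx) in enumerate(offsets):
                yy = min(max(y + dy, 0), h - 1)  # replicate-pad at borders
                xx = min(max(x + dx, 0), w - 1)
                acc += kernels[y][x][i] * feat[yy][xx]
            out[y][x] = acc
    return out
```

A predicted identity kernel (center weight one) leaves a pixel unchanged, while a smoothing kernel averages its neighborhood, so the network can adapt the amount of denoising per pixel.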
Deep convolutional neural networks (CNNs) have achieved considerable success in image denoising. However, they still lack consistent performance across different noise types and levels. We extend the noise scenarios to four categories: Gaussian, random-impulse, salt-and-pepper, and Poisson. We also propose a multinoise-type blind denoising network (MBDNet) that solves the blind denoising task with a uniform deep CNN architecture. The network is divided into two stages: a concise CNN first estimates auxiliary noise-type and noise-level information; the estimates are then integrated as additional channels of the noisy image and fed to the subsequent denoising stage. A unique two-branch structure is further adopted in the residual denoising CNN, wherein a shallow branch predicts a filter-flow mask and adaptively adjusts the feature extraction of the parallel deep branch. Extensive experiments on synthetic noisy images validate the effectiveness of the noise-estimation and denoising subnetworks and show that MBDNet is highly competitive with state-of-the-art methods in both denoising performance and model runtime.
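The hand-off between the two stages, encoding the estimated noise type and level as extra input channels, can be sketched as below. The function name and the one-hot-plus-scalar encoding are assumptions for illustration; the abstract only states that estimates become additional channels.

```python
def condition_on_noise(noisy, noise_type_id, noise_level, num_types=4):
    """Stack a noisy image with constant maps encoding the estimated noise
    type (one-hot over num_types) and noise level, as input to a second
    denoising stage. noisy is C x H x W nested lists."""
    h, w = len(noisy[0]), len(noisy[0][0])
    type_maps = [[[1.0 if t == noise_type_id else 0.0] * w for _ in range(h)]
                 for t in range(num_types)]
    level_map = [[[noise_level] * w for _ in range(h)]]
    # output has C + num_types + 1 channels
    return noisy + type_maps + level_map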
A traditional image pyramid is an effective way to extract multiscale features. For image restoration, however, downsampling and upsampling lose details at the original resolution and produce over-smooth results. To overcome this defect, we propose a receptive pyramid (RP) that replaces downsampling with dilated convolution, extracting multiscale features at the original resolution and performing a full-resolution correction. We present an RP-based convolutional neural network, the receptive pyramid convolutional network (RPCN), for efficient color image compression artifact reduction (CAR). Specifically, we propose residual-connected convolution blocks as the baseline of RPCN. RPs work within the convolution blocks to extract hierarchical multiscale features for local feature correction. Moreover, global feature fusion and vector correction are introduced to further exploit the hierarchical features from the baseline. Benefiting from the RP, our RPCN achieves state-of-the-art performance on the CAR task with a much smaller model and a much faster runtime.
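The reason dilated convolution can stand in for a pyramid is a matter of simple arithmetic: a k x k convolution with dilation d covers the same spatial extent as a (k - 1) * d + 1 window, without discarding any pixels. The helper name below is hypothetical; the formula for stacked stride-1 layers is standard.

```python
def receptive_field(dilations, k=3):
    """Effective receptive field of stacked k x k dilated convolutions
    (stride 1): each layer with dilation d adds (k - 1) * d pixels."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf
```

Three plain 3x3 layers reach a 7-pixel receptive field, while dilations of 1, 2, and 4 reach 15 pixels with the same parameter count and at full resolution, which is the multiscale view the RP exploits.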
Video super-resolution (VSR) aims to restore a high-resolution frame from its corresponding reference frame and a series of neighboring frames in low resolution. Because of the displacement between the reference frame and each neighboring frame, caused by the motion of the camera or of observed objects, VSR methods usually first align the neighboring frames with the reference frame. However, commonly used motion estimation and compensation methods depend heavily on the predicted optical flow and are affected by lighting changes. We propose a phase-aided deformable alignment network (PDAN) to alleviate these problems. In PDAN, a phase-based method is introduced into the motion estimation subnetwork alongside the traditional U-Net structure, which improves restoration robustness under lighting changes. A deformable convolutional network is further adopted in the alignment subnetwork to enhance the alignment of objects with irregular shapes and motion distortion, which also avoids dependence on the accuracy of explicit motion representations. Moreover, the reconstruction module is optimized for improved restoration. Extensive experiments demonstrate that PDAN achieves state-of-the-art quantitative and qualitative performance.
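Why phase is a robust motion cue can be seen in a toy 1-D analogue: a spatial shift appears as a linear phase ramp in the Fourier domain, and normalizing away the magnitudes makes the estimate insensitive to global brightness changes. This classic phase-correlation sketch is only an illustration of the principle, not PDAN's actual subnetwork.

```python
import cmath

def estimate_shift(a, b):
    """Estimate the cyclic shift s with b[t] = a[(t - s) % n] via phase
    correlation: the normalized cross-power spectrum of the DFTs, inverse
    transformed, peaks at the displacement."""
    n = len(a)

    def dft(x):
        return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)) for k in range(n)]

    fa, fb = dft(a), dft(b)
    cross = []
    for u, v in zip(fa, fb):
        p = u.conjugate() * v
        # normalizing discards magnitude (hence lighting) information
        cross.append(p / abs(p) if abs(p) > 1e-12 else 0j)
    corr = [abs(sum(cross[k] * cmath.exp(2j * cmath.pi * k * t / n)
                    for k in range(n))) for t in range(n)]
    return corr.index(max(corr))
```

Scaling `b` by a constant leaves the normalized spectrum, and hence the estimated shift, unchanged, which is exactly the robustness to lighting change that motivates the phase-based branch.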
Traditional frame interpolation algorithms typically find dense correspondences to synthesize an in-between frame. Finding correspondences is often sensitive to occlusion, disocclusion, and changes in color or luminance. We present a phase-feature-aided multiframe interpolation network that estimates multiple in-between frames in one pass and handles challenging scenarios such as extreme light changes and occlusion. We first model the relations among the multiple in-between frames to enhance temporal consistency. Two candidate optical flow fields are produced for a given in-between frame: one predicted by our network and the other estimated from the flows of neighboring frames using a flow fusion map. We also employ an image fusion map to combat occlusion in the warping process, producing two candidate interpolated images that are fed to a shallow network with a residual structure to obtain the final interpolated image. To handle challenging scenarios, we apply a set of Gabor filters to extract phase variations in the feature domain with a multiscale phase subnetwork. The entire neural network is end-to-end trainable. Our experiments show that this method outperforms state-of-the-art approaches and achieves marked visual improvement in challenging scenarios.
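The fusion-map idea, blending two candidates with a per-pixel weight so that each output pixel can come from whichever candidate is not occluded, reduces to a per-pixel convex combination. A minimal single-channel sketch, with a hypothetical name:

```python
def fuse_candidates(img_a, img_b, mask):
    """Blend two candidate interpolated frames with a per-pixel fusion
    map m in [0, 1]: out = m * a + (1 - m) * b."""
    return [[m * a + (1.0 - m) * b
             for a, b, m in zip(row_a, row_b, row_m)]
            for row_a, row_b, row_m in zip(img_a, img_b, mask)]
```

A mask value of 1 selects the first candidate outright, 0 selects the second, and intermediate values mix them; in the network the mask itself is predicted, so the choice is learned per pixel.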
The use of convolutional neural networks (CNNs) for general no-reference image quality assessment (NR-IQA) has seen tremendous growth in the research community. Most of these methods use patches cropped from the original images for training, so the ‘ground truth’ quality of the patches is essential. In practice, these methods often take the quality score of an original image directly as the label for each of its patches. However, the perceptual quality of an image patch generally differs from that of the whole image, and this noise in the patch labels may hinder effective training of the CNN. In this paper, we propose a CNN with two branches for general no-reference image quality assessment. One branch of the model predicts the patch quality, and the other predicts the uncertainty, which denotes the degree to which the patch quality deviates from the image quality. Our model can be trained in an end-to-end manner by minimizing a joint loss. We tested our model on widely used image quality databases and show that it performs better than or comparably with state-of-the-art NR-IQA algorithms.
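One plausible instantiation of such a joint loss is a heteroscedastic (Gaussian negative-log-likelihood) form, where a large predicted uncertainty down-weights the squared error of a patch whose quality deviates from the image label. The exact loss and aggregation rule are not given in the abstract, so the sketch below is an assumption for illustration.

```python
import math

def joint_loss(patch_q, patch_sigma, image_score):
    """One plausible two-branch joint loss: patches that deviate from the
    image score can 'explain away' their error with a larger sigma, at the
    cost of the log-sigma penalty."""
    loss = 0.0
    for q, s in zip(patch_q, patch_sigma):
        loss += (q - image_score) ** 2 / (2.0 * s ** 2) + math.log(s)
    return loss / len(patch_q)

def image_quality(patch_q, patch_sigma):
    """Aggregate patch predictions into an image score, weighting each
    patch by its inverse variance so unreliable patches count less."""
    w = [1.0 / s ** 2 for s in patch_sigma]
    return sum(q * wi for q, wi in zip(patch_q, w)) / sum(w)
```

Under this form, admitting uncertainty is cheaper than forcing a deviant patch to match the image label, which is the mechanism that lets noisy patch labels coexist with effective training.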
We propose a multitask convolutional neural network (CNN) for general no-reference image quality assessment (NR-IQA). We decompose the task of rating image quality into two subtasks, namely distortion identification and distortion-level estimation, and then combine the results of the two subtasks to obtain a final image quality score. Unlike conventional multitask convolutional networks, wherein only the early layers are shared and the subsequent layers are different for each subtask, our model shares almost all the layers by integrating a dictionary into the CNN. Moreover, it is trained in an end-to-end manner, and all the parameters, including the weights of the convolutional layers and the codewords of the dictionary, are simultaneously learned from the loss function. We test our method on widely used image quality databases and show that its performance is comparable with those of state-of-the-art general-purpose NR-IQA algorithms.
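The combination step, turning the two subtask outputs into one score, can be read as an expectation over the predicted distortion type. The abstract does not spell out the combination rule, so the weighted sum below is one natural assumption, with hypothetical names.

```python
def combine_subtasks(type_probs, level_scores):
    """Combine distortion-identification probabilities with per-type
    distortion-level (quality) estimates into a single score: the
    expected quality over the predicted distortion type."""
    return sum(p * s for p, s in zip(type_probs, level_scores))
```

If the identifier is 75% sure the image is JPEG-compressed with an estimated quality of 80 and 25% sure it is blurred with quality 40, the combined score is 70; a confident identification simply reduces to the corresponding per-type estimate.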
Owing to recent advancements, very deep convolutional neural networks (CNNs) have found application in image denoising. However, while deeper models yield better restoration performance, they suffer from large parameter counts and increased training difficulty. To address these issues, we propose a CNN-based framework, named dilated residual encode–decode network (DRED-Net), for image denoising, which learns a direct end-to-end mapping from corrupted images to clean images using few parameters. The proposed network consists of multiple layers of convolution and deconvolution operators; in addition, we use dilated convolutions to boost performance without increasing the depth or complexity of the model. Extensive experiments on synthetic noisy images are conducted to evaluate DRED-Net against state-of-the-art denoising methods, and the results show that DRED-Net is comparable with these methods on image denoising tasks.
We propose a deep convolutional neural network (CNN) for general no-reference image quality assessment (NR-IQA), i.e., accurate prediction of image quality without a reference image. The proposed model consists of three components: a local feature extractor that is a fully convolutional network; an encoding module with an inherent dictionary that aggregates local features into a fixed-length, global, quality-aware image representation; and a regression module that maps the representation to an image quality score. Our model can be trained in an end-to-end manner, and all of the parameters, including the weights of the convolutional layers, the dictionary, and the regression weights, are simultaneously learned from the loss function. In addition, the model can predict quality scores for input images of arbitrary sizes in a single step. We tested our method on commonly used image quality databases and showed that its performance is comparable with that of state-of-the-art general-purpose NR-IQA algorithms.
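The key property of such a dictionary-based encoding module, a fixed-length output regardless of how many local features the extractor produces, which is what permits arbitrary input sizes, can be sketched with a VLAD-style soft-assignment aggregator. The function name and the softmax-over-distances assignment are illustrative assumptions; the abstract does not specify the encoder's exact form.

```python
import math

def encode_with_dictionary(features, codewords, beta=1.0):
    """Aggregate a variable number of local feature vectors into a
    fixed-length representation: soft-assign each feature to every
    codeword and accumulate the assignment-weighted residuals."""
    dim = len(codewords[0])
    agg = [[0.0] * dim for _ in codewords]
    for f in features:
        d2 = [sum((fi - ci) ** 2 for fi, ci in zip(f, c)) for c in codewords]
        w = [math.exp(-beta * d) for d in d2]
        z = sum(w)
        for k, c in enumerate(codewords):
            for j in range(dim):
                agg[k][j] += (w[k] / z) * (f[j] - c[j])
    # flatten: length = num_codewords * dim, independent of len(features)
    return [v for row in agg for v in row]
```

Because the output length depends only on the dictionary size and feature dimension, the same regression head can score images of any resolution in one forward pass, and the codewords can be learned jointly with the convolutional weights.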