Beyond recovering image texture detail, the main difference between video super-resolution (VSR) and single-image super-resolution (SR) is that VSR must also maintain temporal coherence between video frames. Motion estimation and motion compensation are the techniques commonly used to strengthen the temporal correlation between frames, and most motion estimation methods are based on optical flow. The optical flow method rests on three basic assumptions: the motion between frames is small; the luminance of a point is constant over time; and neighboring pixels move coherently. In scenes with complex motion, the accuracy of the underlying optical flow estimator is limited, which leads to artifacts in the reconstructed video. In recent years, generative adversarial networks (GANs) have been widely used for VSR, as they can recover more realistic texture details in single-frame reconstruction. Motivated by these observations, we explore a GAN-based VSR method with optical-flow-free motion estimation and compensation [the optical-flow-free generative adversarial network (COFGAN)], which performs motion estimation along the temporal dimension. COFGAN produces better motion estimation results and improves VSR performance without optical flow. To verify the motion estimation effect in complex scenes, the long-sequence realistic and dynamic scenes (REDS) dataset is used for training and testing. We compare the performance of the proposed COFGAN with earlier works such as video enhancement with task-oriented flow (TOFlow), frame-recurrent video super-resolution (FRVSR), and learning temporal coherence via self-supervision for GAN-based video generation (TecoGAN). Our method achieves strong performance on the temporal coherence metrics of temporal learned perceptual image patch similarity (tLP, 0.47) and temporal optical flow (tOF, 7.07) at a ×4 up-scaling factor.
Compared with TecoGAN, the best-performing method in previous work, the proposed method improves tLP by 29% and tOF by 26%. Moreover, COFGAN achieves the best accuracy on the commonly used video sequence datasets Vid4 and ToS3.
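As background for the flow-based pipelines the abstract contrasts against, motion compensation typically warps a neighboring frame toward the current one using a dense displacement field. A minimal NumPy sketch of nearest-neighbor backward warping follows; the function name and setup are illustrative only and are not the paper's implementation:

```python
import numpy as np

def warp_frame(frame, flow):
    """Warp `frame` using a dense flow field (illustrative sketch).
    frame: (H, W) array; flow: (H, W, 2) array of per-pixel (dx, dy).
    Uses nearest-neighbor backward mapping with border clamping."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Backward mapping: each output pixel samples the source location
    # displaced by the flow, rounded to the nearest pixel.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# A uniform 1-pixel rightward shift encoded as a constant flow field.
frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0  # dx = +1 everywhere
warped = warp_frame(frame, flow)
```

When the flow estimate is inaccurate, as in the complex-motion scenes discussed above, this warping step introduces exactly the kind of misalignment artifacts that motivate a flow-free approach.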
Keywords: Video, Motion estimation, Temporal coherence, Optical flow, Education and training, Convolution, Visualization