Registration of image collections and video sequences is a critical component in algorithms designed to extract actionable intelligence from remotely sensed data. While registration methodologies continue to evolve, alignment accuracy remains dependent on how well the approach tolerates changes in capture geometry, sensor characteristics, and scene content. Differences in imaging modality and field-of-view present additional challenges. Registration techniques have progressed from simple, global correlation-based algorithms, to higher-order model fitting using salient image features, to two-stage approaches leveraging high-fidelity sensor geometry, to new methods that exploit high-performance computing and convolutional neural networks (ConvNets). These ConvNet-based methods offer important advantages by removing model assumptions and learning feature extraction directly through the minimization of a registration cost function. Deep learning approaches to image registration are still relatively unexplored for overhead imaging, and their ability to accommodate a large problem domain offers potential for several new developments.
This work presents a new network architecture that improves on the accuracy and generalization of our earlier modality-agnostic deep learning approach to registration, which recently advanced the state of the art. A thoroughly tested ConvNet pyramid remains the core of our network approach; it has been optimized for registration and generalized to begin addressing derivative applications such as mosaic generation. Further modifications, such as objective function masking and reduced interpolation, have also been implemented to improve the overall registration process. As before, the trained network ingests image frames, applies a vector field, and returns a version of the input image that has been warped to the reference. Qualitative and quantitative performance of the new architecture is evaluated using several overhead still and full-motion video (FMV) data sets.
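To make the warping step and the idea of a masked objective concrete, the following is a minimal sketch in plain Python, not the network described here. It assumes the vector field is a dense per-pixel displacement given as input (in the actual approach it would be produced by the ConvNet pyramid), that warping uses bilinear sampling, and that the mask simply excludes pixels whose sample location falls outside the input frame. The function names `warp_bilinear` and `masked_mse` and the choice of mean squared error are illustrative assumptions.

```python
import math


def warp_bilinear(image, flow):
    """Warp `image` (a list of rows) with a dense displacement field.

    Assumption: flow[y][x] = (dx, dy) means output pixel (x, y) samples the
    input at (x + dx, y + dy). Returns (warped, mask), where mask is 1.0 for
    pixels whose sample location lies inside the input and 0.0 otherwise.
    """
    h, w = len(image), len(image[0])
    warped = [[0.0] * w for _ in range(h)]
    mask = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            if not (0 <= sx <= w - 1 and 0 <= sy <= h - 1):
                continue  # sample falls outside the input frame: stays masked out
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
            bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
            warped[y][x] = float((1 - fy) * top + fy * bottom)
            mask[y][x] = 1.0
    return warped, mask


def masked_mse(warped, reference, mask):
    """Mean squared error over pixels where mask is non-zero (hypothetical objective)."""
    num, den = 0.0, 0.0
    for w_row, r_row, m_row in zip(warped, reference, mask):
        for wv, rv, mv in zip(w_row, r_row, m_row):
            num += mv * (wv - rv) ** 2
            den += mv
    return num / den if den > 0 else 0.0


# Usage: a horizontal ramp sampled one pixel to the right.
image = [[0.0, 1.0, 2.0] for _ in range(3)]
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
reference = [[1.0, 2.0, 3.0] for _ in range(3)]
warped, mask = warp_bilinear(image, flow)
loss = masked_mse(warped, reference, mask)
```

In this toy case the rightmost column has no valid source pixel, so the mask keeps it from contributing to the objective; an unmasked loss would penalize the network for content it cannot observe.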