Conventional image stitching methods assume either that (1) the optical center of the camera is fixed (the fixed-optical-center case) or (2) the camera captures a planar target (the plane-target case). Users must therefore know or test which assumption better fits a given set of images, and then select the appropriate algorithm or try several stitching algorithms. We propose a unified framework for image stitching and rectification that handles both cases. Specifically, we model each camera pose with six parameters (three for rotation and three for translation) and develop a cost function that reflects the registration errors on a reference plane. The cost function is minimized with the Levenberg–Marquardt algorithm. When the relative camera motions between the given images are found to be large, the proposed method rectifies the images and then composes the result from the rectified images; otherwise, it simply builds a visually pleasing result by selecting a viewpoint. Experimental results on synthetic and real images show that our method successfully performs stitching and metric rectification.
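To make the six-parameter pose model and the plane-based cost more concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a known intrinsic matrix `K`, takes the reference plane to be Z = 0, encodes rotation as a rotation vector, and fixes the pose of the first camera; all function names and the synthetic data are hypothetical. Matched pixels from two images are back-projected onto the reference plane, and the resulting registration error is minimized with the Levenberg–Marquardt solver in SciPy.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def plane_homography(pose, K):
    """Homography mapping reference-plane points (X, Y, 1) on Z = 0 to pixels.

    pose = (rx, ry, rz, tx, ty, tz): rotation vector and translation
    (world to camera). The parameterization is an assumption of this sketch.
    """
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    # For points with Z = 0, the third column of R drops out of the projection.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def project(plane_xy, pose, K):
    """Project Nx2 reference-plane points into an image (used for synthetic data)."""
    pts = np.column_stack((plane_xy, np.ones(len(plane_xy)))) @ plane_homography(pose, K).T
    return pts[:, :2] / pts[:, 2:3]


def to_plane(pixels, pose, K):
    """Back-project Nx2 pixel coordinates onto the reference plane."""
    H_inv = np.linalg.inv(plane_homography(pose, K))
    pts = np.column_stack((pixels, np.ones(len(pixels)))) @ H_inv.T
    return pts[:, :2] / pts[:, 2:3]


def residuals(pose_b, pose_a, pix_a, pix_b, K):
    """Registration error of matched points, measured on the reference plane."""
    return (to_plane(pix_a, pose_a, K) - to_plane(pix_b, pose_b, K)).ravel()


# Synthetic, noise-free example (all values are made up for illustration).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
plane_xy = rng.uniform(-1.0, 1.0, size=(20, 2))

pose_a = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 5.0])            # fixed reference camera
pose_b_true = np.array([0.05, -0.10, 0.02, 0.30, -0.20, 5.5])  # pose to recover

pix_a = project(plane_xy, pose_a, K)
pix_b = project(plane_xy, pose_b_true, K)

# Levenberg-Marquardt refinement of the second pose, starting from the first pose.
result = least_squares(residuals, x0=pose_a.copy(),
                       args=(pose_a, pix_a, pix_b, K), method="lm")
pose_b_est = result.x
print("estimated pose of camera B:", np.round(pose_b_est, 4))
```

Holding the first camera fixed is one simple way to remove the ambiguity in the choice of world frame; a full system would estimate all poses jointly and would need feature matching and an initialization step that this sketch omits.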