The widespread success of Kinect enables users to acquire both image and depth information with satisfactory accuracy at relatively low cost. We leverage the Kinect output to efficiently and accurately estimate the camera pose in the presence of rotation, translation, or both. The applications of our algorithm are wide-ranging, from camera tracking to 3D point cloud registration and video stabilization. The state-of-the-art approach estimates the pose from point correspondences: it extracts point features from the images (e.g., SURF or SIFT), builds their descriptors, and matches features across images to obtain the correspondences. However, while feature-based approaches are widely used, they perform poorly in scenes lacking texture, due to the scarcity of features, and in scenes with repetitive structure, due to false correspondences. Our algorithm is intensity-based and requires neither point feature extraction nor descriptor generation and matching. Without depth, the intensity-based approach alone cannot handle camera translation. With Kinect capturing both image and depth frames, we extend the intensity-based algorithm to estimate the camera pose under both 3D rotation and translation. The results are promising.
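
The role of depth can be illustrated with a minimal sketch. It assumes a standard pinhole camera model, which the text above does not specify; the intrinsics `FX`, `FY`, `CX`, `CY` and all function names are hypothetical, chosen only for illustration, and the code is not the algorithm proposed here. It shows that under pure rotation a pixel moves to the same location whatever its depth, whereas under translation the displacement depends on depth, which is why an intensity-only method needs the depth frame to recover translation.

```python
import math

# Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

def backproject(u, v, depth):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    return ((u - CX) / FX * depth, (v - CY) / FY * depth, depth)

def transform(p, R, t):
    """Apply the rigid motion p' = R p + t (R: 3x3 nested list, t: 3-tuple)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(p):
    """Project a 3D point in the camera frame back to pixel coordinates."""
    x, y, z = p
    return (FX * x / z + CX, FY * y / z + CY)

def warp_pixel(u, v, depth, R, t):
    """Location of pixel (u, v) after the camera motion (R, t)."""
    return project(transform(backproject(u, v, depth), R, t))

def rot_y(angle):
    """Rotation matrix about the camera's y axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
NO_SHIFT = (0.0, 0.0, 0.0)
SIDE_STEP = (0.1, 0.0, 0.0)  # 10 cm translation along x

# Pure rotation: the warped location is the same at 1 m and at 4 m.
rot_near = warp_pixel(400.0, 300.0, 1.0, rot_y(0.05), NO_SHIFT)
rot_far = warp_pixel(400.0, 300.0, 4.0, rot_y(0.05), NO_SHIFT)

# Pure translation: the near point shifts more than the far one (parallax).
trans_near = warp_pixel(400.0, 300.0, 1.0, IDENTITY, SIDE_STEP)
trans_far = warp_pixel(400.0, 300.0, 4.0, IDENTITY, SIDE_STEP)
```

Comparing `rot_near` with `rot_far` and `trans_near` with `trans_far` makes the depth dependence explicit: the first pair coincides up to rounding, while the second differs by the parallax between the two depths.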