Planospheric camera model

Abstract. Realtime stereo processing, particularly in computationally constrained environments such as the Mars 2020 Perseverance rover, requires image rectification to enable efficient one-dimensional correlation. A camera model has been designed to perform better image rectification for fish-eye lenses with a particular emphasis on the engineering cameras on Perseverance. The primary innovation is the use of a unit projection sphere rather than an image plane as the rectifying surface. For stereo rectification, the rows (for a horizontal pair) or the columns (for a vertical pair) of both cameras must lie on the same epipolar plane. Thus, we use a pair of moving planes to define the rows and columns. Each plane rotates about a dedicated static axis passing through the sphere center. Virtual pixels are located where the planes intersect with each other and the unit sphere.


Introduction
Researchers at NASA's Jet Propulsion Laboratory (JPL) have been developing and working with geometric camera models for decades. In the late 1970s, the Yakimovsky and Cunningham linear, perspective-projection model was developed, 1 followed in the 1990s by Gennery's update of that model to include nonlinear distortion, 2 and then in the 2000s his further update supporting both perspective-projection and fish-eye optics in a single hybrid model. 3 In the 1990s, the linear model was used to perform stereo correlation in near realtime for a horizontal stereo pair. To enable this with limited computational power, the disparity search was restricted to the horizontal dimension. For this to work, corresponding rows must be made to line up in the left and right images, i.e., they must be brought into epipolar alignment. No real-world stereo camera pair is perfectly epipolar-aligned. Early systems approximated epipolar alignment by mechanically aligning the cameras. Given the low-resolution systems of the time, the residual misalignment was often not apparent. However, the process was tedious and unreliable.
To achieve epipolar alignment for more demanding cases, a technique was developed in which virtual, epipolar-aligned, co-located camera models were generated synthetically from the true camera models. The real images were then resampled as if by the virtual cameras into corresponding, epipolar-aligned images. This resulted in images well suited to one-dimensional stereo correlation. 4 The technique graduated from research to a real application in 2003 with NASA's Mars Exploration Rover (MER) mission's rovers Spirit and Opportunity, supporting obstacle detection while driving. 5 It was used again in 2011 on the Mars Science Laboratory (MSL) mission's Curiosity rover. 6 These missions had nonlinear camera optics, with both perspective-projection lenses that included radial-like distortion in some cases (Navcams) and fish-eye-like optics in others (Hazcams). 7,8 The rectification technique worked well for the former, but less well for the latter. This is not surprising given that the virtual, rectified models were represented as Yakimovsky and Cunningham linear models. 9 This choice was reasonable for optics that were fundamentally perspective-projection, even with some (modest) radial distortion. Rectifying fish-eye images to linear models, however, significantly distorts the pixel data.
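The resampling step described above can be sketched as building a per-pixel lookup map: each rectified pixel is projected through the virtual model to a 3D ray, and the ray is projected through the calibrated model to real-image coordinates. This is a hypothetical illustration, not flight code; `virtual_img2world` and `real_world2img` are stand-in callables for the two camera models.

```python
import numpy as np

def build_rectification_map(virtual_img2world, real_world2img, rows, cols):
    """Build per-pixel sampling coordinates for rectification.

    virtual_img2world(col, row) -> 3D ray (as an array) from the virtual model.
    real_world2img(ray) -> (x, y) coordinates in the real, calibrated image.
    The returned maps say where in the real image to sample each
    rectified pixel (e.g., with bilinear interpolation).
    """
    xmap = np.empty((rows, cols))
    ymap = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            ray = virtual_img2world(c, r)          # virtual pixel -> world ray
            xmap[r, c], ymap[r, c] = real_world2img(ray)  # ray -> real pixel
    return xmap, ymap
```

Because the maps depend only on the two camera models, they can be computed once at initialization, which matches how the rover flight software amortizes this cost.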
In Fig. 1, the top row shows a stereo pair of images taken by MER Hazcams mounted on a test rover with a view of an indoor sandbox. The middle row shows those same images rectified using linear models. Note in particular that the pixels are clearly stretched around the perimeter of the images. See Fig. 2 from MSL Hazcams on the surface of Mars for a view that shows more clearly how the pixels in the middle of the images are compressed. For the earlier missions, the disadvantages of linear rectification were mitigated in part by using the perspective-projection Navcams preferentially. On the new Mars 2020 mission's Perseverance rover, that is not possible since all the engineering cameras have fish-eye optics. 10 The bottom rows of Figs. 1 and 2 show new images created using the planospheric model described in this paper. The model's better fit to fish-eye geometry is evidenced by less distortion at the pixel level throughout the images. Work on the JPL planospheric camera model began in 2002 when the deficiencies of the existing approach for the MER Hazcams became evident. Since then similar concepts, albeit with different representations and less specialized objectives, have been pursued independently in the research community. 11,12

Planospheric Camera Model
As with the earlier rectification approach, a virtual camera model is generated coincident with the model for the real camera. Consider a unit sphere whose center coincides with the center of projection of the virtual lens. The image passes through the center and inverts before falling on the back inner surface of the sphere. This is the essence of the model's conceptual geometry for a new type of imaging surface. As with perspective projection, it is more intuitive to picture oneself looking out at the world from the sphere center, and seeing the image as appearing on the forward half of the sphere, upright. All descriptions below will follow this convention.

Model Definition
Define the image coordinates as

x = column: horizontal coordinate, increasing to the right,
y = row: vertical coordinate, increasing downward.

Then let the model parameters be

c = 3D position of the sphere center,
a_x = unit column-plane rotation axis, passing through the sphere center; typically vertical and pointing down so that positive rotations (by the right-hand rule) rotate the forward half of the plane in the (rightward) direction of increasing column (as projected on the forward hemisphere),
a_y = unit row-plane rotation axis, passing through the sphere center; typically horizontal and pointing left so that positive rotations (by the right-hand rule) rotate the forward half of the plane in the (downward) direction of increasing row (as projected on the forward hemisphere),
n_x = unit normal vector to the column plane when x equals zero, pointing in the same direction as the cross product of a_x with an outward-pointing vector that also lies in the plane,
n_y = unit normal vector to the row plane when y equals zero, pointing in the same direction as the cross product of a_y with an outward-pointing vector that also lies in the plane,
s_x = column scale factor converting between the x coordinate and rotation around a_x, expressed in radians/pixel,
s_y = row scale factor converting between the y coordinate and rotation around a_y, expressed in radians/pixel.

The external coordinate system in which the vectors c, a_x, a_y, n_x, and n_y are expressed is arbitrary. Later, it will be called the 3D "world" coordinate system. It is any convenient coordinate system, in which the camera may be posed arbitrarily. For the Mars rover missions, the vehicle body's coordinate system is chosen.
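For concreteness, the parameters above can be carried in a small container with a consistency check. This is a hypothetical sketch (the class name, field names, and Python representation are ours, not the flight implementation); the check encodes the constraints stated above: unit-length axes and normals, and each normal perpendicular to its own rotation axis, since that axis lies in the normal's plane.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlanosphericModel:
    c: np.ndarray    # 3D position of the sphere center (world frame)
    ax: np.ndarray   # unit column-plane rotation axis a_x
    ay: np.ndarray   # unit row-plane rotation axis a_y
    nx: np.ndarray   # unit column-plane normal n_x at x = 0
    ny: np.ndarray   # unit row-plane normal n_y at y = 0
    sx: float        # column scale factor, radians/pixel
    sy: float        # row scale factor, radians/pixel

    def check(self, tol=1e-9):
        # Axes and normals must be unit length.
        for v in (self.ax, self.ay, self.nx, self.ny):
            assert abs(np.linalg.norm(v) - 1.0) < tol
        # Each rotation axis lies in its plane, so it must be
        # perpendicular to that plane's normal.
        assert abs(np.dot(self.ax, self.nx)) < tol
        assert abs(np.dot(self.ay, self.ny)) < tol
```

Note that a_x and a_y themselves are not required to be mutually orthogonal, so no such constraint is checked.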
The scalars s_x and s_y can be the same, but this is not required. They are identical when modeling square pixels, where the rows and columns are equally spaced; where pixels are not square, the scalars differ. See Figs. 3 and 4 for renderings of a horizontal stereo pair of planospheric models from both the front and back. The line containing the row-plane rotation axis vectors passes through both the left and right projection centers, and the column-plane rotation axis vectors are parallel. The virtual pixels are located where the planes intersect with each other and with the forward half of the unit sphere. Note that the row planes in this case are the epipolar planes of the camera system (in the projective-geometry sense) and are independent of any camera-model formulation.
The descriptions of a_x and a_y above suggest that they are orthogonal. They typically are, but they need not be. If there is a reason to make those axes not quite orthogonal, the model can support that; this is also true of the earlier models. Consider a trinocular-stereo system with two cameras in a common horizontal configuration and a third camera directly above or below one of the first two. Here, to avoid performing completely independent rectification for each stereo pair, a common geometry for all three can be chosen such that the row plane passes through the horizontal pair and the column plane passes through the vertical pair (see Fig. 5). In any real-world system, these two plane-rotation axes will not be precisely orthogonal.

Projecting from World to Image
The connection between 2D image coordinates and 3D world coordinates is made by rotating the planes through angles corresponding to the pixel address (x, y) and then following the line of intersection where the planes cross. Where the line intersects the sphere identifies the image pixel; all 3D world points that fall along the line project onto that pixel.
The projection, therefore, from a 3D world point p to a 2D image coordinate is

x = (1/s_x) atan2(sin θ_x, cos θ_x),
y = (1/s_y) atan2(sin θ_y, cos θ_y),

where

sin θ_x = (n_x × n'_x) · a_x,  cos θ_x = n_x · n'_x,
sin θ_y = (n_y × n'_y) · a_y,  cos θ_y = n_y · n'_y.

Here n'_x and n'_y are the unit normals of the column and row planes that contain the point p. Following the sign convention used above for n_x and n_y, they are obtained by normalizing a_x × (p − c) and a_y × (p − c), respectively, since p − c is an outward-pointing vector lying in each plane.
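A minimal sketch of this projection, assuming NumPy and the sign conventions defined above (the function name and argument layout are ours, for illustration only):

```python
import numpy as np

def world_to_image(p, c, ax, ay, nx, ny, sx, sy):
    """Project a 3D world point p to planospheric image coordinates (x, y).

    n'_x (n'_y) is the unit normal of the column (row) plane containing
    both p and the corresponding rotation axis; each pixel coordinate is
    the rotation angle from the x = 0 (y = 0) plane divided by the scale.
    """
    v = np.asarray(p, dtype=float) - c
    npx = np.cross(ax, v)
    npx /= np.linalg.norm(npx)          # rotated column-plane normal n'_x
    npy = np.cross(ay, v)
    npy /= np.linalg.norm(npy)          # rotated row-plane normal n'_y
    x = np.arctan2(np.dot(np.cross(nx, npx), ax), np.dot(nx, npx)) / sx
    y = np.arctan2(np.dot(np.cross(ny, npy), ay), np.dot(ny, npy)) / sy
    return x, y
```

Using atan2 of the separately computed sine and cosine recovers the full signed rotation angle, so pixels to the left of (or above) the zero planes get negative coordinates.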

Projecting from Image to World
Using the same geometric picture as before, the projection from a 2D image coordinate (x, y) to an outward-facing 3D unit vector u in world coordinates, anchored at the sphere's center, begins by rotating the plane normals:

n'_x = n_x rotated around a_x by s_x x,
n'_y = n_y rotated around a_y by s_y y,

which, using Rodrigues' rotation formula 13 simplified for our case of orthogonal unit vectors, is

n'_x = n_x cos(s_x x) + (a_x × n_x) sin(s_x x),
n'_y = n_y cos(s_y y) + (a_y × n_y) sin(s_y y).

The pointing vector lies along the line of intersection of the two rotated planes,

v = n'_x × n'_y,  u = v / |v|.

The partial derivatives of the (unnormalized) pointing vector with respect to the 2D coordinates x and y are

∂v/∂x = s_x (a_x × n'_x) × n'_y,
∂v/∂y = s_y n'_x × (a_y × n'_y).

To derive these, start by noting that differentiating the rotation formulas above gives ∂n'_x/∂x = s_x (a_x × n'_x) and ∂n'_y/∂y = s_y (a_y × n'_y), and then apply the product rule to the cross product defining v.

The computational difference between the models can be seen when performing the calculations required to prepare for image rectification. This involves projecting from image to world using the virtual camera model, either linear or planospheric, and then from world to image using the calibrated model, for each pixel in the image. Performing this for a pair of 1280 × 960 images on the Mars 2020 flight processor takes approximately 47 s with the linear model and 53 s with the planospheric model. For Mars 2020, where these calculations are performed during initialization and not on the fly, this is considered an acceptable cost.
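The image-to-world projection and its partial derivatives can be sketched as follows, again assuming NumPy and our own function naming; this is an illustrative implementation of the formulas above, not the flight code.

```python
import numpy as np

def image_to_world(x, y, ax, ay, nx, ny, sx, sy):
    """Return the unit pointing vector u for pixel (x, y), along with the
    partial derivatives of the unnormalized pointing vector v."""
    def rotate(n, a, theta):
        # Rodrigues' formula simplified for a unit normal n
        # perpendicular to the unit rotation axis a
        return n * np.cos(theta) + np.cross(a, n) * np.sin(theta)
    npx = rotate(nx, ax, sx * x)       # column-plane normal rotated to column x
    npy = rotate(ny, ay, sy * y)       # row-plane normal rotated to row y
    v = np.cross(npx, npy)             # intersection line of the two planes
    dv_dx = sx * np.cross(np.cross(ax, npx), npy)
    dv_dy = sy * np.cross(npx, np.cross(ay, npy))
    return v / np.linalg.norm(v), dv_dx, dv_dy
```

Composing this with the calibrated model's world-to-image projection, pixel by pixel, yields the rectification lookup table whose cost on the flight processor is quoted above.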

Preliminary Results
The planospheric camera model has been incorporated into the flight software of the Mars 2020 rover Perseverance. Operators can select either the legacy linear model or the new planospheric model to rectify the images used by stereo correlation and visual odometry. A preliminary comparison of the models has been performed with images taken in JPL's Mars Yard.
To evaluate the effects of the different geometries, we can look at visual odometry's use of template matching to find corresponding features from one frame to the next. 14 Such matching is more robust when feature appearance changes less as a function of location in the rectified image. A rectification model that produces a more uniform appearance across the image should therefore yield better feature-matching behavior. We see this with the planospheric model.
Testing was performed in mid-2020 on a test rover, "Scarecrow," a mobility test platform inherited from MSL, and upgraded to exercise Mars 2020's mobility software. It was outfitted with cameras similar to those on the flight vehicle. Visual odometry using the new model detected and matched approximately 44% more features on average for turns in place of 20 deg and 30 deg than it did with the old model. For circular drives, the advantage was 10% on average (see Table 1).  In late 2020, additional testing was performed on a more flight-like test rover, "VSTB," a higher-fidelity testbed than Scarecrow, designed specifically for the Mars 2020 mission with many engineering-model and other flight-like components. Visual odometry using the planospheric model found about 34% more features during turns in place of 10 deg, 20 deg, and 30 deg than it did with the linear model. For 1-m drives of 0 deg, 10 deg, and 20 deg arcs, the comparison was about 9% in the new model's favor (see Table 2).

Conclusion
The planospheric camera model described here is potentially a better fit for rectifying fish-eye images than the legacy linear model JPL has been using in all its one-dimensional near-realtime stereo processing to date. As the Mars 2020 mission unfolds, and the operators experiment with the old and new models, we will see how this potential proves out in real operations on the Martian surface.