Projective geometry is an important topic in computer vision because it provides a useful camera imaging model together with its fundamental properties.1 Applications of this topic are found in camera motion,2 camera calibration,3,4 pose estimation for augmented reality,5 perspective correction,6 and three-dimensional (3-D) surface imaging,7 among others.
Theoretical concepts of projective geometry are analyzed simply and elegantly using homogeneous coordinates.8,9 However, projective geometry is commonly presented in abstract form, leaving a gap in how to apply it to computer vision problems.10 Moreover, homogeneous coordinates are often used with a notation that masks basic geometrical aspects and may confuse inexperienced readers.11
In this paper, a simple and intuitive approach for exposing some useful concepts of projective geometry is presented. For this, an alternative notation for homogeneous coordinates based on operators is suggested. To highlight the relevance of this topic in computer vision, the presentation is motivated by a specific problem, namely perspective correction for a “camera scanner” application.
First, the proposed operators for homogeneous coordinates are defined in Sec. 2. Next, some basic concepts of projective geometry in the one- (1-D) and two-dimensional (2-D) cases are presented in Secs. 3 and 4, respectively. Then, the pinhole camera model is derived in Sec. 5. A perspective correction method, useful for camera document scanning, is described in Sec. 6. Finally, the conclusions of this work are given in Sec. 7. The paper is complemented with two appendices. Appendix A presents the direct linear transformation method for homography matrix estimation, and Appendix B explains a simple method to obtain the camera parameters from homographies.
Definition of Operators
A point in an n-dimensional space will be represented by a vector of the form x=(x1, x2, …, xn)^T, and its homogeneous coordinates by a vector with an additional entry, y=(x1, x2, …, xn, s)^T.
The last entry of a homogeneous vector is known as the scale and will be recovered by the scale operator S. This operator returns the last entry of any given vector. For instance, for the vectors in Eqs. (1) and (2), we have S[x]=xn and S[y]=s.
The operator H1, which appends a unit entry, sets the scale to unity. Another operator that sets the scale to zero is needed. For this, we define the operator H0, which appends a zero entry.
The operators H1 and H0 can be considered as two particular cases of a more general operator Hs defined as Hs[x]=(x1, x2, …, xn, s)^T.
The procedure of adding an extra entry to vectors is reverted by returning the given vector except its last entry. For this, we define the inverse operator H^-1 as follows. For any (n+1)-dimensional vector y, H^-1[y] returns the first n entries of y.
The inverse H^-1 is a linear operator. That is, for any two scalars γ1 and γ2, we have H^-1[γ1y1+γ2y2]=γ1H^-1[y1]+γ2H^-1[y2].
In general terms, the homogeneous operator Hs carries the representation of a point from n- to (n+1)-dimensional vectors, while the inverse homogeneous operator H^-1 returns the representation from (n+1)- to n-dimensional vectors. An important transformation emerges when, in the (n+1)-dimensional space, a linear mapping M is applied. Mathematically, we describe this transformation by the projection operator PM,s defined as PM,s[x]=H^-1[MHs[x]/S[MHs[x]]].
Using the equations in Eq. (12), the operator Hs can be expressed in terms of the projection operator; see Eq. (13). Some useful equalities of these operators are collected in Table 1. For a more comprehensible reading of this paper, the reader is encouraged to demonstrate all the equalities in Table 1.
Table 1. Some useful equalities of the operators S, Hs, and PM,s. In all cases, we consider λ≠0; γ1 and γ2 are any scalars; x is an n-dimensional vector as given in Eq. (1); Ξs is the matrix defined in Eq. (13); y=λHs[x]; M is a matrix of size (n+1)×(n+1); and W is a matrix of size m×(n+1).
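For readers who prefer executable notation, the operators can be sketched in Python with NumPy. The function names S, H, H_inv, and P are ours, and the body of P assumes the lift-map-normalize-truncate form of the projection operator described above; this is a didactic sketch, not the paper's reference implementation.

```python
import numpy as np

def S(y):
    """Scale operator: return the last entry of a vector."""
    return y[-1]

def H(x, s=1.0):
    """Homogeneous operator Hs: append the scale s as an extra entry."""
    return np.append(np.asarray(x, dtype=float), s)

def H_inv(y):
    """Inverse homogeneous operator: drop the last entry."""
    return np.asarray(y, dtype=float)[:-1]

def P(M, x, s=1.0):
    """Projection operator PM,s: lift x, apply the linear mapping M,
    normalize the scale to unity, and drop the extra entry."""
    y = M @ H(x, s)
    return H_inv(y / S(y))
```

For instance, S(H(x)) equals 1 and H_inv(H(x)) returns x, in agreement with the equalities of Table 1.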
In the following sections, the defined operators are studied from an intuitive geometrical approach for the 1-D and 2-D cases. Then, the usefulness of this theoretical framework is illustrated by addressing the perspective correction problem for camera document scanning.
The 1-D real space can be represented as a line as shown in Fig. 1(a). In this space, a point at a finite distance from the origin is represented by a real number x; otherwise, the point is represented by the symbol ∞.
Alternatively, the 1-D space can be represented by the projective line y=1 in the xy-plane as shown in Fig. 1(b). Thus, the coordinate x of a point in the line becomes the vector H1[x]=(x,1)^T. Any scaled vector λH1[x], with λ≠0, identifies the same point of the projective line, see Fig. 1(b), because the intersection between lines is unaltered. In other words, x=H^-1[y/S[y]] for y=λH1[x], as stated by Eq. (11).
Homogeneous coordinates provide a different form to identify points of the real line. Consider the unit vector u(θ)=(cos θ, sin θ)^T on the unit circle shown in Fig. 2.
Intuitively, the real line in Euclidean representation has two points at infinity, namely +∞ and −∞. However, in projective geometry, the real line has only a single point at infinity, given by the homogeneous coordinates with zero scale shown in Fig. 2. It could be argued that u(0)=(1,0)^T corresponds to +∞ while u(π)=(−1,0)^T corresponds to −∞. However, note that u(π) is the opposite of u(0). Hence, they represent the same point.
Note that this is consistent with the notion of a point at an infinite distance from the origin. According to the concepts of projective geometry, the vector H0[x] represents an ideal point, see Eq. (5).
The line y=1 can be transformed to any other line by applying a rotation and a translation t. Thus, a point in the transformed line, represented by the scalar x, becomes a point in the xy-plane given by the vector MH1[x], where the columns of M are the rotated direction vector r and the translation t.
If the matrix M is singular, the vectors r and t are collinear. In this case, the origin is a point of the transformed line (the distance of the line from the origin is zero). The matrix M is nonsingular when r and t are linearly independent. In this case, the origin is not a point of the transformed line.
Let the vector in Eq. (25) be the homogeneous coordinates of a point in the line. Thus, we obtain the 1-D projection PM,s, as illustrated in Fig. 3.
Points and Lines in the Plane
Any point p in the 2-D space can be represented as the vector H1[p], see Fig. 4(a). Note that H1 takes the 2-D vector p and converts it to the 3-D vector H1[p]=(p^T,1)^T, where p is unaltered but now it lies in the projective plane z=1. It is worth mentioning that the vector p can be recovered from y=λH1[p] as the point of intersection of the plane z=1 with the line through the origin and y, see Eq. (11). That is, p=H^-1[y/S[y]].
Let p1 and p2 be two different points in the plane. The vector ℓ of the line passing through p1 and p2 can be obtained by the cross product as ℓ=H1[p1]×H1[p2].
Consider two lines defined by the vectors ℓ1 and ℓ2. If p is the intersection point of these lines, then H1[p] is orthogonal to ℓ1 and ℓ2. That is, ℓ1^T H1[p]=ℓ2^T H1[p]=0; hence, λH1[p]=ℓ1×ℓ2.
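These two cross-product relations are easy to verify numerically. The following sketch (helper names are ours) builds a line through two points and intersects it with a second line:

```python
import numpy as np

def line_through(p1, p2):
    """Vector of the line passing through two points of the plane."""
    return np.cross(np.append(p1, 1.0), np.append(p2, 1.0))

def intersect(l1, l2):
    """Homogeneous coordinates of the intersection point of two lines."""
    return np.cross(l1, l2)

# The line through (0, 1) and (1, 1) is y = 1; the line x = 2 has vector (1, 0, -2).
l1 = line_through([0.0, 1.0], [1.0, 1.0])
l2 = np.array([1.0, 0.0, -2.0])
u = intersect(l1, l2)
point = u[:2] / u[2]          # both lines meet at (2, 1)
```

Note that the intersection vector u is orthogonal to both line vectors, as stated above.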
Two different lines are parallel if their defining vectors are of the form ℓ1=(n^T, c1)^T and ℓ2=(n^T, c2)^T, sharing the vector n but with c1≠c2, see Fig. 4(b), where two points of each line are shown.
It is worth mentioning that, if ℓ is the vector of a line with direction d [see Eq. (40)], then the vector H^-1[ℓ]=n is orthogonal to d, namely d^T H^-1[ℓ]=0.
Ideal Points and the Line at Infinity
In Euclidean geometry, two parallel lines in the plane do not intersect. However, in projective geometry, two different lines always intersect at a point. Consider the parallel lines given by the vectors in Eq. (36). Using Eq. (35), the intersection point is ℓ1×ℓ2=(c2−c1)(n2,−n1,0)^T, a vector with zero scale.
The vector d=(n2,−n1)^T is associated with the direction of the lines. This is verified by taking into account that d is orthogonal to the vector n=(n1,n2)^T [Eq. (42)], which itself is orthogonal to the lines.
All ideal points given by Eq. (44) are collinear. The vector of such a line, known as the line at infinity, is ℓ∞=(0,0,1)^T.
The ideal point in Eq. (44) was obtained as the intersection of two parallel lines ℓ1 and ℓ2. However, intuition suggests that the same result could be obtained by computing the intersection of the line ℓ1 and the line at infinity ℓ∞. In fact, we have that ℓ1×ℓ∞=(n1,n2,c1)^T×(0,0,1)^T=(n2,−n1,0)^T.
Similar to the 1-D case, homogeneous coordinates provide a different form to identify points of the plane. Consider the unit vector u(θ,φ)=(cos θ sin φ, sin θ sin φ, cos φ)^T.
The points of the plane at a finite distance from the origin are given by λu(θ,φ) with λ≠0 and 0≤φ&lt;π/2, i.e., the upper hemisphere of the unit sphere, see Fig. 5. The points of the plane at an infinite distance from the origin are parameterized by θ and φ=π/2. These points have the homogeneous coordinates u(θ,π/2)=(cos θ, sin θ, 0)^T on the equator of the sphere, see Fig. 5.
Any plane in the 3-D space can be obtained from the reference plane by a rotation and a translation t. Thus, the points represented by H1[p] become MH1[p], where M=[r1 r2 t] and r1, r2 are the first two columns of the rotation matrix.
The matrix M is singular when r1, r2, and t are coplanar. In this case, the origin is a point of the transformed plane (the distance of the reference plane from the origin is zero). Otherwise, M is nonsingular.
Let the vector in Eq. (53) be the homogeneous coordinates of a point in the projective plane. Thus, the relation between the reference points and their projections is given by the 2-D projection PM,s, as illustrated in Fig. 6.
Properties of the Two-Dimensional Projection
As shown in Fig. 6, the 2-D projection does not preserve several geometrical properties, e.g., shape, angles, lengths, and ratios of lengths. Fortunately, some geometrical properties are preserved. Particularly, we are interested in three of them that are very useful in practice: namely, straightness, line–line intersection, and parallelism of the normal and line at infinity vectors.
Straightness
This property states that a 2-D projection transforms lines into lines.12 This can be shown as follows. Consider a line with vector ℓ whose points p satisfy ℓ^T H1[p]=0.
Line–line intersection
Preservation of the line–line intersection by a 2-D projection refers to the following: the projection of the intersection point of two lines is the intersection point of the projected lines.
Parallelism of the normal and line at infinity vectors
The normal of the plane z=1 and the vector of the line at infinity, ℓ∞=(0,0,1)^T, are parallel. When the projection is applied, the normal of the reference plane and the new line at infinity still remain parallel. Actually, the reference plane has the normal n=r1×r2.
In the following section, the developed theoretical framework is applied to a real problem.
Pinhole Camera Model
In practice, the imaging process is performed by a camera lens device as shown in Fig. 7(a). This device produces high-quality images because of a complicated system of lenses that minimizes aberration and distortion. However, the imaging process can be modeled using a single thin lens as shown in Fig. 7(b). Moreover, the imaging model can be easily derived using the equivalent pinhole camera as shown in Fig. 7(c).
In the pinhole camera, the origin of a coordinate system is fixed at the pinhole and the z-axis is parallel to the optical axis. The plane z=−f, where f is the focal length, is the actual image plane. Note that the image is inverted; therefore, the x- and y-axes are reverted to describe the image as a magnified version of the object. The inversion of the axes is avoided using the conjugate image plane z=f as shown in Fig. 7(c).
Centered Pinhole Camera
A typical representation of a pinhole camera is shown in Fig. 8. The coordinate system is known as the camera reference frame. The continuous image on the image plane is sampled by the discrete pixel array of the sensor, see Fig. 8. The sampling can be described as in Eq. (75).
Given a point (in pixel coordinates), the actual coordinates of an image point (physical coordinates on the image plane z=f) can be obtained by inverting Eq. (75).
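A minimal numeric sketch of this sampling and its inverse is given below. The intrinsic values (focal length f, pixel pitches rho_u and rho_v, and principal point u0, v0) are illustrative assumptions, not the calibration results of this paper:

```python
import numpy as np

# Assumed intrinsic parameters: f [mm], pixel pitches [mm/pixel], principal point [pixel].
f, rho_u, rho_v, u0, v0 = 6.0, 0.006, 0.006, 320.0, 240.0

K = np.array([[f / rho_u, 0.0,       u0],
              [0.0,       f / rho_v, v0],
              [0.0,       0.0,       1.0]])

def sample(X):
    """Pixel coordinates of a 3-D point X given in the camera frame."""
    m = K @ np.asarray(X, dtype=float)
    return m[:2] / m[2]

def physical(u, v):
    """Physical coordinates on the image plane z = f of the pixel (u, v)."""
    return np.array([(u - u0) * rho_u, (v - v0) * rho_v, f])
```

Sampling the physical point recovered from a pixel returns the same pixel, which is the inversion described above.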
Noncentered Pinhole Camera
Let us consider that the pinhole camera is at an arbitrary position and orientation with respect to a world coordinate system as shown in Fig. 9. The position and orientation of the camera are defined by the translation vector t and the rotation matrix R, respectively, which relate the world coordinates of a point to its camera-frame coordinates.
In general terms, Eq. (83) describes a transformation from points of the 3-D space to points of the 2-D one. A very useful transformation is obtained when Eq. (83) is restricted to the points of a plane in the 3-D space. In this case, Eq. (83) is reduced to a transformation from the 2-D space to itself.
Consider the points of a plane in the 3-D space, as given mathematically by Eq. (53).
The homography matrix is singular when the pinhole is at a point of the reference plane. In any other case, the homography is nonsingular and Eq. (85) can be inverted. In Appendix A, the direct linear transformation method for homography estimation is described.
Perspective Correction for Document Scanning
A camera document scanning application performs several image processing tasks, such as quadrilateral detection, perspective correction, resampling, and image enhancement. In this section, the perspective correction task is addressed to illustrate the application of the proposed approach.
In Appendix A, we show that the perspective of a flat object can be easily corrected using the associated homography. For this, at least four point correspondences must be provided. However, for practical document scanning, the reference plane coordinates of the corners are unknown. Instead, it is assumed that the document to be digitized is rectangular, and the orthogonality and parallelism properties of its edges are exploited.
The estimation of the homography is greatly simplified by assuming a centered pinhole camera with known intrinsic parameters, e.g., from a previous camera calibration, see Appendix B. Thus, we only need to estimate the reference plane parameters, i.e., the rotation matrix R and the translation vector t, see Eq. (84).
Estimation of the Reference Plane Parameters
Consider a coordinate system in the reference plane with origin at the center of the document to be scanned as shown in Fig. 10(a). The x- and y-axes of this coordinate system are parallel to the upper/lower and left/right sides of the paper, respectively. The corners of the document have known coordinates in this reference frame, and their images are the corner points shown in Fig. 10(b). The two sets of corners are related by Eqs. (85) and (89); however, the reference coordinates and the homography are unavailable. Only the imaged corners are available, which are easily obtained from the image by pointing at the vertexes of the imaged document.
The imaged corners are used to compute the lines through the document edges and diagonals, see Fig. 10(b).
Note that the intersection points of the opposite edge lines are the projections of the ideal points associated with the two edge directions of the document, i.e., the vanishing points.
The translation vector is obtained by taking into account that the projection preserves the line–line intersection. Thus, from Eq. (65), the projection of the document center is the intersection of the diagonals of the imaged quadrilateral. Because the origin of the reference frame is at the center of the document, see Fig. 10(a), Eq. (88) leads to the estimate of the translation vector.
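The geometric construction above can be sketched as follows; the corner ordering convention (p1, p2 on one edge and p4, p3 on the opposite edge) and the function name are our assumptions:

```python
import numpy as np

def scan_geometry(corners):
    """Vanishing points of the two pairs of document edges and the imaged
    document center (intersection of the diagonals)."""
    p1, p2, p3, p4 = [np.append(np.asarray(c, dtype=float), 1.0) for c in corners]
    # Edge lines and diagonals via cross products of homogeneous points.
    top, bottom = np.cross(p1, p2), np.cross(p4, p3)
    left, right = np.cross(p1, p4), np.cross(p2, p3)
    d1, d2 = np.cross(p1, p3), np.cross(p2, p4)
    vu = np.cross(top, bottom)    # vanishing point of the top/bottom edges
    vv = np.cross(left, right)    # vanishing point of the left/right edges
    center = np.cross(d1, d2)     # projection of the document center
    return vu, vv, center
```

A vanishing point with zero scale is an ideal point; as stated in the text, the method remains valid in that case.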
The reference plane is fully characterized by six degrees of freedom (DOF), namely, position (three coordinates) and orientation (three angles). The vectors t and n provide five DOF. Specifically, the vector t provides three DOF that fix the position, while n provides two DOF defining the orientation by the azimuth and polar angles.
From Eqs. (84) and (53), the points of the reference plane and their images are related as illustrated in Fig. 10(c).
The functionality of the presented algorithm is illustrated by the following example. The camera described in Appendix B and the estimated intrinsic parameters given in Eq. (156) are used here.
Figure 11(a) shows the image of a rectangular object acquired by the camera. Then, the four corners of the quadrilateral are marked on the image as shown by the yellow circles in Fig. 11(b). The estimated intersection points, including the two vanishing points, are indicated by the red circles in Fig. 11(b). It is worth mentioning that one vanishing point, or the other, or both could be points at infinity. Even in these cases, the presented methodology is valid.
The information estimated from the four corners yields the perspective-corrected image shown in Fig. 11(c).
With the correction of perspective, the yellow circles in Fig. 11(b) become the green ones in Fig. 11(c). The region of interest is the rectangle with corners marked by green circles in Fig. 11(c). Finally, a zoom of the region of interest is shown in Fig. 11(d).
An operator-based approach for homogeneous coordinates was proposed. Several basic geometrical concepts and properties of the operators were investigated. With the proposed approach, the pinhole camera model and a simple camera calibration method were described. The study was motivated by the development of a perspective correction method useful for a camera document scanning application. Several experimental results illustrate the analyzed theoretical aspects. The proposed approach could be a good starting point to introduce inexperienced students to the scientific discipline of computer vision.
Estimation of the Homography Matrix
In this appendix, we illustrate the method known as direct linear transformation for homography matrix estimation. This method is very useful for illustration purposes because of its simplicity. However, the highest accuracy and robustness are reached with other advanced methods available in the literature.9,13
Let the homography matrix be defined as in Eq. (86). Consider that this matrix is partitioned into its three rows.
Equation (85), which relates points of the reference and image planes, can be rewritten in terms of these rows, leading to Eq. (123).
Furthermore, Eq. (123) can be written in the matrix form of Eq. (124).
Equation (124) relates a single point on the reference plane with the corresponding point on the image plane. If several point pairs are available, the corresponding equations of the form of Eq. (124) can be stacked into the single system of Eq. (126).
The nontrivial solution of Eq. (126) can be obtained using a unit-norm constraint on the unknown vector. Thus, by using the singular value decomposition, the solution is the right-singular vector corresponding to the smallest singular value, see Appendix C of Ref. 14.
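A compact sketch of the direct linear transformation is given below. The row layout follows the standard DLT formulation, which may differ from the paper's Eq. (124) by a row permutation or sign convention:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H such that dst ~ H @ (x, y, 1)^T up to scale, from at
    least four point correspondences (x, y) -> (u, v)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        p = np.array([x, y, 1.0])
        rows.append(np.concatenate((np.zeros(3), -p, v * p)))
        rows.append(np.concatenate((p, np.zeros(3), -u * p)))
    A = np.vstack(rows)
    # Unit-norm solution: right-singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```

Since the solution is defined only up to scale, estimated homographies should be compared after normalization, e.g., by their last entry.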
The application of this method is illustrated as follows. Consider the image shown in Fig. 12(a). A letter-size paper printed with Melencolia I by Albrecht Dürer is in the scene. Using the aspect ratio of the letter paper, the reference plane coordinates of the corners are fixed.
The coordinates of the imaged corners are marked in Fig. 12(a). With these four point pairs, we obtain the homography.
The homography fully defines a pinhole imaging process. Thus, it can be inverted to obtain an undistorted view of the reference plane from its perspective-distorted image. Specifically, using Eq. (89), all points of the image are transformed to points of the reference plane. Then, the pixels of the image are displayed at the transformed points as shown in Fig. 12(b). Note that the corners of the paper in the corrected image are at the coordinates specified by Eq. (128).
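At the level of individual points, this inversion can be sketched as follows (full image warping additionally requires pixel interpolation, which is omitted here):

```python
import numpy as np

def correct_points(H, points):
    """Map imaged points back to the reference plane through the inverse homography."""
    Hinv = np.linalg.inv(H)
    out = []
    for u, v in points:
        q = Hinv @ np.array([u, v, 1.0])
        out.append(q[:2] / q[2])
    return np.array(out)
```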
The least number of point correspondences for 2-D homography estimation is four. However, the accuracy of the estimation is improved when more than four point correspondences are provided. For this reason, checkerboard patterns15 and gratings16,17 are useful target objects. In this appendix, the corner points of the imaged rectangle were obtained manually from the image. However, the corner points can be obtained automatically using checkerboard patterns or gratings along with grid detection18 or phase demodulation,19 respectively.
Camera Parameters from Homographies
The homography matrix involves the intrinsic and extrinsic camera parameters as well as the reference plane parameters. In this appendix, we show how to obtain the intrinsic and extrinsic camera parameters from several homographies.
Intrinsic Camera Parameters
Consider that the reference plane is the xy-plane of the world coordinate system; i.e., the points of the reference plane have a zero third coordinate.
In this case, the homography defined in Eq. (86) is reduced to the product of the intrinsic matrix and the matrix [r1 r2 t], where r1 and r2 are the first two columns of the rotation matrix and t is the translation vector.
Thus, Eq. (132) can be written column by column.
Since r1 and r2 are orthonormal vectors (r1 and r2 are columns of a rotation matrix), we have two constraints: the first two columns of the homography, h1 and h2, satisfy h1^T ω h2=0 and h1^T ω h1=h2^T ω h2, where ω is the symmetric matrix of the bilinear form. These constraints can be written as in Eq. (136).
The bilinear form can be rewritten as a linear function of the six distinct entries of the symmetric matrix ω.
Then, the constraints given by Eq. (136) become the linear system of Eq. (141).
A nontrivial solution of Eq. (141) can be obtained using several homographies. For this, we compute the homographies of different images where the position and orientation of the reference plane (or the camera, or both) vary in an unknown manner while the intrinsic camera parameters remain constant. Thus, we solve the new matrix equation, Eq. (143).
In general, at least three homographies are required. However, two homographies are sufficient assuming zero skew.
Equation (143) can be solved using the singular value decomposition method, see Appendix C of Ref. 14. Since the obtained solution is unique only up to scale, the associated matrix is related to the true one by an unknown nonzero scale factor.
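The estimation described in this appendix can be sketched as follows, assuming noise-free homographies of the form discussed above (intrinsic matrix times [r1 r2 t]). The closed-form recovery of the intrinsic matrix through a Cholesky factorization replaces the explicit least-squares formulas; all function names are ours:

```python
import numpy as np

def _v(H, i, j):
    """Constraint vector built from columns h_i and h_j of a homography."""
    a, b = H[:, i], H[:, j]
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[1] * b[1],
                     a[2] * b[0] + a[0] * b[2],
                     a[2] * b[1] + a[1] * b[2],
                     a[2] * b[2]])

def intrinsics_from_homographies(Hs):
    """Estimate the intrinsic matrix from several homographies of a plane."""
    V = []
    for H in Hs:
        V.append(_v(H, 0, 1))                 # h1 and h2 are "orthogonal"
        V.append(_v(H, 0, 0) - _v(H, 1, 1))   # h1 and h2 have equal "norm"
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b11, b12, b22, b13, b23, b33 = Vt[-1]     # solution up to scale
    B = np.array([[b11, b12, b13],
                  [b12, b22, b23],
                  [b13, b23, b33]])
    if b11 < 0:                               # fix the unknown sign of the solution
        B = -B
    # B is proportional to K^{-T} K^{-1}; a Cholesky factor of B gives K.
    L = np.linalg.cholesky(B)
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

With three or more homographies from sufficiently different orientations, the stacked constraint matrix has a one-dimensional null space and the intrinsic matrix is recovered up to numerical precision.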
It is worth mentioning that the physical intrinsic camera parameters (focal length, pixel sizes, skew, and principal point coordinates) cannot all be obtained individually using only this matrix; e.g., the focal length and the pixel sizes appear only through their ratios. Fortunately, the matrix is sufficient for many computer vision tasks. For the case where the intrinsic camera parameters are required explicitly, we can assume that the skew and size of the pixel are known (e.g., consulted in the datasheet of the camera sensor). Thus, the estimation of the remaining intrinsic parameters is a linear problem with a least-squares solution.
Extrinsic Camera Parameters
Once the intrinsic matrix is available, the rotation matrix and the translation vector can be estimated for each provided homography as follows. First, we compute the estimate of the matrix [r1 r2 t] by premultiplying the homography by the inverse of the intrinsic matrix and normalizing the scale.
Then, the rotation matrix is obtained by enforcing the orthogonality condition of rotation matrices. For this, the singular value decomposition is computed, and the required rotation matrix is determined from the product of its orthogonal factors.
Finally, the translation vector is computed from the remaining (third) column of the estimated matrix.
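The extrinsic recovery of this subsection can be sketched as follows (noise-free data assumed; the SVD step enforces the orthogonality condition as described above):

```python
import numpy as np

def extrinsics_from_homography(K, H):
    """Recover the rotation R and translation t from one homography H."""
    A = np.linalg.inv(K) @ H                  # proportional to [r1 r2 t]
    s = 1.0 / np.linalg.norm(A[:, 0])         # scale fixed by the unit norm of r1
    if s * A[2, 2] < 0:                       # enforce positive depth t_z > 0
        s = -s
    r1, r2, t = s * A[:, 0], s * A[:, 1], s * A[:, 2]
    Q = np.column_stack((r1, r2, np.cross(r1, r2)))
    U, _, Vt = np.linalg.svd(Q)               # nearest rotation in Frobenius norm
    return U @ Vt, t
```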
As an example, we describe a simple experiment to obtain the intrinsic parameters of a camera. A camera with square pixels and an imaging lens with a nominal focal length of 6 mm was used. The checkerboard pattern shown in Fig. 13(a) was printed on a letter paper. Then, 15 images of the printed pattern lying on the reference plane were captured from different unknown viewpoints, see Figs. 13(b)–13(i).
We use the coordinates of the corners shown in Fig. 13(a) as the known points on the reference plane. The corresponding points in the image plane were obtained by marking the corners of the checkerboard pattern in the image. Then, with these point pairs, a homography matrix was computed for each acquired image. With these homographies, the matrix defined in Eq. (144) was created, and Eq. (143) was solved.
From this, the intrinsic parameter matrix was recovered as given in Eq. (156).
For validation purposes, we estimate the focal length using the known pixel size. From the matrix in Eq. (156), the focal length was estimated using Eq. (148). The result is very close to the nominal focal length (6 mm) of the employed camera lens.
O. Faugeras, Q.-T. Luong and T. Papadopoulo, The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications, MIT Press, Cambridge (2004).
Y. Zhao and Y. Li, “Camera self-calibration from projection silhouettes of an object in double planar mirrors,” J. Opt. Soc. Am. A 34, 696–707 (2017). http://dx.doi.org/10.1364/JOSAA.34.000696
T. Taketomi et al., “Camera pose estimation under dynamic intrinsic parameter change for augmented reality,” Comput. Graphics 44, 11–19 (2014). http://dx.doi.org/10.1016/j.cag.2014.07.003
H. H. Ip and Y. Chen, “Planar rectification by solving the intersection of two circles under 2D homography,” Pattern Recognit. 38(7), 1117–1120 (2005). http://dx.doi.org/10.1016/j.patcog.2004.12.004
B. Cyganek and J. P. Siebert, An Introduction to 3D Computer Vision Techniques and Algorithms, John Wiley & Sons Ltd., Chichester, West Sussex (2009).
O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge (1993).
R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, Cambridge (2003).
J. L. Mundy and A. Zisserman, Appendix—Projective Geometry for Machine Vision, pp. 463–519, MIT Press, Cambridge (1992).
W. Burger, “Zhang’s camera calibration algorithm: in-depth tutorial and implementation,” Technical Report HGB16-05, University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Department of Digital Media, Hagenberg, Austria (2016).
H. Zeng, X. Deng and Z. Hu, “A new normalized method on line-based homography estimation,” Pattern Recognit. Lett. 29(9), 1236–1244 (2008). http://dx.doi.org/10.1016/j.patrec.2008.01.031
Z. Zhang, “A flexible new technique for camera calibration,” Technical Report MSR-TR-98-71, Microsoft Research (1998).
R. Juarez-Salazar et al., “Camera calibration by multiplexed phase encoding of coordinate information,” Appl. Opt. 54, 4895–4906 (2015). http://dx.doi.org/10.1364/AO.54.004895
R. Juarez-Salazar, L. N. Gaxiola and V. H. Diaz-Ramirez, “Single-shot camera position estimation by crossed grating imaging,” Opt. Commun. 382, 585–594 (2017). http://dx.doi.org/10.1016/j.optcom.2016.08.041
A. Herout, M. Dubská and J. Havel, Vanishing Points, Parallel Lines, Grids, pp. 41–54, Springer, London (2013).