We introduce a navigation filter that fuses visual optical flow information with data collected from an inertial measurement unit during GPS signal degradation. Under the assumption that the tracked feature points are located on a level plane, the feature depth can be expressed explicitly and an exact measurement model is derived. Moreover, an error model for a block-matching-based optical flow algorithm is investigated. The measurement error follows a Gaussian distribution, which is a prerequisite for leveraging the error-state Kalman filter. Subsequently, a local observability analysis of the proposed filter is performed, yielding expressions for three unobservable directions. We emphasize the ability of the proposed filter to estimate the aircraft’s state, and in particular its altitude accurately, without any prior knowledge or extra sensors. Finally, an extensive Monte Carlo analysis verifies the observability results, showing that all states can be estimated except the absolute horizontal position and the rotation around the gravity vector. The effectiveness of the proposed filter is demonstrated on outdoor flight test data acquired with experimental hardware.
During the past decades, there has been growing research interest in developing vision-based or vision-aided navigation systems for unmanned aerial vehicles (UAVs) in environments without prior information.1–4 The traditional GPS/inertial navigation system (INS) may suffer severe drift and inaccurate estimation, especially in environments where GPS is not reliable. Inertial systems integrating sensors such as radar, LiDAR, or sonar can be regarded as a possible way to improve accuracy.5 However, radar or LiDAR may increase the cost, weight, and power consumption of an airborne platform, while low-cost sonar shows limited performance in outdoor flight. Camera-aided navigation and visual odometry, on the other hand, provide rich information about the environment and the camera pose, which makes them a superior solution to the above problems.
In recent years, simultaneous localization and mapping has gathered a lot of attention since it can provide accurate and real-time estimates of the six-dimensional pose. Nevertheless, it is subject to a series of strict requirements: high-quality images for feature extraction; powerful onboard processing hardware, since the computational complexity grows with the number of features; and repeated tracking of the same image features over long periods of time to realize loop closure.6 Moreover, there are few applications for fixed-wing aircraft, which fly faster and higher than quadcopters in outdoor unstructured environments. Optical flow-aided INS evades the need for mapping and landmark data and shows better tolerance in handling scenes with fewer salient features and low-contrast texture. In addition, it has a compact image representation and avoids the burden of image storage while retaining rich navigation information on the translational and rotational velocity as well as the scene depth.7 It is therefore more suitable for implementation on systems with limited resources. Previous tests have compared the performance of different optical flow algorithms for UAV navigation, as introduced in Ref. 8. The optical flow observation error is usually modeled with a default zero-mean normal distribution without any justification.9 We turned to the computer vision literature for a better answer, but found that it mainly focuses on evaluation methodology, offering new statistics for indicating performance.10 It has produced definitions such as “absolute flow endpoint error” and “frame interpolation error,”11 rather than a specific model analysis of the flow error.
Furthermore, approaches that aid an INS with optical flow from a single camera often adopt an additional ultrasonic range sensor to measure the distance to the ground,12 which resolves the scale ambiguity between the translational velocity magnitude and the scene depth.9 Another commonly used method assumes that specific maneuvers can be designed for the UAV to keep a constant height or attitude. Five optimization-constraint-based methods for ego-motion estimation were presented in Florian Raudies’ review,10 which can successively solve the problem. As an example, Thomas P. Webb13 leveraged the epipolar constraint of feature points, also termed the essential or coplanarity constraint, as the measurement equation in an implicit extended Kalman filter; the optical flow subspace constraint is similar to its differential version. However, the filter does not perform well in estimating the velocity states, as the epipolar constraint is only weakly related to velocity. A bilinear constraint-based visual-inertial scheme for relative motion estimation has also been introduced.7 The observability of INS errors for explicit measurements in a subspace-constraint-based optical flow method was investigated in Jacques Waldmann’s work.14 These approaches require a sufficient number of optical flow features to obtain the corresponding linear equations for the unknowns. Xu et al.2 presented an autonomous landing method using planar terrain features tracked in sequential images; under the assumption of a level plane, the scale of the line-of-sight vector can be expressed explicitly.
Compared to existing work, the main contributions of the proposed research are as follows: (1) a detailed error model for optical flow measurements based on a block-matching algorithm is derived and shown to obey a Gaussian distribution, making it suitable for an error-state Kalman filter (ESKF); (2) the feature depth is obtained without an extra measurement device through the flat-terrain assumption, which is reasonable for outdoor navigation missions in which UAVs fly at high altitude. Under the flat-terrain assumption, the height is proven observable by a local observability analysis, which is further verified by simulation and experimental results. In addition, a more detailed development of the ESKF fusing information from the IMU and optical flow, originally described in Refs. 15 and 16, is presented. The paper is organized as follows: Sec. 2 describes the problem formulation, introduces the dynamical models of the adopted sensors, reviews the computation of a block-matching optical flow algorithm, and derives the measurement error analysis. Section 3 outlines the structure of the ESKF used to estimate the full state. Section 4 presents the observability analysis. Sections 5 and 6 show the simulation and experimental results with the corresponding discussion; the final conclusions are given in Sec. 7.
The geometry of the general case of an aircraft equipped with an IMU and a monocular camera is shown in Fig. 1. Symbol represents the aircraft body-fixed frame, and the vehicle relative position to the global inertial reference frame is denoted by the vector . The camera frame is fixed to the aircraft with a mounting displacement . We denote the coordinates of the ’th feature point in the frame by , while denotes the relative position of w.r.t. the camera frame . The transformation matrices and describe the rotations from to and to , respectively, where is dependent on the mounting of the camera to the aircraft. The relative position of can be expressed in the camera frame as
Considering the differential equation with respect to the reference frame requires calculating the derivative of the rotation transformation matrix. Therefore, the rotational velocities of the vehicle expressed in the frame and are represented by and , respectively, resulting in
Pinhole Camera Model
This work used a pinhole camera model, whose details are shown in Fig. 1. The camera coordinate frame is defined by paralleling to the optical axis, and matching the horizontal and vertical image directions. The projection results of the feature on the image plane are expressed as
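As a concrete illustration of the pinhole projection, the sketch below maps a camera-frame point to pixel coordinates; the focal length and principal point values are hypothetical and not taken from the paper's hardware.

```python
import numpy as np

def project_pinhole(p_c, f, c):
    """Project a camera-frame point p_c = [X, Y, Z] (Z = depth along the
    optical axis) onto the image plane of a pinhole camera with focal
    length f (in pixels) and principal point c = (cx, cy)."""
    X, Y, Z = p_c
    assert Z > 0, "feature must lie in front of the camera"
    return np.array([f * X / Z + c[0], f * Y / Z + c[1]])

# e.g., a point 10 m ahead of the camera, offset 1 m and 0.5 m laterally:
uv = project_pinhole(np.array([1.0, 0.5, 10.0]), 400.0, (320.0, 240.0))
# uv -> [360., 260.]
```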
When considering the flight environment of fixed-wing aircraft, which generally fly at heights of hundreds of meters over flat terrain such as plains and farm fields, the topographic relief can be neglected compared to the flight height. In such scenes, we assume that the visual features lie in a level plane. Further, without loss of generality, by assuming that the feature plane contains the origin of the global inertial frame, we obtain , where . We also adopt the unit quaternion to represent the rotation from the global coordinate frame to the body-fixed coordinate frame at time . Hence, can be replaced by , which describes the rotation matrix corresponding to . Thus, the scale of the line-of-sight vector, and hence the depth component of , namely in Eq. (5), can be represented as
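The level-plane argument can be made concrete with a small sketch: if the feature plane is the inertial z = 0 plane and the line-of-sight direction is known in the inertial frame, intersecting the bearing ray with the plane gives the scale of the line-of-sight vector from the altitude alone. The NED convention (z pointing down) used here is an assumption of this sketch, not necessarily the paper's frame definition.

```python
import numpy as np

def feature_depth(h, bearing_inertial):
    """Range to a feature on the plane z = 0 for a camera at height h
    above the plane.  bearing_inertial is the unit line-of-sight vector
    expressed in a NED inertial frame (z axis pointing down), so its z
    component must be positive for the ray to intersect the ground."""
    bz = bearing_inertial[2]
    assert bz > 0, "ray must point toward the ground plane"
    return h / bz

# Straight down, the range equals the altitude; 45 deg off nadir, it is
# the altitude divided by cos(45 deg).
```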
Optical Flow Computation
Having characterized the motion equations of the camera and the pinhole camera model, we now focus on the optical flow computation. In this work, the optical flow is determined as the displacement between two successive images divided by the time interval between their capture. Based on the local brightness constancy and small motion assumptions, the block-matching algorithm approximates the image motion by the displacement that yields the best match between image regions. The procedure, illustrated in Fig. 2, with the specific displacement computation outlined in Fig. 3, starts with distinct feature extraction and ends with minimizing the sum-of-squared-differences (SSD) cost function over the search window;9,17 the result provides the displacement of the current feature, which can be formulated as
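The block-matching step can be sketched as follows; the block half-size and search radius below are illustrative choices, not the parameters used in the paper.

```python
import numpy as np

def ssd_block_match(prev_img, curr_img, feat, block=4, search=6):
    """Exhaustive SSD block matching: find the integer displacement of
    the (2*block+1)-square patch centred on feat = (row, col) in
    prev_img within a +/- search pixel window of curr_img."""
    r, c = feat
    ref = prev_img[r - block:r + block + 1, c - block:c + block + 1].astype(float)
    best_ssd, best_d = np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            cand = curr_img[r + dr - block:r + dr + block + 1,
                            c + dc - block:c + dc + block + 1].astype(float)
            ssd = np.sum((ref - cand) ** 2)   # SSD cost for this shift
            if ssd < best_ssd:
                best_ssd, best_d = ssd, (dr, dc)
    return best_d  # pixel displacement of the feature between frames
```

Dividing the returned displacement by the interframe time then yields the optical flow vector used by the filter.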
Furthermore, we prove that the error of optical flow computation based on the SSD block-matching algorithm follows a Gaussian distribution, which is not fully explained in most reviewed articles, such as Farid Kendoul’s work.9 To this end, we recast the whole algorithm as a maximum likelihood estimation problem. The displacement of a pixel can be seen as a warp function , which transforms a pixel point in the original image frame () into a point in the current image frame () by a 2-D deformation vector , namely , . This rests on the local brightness constancy assumption, which is given as
Hence, we derived the likelihood function for the pixel samples as
The maximum likelihood estimate (MLE) is proven to be asymptotically normally distributed,18,19 and the error covariance of the MLE is given by the inverse of the observed Fisher information matrix; therefore, the maximum likelihood estimator can be expressed as18 Since the optical flow is the displacement divided by the interframe time, the optical flow error also maintains a Gaussian distribution, which is the prerequisite for adopting the Kalman filter.
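The asymptotic-normality argument can be illustrated on a toy linear-Gaussian model (a stand-in for the linearized matching problem, not the paper's derivation): the least-squares MLE scatters with a covariance given by the inverse Fisher information.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                            # assumed measurement-noise std
A = rng.normal(size=(50, 2))           # stand-in "gradient" design matrix
d_true = np.array([1.2, -0.7])         # true 2-D displacement

# Fisher-information prediction of the MLE covariance: (A^T A / sigma^2)^-1
cov_fisher = sigma**2 * np.linalg.inv(A.T @ A)

# Empirical scatter of the MLE over many noise realisations
ests = [np.linalg.lstsq(A, A @ d_true + sigma * rng.normal(size=50), rcond=None)[0]
        for _ in range(2000)]
cov_emp = np.cov(np.array(ests).T)
# cov_emp closely matches cov_fisher, consistent with asymptotic normality
```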
Next, we deduce the measurement equation of optical flow. The projected pixel coordinates of the feature point on the image plane are given by Eq. (4); taking the time derivative of both sides of Eq. (4) leads to the expression for the optical flow velocity
Considering the relative motion between the camera and the feature point by substituting from Eq. (3) into Eq. (15), and eliminating with based on Eqs. (4) and (6), the optical flow equation can be computed as
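As a sketch of the resulting flow field, written here in the classical Longuet-Higgins/Prazdny form for normalized image coordinates, the flow splits into a depth-dependent translational part and a depth-independent rotational part. The sign and frame conventions below are assumptions of this sketch and may differ from the paper's definitions.

```python
import numpy as np

def flow_field(xy, depth, v_cam, w_cam):
    """Continuous optical flow of a point at normalised image coordinates
    xy = (x, y) and depth Z, for camera translational velocity v_cam and
    angular rate w_cam, both expressed in the camera frame."""
    x, y = xy
    vx, vy, vz = v_cam
    wx, wy, wz = w_cam
    # translational term (scaled by inverse depth) + rotational term
    u = (-vx + x * vz) / depth + (x * y * wx - (1 + x**2) * wy + y * wz)
    v = (-vy + y * vz) / depth + ((1 + y**2) * wx - x * y * wy - x * wz)
    return np.array([u, v])

# At the principal point, pure forward motion produces zero flow (the
# focus of expansion), while sideways motion produces flow ~ -vx / Z.
```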
IMU Measurement
The IMU noise parameters follow Refs. 22 and 23; they can be obtained by the Allan variance technique.
To integrate the two different sources of information collected by the IMU and the monocular camera, an ESKF was designed. While the IMU information serves to make the filter predictions, the optical flow vision information is used to correct the filter. Compared to the general extended Kalman filter, which consists of a state prediction step and an update step, the ESKF additionally accounts for the error states handled by the reset process, producing more precise results by reducing the long-term error drift of the odometry system.
System State and Propagation
The estimator state vector is , and all symbol definitions can be found in Sec. 2.
The system kinematic equation that describes the time evolution of the state is similar to Hesch’s model,21 whose corresponding linearized discrete form of the predicted state estimate is given as
In order to derive the process noise covariance matrix, the error dynamics need to be derived from the kinematic equation along its nominal form. We define the error state in the estimate of a quantity for aircraft position, velocity, and IMU biases as . An exception is the quaternion error, which is defined by the angle-error vector , as . Then, the linearized first-order approximation of the continuous-time error-state kinematics14,20,24 is given as
In practical terms, the ESKF is realized in discrete time, with the update time interval , which equals the sampling period of the IMU signals. By numerically integrating Eq. (20), we obtain the discrete error-state equation.20 To express it compactly, the following auxiliary series is introduced:
Here, denotes the incremental rotation matrix obtained by rotating an arbitrary coordinate frame at a rotational rate of for a timespan , according to the Rodrigues rotation formula; therefore, the discrete linearized error-dynamic transition matrix, written as a function of the above series24 (for readability, ), is
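The incremental rotation entering this transition matrix can be sketched directly from the Rodrigues formula; this is a generic implementation, not the paper's exact discretization.

```python
import numpy as np

def skew(w):
    """Skew-symmetric cross-product matrix [w]_x."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w, dt):
    """Incremental rotation matrix for rotating a coordinate frame at a
    constant rate w (rad/s) over a timespan dt, via the Rodrigues
    rotation formula R = I + sin(th) K + (1 - cos(th)) K^2."""
    theta = np.linalg.norm(w) * dt
    if theta < 1e-12:
        return np.eye(3)           # small-angle limit
    k = skew(w / np.linalg.norm(w))
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)
```

For example, rotating at pi/2 rad/s about the z axis for one second carries the x axis onto the y axis.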
The discrete process noise covariance matrix was calculated as
Thus, the error covariance propagation resulted as
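The propagation step itself is compact; below is a minimal sketch, with a symmetrization step added as a common numerical safeguard rather than something the paper prescribes.

```python
import numpy as np

def propagate_covariance(P, Phi, Qd):
    """One ESKF prediction step for the error covariance:
    P_{k+1} = Phi P_k Phi^T + Qd, where Phi is the discrete error-state
    transition matrix and Qd the discrete process noise covariance."""
    P_next = Phi @ P @ Phi.T + Qd
    return 0.5 * (P_next + P_next.T)   # enforce symmetry numerically
```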
Measurement Update Equation
According to Eq. (16), the optical flow measurement can be expressed in terms of the estimated state vector. Since the aircraft angular rate is not part of the state vector, we replace it with . Substituting the coefficient matrices associated with , , and in Eq. (16), the expression becomes clearer with the notation listed as
Furthermore, by subtracting the measured angular-rate term and redefining the revised optical flow measurement as a new measurement of the filter, the right-hand side is expressed only in terms of the state vector ; therefore, the following notation remains:
The measurement residual, also called the innovation, is then given as
Note that the residual noise involves both the optical flow algorithm error and the gyroscope drift noise, i.e., ; hence, the covariance matrix of the revised optical flow measurement is the sum of the optical flow algorithm covariance and the gyroscope covariance , which is given as
In general, a set of optical flow vectors is measured at each time step; stacking all the observation residuals of Eq. (31) into one vector yields , where , , and are composed of the block elements , , and , so that the measurement covariance matrix can be stacked together diagonally as
Then, the a priori state estimate is updated with the current measurements according to the following steps: (1) calculate the Kalman gain as
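These update steps can be sketched in their generic textbook form; here H and R stand for the stacked measurement Jacobian and covariance above, and the Joseph-form covariance update is a common numerically robust choice rather than necessarily the paper's exact formula.

```python
import numpy as np

def eskf_update(P, H, R, residual):
    """Generic ESKF measurement update: Kalman gain, error-state
    correction, and Joseph-form covariance update."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    dx = K @ residual                        # estimated error state
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form
    return dx, P_new
```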
Error-State Kalman Filter Reset
For the ESKF reset process, we consider the relationship among the new error , the old error , and the observed error correction . As the true value is constant, . Considering that the observed error has been injected into the updated nominal state, i.e., , so that , we define an error reset function , where the reset operation ensures that the error is reset to zero; meanwhile, the covariance of the error needs to be modified as well,14 leading to15 defined as
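The injection-and-reset step for the attitude part can be sketched as follows; the [x, y, z, w] quaternion ordering and the right-multiplied (local) small-angle error are assumptions of this sketch, not necessarily the paper's conventions.

```python
import numpy as np

def quat_mul(q, p):
    """Hamilton product of quaternions stored as [x, y, z, w]."""
    x1, y1, z1, w1 = q
    x2, y2, z2, w2 = p
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2])

def inject_and_reset(q_nom, dtheta):
    """Inject the observed angle-error vector dtheta into the nominal
    quaternion (small-angle approximation) and reset the error to zero."""
    dq = np.concatenate([0.5 * dtheta, [1.0]])  # small-angle error quaternion
    q_new = quat_mul(q_nom, dq)
    q_new /= np.linalg.norm(q_new)              # renormalize
    return q_new, np.zeros(3)                   # error state reset to zero
```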
The estimator employs a linearized and discretized version of the nonlinear system model (to simplify the analysis, we consider only a single feature point),25 defined by the error-dynamic transition matrix and the measurement Jacobian , i.e.,
Here , and the computing method for obtaining analogous expressions of the subblock elements can be found in Joel Hesch’s work.26 Furthermore, using these expressions of , we obtain the ’th block row of , for . It is noted that and , as they contain nonzero integrals, are time-varying matrices with linearly independent columns. Therefore, we need only consider the remaining block elements of to form the basis of the right nullspace of , which is stated as follows:
The right nullspace of the local observability matrix for the linearized model is spanned by three unobservable directions:
The nullspace reveals that the global rotation about the gravity vector is unobservable and the global horizontal position is unobservable, while the height is fully observable. The remaining states, i.e., the velocity, pitch, and roll angles as well as the IMU biases, are also observable. The rotation around the gravity vector is unobservable because an absolute inertial yaw reference is missing, i.e., the yaw angle in the inertial frame is unobservable, whereas the absolute roll and pitch angles of the aircraft w.r.t. the global inertial frame are made observable by the gravity vector measurement. Similarly, without a global reference, the position along the and directions is not observable; however, in contrast to previous findings, the altitude is observable because of the assumption that the optical flow features used for filter updates lie in a level plane containing the origin of the inertial frame. Using only the feature-plane assumption, according to Eq. (6), the feature depth is explicitly determined by the absolute altitude and the aircraft attitude; it thus becomes part of the optical flow measurement equation and is corrected whenever a new measurement update arrives.
To numerically investigate the previous analysis, the proposed navigation scheme was implemented in a simulation. As shown in Fig. 4, 100 features were randomly placed in a horizontal plane around the origin. For generating the ground truth and the sensor readings, we simulated a flight with a piecewise continuous trajectory, beginning with straight-line motion at constant velocity lasting 4 s, followed by a banking turn with a maximum bank angle of 30 deg, then a spiral motion with a pitch angle of up to 9 deg at altitudes between and , and ending with 50 s of circling. A full list of parameters is shown in Table 1; most of these quantities were taken from commonly used IMU datasheets, e.g., MPU6050 or ADIS16448.
Simulation parameters and initial conditions of the filter [“rand (3, 1)” here represents a 3×1 random vector obeying standard normal distribution].
| Parameter | Value |
| --- | --- |
| IMU sampling rate (Hz) | 100 |
| Camera process rate (Hz) | 30 |
| Accelerometer noise | |
| Gyro noise | |
| Accelerometer bias noise | |
| Gyro bias noise | |
| Field of view (FOV) (deg) | 90 |
| Optical flow noise (pixel·rad/s) | 0.01 |

| State | Initial value | True value |
| --- | --- | --- |
| Position (m) | True value + 50·rand(3,1) | [-50 -180 -200] |
| Velocity (m/s) | True value + 10·rand(3,1) | [20 0 0] |
| Quaternion | True value + 0.5·rand(3,1) | [0 0 0 1] |
| Accelerometer bias () | [0 0 0] | [0.0981 0.0981 0.0981] |
| Gyro bias (deg/s) | [0 0 0] | [0.5 0.5 -0.5] |
The choices of the initial state values are given in Table 1, showing that the filter tolerates a degree of initial state offset while still converging. In addition, the initial filter error covariance matrix was set as .
Even though a single optical flow measurement ensures observability, in practice, through a proper choice of the FOV and the initial feature quantity, at least seven features were extracted at every time step to obtain better convergence. After running 100 Monte Carlo simulations, we computed the Monte Carlo standard deviation, also known as the root mean squared error, denoted by the sample error covariance collected over the simulation results as
Here, at the ’th time step, was the true state, and was the estimated value in the ’th simulation. The Monte Carlo standard deviation and filter standard deviation are the square root of the elements in the main diagonal of and ESKF estimation error covariance, respectively.
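The Monte Carlo standard deviation described above can be computed as in the sketch below, which aggregates the per-run estimation errors into a per-state, per-time-step RMSE.

```python
import numpy as np

def monte_carlo_std(truth, estimates):
    """Monte Carlo standard deviation (RMSE across runs).
    truth:     (T, n) true state trajectory
    estimates: (M, T, n) estimates from M simulation runs
    Returns a (T, n) array comparable to the square root of the diagonal
    of the filter's estimation error covariance."""
    err = estimates - truth[None, :, :]       # (M, T, n) estimation errors
    return np.sqrt(np.mean(err**2, axis=0))   # RMSE over the M runs
```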
Figures 5–9 show the results of the proposed filter tracking the ground truth for the position, attitude, velocity, and IMU biases. The results show that most state estimates converge to the ground truth within 20 s, except for the unobservable states, namely the positions along the and axes and the yaw angle, whose Monte Carlo standard deviation as well as filter standard deviation at the final time remain around their initial values. The unobservable states maintain a constant error rather than diverging, since they still benefit from proper state evolution and IMU bias correction but lack an initial reference. This correlates well with the observability analysis of the horizontal position and yaw in Sec. 4.
Among the observable states, since the observability analysis disregards process noise and initial offset, the filter standard deviation decreases promptly while the Monte Carlo standard deviation converges to zero more slowly. Compared with the gyroscope bias curves, the uncertainty of the accelerometer bias shows a much slower convergence: the gyroscope bias is explicitly included in the measurement equation and therefore directly corrected, whereas the accelerometer bias relies only on state propagation and is corrected indirectly via coupling with other states. For the estimation of the roll and pitch angles, the height, and the vertical velocity, the plots show better agreement between the Monte Carlo sample covariance and the filter error covariance than for the other states; the estimation of the horizontal inertial and body-fixed velocities relies on tight mutual coupling with the estimated attitude, which contains the accelerometer bias and the unobservable yaw angle.
The proposed filter was tested on an X-star quadcopter equipped with a Pixhawk 4 and a PX4Flow board. The optical flow camera consisted of an MT9V034 CMOS image sensor with a global shutter and a 16 mm M12 lens, , whose output for the optical flow computation was a binned image at 10 Hz. The camera module can be considered a pinhole camera, mounted pointing straight down. The employed IMU was a dual-redundancy solution with a BMI055 and an ICM-20689, providing inertial measurements at 125 Hz.
Ground truth was collected by a conventional INS based on an extended Kalman filter and an attitude and heading reference system algorithm, which integrated the IMU outputs, magnetometer readings, and barometric altitude with GPS data. Note that these extra sensors were used only for generating the true values; they were not part of the proposed navigation system, nor used for its estimates.
In Fig. 10, the outdoor trajectory during the autonomously controlled waypoint-mode flight test is shown. The flight envelope was set with a maximum height of 10 m and a maximum speed around . The quadcopter took off at the origin; as we had little information about the IMU, the estimated states were initialized to zero, which was the only option. The covariance parameters could be selected based on hardware specifications and did not require strong tuning, as the framework works well over a large range of parameters.
From the results in Figs. 11–13, the filter started operating in flight at around and at a speed of . The roll and pitch angles exhibit high tracking accuracy, except for the growing error that occurs when the vehicle prepares to descend for landing after 150 s, as the horizontal velocity drops, which is unfavorable for the accuracy of the optical flow measurement. Due to the misalignment of its initial value, the yaw angle always keeps a deviation from the true value, which may account for much of the estimation error of the velocity along the and axes, especially during , when they deviate from the ground truth with greater oscillation. Under unknown hardware noise and vehicle vibration, the horizontal velocity and horizontal position estimates approached their true values more slowly and less precisely, but the vertical velocity and altitude estimates exhibited satisfactory accuracy.
This paper has presented an optical flow-based approach to vision-aided inertial navigation. Under the assumption that the observed feature points are located on a plane, the feature depth can be obtained by deducing a distance expression through the optical flow equation and the geometric relationship of the vehicle-camera-feature system. To fill the gap left by previous works, an error analysis for optical flow measurement using a block-matching method was provided, which is the prerequisite for adopting and designing an ESKF. With IMU dead reckoning used for prediction and optical flow feature measurements used for correction, estimates of the vehicle's altitude, attitude, velocity, and IMU drift biases can be obtained. The theoretical limitations and feasibility of the filter were investigated through an observability analysis, which showed that only the absolute positions along the and axes, along with the yaw angle, are unobservable. Finally, the Monte Carlo simulation and the hardware experiment validated the observability analysis.
The numerical and real experiments have shown that the effectiveness and performance of the proposed filter may suffer from initialization errors and noise. Considering that the minimal sensor configuration is low-cost and the adopted method requires little computational effort, the overall system could serve as an onboard estimator providing real-time supplemental state estimates when the UAV is placed in an unknown GPS-denied environment.
Appendix I: Expression of Error Dynamic Transition Matrix Subblock Elements and Calculation of
According to Ref. 26, the sub-block elements used in this paper are shown as
Moreover, with these expressions, and based on Eqs. (33)–(36), the ’th block row of was calculated, for
Appendix II: The Proof of Theorem I
To verify the result, we just directly multiplied each block row of with .
Since , both and had such structure , which were marked by and , respectively, hence
Given , , and , it yielded , , so that we had .
The first author wishes to thank Professor Walter Fichter and Dr. Lorenz Schmitt for their useful comments and language editing, which have greatly improved the manuscript. The authors would also like to thank the China Scholarship Council for providing the funding that made this research possible.
Jia Deng received her BS degree at Beihang University, Beijing, in 2010, and she is currently pursuing her PhD at Beihang University, Beijing, China. She mainly studies navigation, control theory, and corresponding application and recently has focused on the data fusion problems and cooperative formation flight control system.
Sentang Wu received his PhD in dynamics, ballistics, and aircraft motion control systems from Ukraine National Aviation University in 1992. He mainly studies the theory and application of nonlinear stochastic systems, computer information processing, and control and aircraft cooperative control. He is currently a professor of automation science and electrical engineering and a doctoral tutor at Beihang University.
Hongbo Zhao received her BS degree at Beihang University, Beijing, and she is currently pursuing her PhD at Beihang University, Beijing, China. She mainly studies the control theory and application, distributed state estimation of stochastic systems, which focus on multiagent systems, and cooperative formation flight control systems with uncertainties.