We introduce a navigation filter that fuses visual optical flow with data from an inertial measurement unit (IMU) during GPS signal degradation. Under the assumption that the tracked feature points lie on a level plane, the feature depth can be expressed explicitly, and an exact measurement model is derived. Moreover, an error model for a block-matching-based optical flow algorithm is investigated: the measurement error follows a Gaussian distribution, which is a prerequisite for leveraging the error-state Kalman filter. Subsequently, a local observability analysis of the proposed filter is performed, yielding expressions for three unobservable directions. We emphasize the ability of the proposed filter to estimate the aircraft’s state, especially an accurate altitude, without any prior knowledge or extra sensors. Finally, an extensive Monte Carlo analysis verifies the observability results, showing that all states can be estimated except the absolute horizontal position and the rotation around the gravity vector. The effectiveness of the proposed filter is demonstrated on outdoor flight test data acquired with experimental hardware.

## 1.

## Introduction

Over the past decades, there has been growing research interest in developing vision-based or vision-aided navigation systems for unmanned aerial vehicles (UAVs) in environments without prior information.^{1}^{–}^{4} The traditional GPS/inertial navigation system (INS) may suffer severe drift and estimation inaccuracy, especially in environments where GPS is not reliable. Inertial systems integrated with sensors such as radar, LiDAR, or sonar can be regarded as one possible way to improve accuracy.^{5} However, radar or LiDAR may increase the cost, weight, and power consumption of an airborne platform, while low-cost sonar shows limited performance for flights in outdoor scenes. Camera-aided navigation and visual odometry, on the other hand, provide rich information about both the environment and the camera pose, and appear to be a superior solution to the above problems.

In recent years, simultaneous localization and mapping (SLAM) has gathered a lot of attention since it can provide accurate and real-time estimates of the six-dimensional pose. Nevertheless, it is subject to a series of strict requirements: high-quality images for feature extraction; hardware with a powerful onboard processor, as the computational complexity increases with the number of features; and repeated tracking of the same image features over long periods of time to realize loop closure.^{6} Moreover, there are few applications for fixed-wing aircraft, which fly faster and higher than quadcopters, in outdoor unstructured environments. Optical flow-aided INS evades the need for mapping and landmark data and shows better tolerance in handling scenes with less salient features and low-contrast texture. In addition, it has a compact image representation and does not require the extra burden of image storage, without loss of rich navigation information on the translational and rotational velocity as well as the scene depth.^{7} It is therefore more suitable for implementation in systems with limited resources. Tests comparing the performance of different optical flow algorithms for UAV navigation are introduced in Ref. 8. The optical flow observation error is usually modeled with a default zero-mean normal distribution without any explanation.^{9} Turning to the computer vision literature for a better answer, we found that it mainly focuses on evaluation methodology, proposing new statistics for indicating performance.^{10} Definitions such as “absolute flow endpoint error” and “frame interpolation error”^{11} have been raised, rather than a specific model analysis of the flow error.
Furthermore, INS aided with optical flow from a single camera often adopts an additional ultrasound range sensor to measure the distance to the ground,^{12} which resolves the scale ambiguity between the translational velocity magnitude and the scene depth.^{9} Another commonly used method assumes that specific maneuvers can be designed to keep the UAV at a constant height or attitude. Five optimization-constraint-based methods for ego-motion estimation were presented in Florian Raudies’ review,^{10} which can solve this problem. As an example, Thomas P. Webb^{13} leveraged the epipolar constraint of feature points, also termed the essential or coplanarity constraint, as the measurement equation in an implicit extended Kalman filter; the optical flow subspace constraint is similar to its differential version. However, that filter does not perform well in estimating the velocity states, as the epipolar constraint relates only weakly to velocity. A bilinear constraint-based visual-inertial scheme for relative motion estimation has been introduced.^{7} The observability of INS errors for explicit measurements of a subspace-constraint-based optical flow method was investigated in Jacques Waldmann’s work.^{14} These approaches require a sufficient number of optical flow features to obtain the corresponding linear equations for the unknowns. Xu et al.^{2} presented an autonomous landing method using planar terrain features tracked in sequential images; under the assumption of a level plane, the scale of the line-of-sight vector can be expressed explicitly.

Compared to existing work, the main contributions of the proposed research are as follows: (1) a detailed error model analysis of optical flow measurement based on the block-matching algorithm, which was shown to obey a Gaussian distribution and is therefore suitable for the error-state Kalman filter (ESKF); and (2) a feature depth obtained without an extra measurement device through the assumption of flat terrain, which is reasonable for outdoor navigation missions when UAVs fly at high altitude. Under the flat-terrain assumption, the height proved to be observable by local observability analysis, which was further verified by simulation and experimental results. In addition, a more detailed development of the ESKF is proposed, which fuses the information from the IMU and optical flow originally described in Refs. 15 and 16. The paper is organized as follows: Sec. 2 describes the problem formulation, introduces the adopted sensors’ dynamical models, reviews the computation of a block-matching optical flow algorithm, and derives the measurement error analysis. Section 3 outlines the structure of the ESKF used to estimate the full state. Section 4 presents the observability analysis. Sections 5 and 6 show the simulation and experimental results with corresponding discussion; the final conclusions are given in Sec. 7.

## 2.

## Problem Formulation

The geometry of the general case of an aircraft equipped with an IMU and a monocular camera is shown in Fig. 1. Symbol $\{B\}$ represents the aircraft body-fixed frame, and the vehicle relative position to the global inertial reference frame $\{G\}$ is denoted by the vector ${}^{G}\mathbf{r}$. The camera frame $\{C\}$ is fixed to the aircraft with a mounting displacement ${}^{B}\mathbf{\Delta}$. We denote the coordinates of the $j$’th feature point ${P}_{j}$ in the frame $\{G\}$ by ${{}^{G}\mathbf{P}}_{j}$, while ${}^{C}{\mathbf{P}}_{j}$ denotes the relative position of ${P}_{j}$ w.r.t. the camera frame $\{C\}$. The transformation matrices ${}_{B}T_{G}$ and ${}_{C}T_{B}$ describe the rotations from $\{G\}$ to $\{B\}$ and $\{B\}$ to $\{C\}$, respectively, where ${}_{C}T_{B}$ is dependent on the mounting of the camera to the aircraft. The relative position of ${P}_{j}$ can be expressed in the camera frame $\{C\}$ as

## Eq. (1)

$${}^{C}{\mathbf{P}}_{j}={}_{C}T_{B}{}_{B}T_{G}({{}^{G}\mathbf{P}}_{j}-{}^{G}\mathbf{r})-{}_{C}T_{B}{}^{B}\mathbf{\Delta}.$$

Differentiating Eq. (1) with respect to time in the reference frame $\{G\}$ requires the derivative of the rotation transformation matrices. Denoting the rotational velocity of the vehicle expressed in the frames $\{B\}$ and $\{C\}$ by ${}^{B}\mathbf{\omega}$ and ${}^{C}\mathbf{\omega}$, respectively, this results in

## Eq. (2)

$${{}^{C}\dot{\mathbf{P}}}_{j}={}_{C}T_{B}{}_{B}T_{G}({{}^{G}\dot{\mathbf{P}}}_{j}-{}^{G}\dot{\mathbf{r}})-{}_{C}T_{B}({}^{B}\dot{\mathbf{\Delta}}+{}^{B}\mathbf{\omega}\times {}^{B}\mathbf{\Delta})-({}_{C}T_{B}{}^{B}\mathbf{\omega})\times {{}^{C}\mathbf{P}}_{j}.$$

Assuming static features (${{}^{G}\dot{\mathbf{P}}}_{j}=\mathbf{0}$), a rigid mounting (${}^{B}\dot{\mathbf{\Delta}}=\mathbf{0}$), and, for simplicity, a camera frame aligned with the body frame with a negligible offset, Eq. (2) reduces to

## Eq. (3)

$${{}^{C}\dot{\mathbf{P}}}_{j}=-{}_{B}T_{G}{}^{G}\dot{\mathbf{r}}-{}^{B}\mathbf{\omega}\times {{}^{C}\mathbf{P}}_{j}.$$

## 2.1.

### Pinhole Camera Model

This work used a pinhole camera model, whose details are shown in Fig. 1. The camera coordinate frame is defined by ${}^{C}z$ paralleling to the optical axis, ${}^{C}x$ and ${}^{C}y$ matching the horizontal and vertical image directions. The projection results of the feature ${P}_{j}$ on the image plane are expressed as

## Eq. (4)

$$\left[\begin{array}{l}{\mu}_{j}\\ {\nu}_{j}\end{array}\right]=\frac{f}{{{}^{C}z}_{j}}\left[\begin{array}{c}{}^{C}{x}_{j}\\ {}^{C}{y}_{j}\end{array}\right],$$

## Eq. (5)

$${{}^{C}\mathbf{P}}_{j}={}_{B}T_{G}({{}^{G}\mathbf{P}}_{j}-{}^{G}\mathbf{r})={[\begin{array}{ccc}{{}^{C}x}_{j}& {{}^{C}y}_{j}& {{}^{C}z}_{j}\end{array}]}^{T}=\frac{{{}^{C}z}_{j}}{f}{[\begin{array}{ccc}{\mu}_{j}& {\nu}_{j}& f\end{array}]}^{T}.$$

Fixed-wing aircraft generally fly at heights of hundreds of meters; over flat terrain such as plains and farmland, the topographic relief can be neglected compared to the flight height. In such scenes, we assume that the visual features lie in a level plane. Further, without loss of generality, by assuming the feature plane contains the origin of the global inertial frame, we obtain ${\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{P}}_{j}=0$, where ${\mathbf{e}}_{3}={[\begin{array}{ccc}0& 0& 1\end{array}]}^{T}$. We also adopt the unit quaternion ${}_{B}\mathbf{q}_{G}={q}_{0}\mathbf{i}+{q}_{1}\mathbf{j}+{q}_{2}\mathbf{k}+{q}_{3}$ to represent the rotation from the global coordinate frame $\{G\}$ to the body-fixed coordinate frame $\{B\}$ at time $t$. Hence, ${}_{B}T_{G}$ can be replaced by $C({}_{B}\mathbf{q}_{G})$, the rotation matrix corresponding to ${}_{B}\mathbf{q}_{G}$. Thus, the scale of the line-of-sight vector, i.e., the depth component ${{}^{C}z}_{j}$ of ${}^{C}{\mathbf{P}}_{j}$ in Eq. (5), can be represented as

## Eq. (6)

$${{}^{C}z}_{j}=-\frac{{\mathbf{e}}_{3}^{T}{}^{G}\mathbf{r}}{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\mathbf{q}_{G})\frac{1}{f}{[\begin{array}{ccc}{\mu}_{j}& {\nu}_{j}& f\end{array}]}^{T}}.$$
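As a numerical illustration of the projection and flat-plane depth recovery above, the sketch below projects a ground-plane feature and then recovers its depth from the plane constraint alone; the focal length, attitude, and positions are assumed values, and the camera frame is taken to coincide with the body frame:

```python
import numpy as np

def rot_y(t):
    # Frame rotation about the y-axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def rot_z(t):
    # Frame rotation about the z-axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

f = 500.0                                 # focal length in pixels (assumed)
C_BG = rot_y(0.2) @ rot_z(0.3)            # attitude {G}->{B} (illustrative)
r_G = np.array([10.0, -4.0, -120.0])      # vehicle position, z down; ground plane z = 0
P_G = np.array([25.0, 3.0, 0.0])          # feature on the level plane, e3^T P = 0

# Feature in the camera frame and its pixel projection
P_C = C_BG @ (P_G - r_G)
mu, nu = f * P_C[0] / P_C[2], f * P_C[1] / P_C[2]

# Depth recovery from the plane constraint:
#   0 = e3^T P = e3^T r + z_j * e3^T C^T pi,  with pi = (1/f)[mu, nu, f]^T
e3 = np.array([0.0, 0.0, 1.0])
pi = np.array([mu, nu, f]) / f
z_hat = -(e3 @ r_G) / (e3 @ C_BG.T @ pi)

print(z_hat - P_C[2])   # the recovered depth equals the true camera-frame depth
```

The recovery is exact regardless of the attitude, since the constraint is an algebraic identity once the feature truly lies on the plane.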

## 2.2.

### Optical Flow Computation

After characterizing the motion equation of the camera and the pinhole camera model, we focus on the optical flow computation method. In this work, the optical flow was determined as the displacement between two successive images divided by the time interval between their capture. Based on the local brightness constancy and small motion assumptions, the block-matching algorithm approximates the image motion by the displacement that yields the best match between image regions. The concept of the procedure, shown in Fig. 2, as well as the specific algorithm for displacement computation outlined in Fig. 3, starts with distinct feature extraction and ends with minimizing the sum-of-squared-differences (SSD) cost function over the search window;^{9}^{,}^{17} the result provides the displacement $\mathbf{d}({\mathbf{X}}_{k})$ of the current feature, which can be formulated as

## Eq. (7)

$$\mathrm{SSD}({\mathbf{X}}_{k},dx,dy)=\sum _{i={x}_{k}}^{{x}_{k}+b-1}\sum _{j={y}_{k}}^{{y}_{k}+b-1}{[{I}_{O}(i,j,t)-{I}_{C}(i+dx,j+dy,t+\mathrm{\Delta}t)]}^{2},$$

## Eq. (8)

$$\mathbf{d}({\mathbf{X}}_{k})=\underset{-w\le dx,dy\le w}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\text{\hspace{0.17em}}\mathrm{SSD}({\mathbf{X}}_{k},dx,dy),$$

where $b$ is the block size and $w$ is the search window radius.

Furthermore, we prove that the error of optical flow computation based on the SSD block-matching algorithm is subject to a Gaussian distribution, which is not fully explained by most reviewed articles, such as Farid Kendoul’s work.^{9} Therefore, we recast the whole algorithm as a maximum likelihood estimation problem. The displacement of the pixel can be seen as a warp function $\omega $, which transforms a pixel point ${\mathbf{p}}_{k}({x}_{k},{y}_{k})$ in the original image frame (${I}_{O}$) into a point ${\mathbf{p}}_{k}^{\prime}({x}_{k}^{\prime},{y}_{k}^{\prime})$ in the current image frame (${I}_{C}$) by a 2-D deformation vector $\mathbf{\zeta}$, namely ${\mathbf{p}}_{k}({x}_{k},{y}_{k})\in {I}_{O}$, $\omega ({\mathbf{p}}_{k},\mathbf{\zeta})={\mathbf{p}}_{k}^{\prime}({x}_{k}^{\prime},{y}_{k}^{\prime})\in {I}_{C}$. The local brightness constancy assumption is then given as

## Eq. (9)

$${I}_{O}({\mathbf{p}}_{k})={I}_{C}[\omega ({\mathbf{p}}_{k},\mathbf{\zeta})]+{\epsilon}_{k}.$$

Assuming the intensity noise ${\epsilon}_{k}$ is zero-mean Gaussian with variance ${\sigma}^{2}$, the conditional probability of each pixel observation is

## Eq. (10)

$$P[{I}_{O}({\mathbf{p}}_{k})|\mathbf{\zeta}]=\frac{1}{\sigma \sqrt{2\pi}}\text{\hspace{0.17em}}\mathrm{exp}(-\frac{{\{{I}_{O}({\mathbf{p}}_{k})-{I}_{C}[\omega ({\mathbf{p}}_{k},\mathbf{\zeta})]\}}^{2}}{2{\sigma}^{2}}).$$

Hence, we derived the likelihood function for the pixel samples as

## Eq. (11)

$$L(\mathbf{\zeta})=P[{I}_{O}({\mathbf{p}}_{1}),{I}_{O}({\mathbf{p}}_{2})\cdots {I}_{O}({\mathbf{p}}_{n})|\mathbf{\zeta}]=\underset{k=1}{\overset{n}{\mathrm{\Pi}}}P[{I}_{O}({\mathbf{p}}_{k})|\mathbf{\zeta}]=\underset{k=1}{\overset{n}{\mathrm{\Pi}}}C\cdot \mathrm{exp}(-\frac{{\{{I}_{O}({\mathbf{p}}_{k})-{I}_{C}[\omega ({\mathbf{p}}_{k},\mathbf{\zeta})]\}}^{2}}{2{\sigma}^{2}}),$$

## Eq. (12)

$$\mathrm{log}\text{\hspace{0.17em}}L(\mathbf{\zeta})=-\frac{1}{2{\sigma}^{2}}\sum _{k=1}^{n}{\{{I}_{O}({\mathbf{p}}_{k})-{I}_{C}[\omega ({\mathbf{p}}_{k},\mathbf{\zeta})]\}}^{2}+n\text{\hspace{0.17em}}\mathrm{log}\text{\hspace{0.17em}}C.$$

Maximizing Eq. (12) over $\mathbf{\zeta}$ is equivalent to minimizing the SSD cost function of Eq. (7); hence, the block-matching displacement is the maximum likelihood estimate. The maximum likelihood estimate (MLE) is proven to be asymptotically normally distributed,^{18}^{,}^{19} and the estimated error covariance of the MLE is given by the inverse of the observed Fisher information matrix; therefore, the maximum likelihood estimator can be expressed as

## Eq. (13)

$${\widehat{\mathbf{\zeta}}}_{\mathrm{ML}}\stackrel{a}{\sim}N[{\mathbf{\zeta}}_{0},\frac{1}{n\cdot J({\mathbf{\zeta}}_{0})}]=N[{\mathbf{\zeta}}_{0},\frac{1}{J({\widehat{\mathbf{\zeta}}}_{\mathrm{ML}})}],$$

## Eq. (14)

$$J({\widehat{\mathbf{\zeta}}}_{\mathrm{ML}})=-E\left[\frac{{\partial}^{2}}{\partial \mathbf{\zeta}\partial {\mathbf{\zeta}}^{T}}\text{\hspace{0.17em}}\mathrm{log}\text{\hspace{0.17em}}L(\mathbf{\zeta})\right].$$

Since the optical flow $\dot{\mathbf{\zeta}}$ was calculated as the displacement divided by the interframe time, the optical flow error maintains a Gaussian distribution, which is the prerequisite for adopting the Kalman filter.
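As a concrete check of the SSD block-matching search of Eqs. (7) and (8), the sketch below recovers a known integer displacement between two synthetic frames by exhaustive search; the block size, window radius, and displacement are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
I_O = rng.random((64, 64))                      # original frame with random texture
true_d = (3, -2)                                # true displacement (dx, dy) in pixels
I_C = np.roll(I_O, shift=true_d, axis=(1, 0))   # current frame: shifted copy of I_O

b, w = 8, 5                                     # block size b, search radius w
x_k, y_k = 30, 30                               # top-left corner of the tracked block
block = I_O[y_k:y_k + b, x_k:x_k + b]

# Exhaustive SSD search over the (2w+1)^2 candidate displacements
best, d_hat = np.inf, (0, 0)
for dy in range(-w, w + 1):
    for dx in range(-w, w + 1):
        cand = I_C[y_k + dy:y_k + dy + b, x_k + dx:x_k + dx + b]
        ssd = np.sum((block - cand) ** 2)
        if ssd < best:
            best, d_hat = ssd, (dx, dy)

print(d_hat)   # -> (3, -2), the true displacement; SSD is exactly zero there
```

Dividing `d_hat` by the interframe time gives the optical flow sample used as a filter measurement.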

Next, we deduce the measurement equation of optical flow. The projected pixel coordinates of the feature point ${P}_{j}$ on the image plane are given by Eq. (4), taking the time-derivatives of both sides of Eq. (4) leads to the expression for optical flow velocity

## Eq. (15)

$${\dot{\mathbf{\zeta}}}_{j}=\left[\begin{array}{l}{\dot{\mu}}_{j}\\ {\dot{\nu}}_{j}\end{array}\right]=\frac{f}{{}^{C}{z}_{j}^{2}}\left[\begin{array}{ccc}{}^{C}{z}_{j}& 0& -{{}^{C}x}_{j}\\ 0& {{}^{C}z}_{j}& -{}^{C}{y}_{j}\end{array}\right]{{}^{C}\dot{\mathbf{P}}}_{j}.$$

Considering the relative motion between the camera and the feature point, substituting ${{}^{C}\dot{\mathbf{P}}}_{j}$ from Eq. (3) into Eq. (15) and eliminating $[{}^{C}{x}_{j},{}^{C}{y}_{j},{}^{C}{z}_{j}]$ with $[{\mu}_{j},{\nu}_{j},f]$ based on Eqs. (4) and (6), the optical flow equation can be computed as

## Eq. (16)

$${\dot{\mathbf{\zeta}}}_{j}=\left[\begin{array}{l}{\dot{\mu}}_{j}\\ {\dot{\nu}}_{j}\end{array}\right]=-\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\mathbf{q}_{G})\frac{1}{f}\left[\begin{array}{l}{\mu}_{j}\\ {\nu}_{j}\\ f\end{array}\right]}{{\mathbf{e}}_{3}^{T}{}^{G}\mathbf{r}}\left[\begin{array}{ccc}-f& 0& {\mu}_{j}\\ 0& -f& {\nu}_{j}\end{array}\right]C({}_{B}\mathbf{q}_{G}){}^{G}\mathbf{v}+\left[\begin{array}{ccc}\frac{{\mu}_{j}{\nu}_{j}}{f}& -(f+\frac{{\mu}_{j}^{2}}{f})& {\nu}_{j}\\ f+\frac{{\nu}_{j}^{2}}{f}& -\frac{{\mu}_{j}{\nu}_{j}}{f}& -{\mu}_{j}\end{array}\right]{}^{B}\mathbf{\omega}+{\mathbf{\xi}}_{j},$$

where ${\mathbf{\xi}}_{j}$ denotes the optical flow measurement noise.

## 2.3.

### IMU Measurement

The IMU outputs are quantities measured in the body-fixed frame, the angular rate ${\mathbf{\omega}}_{m}$ and the acceleration ${\mathbf{a}}_{m}$, which were modeled as^{20}^{,}^{21}

## Eq. (17)

$$\mathbf{\omega}(t)={\mathbf{\omega}}_{m}(t)-{\mathbf{b}}_{\mathbf{\omega}}(t)-{\mathbf{n}}_{\mathbf{\omega}}(t),\phantom{\rule{0ex}{0ex}}{\dot{\mathbf{b}}}_{\mathbf{\omega}}(t)={\mathbf{n}}_{\mathbf{b}\mathbf{\omega}}(t),\phantom{\rule{0ex}{0ex}}\mathbf{a}(t)={C}^{T}[{}_{B}\mathbf{q}_{G}(t)][{\mathbf{a}}_{m}(t)-{\mathbf{b}}_{\mathbf{a}}(t)-{\mathbf{n}}_{\mathbf{a}}(t)]+\mathbf{g},\phantom{\rule{0ex}{0ex}}{\dot{\mathbf{b}}}_{\mathbf{a}}(t)={\mathbf{n}}_{\mathbf{ba}}(t),$$

where ${\mathbf{b}}_{\mathbf{\omega}}$ and ${\mathbf{b}}_{\mathbf{a}}$ are the gyroscope and accelerometer biases driven by the white Gaussian noise processes ${\mathbf{n}}_{\mathbf{b}\mathbf{\omega}}$ and ${\mathbf{n}}_{\mathbf{ba}}$, ${\mathbf{n}}_{\mathbf{\omega}}$ and ${\mathbf{n}}_{\mathbf{a}}$ are the white measurement noises, and $\mathbf{g}$ is the gravity vector expressed in $\{G\}$.

## 3.

## Estimator Description

To integrate the two different sources of information collected by the IMU and the monocular camera, an ESKF was designed. While the IMU information served to drive the filter prediction, the optical flow vision information was used to correct the filter. Compared to the general extended Kalman filter, which consists of a state prediction step and an update step, the ESKF additionally accounts for the error states through a reset process, producing more precise results by reducing the long-term error drift in the odometry system.
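The prediction-update-reset structure described here can be summarized in a generic sketch. The routine follows the covariance propagation, gain computation, and Joseph-form update used in this section, but the matrices and dimensions below are illustrative placeholders rather than the filter's actual Jacobians:

```python
import numpy as np

def eskf_step(P, Phi, G, Q, H, R, y):
    """One generic error-state KF cycle: propagate, update, and reset.

    P is the error covariance; Phi, G, Q define the discrete error dynamics
    and process noise; H, R, y are the measurement Jacobian, covariance, and
    residual (innovation). Returns the error correction dx, which the caller
    injects into the nominal state, and the updated covariance.
    """
    # 1. Propagation: the nominal state integrates the IMU measurements;
    #    the error state stays zero-mean, so only its covariance propagates.
    P = Phi @ P @ Phi.T + G @ Q @ G.T

    # 2. Update with the vision residual (Joseph form keeps P symmetric)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    dx = K @ y
    I_KH = np.eye(P.shape[0]) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T

    # 3. Reset: after dx is injected, the error mean returns to zero; the
    #    reset Jacobian is taken as identity here, so P is left unchanged.
    return dx, P

# Toy run: 15-dim error state (as in the text), 2-dim measurement
n, m = 15, 2
H = np.zeros((m, n)); H[0, 0] = H[1, 1] = 1.0   # placeholder Jacobian
dx, P1 = eskf_step(np.eye(n), np.eye(n), np.eye(n), 0.01 * np.eye(n),
                   H, np.eye(m), np.array([1.0, -1.0]))
print(dx[:2])   # the correction acts only along the observed directions
```

In the actual filter, `Phi`, `G`, and `Q` come from the discrete error dynamics of Sec. 3.1 and `H` from the optical flow measurement Jacobian of Sec. 3.2.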

## 3.1.

### System State and Propagation

The estimated state was the following $16\times 1$ vector; all symbol definitions can be found in Sec. 2.

## Eq. (18)

$$\mathbf{x}\triangleq {[\begin{array}{ccccc}{{}^{G}\mathbf{r}}^{T}& {}^{G}{\mathbf{v}}^{T}& {}_{B}\mathbf{q}_{G}^{T}& {\mathbf{b}}_{\mathbf{a}}^{T}& {\mathbf{b}}_{\mathbf{\omega}}^{T}\end{array}]}^{T}.$$

The system kinematic equation that describes the time evolution of the state was similar to Hesch’s model,^{21} whose corresponding linear discretized form of the predicted state estimate is provided as

## Eq. (19)

$${}^{G}{\widehat{\mathbf{r}}}_{k+1}^{-}={}^{G}{\widehat{\mathbf{r}}}_{k}^{+}+\mathrm{\Delta}t{}^{G}{\widehat{\mathbf{v}}}_{k}^{+}+\frac{\mathrm{\Delta}{t}^{2}}{2}[{\widehat{C}}_{k}^{+T}({\mathbf{a}}_{m,k}-{\widehat{\mathbf{b}}}_{\mathbf{a},k}^{+})+\mathbf{g}],\phantom{\rule{0ex}{0ex}}{}^{G}{\widehat{\mathbf{v}}}_{k+1}^{-}={}^{G}{\widehat{\mathbf{v}}}_{k}^{+}+\mathrm{\Delta}t[{\widehat{C}}_{k}^{+T}({\mathbf{a}}_{m,k}-{\widehat{\mathbf{b}}}_{\mathbf{a},k}^{+})+\mathbf{g}],\phantom{\rule{0ex}{0ex}}{{}_{B}\widehat{\mathbf{q}}}_{G,k+1}^{-}={{}_{B}\widehat{\mathbf{q}}}_{G,k}^{+}+\frac{\mathrm{\Delta}t}{2}\mathrm{\Omega}({\mathbf{\omega}}_{m,k}-{\widehat{\mathbf{b}}}_{\mathbf{\omega},k}^{+})\cdot {{}_{B}\widehat{\mathbf{q}}}_{G,k}^{+},\phantom{\rule{0ex}{0ex}}{\widehat{\mathbf{b}}}_{\mathbf{a},k+1}^{-}={\widehat{\mathbf{b}}}_{\mathbf{a},k}^{+},\phantom{\rule{0ex}{0ex}}{\widehat{\mathbf{b}}}_{\mathbf{\omega},k+1}^{-}={\widehat{\mathbf{b}}}_{\mathbf{\omega},k}^{+}.$$

In order to derive the process noise covariance matrix, the error dynamics needed to be derived from the kinematic equation along with its nominal form. We defined the error in the estimate $\widehat{x}$ of a quantity $x$ for the aircraft position, velocity, and IMU biases as $\delta x=x-\widehat{x}$. The quaternion error is the exception: it was defined through the $3\times 1$ angle-error vector $\delta \mathbf{\theta}$ as $\delta \mathbf{q}\triangleq {[\begin{array}{cc}\frac{1}{2}\delta {\mathbf{\theta}}^{T}& 1\end{array}]}^{T}$. Then, the linearized first-order approximation of the continuous-time error-state kinematics^{14}^{,}^{20}^{,}^{24} is provided as

## Eq. (20)

$$\delta \dot{\mathbf{r}}(t)=\delta \mathbf{v}(t),\phantom{\rule{0ex}{0ex}}\delta \dot{\mathbf{v}}(t)\approx -{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G}(t))\lfloor [{\mathbf{a}}_{m}(t)-{\widehat{\mathbf{b}}}_{\mathbf{a}}(t)]\times \rfloor \delta \mathbf{\theta}(t)-{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(t)]\delta {\mathbf{b}}_{\mathbf{a}}(t)-{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(t)]{\mathbf{n}}_{\mathbf{a}}(t),\phantom{\rule{0ex}{0ex}}\delta \dot{\mathbf{\theta}}(t)\approx -\lfloor [{\mathbf{\omega}}_{m}(t)-{\widehat{\mathbf{b}}}_{\mathbf{\omega}}(t)]\times \rfloor \delta \mathbf{\theta}(t)-\delta {\mathbf{b}}_{\mathbf{\omega}}-{\mathbf{n}}_{\mathbf{\omega}}(t),\phantom{\rule{0ex}{0ex}}\delta {\dot{\mathbf{b}}}_{\mathbf{a}}(t)={\mathbf{n}}_{\mathbf{ba}}(t),\phantom{\rule{0ex}{0ex}}\delta {\dot{\mathbf{b}}}_{\mathbf{\omega}}(t)={\mathbf{n}}_{\mathbf{b}\mathbf{\omega}}(t).$$

In practical terms, the ESKF is realized in a time-discrete manner, with the update time interval $\mathrm{\Delta}t={t}_{k+1}-{t}_{k}$, which is also the sampling period of the IMU signals. By numerical integration of Eq. (20), we obtained the discrete error-state equation

## Eq. (21)

$$\delta \mathbf{x}={[\begin{array}{ccccc}\delta {\mathbf{r}}^{T}& \delta {\mathbf{v}}^{T}& \delta {\mathbf{\theta}}^{T}& \delta {\mathbf{b}}_{\mathbf{a}}^{T}& \delta {\mathbf{b}}_{\mathbf{\omega}}^{T}\end{array}]}^{T},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\delta {\mathbf{x}}_{k+1}={\mathbf{\Phi}}_{k}\delta {\mathbf{x}}_{k}+{\mathbf{G}}_{k}{\mathbf{n}}_{k}.$$

To express the error-state transition matrix ${\mathbf{\Phi}}_{k}$ and the noise Jacobian ${\mathbf{G}}_{k}$ in closed form,^{20} the following auxiliary series was introduced:

## Eq. (22)

$${\mathbf{\Gamma}}_{n}^{T}(\mathbf{\omega})=\sum _{k=0}^{\infty}\frac{\mathrm{\Delta}{t}^{k+n}}{(k+n)!}{\lfloor -\mathbf{\omega}\times \rfloor}^{k},$$

## Eq. (23)

$${\mathbf{\Gamma}}_{0}^{T}(\mathbf{\omega})=\sum _{k=0}^{\infty}\frac{{(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)}^{k}}{k!}={e}^{\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t}=R{\{\mathbf{\omega}\mathrm{\Delta}t\}}^{T},\phantom{\rule{0ex}{0ex}}{\mathbf{\Gamma}}_{1}^{T}(\mathbf{\omega})=I\cdot \mathrm{\Delta}t-\frac{(\lfloor -\mathbf{\omega}\times \rfloor )}{{\Vert \mathbf{\omega}\Vert}^{2}}\cdot [R{\{\mathbf{\omega}\mathrm{\Delta}t\}}^{T}-I-(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)],\phantom{\rule{0ex}{0ex}}{\mathbf{\Gamma}}_{2}^{T}(\mathbf{\omega})=\frac{1}{2}I\cdot \mathrm{\Delta}{t}^{2}-\frac{1}{{\Vert \mathbf{\omega}\Vert}^{2}}\cdot [R{\{\mathbf{\omega}\mathrm{\Delta}t\}}^{T}-I-(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)-\frac{1}{2}{(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)}^{2}],\phantom{\rule{0ex}{0ex}}{\mathbf{\Gamma}}_{3}^{T}(\mathbf{\omega})=\frac{1}{3!}I\cdot \mathrm{\Delta}{t}^{3}+\frac{(\lfloor -\mathbf{\omega}\times \rfloor )}{{\Vert \mathbf{\omega}\Vert}^{2}}\cdot [R{\{\mathbf{\omega}\mathrm{\Delta}t\}}^{T}-I-(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)-\frac{1}{2}{(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)}^{2}-\frac{1}{3!}{(\lfloor -\mathbf{\omega}\times \rfloor \mathrm{\Delta}t)}^{3}].$$

Here, ${\mathbf{\Gamma}}_{0}^{T}$ is the incremental rotation matrix obtained by rotating an arbitrary coordinate frame at the rotational rate $-\mathbf{\omega}$ for the timespan $\mathrm{\Delta}t$, according to the Rodrigues rotation formula. The discrete linearized error dynamic transition matrix ${\mathbf{\Phi}}_{k}$ can therefore be written as a function of the above series^{24} (for readability, ${\widehat{\mathbf{\Gamma}}}_{n}^{T}={\mathbf{\Gamma}}_{n}^{T}[{\mathbf{\omega}}_{m}(k)-{\widehat{\mathbf{b}}}_{\mathbf{\omega}}(k)]$) as

## Eq. (24)

$${\mathbf{\Phi}}_{k}=\left[\begin{array}{ccccc}{\mathbf{I}}_{3}& \mathrm{\Delta}t{\mathbf{I}}_{3}& -\frac{\mathrm{\Delta}{t}^{2}}{2}{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]\lfloor [{\mathbf{a}}_{m}(k)-{\widehat{\mathbf{b}}}_{\mathbf{a}}(k)]\times \rfloor & -\frac{\mathrm{\Delta}{t}^{2}}{2}{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]& {0}_{3\times 3}\\ {0}_{3\times 3}& {\mathbf{I}}_{3}& -\mathrm{\Delta}t{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]\lfloor [{\mathbf{a}}_{m}(k)-{\widehat{\mathbf{b}}}_{\mathbf{a}}(k)]\times \rfloor & -\mathrm{\Delta}t{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& {\widehat{\mathbf{\Gamma}}}_{0}^{T}& {0}_{3\times 3}& -{\widehat{\mathbf{\Gamma}}}_{1}^{T}\\ {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}\end{array}\right],$$

## Eq. (25)

$${\mathbf{G}}_{k}=\left[\begin{array}{cccc}{0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}\\ -{\mathbf{I}}_{3}& {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}\\ {0}_{3\times 3}& -{\mathbf{I}}_{3}& {0}_{3\times 3}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}\end{array}\right].$$

The discrete process noise covariance matrix ${\mathbf{Q}}_{k}$ was calculated as

## Eq. (26)

$${\mathbf{Q}}_{k}=\left[\begin{array}{ccccc}\frac{\mathrm{\Delta}{t}^{3}}{3}{\mathbf{Q}}_{\mathbf{a}}+\frac{\mathrm{\Delta}{t}^{5}}{20}{\mathbf{Q}}_{\mathbf{ba}}& \frac{\mathrm{\Delta}{t}^{2}}{2}{\mathbf{Q}}_{\mathbf{a}}+\frac{\mathrm{\Delta}{t}^{4}}{8}{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}& -\frac{\mathrm{\Delta}{t}^{3}}{6}{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}\\ \frac{\mathrm{\Delta}{t}^{2}}{2}{\mathbf{Q}}_{\mathbf{a}}+\frac{\mathrm{\Delta}{t}^{4}}{8}{\mathbf{Q}}_{\mathbf{ba}}& \mathrm{\Delta}t{\mathbf{Q}}_{\mathbf{a}}+\frac{\mathrm{\Delta}{t}^{3}}{3}{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}& -\frac{\mathrm{\Delta}{t}^{2}}{2}{C}^{T}[{}_{B}\widehat{\mathbf{q}}_{G}(k)]{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& \mathrm{\Delta}t{\mathbf{Q}}_{\mathbf{\omega}}+({\widehat{\mathbf{\Gamma}}}_{3}^{T}+{\widehat{\mathbf{\Gamma}}}_{3})\cdot {\mathbf{Q}}_{\mathbf{b}\mathbf{\omega}}& {0}_{3\times 3}& -{\widehat{\mathbf{\Gamma}}}_{2}^{T}\cdot {\mathbf{Q}}_{\mathbf{b}\mathbf{\omega}}\\ -\frac{\mathrm{\Delta}{t}^{3}}{6}C[{}_{B}\widehat{\mathbf{q}}_{G}(k)]{\mathbf{Q}}_{\mathbf{ba}}& -\frac{\mathrm{\Delta}{t}^{2}}{2}C[{}_{B}\widehat{\mathbf{q}}_{G}(k)]{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}& \mathrm{\Delta}t{\mathbf{Q}}_{\mathbf{ba}}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& -{\widehat{\mathbf{\Gamma}}}_{2}\cdot {\mathbf{Q}}_{\mathbf{b}\mathbf{\omega}}& {0}_{3\times 3}& \mathrm{\Delta}t{\mathbf{Q}}_{\mathbf{b}\mathbf{\omega}}\end{array}\right].$$

Thus, the error covariance propagation resulted as

## Eq. (27)

$${\mathbf{P}}_{k+1}^{-}={\mathbf{\Phi}}_{k}{\mathbf{P}}_{k}^{+}{\mathbf{\Phi}}_{k}^{T}+{\mathbf{Q}}_{k}.$$

## 3.2.

### Measurement Update Equation

According to Eq. (16), the optical flow measurement can be expressed through the estimated state vector. Since the aircraft angular rate was not part of the state vector, we replaced it with ${}^{B}\widehat{\mathbf{\omega}}={\mathbf{\omega}}_{m}(t)-{\widehat{\mathbf{b}}}_{\mathbf{\omega}}(t)$. For clarity, the coefficient matrices associated with ${\mu}_{j}$, ${\nu}_{j}$, and $f$ in Eq. (16) are abbreviated with the notation

## Eq. (28)

$${\pi}_{{\mathbf{P}}_{\mathbf{j}}}\triangleq \frac{1}{f}{[\begin{array}{ccc}{\mu}_{j}& {\nu}_{j}& f\end{array}]}^{T},\phantom{\rule[-0.0ex]{1em}{0.0ex}}{A}_{{\mathbf{P}}_{\mathbf{j}}}\triangleq \left[\begin{array}{ccc}-f& 0& {\mu}_{j}\\ 0& -f& {\nu}_{j}\end{array}\right],\phantom{\rule[-0.0ex]{1em}{0.0ex}}{B}_{{\mathbf{P}}_{\mathbf{j}}}\triangleq \left[\begin{array}{ccc}\frac{{\mu}_{j}{\nu}_{j}}{f}& -(f+\frac{{\mu}_{j}^{2}}{f})& {\nu}_{j}\\ f+\frac{{\nu}_{j}^{2}}{f}& -\frac{{\mu}_{j}{\nu}_{j}}{f}& -{\mu}_{j}\end{array}\right],$$

## Eq. (29)

$${\widehat{\dot{\mathbf{\zeta}}}}_{j}=-\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G}){\pi}_{{\mathbf{P}}_{\mathbf{j}}}}{{\mathbf{e}}_{3}^{T}{}^{G}\widehat{\mathbf{r}}}{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G}){}^{G}\widehat{\mathbf{v}}+{B}_{{\mathbf{P}}_{\mathbf{j}}}({\mathbf{\omega}}_{m}-{\widehat{\mathbf{b}}}_{\mathbf{\omega}}).$$

Furthermore, by subtracting the measured angular-rate term and redefining the revised optical flow measurement ${\dot{\overline{\mathbf{\zeta}}}}_{j}$ as the new measurement of the filter, the right-hand side is expressed using only the state vector $\widehat{\mathbf{x}}$, which yields

## Eq. (30)

$${\widehat{\dot{\overline{\mathbf{\zeta}}}}}_{j}={\widehat{\dot{\mathbf{\zeta}}}}_{j}-{B}_{{\mathbf{P}}_{\mathbf{j}}}{\mathbf{\omega}}_{m}=-\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G}){\pi}_{{\mathbf{P}}_{\mathbf{j}}}}{{\mathbf{e}}_{3}^{T}{}^{G}\widehat{\mathbf{r}}}{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G}){}^{G}\widehat{\mathbf{v}}-{B}_{{\mathbf{P}}_{\mathbf{j}}}{\widehat{\mathbf{b}}}_{\mathbf{\omega}}.$$

The measurement residual, also called the innovation, was then given as

## Eq. (31)

$${\mathbf{y}}_{k}^{j}=({\dot{\mathbf{\zeta}}}_{k}^{j}-{B}_{{\mathbf{P}}_{\mathbf{j}}}{\mathbf{\omega}}_{m})-{\widehat{\dot{\overline{\mathbf{\zeta}}}}}_{k}^{j}\approx {\mathbf{H}}_{k}^{j}\delta {\mathbf{x}}_{k}+{\mathbf{\eta}}_{k}^{j},$$

## Eq. (32)

$${\mathbf{H}}_{k}^{j}\triangleq {\frac{\partial {h}^{j}}{\partial \mathbf{x}}\cdot \frac{\partial \mathbf{x}}{\partial \delta \mathbf{x}}|}_{{\mathbf{x}}_{k}}=[\begin{array}{ccccc}{\mathbf{H}}_{\delta \mathbf{r},k}^{j}& {\mathbf{H}}_{\delta \mathbf{v},k}^{j}& {\mathbf{H}}_{\delta \mathbf{\theta},k}^{j}& {0}_{2\times 3}& {\mathbf{H}}_{\delta {\mathbf{b}}_{\mathbf{\omega}},k}^{j}\end{array}],$$

## Eq. (33)

$${\mathbf{H}}_{\delta \mathbf{r},k}^{j}=\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G,k}){\pi}_{{\mathbf{P}}_{\mathbf{j}}}}{{\mathbf{e}}_{3}^{T}{{}^{G}\widehat{\mathbf{r}}}_{k}{}^{G}{\widehat{\mathbf{r}}}_{k}{}^{T}{\mathbf{e}}_{3}}{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G,k}){}^{G}{\widehat{\mathbf{v}}}_{k}{\mathbf{e}}_{3}^{T},$$

## Eq. (34)

$${\mathbf{H}}_{\delta \mathbf{v},k}^{j}=-\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G,k}){\pi}_{{\mathbf{P}}_{\mathbf{j}}}}{{\mathbf{e}}_{3}^{T}{{}^{G}\widehat{\mathbf{r}}}_{k}}{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G,k}),$$

## Eq. (35)

$${\mathbf{H}}_{\delta \mathbf{\theta},k}^{j}=\frac{{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G,k}){{}^{G}\widehat{\mathbf{v}}}_{k}{\mathbf{e}}_{3}^{T}}{{\mathbf{e}}_{3}^{T}{{}^{G}\widehat{\mathbf{r}}}_{k}}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G,k})\lfloor {\pi}_{{\mathbf{P}}_{\mathbf{j}}}\times \rfloor -\frac{{\mathbf{e}}_{3}^{T}{C}^{T}({}_{B}\widehat{\mathbf{q}}_{G,k}){\pi}_{{\mathbf{P}}_{\mathbf{j}}}}{{\mathbf{e}}_{3}^{T}{{}^{G}\widehat{\mathbf{r}}}_{k}}{A}_{{\mathbf{P}}_{\mathbf{j}}}C({}_{B}\widehat{\mathbf{q}}_{G,k})\lfloor {{}^{G}\widehat{\mathbf{v}}}_{k}\times \rfloor {C}^{T}({}_{B}\widehat{\mathbf{q}}_{G,k}),$$

## Eq. (36)

$${\mathbf{H}}_{\delta {\mathbf{b}}_{\mathbf{\omega}},k}^{j}=-{B}_{{\mathbf{P}}_{\mathbf{j}}}.$$

Note that the residual noise ${\mathbf{\eta}}_{k}^{j}$ comprises the optical flow algorithm error and the gyroscope noise, i.e., ${\mathbf{\eta}}_{k}^{j}={\mathbf{\xi}}_{k}^{j}-{B}_{{\mathbf{P}}_{\mathbf{j}}}\cdot {\mathbf{n}}_{\mathbf{\omega},k}$; hence, the covariance matrix of the revised optical flow measurement is the sum of the optical flow algorithm covariance ${\mathbf{R}}_{{\mathbf{\xi}}_{j}}$ and the mapped gyroscope covariance ${\mathbf{Q}}_{\mathbf{\omega}}$, which was given as

## Eq. (37)

$${\mathbf{R}}_{k}^{j}=E\{{\mathbf{\eta}}_{k}^{j}{({\mathbf{\eta}}_{k}^{j})}^{T}\}={\mathbf{R}}_{{\mathbf{\xi}}_{j}}+{B}_{{\mathbf{P}}_{\mathbf{j}}}{\mathbf{Q}}_{\mathbf{\omega}}{{B}_{{\mathbf{P}}_{\mathbf{j}}}}^{T}.$$In general, there were a set of optical flow vectors measured at one time, stacking all the observation residuals of Eq. (31) into one vector, which yielded, ${\mathbf{y}}_{k}={\mathbf{H}}_{k}\delta {\mathbf{x}}_{k}+{\mathbf{\eta}}_{k}$, where ${\mathbf{y}}_{k}$, ${\mathbf{H}}_{k}$, and ${\mathbf{\eta}}_{k}$ were composed of the block elements ${\mathbf{y}}_{k}^{j}$, ${\mathbf{H}}_{k}^{j}$, and ${\mathbf{\eta}}_{k}^{j}$, such that the measurement covariance matrix could be stacked together diagonally as

## Eq. (38)

$${\mathbf{R}}_{k}=\left[\begin{array}{ccc}{\mathbf{R}}_{k}^{1}& \cdots & {0}_{3\times 3}\\ \vdots & \ddots & \vdots \\ {0}_{3\times 3}& \cdots & {\mathbf{R}}_{k}^{N}\end{array}\right]=\left[\begin{array}{ccc}{\mathbf{R}}_{{\mathbf{\xi}}_{1}}+{B}_{{\mathbf{P}}_{1}}{\mathbf{Q}}_{\mathbf{\omega}}{{B}_{{\mathbf{P}}_{1}}}^{T}& \cdots & {0}_{3\times 3}\\ \vdots & \ddots & \vdots \\ {0}_{3\times 3}& \cdots & {\mathbf{R}}_{{\mathbf{\xi}}_{N}}+{B}_{{\mathbf{P}}_{N}}{\mathbf{Q}}_{\mathbf{\omega}}{{B}_{{\mathbf{P}}_{N}}}^{T}\end{array}\right].$$

Then, the *a priori* state estimate is updated with the current measurements in the following steps: (1) calculate the Kalman gain as

## Eq. (39)

$${\mathbf{K}}_{k}={\mathbf{P}}_{k}^{-}{\mathbf{H}}_{k}^{T}{({\mathbf{H}}_{k}{\mathbf{P}}_{k}^{-}{\mathbf{H}}_{k}^{T}+{\mathbf{R}}_{k})}^{-1},$$

(2) compute the observed error correction $\mathrm{\Delta}{\mathbf{x}}_{k}={\mathbf{K}}_{k}{\mathbf{y}}_{k}$; and (3) update the state estimate and covariance as

## Eq. (41)

$${\widehat{\mathbf{x}}}_{k}^{+}={\widehat{\mathbf{x}}}_{k}^{-}+\mathrm{\Delta}{\mathbf{x}}_{k},\phantom{\rule{0ex}{0ex}}{\mathbf{P}}_{k}^{+}=(I-{\mathbf{K}}_{k}{\mathbf{H}}_{k}){\mathbf{P}}_{k}^{-}{(I-{\mathbf{K}}_{k}{\mathbf{H}}_{k})}^{T}+{\mathbf{K}}_{k}{\mathbf{R}}_{k}{\mathbf{K}}_{k}^{T}.$$

## 3.3.

### Error-State Kalman Filter Reset

For the ESKF reset process, we consider the relation among the new error $\delta {\widehat{\mathbf{x}}}_{k}^{+}$, the old error $\delta {\widehat{\mathbf{x}}}_{k}^{-}$, and the observed error correction $\mathrm{\Delta}{\mathbf{x}}_{k}$. Since the true value is constant, ${\widehat{\mathbf{x}}}_{k}^{+}+\delta {\widehat{\mathbf{x}}}_{k}^{+}={\widehat{\mathbf{x}}}_{k}^{-}+\delta {\widehat{\mathbf{x}}}_{k}^{-}$. Because the observed error has been injected into the updated nominal state, i.e., ${\widehat{\mathbf{x}}}_{k}^{+}={\widehat{\mathbf{x}}}_{k}^{-}+\mathrm{\Delta}{\mathbf{x}}_{k}$, it follows that $\delta {\widehat{\mathbf{x}}}_{k}^{+}=\delta {\widehat{\mathbf{x}}}_{k}^{-}-\mathrm{\Delta}{\mathbf{x}}_{k}$, and we define the error reset function $\delta \mathbf{x}\triangleq g(\delta \mathbf{x})=\delta {\widehat{\mathbf{x}}}_{k}^{-}-\mathrm{\Delta}{\mathbf{x}}_{k}$. The reset operation ensures that the error is reset, $\delta \mathbf{x}\leftarrow 0$; meanwhile, the covariance of the error must be modified as well,^{14} using the reset Jacobian^{15} obtained by differentiating $g$ with respect to $\delta \mathbf{x}$.
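The correction and reset steps of this section can be sketched numerically. The sketch below is illustrative rather than the paper's implementation: it assumes the 15-dimensional error-state ordering $[\delta\mathbf{r},\delta\mathbf{v},\delta\boldsymbol{\theta},\delta\mathbf{b}_a,\delta\mathbf{b}_\omega]$ of Eq. (46), a simplified additive error injection (a real implementation updates the quaternion multiplicatively), and a Solà-style attitude reset Jacobian;^{15} all variable names and shapes are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix, so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def eskf_update_and_reset(x_nom, P, H_blocks, y_blocks, R_blocks,
                          att=slice(6, 9)):
    """One ESKF correction-and-reset cycle (sketch of Eqs. 37-41 and the
    reset above). Per-feature Jacobians/residuals/noise blocks are stacked,
    the Joseph-form covariance update is applied, the observed error is
    injected into the nominal state, and delta-x is implicitly reset to 0."""
    H = np.vstack(H_blocks)                  # stacked measurement Jacobian
    y = np.concatenate(y_blocks)             # stacked residuals
    R = np.zeros((len(y), len(y)))           # block-diagonal noise (Eq. 38)
    i = 0
    for Ri in R_blocks:
        m = Ri.shape[0]
        R[i:i + m, i:i + m] = Ri
        i += m
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain (Eq. 39)
    dx = K @ y                               # observed error correction
    IKH = np.eye(P.shape[0]) - K @ H
    P = IKH @ P @ IKH.T + K @ R @ K.T        # Joseph form (Eq. 41)
    x_nom = x_nom + dx                       # additive injection (sketch only)
    G = np.eye(P.shape[0])                   # reset Jacobian: identity except
    G[att, att] -= 0.5 * skew(dx[att])       # the attitude block
    return x_nom, G @ P @ G.T
```

The Joseph form is used instead of $(I-KH)P^-$ because it preserves symmetry and positive definiteness under rounding.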

## 4.

## Observability Analysis

The estimator employs a linearized, discretized version of the nonlinear system model (to simplify the analysis, we consider a single feature point):

## Eq. (44)

$$\delta {\mathbf{x}}_{k+1}={\mathbf{\Phi}}_{k}\delta {\mathbf{x}}_{k}+{\mathbf{G}}_{k}{\mathbf{n}}_{k},\phantom{\rule{0ex}{0ex}}{\mathbf{y}}_{k}={\mathbf{H}}_{k}\delta {\mathbf{x}}_{k}+{\mathbf{\eta}}_{k},$$

The local observability matrix^{25} is constructed from the error dynamic transition matrix ${\mathbf{\Phi}}_{k}$ and the measurement Jacobian ${\mathbf{H}}_{k}$, i.e.,

## Eq. (45)

$$\mathbf{M}(\mathbf{x})=\left[\begin{array}{c}{\mathbf{H}}_{1}\\ {\mathbf{H}}_{2}{\mathbf{\Phi}}_{\mathrm{2,1}}\\ \vdots \\ {\mathbf{H}}_{k}{\mathbf{\Phi}}_{k,1}\end{array}\right]=\left[\begin{array}{c}{\mathbf{H}}_{1}\\ {\mathbf{H}}_{2}{\mathbf{\Phi}}_{1}\\ \vdots \\ {\mathbf{H}}_{k}{\mathbf{\Phi}}_{k-1}{\mathbf{\Phi}}_{k-2}\cdots {\mathbf{\Phi}}_{1}\end{array}\right],$$

## Eq. (46)

$${\mathbf{H}}_{k}=[\begin{array}{ccccc}{\mathbf{H}}_{\delta \mathbf{r},k}& {\mathbf{H}}_{\delta \mathbf{v},k}& {\mathbf{H}}_{\delta \mathbf{\theta},k}& {0}_{2\times 3}& {\mathbf{H}}_{\delta {\mathbf{b}}_{\mathbf{\omega}},k}\end{array}],$$

## Eq. (47)

$${\mathbf{\Phi}}_{k,1}=\left[\begin{array}{ccccc}{\mathbf{I}}_{3}& \mathrm{\Delta}{t}_{k}{\mathbf{I}}_{3}& {\mathbf{\Phi}}^{13}& {\mathbf{\Phi}}^{14}& {\mathbf{\Phi}}^{15}\\ {0}_{3\times 3}& {\mathbf{I}}_{3}& {\mathbf{\Phi}}^{23}& {\mathbf{\Phi}}^{24}& {\mathbf{\Phi}}^{25}\\ {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{\Phi}}^{33}& {0}_{3\times 3}& {\mathbf{\Phi}}^{35}\\ {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}& {0}_{3\times 3}\\ {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {0}_{3\times 3}& {\mathbf{I}}_{3}\end{array}\right].$$

Here $\mathrm{\Delta}{t}_{k}={t}_{k}-{t}_{1}$, and the method for deriving analytical expressions of the subblock elements ${\mathbf{\Phi}}^{mn}$ can be found in the work of Hesch et al.^{26} Using these expressions, we obtain the $k$'th block row ${\mathbf{M}}_{k}$ of $\mathbf{M}(\mathbf{x})$ for $k\ge 2$

## Eq. (48)

$${\mathbf{M}}_{k}={\mathbf{H}}_{k}{\mathbf{\Phi}}_{k,1}=[\begin{array}{ccccc}{\mathbf{M}}_{k1}& {\mathbf{M}}_{k2}& {\mathbf{M}}_{k3}& {\mathbf{M}}_{k4}& {\mathbf{M}}_{k5}\end{array}],$$

## Theorem I:

The right nullspace $\mathbf{N}$ of the local observability matrix $\mathbf{M}(\mathbf{x})$ for the linearized model is spanned by three unobservable directions:

## Eq. (49)

$$\text{null}[\mathbf{M}(\mathbf{x})]=\mathbf{N}=\left[\begin{array}{cc}{\mathbf{N}}_{1}& {0}_{3\times 1}\\ {0}_{3\times 2}& -\lfloor {}^{G}{\mathbf{v}}_{1}\times \rfloor \mathbf{g}\\ {0}_{3\times 2}& {\mathbf{C}}_{1}\mathbf{g}\\ {0}_{3\times 2}& {0}_{3\times 1}\\ {0}_{3\times 2}& {0}_{3\times 1}\end{array}\right],\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\mathbf{N}}_{1}=\left[\begin{array}{cc}1& 0\\ 0& 1\\ 0& 0\end{array}\right].$$

The proof of Theorem I is detailed in Sec. 9.

The nullspace $\mathbf{N}$ reveals that the global rotation about the gravity vector is unobservable and that the global horizontal position is unobservable, while the height is fully observable. The remaining states, i.e., the velocity, the pitch and roll angles, and the IMU biases, are also observable. The rotation around the gravity vector is unobservable because an absolute inertial yaw reference is missing: the yaw angle in the inertial frame cannot be determined, whereas the absolute roll and pitch angles of the aircraft w.r.t. the global inertial frame are made observable by the gravity vector measurement. Similarly, without a global reference, the positions along the $x$ and $y$ directions are unobservable. In contrast to previous findings, however, the altitude is observable, because the optical flow features used for filter updates are assumed to lie on a level plane containing the origin of the inertial frame. Using only this feature-plane assumption, according to Eq. (6), the feature depth is explicitly determined by the absolute altitude and the aircraft attitude; it thus becomes part of the optical flow measurement equation and is corrected whenever a new measurement update arrives.
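Theorem I can also be checked numerically: stack $\mathbf{M}(\mathbf{x})$ as in Eq. (45) and extract its right nullspace by singular value decomposition; for the proposed filter the returned basis should be three-dimensional and span the directions of Eq. (49). A minimal sketch, assuming lists of the per-step Jacobians ${\mathbf{H}}_{k}$ and transition matrices ${\mathbf{\Phi}}_{k}$ are available from the filter (all names are illustrative):

```python
import numpy as np

def unobservable_directions(Hs, Phis, tol=1e-9):
    """Stack the local observability matrix (Eq. 45) and return an
    orthonormal basis of its right nullspace via SVD.

    Hs:   [H_1, ..., H_k]           measurement Jacobians
    Phis: [Phi_1, ..., Phi_{k-1}]   one-step transition matrices
    """
    blocks = [Hs[0]]
    Phi = np.eye(Hs[0].shape[1])        # accumulates Phi_{k,1}
    for H, Phi_k in zip(Hs[1:], Phis):
        Phi = Phi_k @ Phi               # Phi_{k,1} = Phi_{k-1} ... Phi_1
        blocks.append(H @ Phi)
    M = np.vstack(blocks)
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol * max(s[0], 1.0)))
    return Vt[rank:].T                  # columns span null(M)
```

For the filter of this paper, the returned matrix should have three columns; comparing them (up to a change of basis) with Eq. (49) gives a direct numerical confirmation of the analysis.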

## 5.

## Numerical Simulation

To numerically investigate the preceding analysis, the proposed navigation scheme was implemented in simulation. As shown in Fig. 4, 100 features were randomly placed on a $[-350,350\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}]\times [-350,350\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}]$ horizontal plane around the origin. To generate the ground truth and sensor readings, we simulated a flight with a piecewise continuous trajectory: a straight-line segment at a constant velocity of $20\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}/\mathrm{s}$ lasting 4 s, followed by a banking turn with a maximum bank angle of 30 deg, then a spiral motion with a pitch angle of up to 9 deg at altitudes between $-200$ and $-300\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}$, and finally 50 s of circling. A full list of parameters is given in Table 1; most of these quantities were taken from datasheets of commonly used IMUs, e.g., the MPU6050 or ADIS16448.
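As an illustration of how such a piecewise trajectory can be generated, the sketch below produces only the first two segments (the straight leg, then a coordinated level turn at the stated bank angle); the spiral and circling segments are omitted, and the function and its parameters are illustrative, not the simulation code used in the paper.

```python
import numpy as np

def straight_then_turn(t, v=20.0, t_straight=4.0, bank_deg=30.0, g=9.81):
    """Horizontal position [m] on a simplified version of the simulated path:
    a constant-velocity straight leg along +x, then a coordinated level turn
    whose radius follows R = v^2 / (g * tan(bank))."""
    if t <= t_straight:
        return np.array([v * t, 0.0, 0.0])
    R = v**2 / (g * np.tan(np.radians(bank_deg)))  # turn radius
    w = v / R                                      # turn rate [rad/s]
    tau = t - t_straight
    x0 = v * t_straight                            # end of the straight leg
    return np.array([x0 + R * np.sin(w * tau),
                     R * (1.0 - np.cos(w * tau)),
                     0.0])
```

The turn is position- and velocity-continuous at $t=4\,\mathrm{s}$, which keeps the simulated IMU readings free of artificial jumps.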

## Table 1

Simulation parameters and initial conditions of the filter [“rand(3,1)” denotes a 3×1 random vector drawn from a standard normal distribution].

| Parameter | Value |
|---|---|
| IMU sampling rate (Hz) | 100 |
| Camera process rate (Hz) | 30 |
| Accelerometer noise [$\mathrm{m}/(\mathrm{s}\sqrt{\mathrm{s}})$] | ${\sigma}_{{n}_{a}}=2.24\times {10}^{-3}$ |
| Gyro noise [$\mathrm{deg}/\sqrt{\mathrm{s}}$] | ${\sigma}_{{n}_{w}}=0.005$ |
| Accelerometer bias noise [$\mathrm{m}/({\mathrm{s}}^{2}\sqrt{\mathrm{s}})$] | ${\sigma}_{{n}_{ba}}=7.53\times {10}^{-5}$ |
| Gyro bias noise [$\mathrm{rad}/(\mathrm{s}\sqrt{\mathrm{s}})$] | ${\sigma}_{{n}_{bw}}=1.08\times {10}^{-5}$ |
| Field of view (FOV) (deg) | 90 |
| Optical flow noise (pixel·rad/s) | 0.01 |

| State | Initial value | True value |
|---|---|---|
| Position (m) | True value + 50·rand(3,1) | [-50 -180 -200] |
| Velocity (m/s) | True value + 10·rand(3,1) | [20 0 0] |
| Quaternion | True value + 0.5·rand(3,1) | [0 0 0 1] |
| Accelerometer bias ($\mathrm{m}/{\mathrm{s}}^{2}$) | [0 0 0] | [0.0981 0.0981 0.0981] |
| Gyro bias (deg/s) | [0 0 0] | [0.5 0.5 -0.5] |

The initial state values are given in Table 1, showing that the filter tolerates a certain amount of initial state offset while still converging. In addition, the initial filter error covariance matrix was set to ${\mathbf{P}}_{0}=\mathrm{diag}([\begin{array}{ccccc}2500{\mathbf{I}}_{3}& 100{\mathbf{I}}_{3}& 0.25{\mathbf{I}}_{3}& 0.01{\mathbf{I}}_{3}& 7.6\times {10}^{-5}{\mathbf{I}}_{3}\end{array}])$.
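This initialization can be reproduced directly from Table 1; note that the leading diagonal blocks of ${\mathbf{P}}_{0}$ match the squared initialization offsets ($2500 = 50^2$, $100 = 10^2$, $0.25 = 0.5^2$). A sketch (the random-number generator and seed are arbitrary choices, not part of the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# True initial state (Table 1) and the perturbed filter initialization
r_true = np.array([-50.0, -180.0, -200.0])   # position [m]
v_true = np.array([20.0, 0.0, 0.0])          # velocity [m/s]
r0 = r_true + 50.0 * rng.standard_normal(3)  # "true value + 50*rand(3,1)"
v0 = v_true + 10.0 * rng.standard_normal(3)  # "true value + 10*rand(3,1)"

# Initial error covariance, consistent with the stated offsets
P0 = np.diag(np.concatenate([
    2500.0 * np.ones(3),   # position: (50 m)^2
    100.0  * np.ones(3),   # velocity: (10 m/s)^2
    0.25   * np.ones(3),   # attitude: (0.5 rad)^2
    0.01   * np.ones(3),   # accelerometer bias
    7.6e-5 * np.ones(3),   # gyroscope bias
]))
```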

Although a single optical flow measurement suffices to ensure observability, in a practical trial a proper setting of the FOV and the initial feature quantity yielded at least seven extracted features at every time step, giving better convergence. After running 100 Monte Carlo simulations, we computed the Monte Carlo standard deviation, i.e., the root-mean-square error, via the sample error covariance ${P}_{j}^{\mathrm{Monte}}$ over the $N$ simulation results as

## Eq. (50)

$${P}_{j}^{\text{Monte}}=\frac{1}{N}\sum _{i=1}^{N}({x}_{j}-{\widehat{x}}_{ij}){({x}_{j}-{\widehat{x}}_{ij})}^{T}.$$

Here, at the $j$'th time step, ${x}_{j}$ is the true state and ${\widehat{x}}_{ij}$ is the estimate in the $i$'th simulation. The Monte Carlo standard deviation and the filter standard deviation are the square roots of the main-diagonal elements of ${P}_{j}^{\text{Monte}}$ and of the ESKF estimation error covariance, respectively.
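The diagonal of Eq. (50) amounts to a per-state sample standard deviation across runs; a minimal sketch, assuming the estimates from all runs are stored in one array (the shapes and names are illustrative):

```python
import numpy as np

def monte_carlo_std(x_true, x_est):
    """Per-state Monte Carlo standard deviation over time (diagonal of Eq. 50).

    x_true: (T, n) array of true states over T time steps
    x_est:  (N, T, n) array of estimates from N Monte Carlo runs
    Returns a (T, n) array of sample standard deviations.
    """
    err = x_est - x_true[None, :, :]          # (N, T, n) estimation errors
    return np.sqrt(np.mean(err**2, axis=0))   # average over the N runs
```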

Figures 5–9 show the results of the proposed filter tracking the ground truth for position, attitude, velocity, and IMU biases. Most of the state estimates converge to the ground truth within 20 s, except for the unobservable states, the positions along the $x$ and $y$ axes and the yaw angle, whose Monte Carlo and filter standard deviations at the final time remain near their initial values. The unobservable states maintain a constant error rather than diverging outright, since they still evolve properly and benefit from the IMU bias correction but lack an initial reference. This correlates well with the observability analysis of the horizontal position and yaw in Sec. 4.

Among the observable states, because the observability analysis disregards process noise and initial offsets, the filter standard deviation decreases promptly while the Monte Carlo standard deviation converges to zero more slowly. Compared with the gyroscope bias curves, the accelerometer bias uncertainty converges much more slowly: the gyroscope bias appears explicitly in the measurement equation and is corrected directly, whereas the accelerometer bias relies only on state propagation and is corrected indirectly through its coupling with other states. For the roll and pitch angles, the height, and the vertical velocity, the plots show better agreement between the Monte Carlo sample covariance and the filter error covariance than for the other states; the estimation of the horizontal inertial and body-fixed velocities, by contrast, relies on tight mutual coupling with the estimated attitude, which involves the accelerometer bias and the unobservable yaw angle.

## 6.

## Experiment

The proposed filter was tested on an X-Star quadcopter equipped with a Pixhawk 4 and a PX4Flow board. The optical flow camera consists of a $752\times 480$ MT9V034 CMOS image sensor with a global shutter and a 16 mm M12 lens; with $4\times 4$ binning ($24\times 24\text{\hspace{0.17em}\hspace{0.17em}}\mu \mathrm{m}$ effective pixels), it outputs images for optical flow computation at 10 Hz. The camera module can be treated as a pinhole camera and is mounted pointing straight down. The employed IMU is a dual-redundancy solution with a BMI055 and an ICM-20689, providing inertial measurements at 125 Hz.
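For reference, a block-matching pixel displacement from such a sensor can be converted to a small-angle flow rate with the pinhole model; the numbers below follow the hardware described above (16 mm focal length, 24 μm binned pixel pitch, 10 Hz output) but should be treated as assumptions for illustration, not PX4Flow's actual internal scaling.

```python
# Pinhole-model conversion of a block-matching pixel displacement to a
# metric angular flow rate. Values are assumptions based on the hardware
# described in the text.
f_m = 16e-3          # focal length [m]
pixel_pitch = 24e-6  # effective (4x4-binned) pixel size [m]
rate_hz = 10.0       # optical flow output rate [Hz]

def flow_rad_per_s(dpx):
    """Pixel displacement per frame -> small-angle flow rate [rad/s]."""
    return dpx * pixel_pitch / f_m * rate_hz
```

With these numbers, a one-pixel displacement per frame corresponds to roughly 1.5 mrad of angular motion, i.e., 0.015 rad/s at 10 Hz.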

Ground truth was collected by a conventional INS based on an extended Kalman filter and an attitude and heading reference system algorithm, which integrates IMU outputs, magnetometer readings, and barometric altitude with GPS data. Note that these extra sensors were used only to generate the ground truth; they are neither part of the proposed navigation system nor used for estimation.

Figure 10 shows the outdoor trajectory during the autonomously controlled waypoint-mode flight test. The flight envelope was limited to a maximum height of 10 m and a maximum speed of about $7\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}/\mathrm{s}$. The quadcopter took off at the origin; as we had little prior information about the IMU, the estimated states were initialized to zero, which was the only available option. The covariance parameters were selected based on the hardware specifications and did not require heavy tuning, as the framework works well over a wide range of parameters.

As shown in Figs. 11–13, the filter was started in flight at around $t=70\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{s}$ at a speed of $\sim 5\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{m}/\mathrm{s}$. The roll and pitch angles exhibit high tracking accuracy except for a growing error when the vehicle prepared to descend for landing after 150 s, since the dropping horizontal velocity is unfavorable for the accuracy of the optical flow measurement. Because of the initial misalignment, the yaw angle always kept an offset from the true value, which likely contributes most of the estimation error in the velocities along the $x$ and $y$ axes, especially during $t\in [120,140\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{s}]$, where they deviate from the ground truth with larger oscillations. Under the unknown noise of the hardware system and vehicle vibration, the horizontal velocity and position estimates approached their true values more slowly and less precisely, but the vertical velocity and altitude estimates exhibited satisfactory accuracy.

## 7.

## Conclusions

This paper has presented an optical flow-based approach to vision-aided inertial navigation. Under the assumption that the observed feature points lie on a plane, the feature depth can be obtained by deriving a distance expression through the optical flow equation together with the geometric relationship of the vehicle-camera-feature system. To fill a gap in previous work, an error analysis for block-matching optical flow measurement was provided, which is a prerequisite for adopting and designing an ESKF. With IMU dead reckoning used for prediction and optical flow and feature measurements used for correction, estimates of the vehicle's altitude, attitude, velocity, and IMU drift biases can be obtained. The theoretical limitations and feasibility of the filter were investigated by an observability analysis, which showed that only the absolute positions along the $x$ and $y$ axes and the yaw angle are unobservable. Finally, the Monte Carlo simulation and the hardware experiment validated the conclusions of the observability analysis.

The numerical and real experiments have shown that the effectiveness and performance of the proposed filter may suffer from initialization errors and noise. Considering that the minimal sensor configuration is low cost and the adopted method requires little computational effort, the overall system can serve as an onboard estimator providing a real-time supplemental state estimate when the UAV is placed in an unknown GPS-denied environment.

## 8.

## Appendix I: Expression of Error Dynamic Transition Matrix Subblock Elements ${\mathbf{\Phi}}^{mn}$ and Calculation of ${\mathbf{M}}_{k}$

According to Ref. 26, the sub-block elements used in this paper are shown as

## Eq. (51)

$${\mathbf{\Phi}}^{13}=\lfloor ({}^{G}{\mathbf{r}}_{1}+{}^{G}{\mathbf{v}}_{1}\mathrm{\Delta}{t}_{k}+\frac{1}{2}\mathbf{g}\mathrm{\Delta}{t}_{k}^{2}-{}^{G}{\mathbf{r}}_{k})\times \rfloor {\mathbf{C}}_{1}^{T}\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{23}=-\lfloor ({}^{G}{\mathbf{v}}_{k}-{}^{G}{\mathbf{v}}_{1}-\mathbf{g}\mathrm{\Delta}{t}_{k})\times \rfloor {\mathbf{C}}_{1}^{T}\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{33}={\mathbf{C}}_{k}{\mathbf{C}}_{1}^{T}\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{14}=-{\int}_{{t}_{1}}^{{t}_{k}}{\int}_{{t}_{1}}^{t}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}}\mathrm{d}t\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{24}=-{\int}_{{t}_{1}}^{{t}_{k}}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{15}={\int}_{{t}_{1}}^{{t}_{k}}{\int}_{{t}_{1}}^{t}\lfloor ({}^{G}{\mathbf{a}}_{s}-\mathbf{g})\times \rfloor {\int}_{{t}_{1}}^{s}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}}\mathrm{d}s\text{\hspace{0.17em}}\mathrm{d}t\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{25}={\int}_{{t}_{1}}^{{t}_{k}}\lfloor ({}^{G}{\mathbf{a}}_{s}-\mathbf{g})\times \rfloor {\int}_{{t}_{1}}^{s}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}}\mathrm{d}s\phantom{\rule{0ex}{0ex}}{\mathbf{\Phi}}^{35}=-{\mathbf{C}}_{k}{\int}_{{t}_{1}}^{{t}_{k}}{\mathbf{C}}_{t}^{T}\mathrm{d}t.$$

Moreover, with these expressions and based on Eqs. (33)–(36), the $k$'th block row ${\mathbf{M}}_{k}$ of $\mathbf{M}(\mathbf{x})$ is calculated for $k\ge 2$

## Eq. (52)

$${\mathbf{M}}_{k1}={\mathbf{H}}_{\delta \mathbf{r},k}=\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{{}^{T}\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T},$$

## Eq. (53)

$${\mathbf{M}}_{k2}=\mathrm{\Delta}{t}_{k}{\mathbf{H}}_{\delta \mathbf{r},k}+{\mathbf{H}}_{\delta \mathbf{v},k}=\mathrm{\Delta}{t}_{k}\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{}^{T}{\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k},$$

## Eq. (54)

$${\mathbf{M}}_{k3}={\mathbf{H}}_{\delta \mathbf{r},k}{\mathbf{\Phi}}^{13}+{\mathbf{H}}_{\delta \mathbf{v},k}{\mathbf{\Phi}}^{23}+{\mathbf{H}}_{\delta \mathbf{\theta},k}{\mathbf{\Phi}}^{33}\phantom{\rule{0ex}{0ex}}=\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{{}^{T}\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}\cdot \lfloor ({}^{G}{\mathbf{r}}_{1}+{}^{G}{\mathbf{v}}_{1}\mathrm{\Delta}{t}_{k}+\frac{1}{2}\mathbf{g}\mathrm{\Delta}{t}_{k}^{2}-{}^{G}{\mathbf{r}}_{k})\times \rfloor {\mathbf{C}}_{1}^{T}\phantom{\rule{0ex}{0ex}}+\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\cdot \lfloor ({}^{G}{\mathbf{v}}_{k}-{}^{G}{\mathbf{v}}_{1}-\mathbf{g}\mathrm{\Delta}{t}_{k})\times \rfloor {\mathbf{C}}_{1}^{T}+\frac{{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}\lfloor {\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}\times \rfloor {\mathbf{C}}_{1}^{T}-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\lfloor {}^{G}{\mathbf{v}}_{k}\times \rfloor {\mathbf{C}}_{1}^{T},$$

## Eq. (55)

$${\mathbf{M}}_{k4}={\mathbf{H}}_{\delta \mathbf{r},k}{\mathbf{\Phi}}^{14}+{\mathbf{H}}_{\delta \mathbf{v},k}{\mathbf{\Phi}}^{24}\phantom{\rule{0ex}{0ex}}=\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{{}^{T}\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}\cdot {\int}_{{t}_{1}}^{{t}_{k}}{\int}_{{t}_{1}}^{t}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}}\mathrm{d}t-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\cdot {\int}_{{t}_{1}}^{{t}_{k}}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau ,$$

## Eq. (56)

$${\mathbf{M}}_{k5}={\mathbf{H}}_{\delta \mathbf{r},k}{\mathbf{\Phi}}^{15}+{\mathbf{H}}_{\delta \mathbf{v},k}{\mathbf{\Phi}}^{25}+{\mathbf{H}}_{\delta \mathbf{\theta},k}{\mathbf{\Phi}}^{35}+{\mathbf{H}}_{\delta {\mathbf{b}}_{\mathbf{\omega}},k}\phantom{\rule{0ex}{0ex}}=-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{{}^{T}\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}\cdot {\int}_{{t}_{1}}^{{t}_{k}}{\int}_{{t}_{1}}^{t}\lfloor ({}^{G}{\mathbf{a}}_{s}-\mathbf{g})\times \rfloor {\int}_{{t}_{1}}^{s}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}}\mathrm{d}s\text{\hspace{0.17em}}\mathrm{d}t-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\cdot {\int}_{{t}_{1}}^{{t}_{k}}\lfloor ({}^{G}{\mathbf{a}}_{s}-\mathbf{g})\times \rfloor {\int}_{{t}_{1}}^{s}{\mathbf{C}}_{\tau}^{T}\mathrm{d}\tau \text{\hspace{0.17em}\hspace{0.17em}}\mathrm{d}s\phantom{\rule{0ex}{0ex}}-\frac{{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{\mathbf{C}}_{k}^{T}\lfloor {\pi}_{\mathbf{P}}\times \rfloor {\mathbf{C}}_{k}{\int}_{{t}_{1}}^{{t}_{k}}{\mathbf{C}}_{t}^{T}\mathrm{d}t-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\lfloor {}^{G}{\mathbf{v}}_{k}\times \rfloor {\int}_{{t}_{1}}^{{t}_{k}}{\mathbf{C}}_{t}^{T}\mathrm{d}t-{B}_{\mathbf{P}}.$$

## 9.

## Appendix II: The Proof of Theorem I

## Proof:

To verify the result, we directly multiply each block row of $\mathbf{M}(\mathbf{x})$ with $\mathbf{N}$.

When

## Eq. (57)

$$k=1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\mathbf{H}}_{1}\mathbf{N}=[\begin{array}{cc}{\mathbf{H}}_{\delta \mathbf{r},1}{\mathbf{N}}_{1}& -{\mathbf{H}}_{\delta \mathbf{v},1}\lfloor {}^{G}{\mathbf{v}}_{1}\times \rfloor \mathbf{g}+{\mathbf{H}}_{\delta \mathbf{\theta},1}{\mathbf{C}}_{1}\mathbf{g}\end{array}],$$

Since ${\mathbf{e}}_{3}^{T}=[\begin{array}{ccc}0& 0& 1\end{array}]$, both $\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}{}^{G}{\mathbf{r}}_{k}{}^{T}{\mathbf{e}}_{3}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}{{}^{G}\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}$ and $\frac{{A}_{\mathbf{P}}{\mathbf{C}}_{k}{}^{G}{\mathbf{v}}_{k}{\mathbf{e}}_{3}^{T}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{k}}$ have the structure $\left[\begin{array}{ccc}0& 0& *\\ 0& 0& *\end{array}\right]$, denoted by ${\mathbf{\Sigma}}_{\mathbf{k}}^{1}$ and ${\mathbf{\Sigma}}_{\mathbf{k}}^{2}$, respectively; hence

## Eq. (59)

$${\mathbf{H}}_{\delta \mathbf{r},1}{\mathbf{N}}_{1}={\mathbf{\Sigma}}_{\mathbf{k}}^{\mathbf{1}}{\mathbf{N}}_{1}={0}_{2\times 2},\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\mathbf{M}}_{k1}{\mathbf{N}}_{1}={\mathbf{\Sigma}}_{\mathbf{k}}^{\mathbf{1}}{\mathbf{N}}_{1}={0}_{2\times 2},$$

## Eq. (60)

$$-{\mathbf{H}}_{\delta \mathbf{v},1}\lfloor {}^{G}{\mathbf{v}}_{1}\times \rfloor \mathbf{g}+{\mathbf{H}}_{\delta \mathbf{\theta},1}{\mathbf{C}}_{1}\mathbf{g}=\frac{{A}_{\mathbf{P}}{\mathbf{C}}_{1}{}^{G}{\mathbf{v}}_{1}{\mathbf{e}}_{3}^{T}}{{\mathbf{e}}_{3}^{T}{}^{G}{\mathbf{r}}_{1}}\lfloor {\mathbf{C}}_{1}^{T}{\pi}_{\mathbf{P}}\times \rfloor {\mathbf{C}}_{1}^{T}{\mathbf{C}}_{1}\mathbf{g}={\mathbf{\Sigma}}_{1}^{2}\lfloor {\mathbf{C}}_{1}^{T}{\pi}_{\mathbf{P}}\times \rfloor \mathbf{g},$$

## Eq. (61)

$$-{\mathbf{M}}_{k2}\lfloor {}^{G}{\mathbf{v}}_{1}\times \rfloor \mathbf{g}+{\mathbf{M}}_{k3}{\mathbf{C}}_{1}\mathbf{g}=-\mathrm{\Delta}{t}_{k}{\mathbf{\Sigma}}_{\mathbf{k}}^{\mathbf{1}}\lfloor {}^{G}{\mathbf{v}}_{1}\times \rfloor \mathbf{g}+{\mathbf{\Sigma}}_{\mathbf{k}}^{\mathbf{1}}\cdot \lfloor ({}^{G}{\mathbf{r}}_{1}+{}^{G}{\mathbf{v}}_{1}\mathrm{\Delta}{t}_{k}+\frac{1}{2}\mathbf{g}\mathrm{\Delta}{t}_{k}^{2}-{}^{G}{\mathbf{r}}_{k})\times \rfloor \mathbf{g}\phantom{\rule{0ex}{0ex}}-\frac{{\mathbf{e}}_{3}^{T}{\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}}{{\mathbf{e}}_{3}^{T}{{}^{G}\mathbf{r}}_{k}}{A}_{\mathbf{P}}{\mathbf{C}}_{k}\cdot \lfloor (\mathbf{g}\mathrm{\Delta}{t}_{k})\times \rfloor \mathbf{g}+{\mathbf{\Sigma}}_{\mathbf{k}}^{2}\lfloor {\mathbf{C}}_{k}^{T}{\pi}_{\mathbf{P}}\times \rfloor \mathbf{g}.$$

Given ${\mathbf{\Sigma}}_{\mathbf{k}}^{\mathbf{1}}\lfloor \cdot \times \rfloor \mathbf{g}={0}_{2\times 1}$, ${\mathbf{\Sigma}}_{\mathbf{k}}^{2}\lfloor \cdot \times \rfloor \mathbf{g}={0}_{2\times 1}$, and $\lfloor \mathbf{g}\times \rfloor \mathbf{g}={0}_{3\times 1}$, it follows that ${\mathbf{H}}_{1}\mathbf{N}={0}_{2\times 3}$ and ${\mathbf{M}}_{k}\mathbf{N}={0}_{2\times 3}$, so that $\mathbf{M}(\mathbf{x})\mathbf{N}=0$.

## Acknowledgments

The first author wishes to thank Professor Walter Fichter and Dr. Lorenz Schmitt for their useful comments and language editing, which have greatly improved the manuscript. The authors also would like to thank the China Scholarship Council for providing the funding that made this research possible.

## References

1. “Observability criteria and null-measurement Kalman filter for vision-aided inertial navigation,” J. Guid. Control Dyn., 39(4), 770–780 (2016). https://doi.org/10.2514/1.G001146
2. “Autonomous navigation based on sequential images for planetary landing in unknown environments,” J. Guid. Control Dyn., 40(10), 2587–2602 (2017). https://doi.org/10.2514/1.G002105
3. “Hybrid real-time stereo visual odometry for unmanned aerial vehicles,” Opt. Eng., 57(7), 073104 (2018). https://doi.org/10.1117/1.OE.57.7.073104
4. “Vision-aided nonlinear observer for fixed-wing unmanned aerial vehicle navigation,” J. Guid. Control Dyn., 39(8), 1777–1789 (2016). https://doi.org/10.2514/1.G000281
5. “Integrated navigation method based on inertial navigation system and Lidar,” Opt. Eng., 55(4), 044102 (2016). https://doi.org/10.1117/1.OE.55.4.044102
6. “ORB-SLAM: a versatile and accurate monocular SLAM system,” IEEE Trans. Rob., 31(5), 1147–1163 (2015). https://doi.org/10.1109/TRO.2015.2463671
7. “Relative motion estimation using visual–inertial optical flow,” Auton. Rob., 42(3), 615–629 (2018). https://doi.org/10.1007/s10514-017-9654-9
8. “Comparing optical flow algorithms using 6-dof motion of real-world rigid objects,” IEEE Trans. Syst. Man Cybern. Part C, 42(6), 1752–1762 (2012). https://doi.org/10.1109/TSMCC.2012.2218806
9. “Optic flow-based vision system for autonomous 3D localization and control of small aerial vehicles,” Rob. Auton. Syst., 57(6–7), 591–602 (2009). https://doi.org/10.1016/j.robot.2009.02.001
10. “A review and evaluation of methods estimating ego-motion,” Comput. Vision Image Understanding, 116(5), 606–633 (2012). https://doi.org/10.1016/j.cviu.2011.04.004
11. “A database and evaluation methodology for optical flow,” Int. J. Comput. Vision, 92(1), 1–31 (2011). https://doi.org/10.1007/s11263-010-0390-2
12. “An autonomous UAV with an optical flow sensor for positioning and navigation,” Int. J. Adv. Rob. Syst., 10(10), 341 (2013). https://doi.org/10.5772/56813
13. “Vision-based state estimation for autonomous micro air vehicles,” J. Guid. Control Dyn., 30(3), 816–826 (2007). https://doi.org/10.2514/1.22398
14. “Observability analysis of inertial navigation errors from optical flow subspace constraint,” Inf. Sci., 327, 300–326 (2016). https://doi.org/10.1016/j.ins.2015.08.017
15. “Quaternion kinematics for the error-state Kalman filter,” Barcelona, Spain (2017).
16. “State estimation for legged robots-consistent fusion of leg kinematics and IMU,” Robotics: Science and Systems VIII, 17–24, The MIT Press, Sydney (2013).
17. “A debris image tracking using optical flow algorithm,” Adv. Space Res., 49(5), 1007–1018 (2012). https://doi.org/10.1016/j.asr.2011.12.010
18. “Estimating equations and maximum likelihood,” Theoretical Statistics: Topics for a Core Course, 175–201, 1st ed., Springer, New York (2010).
19. “Score function and Fisher information,” In All Likelihood: Statistical Modelling and Inference Using Likelihood, 213–228, Oxford University Press, Oxford (2001).
20. “Inertial instrumentation,” Fundamentals of High Accuracy Inertial Navigation, 174, 59–78, AIAA, New York (1997).
21. “A multi-state constraint Kalman filter for vision-aided inertial navigation,” in IEEE Int. Conf. Rob. and Autom., 3565–3572 (2007). https://doi.org/10.1109/ROBOT.2007.364024
22. “Analysis and modeling of inertial sensors using Allan variance,” IEEE Trans. Instrum. Meas., 57(1), 140–149 (2008). https://doi.org/10.1109/TIM.2007.908635
23. “Extending kalibr: calibrating the extrinsics of multiple IMUs and of individual axes,” in IEEE Int. Conf. Rob. and Autom., 4304–4311 (2016). https://doi.org/10.1109/ICRA.2016.7487628
24. “Relative multiplicative extended Kalman filter for observable GPS-denied navigation,” (2017). http://scholarsarchive.byu.edu/facpub/1963
25. “Camera-IMU-based localization: observability analysis and consistency improvement,” Int. J. Rob. Res., 33(1), 182–201 (2014). https://doi.org/10.1177/0278364913509675
26. “Consistency analysis and improvement of vision-aided inertial navigation,” IEEE Trans. Rob., 30(1), 158–176 (2014). https://doi.org/10.1109/TRO.2013.2277549

## Biography

**Jia Deng** received her BS degree at Beihang University, Beijing, in 2010, and she is currently pursuing her PhD at Beihang University, Beijing, China. She mainly studies navigation, control theory, and corresponding application and recently has focused on the data fusion problems and cooperative formation flight control system.

**Sentang Wu** received his PhD in dynamics, ballistics, and aircraft motion control systems from Ukraine National Aviation University in 1992. He mainly studies the theory and application of nonlinear stochastic systems, computer information processing, and control and aircraft cooperative control. He is currently a professor of automation science and electrical engineering and a doctoral tutor at Beihang University.

**Hongbo Zhao** received her BS degree at Beihang University, Beijing, and she is currently pursuing her PhD at Beihang University, Beijing, China. She mainly studies the control theory and application, distributed state estimation of stochastic systems, which focus on multiagent systems, and cooperative formation flight control systems with uncertainties.