In the recent past, the number of incidents with intentional and unintentional misuse of small or microunmanned aerial vehicles (MUAVs) has increased due to the increasing number of commercially available small quadcopters. In their annual reports,1 the Federal Aviation Administration has pointed out an increasing number of sightings and close encounters of MUAVs reported in civilian air traffic. Further, the first serious incidents where a MUAV crashed into an air plane have occured.2 Beside this threat to civilian air traffic, misuse of MUAVs can also harm personal and public privacy and create security issues.
MUAVs can operate in highly agile flight patterns and are hard to detect due to having small cross sections.3–7 Thus, in the past decade, several groups have been working on the problem of efficiently detecting, tracking, and classifying MUAVs in different scenarios. Detection and tracking of MUAVs have been studied using single millimeter-wave RADAR,8 optical (VIS/SWIR)9 sensors or within a multisensor approach using passive/active imaging, acoustics, and RADAR in various field trials.10–13 Further, classification of MUAV has been investigated using image-based deep-learning and high-level data-fusion approaches.14–16 Classification can also be used to distinguish MUAVs from, e.g., birds.
Different sensors (such as RADAR, acoustics, and optics) will be embedded in a counter-MUAV system where sensor information will be processed to reliably detect/track, classify, and localize MUAV. Based on this information, decisions are made and countermeasures (CMs) are engaged. Owing to the internal processing time of information generation and decision-making as well as external time constrains (e.g., propagation of a projectile), the effect on the MUAV will occur after a certain inherent time delay . Owing to the agile operation and small cross section, the probability of successful CMs is relatively low.17 Here, the prediction of the MUAV position at the moment of effect could significantly increase the success rate.
First, works that predict the MUAV position in the image plane of the succeeding frame (frame-to-frame prediction) of a video stream18,19 using a linear motion model have been presented. In our approach, we predict the position of the MUAV in three-dimensional (3-D) space and on a longer time scale (with ) or over -frames. For a reliable result, we determine the 3-D position of the MUAV from pose estimation in a two-dimensional (2-D) intensity images and apply a navigation model to simulate the flight behavior.
Prediction of Microunmanned Aerial Vehicle Position in the Image Plane
Based on linear quadratic estimation, Kálmán20 developed a filter to either reduce measurement noise effects and to estimate values (or determine the most probable value). In the computer vision community, for instance, the Kalman filter approach is widely used for tracking and navigating the MUAV.18,19 The Kalman filter is used to predict the MUAV state vector , which contains a set of parameters, from a current measurement and the previous state vector , as defined in Eq. (1). Here, describes noise characteristics
Depending on the application (e.g., navigation or tracking), the state vector summarizes several physical parameters [such as position, velocity, angles (roll, yaw, pitch), and angular velocities] to describe the current flight state. In our approach, the state vector is defined as containing only the MUAV position , velocity , and acceleration that are the first and second temporal derivatives of the position in the ’th frame. Higher order of temporal derivatives [the jerk (third) and the snap (fourth)] could be of interest in future algorithms and are neglected in the current code.
We developed a tracking and prediction filter based on Murphy’s “Kalman filter toolbox for Matlab”21 and the tutorial of Welsh and Bishop.22 Our filter was designed to predict a MUAV position in a -frame future from a continuous data stream and has two stages: a Kalman filter to estimate the current state vector and an -frame prediction filter to estimate the future state .
The Kalman filter, as described in Sec. 5, returns a state vector at time , containing position, velocity, and acceleration. Then, the MUAV state vector at time can be predicted using the motion equation approach, as defined in Eq. (2). The predicted position can be extracted by Eq. (3). We define the motion prediction equations as
The complete -frame prediction procedure is illustrated in Algorithm 1 as a pseudocode. The computation process returns a predicted position and needs a set of input data consisting of the current image frame , the previous MUAV state vector , and the number of frames as the prediction range. First, the MUAV position in the current image has to be determined by, for instance, a tracking algorithm. This step will be discussed in the next paragraph. From these position measurements, the current state vector is calculated using the Kalman filter and the MUAV state at frame is predicted. Finally, the estimated future position is extracted [see Eqs. (2) and (3)].
Pseudocode for the n-frame prediction of 2-D MUAV position in the image plane.
|Input: Current frame , previous state vector , coefficient|
|Output:-frame prediction of 2-D MUAV position|
|1: Determine the MUAV position in frame:|
|2: Apply the Kalman filter|
|3: Predict the state vector in the ’th frame: . [Eq. (2)]|
|4: Extract the position in the ’th frame: . [Eq. (3)]|
In Fig. 1, some frames of a video stream are presented. This video was recorded inside our laboratory, showing a small quadcopter MUAV (type DJI Phantom 3) flying around in the camera field of view. The MUAV is tracked with a state-of-the-art tracker based on the consensus-based matching and tracking of key points (CMT).23,24 In this algorithm, the image is transformed to an alternative base using feature detection from accelerated segment test (FAST)25,26 and binary robust invariant scalable key points of features27 as feature detector and descriptor subroutines.
The tracker employs clustering of correspondences as the central idea to distinguish inlier and outlier key points. This approach improves the tracking results, even when correspondences are located on deformed parts of the object. As the flying MUAV shows very rapid aspect changes due to the angle between the vision system and the target, this tracker is well adapted to this particular situation. The tracking algorithm returns a vector containing the MUAV position ( and are the column and row positions of the MUAV, respectively) and the size of the bounding box (width and height: , ). In Fig. 1, some frames of the video stream illustrate the CMT tracking results.
An exemplary result of our tracking and prediction algorithm (Algorithm 1) is presented in Fig. 2. In Fig. 2(a), a five-frame prediction result from our approach using first and second derivatives (□) is compared to a simple prediction approach using the first derivative only (). Both prediction paths are compared to the ground-truth CMT tracking data (—). As long as the MUAV flight operation is slow and smooth, only small differences between ground truth and predicted paths are observed. But at turning points and during high-agile flight operation, the predictions are less accurate, especially for the simple prediction approach.
A deeper discussion of the prediction results is presented in Fig. 2(b). In the top diagram, the amount of single flight behavior components are plotted for each frame as the amplitude of the first and second derivatives [ (⋯) and (—)]. It is obvious that rapid changes in the first derivative lead to peaks in the amplitude of the second derivative.
Furthermore, in the bottom diagram of Fig. 2(b), the prediction error is illustrated as the mismatch distance between ground truth and predicted position given by the L2-norm distance: . For the simple prediction approach (⋯), high errors can be observed in frames that are related to high agile or dynamic MUAV flight operations (see frames 110 to 125, 165 to 195, 205 to 220, etc.). In this specific sequence, we obtain a mean prediction error of 10.1 pixels. This mismatch distance can be reduced significantly by additionally employing the second derivative (—) in our prediction approach. With this algorithm it is possible to reduce all previously occurring peaks significantly and a mean prediction mismatch distance of 4.7 pixels is reached.
Prediction of Microunmanned Aerial Vehicles Position in Three-Dimensional Space
In three-dimensional (3-D) space, the flight path prediction has to take into account the position and motion in all spatial dimensions to reflect the complex MUAV flight behavior. Here, we want to introduce a pose estimation algorithm to derive the 3-D position and orientation of the MUAV from the 2-D intensity images. Further, we use the obtained data in an aerodynamic flight behavior model to calculate the acceleration as an additional measurement value. Then, both the determined 3-D position and the acceleration values are fed into the aforementioned Kalman filter and prediction algorithm to return the probable future MUAV position in the 3-D space.
Flight behavior of a quadcopter MUAV, for instance, is modeled in the algorithm used for navigation and control.28–33 These algorithms analyze the dynamic behavior of the spatial and the angular motion to control the power consumption of the four rotors individually. Typically, these models combine a global or inertial and a MUAV or body coordinate system. In our approach, we rely on the global coordinate system of the optical sensing device. In the coordinate system the - and -directions are the width and the height corresponding to the previous pixel row and column directions, respectively. Then, the -direction is the range.
Further, we developed a simplified flight behavior model to describe the MUAV motion from the thrust , the gravitational force , and the aerodynamic drag force , which can be derived from some fundamental parameters such as position and orientation in 3-D space, velocity, acceleration, and few boundary conditions (aerodynamic constants and parameters). In principle, we assume that the MUAV is a floating object (the vertical component of the thrust compensates for earth gravitation). Details of this fight behavior model can be found in Secs. 6 and 7.
Position and orientation of the MUAV in 3-D space can be derived from pose estimation. For this task, the quadcopter used in our experiment can be modeled by a skeleton model consisting of four small vertical segments equally distributed around a bigger vertical segment, as shown in Fig. 3. The four small segments represent the rotation axes of each propeller and the bigger one represents the body of the quadcopter.
The use of a skeleton model has the benefit of not being too specific to the investigated MUAV. Later adaption to another MUAV can easily be done by changing the model. The only values that have to be known are the number of propellers and their distance to the center.
For the investigated vehicle, the DJI Phantom 3,34 there are four propellers located 175 mm from the center. The number of points per segment can also vary, but it does not have a great influence; we usually use 20 points per segment. The skeleton points are designated by when placed at the origin. The pose of the skeleton corresponding to the image is represented by a vector containing six values , where is the -coordinate and is the angle of rotation around this axis. Analogous definitions are made for the -axis and the -axis. Here, is the transformed skeleton (rotation and translation) and stands for the points at pose
The pose estimation is based on the use of two successive views. The pseudocode is given in Algorithm 2. First, corners are detected in the region of the current frame using the FAST corner detector26 with a nonmaximum suppression to obtain a better distribution of the corners over the MUAV, as illustrated in Fig. 4(a). The detected corners are described using the BRIEF35 algorithm and matched with the corners from the previous frame, , to obtain the matches . At this point, the set of matches contains corners located on the drone but also mismatches, corners on the background and corners on the propellers. In the next step, the unwanted corners (mismatch, background, and propellers) are filtered out by estimating the essential matrix36 [see Fig. 4(b)]. Hereafter , , and correspond to the filtered data set.
Pseudocode for the pose estimation of the MUAV from two successive views.
|Input: Current frame , previous pose , previous detected corners , from the CMT, threshold , skeleton|
|1: Detect corners within using the FAST algorithm|
|2: Match with using BRIEF descriptors and Hamming distance|
|3: Filter out propellers and background motion from by estimating the essential matrix|
|4: Use to project into|
|5: Associate with , such that|
|6: Align on rays issued from using as initial conditions|
To estimate the pose of the drone, we need to associate the detected corners with points on the skeleton. A pinhole camera model is used to project the skeleton in the image plane. The camera is placed in the origin of the coordinate system with the standard axes, i.e., the depth of the scene is along the positive -axis and the -axis points to the ground. Using homogeneous coordinates, the camera projects the skeleton to according to37
Further, in Fig. 4(c), we couple the detected corners and the projected skeleton points that are close together, , and . If a skeleton point satisfies the condition with several corners, we keep the closest couple. From this we obtain subsets of and called and . It is then straightforward to obtain , , and couples with and .
For each corner in , a ray passing through the camera center (the origin) and the corner is computed. The pose of the drone in image , , is then updated by minimizing the distance between each element of and their corresponding rays (see Fig. 5). The minimization is performed using as a starting point38 algorithm, which is implemented in SciPy.39 Bounds are used on the angular components of the pose to obtain meaningful results according to the MUAV specifications (see Table 2 in Sec. 6). The whole pose estimation algorithm is summarized as a pseudocode in Algorithm 2.
The resulting pose vector of the current frame and the state vector of the previous frame are used to estimate the current state vector using the MUAV flight behavior model in Sec. 6 and the Kalman filter in Sec. 5. Orientation (, , and ) and velocity are needed to estimate the current acceleration from the flight model [Eq. (15)], taking into account the gravitation, thrust, and aerodynamic drag force. The determined acceleration vector and the positions are then fed into the Kalman filter as measurement values () to update the MUAV state vector . Finally, the -frame prediction is calculated as mentioned previously.
Figure 6 shows the results from the analysis of the same image sequence as used in Sec. 2. Owing to the missing reliable data for ground truth for the MUAV position, a reference for the 3-D flight path (—) is calculated from the state vectors (). In Fig. 6(a), this track is compared to the five-frame prediction (gray bullets). A good agreement between flight path and prediction can be observed in area of constant or linear movement (no rapid change of flight direction), while at turning points, where the flight direction changes rapidly, the algorithm still tends to overestimate the flight motion.
In Fig. 6(b), a detailed analysis is presented. In the upper diagram, the amplitudes of the motion components, i.e., absolute velocity and absolute acceleration, are depicted. These values are directly derived from the algorithm. The velocity and acceleration are measured in standard units (m/s and ).
In the lower diagram, the prediction error, that is L2-norm distance between predicted and ground-truth positions, is plotted for each frame. In most frames, this prediction error is . Only in a few sections of the sequence does this error exceed a value of 50 cm. These are the parts of the track where rapid changes of the flight direction occur. Further, significant prediction errors occur in the same frames as in the 2-D case [compare Fig. 2(b)], only.
A more detailed discussion of the prediction error can be derived from Fig. 7. Here, the prediction error is plotted for each coordinate axis as the difference between the MUAV flight path (or track) and the predicted positions. Along the three coordinate axes, the prediction error is scattered around the zero position, giving evidence that no systematic error is present. In height () and width (), the error is scattered with marginal standard deviation of and , respectively. Owing to a more dynamic flight behavior, the standard variation along the range axis () is . Here, we want to point out that the retrieval of 3-D information from a single view in a 2-D intensity image is a challenging task, especially for range estimation. The statistical analysis of the whole sequence of 319 frames is summarized in Table 1.
Summary of statistics on the prediction error in a sequence of 319 frames.
|Data||Mean (m)||Std. deviation σ (m)||Variance σ2 (m2)||Min. (m)||Max. (m)|
We want to point out again that the 3-D information was carried out from pose estimation in 2-D intensity images. Further, in the analyzed sequences, the MUAV was observed under a very gradual observation angle. Here, a more slanted observation angle could help to increase the accuracy of the spatial position estimation.
We successfully demonstrated the capability to extract 3-D information from 2-D intensity images by applying pose estimation with a simple skeleton model. From this pose information (i.e., position and orientation) and a simplified flight behavior model (with a floating body approximation), we could calculate the state vector that describes the current MUAV flight operation. Further, using a motion model that includes velocity and acceleration, we could predict the position of the MUAV in the near future.
The overall performance gives evidence that our approach is capable of resulting in a good first estimation of the future MUAV flight behavior. Unfortunately, in dynamic operation that includes rapid changes in velocity and direction, the underlying model tends to an overshooting effect. In future investigation, this error could be diminished by the use of a higher order of temporal derivative.
In our experiments, we used a single view from a single camera to retrieve a 3-D pose estimation by mainly analyzing the position of the four rotors. Further, the experiments were carried out at a very narrow viewing angle close to the MUAV principal plane. In these conditions, sometimes the line of sight on one or two rotors is blocked by the MUAV main body. Then, the pose estimation uses only the remaining visible rotors (three visible rotors are optimal). This situation is sufficient for a reliable pose estimation and has only a slight impact on the overall estimation of the 3-D position. In the later application in outdoor scenarios, we assume to have a steeper viewing angle, which should allow for an even better pose estimation.
To date, we have investigated our approach only in laboratory conditions as a proof-of-principle study. But, image data recorded in outdoor scenarios under operational conditions will contain a lot of additional disruptive factors such as signal-to-noise ratio, resolution, lighting conditions, and background. Further, MUAV will fly at larger ranges. Here, an active imaging device with a very narrow field of view could counter these challenges and could be used to gate out the background, increase resolution and signal-to-noise ratio, and be independent from lighting conditions. Our team has access to a shortwave-infrared laser gated-viewing system as presented by Christnacher et al.10 In the near future, we will perform tests to investigate the pose estimation approach in operational conditions. Lighting conditions and observing MUAV flight behavior at a larger range can reduce the ranging accuracy of the proposed pose estimation approach due to an expected smaller scaling effect of the MUAV with the range at narrower observation angles. This effect could be compensated for by ranging capabilities of laser imaging devices. Notwithstanding this, the determination of the tilt angles ( and ) and, thus, the application of the flight model should not be affected as long as the target is imaged with a high resolution.
Further, in the presented work, we have tested our approach on a single known MUAV, but we are convinced that our approach has a general nature and it could be easily adapted to other multicopter MUAV. Finally, the tracking and prediction of the flight behavior of unknown MUAV is possible if a classification of the target is done previously. As long as we have a proper skeleton model, this approach should be able to track and predict flight behavior of any multicopter MUAV. For the prediction of other MUAV types, such as fixed-wing or vertical takeoff and landing, the flight model has to be adapted carefully and in detail.
Appendix A: Brief Description of the Kalman Filter
The Kalman filter is an iterative forward recursion process that consists of two steps: the “time update” and the “measurement update.”22 Within the “time update,” for the ’th measurement, the state vector and the error covariance are projected from previous values using the linear stochastic differential equation
Here, Eq. (10) reflects the process model as a system of motion equations with state-transition matrix , the previous state vector , and the control-input model (in our case, ). Analogously, the projected error covariance matrix is calculated in Eq. (11) using the error covariance and the process noise covariance matrix .
Then in “measurement update,” the state vector is updated by weighting the measurement and the projected state vector with the Kalman gain . Finally, the error covariance is updated
Here, the matrices and describe the measurement (or observation) process and covariance.
Appendix B: The Microunmanned Aerial Vehicles Flight Behavior Model
The MUAV flight behavior can be described by the motion in Eq. (15). Here, is the mass of the MUAV and is the resulting force that accelerates the MUAV in any direction. This force is the sum of the gravitational force , the total thrust of all propellers, and the drag force . Owing to the lack of information about the single motor thrusts, rotational motion is neglected in this model
The overall thrust is assumed to be perpendicular to the MUAV main plane, that is, the plane in which all motors lie. Here, is oriented in the direction of the normal vector . We define the thrust amplitude as and the thrust vector as . The thrust amplitude has positive values . Further, in a first approximation, we assume a floating body that means the vertical thrust () compensates for the gravitational force
From this assumption we can determine the forces induced by the thrust in the other direction
Finally, we use a simplified model based on Stokes’s law to determine the drag force with as an aerodynamic constant. This force describes the aerodynamic drag on the MUAV that is proportional to the velocity. The aerodynamic constant can be determined experimentally.40,41 In this paper, we assume an isotropic aerodynamic coefficient of and a maximum thrust of , which can be calculated from Eq. (15) and the technical specification of the investigated MUAV.34 Details of this calculation are summarized in Sec. 7.
From Eqs. (15)–(18), we can derive a calculus to determine the accelerations in the -, - and -directions from the MUAV tilt angles ( and ), which is analogous to the 3-D pose of the MUAV
Further, stable flight operation has to comply with the condition that the thrust amplitude is limited by the maximum thrust by . This condition has a direct impact on the expected values of the normal vector [see Eq. (22)] and is analogous to the definition of a maximum tilt angle
Appendix C: Estimation of Aerodynamic Coefficient and Maximum Thrust
With the definitions of the gravitational force , the thrust , and the drag force as
In the following, the aerodynamic parameters and will be derived from this equation system using manufacturer specifications for weight, maximum horizontal and vertical velocity, and maximum tilt angle of the investigated MUAV.34 The values are summarized in Table 2.
Physical flight parameter Tmax and kd for the investigated MUAV calculated from the manufacturer specifications.34
|Parameter||Symbol||Unit||DJI phantom 3|
|Max ascent speed||m/s||5|
|Max horizontal speed||m/s||16|
|Max tilt angle||deg||35|
As illustrated in Fig. 8, we can distinguish two borderline flight operations: maximum horizontal velocity and maximum vertical velocity.
Maximum Horizontal Velocity
First, we assume a flight operation with maximal horizontal velocity in the -direction. In this case, the MUAV will be tilted to maximum inclination toward the flight direction, which is given by the maximum tilt angle . Further, in the horizontal direction (), the velocity is equal to , the maximum horizontal velocity.
Further, due to borderline flight operation, the velocities in the other directions and all accelerations are zero, and . The thrust is at maximum level, , but, due to the floating body assumption, its vertical component has to compensate for the gravitational force, that is . Thus, the maximal thrust can be calculated as
Then, the aerodynamic constant can be calculated from Eq. (25) as
Although we have already determined the values for both and , these results can be validated by the second case, that is, maximum vertical velocity.
Maximum Vertical Velocity
Here, we can assume a flight operation with maximum vertical velocity . In this case, the inclination angle is zero, , and the maximum thrust pushes the MUAV upward in the -direction, . The velocities in the horizontal directions are zero, , as well as any accelerations, . Then, from Eq. (26) and with the result for either or of the previous case, we can again derive
Both results agree with the previously found values.
The authors would like to thank Dr. Sebastien Hengy from ISL’s acoustics group for carefully operating the MUAV during our indoor experiments.
Martin Rebert received his master’s degree in engineering from Télécom Bretagne (France) in 2015. He is now preparing for his PhD with the collaboration of the IRIMAS laboratory of the University of Haute Alsace in Mulhouse (France) and the French–German Research Institute of Saint-Louis in Saint-Louis (France). His work is focused on computer vision and autonomous navigation.
Stéphane Schertzer is a research engineer at the French–German Research Institute of Saint-Louis (ISL) in Saint-Louis (France). He received his MSc degree in electrical engineering and computer sciences from the University of Haute-Alsace in Mulhouse (France) in 2009. He specializes in computer vision, active imaging and software development (C++, CUDA etc.) of algorithms and man–machine interfaces. He is currently developing new algorithms for active imaging systems.
Martin Laurenzis received his MS degree in physics from the University of Dortmund (Germany) in 1999 and his PhD in electrical engineering and information technology from the University of Aachen (Germany) in 2005. He has been with the French–German Research Institute of Saint–Louis (France) since 2004. His research interests are focused on microunmanned aerial vehicle detection using heterogeneous sensor networks, applied computer vision, and computational imaging.