Real-time motion video analysis is a challenging and exhausting task for the human observer, particularly in safety and
security critical domains. Hence, customized video analysis systems providing functions for the analysis of subtasks like
motion detection or target tracking are welcome. While such automated algorithms relieve the human operators from
performing basic subtasks, they impose additional interaction duties on them. Prior work shows that, e.g., for interaction
with target tracking algorithms, a gaze-enhanced user interface is beneficial.
In this contribution, we present an investigation on interaction with an independent motion detection (IDM) algorithm.
Besides identifying an appropriate interaction technique for the user interface – again, we compare gaze-based and
traditional mouse-based interaction – we focus on the benefit an IDM algorithm might provide for a UAS video analyst.
In a pilot study, we exposed ten subjects to the task of moving target detection in UAS video data twice: once with
automatic support and once without it. We compare the two conditions considering performance in terms of
effectiveness (correct target selections). Additionally, we report perceived workload (measured using the NASA-TLX
questionnaire) and user satisfaction (measured using the ISO 9241-411 questionnaire).
The results show that a combination of gaze input and automated IDM algorithm provides valuable support for the
human observer, increasing the number of correct target selections by up to 62% while reducing workload at the same time.
Motion video analysis is a challenging task, particularly if real-time analysis is required. How to provide suitable assistance for the human operator is therefore an important issue. Given that the use of customized video analysis systems is more and more established, one supporting measure is to provide system functions which perform subtasks of the analysis. Recent progress in the development of automated image exploitation algorithms allows, e.g., real-time moving target tracking. Another supporting measure is to provide a user interface which strives to reduce the perceptual, cognitive, and motor load of the human operator, for example by incorporating the operator’s visual focus of attention. A gaze-enhanced user interface can help here. This work extends prior work on automated target recognition, segmentation, and tracking algorithms as well as on the benefits of a gaze-enhanced user interface for interaction with moving targets. We also propose a prototypical system design aiming to combine the qualities of the human observer’s perception and of the automated algorithms in order to improve the overall performance of a real-time video analysis system. In this contribution, we address two novel issues in analyzing gaze-based interaction with target tracking algorithms. The first issue extends gaze-based triggering of a target tracking process, e.g., investigating how best to relaunch it in the case of track loss. The second issue addresses the initialization of tracking algorithms without motion segmentation, where the operator has to provide the system with the object’s image region in order to start the tracking algorithm.
Motion video analysis is a challenging task, especially in real-time applications. In most safety- and security-critical applications, a human observer is an obligatory part of the overall analysis system. Over the last years, substantial progress has been made in the development of automated image exploitation algorithms. Hence, we investigate how the benefits of automated video analysis can be suitably integrated into current video exploitation systems. In this paper, a system design is introduced which strives to combine the qualities of the human observer’s perception and of the automated algorithms, thus aiming to improve the overall performance of a real-time video analysis system. The system design builds on prior work where we showed the benefits for the human observer of a user interface which utilizes the human visual focus of attention, revealed by the eye gaze direction, for interaction with the image exploitation system; eye tracker-based interaction allows much faster, more convenient, and equally precise moving target acquisition in video images than traditional computer mouse selection. The system design also builds on prior work we did on automated target detection, segmentation, and tracking algorithms. Besides the system design, a first pilot study is presented in which we investigated how the participants (all non-experts in video analysis) performed in initializing an object tracking subsystem by selecting a target for tracking. Preliminary results show that the gaze + key press technique is an effective, efficient, and easy-to-use interaction technique for performing selection operations on moving targets in videos in order to initialize an object tracking function.
For many tasks in the fields of reconnaissance and surveillance it is important to know the spatial location represented by the imagery to be exploited. A task involving the assessment of changes, e.g., the appearance or disappearance of an object of interest at a certain location, typically cannot be accomplished without spatial location information associated with the imagery. Often, such georeferenced imagery is stored in an archive that enables the user to query for the data with respect to its spatial location. Thus, the user is able to effectively find spatially corresponding imagery to be used for change detection tasks. In the field of exploitation of video taken from unmanned aerial systems (UAS), spatial location data is usually acquired using a GPS receiver, together with an INS device providing the sensor orientation, both integrated in the UAS. If valid GPS data becomes unavailable for a period of time during a flight, e.g., due to sensor malfunction, transmission problems, or jamming, the imagery gathered during that time is not applicable for change detection tasks based merely on its georeference. Furthermore, GPS and INS inaccuracy, together with potentially poor knowledge of ground elevation, can also render location information inapplicable. On the other hand, change detection tasks can be hard to accomplish even if imagery is well georeferenced, as a result of occlusions within the imagery (e.g., due to clouds or fog) or image artefacts (e.g., due to transmission problems). In these cases, a purely georeference-based approach to finding spatially corresponding imagery can also be inapplicable. In this paper, we present a search method based on the content of the images to find imagery spatially corresponding to given imagery, independently of georeference quality. Using methods from content-based image retrieval, we build an image database which allows for querying even large imagery archives efficiently.
We further evaluate the benefits of this method in the context of a video exploitation workflow on the basis of its integration into our video archive system.
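To illustrate the retrieval idea, the sketch below indexes frames by a coarse global descriptor (an 8-bin intensity histogram, standing in for the richer content-based features used in practice) and ranks archived frames by their L1 descriptor distance to a query image. All names and the descriptor choice are illustrative assumptions, not the system's actual implementation.

```python
def histogram(img, bins=8):
    """Global intensity histogram over an 8-bit grayscale image (nested lists).
    A deliberately coarse stand-in for a real content-based image descriptor."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[min(v * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def query_similar(db, query_img, k=1):
    """Return the k archived frame names whose descriptors are closest
    (L1 distance) to the descriptor of the query image."""
    q = histogram(query_img)
    ranked = sorted(db.items(),
                    key=lambda kv: sum(abs(a - b) for a, b in zip(kv[1], q)))
    return [name for name, _ in ranked[:k]]
```

In a real archive the descriptors would be precomputed once per frame and stored in an index, so that even large archives can be queried efficiently.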
A frequently occurring interaction task in UAS video exploitation is the marking or selection of objects of interest in the
video. If an object of interest is visually detected by the image analyst, its selection/marking for further exploitation,
documentation and communication with the team is a necessary task. Today object selection is usually performed by
mouse interaction. Because all objects in the video move due to sensor motion, object selection can be rather challenging,
especially if strong and fast ego-motions are present, e.g., with small airborne sensor platforms. In addition,
objects of interest are sometimes visible too briefly to be selected by the analyst using mouse interaction. To address this
issue we propose an eye tracker as input device for object selection. As the eye tracker continuously provides the gaze
position of the analyst on the monitor, it is intuitive to use the gaze position for pointing at an object. The selection is
then actuated by pressing a button. We integrated this gaze-based “gaze + key press” object selection into Fraunhofer
IOSB's exploitation station ABUL using a Tobii X60 eye tracker and a standard keyboard for the button press.
Representing the object selections in a spatial relational database, ABUL enables the image analyst to efficiently query
the video data in a post processing step for selected objects of interest with respect to their geographical and other
properties. An experimental evaluation is presented, comparing gaze-based interaction with mouse interaction in the
context of object selection in UAS videos.
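A minimal sketch of the gaze + key press selection logic, using hypothetical data structures (a gaze point in monitor coordinates and a list of tracked objects with pixel bounding boxes); the actual ABUL integration with the eye tracker differs in detail:

```python
def select_object(gaze, objects, max_dist=60.0):
    """On key press, return the object whose bounding box is closest to the
    current gaze point, or None if nothing lies within max_dist pixels.
    'objects' is a hypothetical list of dicts with a 'bbox' = (x, y, w, h)."""
    def dist(obj):
        x, y, w, h = obj["bbox"]
        # distance from the gaze point to the box (0 if the gaze is inside it)
        dx = max(x - gaze[0], 0, gaze[0] - (x + w))
        dy = max(y - gaze[1], 0, gaze[1] - (y + h))
        return (dx * dx + dy * dy) ** 0.5
    best = min(objects, key=dist, default=None)
    return best if best is not None and dist(best) <= max_dist else None
```

The tolerance radius compensates for eye tracker inaccuracy: the gaze sample need not fall exactly inside the (possibly small and moving) object box for the selection to succeed.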
Image sequences (e.g. video) gathered by a sensor mounted on an airborne platform (e.g. UAV) are used today to
address many different tasks in various fields of application. Sequences are usually taken to gather information about an area
for planning and assessment purposes, to witness changes, and to monitor activities within that area. Image sequences
are usually stored as they are taken. To properly perform the above tasks in a post-processing step, it is necessary
to find relevant sequences or subsequences in the huge amount of stored data efficiently. It is therefore mandatory to
store the sequences in a way that enables retrieving any relevant frame or subsequence with respect to a geographical
attribute, such as the position of the footprint, or a non-geographical attribute, such as the date and time of acquisition or the
spectral band of the gathered sequence. We have developed a method to store each frame of an image sequence in a spatial
relational database in a way that addresses this issue. We further have developed an interface to that database that allows
us to retrieve frames and subsequences both employing task specific clients and existing exploitation software systems
such as Fraunhofer IOSB's ABUL exploitation station.
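The storage-and-retrieval idea can be illustrated with a plain SQLite table and a rectangle-intersection test on each frame's footprint bounding box; the schema and function names are hypothetical, and the actual system uses a full spatial relational database rather than this simplified approximation:

```python
import sqlite3

# Hypothetical per-frame schema: sequence id, frame number, acquisition time,
# spectral band, and the footprint's bounding box in lon/lat.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE frames (
    sequence_id TEXT, frame_no INTEGER, taken_at TEXT, band TEXT,
    min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL)""")

def index_frame(seq, no, taken_at, band, footprint):
    """Store one frame; 'footprint' = (min_lon, min_lat, max_lon, max_lat)."""
    con.execute("INSERT INTO frames VALUES (?,?,?,?,?,?,?,?)",
                (seq, no, taken_at, band, *footprint))

def query_frames(area, band=None):
    """Return (sequence_id, frame_no) for frames whose footprint bounding box
    intersects the query rectangle, optionally filtered by spectral band."""
    sql = ("SELECT sequence_id, frame_no FROM frames "
           "WHERE max_lon >= ? AND min_lon <= ? AND max_lat >= ? AND min_lat <= ?")
    args = [area[0], area[2], area[1], area[3]]
    if band:
        sql += " AND band = ?"
        args.append(band)
    return con.execute(sql, args).fetchall()
```

A real spatial database would index the footprint geometry itself (not just its bounding box), but the bounding-box test already shows how geographical and non-geographical attributes combine in one query.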
Small and medium-sized UAVs like the German LUNA have long endurance and, in combination with sophisticated
image exploitation algorithms, constitute a very cost-efficient platform for surveillance. At Fraunhofer IOSB, we have
developed the video exploitation system ABUL with the goal of meeting the demands of small and medium-sized
UAVs. Several image exploitation algorithms, such as multi-resolution, super-resolution, image stabilization, geocoded
mosaicking, and stereo-images/3D-models, have been implemented and are used with several UAV systems.
Among these algorithms is the moving target detection with compensation of sensor motion. Moving objects
are of major interest during surveillance missions, but due to movement of the sensor on the UAV and small
object size in the images, it is a challenging task to develop reliable detection algorithms under the constraint of
real-time demands on limited hardware resources. Based on compensation of sensor motion by fast and robust
estimation of geometric transformations between images, independent motion is detected relative to the static
background. From independent motion cues, regions of interest (bounding-boxes) are generated and used as
initial object hypotheses. A novel classification module is introduced to perform an appearance-based analysis of
the hypotheses. Various texture features are extracted and evaluated automatically to achieve a feature
selection that successfully classifies vehicles and people.
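The detection step described above can be sketched in a few lines. This toy version assumes the estimated geometric transformation reduces to a pure pixel translation (the real algorithm robustly estimates full transformations between frames and follows detection with appearance-based classification); images are nested lists of intensities, and the threshold is illustrative:

```python
def detect_independent_motion(prev, curr, shift, thresh=30):
    """Sketch of independent motion detection with sensor-motion compensation:
    warp the previous frame by the estimated sensor motion 'shift' = (dy, dx)
    (a stand-in for the full geometric transform), difference it against the
    current frame, and return pixel positions where the residual exceeds the
    threshold, i.e., motion not explained by the moving sensor."""
    dy, dx = shift
    h, w = len(curr), len(curr[0])
    moving = []
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx          # corresponding pixel in prev
            if 0 <= sy < h and 0 <= sx < w:
                if abs(curr[y][x] - prev[sy][sx]) > thresh:
                    moving.append((x, y))    # independent motion cue
    return moving
```

In the full pipeline, the flagged pixels would be clustered into bounding boxes serving as initial object hypotheses for the classification module.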
UAVs have growing importance for reconnaissance and surveillance. Due to improved technical capabilities, even small
UAVs have an endurance of about six hours, but carry less sophisticated sensors because of strict weight limitations. This puts a
high strain and workload on the small teams usually deployed with such systems. To lessen the strain for photo
interpreters and to improve the capability of such systems, we have developed and integrated automatic image
exploitation algorithms. An important aspect is the detection of moving objects to give the photo interpreter (PI) hints where
such objects are. Mosaicking of imagery helps to gain better oversight of the scene. By computing stereo mosaics from
monocular video data, 3D models can also be derived from tactical UAV data in a further processing step. A special
means of gaining oversight is to use multi-temporal and multifocal images from the platform's video sensors with different
resolutions and to fuse them into one image. This results in good situation awareness of the scene with a lightweight
sensor platform and a standard video link.
For surveillance and reconnaissance tasks, small UAVs are of growing importance. These UAVs have an endurance of
several hours, but a small payload of only a few kilograms. As a consequence, lightweight sensors and cameras have to
be used without a mechanically stabilized high-precision sensor platform, which would exceed the payload and cost constraints.
An example of such a system is the German UAV LUNA with optical and IR sensors on board. For such platforms we
have developed image exploitation algorithms. The algorithms comprise mosaicking, stabilization, image enhancement, video-based
moving target indication, and stereo-image generation. Other products are large geocoded image mosaics, stereo
mosaics, and 3D model generation. For the test and assessment of these algorithms, the experimental system ABUL has
been developed, in which the algorithms are integrated. The ABUL system is used for tests and assessment by military photo interpreters.
The miniature SAR system MiSAR has been developed by EADS Germany for lightweight UAVs like the LUNA system.
MiSAR adds to these tactical UAV systems the all-weather reconnaissance capability that has been missing until
now. Unlike other SAR sensors, which produce large strip maps at update rates of several seconds, MiSAR generates
sequences of SAR images with an approximately 1 Hz frame rate.
Photo interpreters (PIs) of tactical drones, so far mainly experienced with visual interpretation, are not used to SAR images,
especially not to the characteristics of SAR image sequences. Hence, they should be supported to improve their ability to
carry out their task with a new, demanding sensor system. We have therefore analyzed and discussed with military PIs in
which tasks MiSAR can be used and how the PIs can be supported by special algorithms.
We have developed image processing and exploitation algorithms for such SAR image sequences. A main component is the
generation of image sequence mosaics to gain more oversight. This mosaicking has the advantage that non-straight/linear
flight paths and varying squint angles can also be processed. Another component is a screening component for man-made
objects to mark regions of interest in the image sequences. We use a classification-based approach, which can be
easily adapted to new sensors and scenes. These algorithms are integrated into an image exploitation system to give
image interpreters better oversight and orientation and to help them detect relevant objects,
especially in long-endurance reconnaissance missions.
This contribution describes the results of a collaboration whose objective was to technically validate an assessment approach for automatic target recognition (ATR) components. The approach is intended to become a standard for component specification and acceptance tests during development and procurement and includes the provision of appropriate tools and data.
The collaboration was coordinated by the German Federal Office for Defense Technology and Procurement (BWB). Partners besides the BWB and the Assessment group of Fraunhofer IITB were ATR development groups of EADS Military Aircraft, EADS Dornier, and Fraunhofer IITB.
The ATR development group of IITB contributed ATR results and developer's expertise to the collaboration while the industrial partners contributed ATR results and their expertise both from the developer's and the system integrator's point of view. The assessment group's responsibility was to provide task-relevant data and assessment tools, to carry out performance analyses and to document major milestones.
The result of the collaboration is twofold: the validation of the assessment approach by all partners, and two approved benchmarks for specific military target detection tasks in IR and SAR images. The tasks are defined by parameters including sensor, viewing geometries, targets, background etc. The benchmarks contain IR and SAR sensor data, respectively. Truth data and assessment tools are available for performance measurement and analysis. The datasets are split into training data for ATR optimization and test data exclusively used for performance analyses during acceptance tests. Training data and assessment tools are available for ATR developers upon request.
The work reported in this contribution was supported by the German Federal Office for Defense Technology and Procurement (BWB), EADS Dornier, and EADS Military Aircraft.
An international multisensor measurement campaign called "MUSTAFA" yielded many infrared image sequences of differently camouflaged targets. The image sequences were acquired by a helicopter sensor
platform approaching the targets. The effectiveness of the various camouflage methods still has to be evaluated. Apart from observer experiments, FGAN/FOM and IITB pursue an ATR (Automatic Target Recognition)-based method for the automatic evaluation of the camouflage variants. The ATR approach basically consists of the
detection component of an ATR for reconnaissance purposes in forward-looking infrared (FLIR) image sequences. Given some flight and sensor parameters, the algorithm reports detection hypotheses together with a measure of confidence and the detection range for each hypothesis. Proceeding on the assumption that better camouflage yields later automatic detection of the corresponding target in approach image sequences, the detection range output of the algorithm could be an additional criterion for camouflage evaluation. The paper presents some aspects of the reconnaissance detection algorithm, detection ranges for exemplary image sequences of the MUSTAFA data set, and future options, e.g., real-time operation in the sensor platform during the measurement campaign.
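The detection-range criterion can be made concrete with a small sketch: given per-frame sensor-to-target ranges and detector confidences for one approach sequence, report the range at the first sufficiently confident detection. The data layout and confidence threshold are hypothetical, not taken from the actual algorithm:

```python
def detection_range(frames, min_confidence=0.5):
    """Detection range for one target approach: the sensor-to-target distance
    at the first frame where the detector reports the target with sufficient
    confidence (None if it is never detected). 'frames' is a list of
    (range_m, confidence) pairs ordered by decreasing range, i.e., approach
    order. Under the assumption that better camouflage delays detection,
    a smaller detection range indicates more effective camouflage."""
    for rng, conf in frames:
        if conf >= min_confidence:
            return rng
    return None
```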
It is well known that background characteristics have an impact on target signature characteristics. There are many types of backgrounds that are relevant for military application purposes, e.g., wood, grass, urban, or water areas. Current algorithms for automatic target detection and recognition (ATR) usually do not distinguish between these types of background; at most they have some sort of adaptive behavior. An important first step for our approaches is the automatic geo-coding of the images. An accurate geo-reference is necessary for using a GIS to define Regions of Expectation (ROEs), i.e., image background regions with geographical semantics and known signature characteristics, in the image and for fusing the (multiple) sensor data. These ROEs could be road surfaces, forest areas or forest edge areas, water areas, and others. The knowledge about the background characteristics allows the development of a method base of dedicated algorithms. According to the sensor and the defined ROEs, the most suitable algorithms can be selected from the method base and applied during operation. The detection and recognition results of the various algorithms can be fused thanks to the registered sensor data.
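The idea of a method base indexed by ROE type can be sketched as a simple lookup from geo-coded background region to dedicated detector; the algorithm names and ROE labels below are purely illustrative, not the actual method base:

```python
# Hypothetical method base: each background ROE type maps to the detection
# algorithm best suited to its known signature characteristics.
METHOD_BASE = {
    "road": "vehicle_gradient_detector",
    "forest_edge": "thermal_contrast_detector",
    "water": "wake_detector",
}

def select_algorithms(roes, default="generic_mmo_detector"):
    """For each geo-coded ROE found in the image, pick the dedicated
    algorithm from the method base, falling back to a generic detector
    for background types without a specialized entry."""
    return {roe: METHOD_BASE.get(roe, default) for roe in roes}
```

Per-ROE results produced by the selected algorithms could then be fused, since the registered sensor data places them in a common geographic frame.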
Up to now, most approaches to target and background characterization (and exploitation) concentrate solely on the information given by pixels. In many cases this is a complex and unprofitable task. During the development of automatic exploitation algorithms, the main goal is the optimization of certain performance parameters. These parameters are measured during test runs while applying one algorithm with one parameter set to images that consist of image domains with very different characteristics (targets and various types of background clutter). Model-based geocoding and registration approaches provide means for utilizing the information stored in GIS (Geographical Information Systems). The geographical information stored in the various GIS layers can define ROEs (Regions of Expectation) and may allow for dedicated algorithm parametrization and development. ROI (Region of Interest) detection algorithms (in most cases MMO (Man-Made Object) detection) use implicit target and/or background models. The ROI detection algorithms utilize gradient direction models that have to be matched against transformed image domain data. In most cases, simple threshold calculations on the match results discriminate target object signatures from the background. The geocoding approaches extract line-like structures (street signatures) from the image domain and match the graph constellation against a vector model extracted from a GIS database. Apart from geocoding, the algorithms can also be used for image-to-image registration (multi-sensor and data fusion) and may be used for the creation and validation of geographical maps.