Capturing vital signs, specifically heart rate and oxygen saturation, is essential in care situations. Clinical pulse oximetry solutions are contact-based, using clips or otherwise fixed sensor units that can have an undesired impact on the patient. A typical example is pre-term infants in neonatal care, who require permanent monitoring and have very fragile skin. The staff must therefore change the sensor location regularly to avoid skin damage. To improve patient comfort and reduce care effort, a feasibility study of a camera-based passive optical method for contactless pulse oximetry from a distance is performed. In contrast to most existing research on contactless pulse oximetry, a task-optimized multi-spectral sensor unit is proposed instead of a standard RGB camera. First, this allows avoiding the green spectral range widely used for distant heart rate measurement, which is unsuitable for pulse oximetry because the spectral extinction coefficients of saturated oxy-hemoglobin and non-saturated hemoglobin are nearly equal there. Second, it better addresses the challenge of the worse signal-to-noise ratio compared to contact-based or active measurement, e.g., caused by background illumination. Signal noise from background illumination is addressed in several ways. The key part is an automated reference measurement of the background illumination, enabled by automated patient localization in the acquired images, i.e., the extraction of skin and background regions with a CNN-based detector. Due to the custom spectral ranges, the detector is trained and optimized for this specific setup. Altogether, by allowing contactless measurement, the studied concept promises to improve the care of patients for whom skin contact has negative effects.
Monitoring of the heart rhythm is the cornerstone of the diagnosis of cardiac arrhythmias. It is done by means of electrocardiography, which relies on electrodes attached to the patient's skin. We present a new system approach based on the so-called vibrocardiogram that allows automatic non-contact registration of the heart rhythm. Because of the contactless principle, the technique offers potential advantages in medical fields such as emergency medicine (burn patients) or premature baby care, where adhesive electrodes are not easily applicable. A laser-based, mobile, contactless vibrometer for on-site diagnostics that works on the principle of laser Doppler vibrometry allows the acquisition of vital functions in the form of a vibrocardiogram. Preliminary clinical studies at the Klinikum Karlsruhe have shown that the region around the carotid artery and the chest region are appropriate for this purpose. However, the challenge is to find a suitable measurement point in these parts of the body, which differs from person to person due to, e.g., physiological properties of the skin. Therefore, we propose a new Microsoft Kinect-based approach. When a suitable measurement area on the appropriate part of the body is detected by processing the Kinect data, the vibrometer is automatically aligned to an initial location within this area. Then, vibrocardiograms at different locations within this area are successively acquired until a sufficient measurement quality is achieved. This optimal location is found by exploiting the autocorrelation function.
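The quality-driven search for a measurement point can be illustrated with a small sketch: a candidate location is accepted once the vibrocardiogram's normalized autocorrelation shows a pronounced peak at a plausible heart-beat lag. This is an illustrative stand-in, not the authors' implementation; the lag window and the acceptance threshold are assumed values.

```python
import math


def acf_quality(signal, min_lag, max_lag):
    """Signal-quality score: the highest normalized autocorrelation
    peak within a lag window covering plausible heart-beat periods."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    var = sum(v * v for v in x)
    if var == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # normalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / var
        best = max(best, r)
    return best


# A clean periodic signal (period 20 samples) scores close to 1;
# an irregular signal scores much lower, prompting the system to
# move on to the next candidate location within the area.
periodic = [math.sin(2 * math.pi * i / 20) for i in range(200)]
score = acf_quality(periodic, 10, 40)
```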
In many camera-based systems, person detection and localization is an important step for safety and security applications
such as search and rescue, reconnaissance, surveillance, or driver assistance. Long-wave infrared (LWIR) imagery promises
to simplify this task because it is less affected by background clutter or illumination changes. In contrast to much related
work, we make no assumptions about the movement of persons or the camera, i.e., persons may stand still and the camera
may move, or any combination thereof. Furthermore, persons may appear at arbitrary near or far distances from the camera,
leading to low-resolution persons at far distances. To address this task, we propose a two-stage system, including a proposal
generation method and a classifier to verify whether the detected proposals really are persons. Instead of using all possible
proposals as with sliding window approaches, we apply Maximally Stable Extremal Regions (MSER) and classify the
detected proposals afterwards with a Convolutional Neural Network (CNN). The MSER algorithm acts as a hot spot
detector when applied to LWIR imagery. Because the body temperature of persons is usually higher than that of the background,
they appear as hot spots in the image. However, the MSER algorithm is unable to distinguish between different kinds of hot
spots. Thus, other LWIR sources such as windows, animals, or vehicles will be detected, too. Still, by applying MSER,
the number of proposals is reduced significantly in comparison to a sliding window approach which allows employing the
high discriminative capabilities of deep neural network classifiers that were recently demonstrated in several applications such
as face recognition or image content classification. We suggest using a CNN as classifier for the detected hot spots and
train it to discriminate between person hot spots and all other hot spots. We specifically design a CNN that is suitable for
the low-resolution person hot spots that are common with LWIR imagery applications and is capable of fast classification.
Evaluation on several different LWIR person detection datasets shows an error rate reduction of up to 80 percent compared
to previous approaches consisting of MSER, local image descriptors and a standard classifier such as an SVM or boosted
decision trees. Further time measurements show that the proposed processing chain is capable of real-time person detection
in LWIR camera streams.
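The two-stage proposal-plus-classification idea can be sketched in a few lines. For brevity, the MSER stage is replaced here by a plain thresholded connected-component search, a much simpler hot-spot detector than MSER, and the CNN stage is omitted; the temperature threshold is an assumed value.

```python
from collections import deque


def hot_spot_proposals(img, thresh):
    """Toy stand-in for the MSER proposal stage: find 4-connected
    components of pixels warmer than `thresh` in a 2D intensity grid
    and return their bounding boxes as (row0, col0, row1, col1)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or img[r][c] <= thresh:
                continue
            # BFS over the connected hot region, growing the box
            q = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while q:
                y, x = q.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and \
                            not seen[ny][nx] and img[ny][nx] > thresh:
                        seen[ny][nx] = True
                        q.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes
```

In the full system, each proposal box would then be cropped, rescaled to the CNN input size, and classified as person or non-person hot spot.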
In this contribution we propose methods for vehicle detection and tracking for the Advanced Driver Assistance
Systems (ADAS) that work under extremely adverse weather conditions. Most of the state-of-the-art vehicle
detection and tracking methods are based either on appearance-based vehicle recognition or on the extraction and
tracking of dedicated image key points. Visibility deterioration due to rain drops and water streaks on the
windshield, swirling spray, and fog lead to a drastic performance reduction or even to a complete failure of these
approaches. In this contribution we propose several methods for coping with these phenomena. In addition to
an extension of the feature-based tracking method, which copes with outliers and temporarily disappearing key
points, we present a detection and tracking method based on searching for vehicle rear lights and whole rear views
in the saturation channel. Utilization of symmetry operators and search space restriction makes it possible to detect and
track vehicles even in pouring rain conditions. Furthermore, we present two applications of the above-described
methods. Estimation of the strength of spray produced by preceding vehicles allows drawing conclusions about
the overall visibility conditions and to adjust the intensity of one's own rear lights. Besides, a restoration of
deteriorated image regions becomes possible.
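A minimal version of the symmetry cue used for rear-view detection could look as follows. This is a toy sketch, not the proposed operator; it assumes the candidate patch comes from the saturation channel with values already scaled to [0, 1].

```python
def symmetry_score(patch):
    """Horizontal-symmetry score in [0, 1] for a 2D patch:
    1 means the patch equals its left-right mirror image (e.g. a
    symmetric pair of rear lights), lower values mean asymmetry."""
    diffs = []
    for row in patch:
        mirrored = row[::-1]
        # absolute per-pixel difference to the mirrored row
        diffs.extend(abs(a - b) for a, b in zip(row, mirrored))
    return 1.0 - sum(diffs) / len(diffs)
```

A detector would slide candidate windows over the saturation channel (restricted to the plausible search space) and keep windows whose score exceeds a threshold.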
Counting people in crowds is a common problem in visual surveillance. Many solutions are designed to count only
fewer than one hundred people. Only a few systems have been tested on large crowds of several hundred people, and
no known counting system has been tested on crowds of several thousand people. Furthermore, none of these
large-scale systems delivers people's positions; they just estimate the number. But having the positions of people
would be a great benefit, since it would enable a human observer to carry out a plausibility check. In addition,
most approaches require video data as input or a scene model. In order to generally solve the problem, these
assumptions must not be made. We propose a system that can count people on single aerial images including
mosaic images generated from video data. No assumptions about crowd density are made, i.e., the system
has to work from low to very high density. The main challenge is the large variety of possible input data. Typical
scenarios would be public events such as demonstrations or open air concerts. Our system uses a model-based
detection of individual humans. This includes the determination of their positions and the total number. In
order to cope with the given challenges we divide our system into three steps: foreground segmentation, person
size determination and person detection. We evaluate our proposed system on a variety of aerial images showing
large crowds with up to several thousand people.
Unveiling unusual or hostile events by observing a multitude of moving persons in a crowd is a challenging task for human
operators, especially when sitting in front of monitor walls for hours. Typically, hostile events are rare. Thus, due to
tiredness and negligence the operator may miss important events. In such situations, an automatic alarming system is
able to support the human operator. The system incorporates a processing chain consisting of (1) people tracking, (2)
event detection, (3) data retrieval, and (4) display of relevant video sequence overlaid by highlighted regions of interest.
In this paper we focus on the event detection stage of the processing chain mentioned above. In our case, the selected
event of interest is the encounter of people. Although based on a rather simple trajectory analysis, this kind of
event is of great practical importance because it paves the way to answering the question "who meets whom, when, and
where". This, in turn, forms the basis for detecting potential situations where, e.g., money, weapons, or drugs are handed
over from one person to another in crowded environments like railway stations, airports, or busy streets and places.
The input to the trajectory analysis comes from a multi-object video-based tracking system developed at IOSB, which is
able to track multiple individuals within a crowd in real time. From this we calculate the inter-distances between all
persons on a frame-to-frame basis. We use a sequence of simple rules based on the individuals' kinematics to detect the
event mentioned above and to output the frame number, the persons' IDs from the tracker, and the pixel coordinates of the
meeting position. Using this information, a data retrieval system may extract the corresponding part of the recorded
video image sequence and finally allows for replaying the selected video clip with a highlighted region of interest to
attract the operator's attention for further visual inspection.
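The rule-based encounter detection can be sketched as follows, assuming tracker output in the form of per-frame positions per person ID; the distance threshold and the minimum encounter duration are hypothetical parameters, not those of the deployed system.

```python
def detect_encounters(tracks, dist_thresh, min_frames):
    """tracks: {person_id: [(x, y) per frame]}, all of equal length.
    Returns (start_frame, id_a, id_b, (x, y)) for every pair that
    stays closer than dist_thresh for at least min_frames frames."""
    ids = sorted(tracks)
    n_frames = len(tracks[ids[0]])
    events = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            run = 0  # length of the current "close" streak
            for f in range(n_frames):
                ax, ay = tracks[a][f]
                bx, by = tracks[b][f]
                dist = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                run = run + 1 if dist < dist_thresh else 0
                if run == min_frames:  # fires once per streak
                    start = f - min_frames + 1
                    mx = (tracks[a][start][0] + tracks[b][start][0]) / 2
                    my = (tracks[a][start][1] + tracks[b][start][1]) / 2
                    events.append((start, a, b, (mx, my)))
    return events
```

Each reported tuple is exactly the information the retrieval stage needs: the frame number, the two tracker IDs, and the pixel coordinates of the meeting position.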
Besides resolution, an important performance parameter of a FIR camera is its sensitivity. It depends on the
sensitivity of the detector array itself and on the characteristics of the optics. The effects of the optics are considerably
driven by the f-number, with high values resulting in decreased sensitivity but allowing for simple
lens design and cheaper production. In this contribution, four different sensor setups with different optics are
evaluated for their impact on the performance of trained pedestrian classifiers.
To overcome the expensive and time-consuming process of ground truth generation for multiple sensors, an
approach for reusing available high-sensitivity reference data is presented. Classifiers are trained on specially
transformed reference data that exhibits the characteristics of sensors with degraded sensitivity.
For the evaluation of the classifiers, data of real-world road scenarios is collected simultaneously with the
target sensors mounted in parallel in a test vehicle, following a detailed script for recording a pedestrian scene
test catalogue. This allows for a direct analysis and comparison of the different sensors and their impact on the
performance of the trained pedestrian classifiers.
Military Operations in Urban Terrain (MOUT) require the capability to perceive and to analyse the situation around a
patrol in order to recognize potential threats. As threats in MOUT scenarios usually arise from humans, one important
task is the robust detection of humans.
Detection of humans in MOUT by image processing systems can be very challenging, e.g., due to complex outdoor
scenes where humans have a weak contrast against the background or are partially occluded. Porikli et al. introduced
covariance descriptors and showed their usefulness for human detection in complex scenes. However, these descriptors
do not lie on a vector space and so well-known machine learning techniques need to be adapted to train covariance
descriptor classifiers. We present a novel approach based on manifold learning that simplifies the classification of
covariance descriptors. In this paper, we apply this approach to detecting humans. We describe our human detection method and evaluate the
detector on benchmark data sets generated from real-world image sequences captured during MOUT exercises.
The performance of perceptive systems depends on a large number of factors. The practical problem during
development is that this dependency is very often not explicitly known. In this contribution we address this
problem and present an approach to evaluate perception performance as a function of, e.g., the quality of the sensor
data. The approach is to use standardized quality metrics for imaging sensors and to relate them to the observed
performance of the environment perception. During our experiments, several imaging setups were analyzed. The
output of each setup is processed offline to track down performance differences with respect to the quality of
sensor data. We show how and to what extent the measurement of the Modulation Transfer Function (MTF)
using standardized tests can be applied to evaluate the performance of imaging systems. The influence of the
MTF on the signal-to-noise ratio can be used to evaluate the performance on a recognition task. We assess the
measured performance by processing the data of different, simultaneously recorded imaging setups for the task
of lane recognition.
A manned platform is to be equipped with a Synthetic Aperture Radar (SAR) based Automatic Target Recognition
(ATR) system for precision targeting. The platform's airworthiness has to be approved including the ATR system, i.e. the
ATR system needs to be qualified appropriately.
Part of the airworthiness approval is a hazard analysis. In general, this is carried out to make sure that the probability of a
fatal error in one hour of flight is 10⁻⁹ or lower.
To date, error probabilities of a SAR-based ATR system, i.e. error probabilities of detection and classification, must be
assumed to lie above 10⁻⁹ per hour. This is one reason why existing rules of engagement demand a "man in the loop", i.e.
to display the result of the ATR system to the pilot.
The ATR system consequently comprises:
a Synthetic Aperture Radar (SAR) sensor,
an Automatic Target Recognition (ATR) SAR image processing unit, and
a Human Machine Interface (HMI) to the pilot.
The aim of the work reported in this contribution was to identify those performance features of the thus defined ATR
system that are relevant to airworthiness approval, and to define the procedures to determine the feature values.
The paper contains the analysis of a reference case of an airworthiness-approved technical system with an error
probability above 10⁻⁹ per hour and a result display to the pilot. In the light of the analysis results, it concludes with an
outlook to the airworthiness approval of the ATR system.
In order to control riots in crowds, it is helpful to get the ringleaders under control. A great support for this task is
the capability to automatically track individual persons in a video sequence taken from a crowd. In this paper we address
the robustness of such a tracking function.
We start from the results of a previous evaluation of tracking methods, where a so-called Covariance-Tracker was found
to be most appropriate. This tracker uses covariance matrices as object descriptors, as proposed by Porikli et al. The set
of all covariance matrices forms a Riemannian manifold, whose geometry is used to compare and update the covariance
descriptors during tracking.
We propose Covariance-Tracker adaptations to improve its performance. Furthermore, we summarize the performance
evaluation results of the original method and compare these with the results of the adapted one. The result is a robust
method for tracking people in crowds which can improve situational awareness.
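The covariance descriptor underlying the tracker can be sketched as follows; a minimal per-pixel feature vector (x, y, intensity) is assumed here, whereas practical trackers use richer features such as gradients and orientations. Comparing descriptors additionally requires a metric on the Riemannian manifold of covariance matrices, which is omitted in this sketch.

```python
def covariance_descriptor(region):
    """Covariance descriptor of an image region in the spirit of
    Porikli et al.: per-pixel feature vectors f = (x, y, intensity)
    are summarized by their sample covariance matrix (3x3 nested
    list), which compactly encodes spatial-appearance structure."""
    feats = [(x, y, v)
             for y, row in enumerate(region)
             for x, v in enumerate(row)]
    n = len(feats)
    d = 3
    mean = [sum(f[k] for f in feats) / n for k in range(d)]
    # unbiased sample covariance over all pixel feature vectors
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j])
                for f in feats) / (n - 1)
            for j in range(d)] for i in range(d)]
    return cov
```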
Proc. SPIE. 7114, Electro-Optical Remote Sensing, Photonic Technologies, and Applications II
If for a given application, candidate tracking methods for humans need to be selected and optimized, then relevant sensor
and truth data as well as appropriate assessment criteria are required. In the work reported in this contribution we used
data recently collected in a riot control scenario. We then processed the sensor data using a set of tracking methods from
literature. Tracking results and truth data allowed us to deduce metrics that reflect the usefulness of a tracking method for
the selected scenario. The software implementation of the assessment criteria, together with sensor and truth data, forms
a benchmark for tracking algorithms in a riot control scenario. It can be used by developers to optimize their tracking
systems and to demonstrate their usefulness for application in a riot control scenario. The performance and robustness of
optimized tracking methods can considerably improve situational awareness in a riot control scenario.
Quick and precise response is essential for riot squads when coping with escalating violence in crowds. Often it is just a single person, known as the leader of the gang, who instigates other people and thus is responsible for excesses. Putting this single person out of action in most cases leads to a de-escalation of the situation. Fostering de-escalation is one of the main tasks of crowd and riot control. To do so, extensive situation awareness is mandatory for the squads and can be promoted by technical means such as video surveillance using sensor networks.
To develop software tools for situation awareness, appropriate input data of well-known quality is needed. Furthermore, the developer must be able to measure algorithm performance and ongoing improvements. Last but not least, after algorithm development has finished and marketing aspects emerge, compliance with specifications must be proven.
This paper describes a multisensor benchmark which serves exactly this purpose. We first define the underlying algorithm task. Then we explain details about the data acquisition and sensor setup, and finally we give some insight into quality measures of multisensor data. Currently, the multisensor benchmark described in this paper is applied to the development of basic algorithms for situational awareness, e.g. the tracking of individuals in a crowd.
Military Operations in Urban Terrain (MOUT) require the capability to perceive and to analyse the situation around a
patrol in order to recognize potential threats. Human operators can only observe a limited field of regard. Sensors can
enhance the field of regard up to 360°, but then the amount of data cannot be fully exploited by a human operator any
more. For this reason an intelligent assistance system is required that monitors the circumference of a moving platform
and warns the driver of a threatening situation. One first processing step of such a system is the recognition of humans.
There are numerous approaches to the detection of humans, mainly from stationary cameras. Moving cameras play a
role in the field of pedestrian protection from a moving road vehicle. There are two principal differences to this latter
application domain. Firstly, the threat in a MOUT scenario potentially arises from humans in the scene. Secondly, not
only the trajectories of individual humans are relevant, but also the motion and the behavior of groups of humans. As a
first step towards an assistance system that automatically warns drivers in a MOUT scenario, we implemented an
approach to the detection of humans in video images and applied it to a relevant set of image sequences taken in a
MOUT scenario. In the paper we assess the obtained results and outline further research activities.
This contribution describes the results of a collaboration the objective of which was to technically validate an assessment approach for automatic target recognition (ATR) components. The approach is intended to become a standard for component specification and acceptance tests during development and procurement and includes the provision of appropriate tools and data.
The collaboration was coordinated by the German Federal Office for Defense Technology and Procurement (BWB). Partners besides the BWB and the group Assessment of Fraunhofer IITB were ATR development groups of EADS Military Aircraft, EADS Dornier and Fraunhofer IITB.
The ATR development group of IITB contributed ATR results and developer's expertise to the collaboration while the industrial partners contributed ATR results and their expertise both from the developer's and the system integrator's point of view. The assessment group's responsibility was to provide task-relevant data and assessment tools, to carry out performance analyses and to document major milestones.
The result of the collaboration is twofold: the validation of the assessment approach by all partners, and two approved benchmarks for specific military target detection tasks in IR and SAR images. The tasks are defined by parameters including sensor, viewing geometries, targets, background etc. The benchmarks contain IR and SAR sensor data, respectively. Truth data and assessment tools are available for performance measurement and analysis. The datasets are split into training data for ATR optimization and test data exclusively used for performance analyses during acceptance tests. Training data and assessment tools are available for ATR developers upon request.
The work reported in this contribution was supported by the German Federal Office for Defense Technology and Procurement (BWB), EADS Dornier, and EADS Military Aircraft.
Performance prediction of computer vision algorithms is of increasing interest whenever robustness to illumination variations, shadows, and different weather conditions has to be ensured. The statistical model presented in this contribution predicts algorithm performance in the presence of noise, image clutter, and perturbations and therefore provides an algorithm-specific measure of the underlying image quality. For the prediction of detection performance, logistic regression with covariates defined by the properties of the vehicle signatures is used. This approach provides an estimate of the probability of a single vehicle signature being detected by a given detection algorithm. To describe the relationship between background clutter and the false alarm rate of the algorithm, a severity measure of the image background is presented. After the construction of the algorithm model, the probability of a vehicle signature being detected and the false alarm rate are estimated on new data. The model is evaluated and compared to the true algorithm performance.
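The logistic-regression part of such a model can be illustrated with a toy single-covariate sketch. The contrast feature, the training data, the learning rate, and the epoch count below are assumptions for illustration, not the covariates or data of the study.

```python
import math


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit P(detected) = sigmoid(w0 + w1 * feature) by plain batch
    gradient descent on the logistic log-likelihood; a stand-in for
    the signature-property covariate model."""
    w0, w1 = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))
            g0 += (p - y) / n          # gradient w.r.t. intercept
            g1 += (p - y) * x / n      # gradient w.r.t. slope
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1


def predict(w, x):
    """Estimated probability that a signature with covariate x
    is detected by the algorithm."""
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))
```

With labels indicating whether each training signature was actually detected, the fitted model yields a per-signature detection probability on new imagery, which is exactly the quantity the abstract's model evaluates.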
For the exploitation of aerial and satellite imagery, human military photo interpreters need support from automatic image analysis components to meet the requirements of large data set analysis under strong time constraints. Extending the approaches of performance analysis for automatic target detection, a concept and an experimental study for the assessment of machine-assisted vehicle detection are presented. This evaluation pursues two goals: first, the extraction of a usability measure in terms of algorithm performance combined with user-oriented parameters; second, the extraction of requirements for the image exploitation process concerning the algorithm performance, the man-machine interface, and the training of the photo interpreters. A performance analysis concept for vehicle detection algorithms is presented, as well as an experimental setup of the whole interactive exploitation process. This setup has been applied in an experiment with more than 100 real images and more than 40 military photo interpreters.
This contribution presents a comprehensive framework for algorithm evaluation. When we speak of evaluation, we have in mind that first the performance of an algorithm is measured and then the measured performance is assessed with regard to a given application. The performance assessment is done by applying an assessment function that uses desired values for the performance measures and weighting factors giving the importance of each measure, thus considering the application-specific requirements. The algorithm evaluation's goal is to verify the specification of an algorithm. This specification is mainly given by the definition of the input data and the expected output data, both of which are determined by the application. Prior to the evaluation process the algorithm specification has to be laid down by analyzing the application in order to deduce its requirements as well as by defining the application relevant data sets. To organize this sequence of preparatory steps and to formalize the accomplishment of the evaluation we have developed a 3-phase approach, consisting of the definition phase, the tuning phase, and the evaluation phase. An extensive software toolbox has been developed to support the evaluation process.
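An assessment function of the kind described, combining desired values with importance weights, can be sketched as follows. This is a minimal illustration under two assumptions: all measures are higher-is-better (lower-is-better measures such as false alarm rates would need to be inverted first), and scores are capped at the desired value.

```python
def assess(measured, desired, weights):
    """Weighted assessment score in [0, 1]: each performance measure
    is scored by how closely it reaches its desired value (capped at
    1.0), then combined using application-specific importance weights."""
    total = sum(weights[k] for k in measured)
    score = 0.0
    for k, value in measured.items():
        ratio = min(value / desired[k], 1.0)  # fraction of target met
        score += weights[k] * ratio
    return score / total
```

For example, if the detection rate fully meets its target while the precision reaches only half of its target, and detection rate is weighted twice as heavily, the overall score is (2·1.0 + 1·0.5)/3.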