Recognizing human body activities from video sequences depends directly on feature extraction for motion analysis: each activity can be represented by characteristic motion features. Therefore, using appropriate features, we can classify different activities. This idea inspires us to formulate activity recognition as a classification problem and to verify its feasibility. This work pursues two main goals. The first is to extract motion and texture features from RGBD sequences. We propose a feature extraction method that computes Gray-Level Co-occurrence Matrices (GLCM) of the dense optical flow pattern and derives the well-known Haralick features from these matrices, measuring meaningful properties such as energy, contrast, homogeneity, entropy, sum average, and correlation, in order to capture local spatial and temporal characteristics of the motion through neighboring optical flow fields (orientation and magnitude). The second is a performance comparison of five different classifiers: Artificial Neural Networks, the Naive Bayes classifier, Random Forest, K-Nearest Neighbors, and Support Vector Machines. Numerical experiments on four well-known public datasets (Gaming Datasets, Cornell Activity Datasets, MSR Daily Activity 3D, and Online RGBD Datasets) verify the effectiveness of these classification algorithms. In the experiments, the classifiers perform differently depending on the computed features and the set of activity classes, and the results demonstrate that all five algorithms achieve satisfactory activity recognition performance.
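As a rough illustration of this feature pipeline, the following sketch quantizes the magnitude and orientation of a dense flow field and derives Haralick-style statistics from their co-occurrence matrices with scikit-image; entropy is computed directly from the matrices since graycoprops does not provide it. Parameter choices such as 16 gray levels and the two pixel offsets are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def flow_glcm_features(flow, levels=16):
    """Haralick-style features of a dense optical flow field (H x W x 2)."""
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ang = np.arctan2(flow[..., 1], flow[..., 0])
    features = []
    for channel in (mag, ang):
        lo, hi = channel.min(), channel.max()
        q = ((channel - lo) / (hi - lo + 1e-9) * (levels - 1)).astype(np.uint8)
        glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                            levels=levels, symmetric=True, normed=True)
        for prop in ("energy", "contrast", "homogeneity", "correlation"):
            features.extend(graycoprops(glcm, prop).ravel())
        # entropy is not offered by graycoprops; compute it from the matrices
        p = glcm.reshape(levels, levels, -1)
        features.extend((-p * np.log2(p + 1e-12)).sum(axis=(0, 1)))
    return np.asarray(features)
```

The resulting vector can then be fed to any of the five classifiers compared above.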
Recently released hyperspectral cameras use large, mosaiced filter patterns to capture different ranges of the light's spectrum in each of the camera's pixels. Spectral information is therefore sparse: it is not fully available at each location. We propose an online method that avoids explicit demosaicing of camera images by fusing raw, unprocessed hyperspectral camera frames inside an ego-centric ground surface map. The map is represented as a multilayer heightmap data structure, whose geometry is estimated by combining a visual odometry system with either dense 3D reconstruction or 3D laser data. We use a publicly available dataset to show that our approach is capable of constructing an accurate hyperspectral representation of the surface surrounding the vehicle. We show that in many cases our approach increases spatial resolution over a demosaicing approach, while providing the same amount of spectral information.
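A schematic sketch of the fusion idea, under assumed interfaces: each raw pixel carries exactly one band (given by the tiled filter pattern), and its value is accumulated into the per-band statistics of the ground cell it projects to. All names, shapes, and the 16-band pattern below are hypothetical, not the paper's data layout.

```python
import numpy as np

def fuse_frame(cells, raw_frame, mosaic, ground_xy, cell_size=0.1, n_bands=16):
    """Accumulate raw mosaic samples into ground-surface cells, skipping
    demosaicing.  raw_frame: HxW raw intensities; mosaic: HxW band index of
    each pixel (the tiled filter pattern); ground_xy: HxWx2 ground-plane
    coordinates per pixel, e.g. from visual odometry plus surface geometry."""
    for (v, u), value in np.ndenumerate(raw_frame):
        key = (int(ground_xy[v, u, 0] // cell_size),
               int(ground_xy[v, u, 1] // cell_size))
        sums, counts = cells.setdefault(
            key, (np.zeros(n_bands), np.zeros(n_bands, dtype=int)))
        band = int(mosaic[v, u])
        sums[band] += value      # per-cell spectrum estimate: sums / counts
        counts[band] += 1
```

Because every cell sees many frames from slightly different viewpoints, each band gets filled in over time without ever interpolating the mosaic.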
Efficient Cochlear Implant (CI) surgery requires prior knowledge of the cochlea's size and its characteristics. This information helps to select suitable implants for different patients. To obtain these measurements, a segmentation method for cochlea medical images is needed. An important pre-processing step for good cochlea segmentation is efficient image registration. The cochlea's small size and complex structure, in addition to the different resolutions and head positions during imaging, pose a major challenge for the automated registration of the different image modalities. In this paper, an Automatic Cochlea Image Registration (ACIR) method for multimodal human cochlea images is proposed. This method is based on using small areas with clear structures from both input images instead of registering the complete images. It uses the Adaptive Stochastic Gradient Descent optimizer (ASGD) and Mattes's Mutual Information metric (MMI) to estimate 3D rigid transform parameters. State-of-the-art medical image registration optimizers published over the last two years are studied and compared quantitatively using the standard Dice Similarity Coefficient (DSC). ACIR requires only 4.86 seconds on average to align cochlea images automatically and to put all the modalities in the same spatial locations without human interference. The source code is based on the tool elastix and is provided for free as a 3D Slicer plugin. Another contribution of this work is a proposed public cochlea standard dataset, which can be downloaded for free from a public XNAT server.
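For reference, the Dice Similarity Coefficient used for the quantitative comparison is the standard overlap measure between two binary segmentations; a minimal NumPy version:

```python
import numpy as np

def dice_similarity(seg_a, seg_b):
    """Standard DSC: 2 |A ∩ B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(seg_a, bool), np.asarray(seg_b, bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * (a & b).sum() / denom
```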
In this work we address the problem of detecting and recognizing transparent objects using depth images from an RGB-D camera. Using this type of sensor usually prohibits the localization of transparent objects, since the structured light pattern of these cameras is not reflected by transparent surfaces; instead, transparent surfaces often appear as undefined values in the resulting images. However, these erroneous sensor readings form characteristic patterns that we exploit in the presented approach. The sensor data is fed into a deep convolutional neural network that is trained to classify and localize drinking glasses. We evaluate our approach with four different types of transparent objects. To the best of our knowledge, no datasets offering depth images of transparent objects exist so far. With this work we aim at closing this gap by providing our data to the public.
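One plausible way to expose the characteristic invalid-value patterns to a network (an assumption, not necessarily the paper's exact encoding) is to feed the depth map together with an explicit mask of undefined readings:

```python
import numpy as np

def depth_to_network_input(depth):
    """Two-channel CNN input from an HxW depth image: normalized depth plus
    a binary mask of undefined readings (0 or NaN treated as undefined)."""
    invalid = ~np.isfinite(depth) | (depth <= 0)
    filled = np.where(invalid, 0.0, depth).astype(np.float32)
    scale = filled.max() or 1.0
    return np.stack([filled / scale, invalid.astype(np.float32)])  # 2 x H x W
```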
The image data that object recognition systems are designed for changes over time. As soon as a new imaging technology is developed or becomes affordable, new algorithms are inspired by it or known algorithms are adapted. Thus, different object recognition algorithms were developed and used on our mobile robot Lisa. In this work we compare the different approaches and investigate how they can be combined to make the best use of 2D and 3D data. The individual approaches as well as their combinations are introduced. Evaluation is performed on a large public dataset and on a dataset acquired during the RoboCup competition.
Plane circular targets are widely used within calibrations of optical sensors through photogrammetric set-ups. Due to this
popularity, their advantages and disadvantages are also well studied in the scientific community. One main disadvantage
occurs when the projected target is not parallel to the image plane. In this geometric constellation, the target has an
elliptic geometry with an offset between its geometric and its projected center. This difference is referred to as ellipse
eccentricity and is a systematic error which, if not treated accordingly, has a negative impact on the overall achievable
accuracy. The magnitude and direction of eccentricity errors are dependent on various factors. The most important one is
the target size: the bigger an ellipse in the image is, the bigger the error will be. Although correction models dealing
with eccentricity have been available for decades, its treatment is mostly seen as a planning task in which the aim is to
choose a target size small enough that the resulting eccentricity error remains negligible. Aside from the fact that
advanced mathematical models are available, and that the influence of this error on camera calibration results is still
not completely investigated, there are various additional reasons why bigger targets cannot or should not be avoided.
One of them is the growing image resolution as a by-product of advancements in sensor development: smaller pixels have
a lower S/N ratio, necessitating more pixels to assure geometric quality. Another scenario might require bigger targets
due to large scale differences, where distant targets should still contain enough information in the image. In general,
bigger ellipses contain more contour pixels and therefore more information. This helps target-detection algorithms
perform better even under non-optimal conditions, such as data from sensors with a high noise level.
In contrast to rather simple measuring situations in a stereo or multi-image mode, the impact of ellipse eccentricity on
image blocks cannot be modeled in a straightforward fashion. Instead, simulations can help to make the impact visible
and to distinguish critical from less critical situations. In particular, this might be of importance for calibrations, as an
undetected influence on the results will affect further projects in which the same camera is used. This paper therefore
aims to point out the influence of ellipse eccentricities on camera calibrations, using two typical calibration bodies: a
planar and a cube-shaped one. In a first step, their relevance and influence on the image measurements, the object
geometry and the camera geometry are shown with numeric examples. Differences and similarities between both
calibration bodies are identified and discussed. In a second step, the practical relevance of a correction is demonstrated
in a real calibration. Finally, a conclusion is drawn, followed by recommendations for handling ellipse eccentricity in
practice.
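A minimal simulation in the spirit of the above, with purely hypothetical numbers (5 cm target radius, 45° tilt, 1 m distance, 1000 px focal length): project a tilted circle through a pinhole camera, fit the resulting conic, and compare its center with the projection of the true circle center.

```python
import numpy as np

def conic_center(pts):
    """Fit a general conic a x^2 + b xy + c y^2 + d x + e y + f = 0 via SVD
    and return the center of the fitted ellipse."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    a, b, c, d, e, f = np.linalg.svd(D)[2][-1]
    return np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])

# Hypothetical setup: circular target of radius r on a plane tilted about
# the x-axis, 1 m in front of a pinhole camera with focal length 1000 px.
r, tilt, depth, focal = 0.05, np.deg2rad(45.0), 1.0, 1000.0
t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
contour3d = np.stack([r * np.cos(t),
                      r * np.sin(t) * np.cos(tilt),
                      depth + r * np.sin(t) * np.sin(tilt)], axis=1)
projected = focal * contour3d[:, :2] / contour3d[:, 2:3]  # exact ellipse
center_true = focal * np.array([0.0, 0.0]) / depth        # projected 3D center
print("eccentricity [px]:", conic_center(projected) - center_true)
```

Scaling r up or down in this sketch reproduces the rule of thumb stated above: the eccentricity grows with the imaged target size.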
Stray light is the part of an image that is formed by misdirected light: an ideal optic would map a point of the scene
onto a point of the image, but with real optics some of the light gets misdirected. This is due to effects like scattering
at edges, Fresnel reflections at optical surfaces, scattering at parts of the housing, scattering from dust and
imperfections on and inside the lenses, and further causes. These effects lead to errors in colour measurements using
spectral radiometers and other systems such as scanners. Stray light further limits the dynamic range that can be
achieved with High-Dynamic-Range (HDR) technologies and can lead to the rejection of cameras due to quality
considerations. It is therefore of interest to measure, quantify and correct these effects. Our work aims at measuring
the stray light point spread function (stray light PSF) of a system composed of a lens and an imaging sensor. In this
paper we present a framework for the evaluation of PSF models that can be used for the correction of stray light. We
investigate if and how our evaluation framework can point out errors of these models and how these errors influence
stray light correction.
An approach for dark-signal correction is presented that uses a model of each pixel's dark signal, which depends on the
sensor's settings (integration time and gain) and its temperature. It is shown how one can improve the outcome of such
a dark-signal correction strategy by using the dark signal of some pixels to compute an estimate of the sensor's
temperature. Experimental results indicate that the dark signal's dependency on temperature and gain is more complex
than assumed in up-to-date dark-signal models. In this paper it is shown how one can cope with this complex behaviour
when estimating the temperature from the dark signal. Experimental results indicate that our method yields better
results than using the temperature measurements of dedicated temperature sensors.
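To make the estimation idea concrete, here is a minimal sketch assuming a simple exponential doubling-law dark-current model; as stated above, the real dependency is more complex, so this model and all constants are illustrative assumptions only, not the paper's model.

```python
import numpy as np

# Assumed per-pixel model: dark current doubles every t_double kelvin and
# scales linearly with integration time and gain.
def dark_signal(temp, t_int, gain, offset, i_ref, temp_ref=293.0, t_double=6.0):
    return gain * (offset + i_ref * t_int * 2.0 ** ((temp - temp_ref) / t_double))

def estimate_temperature(measured, t_int, gain, offset, i_ref,
                         temp_ref=293.0, t_double=6.0):
    """Invert the model on a set of shielded reference pixels and average."""
    current = (np.asarray(measured) / gain - offset) / t_int
    return temp_ref + t_double * np.mean(np.log2(current / i_ref))
```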
We present a novel approach for combining 3D depth and visual information for object class and object instance recognition. Object classes are recognized by first assigning local geometric primitive labels using a CRF, followed by an SVM classification. Object instances are recognized using Hough-transform clustering of SURF features. Both algorithms perform well on publicly available object databases as well as on data acquired with an RGB-D camera. The object instance recognition algorithm was further evaluated during the RoboCup world championship 2012 in Mexico City, where it won first place in the Technical Challenge of the @Home league.
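A generic sketch of the Hough-transform clustering step (not necessarily the authors' exact parameterization): each feature match votes for an object-center hypothesis using the keypoint's scale and orientation, and vote maxima indicate instances.

```python
import numpy as np
from collections import Counter

def hough_cluster(matches, bin_size=25.0, min_votes=5):
    """Each match casts a vote for the object's center in the test image.
    matches: iterable of (test_xy, offset_to_center_in_model,
                          scale_ratio, angle_difference)."""
    votes = Counter()
    for (tx, ty), (ox, oy), scale, angle in matches:
        c, s = np.cos(angle), np.sin(angle)
        cx = tx + scale * (c * ox - s * oy)   # rotate + scale the model offset
        cy = ty + scale * (s * ox + c * oy)
        votes[(int(cx // bin_size), int(cy // bin_size))] += 1
    # bins gathering enough consistent votes are instance hypotheses
    return [(cell, n) for cell, n in votes.items() if n >= min_votes]
```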
We present an approach to generate a 3D model of a building, including semantic annotations, from image series. In
recent years, semantics-based modeling, reconstruction of buildings and building recognition have become more and
more important. Semantic building models carry more information than just the geometry, making them more suitable
for recognition or simulation tasks. The time-consuming generation of such models and annotations makes automation
desirable. Therefore, we present a semiautomatic approach towards semantic model generation. This approach has been
implemented as a plugin for the photo-stitching tool Hugin*. Our approach reduces the interaction with the system to a
minimum. The resulting model contains semantic, geometric and appearance information and is represented in the City
Geography Markup Language (CityGML).
Loop closing is a fundamental part of 3D simultaneous localization and mapping (SLAM) that can greatly enhance
the quality of long-term mapping. It is essential for the creation of globally consistent maps. Conceptually, loop
closing is divided into detection and optimization. Recent approaches depend on a single sensor to recognize
previously visited places in the loop detection stage. In this study, we combine data from multiple sensors, such as
GPS, vision, and laser range data, to enhance detection results in repetitively changing environments that are
not sufficiently captured by a single sensor. We present a fast and robust hierarchical loop detection algorithm
for outdoor robots that achieves a reliable environment representation even if one or more sensors fail.
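A minimal sketch of such a hierarchical scheme, with hypothetical interfaces: a cheap GPS gate prunes candidates first, an appearance check (here cosine similarity of holistic image descriptors) confirms them, and the pipeline degrades gracefully to appearance-only matching when GPS is unavailable.

```python
import numpy as np

def loop_candidates(query_idx, gps, descriptors,
                    gps_radius=10.0, min_separation=100, sim_threshold=0.8):
    """Two-stage loop detection sketch; thresholds are assumed values."""
    candidates = range(max(0, query_idx - min_separation))
    if gps is not None:                     # stage 1: coarse geometric gate
        q = gps[query_idx]
        candidates = [i for i in candidates
                      if np.linalg.norm(gps[i] - q) < gps_radius]
    qd = descriptors[query_idx]
    loops = []
    for i in candidates:                    # stage 2: appearance similarity
        sim = qd @ descriptors[i] / (
            np.linalg.norm(qd) * np.linalg.norm(descriptors[i]) + 1e-12)
        if sim > sim_threshold:
            loops.append((i, sim))
    return sorted(loops, key=lambda t: -t[1])
```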
Model-based approaches to object recognition rely on shape and contours, while appearance-based approaches use information provided by the object's intensity or color. Color histograms as object characteristics are commonly used to solve this task. The RGB color values formed by a camera depend heavily on the image
formation process, especially the illumination involved. Mainly for this reason, color normalization algorithms are applied to estimate the impact of the position and color of the illumination and to eliminate, or at least minimize, their influence on the image appearance. If information about the image acquisition settings is available, another
kind of color normalization is applicable: color calibration. We compare several color normalization procedures to a
colorimetric calibration method proposed by Raymond L. Lee, Jr. By estimating the spectral reflectance of
object surfaces, one obtains a colorimetrically correct image representation. The impact of color normalization on
the recognition rates is explored and set in contrast to the calibration approach. Additionally, our experiments
test several histogram distance measures for histogram-based object recognition. We vary the number of bins, the order of two processing steps, and the dimensionality of the color histograms to determine the most suitable parameter setting for object recognition.
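For illustration, a compact NumPy version of the histogram-based pipeline: a joint RGB histogram with a configurable number of bins, and two commonly used comparison measures (histogram intersection and the chi-square distance).

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized 3D joint histogram of an HxWx3 uint8 color image."""
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist / hist.sum()

def intersection(h1, h2):           # higher value = more similar
    return np.minimum(h1, h2).sum()

def chi_square(h1, h2, eps=1e-12):  # lower value = more similar
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```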
Detection of the papilla region and vessel detection in images of the retina are problems that can be solved with pattern recognition techniques. Topographic images, as provided e.g. by the HRT device, as well as fundus images can be used as sources for the detection. It is of diagnostic importance to separate vessels inside the papilla area from those outside this area; therefore, detection of the papilla is important also for vessel segmentation. In this contribution we present state-of-the-art methods for automatic disk segmentation and compare their results. Vessels detected with matched filters (wavelets, derivatives of the Gaussian, etc.) are shown, as well as vessel segmentation using image morphology. We present our own method for vessel segmentation based on a special matched filter followed by image morphology. In this contribution we argue for a new matched filter that is suited for large vessels in HRT images.
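A simplified matched filter in this spirit (not the exact filter proposed here): a second-derivative-of-Gaussian profile across the vessel, constant along it, evaluated over several orientations; the kernel size, vessel length and number of orientations are assumed values.

```python
import numpy as np
from scipy import ndimage

def matched_kernel(sigma=2.0, length=9, angle=0.0):
    """Zero-mean matched filter for dark line-like structures."""
    half = max(int(3 * sigma), length // 2)
    Y, X = np.mgrid[-half:half + 1, -half:half + 1]
    c, s = np.cos(angle), np.sin(angle)
    U = c * X + s * Y                        # coordinate across the vessel
    V = -s * X + c * Y                       # coordinate along the vessel
    k = (U**2 / sigma**2 - 1.0) * np.exp(-U**2 / (2.0 * sigma**2))
    k[np.abs(V) > length / 2.0] = 0.0
    support = np.abs(V) <= length / 2.0
    k[support] -= k[support].mean()          # zero mean over the support
    return k

def vessel_response(image, n_angles=12):
    """Maximum matched-filter response over a set of orientations."""
    return np.max([ndimage.convolve(image.astype(float), matched_kernel(angle=a))
                   for a in np.linspace(0.0, np.pi, n_angles, endpoint=False)],
                  axis=0)
```

Thresholding this response and cleaning it up with morphological operations gives the kind of pipeline described above.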
Coronary vessel abnormalities can lead to insufficient blood circulation in the heart muscle. One way to monitor and detect disturbances of this supply is the continuous observation of the patient's vessel structure over a certain period of time.
In this paper we propose a reliable method for extracting the main vessels and, most notably, also fine ramifications in noisy angiographies with uneven background. We structure the extracted centerlines in a graph, thus obtaining information about the branching depth and the number of visible vessels in the coronary tree. These quantitative measurements serve as indicators to categorize the patient's state of recovery and can be compared to earlier or later disease stages. We evaluated our methods by comparing the results with hand-segmented images.
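As a small illustration of the graph-oriented measurements, endpoints and branch points of a one-pixel-wide centerline image can be counted from the number of 8-neighbors of each skeleton pixel; these points are the natural nodes of the vessel graph.

```python
import numpy as np
from scipy import ndimage

def branch_statistics(skeleton):
    """Endpoints and branch points of a binary, 1-pixel-wide centerline."""
    skel = skeleton.astype(bool)
    # number of 8-neighbors of each skeleton pixel (subtract the pixel itself)
    neighbors = ndimage.convolve(skel.astype(int), np.ones((3, 3), int),
                                 mode="constant") - 1
    endpoints = skel & (neighbors == 1)
    branch_points = skel & (neighbors >= 3)
    return endpoints.sum(), branch_points.sum()
```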
In this contribution we study the number of possible configurations up to discrete rotations of hyperedges built by using image neighborhood hypergraphs. Some results for texture classification are also presented.
In this contribution we present an interface for image processing algorithms that has been made recently available on the Internet (http://nibbler.uni-koblenz.de). First, we show its usefulness compared to some other existing products. After a description of its architecture, its main features are then presented: the particularity of the user management, its image database, its interface, and its original quarantine system. We finally present the result of an evaluation performed by students in image processing.
In computer vision, several views exist on how to solve vision problems. The first general methodology was introduced by Marr, who proposed a data-driven, straightforward analysis strategy. Nowadays the concept of active vision, introduced by Aloimonos et al., is becoming more and more important. In contrast to Marr's philosophy, active vision implies a feedback loop which consists of sensors and active components. In this paper we present a system for the identification of material faults under the surface of a test object. For that purpose the specimen is elastically deformed, then the deformation is made visible using holographic interferometry, and finally flaw parameters are estimated using a model-based approach to analyze the interferograms. This is an underconstrained computer vision problem, which is regularized using a priori knowledge and an active modification of the experimental setup. More mathematically, this vision task can be seen in the context of inverse problem theory. In this contribution we describe the system and point out how it is related to the methodologies named above. To illustrate the functionality of the system, results are shown from nondestructive testing of satellite fuel tanks.
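As a generic illustration of regularizing an underconstrained inverse problem (not the specific a-priori knowledge used in this system), consider the textbook Tikhonov form, which trades data fidelity against a stabilizing prior:

```python
import numpy as np

def tikhonov_solve(A, b, lam=1e-2):
    """Regularized least squares: argmin ||A x - b||^2 + lam ||x||^2,
    solved via the normal equations; lam encodes the prior's strength."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```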
In several applications of interferogram analysis, e.g. automated nondestructive testing, it is necessary to detect irregular interference phase distributions or to compare interference phase distributions with each other. For that purpose it is useful to represent the essential information of phase distributions by characteristic features. We propose features which can be extracted both from interferograms and from phase distributions. For feature extraction we developed new image processing methods that analyze the local structure of gray-level images. The feature extraction is demonstrated on examples of a cantilever beam and a pressure vessel using holographic interferometry. Finally we show the use of the features for defect detection and for the comparison of phase distributions.
A new robust and fast method for non-interactive line segmentation of interferograms is proposed. Fringe contours are represented as a set of polygons using a new technique for contour approximation. The method has been developed for application in interferometry with continuously deforming objects. Its application to real-time holographic interferometry in nondestructive testing is shown.
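For orientation only, a stand-in sketch using OpenCV's standard Douglas-Peucker approximation rather than the new technique proposed here; the input is assumed to be an 8-bit grayscale interferogram.

```python
import cv2

def fringe_polygons(interferogram, epsilon=2.0):
    """Binarize the fringes, then approximate every contour by a polygon."""
    _, binary = cv2.threshold(interferogram, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.approxPolyDP(c, epsilon, True) for c in contours]
```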
Automatic reconstruction of occlusal surfaces of teeth is an application which might become more and more urgent due to the toxicity of amalgam. Modern dental chairside equipment is currently restricted to the production of inlays; the automatic reconstruction of the occlusal surface is presently not possible. For manufacturing an occlusal surface it is necessary to extract features from which destroyed teeth can be reconstructed. In this paper, we demonstrate how intact upper molars can be automatically extracted from dental range and intensity images. After normalization of the 3D location, the sizes of the cusps are detected and the distances between them are calculated. In the presented approach, the detection of the upper molar is based on a knowledge-based segmentation which includes anatomic knowledge. After the segmentation of the tooth of interest, the central fossa is calculated. The normalization of the spatial location is achieved by aligning the detected fossa with a reference axis. After the cusp tips have been located in the range image, the image is resized. The methods have been successfully tested on 60 images, and the results have been compared with a dentist's evaluation on a sample of 20 images. The results will be further used for the automatic production of tooth inlays.
In this paper, we demonstrate how the watershed transform can be applied to series of thermal medical images to compute important features for physiological interpretation. This enables an automatic physiological analysis of neural features that was not possible otherwise. The transform as described in the literature has some minor algorithmic errors and inconsistencies which usually cause little trouble; these problems occur on flat plateaus where no unique watershed can be detected. After a short formal description of the transform, we describe and eliminate these deficiencies and introduce a modified segmentation method which handles such plateaus as intuitively expected. In our particular medical applications, visible differences between the new segmentation and the old one can be noticed. We contrast our results with those obtained by the detection of isothermic regions. Features of the segmented regions are evaluated as a function of time and used for medical and physiological interpretation. An outlook describes current research on sensor fusion of visual and thermal images for medical research.
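A minimal marker-based watershed in the conventional form, as a stand-in for the modified, plateau-aware transform described above; the marker quantile and smoothing scale are assumed parameters.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def segment_thermal(image, marker_quantile=0.2):
    """Conventional watershed on the gradient magnitude, seeded with
    connected components of the coldest pixels as markers."""
    grad = ndimage.gaussian_gradient_magnitude(image.astype(float), sigma=1.0)
    markers, _ = ndimage.label(image < np.quantile(image, marker_quantile))
    return watershed(grad, markers)
```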
We present a new system for knowledge-based image analysis which exploits the benefits of object-oriented programming. The knowledge base is built using a formal language for semantic nets, which have already been successfully applied in an industrial project. The knowledge base is compiled into an image analysis program; time-consuming search in the knowledge base can thus be avoided. The system provides a general interface to pattern analysis techniques, which are included in the generated program as specified by the user. The approach combines the advantages of a compiled program for a special purpose with the flexibility of a general knowledge base tool. The resulting program is used for the reconstruction of the 3D shape of industrial objects using stereo techniques. Images are taken from several viewpoints, and models of 3D objects are then created by an integration of the segmentation data.
A uniform interface for the data exchange between image segmentation and high-level image analysis is presented, termed here an 'iconic-symbolic interface'. The interface is specified as a class in an object-oriented programming environment. The term 'iconic processing' is contrasted with 'iconic data structures'. Symbolic processing is separated from iconic processing by the use of explicitly represented knowledge about the task domain. Many segmentation algorithms may be performed independently of the task domain. It is shown that the same holds for the recovery of depth and surface information by shape from shading or stereo, and for the detection of motion. Several data structures for the representation of segmentation results are compared. The new class 'segmentation object' (i.e., the data structure and the required operations on it) is defined as a superset of the other proposed data structures. It allows for a uniform representation of 2-D and 3-D image segmentation and of motion detection. The interface to symbolic processing is defined by a machine-independent external representation of the segmentation object; compactness is obtained by binary storage. International standardization of low-level image preprocessing and of an image interchange format is in progress, and a future standard could cooperate with the external representation of segmentation objects.
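A purely hypothetical sketch of what such a 'segmentation object' class could look like; the names, fields and the binary serialization are illustrative assumptions, not the proposed interface.

```python
import pickle
from dataclasses import dataclass, field

@dataclass
class SegmentationObject:
    """One container for the results handed from iconic to symbolic
    processing: 2-D regions, 3-D surface patches, and motion data."""
    regions: list = field(default_factory=list)     # 2-D segmentation results
    surfaces: list = field(default_factory=list)    # depth / surface patches
    motion: list = field(default_factory=list)      # detected motion vectors
    attributes: dict = field(default_factory=dict)  # task-independent measures

    def to_bytes(self) -> bytes:
        # compact binary external representation (illustrative only)
        return pickle.dumps(self)
```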