Many approaches have been suggested for automatic pedestrian and car detection to cope with the large variability in size, occlusion, background, and aspect. Current deep-learning-based frameworks rely either on a proposal generation mechanism (e.g., “Faster R-CNN”) or on the inspection of a fixed grid of image regions (e.g., “YOLO”), which are then further processed with deep convolutional neural networks (CNNs). We analyze the discriminative generalized Hough transform (DGHT), which operates on edge images, for pedestrian and car detection. The analysis motivates the use of the DGHT as an efficient proposal generation mechanism, followed by proposal (bounding box) refinement and proposal acceptance or rejection based on a deep CNN. We analyze the different components of our pipeline in detail. Due to the low false negative rate and the small number of candidates produced by the DGHT, as well as the high accuracy of the CNN, we obtain performance competitive with the state of the art in pedestrian and car detection on the IAIR database, with far fewer generated proposals than other proposal-generating algorithms; we are outperformed only by YOLOv2 fine-tuned to IAIR cars. By evaluating on further databases (without retraining or adaptation), we demonstrate the generalization capability of our pipeline.
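The proposal-generation step can be illustrated with a minimal, heavily simplified Generalized Hough voting sketch (all names and data here are hypothetical, not the paper's implementation): every edge point votes for the object centers implied by the model's point offsets, and the strongest accumulator cells become the proposals that a CNN would then refine and accept or reject.

```python
from collections import Counter

def ght_proposals(edge_points, model_offsets, top_k=3):
    """Accumulate Generalized-Hough votes: each edge point votes for the
    object centers implied by every model-point offset; the top-k cells
    become detection proposals (simplified sketch of a DGHT front-end)."""
    votes = Counter()
    for (ex, ey) in edge_points:
        for (dx, dy) in model_offsets:
            votes[(ex - dx, ey - dy)] += 1
    return [cell for cell, _ in votes.most_common(top_k)]

# A square-like model: offsets of contour points relative to the center.
model = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# Edge image containing one such object centered at (5, 5).
edges = [(4, 5), (6, 5), (5, 4), (5, 6)]
proposals = ght_proposals(edges, model, top_k=1)
# The true center collects four coinciding votes and tops the accumulator.
```

In the full pipeline, each surviving accumulator cell would be expanded to a bounding box and passed to the CNN for refinement and accept/reject classification.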
This paper presents a general framework for object localization in medical (and non-medical) images. In particular, we focus on objects of well-defined shape, like epiphyseal regions in hand radiographs, which are localized with a voting framework based on the Generalized Hough Transform (GHT). We propose to combine the GHT voting with a classifier that rates the voting characteristics of the GHT model at individual Hough cells. Specifically, a Random Forest classifier rates whether the model points voting for an object position constitute a regular shape, and this measure is combined with the GHT votes. With this technique, we achieve a success rate of 99.4% for localizing 12 epiphyseal regions of interest in 412 hand radiographs. The mean error is 6.6 pixels on images with a mean resolution of 1185×2006 pixels. Furthermore, we analyze the influence of the radius of the local neighborhood considered when analyzing the voting characteristics of a Hough cell.
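Combining raw votes with the classifier's regularity rating could look like the following sketch (the multiplicative combination and all numbers are illustrative assumptions; the paper's actual Random Forest features and combination rule are not reproduced here):

```python
def best_hough_cell(votes, regularity):
    """Rescore Hough cells by combining the raw GHT vote count with a
    per-cell shape-regularity rating in [0, 1] (a stand-in for the
    Random Forest output); the highest combined score wins.
    Multiplicative combination is an illustrative assumption."""
    return max(votes, key=lambda cell: votes[cell] * regularity.get(cell, 0.0))

# A spurious cell with many irregular votes loses to a regular one:
votes = {(120, 80): 14, (300, 210): 10}
regularity = {(120, 80): 0.2, (300, 210): 0.9}
# Combined scores: 14*0.2 = 2.8 vs. 10*0.9 = 9.0, so (300, 210) is chosen.
```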
Bone age assessment on hand radiographs is a frequent and time-consuming task for determining growth disturbances in the human body. Recently, an automatic processing pipeline combining content-based image retrieval and support vector regression (SVR) has been developed. This approach was evaluated on 1,097 radiographs from the University of Southern California. Discretization of the continuous SVR predictions to age classes was previously done by (i) truncation. In this paper, we apply novel approaches for mapping the continuous SVR output values: (ii) rounding, where 0.5 is added to the values before truncation; (iii) curve, where a linear mapping curve is applied between the age classes; and (iv) age, where artificial age classes are not used at all. We evaluate these methods on the age range of 0–18 years, and on 2–17 years for comparison with the commercial product BoneXpert, which uses an active shape approach. Our methods reach root-mean-square (RMS) errors of 0.80, 0.76, and 0.73 years, respectively, which is slightly behind the performance of BoneXpert.
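The first two discretization schemes are simple enough to sketch directly (minimal Python sketch; the "curve" and "age" mappings are omitted because the abstract does not give their exact form):

```python
import math

def truncate(y):
    """(i) Truncation: drop the fractional part of the SVR prediction."""
    return math.floor(y)

def rounding(y):
    """(ii) Rounding: add 0.5 before truncation, i.e. map to the
    nearest age class."""
    return math.floor(y + 0.5)

# Example: a continuous SVR prediction of 12.7 years is mapped to
# class 12 by truncation but to class 13 by rounding.
```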
In this work, we present a new type of model for object localization, which is well suited for anatomical objects
exhibiting large variability in size, shape and posture, for usage in the discriminative generalized Hough transform
(DGHT). The DGHT combines the generalized Hough transform (GHT) with a discriminative training approach
to automatically obtain robust and efficient models. It has been shown to be a strong tool for object localization
capable of handling a rather large amount of shape variability. For some tasks, however, the variability exhibited
by different occurrences of the target object becomes too large to be represented by a standard DGHT model. To
be able to capture such highly variable objects, several sub-models, representing the modes of variability as seen by
the DGHT, are created automatically and are arranged in a higher dimensional model. The modes of variability
are identified on-the-fly during training in an unsupervised manner. Following the concept of the DGHT, the
sub-models are jointly trained with respect to a minimal localization error employing the discriminative training
approach. The procedure is tested on a dataset of thorax radiographs with the target to localize the clavicles.
Due to different arm positions, the posture and arrangement of the target and surrounding bones differ strongly,
which hampers the training of a good localization model. Employing the new model approach, the localization
rate on unseen test data improves by 13% compared to the standard model.
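A minimal sketch of the multi-sub-model idea (assuming a plain, unweighted GHT vote for brevity; names and data are illustrative): each sub-model represents one mode of variability, produces its own localization hypothesis, and the hypothesis with the strongest vote is returned.

```python
from collections import Counter

def ght_vote(edge_points, model_offsets):
    # Plain GHT voting: return the best cell and its vote count.
    votes = Counter()
    for (ex, ey) in edge_points:
        for (dx, dy) in model_offsets:
            votes[(ex - dx, ey - dy)] += 1
    return votes.most_common(1)[0]

def localize_with_submodels(edge_points, submodels):
    # Each sub-model captures one mode of variability (e.g. one arm
    # position); the strongest hypothesis across sub-models wins.
    hypotheses = [ght_vote(edge_points, m) for m in submodels]
    return max(hypotheses, key=lambda h: h[1])[0]

# Sub-model A matches the square-like object centered at (5, 5);
# sub-model B represents a different shape mode and finds no consensus.
submodel_a = [(-1, 0), (1, 0), (0, -1), (0, 1)]
submodel_b = [(-2, 0), (2, 0)]
edges = [(4, 5), (6, 5), (5, 4), (5, 6)]
```

In the paper the sub-models are additionally trained jointly with discriminative point weights, which this unweighted sketch leaves out.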
An automatic algorithm for training suitable models for the Generalized Hough Transform (GHT) is presented. The applied iterative approach learns the shape of the target object directly from training images and incorporates the variability in pose and scale of the target object exhibited in the images. To make the model more robust and representative for the target object, an individual weight is estimated for each model point using a discriminative approach. These weights are employed in the voting procedure of the GHT, increasing the impact of important points on the localization result. The proposed procedure is extended here with a new error measure and a revised point weight training to enable the generation of models representing several target objects. Common parts of the target objects thereby obtain larger weights, while the model may also contain object-specific model points, if necessary, to be representative for all targets.
The method is applied here to the localization of knee joints in long-leg radiographs. A quantitative comparison of the new approach with the separate localization of the right and left knee showed improved localization precision and performance.
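The effect of the discriminatively trained point weights on the voting can be sketched as follows (the weights and coordinates are illustrative assumptions; how the weights are trained is described in the text above, not in this sketch):

```python
from collections import defaultdict

def weighted_ght(edge_points, weighted_model):
    # GHT voting in which each model point contributes its individual
    # weight instead of a unit vote, so important points dominate.
    votes = defaultdict(float)
    for (ex, ey) in edge_points:
        for (dx, dy), w in weighted_model:
            votes[(ex - dx, ey - dy)] += w
    return max(votes, key=votes.get)

# Two high-weight points support the true center (5, 5); three low-weight
# points happen to agree on a distractor at (9, 9).
model = [((-1, 0), 0.9), ((1, 0), 0.9),
         ((0, -1), 0.1), ((0, 1), 0.1), ((0, 2), 0.1)]
edges = [(4, 5), (6, 5), (9, 8), (9, 10), (9, 11)]
# Unweighted voting would prefer the distractor (3 votes vs. 2);
# the weighted vote picks (5, 5) (score 1.8 vs. 0.3).
```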
Segmentation of organs in medical images can be successfully performed with deformable models. Most approaches
combine a boundary detection step with some smoothness or shape constraint. An objective function
for the model deformation is thus established from two terms: the first one attracts the surface model to the
detected boundaries while the second one keeps the surface smooth or close to expected shapes.
In this work, we assign locally varying boundary detection functions to all parts of the surface model. These
functions combine an edge detector with local image analysis in order to accept or reject possible edge candidates.
The goal is to <i>optimize the discrimination</i> between the wanted and misleading boundaries. We present a method
to <i>automatically learn</i> from a representative set of 3D training images which features are optimal at each position
of the surface model. The basic idea is to simulate the boundary detection for the given 3D images and to select
those features that minimize the distance between the detected position and the desired object boundary.
The approach is experimentally evaluated for the complex task of full-heart segmentation in CT images. A
cyclic cross-evaluation on 25 cardiac CT images shows that the optimized feature training and selection enables
robust, <i>fully automatic heart segmentation</i> with a mean error well below 1 mm. Comparing this approach to
simpler training schemes that use the same basic formalism to accept or reject edges shows the importance of
the discriminative optimization.
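The feature-selection principle described above (simulate the boundary detection on training images and keep the feature that lands closest to the true boundary) can be sketched in a 1-D toy version; the two candidate features below are hypothetical stand-ins operating on intensity profiles sampled along the surface normal:

```python
def simulate_and_select(features, profiles, true_boundaries):
    # For each candidate feature, simulate boundary detection on all
    # training profiles and measure the mean distance to the ground
    # truth; keep the feature with the smallest mean error.
    def mean_error(detect):
        return sum(abs(detect(p) - t)
                   for p, t in zip(profiles, true_boundaries)) / len(profiles)
    return min(features, key=mean_error)

def max_step(profile):
    # Candidate 1: index of the largest absolute intensity step.
    return max(range(1, len(profile)),
               key=lambda i: abs(profile[i] - profile[i - 1]))

def first_step(profile):
    # Candidate 2: index of the first intensity change (easily fooled
    # by a weak misleading edge before the true boundary).
    return next(i for i in range(1, len(profile))
                if profile[i] != profile[i - 1])

# One training profile with a weak edge at index 2 and the true
# boundary (strong edge) at index 4: the simulation rejects first_step.
profiles = [[0, 0, 2, 2, 9, 9]]
true_boundaries = [4]
```

In the paper this selection runs per position of the 3-D surface model, so different surface patches can end up with different locally optimal boundary detectors.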
An automatic procedure for detecting and segmenting anatomical objects in 3-D images is necessary for achieving a high level of automation in many medical applications. Since today's segmentation techniques typically rely on user input for initialization, they do not allow for a fully automatic workflow. In this work, the generalized Hough transform is used for detecting anatomical objects with well-defined shape in 3-D medical images. This well-known technique has frequently been used for object detection in 2-D images and is known to be robust and reliable. However, its computational and memory requirements are generally huge, especially when considering 3-D images and several free transformation parameters. Our approach limits the complexity of the generalized Hough transform to a reasonable amount by (1) using prior knowledge about the object during preprocessing in order to suppress unlikely regions in the image, (2) restricting the flexibility of the applied transformation to scaling and translation only, and (3) using a simple shape model which does not cover any inter-individual shape variability. Despite these limitations, the approach is demonstrated to allow for a coarse 3-D delineation of the femur, vertebra, and heart in a number of experiments. Additionally, it is shown that the quality of the object localization is in nearly all cases sufficient to initialize a successful segmentation using shape-constrained deformable models.