We propose a means of objectively comparing the results of digital image inpainting algorithms by analyzing changes in predicted human attention before and after inpainting is applied. Artifacting is generalized into two categories, in-region and out-region, depending on whether attention changes occur primarily within the edited region or in nearby (contrasting) regions. Human qualitative scores are shown to correlate strongly with numerical scores of in-region and out-region artifacting, as demonstrated by the effectiveness of training supervised classifiers of increasing complexity. Results are shown on two novel human-scored datasets.
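The in-region/out-region distinction can be made concrete as a pair of scores over predicted-attention (saliency) maps. The sketch below is illustrative only, assuming saliency maps are available from some attention model; the function and score names are hypothetical, not the paper's.

```python
import numpy as np

def attention_artifact_scores(sal_before, sal_after, region_mask):
    """Score inpainting artifacts by predicted-attention change.

    sal_before, sal_after: 2-D saliency maps of equal shape, values in [0, 1].
    region_mask: boolean array, True inside the inpainted region.
    Returns (in_region, out_region): mean absolute attention change inside
    the edited region vs. everywhere else. (Illustrative sketch; the
    actual scores in the paper may be defined differently.)
    """
    delta = np.abs(np.asarray(sal_after, float) - np.asarray(sal_before, float))
    in_region = float(delta[region_mask].mean())
    out_region = float(delta[~region_mask].mean())
    return in_region, out_region
```

A large `in_region` value flags artifacts inside the fill; a large `out_region` value flags attention shifts leaking into surrounding regions.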
Quantitative metrics for successful image inpainting currently do not exist, with researchers instead relying upon
qualitative human comparisons to evaluate their methodologies and techniques. In an attempt to rectify this situation, we
propose two new metrics to capture the notions of noticeability and visual intent in order to evaluate inpainting results.
The proposed metrics use a quantitative measure of visual salience based upon a computational model of human visual
attention. We demonstrate how these two metrics repeatably correlate with qualitative opinion in a human observer
study, correctly identify the optimum uses for exemplar-based inpainting (as specified in the original publication), and
match qualitative opinion in published examples.
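Checking that a proposed metric "correlates with qualitative opinion" amounts to computing a correlation between per-image metric scores and mean observer ratings. A minimal sketch, assuming paired score vectors; the function name is a placeholder:

```python
import numpy as np

def metric_human_correlation(metric_scores, human_scores):
    """Pearson correlation between a quantitative inpainting metric and
    mean human-observer scores over a set of images -- the kind of check
    used to validate that a metric tracks qualitative opinion.
    (Illustrative; a rank correlation could be substituted.)"""
    return float(np.corrcoef(metric_scores, human_scores)[0, 1])
```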
We propose a method for efficiently determining qualitative depth maps for multiple monoscopic videos of the same scene
without explicitly solving for stereo or calibrating any of the cameras involved. By tracking a small number of feature points
and determining trajectory correspondence, it is possible to determine correct temporal alignment as well as establish a
similarity metric for fundamental matrices relating each trajectory. Modeling of matrix relations with a weighted digraph
and performing Markov clustering results in a determination of emergent depth layers for feature points. Finally, pixels
are segmented into depth layers based upon motion similarity to feature point trajectories. Initial experimental results are
demonstrated on stereo benchmark and consumer data.
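The clustering step above can be sketched with a basic Markov clustering (MCL) routine: the weighted digraph of fundamental-matrix similarities becomes a column-stochastic matrix, and alternating expansion/inflation steps make depth layers emerge as attractor clusters. This is a generic MCL implementation under stated assumptions, not the paper's exact procedure, and the similarity measure itself is not reproduced.

```python
import numpy as np

def markov_cluster(adjacency, expansion=2, inflation=2.0, iters=50):
    """Basic Markov clustering (MCL) on a weighted (di)graph.

    adjacency: square non-negative weight matrix; here it would encode
    pairwise similarity between fundamental matrices relating feature-point
    trajectories. Returns a list of clusters (sets of node indices).
    """
    m = np.asarray(adjacency, float) + np.eye(len(adjacency))  # self-loops
    m /= m.sum(axis=0, keepdims=True)                 # column-stochastic
    for _ in range(iters):
        m = np.linalg.matrix_power(m, expansion)      # expansion: flow spreads
        m = m ** inflation                            # inflation: strengthen
        m /= m.sum(axis=0, keepdims=True)             # strong flows, renormalize
    clusters = []
    for row in m:                                     # rows of attractors
        members = set(int(i) for i in np.nonzero(row > 1e-6)[0])
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On a graph with two disconnected trajectory groups, the routine recovers the two groups as separate (depth-layer) clusters.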
We have developed an active shape model (ASM)-based segmentation scheme that uses the original Cootes et al. formulation for the underlying mechanics of the ASM but improves the model by fixating selected nodes at specific structural boundaries called transitional landmarks. Transitional landmarks identify the change from one boundary type (such as lung-field/heart) to another (lung-field/diaphragm). This results in a multi-segmented lung-field boundary where each segment correlates to a specific boundary type (lung-field/heart, lung-field/aorta, lung-field/rib-cage, etc.). The node-specified ASM is built using a fixed set of equally spaced feature nodes for each boundary segment. This allows the nodes to learn local appearance models for a specific boundary type, rather than generalizing over multiple boundary types, which results in a marked improvement in boundary accuracy. In contrast, existing lung-field segmentation algorithms based only on ASM simply space the nodes equally along the entire boundary without specification. We have performed extensive experiments using multiple datasets (public and private) and compared the performance of the proposed scheme with other contour-based methods. Overall, the improved accuracy is 3-5% over the standard ASM and, more importantly, it corresponds to increased alignment with salient anatomical structures. Furthermore, the automatically generated lung-field masks lead to the same fROC for lung-nodule detection as hand-drawn lung-field masks. The accurate landmarks can be easily used for detecting other structures in the lung field. Based on the related landmarks (mediastinum-heart transition, heart-diaphragm transition), we have extended the work to heart segmentation.
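The node-placement idea above, a fixed number of equally spaced nodes per boundary segment rather than along the whole contour, can be sketched as per-segment arc-length resampling. This is a minimal illustration under assumed inputs; the function name and interface are hypothetical.

```python
import numpy as np

def place_segment_nodes(segments, nodes_per_segment):
    """Place a fixed number of equally spaced model nodes on each boundary
    segment (delimited by transitional landmarks), instead of spacing nodes
    along the entire lung-field contour.

    segments: list of (N_i, 2) point arrays, one per boundary type
              (e.g. lung-field/heart, lung-field/diaphragm).
    Returns a (num_segments * nodes_per_segment, 2) array of node positions,
    so each node index is tied to one boundary type across training images.
    """
    nodes = []
    for pts in segments:
        pts = np.asarray(pts, float)
        # Cumulative arc length along the segment, starting at 0.
        d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
        targets = np.linspace(0.0, d[-1], nodes_per_segment)
        nodes.append(np.column_stack([np.interp(targets, d, pts[:, 0]),
                                      np.interp(targets, d, pts[:, 1])]))
    return np.vstack(nodes)
```

Because the same node index always lands on the same boundary type, each node's local appearance model is trained on a single tissue transition.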
We present a method to incorporate nonlinear shape prior constraints into segmenting different anatomical structures in medical images. Kernel space density estimation (KSDE) is used to derive the nonlinear shape statistics and enable building a single model for a class of objects with nonlinearly varying shapes. The object contour is coerced by image-based energy into the correct shape sub-distribution (e.g., left or right lung), <i>without</i> the need for model selection. In contrast to an earlier algorithm that uses a local gradient-descent search (susceptible to local minima), we propose an algorithm that iterates between dynamic programming (DP) and shape regularization. DP is capable of finding an optimal contour in the search space that maximizes a cost function related to the difference between the interior and exterior of the object. To enforce the nonlinear shape prior, we propose two shape regularization methods, global and local regularization. Global regularization is applied after each DP search to move the entire shape vector in the shape space in a gradient-descent fashion toward the position of probable shapes learned from training. The regularized shape is used as the starting shape for the next iteration. Local regularization is accomplished by modifying the search space of the DP. The modified search space only allows a certain amount of deformation of the local shape from the starting shape. Both regularization methods ensure consistency between the resulting shape and the training shapes, while still preserving DP's ability to search over a large range and avoid local minima. Our algorithm was applied to two different segmentation tasks for radiographic images: lung field and clavicle segmentation.
Both applications have shown that our method is effective and versatile in segmenting various anatomical structures under prior shape constraints; and it is robust to noise and local minima caused by clutter (e.g., blood vessels) and other similar structures (e.g., ribs). We believe that the proposed algorithm represents a major step in the paradigm shift to object segmentation under nonlinear shape constraints.
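The DP search with a locally constrained search space can be illustrated with a simple left-to-right contour search: each column's node may move at most a few pixels vertically from the previous column, which is a crude stand-in for limiting deformation from the starting shape. This sketch is an assumption-laden simplification of the paper's 2-D contour DP, not its actual formulation, and it minimizes a cost image rather than maximizing an interior/exterior difference.

```python
import numpy as np

def dp_contour(cost, max_step=1):
    """Dynamic-programming search for a left-to-right contour that minimizes
    accumulated cost, with the per-column vertical move limited to max_step
    pixels (a toy version of a locally regularized search space).

    cost: (H, W) array of per-pixel costs.
    Returns an array of W row indices, one contour point per column.
    """
    h, w = cost.shape
    acc = cost.astype(float).copy()
    back = np.zeros((h, w), dtype=int)
    for x in range(1, w):
        for y in range(h):
            lo, hi = max(0, y - max_step), min(h, y + max_step + 1)
            prev = acc[lo:hi, x - 1]          # reachable predecessors only
            k = int(np.argmin(prev))
            acc[y, x] += prev[k]
            back[y, x] = lo + k
    path = np.empty(w, dtype=int)
    path[-1] = int(np.argmin(acc[:, -1]))     # best endpoint, then backtrack
    for x in range(w - 1, 0, -1):
        path[x - 1] = back[path[x], x]
    return path
```

Because the whole constrained space is searched exhaustively, the returned contour is globally optimal within the allowed deformation band, which is what lets DP avoid the local minima a gradient-descent search falls into.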
This paper presents a psychophysical study on the perception of image orientation. Some natural images are extremely difficult even for humans to orient correctly or may not even have a "correct" orientation; the study provides an upper bound for the performance of an automatic system. Discrepant detection rates based only on low-level cues have been reported, ranging from exceptionally high in earlier work to more reasonable in recent work. This study allows us to put the reported results in the correct perspective. In addition, the use of a large, carefully chosen image set that spans the "photo space" (in terms of occasions and subject matter) and extensive interaction with the human observers should reveal cues used by humans at various image resolutions. These can be used to design a robust automatic algorithm for orientation detection.
A collection of 1000 images (mix of professional photos and consumer snapshots) is used in this study. Each image is examined by at least five observers and shown at varying resolutions. Object recognition is expected to be more difficult (impossible for some images) at the lowest resolution and easier as the resolution increases. At each resolution, observers are asked to indicate the image orientation, the level of confidence, and the cues they used to make the decision. This study suggests that for typical images, the upper bound on accuracy is close to 98% when using <i>all</i> available semantic cues from high-resolution images and 84% if only low-level vision features and <i>coarse</i> semantics from thumbnails are used. The study also shows that sky and people are the most useful and reliable among a number of important semantic cues.
Sky is among the semantic object classes frequently seen in photographs and useful for image understanding, processing, and retrieval. We propose a novel hybrid approach to sky detection based on color and texture classification, region extraction, and physics-motivated sky signature validation. Sky can be of many different types: clear blue sky, cloudy/overcast sky, mixed sky, twilight sky, etc. A single model cannot correctly characterize all the various types of sky because of the large differences in physics and appearance associated with different sky types. We have developed a set of physics-motivated sky models to identify clear blue-sky regions and cloudy/overcast sky regions. An exemplar-based approach is used to generate the initial set of candidate sky regions. Another data-derived model is subsequently used to combine the results for different sky types to form a more complete sky map. Extensive testing using more than 3000 (randomly oriented) natural images shows that our comprehensive sky detector is able to accurately recall approximately 96% of all sky regions in the image set, with a precision of about 92%. Assuming correct image orientation, the precision on the same set of images increases to about 96%.
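The candidate-then-validate structure can be sketched in two steps: a crude color test proposes blue-sky pixels, and a physics-motivated check verifies that the region desaturates gradually toward the horizon, as a clear sky does. Both the thresholds and the saturation proxy below are arbitrary assumptions for illustration; the paper's classifiers are trained, not hand-set.

```python
import numpy as np

def candidate_blue_sky_mask(rgb):
    """Very simplified blue-sky pixel test (color only): blue channel
    dominant and bright. A stand-in for the trained color/texture
    classifier; the thresholds are illustrative assumptions."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (b > 120) & (b > g) & (g > r)

def validate_sky_signature(rgb, mask):
    """Physics-motivated validation (simplified): in a clear-sky region,
    saturation should decrease gradually from the top of the frame toward
    the horizon. Uses (blue - red) per row as a saturation proxy and checks
    that it is non-increasing downward, within a small tolerance."""
    proxy = rgb[..., 2].astype(float) - rgb[..., 0].astype(float)
    rows = [proxy[y][mask[y]].mean()
            for y in range(mask.shape[0]) if mask[y].any()]
    return len(rows) >= 2 and all(b <= a + 1.0 for a, b in zip(rows, rows[1:]))
```

A uniformly blue but non-desaturating region (say, a blue wall) passes the color test yet can fail the signature check, which is the point of the validation stage.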
We present a computational approach to main subject detection, which provides a measure of saliency or importance for different regions that are associated with different subjects in an image with unconstrained scene content. It is built primarily upon selected image semantics, with low-level vision features also contributing to the decision. The algorithm consists of region segmentation, perceptual grouping, feature extraction, and probabilistic reasoning. To accommodate the inherent ambiguity in the problem as reflected by the ground truth, we have developed a novel training mechanism for Bayes nets based on fractional frequency counting. Using a set of images spanning the "photo space", experimental results have shown the promise of our approach in that most of the regions that independent observers ranked as the main subject are also labeled as such by our system. In addition, our approach lends itself to performance-scalable configurations within the Bayes net-based framework. Different applications have different degrees of tolerance to performance degradation and speed penalties; computing a full set of features may not be practical for time-critical applications. We have designed the algorithm to run under three configurations, without reorganization or retraining of the network.
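The fractional-frequency-counting idea can be sketched for a single conditional probability table: each training region contributes a soft count equal to its observer-agreement fraction, instead of a hard 0/1 "main subject" label. This is only the counting idea in isolation; the names below are hypothetical and the paper's full Bayes-net training is more involved.

```python
from collections import defaultdict

def fractional_cpt(samples):
    """Estimate P(main subject | feature value) by fractional frequency
    counting: each region contributes its observer-agreement fraction as
    a soft positive count, accommodating ambiguous ground truth.

    samples: iterable of (feature_value, agreement) pairs, agreement in [0, 1]
             (e.g. the fraction of observers marking the region main subject).
    Returns {feature_value: estimated probability}.
    """
    pos, tot = defaultdict(float), defaultdict(float)
    for feat, agreement in samples:
        pos[feat] += agreement   # soft count instead of hard 0/1 label
        tot[feat] += 1.0
    return {f: pos[f] / tot[f] for f in tot}
```

With hard labels, a region half the observers marked would have to be forced to 0 or 1; fractional counting lets the CPT reflect that ambiguity directly.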
Autonomous mobile robots rely on multiple sensors to perform a variety of tasks in a given environment. Different tasks may need different sensors to estimate different subsets of world state. Also, different sensors can cooperate in discovering common subsets of world state. This paper presents a new approach to multimodal sensor fusion using dynamic Bayesian networks and an occupancy grid. The environment in which the robot operates is represented with an occupancy grid. This occupancy grid is asynchronously updated using probabilistic data obtained from multiple sensors and combined using Bayesian networks. Each cell in the occupancy grid stores multiple probability density functions representing combined evidence for the identity, location, and properties of objects in the world. The occupancy grid also contains probabilistic representations for moving objects. Bayes nets allow information from one modality to provide cues for interpreting the output of sensors in other modalities. Establishing correlations or associations between sensor readings or interpretations leads to learning the conditional relationships between them. Thus bottom-up, reflexive, or even accidentally obtained information can provide top-down cues for other sensing strategies. We present early results obtained for a mobile robot navigation task.
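The per-cell fusion step can be illustrated with the standard log-odds Bayesian occupancy update: each sensor reading, from whatever modality, folds its inverse-model probability into the cell's running estimate, which is what makes asynchronous multi-sensor updates straightforward. This is a deliberately minimal sketch; the paper's cells hold full densities over identity, location, and properties, not a single occupancy probability.

```python
import math

def update_cell(prior_p, sensor_p):
    """One Bayesian occupancy-grid cell update in log-odds form.

    prior_p:  the cell's current occupancy probability.
    sensor_p: the probability of occupancy implied by one sensor reading
              (its inverse sensor model); sensors in different modalities
              can call this asynchronously on the same cell.
    Returns the posterior occupancy probability. Assumes a 0.5 map prior.
    """
    logit = lambda p: math.log(p / (1.0 - p))
    post = logit(prior_p) + logit(sensor_p)   # evidence adds in log-odds
    return 1.0 / (1.0 + math.exp(-post))      # back to probability
```

Repeated agreeing readings drive the cell toward certainty, while a conflicting reading from another sensor pulls it back, which is the cooperative behavior the fusion framework relies on.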