We describe how to teach deformable models (snakes) to find object boundaries based on user-specified criteria, and we present a method for evaluating which criteria work best. These methods prove indispensable in abdominal CT images. Further work is needed in heart ultrasound images. The methods apply in any domain with consistent image conditions characterizing object boundaries, for which automated identification is nontrivial, perhaps due to interfering detail. A traditional strongest-edge-seeking snake fails to find an object's boundary when the strongest nearby image edges are not the ones sought. But we show how to instead learn, from training data, the relation between a shape and any image feature, as the probability distribution (PDF) of a function of image and shape. An important but neglected task has always been to select image qualities to guide a model. Because success depends on the relation of objective function (PDF) output to shape correctness, it is evaluated using a sampling of ground truth, a random model of the range of shapes tried during optimization, and a measure of shape closeness. The test results are evaluated for incidence of 'false positives' (scoring better than ground truth) versus incorrectness, and for the objective function's monotonicity with respect to incorrectness. Monotonicity is measured using correlation coefficient and using the newly introduced distance from closest increasing function. Domain-dependent choices must be tested. We analyze several Gaussian models fitting image intensity and perpendicular gradient at the object boundary, as well as the traditional sum of gradient magnitudes. The latter model is found inadequate in our domains; some of the former succeed.
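The two monotonicity measures named in this abstract can be illustrated with a minimal sketch. The abstract does not define "distance from closest increasing function" precisely, so the version below assumes a root-mean-square distance to the least-squares non-decreasing fit, computed with the standard pool-adjacent-violators procedure. The function names and the input convention (`errors` for shape incorrectness, `scores` for objective-function output) are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def closest_increasing(ys):
    """Least-squares non-decreasing fit to ys (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def distance_from_increasing(errors, scores):
    """RMS gap between scores (ordered by error) and their closest
    non-decreasing function. Assumed definition, not taken from the paper."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ys = [scores[i] for i in order]
    fit = closest_increasing(ys)
    return sqrt(sum((y - f) ** 2 for y, f in zip(ys, fit)) / len(ys))
```

Under this assumed definition, an objective function whose output rises steadily with shape incorrectness has distance zero, and larger values indicate more severe violations of monotonicity.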
In this paper we propose and demonstrate a simple method for navigation in a large unstructured environment that contains featureless objects, using 'isolated' landmarks in the navigator's view. The map-maker and the navigator are implemented using an IBM 7575 SCARA robot arm, PIPE (Pipelined Image Processing Engine), and two cameras. The navigational environment consists of a flat plane with spherical objects populated randomly on it. First, the map-maker model observes the environment, and given a starting position and a goal position, it generates a 'custom map' that describes how to get from the starting position to the goal position. The accuracy and the efficiency of the directional instructions are then demonstrated by the navigator by following the commands in the custom map. This is a first step towards our eventual goal, which is to develop a full set of vocabulary that can qualitatively describe the navigational environment to guide the navigator with efficiency and accuracy.
This paper presents a model for flexible extruded objects such as wires, tubes, or grommets, and demonstrates a novel, self-adjusting seven-dimensional Hough transform that derives and analyzes their three-space curved axes from position and surface normal information. The method is purely local and is very cheap to compute. The model considers such objects as piecewise toroidal, and decomposes the seven parameters of a torus into three nested subspaces, the structure of which counteracts the errors implicit in the analysis of objects of great size and/or small curvature. It is the first example of a parameter space structure designed to cluster ill-conditioned hypotheses together so that they can be easily detected and ignored. This work complements existing shape-from-contour approaches for analyzing tori: it uses no edge information, and it does not require the solution of high-degree non-linear equations by iterative techniques. Most of the results, including the conditions for the existence of more than one solution (phantom 'anti-tori'), have been verified using a symbolic mathematical analysis system. This paper presents, in the environment of the IBM ConVEx system, robust results on both synthetic CAD-CAM range data (the hasp of a lock) and actual range data (a knotted piece of coaxial cable), and discusses several system tuning issues.
We describe the research activities of the Exploratory Computer Vision Group at the IBM Thomas J. Watson Research Center; this is a follow-up of the work reported previously.6
The focus of the ongoing work is the development of an experimental vision system for recognition of 3D objects. The thrust of the development is to investigate techniques that may lead to a system that scales with the size of the problem; here, by the size of the problem, we mean the complexity of the scene (the number of objects in the scene) and the number of objects in the database, i.e., the number of objects that the system can recognize.
Fusion is a recurring theme in our research. Examples include the fusion of evidence about different features extracted from the data, the fusion of information obtained at different points in the image, and the fusion of information extracted from high- and low-resolution images. Therefore, rather than focusing on a particular aspect of our work, we present an overview of the work.
In this paper we formalize and implement a model of topological visual navigation in two-dimensional spaces. Unlike much of traditional quantitative visual navigation, the emphasis throughout is on the methods and the efficiency of qualitative visual descriptions of objects and environments, and on the methods and the efficiency of direction-giving by means of visual landmarks. We formalize three domains (the world itself, the map-maker's view of it, and the navigator's experience of it) and the concepts of custom maps and landmarks. We specify, for a simplified navigator (the "level helicopter"), the several ways in which visual landmarks can be chosen, depending on which of several costs (sensor, distance, or communication) should be minimized. We show that paths minimizing one measure can make others arbitrarily complex; the algorithm for selecting the path is based on a form of Dijkstra's algorithm and therefore automatically generates intelligent navigator overshooting and backtracking. We implement such a navigator using an arm-held camera, and detail its basic seek-and-adjust behaviors as it follows visual highways (or departs from them) to reach a goal. Seeking is based on topology and adjusting is based on symmetry; there are essentially no quantitative measures. We describe under what circumstances its environment is visually difficult and perceptively shadowed, and describe how errors in path-following impact landmark selection. Since visual landmark selection and direction-giving are in general NP-complete and rely on the nearly intractable
Conference Committee Involvement (5)
Multimedia Content Access: Algorithms and Systems IV
21 January 2010 | San Jose, California, United States
Multimedia Content Access: Algorithms and Systems III
21 January 2009 | San Jose, California, United States
Multimedia Content Access: Algorithms and Systems II
30 January 2008 | San Jose, California, United States
Multimedia Content Access: Algorithms and Systems
31 January 2007 | San Jose, California, United States
Multimedia Content Analysis, Management, and Retrieval 2006
18 January 2006 | San Jose, California, United States