We report on the development of a software system to recognize and interpret printed music. The overall goal is to scan printed music sheets, analyze and recognize the notes, timing, and written text, and derive all the information necessary to use the computer's MIDI sound system to play the music. This function is primarily useful for musicians who want to digitize printed music for editing purposes. There exist a number of commercial systems that offer such functionality. However, on testing these systems, we were astonished at how weakly their pattern recognition components perform. Although we submitted very clear and rather flawless scanning input, none of these systems was able to, for example, recognize all notes, staff lines, and systems. They all require a high degree of interaction, post-processing, and editing to obtain a decent digital version of the hard-copy material. In this paper we focus on the pattern recognition area. In a first approach we tested more or less standard methods of adaptive thresholding, blob detection, line detection, and corner detection to find the notes, staff lines, and candidate objects subject to OCR. Many of the objects on this type of material can be learned in a training phase. None of the commercial systems we saw offers the option to train special characters or unusual signatures. A second goal in this project is to use a modern software engineering platform. We were interested in how well Java and open-source technologies are suited for pattern recognition and machine vision. The scanning of music served as a case study.
We describe a scanning system developed for the classification and grading of surfaces of wooden tiles. The system uses color imaging sensors to analyse the surfaces of either hard- or softwood material in terms of the texture formed by grain lines (orientation, spatial frequency, and color), various types of colorization, and other defects like knots, heart wood, cracks, holes, etc. The analysis requires two major tracks: the assignment of a tile to its texture class (like A, B, C, 1, 2, 3, Waste), and the detection of defects that decrease the commercial value of the tile (heart wood, knots, etc.). The system was initially developed under the international IMS program (Intelligent Manufacturing Systems) by an industry consortium. During the last two years it has been further developed; several industrial systems have been installed and are presently used in the production of hardwood flooring. The methods implemented reflect some of the latest developments in the field of pattern recognition: genetic feature selection, two-dimensional second-order statistics, special color space transforms, and classification by neural networks. In the industrial scenario we describe, many of the features defining a class cannot be described mathematically. Consequently, a focus was the design of a learning architecture in which prototype texture samples are presented to the system, which then automatically finds the internal representation necessary for classification. The methods used in this approach have wide applicability to problems of inspection, sorting, and optimization of high-value material typically used in the furniture, flooring, and related wood manufacturing industries.
In real-life soccer games it is sometimes very difficult for a referee to decide whether or not a goal has been scored. This occurs, for example, when the ball is caught close to the goal line. From the viewing position of the referee it may then be entirely impossible to decide what fraction of the ball volume was behind the line. In such cases referee decisions will be rather arbitrary, and subject to scrutiny. We are working on the development of an image analysis system that uses online monitoring of the goal area to continuously track the position of the ball. The underlying problems are: the estimation of an approximate position to limit the search space; discounting movements of the players, which cause occlusions; real-time pattern matching, which should deliver coordinate output at close to frame rate; and geometric design criteria to facilitate calibration and scene-dependent adaptation of the system. In this paper we describe preliminary results of our work, which will be demonstrated using a lab mock-up of the problem. We pinpoint present problems and a design route towards implementing a real-world system.
Proc. SPIE. 4572, Intelligent Robots and Computer Vision XX: Algorithms, Techniques, and Active Vision
KEYWORDS: Detection and tracking algorithms, Data modeling, Computing systems, 3D modeling, Feature extraction, Image registration, Computer vision technology, Machine vision, Systems modeling, 3D image processing
Pattern matching is an important task in industrial applications of computer vision. A variety of different approaches to the pattern matching problem have been proposed in the literature. In this paper we provide an experimental evaluation of three different matching algorithms, namely a feature consensus matching algorithm, an interpretation-tree-based algorithm, and a commercially available pattern matching tool. The systems were chosen to cover a range of matching techniques as well as a large field of possible applications. Both artificially created test images and image sequences of real-world matching problems were applied to indicate the range of feasible applications for each of the three matching systems. The algorithms are investigated with respect to matching speed, accuracy, and robustness.
We describe an application of color-based pattern matching, where a real-time vision system needs to detect and exactly localize textile patterns woven into carpet flooring material. These patterns are distributed on a large web in a periodic fashion. The task to be solved is the recognition of these patterns by matching them with stored prototypes, computing the exact location, and using this information to guide a cutting machine to produce perfect replicas of the desired tiles. The pattern matching part is challenging because of the presence of distortion, scaling, and rotation of the 2D patterns, and rather high demands on the localization accuracy. Also, the task needs to be solved under real-time constraints. We describe the building blocks used in our system. These are color-based segmentation of the patterns to achieve a 2D representation in a graph-like manner, followed by graph-based matching. This block solves the graph-isomorphism problem in real time, tolerating distortions, additions, deletions, rotation, translation, and scale variations between the trained and tested versions of the patterns. We demonstrate the concept with example images and matching results.
In this paper a new hierarchical structure for fast, robust geometry-based pattern matching is proposed. As opposed to many pattern matching systems reported in the literature we use a structure comprising a number of alternating feature processing and constraint layers. Feature extraction accuracy ranges from coarse at the bottom of the structure to a fine level at the top. A 2D quasi-affine matching system based on the proposed structure has been implemented. Experiments show the reduction in the amount of image data being processed in every layer of the structure as a consequence of applying constraints to the data between two adjacent feature extraction layers. The structure is able to utilize scalable feature extraction algorithms as well as the incorporation of a priori knowledge into the feature extraction.
We apply a model of texture segmentation using multiple spatially and spectrally localized filters, known as Gabor filters, to the analysis of texture and defect regions found on wooden boards. Specifically, we present a method to find an optimal set of parameters for a given 2D object detection method. The method uses banks of Gabor filters to limit the range of spatial frequencies, where mutually distinct textures differ significantly in their dominant characterizing frequencies. By encoding images into multiple narrow spatial-frequency and orientation channels, a local classification of texture regions can be achieved. Unlike other methods applying Gabor filters, we do not use a full Gabor transform, but use feature selection techniques to maximize discrimination. The selection method uses a genetic algorithm to optimize various parameters of the system, including Gabor weights and the parameters of morphological pre-processing. We demonstrate the applicability of the method to the task of classifying wooden textures, and report experimental results using the proposed method.
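The channel-encoding idea above can be illustrated compactly. The sketch below is a minimal illustration, not the system described in the abstract; the function names, kernel size, and parameter values are all assumptions. It builds a small bank of Gabor kernels and reduces each filtered image to one energy value per (frequency, orientation) channel:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(ksize, sigma, theta, freq):
    """Real part of a 2D Gabor kernel: a Gaussian envelope modulating
    a cosine wave of the given spatial frequency and orientation."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the wave
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * xr)

def gabor_features(image, freqs=(0.1, 0.2),
                   thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Correlate the image with a small Gabor bank and return one
    energy value per (frequency, orientation) channel."""
    feats = []
    for f in freqs:
        for t in thetas:
            k = gabor_kernel(15, sigma=4.0, theta=t, freq=f)
            # 'valid' correlation via sliding windows (slow but dependency-free)
            windows = sliding_window_view(image, k.shape)
            response = np.einsum('ijkl,kl->ij', windows, k)
            feats.append(float(np.mean(response**2)))  # channel energy
    return np.array(feats)
```

A texture whose dominant frequency and orientation line up with a channel yields high energy in that channel; a feature selection step would then keep only the few most discriminative channels instead of the full bank.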
One approach to stereo matching is to use different local features to find correspondences. The selection of an optimum feature set is the subject of this paper. An operational software tool based on the principle of comparing feature vectors is used for stereo matching. A relatively large set of different local features is searched for optimum combinations of 6-10 of them. This is done by a genetic process that uses an intrinsic quality criterion to evaluate the correctness of each individual match. The convergence of the genetic feature selection process is demonstrated on a real stereo pair of a tunnel surface. Four areas were used for individual optimization. After several hundred generations for each of the areas, it is shown that the identified feature sets result in a considerably better stereo matching result than the currently used features, which were the result of an initial manual choice. The experiments described in this paper use a `super-set' of 145 features for every pixel, which are created by filtering the image with convolution kernels (averaging, Gaussian filters, bandpass, highpass), median filters, and Gabor kernels. From these 145 filters, the genetic feature selection process selects an optimal set of operators. Using the selected filters results in a 15% improvement in matching accuracy and robustness.
This paper deals with the recovery of a scene from a pair of images, where each image is acquired from a different viewpoint. The central problem is the identification of corresponding points in all views. We use the feature-based approach to find corresponding points. Various types of features have been used previously, where Gabor features showed significant advantages in terms of accuracy and the complexity/accuracy trade-off. The accuracy is measured as the rate of correctly associated pixels. The accuracy measures are found by comparing the disparity maps produced by the matching program with the correct disparities. These correct disparities must be known, and are typically produced by expensive photogrammetric techniques. In this paper we show a method of gauging the performance of a stereo matcher without the necessity of such a reference disparity data set. We show that statistics on the back-matching distances can be used instead. These are a by-product of the matching process. This opens the door to extensive testing and optimization, since we no longer have to rely on the existence of reference disparities.
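The back-matching idea can be sketched as follows: match each left-image feature into the right image, match the result back into the left image, and record how far the round trip lands from the starting point. This is a hedged toy version operating on lists of feature vectors; the names and the nearest-neighbour matcher are assumptions, not the paper's implementation:

```python
import numpy as np

def nearest_match(query, candidates):
    """Index of the candidate feature vector closest to the query."""
    return int(np.argmin(np.linalg.norm(candidates - query, axis=1)))

def back_matching_distances(left_feats, right_feats):
    """For each left feature i: find its best match j in the right image,
    then match right feature j back into the left image. The distance
    between i and the returned index is the back-matching distance;
    zero means the correspondence is left-right consistent."""
    dists = []
    for i, f in enumerate(left_feats):
        j = nearest_match(f, right_feats)                    # left -> right
        i_back = nearest_match(right_feats[j], left_feats)   # right -> left
        dists.append(abs(i - i_back))
    return np.array(dists)
```

Statistics over these distances (for example the fraction of zeros, or their mean) then serve as a quality score for the matcher that requires no reference disparities.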
This paper deals with the recovery of a scene from a pair of images, where each image is acquired from a different viewpoint. The central problem is the identification of corresponding points in all views. We use the feature-based approach to find corresponding points. Various types of features have been used previously, where Gabor features showed significant advantages in terms of accuracy and the complexity/accuracy trade-off. The accuracy is measured as the rate of correctly associated pixels. The matching process typically results in a certain number of ambiguous positions, where the best match found is not the desired match. The main contribution of this paper lies in the application of a genetic algorithm for feature selection. This method uses the previously illustrated fact that the amount of ambiguity in the matching process can be quantitatively measured via statistics on the back-matching distances. With this method, the quality of a matching result can be measured without reference disparity data (or ground truth). The fitness function required for the application of genetic feature optimization is defined using these back-matching statistics. The output of the genetic algorithm is an improved feature set, which contains fewer features than the initial set, but yields substantially improved accuracy. We show that the accuracy of the matching result can be much improved by our genetic optimization approach, and we describe the experiments illustrating the results.
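A minimal sketch of genetic feature selection over binary masks may clarify the mechanism. The fitness function here is a placeholder; in the setting of this abstract it would be derived from back-matching statistics of a stereo run with the candidate feature subset. All names and GA parameters are assumptions:

```python
import random

def genetic_select(n_features, fitness, pop_size=20, generations=50,
                   p_mut=0.05, seed=0):
    """Tiny GA over binary feature masks, maximizing `fitness(mask)`.
    Uses truncation selection (top half survives), one-point crossover,
    and per-gene bit-flip mutation on the offspring."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = [not g if rng.random() < p_mut else g
                     for g in a[:cut] + b[cut:]]     # bit-flip mutation
            children.append(child)
        pop = parents + children                     # elitist replacement
    return max(pop, key=fitness)
```

Carrying the parents unchanged into the next generation (elitism) guarantees that the best mask found so far is never lost, which matters when each fitness evaluation is an expensive stereo run.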
We consider two problems: first, the problem of detection of objects in images of 3D planetary terrain; second, the task of finding corresponding points for stereo matching of this type of images. We propose an approach that is simultaneously applicable to both problem areas. The approach uses a bank of filters based on different 2D Gabor functions. By detection we mean locating multiple classes of targets with distortions present and in a cluttered background. It is desirable to minimize false alarms due to clutter, image noise, and the presence of other objects. In the scenario of stereo matching, the pixel location where we search for the corresponding point is the target, while all ambiguous matches are non-targets. In this work, we use Gabor filter banks in two versions. First, for fast detection of targets, the single filter outputs of the bank are fused by linear combination. Second, for stereo matching, the outputs of the filters form a feature vector used to find the best match. We refer to both types of filters as a macro Gabor filter. In the linear combination case, the filter bank forms a single filter. This filter is correlated with an input image, followed by local maximum detection and thresholding to yield the finally detected targets. The new aspects are: combining real and imaginary parts of GFs into one filter using centered and off-center GFs, separately optimizing the fusion coefficients of the GFs by controlling the shape of the correlation outputs of each filter alone, and the application to two different scenarios.
This paper deals with the recovery of a scene from a pair of images, where each image is acquired from a different viewpoint. The central problem is the identification of corresponding points in all views. Basically two approaches have evolved: area-based methods, which employ local graylevel correlation techniques; and feature-based methods, which use preprocessing steps to extract local feature vectors and match these entities. Previous work has shown that feature-based methods have advantages both in terms of computational complexity and accuracy. We extend these comparative studies, which had compared both philosophies, to a new type of feature extraction technique. This technique handles the correspondence problem by matching two sets of dense feature vectors, generated by Gabor filters. Gabor filters have been used previously for recognition of blob-type targets and texture classification. We show how the two techniques can be used as two independent sources to derive feature vectors. Consequently, fusion of the two sources improves the accuracy of correspondence detection.
Planetary space exploration by unmanned missions strongly relies on automatic navigation methods. Computer vision has been recognized as a key to the feasibility of robust navigation, landing site identification and hazard avoidance. We present a scenario that uses computer vision methods for the early identification of landing spots, emphasizing the phase between ten kilometers from ground and the identification of the lander position relative to the selected landing site. The key element is a robust matching procedure between the elevation model (and imagery) acquired during orbit, and ground features observed during approach to the desired landing site. We describe how (1) preselection of characteristic landmarks reduces the computational efforts, and (2) a hierarchical data structure (pyramid) on graylevels and elevation models can be successfully combined to achieve a robust landing and navigation system. The behavior of such a system is demonstrated by simulation experiments carried out in a laboratory mock-up. This paper follows up previous work we have performed for the Mars mission scenario, and shows relevant changes that emerge in the Moon mission case of the European Space Agency.
This paper deals with stereo matching, which is reformulated as a statistical pattern recognition problem. In stereo, the computation of correspondences of image points in the right and left image is viewed as a two-class pattern recognition problem. The two matching left-right points are said to constitute class 1 (matching), and the points in the neighborhood of these points form class 2 (non-matching). We have argued before that matching can be drastically improved by using several features rather than just graylevels (usually called area-based matching) or edges (usually called edge-based matching). Based on this formulation of matching as a pattern recognition problem, well-known theories to optimize feature extraction and feature selection should be applicable to stereo as well. In the paper we show the results of experiments that support the statistical framework for stereo, and how the performance of a stereo system can be improved by taking into account the findings of statistical pattern recognition.
Developers of machine vision systems for industrial applications are frequently exposed to the problem of proving to their customers that specified performance measures are met. A typical example would be the rate of correct classification in defect detection machines, which usually will be in the range of 95-100%. We call such machines near perfect. In practice this figure is stated for the complete inspection decision, which in general is based on a number of subdecisions made by the machine. An example would be surface inspection in industrial production, where a workpiece will be rejected if one or several defects are detected. Let us assume that the probability of false classification for a single defect is p1. In the case where several defects appear on the surface, every defect contributes to the final decision, with the probability of a wrong decision being p2. It would appear logical that p2 is larger than p1, because the more defects are found on the surface, the more likely the system is to make a wrong decision (the p1s for the single defects would add up). In this paper we show that, although it seems paradoxical, the reverse is true. We show that with estimates of p1, the joint decision can be optimized such that the actual error rate of the defect detection machine is less than p1. We also give practical recommendations on how to tune the pattern recognizers to achieve optimal performance.
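The abstract does not detail its optimization, but a generic, hedged illustration of how a combined decision can err less often than each of its sub-decisions is majority voting over independent decisions that each err with probability p1 (the function name and parameters are assumptions, not the paper's method):

```python
from math import comb

def majority_error(p1, k):
    """Error probability of a majority vote over k independent binary
    decisions, each individually wrong with probability p1 (k odd):
    the vote errs when more than half of the decisions err."""
    return sum(comb(k, i) * p1**i * (1 - p1)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# With p1 = 0.05, three independent looks already beat one decision:
# majority_error(0.05, 3) = 3 * 0.05**2 * 0.95 + 0.05**3 = 0.00725 < 0.05
```

Here the joint error rate (0.00725) is well below the single-decision error rate p1 = 0.05, showing that adding sub-decisions need not inflate the overall error.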
This paper describes a method to classify the various patterns that make up the appearance of wooden surfaces. Such surfaces are characterized by their textural appearance as well as by compact convex objects like knots, holes, resin, cracks, and grain lines. Many approaches to describing such surfaces have been published in the past. The list includes, but is not limited to, Hough transform methods, 2D shape recognition, fuzzy set approaches for segmentation, hierarchical pattern recognition, associative memories, and so on. In the present paper we assume that a local textural representation is computed, permitting the description of the graylevel image in terms of texture elements or symbols. Using the symbolic image, it is shown how segmentation into objects can be achieved, followed by the extraction of the symbolic contour as a list of symbols. Every object is described by a list of symbols to be classified using syntactic pattern recognition. Each class of objects is described by a formal language, and by parsing each string a classification can be obtained from the grammar that causes the least number of parsing errors. We describe details of the system, including how symbolic descriptions can be obtained, and the implementation of Earley's parser on a parallel computer architecture.
3D reconstruction of highly textured surfaces like those found on roads, as well as unvegetated (rock-like) terrain, is of major interest for applications like autonomous navigation or the 3D modeling of terrain for mapping purposes. We describe a system for the automatic modeling of such scenes. It is based on two frame CCD cameras that are tightly attached to each other to ensure constant relative orientation. One camera is used for the acquisition of photogrammetrically measured reference points; the other records the surface images. The system is moved from one position to the next by an operator carrying it. Automatic calibration using the images acquired by the calibration camera permits the computation of the exterior orientation parameters of the surface camera. A fast matching method providing dense disparities, together with a robust reconstruction algorithm, renders an accurate grid of 3D points. We also describe procedures to merge the stereo reconstruction results from all images taken, and report on accuracy, computational complexity, and practical experience in a road engineering application.
This paper presents an overview of past and current activities in the field of computer vision at Joanneum Research. Joanneum Research is a non-profit research organization deriving most of its revenue from contract research with industrial partners and international agencies in the area of space research (e.g., the European Space Agency). The focus on industrial real-time image processing started with the development of a 2D inspection system for wooden surfaces, and has expanded to the development of a 3D vision system for spacecraft navigation and elevation modeling. In this presentation the major projects in this area of research are outlined.
Planetary space exploration by unmanned missions strongly relies on automatic navigation. Computer vision has been recognized as a key to the feasibility of robust navigation, landing site identification, and hazard avoidance. We are studying a scenario where remote sensing methods from the orbit around the planet are used to preselect a landing site. The accuracy of the atmospheric entry is restricted by various parameters. One area of uncertainty results from inexact estimation of the landing position. The touchdown point must be located in an elliptic image area which is called the `ellipse of uncertainty'. During landing, the early recognition of the preselected landing site in this image is an important factor. It improves the probability of a successful touchdown, since it allows real-time corrections of the trajectory to reach the planned touchdown spot. We present a scenario that uses computer vision methods for this early identification, emphasizing the phase between ten kilometers from the ground and the identification of the lander position relative to the selected landing site.
This paper describes a system for real-time inspection of 2D surfaces. It was initially planned as a system for the classification of wooden surfaces, but has also been used successfully in other inspection tasks like metallic surface inspection and leather inspection. The system has two major modules. One is a 2D object segmentation and recognition part, where key elements of the underlying methods have been published before. This includes hierarchical processing of the incoming gray-level images leading to a symbolic description of the surface; syntactic segmentation; and the decision network methodology used. Beyond these features, a new track has been added, which is entirely devoted to texture classification in real time. This two-way analysis of wooden surfaces was first implemented on a heterogeneous architecture containing Zoran vector processors and Transputers (all commercially available). The current version uses only TMS320C40 processors. The system has been successfully installed in a production plant in Austria. We describe the major elements of the system and the underlying algorithms.
This paper deals with the problem of camera calibration based on 3D feature measurements. It occurs in industrial 3D measurement systems, as well as in autonomous navigation systems, where the estimation of motion parameters is required. We have selected the problem of extrinsic calibration (exterior orientation) of a camera that is looking at flat or almost flat surfaces (or terrain). This situation causes numerical and stability problems to many of the known calibration methods. To study the impact of flatness of the reference surface (or calibration target) on the calibration errors we have done a comparative study using sixteen available calibration procedures. The major emphasis was on robustness with respect to 3D measurement errors and sensitivity to flatness. A new calibration method is also investigated, which can be used independently of whether the calibration reference surface is flat, almost flat, or rugged.
This paper describes the key elements of a system for detecting quality defects on leather surfaces. The inspection task must treat defects like scars, mite nests, warts, open fissures, healed scars, holes, pin holes, and fat folds. The industrial detection of these defects is difficult because of the large dimensions of the leather hides (2 m × 3 m) and the small dimensions of the defects (150 µm × 150 µm). Pattern recognition approaches suffer from the fact that the defects are hidden on an irregularly textured background, and can hardly be seen by human graders. We describe the methods tested for automatic classification using image processing, which include preprocessing, local feature description of texture elements, and final segmentation and grading of defects. We conclude with a statistical evaluation of the recognition error rate, and an outlook on the expected industrial performance.
An image acquisition and processing algorithm for the inspection of tire treads has been developed. The tire treads are flat strips of black rubber material used as the main component in retreading automobile tires. These treads have a complex molded design on one side (the design side) and a flat surface on the other side. The inspection of the design side of the tread is one of the key operations in the tread fabrication process, impacting the quality and consistency of the final product. This paper discusses the development of the main optical inspection algorithms utilized in the system design. The algorithms described in this paper were tested in the laboratory prototype of the inspection system and will be implemented in the final production system.
This paper summarizes the results we have obtained in searching for an efficient, robust, but accurate method for the estimation of the camera (or spacecraft) position based on image measurements. Based on a sequence of images acquired with a moving camera, the task is to estimate the camera position (extrinsic calibration) from corresponding points (landmarks) in the various frames. Using the noisy estimate of the camera parameters, the 3D scene points can be reconstructed. In the paper we first describe a new calibration method, and show the improvements in terms of accuracy compared to known methods. The method is studied in an application to motion estimation for a spacecraft during the orbit and landing maneuvers.
This paper deals with a two-step segmentation algorithm for 2-D convex objects. First the objects are approximated by an elliptic shape description, and then the boundary of the object is refined using dynamic programming. The reason for refinement is accurate shape classification.
This paper describes a computer vision system for the high-precision inspection of bearing shells. We have developed algorithms to solve the problem of inspecting the wearing surfaces of sputter-coated metal shells for surface defects (high spots, cavities, blisters, grooves, and pores). The throughput goal to be achieved was 0.3 m²/h, which for a typical 90 mm bearing shell would mean about 0.5 minutes per shell. The resolution to be achieved was 24 µm × 24 µm per pixel. The analysis method was based on a gray-scale rather than a binary algorithm. The quality standards were those defined by Motoren- und Turbinen-Union GmbH, Germany, and Daimler-Benz AG.
A novel technique for automatic elevation model computation is introduced. The technique employs a binocular camera system and an algorithm termed hierarchical feature vector matching to derive an elevation model, as well as to compute the interframe correspondences for tracking. It is argued that this algorithm unifies the procedures of range estimation (i.e., stereo correspondence), tracking (i.e., interframe correspondence), and recognition (input/model correspondence). This technique is demonstrated using a physical model of the Mars surface and a binocular camera system with seven geometrical degrees of freedom. This system provides a tool to generate realistic test imagery for the mock-up of a spacecraft approaching the landing site. The trajectory of the spacecraft can be predefined and is controlled by a computer interfaced to seven motorized positioners. Several experiments defined to estimate the accuracy of the computer vision system are reported.
In this paper we describe the major aspects of our transputer-based automatic vision system, which aims to implement a scalable and easily reconfigurable system. In this system, the mapping of image processing and recognition algorithms to the hardware is facilitated by automatic code generation schemes, separating methodic design from implementation details.
An application of the theory of fuzzy sets to detect and measure convex objects in an image is described. Geometric measurements involving the concept of the perimeter of a fuzzy set are compared to measurements using moment parameters of the membership function. The concept of the perimeter of fuzzy sets offers a way to take geometric measurements from a scene without having to segment it. A method to compute the perimeter of a convex fuzzy set was proposed by Rosenfeld. For the special case of elliptically shaped convex objects an alternative formula is proposed. In this method the fuzzy set is approximated by a crisp set of elliptic shape which has the same area and second-order moments. The computation of the membership function plays a key role in this theory. We use a fuzzy c-means clustering algorithm to compute the membership function. The method is tested on real images.
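The elliptic approximation by moment matching can be sketched directly. The code below is a hedged illustration with assumed names, not the paper's implementation; it uses the fact that for a uniform ellipse the normalized second central moment along a principal axis equals the squared semi-axis divided by four:

```python
import numpy as np

def ellipse_from_membership(mu):
    """Approximate a 2D membership function by a crisp ellipse with the
    same centroid and second-order central moments (for a crisp elliptic
    input this also preserves the area, i.e. the sum of memberships).
    Returns (cx, cy, semi_major, semi_minor, angle)."""
    ys, xs = np.mgrid[0:mu.shape[0], 0:mu.shape[1]]
    m00 = mu.sum()
    cx = (xs * mu).sum() / m00                       # centroid
    cy = (ys * mu).sum() / m00
    mu20 = (((xs - cx) ** 2) * mu).sum() / m00       # normalized central moments
    mu02 = (((ys - cy) ** 2) * mu).sum() / m00
    mu11 = ((xs - cx) * (ys - cy) * mu).sum() / m00
    cov = np.array([[mu20, mu11], [mu11, mu02]])
    evals, evecs = np.linalg.eigh(cov)               # ascending eigenvalues
    # uniform ellipse: principal second moment = (semi-axis)^2 / 4
    semi_minor, semi_major = 2.0 * np.sqrt(evals)
    angle = float(np.arctan2(evecs[1, 1], evecs[0, 1]))  # major-axis direction
    return float(cx), float(cy), float(semi_major), float(semi_minor), angle
```

Because only first- and second-order moments of the membership function enter, the same routine applies unchanged to a soft (fuzzy) membership image, such as one produced by fuzzy c-means.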
In this paper the design of a prototype system for the real-time classification of wooden profiled boards is described. The presentation gives an overview of the algorithms and hardware developed to achieve classification in real time at a data rate of 4 Mpixel/s. The system achieves its performance by a hierarchical processing strategy in which the intensity information in the digital image is transformed into a symbolic description of small texture elements. Based on this symbolic representation, a syntactic segmentation scheme is applied which produces a list of objects that are present on the board surface. The objects are described by feature vectors containing both numeric and structural texture- and shape-related properties. A graph-like decision network is then used to recognize the various defects. The classification procedures were extensively tested for spruce boards on a large data set containing 500 boards taken from the production line at random. The overall rate of correct classification was 95% on this data set. The structure of these algorithms is reflected in the hardware design. We use a multiprocessor system where each processor is specialized to solve a specific task in the recognition hierarchy.