The digitization of the 3D shape of real objects is a rapidly expanding discipline, with a wide variety of applications, including shape acquisition, inspection, reverse engineering, gauging and robot navigation. Developments in computer-aided product design, automated production and the demand for close manufacturing tolerances will be facts of life for the foreseeable future. A growing need exists for fast, accurate, portable, non-contact 3D sensors. However, in order for 3D scanning to become more commonplace, new methods are needed for easily, quickly and robustly acquiring accurate, full geometric models of complex objects using low-cost technology. In this paper, a brief survey is presented of the scanning technologies currently available for acquiring range data. An overview is provided of current 3D-shape acquisition using both active and passive vision techniques. Each technique is explained in terms of its configuration, principle of operation, and inherent advantages and limitations. A separate section then focuses on the implications of scannerless scanning for hand-held technology, after which the current status of 3D acquisition using hand-held technology, together with related implementation issues, is considered more fully. Finally, conclusions regarding further developments in hand-held devices are discussed. This paper may be of particular benefit to newcomers to this field.
Many image processing systems have real-time performance
constraints. Systems implemented on general purpose processors
maximize performance by keeping busy the small fixed number of
available functional units such as adders and multipliers. In this
paper we investigate the use of programmable logic devices to
accelerate the execution of an application. Field Programmable Gate
Arrays (FPGAs) can be programmed to generate application specific
logic that alters the balance and type(s) of functional units to match application characteristics. As a case study we introduce an application that corrects geometric image distortion. Real-number support is a requirement in most image processing applications. We examine the suitability of fixed-point, floating-point and logarithmic number systems for an FPGA implementation of this application. Performance results are presented in terms of (1) execution time and (2) FPGA logic resource requirements.
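As a rough illustration of the fixed-point versus floating-point trade-off discussed above (not the paper's FPGA implementation), the sketch below quantizes every intermediate of a simple radial distortion correction to a fixed-point grid and measures the worst-case coordinate error against a double-precision reference. The distortion model and all parameters are illustrative assumptions.

```python
import numpy as np

def undistort_radial(x, y, k1):
    """Reference (double-precision) single-coefficient radial correction."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return x * s, y * s

def to_fixed(v, frac_bits):
    """Round a real value onto a signed fixed-point grid with frac_bits fraction bits."""
    return np.round(v * (1 << frac_bits)) / (1 << frac_bits)

def undistort_fixed(x, y, k1, frac_bits):
    """Same computation with every intermediate quantized, mimicking fixed-point hardware."""
    xq, yq, kq = (to_fixed(v, frac_bits) for v in (x, y, k1))
    r2 = to_fixed(xq * xq + yq * yq, frac_bits)
    s = to_fixed(1.0 + kq * r2, frac_bits)
    return to_fixed(xq * s, frac_bits), to_fixed(yq * s, frac_bits)

# Worst-case error over normalized image coordinates, for three word lengths
xs = np.linspace(-1, 1, 41)
k1 = -0.12          # illustrative barrel-distortion coefficient
err = {}
for bits in (8, 12, 16):
    err[bits] = max(abs(undistort_fixed(x, x, k1, bits)[0]
                        - undistort_radial(x, x, k1)[0]) for x in xs)
print(err)
```

The error shrinks with word length, which is the resource-versus-accuracy balance an FPGA designer tunes when choosing a number system.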
This paper presents a new device and technique for computing the stereo disparity of two binocular optical images using the data from a single sensor. The device, comprising a mirror and beamsplitter, superimposes the two views onto a single sensor to produce a single additive composite image. Local (i.e. windowed) Fourier analysis of this composite image yields the phase difference between the two component images and, thereby, the stereo disparity. The primary advantage of this approach is that it allows existing monocular cameras (digital or analogue, interlaced or non-interlaced) to be converted to stereo at relatively little cost and effort. Results are presented for both simulated images and images acquired with a prototype single-sensor stereo camera. As currently conceived, the approach would probably not be appropriate as a general-purpose technique for computing the detailed structure of the environment -- nor is it likely to supplant existing multi-camera stereo systems for complex problems -- but it is suitable for simple stereo-based applications, such as obstacle avoidance and segmentation.
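A 1-D toy version of the idea can be sketched as follows. Adding a view to a copy of itself shifted by d multiplies the spectrum by 1 + exp(-iwd), which imprints a cos(wd) ripple on the log power spectrum; a cepstral reading of that ripple (used here as a simple stand-in for the paper's windowed phase analysis) recovers the disparity from the composite alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, rows = 512, 8, 32       # samples per line, true disparity, lines averaged

cep = np.zeros(n)
for _ in range(rows):
    left = rng.standard_normal(n)         # one scan line of the left view
    composite = left + np.roll(left, d)   # additive single-sensor composite
    power = np.abs(np.fft.fft(composite)) ** 2
    cep += np.fft.ifft(np.log(np.maximum(power, 1e-8))).real
cep /= rows

# |1 + exp(-i w d)|^2 modulates the log spectrum with cos(w d), so the
# averaged cepstrum peaks at quefrency d (the lag-0 term is skipped).
d_est = int(np.argmax(cep[2:n // 2])) + 2
print(d_est)
```

Averaging over several lines plays the role of the local (windowed) analysis in the paper.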
Intelligent systems using methods such as local-search meta-heuristics, neural networks, genetic algorithms and genetic programming have been applied to many diverse areas, such as image processing and reconfigurable computing. Although these methods have been applied with great success in such areas, only recently has work been undertaken on their application to reconfigurable hardware. The research presented here uses the strengths of these systems both to schedule work on an architecture and to design architectures automatically for optimum processing capability. An Intelligent Technique (IT) is used to automatically reconfigure the proposed Systolic Architectures (SA) for the implementation of matrix-based algorithms, while a Heuristic Approach (HA) is used to optimize the implementation of the proposed designs on Field Programmable Gate Arrays (FPGAs).
A novel CCD has been commercially produced by Marconi Applied Technology, UK under the trade name L3Vision, and by Texas Instruments, USA under the trade name Impactron, both of which incorporate an all solid-state electron-multiplying structure based on the impact-ionisation phenomenon in silicon. This technology combines the single-photon detection sensitivity of ICCDs with the inherent advantages of CCDs. Here we review the electron-multiplying CCD (EMCCD) technology and compare it with scientific ICCDs. In particular we look at the effect of the excess noise factors on the respective S/N performances. We compare QEs, spatial resolution, dark signal, EBI and clock-induced charge (CIC), with the latter two as the ultimate limitations on sensitivity. We conclude that the electron-multiplying CCD is a credible alternative to ICCDs in all non-gated applications.
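The S/N comparison turns on the textbook shot-noise model with an excess noise factor F: the multiplication register amplifies the signal above the read noise but multiplies the shot noise by F. The detector numbers below are purely illustrative, not figures from the paper.

```python
import math

def snr(photons, qe, dark_e, read_e, gain, excess_f):
    """Per-pixel S/N for a multiplying detector (textbook shot-noise model).
    photons: incident photons; qe: quantum efficiency; dark_e: dark + CIC
    electrons; read_e: read noise (e- rms); gain: multiplication gain;
    excess_f: excess noise factor F."""
    s = qe * photons
    return s / math.sqrt(excess_f**2 * (s + dark_e) + (read_e / gain) ** 2)

faint = 5.0   # photons per pixel per frame (faint-flux regime)
# Illustrative, assumed parameters: EMCCD F ~ sqrt(2), ICCD F somewhat higher
emccd = snr(faint, qe=0.90, dark_e=0.05, read_e=50, gain=1000, excess_f=math.sqrt(2))
iccd  = snr(faint, qe=0.40, dark_e=0.05, read_e=5,  gain=500,  excess_f=1.8)
```

With these assumptions the EMCCD's higher QE outweighs its excess noise, matching the abstract's conclusion for non-gated use.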
This paper develops techniques for the implementation of motion estimation. Gradient-based approaches compute the spatio-temporal derivatives, differentiating the image with respect to time and thus computing the optical flow field. Horn and Schunck's method in particular is considered a benchmark among gradient-based differential methods: useful and powerful, yet simple and fast. They formulated an optical flow constraint equation which cannot by itself fully determine the flow, but gives the component of the flow in the direction of the intensity gradient. An additional constraint must therefore be imposed, introducing a supplementary assumption to ensure a smooth variation of the flow across the image. Horn and Schunck estimated the brightness derivatives involved in the equation system using averaged first differences.
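A minimal sketch of the Horn-Schunck iteration follows (simple first-difference derivative estimates and periodic boundaries for brevity; the smoothing weight alpha and iteration count are illustrative choices, not the paper's).

```python
import numpy as np

def horn_schunck(im1, im2, alpha=0.5, n_iter=200):
    """Minimal Horn-Schunck optical flow (pure-numpy sketch)."""
    im1 = im1.astype(float); im2 = im2.astype(float)
    # First-difference estimates of the brightness derivatives
    Ix = (np.gradient(im1, axis=1) + np.gradient(im2, axis=1)) / 2
    Iy = (np.gradient(im1, axis=0) + np.gradient(im2, axis=0)) / 2
    It = im2 - im1
    u = np.zeros_like(im1); v = np.zeros_like(im1)

    def avg(f):  # 4-neighbour average used by the smoothness term
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4

    for _ in range(n_iter):
        ubar, vbar = avg(u), avg(v)
        t = (Ix * ubar + Iy * vbar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ubar - Ix * t        # move against the constraint residual
        v = vbar - Iy * t
    return u, v

# Toy check: a smooth blob translated one pixel to the right
x, y = np.meshgrid(np.arange(64), np.arange(64))
blob1 = np.exp(-((x - 30.0)**2 + (y - 32.0)**2) / 50.0)
blob2 = np.exp(-((x - 31.0)**2 + (y - 32.0)**2) / 50.0)
u, v = horn_schunck(blob1, blob2)
```

Near the blob the recovered horizontal flow is positive (rightward) while the vertical flow stays near zero, as expected.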
Gradient-based methods for optical flow computation can suffer from unreliability of the image flow constraint equation in areas of an image where the local brightness function is non-linear, or where there are rapid spatial or temporal changes in the intensity function. Little and Verri suggested regularization to help the numerical stability of the solution. Usually this takes the form of smoothing the function or surface by convolution before the derivative is taken. The method proposed here is a finite element method, based on a triangular mesh, in which diffusion is added into the system of equations to perform a type of smoothing while also retrieving the velocity. Quantitative and qualitative results are presented for real and synthetic images.
This paper introduces image enhancement techniques specifically intended to support the visualization of weak image features. Two distinct strategies are presented, which are referred to as weak feature emphasis and weak feature extraction, respectively. The first emphasizes weakly visible features while simultaneously de-emphasizing stronger features. The second goes further and seeks to extract weakly visible features completely. Both techniques seek to avoid the amplification of noise. The techniques are based on a wavelet foveation strategy which amplifies low magnitude wavelet coefficients and either attenuates or completely removes higher magnitude wavelet coefficients. Fundamental changes in the visible appearance of images can be achieved with these methods.
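A toy version of the coefficient remapping at the heart of such schemes can be sketched with a single-level 2-D Haar transform: a power law with exponent below one boosts small detail coefficients relative to large ones (this is an illustrative reduction, not the paper's foveation strategy).

```python
import numpy as np

def haar2(a):
    """One level of the orthonormal 2-D Haar transform."""
    lo = (a[:, ::2] + a[:, 1::2]) / np.sqrt(2)       # rows
    hi = (a[:, ::2] - a[:, 1::2]) / np.sqrt(2)
    ll = (lo[::2] + lo[1::2]) / np.sqrt(2); lh = (lo[::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[::2] + hi[1::2]) / np.sqrt(2); hh = (hi[::2] - hi[1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1])); hi = np.empty_like(lo)
    lo[::2] = (ll + lh) / np.sqrt(2); lo[1::2] = (ll - lh) / np.sqrt(2)
    hi[::2] = (hl + hh) / np.sqrt(2); hi[1::2] = (hl - hh) / np.sqrt(2)
    a = np.empty((lo.shape[0], lo.shape[1] * 2))
    a[:, ::2] = (lo + hi) / np.sqrt(2); a[:, 1::2] = (lo - hi) / np.sqrt(2)
    return a

def emphasize_weak(img, gamma=0.5):
    """Power-law remapping of detail magnitudes: with gamma < 1, small
    coefficients are amplified relative to large ones."""
    ll, lh, hl, hh = haar2(img)
    scale = max(np.abs(np.c_[lh, hl, hh]).max(), 1e-12)
    remap = lambda c: np.sign(c) * scale * (np.abs(c) / scale) ** gamma
    return ihaar2(ll, remap(lh), remap(hl), remap(hh))
```

Setting gamma to 1 reduces the remapping to the identity, so the transform pair can be checked by perfect reconstruction.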
The problem of scale is of fundamental interest in image processing, as the features that we visually perceive and find meaningful vary significantly depending on their size and extent. It is well known that the strength of a feature in an image may depend on the scale at which the appropriate detection operator is applied. It is also the case that many features in images exist significantly over a limited range of scales and, of particular interest here, that the most salient scale may vary spatially over the feature. Hence, when designing feature detection operators, it is necessary to consider the requirements for both the systematic development and the adaptive application of such operators over scale- and image-domains. We present an overview of the design of scalable derivative edge detectors, based on the finite element method, that addresses the issues of method- and scale-adaptability. The finite element approach allows us to formulate scalable image derivative operators that can be implemented using a combination of piecewise-polynomial and Gaussian basis functions. The issue of scale is addressed by partitioning the image in order to identify local key scales at which significant edge points may exist. This is achieved by consideration of empirically designed functions of local image variance. The general adaptive technique may be applied to a range of operators. Here we evaluate the approach using image gradient operators, and we present comparative qualitative and quantitative results for both first and second order derivative methods.
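The notion of a locally key scale can be illustrated with generic scale-normalised Gaussian-derivative operators (a stand-in for the paper's finite-element operators; the sigma set and the gamma=1/2 edge normalisation are illustrative choices).

```python
import numpy as np

def _smooth_and_derive(img, sigma):
    """Separable Gaussian smoothing plus derivative-of-Gaussian (pure numpy)."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2)); g /= g.sum()
    dg = -t / sigma**2 * g                       # derivative-of-Gaussian kernel
    conv = lambda a, k, ax: np.apply_along_axis(np.convolve, ax, a, k, mode="same")
    gx = conv(conv(img, g, 0), dg, 1)            # smooth columns, derive rows
    gy = conv(conv(img, g, 1), dg, 0)
    return gx, gy

def multiscale_gradient(img, sigmas=(1.0, 2.0, 4.0)):
    """sqrt(sigma)-normalised gradient magnitude at several scales; per pixel,
    keep the scale giving the strongest response (a key-scale map)."""
    resp = np.array([np.sqrt(s) * np.hypot(*_smooth_and_derive(img, s))
                     for s in sigmas])
    return resp.max(axis=0), np.take(sigmas, resp.argmax(axis=0))
```

On a synthetic image with one crisp step and one heavily blurred step, the selected scale is small at the crisp edge and large at the blurred one.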
This paper describes a design methodology for constructing machine vision systems. Central to this is the use of empirical design techniques, and in particular quantitative statistics. The approach views the construction and evaluation of systems as a single process, and is based upon what could be regarded as a set of self-evident propositions:
(1) Vision algorithms must deliver information allowing practical decisions regarding interpretation of an image.
(2) Probability is the only self-consistent computational framework for data analysis, and so must form the basis of all algorithmic analysis processes.
(3) The most effective and robust algorithms will be those that match most closely the statistical properties of the data.
(4) A statistically based algorithm which takes correct account of all available data will yield an optimal result, where 'optimal' can be unambiguously defined by the statistical specification of the problem.
Machine vision research has not emphasized the need for (or necessary
methods of) algorithm characterization, which is unfortunate, as the
subject cannot advance without a sound empirical base. In general this problem can be attributed to two factors: a poor understanding of the role of assumptions and statistics, and a lack of appreciation of what is to be done with the generated data.
The methodology described here focuses on identifying the statistical
characteristics of the data and matching these to the assumptions of the underlying techniques. The methodology has been developed from more than a decade of vision design and testing, which has culminated in the construction of the TINA open source image analysis/machine vision system [http://www.tina-vision.net].
We create digital holograms of real-world objects using a process called phase-shift digital holography. This system has been used as the basis for a three-dimensional object reconstruction and recognition technique. We present the results of applying lossless and lossy data compression to individual holographic frames. The lossy techniques are based on quantization and amplitude equalization. We also present a novel technique that uses only phase information of the digital hologram for the real-time optical reconstruction of three-dimensional objects.
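The lossy quantisation idea can be sketched on a stand-in complex field: uniform quantisation of the real and imaginary parts trades bit depth against reconstruction error (the random "hologram" and bit depths below are illustrative, not captured data from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in complex-valued "digital hologram" (synthetic, illustrative)
holo = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

def quantize(h, bits):
    """Uniform mid-rise quantisation of the real and imaginary parts."""
    levels = 2 ** bits
    m = max(np.abs(h.real).max(), np.abs(h.imag).max())
    step = 2 * m / levels
    q = lambda a: np.clip(np.floor(a / step) + 0.5,
                          -levels / 2 + 0.5, levels / 2 - 0.5) * step
    return q(h.real) + 1j * q(h.imag)

errors = {}
for bits in (2, 4, 8):
    errors[bits] = (np.linalg.norm(holo - quantize(holo, bits))
                    / np.linalg.norm(holo))
print(errors)
```

The normalised RMS error falls as the bit depth rises, which is the basic rate-distortion trade-off explored for holographic frames.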
We investigate the visual and vocal modalities of interaction with computer systems. We focus our attention on the integration of visual and vocal interfaces as replacement and/or additional modalities to enhance human-computer interaction. We present a new framework for employing eye gaze as a modality of interface. While voice commands, as a means of interaction with computers, have been around for a number of years, the integration of the vocal interface with the visual interface, in terms of detecting a user's eye movements through an eye-tracking device, is novel and promises to open horizons for new applications where a hand-mouse interface provides little or no apparent support for the task to be accomplished. We present an array of applications to illustrate the new framework and eye-voice integration.
A number of methods have recently been proposed in the literature for the encryption of 2-D information using optical systems based on the fractional Fourier transform. In this paper a brief review of the methods proposed to date is presented. A new technique based on a random shifting algorithm is proposed. The new method is compared numerically to the existing methods. A measure of the strength/robustness of the level of encryption of the various techniques is proposed, and a comparison is carried out between the methods. Optical implementations are discussed, as is the robustness of the systems with respect to misalignment and random noise.
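For orientation, the classical Fourier-plane double random phase encoding scheme (of which the fractional-order methods reviewed here are generalisations, obtained by replacing the FFTs with fractional transforms) can be sketched in a few lines; the image and masks below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 32
img = rng.random((n, n))                         # stand-in 2-D input image

# Two statistically independent random phase masks (the encryption keys)
phi1 = np.exp(2j * np.pi * rng.random((n, n)))   # input plane
phi2 = np.exp(2j * np.pi * rng.random((n, n)))   # Fourier plane

def encrypt(f):
    return np.fft.ifft2(np.fft.fft2(f * phi1) * phi2)

def decrypt(c):
    # Undo the Fourier-plane mask, transform back, undo the input mask
    return np.fft.ifft2(np.fft.fft2(c) * np.conj(phi2)) * np.conj(phi1)

cipher = encrypt(img)
recovered = decrypt(cipher).real
```

With both keys the plaintext is recovered exactly; the ciphertext itself is a complex white-noise-like field.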
In this paper we present a novel algorithm to track the facial
features using a deformable triangle model. The face is modeled as an isosceles triangle connecting the eyes and lips, which are called the features of interest (FoI). A maximum likelihood estimator is used to estimate the position of such a triangle in the image sequences by maximizing both the correlation value of the feature template in the image and the probability that the resulting structure represents a face. A method is proposed to remove the noise in the obtained structure by projecting it into a shape subspace of isosceles triangles. Burst error in tracking is removed by using a Kalman filter. The algorithm can successfully locate and track the facial features on two sets of video sequences obtained in the laboratory under normal lighting conditions with a cluttered background.
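A minimal sketch of the Kalman-filtering step for one tracked feature point follows (a generic constant-velocity model; the noise parameters are illustrative, not the paper's).

```python
import numpy as np

def kalman_track(meas, q=1e-2, r=0.25):
    """Constant-velocity Kalman filter for a 2-D feature point.
    meas: (T, 2) noisy positions; q: process noise; r: measurement noise."""
    dt = 1.0
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # state transition
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # observe position only
    Q = q * np.eye(4); R = r * np.eye(2)
    x = np.array([meas[0, 0], meas[0, 1], 0, 0], float)
    P = np.eye(4)
    out = []
    for z in meas:
        x = F @ x; P = F @ P @ F.T + Q                  # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
        x = x + K @ (z - H @ x)                         # update
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)
```

On a linearly moving point with noisy measurements the filtered track is markedly closer to the truth than the raw measurements once the velocity estimate has settled.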
We assess both marginal density clustering, and spatial clustering
using a Markov random field, on multiband Earth observation data.
We use a Bayes factor assessment procedure in all cases. We find that
the spatial model leads to better results, although the non-spatial
clustering achieves a better false alarm rate.
The paper deals with a problem that arose in developing a system for the aided virtual recomposition of fragmented frescos (in particular the St Matthew's fresco in the Upper Church of St Francis in Assisi). The goal is to expand the capabilities of the operators, who remain responsible for the whole process. A core functionality is the automatic evaluation of similarity between images of fragments in a manner consistent with the evaluations made by humans using their visual perception: a critical property for working in tight cooperation with the operators. This requires a color representation close to human color matching.
S-CIELAB, a spatial extension of the CIELAB color representation, is a space whose metric closely reproduces, through the Euclidean norm, the color distances perceived by a human observer, and accounts for the effects of the spatial distribution of colors.
S-CIELAB extends CIELAB by incorporating factors related to the pattern-color sensitivity of the human eye. The system ascribes pattern-color characteristics to each fragment according to the human operator's visual perception of it; the use of automatic tools for color evaluation avoids the inconsistent results due to different operators, and to the fatigue of the same person over time.
The goal of our work is the development of a system for the virtual aided recomposition of frescos. The system is intended to be applied to the St Matthew's fresco, painted by Cimabue in the Upper Church of St Francis in Assisi and fragmented during the earthquake of September 1997. The high number of fragments, the necessity of avoiding further damage caused by physical manipulation, and the opportunity of coordinating the work of several operators while increasing their efficiency suggest the use of the virtual modality. On the other hand, the intrinsic characteristics of the digital images (which provide insufficient information for automatically finding the correct location of each fragment within the fresco) prevent a fully autonomous system. Humans and the digital system must therefore work together in order to accomplish the task optimally. A key aid to the operators is the retrieval of fragments similar to suitably chosen examples.
Unfortunately, the acquisition set-up of the available image of the whole fresco (type of camera, light and geometry) is completely unknown. Even the acquisition of the fragments has produced images that are not homogeneous in terms of luminance. The similarity function between images must account for these problems in order to provide effective help to the operators. We describe the results obtained using an illuminant-invariant algorithm for the comparison of images, Color Angular Indexing. This algorithm has performed well in several different tests, returning reliable similarity estimations on images of fragments in spite of rotations as well as changes in scale and illumination.
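The core idea behind color angular indexing is that the angles between an image's (mean-subtracted) colour-channel vectors are unchanged when a channel is rescaled, as happens under a diagonal model of illumination change. The sketch below is an illustrative reduction of that idea, not the exact published algorithm.

```python
import numpy as np

def channel_angles(img):
    """Angles between the mean-subtracted colour channels of one image.
    Rescaling a channel (a diagonal-model illumination change) leaves
    these angles unchanged."""
    r, g, b = (img[..., i].ravel().astype(float) for i in range(3))
    r, g, b = r - r.mean(), g - g.mean(), b - b.mean()
    ang = lambda u, v: np.arccos(np.clip(
        u @ v / (np.linalg.norm(u) * np.linalg.norm(v)), -1, 1))
    return np.array([ang(r, g), ang(r, b), ang(g, b)])

def similarity(img1, img2):
    """Distance between angle descriptors (smaller = more similar)."""
    return np.linalg.norm(channel_angles(img1) - channel_angles(img2))
```

An image and a channel-rescaled ("re-lit") copy of it score as nearly identical, while an unrelated image scores further away.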
The work which we report on here makes use of a new (patented)
technique for measuring the tensile and viscosity properties of
any liquid. One modality uses a laser-derived beam of light
directed into a drop as it builds up on a drop-head, grows and
eventually falls off through gravity. The light is reflected
through the drop, and a trace is built up of its intensity over
time. The trace has been found to have very good discrimination
potential for various classes of liquid. Other sensing modalities
can also be used -- multiple simultaneous optical and near-infrared
wavelengths, ultraviolet, ultrasound. In the studies reported on
here, we use the ultrasound modality. Further background on this
new technology for the fingerprinting of liquid content and
composition can be found in McMillan et al. (1992, 1998, 2000).
CT Colonography (CTC) is a new non-invasive colon imaging technique which has the potential to replace conventional colonoscopy for colorectal cancer screening. A novel system which facilitates automated detection of colorectal polyps at CTC is introduced. As exhaustive testing of such a system using real patient data is not feasible, more complete testing is achieved through the synthesis of artificial polyps and their insertion into real datasets. The polyp insertion is semi-automatic: candidate points are manually selected using a custom GUI, and suitable points are determined automatically from an analysis of the local neighborhood surrounding each of the candidate points. Local density and orientation information are used to generate polyps based on an elliptical model. Anomalies are identified from the modified dataset by analyzing the axial images. Detected anomalies are classified as potential polyps or natural features using 3D morphological techniques. The final results are flagged for review. The system was evaluated using 15 scenarios. The sensitivity of the system was found to be 65% with 34% false positive detections. Automated diagnosis at CTC is possible, and thorough testing is facilitated by augmenting real patient data with computer-generated polyps. Ultimately, automated diagnosis will enhance standard CTC and increase performance.
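The elliptical-model insertion step can be sketched as adding an oriented ellipsoidal density blob to a CT-like volume; the soft cosine edge, which crudely mimics partial-volume blurring, and all parameters are illustrative assumptions rather than the paper's model.

```python
import numpy as np

def insert_polyp(vol, centre, radii, density, orient_deg=0.0):
    """Add an ellipsoidal density blob to a CT-like volume (z, y, x order).
    radii: (rz, ry, rx) semi-axes in voxels; orient_deg: in-plane rotation."""
    z, y, x = np.ogrid[:vol.shape[0], :vol.shape[1], :vol.shape[2]]
    cz, cy, cx = centre
    th = np.deg2rad(orient_deg)
    xr = (x - cx) * np.cos(th) + (y - cy) * np.sin(th)     # rotated in-plane axes
    yr = -(x - cx) * np.sin(th) + (y - cy) * np.cos(th)
    d = np.sqrt((xr / radii[2])**2 + (yr / radii[1])**2 + ((z - cz) / radii[0])**2)
    # Raised-cosine profile: full density at the centre, soft roll-off at d = 1
    blob = np.where(d < 1, 0.5 * (1 + np.cos(np.pi * np.clip(d, 0, 1))), 0.0)
    return vol + density * blob

vol = insert_polyp(np.zeros((32, 32, 32)), (16, 16, 16), (4, 6, 6), 100.0)
```

In the real system the orientation and density would be taken from the local neighbourhood analysis described above.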
Magnetic Resonance Cholangiopancreatography (MRCP) is a type of MR imaging which utilizes protocols designed to enhance stationary fluids in the imaged volume. In this way it visualizes the pancreatobiliary tract by highlighting the bile and pancreatic juices in the system. Current practice sees this data being assessed directly, with little or no processing being performed prior to review. MRCP data presents three main difficulties when it comes to image processing. The first is the relatively noisy nature of the data. Second is its low spatial resolution, especially in the inter-slice direction. And third, the variability observed between MRCP studies, which makes consistent results difficult to attain. This paper describes the initial phase of research which aims to develop assistive image analysis techniques to aid in the interpretation of MRCP data. The first stage in this process is the robust segmentation of the pancreatobiliary system. To this end a segmentation procedure has been developed using an approach based on the tools and techniques of mathematical morphology. This paper examines the task at hand and presents initial results, describing and assessing the segmentation approach developed.
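As a generic illustration of the morphological toolkit involved (not the paper's procedure), the sketch below thresholds a bright-fluid image and applies a binary opening -- erosion followed by dilation with a 4-neighbour structuring element -- which suppresses isolated noise voxels while retaining duct-like structures.

```python
import numpy as np

def dilate(b):
    """Binary dilation with a plus-shaped (4-neighbour) structuring element."""
    return (b | np.roll(b, 1, 0) | np.roll(b, -1, 0)
              | np.roll(b, 1, 1) | np.roll(b, -1, 1))

def erode(b):
    """Binary erosion with the same structuring element."""
    return (b & np.roll(b, 1, 0) & np.roll(b, -1, 0)
              & np.roll(b, 1, 1) & np.roll(b, -1, 1))

def segment_bright(img, thresh):
    """Threshold, then morphological opening to remove speckle smaller
    than the structuring element."""
    b = img > thresh
    return dilate(erode(b))
```

On a toy image with a bright 3-pixel-wide "duct" and a single noise pixel, the opening removes the noise but preserves the duct.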
There is a family of difficult image-processing scenarios which
involve seeking out and quantifying minute changes within a sequence
of near-identical images. Traditionally these have been dealt with by
carefully registering the images in terms of position, orientation
and intensity, and subtracting them from some template image. However, for critical measurements, this approach breaks down if the
point-spread-functions (PSFs) vary even slightly from image to
image. Subtraction of registered images whose PSFs are not matched
leads to considerable residual structure, which may be mistakenly
interpreted as real features rather than processing artefacts. In
astronomy, software known as ISIS has been developed to
fully PSF-match image sequences and to facilitate their analysis. We
show here the tremendous improvement in detection rates and
measurement accuracy which ISIS has afforded in our program for the
study of rare variable stars in dense, globular star clusters. We
discuss the genesis from this work of our new program to use ISIS to
search for extra-solar planets in transit across the face of stars in
such clusters. Finally we illustrate an application of ISIS in the
industrial imaging sector, showing how it can be used to detect minute faults in images of products.
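Why unmatched PSFs matter can be shown with a toy experiment (ISIS fits a spatially varying convolution kernel; the sketch below instead assumes Gaussian PSFs of known widths, for which the matching kernel is itself a Gaussian of width sqrt(s2^2 - s1^2)).

```python
import numpy as np

def gauss2(shape, sigma):
    """Normalised 2-D Gaussian centred on the (shape//2) pixel."""
    y, x = np.indices(shape)
    cy, cx = shape[0] // 2, shape[1] // 2
    g = np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * sigma**2))
    return g / g.sum()

def blur(img, sigma):
    """Circular convolution with a Gaussian PSF via the FFT."""
    psf = np.fft.ifftshift(gauss2(img.shape, sigma))
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)).real

rng = np.random.default_rng(7)
scene = rng.random((64, 64))
im1 = blur(scene, 1.0)    # sharper seeing
im2 = blur(scene, 1.6)    # softer seeing, identical scene

naive = im2 - im1         # PSFs unmatched: structured residuals remain
# PSF-match first: degrade im1 to im2's resolution, then subtract
matched = im2 - blur(im1, np.sqrt(1.6**2 - 1.0**2))
```

Although the two frames show exactly the same scene, the naive difference is full of residual structure, while the PSF-matched difference is close to zero -- the effect ISIS exploits.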
The EGRET gamma-ray telescope has left a legacy of unidentified astronomical sources. Most likely, many of the galactic plane sources will be rotation-powered pulsars. Firm identification has been difficult, given the instrument's poor spatial resolution. The problem is exacerbated by the energy-dependent Point Spread Function (PSF) and low numbers of source counts. The main method of identifying sources to date has been a maximum likelihood method. We have taken a different approach, namely that of regularized deconvolution with a spatially invariant PSF, as used in optical astronomy and medical X-ray imaging. This technique revealed that wavelet denoising of residuals produced smooth, relatively artefact-free images with improved spatial location. Our source location using standard centroiding produced an improvement in relative spatial location, ranging from 10:1 to 2:1 proportional to source strength. Wavelet deconvolution simultaneously achieves background smoothing, while improving the sharpness of the resolved objects. The photon-sparse nature of these images makes them an ideal test bed for such techniques. Although deconvolution does not ordinarily conserve flux, in this instance the flux determination is unaffected in all but the most crowded regions. Finally, we show that the energy-dependent PSF can be used to identify objects with a restricted range of energy spectra.
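Richardson-Lucy iteration is one standard deconvolution scheme for photon-limited astronomical images (the paper's exact regularisation may differ); a minimal version with a spatially invariant PSF and circular boundaries can be sketched as follows.

```python
import numpy as np

def richardson_lucy(image, psf, n_iter=50):
    """Richardson-Lucy deconvolution with a spatially invariant PSF
    (pure-numpy sketch, circular boundaries)."""
    psf = psf / psf.sum()
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    conv = lambda f, o: np.fft.ifft2(np.fft.fft2(f) * o).real
    est = np.full_like(image, image.mean())       # flat initial estimate
    for _ in range(n_iter):
        ratio = image / np.maximum(conv(est, otf), 1e-12)
        est *= conv(ratio, np.conj(otf))          # correlate with the PSF
    return est
```

With a symmetric, normalised PSF the multiplicative update conserves total flux at every iteration, which is why the flux determination survives deconvolution here.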
Position determination and verification of a mobile robot is a
central theme in robotics research. Several methods have been
proposed for this problem, including the use of visual feedback
information. These vision systems typically aim to extract known
or tracked landmarks from the environment to localize the robot.
Detecting and matching these landmarks is often the most
computationally expensive and error prone component of the system.
This paper presents a real-time system for robustly matching
landmarks in complex scenes, with subsequent tracking. The vision
system comprises a trinocular head, from which corner points
are extracted. These are then matched with respect to robustness
constraints in addition to the trinocular constraints. Finally,
the resulting robustly extracted corners are tracked from frame to
frame to determine the robot's rotational deviations.
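The corner-extraction stage can be illustrated with a generic Harris detector (not necessarily the detector used in this system; the parameters k and sigma are conventional illustrative values).

```python
import numpy as np

def harris(img, k=0.05, sigma=1.0):
    """Harris corner response map (pure-numpy sketch)."""
    gy, gx = np.gradient(img.astype(float))
    # Gaussian windowing of the structure tensor
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2)); g /= g.sum()
    smooth = lambda a: np.apply_along_axis(
        np.convolve, 0,
        np.apply_along_axis(np.convolve, 1, a, g, mode="same"),
        g, mode="same")
    sxx, syy, sxy = smooth(gx * gx), smooth(gy * gy), smooth(gx * gy)
    det = sxx * syy - sxy**2
    tr = sxx + syy
    return det - k * tr**2      # positive at corners, negative along edges
```

On a bright square the response is positive at a corner, negative along an edge, and zero in flat regions, which is the property the robustness constraints then filter.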
Information navigation and search on the part of a user requires a thorough description of the information content of signal and image datasets and archives; large signal and image databases need comprehensive metadata to facilitate user access. There is no unique way to describe the semantics of images and signals, so a conceptual model serves as an initial platform. From the conceptual
model, a database design can be derived, or a definition of metadata. The different steps from model to description can benefit from tools such as the Unified Modeling Language (UML) for the conceptual model, standard Entity/Relationship (ER) models for database design, and eXtensible Markup Language (XML) for metadata description. As examples of the process of conceptual design and semantic description, we consider the case of a signal database, and the case of astronomical image databases.
A method for registering pairs of digital ophthalmic images of the retina is presented, using anatomical features present in both images as control points. The anatomical features chosen are blood vessel crossings and bifurcations. These control points are identified by a combination of local contrast enhancement and morphological processing. In general, however, the matching between control points is unknown, so an automated algorithm is used to determine the matching pairs of control points in the two images as follows. Using two control points from each image, rigid global transform (RGT) coefficients are calculated for all possible combinations of control point pairs, and the most consistent set of RGT coefficients is identified. Once control point pairs are established, registration of the two images can be achieved by using linear regression to optimize an RGT, bilinear or second order polynomial global transform. An example of cross-modal image registration using an optical image and a fluorescein angiogram of an eye is presented to illustrate the technique.
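Once matched control-point pairs are in hand, the rigid-transform fit is the classical least-squares Procrustes/SVD solution; a minimal sketch follows (illustrative of the fitting step, not this paper's full pipeline).

```python
import numpy as np

def fit_rigid(p, q):
    """Least-squares rigid transform (rotation R, translation t) such that
    q ~ R p + t, for matched point sets p, q of shape (N, 2).
    Classical Procrustes/SVD (Kabsch) solution."""
    pc, qc = p - p.mean(0), q - q.mean(0)      # centre both point sets
    u, _, vt = np.linalg.svd(pc.T @ qc)
    r = (u @ vt).T
    if np.linalg.det(r) < 0:                   # guard against reflections
        vt[-1] *= -1
        r = (u @ vt).T
    t = q.mean(0) - r @ p.mean(0)
    return r, t
```

On noiseless synthetic correspondences the known rotation and translation are recovered exactly; with real control points the same fit gives the least-squares RGT.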
Features are derived from wavelet transforms of images containing a
mixture of textures. In each case, the texture mixture is segmented, based on a 10-dimensional feature vector associated with every pixel. We show that the quality of the resulting segmentations can be characterized using the Potts or Ising spatial homogeneity parameter. This measure is defined from the segmentation labels. In order to have a better measure which takes into account both the segmentation labels and the input data, we determine the likelihood of the observed data given the model, which in turn is directly related to the Bayes information criterion, BIC. Finally we discuss how BIC is used as an approximation in model assessment using a Bayes factor.
We present a novel approach to the compression of video frames based on the foveation behavior of the human visual system (HVS). Eye fixations on a video frame, as depicted by eye-gaze trace data, define an imaginary region of interest. The perceived resolution of the frame depends on this eye-gaze (fixation) point: resolution decreases dramatically with distance from the fovea. This behavior of the HVS has recently gained interest in the image and video processing area, especially for the compression of images and video frames. We present an approach in which eye-gaze trace data are integral to the compression process, and which has demonstrated its usefulness in yielding high compression performance. We partition a video frame into three regions: the innermost includes the point of eye-gaze, and to it we apply lossless compression; an outer region encompasses the first, and to it we apply visually lossless (near-lossless) compression; and in the outermost region lossy compression is applied. Because of its low computational complexity, we use the Haar wavelet transform. Preliminary results are promising and show improvement over other methods, which are mainly full-frame based.
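The three-region partition can be sketched directly in the pixel domain: exact values inside the foveal radius, fine quantisation in the middle annulus, coarse quantisation outside (radii and step sizes below are illustrative; the actual system applies the idea to Haar coefficients).

```python
import numpy as np

def foveated_quantize(frame, gaze, r1=8, r2=20, step_mid=4, step_out=16):
    """Three-region foveated coding sketch: lossless inside radius r1,
    near-lossless (fine steps) inside r2, coarsely lossy outside."""
    y, x = np.indices(frame.shape)
    d = np.hypot(y - gaze[0], x - gaze[1])      # distance from fixation point
    out = frame.astype(float).copy()
    mid = (d >= r1) & (d < r2)
    far = d >= r2
    out[mid] = np.round(frame[mid] / step_mid) * step_mid
    out[far] = np.round(frame[far] / step_out) * step_out
    return out
```

Distortion is zero at the fixation point and grows with eccentricity, mirroring the falling resolution of the HVS.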
We address the problems of (1) segmenting coarse- from fine-granularity materials, and (2) discriminating between materials of different granularities. For the former we use wavelet features and an enhanced version of the widely used EM algorithm: a weighted Gaussian mixture model with a second order spatial neighborhood. For granularity discrimination we investigate the use of multiresolution entropy. We illustrate the approach with good results on a number of practical cases.
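One simple reading of multiresolution entropy (an illustrative sketch, not necessarily the paper's definition) is the Shannon entropy of the grey-level histogram across a dyadic resolution pyramid: fine-grained textures lose histogram spread quickly under block averaging, coarse-grained ones do not.

```python
import numpy as np

def shannon_entropy(img, bins=32):
    """Shannon entropy (bits) of the grey-level histogram, values in [0, 1]."""
    p, _ = np.histogram(img, bins=bins, range=(0, 1))
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def multires_entropy(img, levels=3):
    """Entropy signature across a 2x2 block-averaging pyramid."""
    sig = []
    for _ in range(levels):
        sig.append(shannon_entropy(img))
        img = (img[::2, ::2] + img[1::2, ::2] +
               img[::2, 1::2] + img[1::2, 1::2]) / 4
    return sig
```

A per-pixel random (fine) texture shows a much larger entropy drop across levels than a blocky (coarse) texture, giving a granularity discriminant.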
Matrix algorithms are important in many types of applications, including image and signal processing. These areas require enormous computing power. A close examination of the algorithms used in these, and related, applications reveals that many of the fundamental actions involve matrix operations such as matrix multiplication, which is of complexity O(N^3) on a sequential computer and O(N^3/p) on a parallel system with p processors. This paper presents an investigation into the design and implementation of different matrix algorithms, such as matrix operations, matrix transforms and matrix decompositions, using an FPGA-based environment. Solutions for the problem of processing large matrices have been proposed. The proposed system architectures are scalable and modular, and require less area and time complexity, with reduced latency, when compared with existing structures.
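The standard way to process matrices larger than the available hardware is block partitioning: the product is accumulated tile by tile, so only fixed-size tiles are resident at any time. The sketch below shows the decomposition in software (illustrative of the idea, not of the paper's architectures; the tile size `bs` is an arbitrary choice).

```python
import numpy as np

def blocked_matmul(a, b, bs=32):
    """Block-partitioned matrix multiply: accumulates bs x bs tiles,
    the same decomposition used to stream large matrices through
    fixed-size processing hardware."""
    n, k = a.shape
    m = b.shape[1]
    c = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # Each tile product fits a fixed-size compute unit
                c[i:i+bs, j:j+bs] += a[i:i+bs, p:p+bs] @ b[p:p+bs, j:j+bs]
    return c
```

Slicing handles edge tiles when the dimensions are not multiples of the block size, and the result matches a direct multiplication.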
There are shortcomings in existing 3D visualization technology, including poor image quality, significant computer processing requirements and/or awkward viewing aids. These problems have been overcome with a 4D volumetric imaging system called Vis4D(TM). This system produces an image having apparent solidity, suspended in space in front of the viewer. It is capable of displaying images for a wide range of general and specialized applications.