To help gauge the health of coral reef ecosystems, we developed a prototype of an underwater camera module to automatically census reef fish populations. Recognition challenges include pose and lighting variations, complicated backgrounds, within-species color variations and within-family similarities among species. An open frame holds two cameras, LED lights, and two ‘background’ panels in an L-shaped configuration. High-resolution cameras send sequences of 300 synchronized image pairs at 10 fps to an on-shore PC. Approximately 200 sequences containing fish were recorded at the New York Aquarium’s Glover’s Reef exhibit. These contained eight ‘common’ species with 85–672 images, and eight ‘rare’ species with 5–27 images that were grouped into an ‘unknown/rare’ category for classification. Image pre-processing included background modeling and subtraction, and tracking of fish across frames for depth estimation, pose correction, scaling, and disambiguation of overlapping fish. Shape features were obtained from principal component analysis (PCA) of perimeter points, color features from opponent color histograms, and ‘banding’ features from the DCT of vertical projections. Images were classified by species using feedforward neural networks arranged in a three-level hierarchy in which errors remaining after each level are targeted by networks in the level below. Networks were trained and tested on independent image sets. Overall accuracy of species-specific identifications typically exceeded 96% across multiple training runs. A seaworthy version of our system will allow for population censuses with high temporal resolution, and therefore improved statistical power to detect trends. A network of such devices could provide an ‘early warning system’ for coral ecosystem collapse.
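A minimal sketch of an opponent color histogram follows, assuming the common chromatic axes rg = R - G and by = 2B - R - G over 8-bit RGB pixels. The abstract does not state which opponent axes, bin counts, or normalization were used, so the function name `opponent_histogram` and its parameters are illustrative only.

```python
def opponent_histogram(pixels, bins=8):
    """Normalized 2-D histogram of 8-bit RGB pixels over two chromatic
    opponent axes (an assumed choice): rg = R - G, range -255..255, and
    by = 2B - R - G, range -510..510. The intensity axis is left out."""
    hist = [[0.0] * bins for _ in range(bins)]
    for r, g, b in pixels:
        rg = r - g
        by = 2 * b - r - g
        # Shift each axis to start at 0, then scale into [0, bins - 1].
        i = min(bins - 1, (rg + 255) * bins // 511)
        j = min(bins - 1, (by + 510) * bins // 1021)
        hist[i][j] += 1
    total = len(pixels) or 1
    return [[count / total for count in row] for row in hist]
```

Because the intensity axis is dropped, gray pixels of any brightness land in the same bin. This gives some tolerance to lighting changes. In this sketch, only pixels left after background subtraction would be passed in.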
Eye-trackers are emerging computer input devices. This paper describes an experiment to measure the performance of an eye-tracker. Detailed analysis of the system and the experimental data shows that, for a typical 'move and select' task, the eye-tracker is twice as fast as traditional pointing devices such as the mouse. In addition, the cognitive start time for the eye-tracker is about 100–200 milliseconds shorter than that of other pointing devices.
This paper outlines a new technique for processing structural color texture images to obtain texture units at multiple structural layers. Multi-layered color texture unit segmentation and feature abstraction allow efficient structural texture classification and synthesis. In the first phase of the three-phase process, the color texture is quantized and transformed into a grey-scale texture image. An efficient procedure for color feature clustering using PAM (Partitioning Around Medoids) is introduced. In the second phase, global statistical features of the texture are used to determine the texture unit size and the spatial relationship between texture units in a periodic texture pattern. Finally, for texture units with internal structure, a multi-layered segmenter is developed to separate these internal structures at different layers. Feature extraction and synthesis can then be conducted at these multiple layers.
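For reference, here is the textbook form of PAM: a greedy BUILD step followed by SWAP passes until no medoid exchange lowers the total distance. It is applied to RGB colors and then used to quantize each color to a cluster index, which could serve as a grey level. The paper describes its own "efficient" variant, which is not reproduced here. The functions `pam` and `quantize` and the Euclidean RGB distance are assumptions made for illustration.

```python
import itertools
import math

def _cost(points, medoids):
    """Total distance from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Basic PAM (Partitioning Around Medoids) on color tuples."""
    # BUILD: start with the most central point, then add medoids greedily.
    medoids = [min(points, key=lambda c: _cost(points, [c]))]
    while len(medoids) < k:
        candidates = [p for p in points if p not in medoids]
        medoids.append(min(candidates, key=lambda c: _cost(points, medoids + [c])))
    # SWAP: try every (medoid, non-medoid) exchange and keep any improvement.
    best = _cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, o in itertools.product(range(k), points):
            if o in medoids:
                continue
            trial = medoids[:i] + [o] + medoids[i + 1:]
            cost = _cost(points, trial)
            if cost < best - 1e-12:
                medoids, best, improved = trial, cost, True
    return medoids

def quantize(points, medoids):
    """Map each color to the index of its nearest medoid (a grey level)."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]
```

Each SWAP pass is O(k·n²) in this naive form. That is why practical implementations usually cluster a color histogram or a sample of pixels rather than every pixel.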
A real-time face tracking and facial information acquisition system developed for multimodal man-machine communication is presented in this paper. The system can track a human face and report the mouth position and other facial information in real time. A stochastic model of the human skin color distribution is used to transform the hue image of the HSI color space into a skin color probability image. A modified mean shift algorithm is then applied to find the mode of this probability distribution, which gives an estimate of the face window. To make the system more robust, 1D projections of the intensity image in candidate face windows are proposed to verify and adjust the face location.
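The mode-seeking step can be illustrated with plain (unmodified) mean shift over a skin probability image, in the style popularized by CAMSHIFT. The window is repeatedly re-centred on the centroid of the probability mass it covers. The paper's modification is not described here, and the hue-to-probability conversion is assumed to have been done already. `mean_shift_window` is a hypothetical name.

```python
import math

def mean_shift_window(prob, x, y, w, h, max_iter=20):
    """Shift a w x h window (top-left at x, y) over a probability image
    prob[row][col] until it is centred on the centroid of the mass it covers."""
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p        # zeroth moment: total probability in window
                m10 += c * p    # first moments give the centroid
                m01 += r * p
        if m00 == 0:
            break  # window has drifted off all skin-colored pixels
        nx = math.floor(m10 / m00 - (w - 1) / 2 + 0.5)
        ny = math.floor(m01 / m00 - (h - 1) / 2 + 0.5)
        if (nx, ny) == (x, y):
            break  # converged: window already centred on its centroid
        x, y = nx, ny
    return x, y
```

Starting from the previous frame's window keeps the number of iterations small. This is what makes mean shift attractive for real-time tracking.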
A real-time image processing and control interface for remote operation of a microscope is presented in this paper. The system achieves real-time color image display for 640 × 480 pixel images, and a multi-resolution image representation can be provided for efficient transmission over the network. Through the control interface, the computer communicates with the programmable microscope via the RS-232 serial ports. By choosing one of three scanning patterns, a sequence of images can be saved as BMP or PGM files to record the information on an entire microscope slide. The system will be used by medical and graduate students at the University of Medicine and Dentistry of New Jersey for distance learning, and it can also be used in many network-based telepathology applications.
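The three scanning patterns are not named in the abstract. As a hypothetical example, a serpentine (boustrophedon) order over a grid of slide fields is shown below, together with an encoder for the binary P5 variant of the PGM format mentioned above. `serpentine_scan`, `pgm_bytes`, and the grid layout are assumptions. The RS-232 stage commands are not modeled.

```python
def serpentine_scan(n_cols, n_rows):
    """Field visiting order for a slide divided into n_cols x n_rows fields:
    left-to-right on even rows, right-to-left on odd rows, so the stage
    never makes a long return move between rows."""
    order = []
    for row in range(n_rows):
        cols = range(n_cols) if row % 2 == 0 else range(n_cols - 1, -1, -1)
        order.extend((col, row) for col in cols)
    return order

def pgm_bytes(pixels):
    """Encode a 2-D list of 0-255 gray levels as a binary (P5) PGM file."""
    height, width = len(pixels), len(pixels[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in pixels for v in row)
```

Each captured field could then be written with `open(name, "wb").write(pgm_bytes(frame))`, in visiting order.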
Color and texture have long been used as image features to segment and classify images. Most previous approaches treat color and texture as two uncorrelated features, whereas in real-world images the spatial and spectral information are often tightly coupled. This paper studies a feature extraction algorithm that represents colored texture in a unified way: the different spectral channels are correlated spatially to give a unified representation of both color and texture information. To use this feature in image segmentation applications, its properties are studied, and a novel segmentation algorithm based on this study is proposed. Preliminary segmentation results are presented.
A real-time face tracker is presented in this paper. The system achieves tracking at 15 frames/second using a Pentium 200 PC with a Datacube MaxPCI image processing board and a Panasonic RGB color camera. It tracks human faces in the camera's field of view while people move freely. A stochastic model of the human skin color distribution is used to segment the face and other skin areas from the background. Median filtering is then used to remove background noise. Geometric constraints are applied to the segmented image to extract the face from the background. To reduce computation and achieve real-time tracking, 1D projections (horizontal and vertical) of the image are analyzed instead of the full 2D image. Run-length encoding and frequency-domain analysis algorithms are used to separate faces from other skin-like blobs. The system is robust to variations in illumination intensity and to different skin colors. It can be applied to many human-computer interaction applications such as sound localization, lip-reading, gaze tracking, and face recognition.
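The projection idea can be sketched on a binary skin mask. The mask is summed along columns, the resulting 1D profile is run-length encoded, and the widest run is taken as the face's horizontal extent. Rows within that column band are then summed to find the vertical extent. Choosing the widest run is an assumption standing in for the paper's geometric constraints, and the frequency-domain test is omitted. `runs_above` and `face_box` are hypothetical names.

```python
def runs_above(profile, threshold):
    """Run-length encode a 1D profile: (start, length) for each run > threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(profile) - start))
    return runs

def face_box(mask, threshold=0):
    """Bounding box (x, y, w, h) of the widest skin blob, found from 1D
    projections of a binary mask mask[row][col]; None if nothing is found."""
    col_proj = [sum(row[c] for row in mask) for c in range(len(mask[0]))]
    col_runs = runs_above(col_proj, threshold)
    if not col_runs:
        return None
    x, w = max(col_runs, key=lambda run: run[1])
    row_proj = [sum(row[x:x + w]) for row in mask]  # rows within the column band
    row_runs = runs_above(row_proj, threshold)
    if not row_runs:
        return None
    y, h = max(row_runs, key=lambda run: run[1])
    return x, y, w, h
```

Working on two profiles of length W and H, instead of the W × H image, is what keeps the per-frame cost low.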
We have developed a high frame rate image display system to study attentional control and information capacity limitations in the perception of static objects in a visual display. The system presents images at 114.4 frames/sec using the stereo mode of a video display monitor and the Datacube MV200 image processing system. The proposed experimental paradigm extends previous work in which numeric icons were displayed at a 'Stimulus Onset Asynchrony' (SOA) as low as 16.7 msec/icon. With our high frame rate display system, we can achieve SOAs as low as 8.7 msec/icon and hence further examine perceptual capabilities for short-duration displays of static objects.
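Assuming one icon per displayed frame, the shortest attainable SOA is simply the frame period. The 16.7 msec figure equals the period of a 60 Hz display; the earlier work's display rate is not stated here, so that correspondence is an inference.

```latex
\mathrm{SOA}_{\min} = \frac{1}{f_{\text{frame}}}
  = \frac{1}{114.4\ \mathrm{s}^{-1}} \approx 8.74\ \mathrm{ms},
\qquad
\frac{1}{60\ \mathrm{s}^{-1}} \approx 16.7\ \mathrm{ms}
```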
In marine biology, the presence and abundance of different types of fish are traditionally determined by dragging nets across the bottom and counting what is caught. This method, although accurate, kills the collected fish, damages the habitat, and is very time-consuming. This paper presents an alternative: a machine vision system capable of counting and measuring fish in an ocean environment. Illumination presents a unique problem in this environment. Object orientation and measurement are related issues, and both are resolved. An adaptive thresholding technique is required to segment the fish from the background in the images, and mode detection and histogram analysis are useful tools for determining these local thresholds. This system, created in conjunction with the Rutgers Institute for Marine and Coastal Science, effectively counts and measures fish in an estuarine environment.
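A minimal sketch of histogram-based adaptive thresholding follows. The image is split into tiles, and each tile's gray-level histogram is smoothed. The threshold is placed at the lowest point between the two strongest modes. Tiles with only one mode are left as background. The tile size, smoothing radius, and the assumption that fish are brighter than the background are illustrative choices, not the paper's settings. All function names are hypothetical.

```python
def smooth(hist, radius=2):
    """Box-filter a histogram so noise does not create spurious modes."""
    n = len(hist)
    return [sum(hist[max(0, i - radius):i + radius + 1]) / (2 * radius + 1)
            for i in range(n)]

def valley_threshold(values, levels=256, radius=2):
    """Threshold at the histogram minimum between the two strongest modes,
    or None if the smoothed histogram has fewer than two modes."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    h = smooth(hist, radius)
    peaks = [i for i in range(levels)
             if h[i] > 0
             and (i == 0 or h[i] >= h[i - 1])
             and (i == levels - 1 or h[i] > h[i + 1])]
    if len(peaks) < 2:
        return None
    a, b = sorted(sorted(peaks, key=lambda i: h[i])[-2:])
    return min(range(a, b + 1), key=lambda i: h[i])

def adaptive_segment(image, tile=16):
    """Threshold each tile with its own valley threshold (fish assumed
    brighter than background); unimodal tiles stay 0 (background)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            cells = [(r, c) for r in range(ty, min(ty + tile, height))
                     for c in range(tx, min(tx + tile, width))]
            t = valley_threshold([image[r][c] for r, c in cells])
            if t is None:
                continue
            for r, c in cells:
                out[r][c] = 1 if image[r][c] > t else 0
    return out
```

Computing thresholds per tile lets the cut-off follow uneven underwater illumination, which a single global threshold cannot do.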
Proc. SPIE. 3521, Machine Vision Systems for Inspection and Metrology VII
KEYWORDS: Mouth, Signal to noise ratio, Information visualization, Visualization, Target detection, Digital signal processing, Detection and tracking algorithms, Interference (communication), RGB color model, Array processing
A microphone array system directed by visual information is presented in this paper. The system uses a real-time mouth tracking system to steer a beamformer so that it focuses on the mouth. The microphone array system is implemented on a PC with a Signalogic 8-channel DSP board and achieves a better signal-to-noise ratio when capturing sound in a high-noise environment.
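The abstract does not say which beamformer is used. As an assumption, here is the simplest near-field option, delay-and-sum focused on the tracked mouth position. Each channel is advanced by its extra propagation time from the mouth, and the channels are then averaged. Only integer-sample delays are used; fractional delays and filtering are omitted. The function name, the coordinate convention (meters), and the 343 m/s speed of sound are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at 20 °C (assumed)

def delay_and_sum(signals, mic_positions, focus, fs):
    """Near-field delay-and-sum beamformer focused on point `focus`.
    signals[i] is the sample list from the microphone at mic_positions[i]."""
    dists = [math.dist(m, focus) for m in mic_positions]
    nearest = min(dists)
    # Farther microphones hear the mouth later; skip that many samples.
    shifts = [round((d - nearest) / SPEED_OF_SOUND * fs) for d in dists]
    n = min(len(s) - k for s, k in zip(signals, shifts))
    return [sum(s[i + k] for s, k in zip(signals, shifts)) / len(signals)
            for i in range(n)]
```

Sound from the mouth adds coherently after alignment, while noise from other directions is misaligned and partly averages out. This is the source of the SNR gain.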
Driving requires two basic visual components: 'visual sensory function' and 'higher order skills.' Among elderly drivers, it has been observed that when attention must be divided among multiple objects, attentional skills and relational processes are markedly impaired, in addition to impairments in basic visual sensory function. A high frame rate imaging system was developed to assess the elderly driver's ability to locate and distinguish computer-generated images of vehicles and to determine their direction of motion in a simulated intersection. Preliminary experiments were performed at varying target speeds and angular displacements to study the effect of these parameters on motion perception. Results for subjects in four age groups, ranging from the mid-twenties to the mid-sixties, show significantly better performance for the younger subjects than for the older ones.
This paper presents initial results in a study comparing the effectiveness of visible and infra-red (IR) imagery for detecting and recognizing faces in areas where personnel identification is critical (e.g., airports and secure buildings). We compare the effectiveness of visible versus IR imagery by running three face recognition algorithms on a database of images collected for this study. The database contains both IR and visible images of each person, collected under the same scenarios. We used three very different feature-extraction and decision-making algorithms to ensure that the comparisons would not depend on a particular processing technique. We also present recognition results when the visible and IR decision metrics are fused. The results show that visible and IR imagery perform similarly across algorithms and that fusing IR and visible imagery is a viable means of improving performance beyond that of either modality alone. We examine the relative importance of different regions of the face for recognition. We also discuss practical implementation issues, along with plans for the next phase of the study: face detection in an uncontrolled environment. Preliminary face detection experiments are described.
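The fusion rule is not specified in the abstract. As an assumed, commonly used option, here is score-level fusion: each modality's distances to the gallery identities are z-normalized so that they share a scale, and a weighted sum of the two is taken. The equal default weight, the lower-is-better convention, and the names `zscore` and `fuse` are all hypothetical.

```python
import statistics

def zscore(scores):
    """Normalize one matcher's distances so different modalities are comparable."""
    vals = list(scores.values())
    mu = statistics.fmean(vals)
    sd = statistics.pstdev(vals) or 1.0
    return {k: (v - mu) / sd for k, v in scores.items()}

def fuse(visible, infrared, w_visible=0.5):
    """Weighted sum of z-normalized distances (lower = better match).
    Returns the best-matching identity and the fused scores."""
    zv, zi = zscore(visible), zscore(infrared)
    fused = {k: w_visible * zv[k] + (1 - w_visible) * zi[k] for k in visible}
    return min(fused, key=fused.get), fused
```

Setting `w_visible` to 1.0 or 0.0 recovers the single-modality decisions, which makes it easy to compare fused and unfused results on the same gallery.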
An error concealment scheme for MPEG video networking is presented. Cell loss occurs in the presence of network congestion and buffer overflow. In the decoding process, lost cells become lost image blocks, which can severely degrade viewing quality. The new method differs from conventional concealment in that it exploits spatial and temporal redundancies on a large scale. Motion estimation is carried out by registering images within a multiresolution pyramid. The global motion is estimated at the lowest resolution level and is then used to update and refine the local motion, which is further refined iteratively at higher resolution levels. An affine transform is used to extract translation, scaling, and rotation parameters. In many applications with significant camera motion (e.g., remote surveillance), the new method performs better than conventional concealment.
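One standard way to read translation, scale, and rotation out of an estimated 2 × 3 affine motion is sketched below. The 2 × 2 linear part is replaced by its least-squares closest similarity, [[p, -q], [q, p]] with p = (a + d)/2 and q = (c - b)/2. The paper's exact extraction may differ. `decompose` and the row layout of the matrix are assumptions.

```python
import math

def decompose(affine):
    """Split a 2x3 affine [[a, b, tx], [c, d, ty]], where x' = a*x + b*y + tx
    and y' = c*x + d*y + ty, into translation, isotropic scale and rotation
    (degrees), using the closest similarity to the 2x2 linear part."""
    (a, b, tx), (c, d, ty) = affine
    p, q = (a + d) / 2, (c - b) / 2
    return (tx, ty), math.hypot(p, q), math.degrees(math.atan2(q, p))
```

For a pure similarity transform, the decomposition is exact. For a general affine, any shear and anisotropic scaling are discarded.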
A crucial step in the manufacture of vaccines is the verification of their potency. An assay of the potency must be carried out on every batch produced to determine the safety and efficacy of the vaccine. Currently, human inspectors count the number of plaques (holes) in a cell layer in a petri dish to estimate the potency. They must determine whether nearby plaques that have overgrown each other's borders are single or multiple plaques, and they must distinguish plaques from small tears in the cell layer caused by the processing operations (the edges of tears differ in appearance from the edges of plaques). Because of the judgments required to make these subtle distinctions, human inspectors are inconsistent. In cooperation with Merck & Co., Inc., the Rutgers University Center for Computer Aids for Industrial Productivity has demonstrated the feasibility of achieving consistent automatic counting of plaques with a prototype intelligent machine vision system. The David Sarnoff Research Center developed materials handling equipment and factory information system interfaces to enable this prototype system to be installed in a quality control facility at Merck. This paper describes the overall operation of the machine vision aspects of the system, including optics, illumination, sensing, preprocessing, feature extraction and shape recognition. Results of initial tests of the system are also reported.
This paper investigates the use of transform coding and the neural tree network on data obtained from two security systems: face recognition and explosive detection. The use of discrete cosine transform components as classification features is demonstrated on face recognition data. The use of cepstral components as classification features is demonstrated for explosive detection on coherent X-ray scattering data, where surrounding materials nonlinearly affect the spectral data obtained from crystalline explosives. The neural tree network is described and shown to be an effective classifier in both applications.
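The abstract does not define how the cepstral components are computed. Here they are assumed to be the DCT-II of the log spectrum, as in MFCC-style features. The log turns multiplicative distortions into additive ones. A slowly varying attenuation then mostly shifts the low-order coefficients, and a constant gain, the simplest case, changes only c₀. `cepstral_features` and its parameters are hypothetical.

```python
import math

def cepstral_features(spectrum, n_coeffs=12, eps=1e-9):
    """Cepstral-style features of a nonnegative spectrum (assumed definition):
    c_k = sum_n log(S[n]) * cos(pi / N * (n + 0.5) * k), k < n_coeffs."""
    n = len(spectrum)
    log_s = [math.log(v + eps) for v in spectrum]  # eps guards log(0)
    return [sum(l * math.cos(math.pi / n * (i + 0.5) * k) for i, l in enumerate(log_s))
            for k in range(n_coeffs)]
```

Scaling every bin by the same factor g adds N·log g to c₀ and leaves the other coefficients unchanged. This is because each DCT basis vector with k ≥ 1 sums to zero.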
Quantitative analysis of the powder blending process is important in many industries, e.g., pharmaceuticals, glass, and food products. Inefficient blending can lead to inhomogeneous powder mixtures and unacceptable product variability. F.J. Muzzio and his students have devised a new method to characterize the uniformity of powder mixtures: samples of the mixtures are solidified without disturbing their structure and are then subjected to machine vision analysis. The key components of the mixture are colored, so with appropriate illumination the mixture percentage is directly related to video signal intensity. This paper reviews the machine vision algorithms required to perform the analysis, focusing in particular on the real-time hardware configurations that enable large amounts of data to be collected for evaluating the integrity of the blending process.
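Because the mixture percentage is directly related to intensity, a simple analysis can map measured intensities to concentrations and then summarize their spread. This sketch assumes a linear calibration between two reference intensities for pure component A and for 0% A. It reports the relative standard deviation, a commonly used mixing index. The calibration form, the index, and the function names are assumptions, not the paper's algorithms.

```python
import statistics

def concentration(intensity, i_pure_a, i_pure_b):
    """Hypothetical linear calibration: i_pure_a -> 100% of colored component A,
    i_pure_b -> 0%; results are clipped to the range [0, 1]."""
    frac = (intensity - i_pure_b) / (i_pure_a - i_pure_b)
    return min(1.0, max(0.0, frac))

def blend_uniformity(intensities, i_pure_a, i_pure_b):
    """Mean concentration, its standard deviation, and the relative standard
    deviation (RSD = sd / mean); a lower RSD means a more uniform blend."""
    conc = [concentration(v, i_pure_a, i_pure_b) for v in intensities]
    mean = statistics.fmean(conc)
    sd = statistics.pstdev(conc)
    return mean, sd, (sd / mean if mean else float("inf"))
```

The input could be individual pixels or averages over sample regions. The region size determines the scale at which uniformity is judged.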
A face recognition system has been developed and demonstrated at the Rutgers University Center for Computer Aids for Industrial Productivity. The system uses a preliminary data reduction step, gray scale projection, and a fast transform technique to greatly reduce the computational complexity of the problem and, consequently, the cost of high-speed implementation. The decision function is a new, extremely cost-effective neural network, the Mammone/Sankar Neural Tree Network. This paper examines the use of gray scale projection in detail, and demonstrates the use of 1D signal processing techniques in 2D imaging applications. Results are presented showing immunity to changes in expression and small rotations about the vertical axis.
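The data-reduction idea of gray scale projection can be sketched as follows. An H × W face image is reduced to H row sums and W column sums, and a transform is applied to each 1D projection, keeping only the first k coefficients. The abstract does not name the fast transform; a DCT-II is used here for illustration, written naively as O(N²) rather than in a fast form. `projection_features` and `k` are hypothetical.

```python
import math

def dct_ii(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = len(signal)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
            for k in range(n)]

def projection_features(image, k=8):
    """Row and column gray scale projections of a 2D image, each reduced to
    its first k DCT coefficients and concatenated into one feature vector."""
    rows = [sum(r) for r in image]
    cols = [sum(col) for col in zip(*image)]
    return dct_ii(rows)[:k] + dct_ii(cols)[:k]
```

A 2D problem with H·W inputs becomes two 1D problems with 2k outputs. This kind of reduction keeps the classifier small.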
The complex log polar transform is implemented on a multiresolution foveating sensor. The foveating sensor is a new device that can be programmed to capture an image with variable-size pixels (super pixels) at very high speed, providing data reduction at the sensor stage. A structured scanning pattern that approximates the log polar mapping is suggested, and an algorithm is presented that describes how to group the scanning super pixels into the transform. Simulations of the approximate log polar transform show that changes of scale (about 4:1) and rotation (0 to 360 degrees) of an object in the input image are converted, respectively, into horizontal and cyclic vertical shifts in an output image (a computational map). The task of pattern recognition is therefore greatly simplified and can be performed on the computational map by correlation. Finally, the sensitivity of the transform to the placement of the scanning pattern relative to the object (centroid mismatch) is discussed.
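The shift property can be checked with an ideal, continuous log-polar mapping about a chosen centre. The ring coordinate u grows with log radius, and the wedge coordinate v is the angle, wrapping cyclically. The grid sizes below are arbitrary assumptions, and the sensor's super-pixel grouping is not modeled. `log_polar` is a hypothetical name.

```python
import math

def log_polar(x, y, cx, cy, r_min=1.0, r_max=256.0, n_rings=32, n_wedges=64):
    """Continuous log-polar coordinates (u, v) of point (x, y) about (cx, cy).
    u (horizontal, ring) grows with log radius; v (vertical, wedge) is the
    angle, cyclic in [0, n_wedges). Returns None outside [r_min, r_max)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r < r_min or r >= r_max:
        return None
    u = n_rings * math.log(r / r_min) / math.log(r_max / r_min)
    v = (math.atan2(dy, dx) % (2 * math.pi)) * n_wedges / (2 * math.pi)
    return u, v
```

With these parameters, doubling an object's size shifts it by 4 rings, so a 4:1 scale range spans 8 rings. A 90° rotation shifts it by 16 wedges, modulo 64. A misplaced centre breaks both properties, which is why centroid mismatch matters.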
Detection of explosive materials from X-ray diffraction spectra makes use of the fact that different crystalline materials exhibit characteristic diffraction patterns composed of peaks at different energy locations. The positions of the peaks in the spectra are (ideally) invariant for a given material, as are the relative heights of the peaks, though to a lesser degree. However, absorbing materials may alter the measured peak heights or even eliminate certain peaks altogether. Furthermore, the lower signal-to-noise ratios that result from short exposure/scanning times distort the spectra further. In this paper we present a feature set that offers some degree of robustness to such distortions.
A face recognition system has been developed and demonstrated at the Rutgers University Center for Computer Aids for Industrial Productivity. The system uses a preliminary data reduction step, gray scale projections, and a fast transform technique to greatly reduce the computational complexity of the problem and, consequently, the cost of high-speed implementation. The decision function is a new, extremely cost-effective neural network, the Mammone/Sankar Neural Tree Network. This network can be trained and retrained rapidly on face image data, and the system has built-in facilities for acquiring and editing a large database of face images. Recognition rates higher than 90% were achieved on data sets containing up to 269 subjects. More importantly, the system performed well on subjects with and without their glasses, under a wide range of facial expressions, and under a variety of small tilts, translations, and rotations.