A modern digital camera is not just a single sensor capturing light. It is an ensemble of different sensors that capture independent contextual information about the photo-shooting event, which is stored as metadata in the image. In this paper, we demonstrate how the optical metadata (data related to the optics of the camera) can be retrieved, interpreted, and used along with content information for organizing and indexing digital photos. Our model is based on the physics of vision and the operation of a camera. We apply our algorithm to images from
personal photo albums. Our results show that the optical metadata improves annotation performance and
decreases the search space for retrieval.
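In practice, much of this optical metadata is exposed through EXIF tags. As a minimal sketch (not the authors' system), the optics-related fields can be read with Pillow; the tag selection below is illustrative and assumes a recent Pillow release:

```python
# Hedged sketch: reading optics-related EXIF metadata with Pillow.
from PIL import Image, ExifTags

def optical_metadata(path):
    exif = Image.open(path).getexif()
    # Optics-related tags live in the Exif sub-IFD, not the top-level IFD.
    ifd = exif.get_ifd(ExifTags.IFD.Exif)
    named = {ExifTags.TAGS.get(tag, tag): value for tag, value in ifd.items()}
    # A few fields commonly available in consumer cameras (illustrative list).
    keys = ("FNumber", "FocalLength", "ExposureTime", "ISOSpeedRatings", "Flash")
    return {k: named[k] for k in keys if k in named}

print(optical_metadata("photo.jpg"))  # e.g. {'FNumber': 2.8, 'FocalLength': 50.0, ...}
```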
Photo stream segmentation divides a photo stream into groups, each of which corresponds to an event. It can be done with or without prior knowledge of event structure; in this paper, we study the problem assuming that no a priori event model is available. Although both context and content information are important for photo stream segmentation, we focus on the use of context information in this work. We consider different components of context, such as time, location, and optical settings, for inexpensive segmentation of photo streams from common users of modern digital cameras. Because events are hierarchical, we propose to segment photo streams using a hierarchical mixture model. We compare the generated hierarchy with that created by users to see how well results can be obtained without a prior event model. We experimented with about 3000 photos from amateur photographers to study the efficacy of the approach for these context information components.
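One way such a hierarchy can be built, sketched below under our own assumptions (scalar timestamps, a BIC-based split rule with scikit-learn's GaussianMixture), is to recursively split a photo stream while a two-component mixture explains the timestamps better than one:

```python
# Hedged sketch of hierarchical mixture segmentation over photo timestamps.
# The recursion rule (split while BIC improves) is our illustration, not
# necessarily the paper's exact model.
import numpy as np
from sklearn.mixture import GaussianMixture

def split_events(times, depth=0, max_depth=3):
    t = np.asarray(times, dtype=float).reshape(-1, 1)
    if depth == max_depth or len(t) < 4:
        return times
    one = GaussianMixture(1, random_state=0).fit(t)
    two = GaussianMixture(2, random_state=0).fit(t)
    if two.bic(t) >= one.bic(t):          # splitting does not help: leaf event
        return times
    labels = two.predict(t)
    return [split_events(t[labels == k].ravel().tolist(), depth + 1, max_depth)
            for k in range(2)]
```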
The management of the vast amount of media assets captured at everyday events such as meetings, birthday parties, vacations, and conferences has become an increasingly challenging problem. Today, most media management applications are media-centric: they put the captured media assets at the center of the management process. In recent years, however, it has been proposed that events are a much better abstraction of human experience and thus provide a more appropriate means for managing media assets. Consequently, approaches that include events in their media management solutions have been explored. However, they typically treat events merely as additional metadata that can be extracted from the media assets. In addition, today's applications and approaches concentrate on particular problems such as event detection, tagging, sharing, classification, or clustering and are often focused on a single media type. In this paper, we argue for the benefits of an event-centric media management (EMMa) approach that looks at the problem of media management holistically. Based on a generic event model, we specify a media event model for the EMMa approach. The individual phases and processes of the EMMa approach are defined in a general process chain for event-centric media management, the EMMa cycle. This cycle follows the event concept throughout all phases and processes of the chain and puts events at the center of media management. Based on the media event model and the EMMa cycle, we design a component-based architecture for the EMMa approach and implement it.
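As a loose illustration of what an event-centric (rather than media-centric) data model implies, the hypothetical sketch below makes events the primary objects, with media assets and sub-events attached to them; the field names are our assumptions, not the EMMa model's actual schema:

```python
# Hypothetical event-centric data model: events own media, not the reverse.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MediaEvent:
    name: str
    start: datetime
    end: datetime
    location: str | None = None
    media_assets: list[str] = field(default_factory=list)        # URIs of photos, videos, ...
    sub_events: list["MediaEvent"] = field(default_factory=list)  # events are hierarchical

trip = MediaEvent("Vacation 2008", datetime(2008, 7, 1), datetime(2008, 7, 14))
trip.sub_events.append(
    MediaEvent("Beach day", datetime(2008, 7, 3), datetime(2008, 7, 3, 18),
               location="Santa Cruz", media_assets=["IMG_0131.JPG"]))
```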
This paper introduces a model of a spatio-temporal database, which we are developing to query interesting events in video sequences. The database we are designing is pushing the state of the art in a number of fields, and there are many issues still awaiting a satisfactory solution. In this paper, we present our (albeit still partial) answer to some of these problems, and the future direction of our work. Our design is divided into two layers: a logbook, which operates as a short-term repository of unsummarized and unprocessed data, and a long-term spatio-temporal database, which stores and queries summarized data.
In this paper, we introduce our approach to multimedia database interfaces. Although we deal mainly with image databases, most of the ideas we present can be generalized to other types of data. We argue that, when dealing with complex data such as images, the problem of access must be redefined along different lines than for text databases. In multimedia databases, the semantics of the data is imprecise and depends in part on the user's interpretation. This observation led us to consider the development of interfaces in which the user explores the database rather than querying it. In this paper, we give a brief justification of our position and present the exploratory interface we have developed for our image database El Nino.
In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure and how can it be 'easily' transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. Promising experimental results are presented.
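The following sketch contrasts the two extremes of temporal ordering for two shots given as per-frame feature vectors: a set-level comparison that ignores order, and a sequence-level comparison (here a plain dynamic-time-warping recursion, our stand-in for the paper's more general aggregation scheme):

```python
# Hedged sketch: set-level vs sequence-level comparison of two shots,
# each given as an array of per-frame feature vectors.
import numpy as np

def set_distance(a, b):
    # Order-free comparison: symmetric average of best-match distances.
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def sequence_distance(a, b):
    # Order-aware comparison: classic dynamic-time-warping recursion.
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```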
The temporal and multi-modal nature of video increases the dimensionality of the content-based retrieval problem. This places new demands on the indexing and retrieval tools required. The Virage Video Engine (VVE), with its default set of primitives, provides the necessary framework and basic tools for content-based video retrieval. The video engine is a flexible, platform-independent architecture which provides support for processing multiple synchronized data streams such as image sequences, audio, and closed captions. The architecture allows for multi-modal indexing and retrieval of video through the use of media-specific primitives. This paper presents the use of the VVE framework for content-based video retrieval.
Most current image retrieval systems use holistic comparisons that require a global match between images or between pre-segmented objects in images. However, often the user of an image database system is interested in a local match between images. For example, `Find images from the database with something like this anywhere in the image,' or `Find images with something like this in some region of any image in the database,' or `Find images with this spatial configuration of regions like this.' In this paper, we provide an overview of a new framework that should allow these types of queries to be answered efficiently. In order to illustrate the usefulness of our framework, we have developed a complete image retrieval system based on local color information. Our system features fully automatic insertion and very efficient query execution, rivaling the efficiency of systems that can only handle global image comparisons. The query execution engine, called the ImageGREP Engine, can process queries at a speed of approximately 3000 images per second (or better) on a standard workstation when the index can be stored in main memory. In the future, we believe our framework should be used in other domains and applications, to handle queries based on texture or other material properties and perhaps domain-specific image properties.
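A minimal sketch of such a local-match index, under our own simplifications (a fixed grid of regions and quantized RGB histograms; the ImageGREP engine's actual representation may differ):

```python
# Hedged sketch of a local color index: one histogram per grid cell, and a
# local query matches if ANY region of the database image is close.
import numpy as np

def grid_color_index(img, grid=4, bins=8):
    """Quantized RGB histogram for each cell of a grid x grid tiling."""
    h, w, _ = img.shape
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cell = img[gy * h // grid:(gy + 1) * h // grid,
                       gx * w // grid:(gx + 1) * w // grid]
            hist, _ = np.histogramdd(cell.reshape(-1, 3),
                                     bins=(bins,) * 3, range=((0, 256),) * 3)
            feats.append(hist.ravel() / hist.sum())
    return np.stack(feats)                 # one feature vector per region

def local_match(query_feat, image_feats):
    # 'Something like this anywhere in the image': best-matching region wins.
    return min(np.abs(image_feats - query_feat).sum(axis=1) / 2)  # in [0, 1]
```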
Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimensionality (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
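The VAM k-d tree and VAMSplit R-tree are not available in standard libraries, but the approximate nearest-neighbor idea highlighted above can be illustrated with SciPy's k-d tree, whose eps parameter trades exactness for speed:

```python
# Hedged stand-in for the paper's structures: approximate k-NN over
# high-dimensional feature vectors with SciPy's k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((100_000, 16))      # 100k feature vectors, 16-D
tree = cKDTree(db)

# eps > 0 requests (1+eps)-approximate nearest neighbors: each returned
# neighbor is within a factor (1+eps) of the true k-th neighbor distance.
dist, idx = tree.query(rng.random(16), k=10, eps=0.1)
```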
Until recently, the management of large image databases has relied exclusively on manually entered alphanumeric annotations. Systems are beginning to emerge in both the research and commercial sectors based on 'content-based' image retrieval, a technique which explicitly manages image assets by directly representing their visual attributes. The Virage image search engine provides an open framework for building such systems. The Virage engine expresses visual features as image 'primitives.' Primitives can be very general (such as color, shape, or texture) or quite domain specific (face recognition, cancer cell detection, etc.). The basic philosophy underlying this architecture is a transformation from the data-rich representation of explicit image pixels to a compact, semantic-rich representation of visually salient characteristics. In practice, the design of such primitives is non-trivial, and is driven by a number of conflicting real-world constraints (e.g. computation time vs. accuracy). The Virage engine provides an open framework for developers to 'plug in' primitives to solve specific image management problems. The architecture has been designed to support both static images and video in a unified paradigm. The infrastructure provided by the Virage engine can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
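A hypothetical reading of the plug-in primitive contract described above: each primitive turns pixels into a compact feature vector once, and thereafter compares feature vectors. The interface and the toy histogram primitive below are our illustration, not Virage's API:

```python
# Illustrative plug-in contract: analyze pixels once, compare features often.
from abc import ABC, abstractmethod
import numpy as np

class Primitive(ABC):
    @abstractmethod
    def extract(self, image: np.ndarray) -> np.ndarray:
        """Data-rich pixels -> compact, semantic-rich feature vector."""

    @abstractmethod
    def distance(self, f1: np.ndarray, f2: np.ndarray) -> float:
        """Dissimilarity between two stored feature vectors."""

class GrayHistogram(Primitive):
    """A deliberately simple 'general' primitive (color-like)."""
    def extract(self, image):
        hist, _ = np.histogram(image, bins=32, range=(0, 256))
        return hist / hist.sum()

    def distance(self, f1, f2):
        return float(np.abs(f1 - f2).sum())
```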
To the end-user of a video database, content consists of objects and events occurring in the video. A video database system must be designed to extract, represent and organize this information in a fashion that supports querying, manipulation and data visualization by a user. As a data modeling exercise, objects and events are defined in terms of semantic attributes such that an end-user's queries are expressible through the modeling language. On the other hand, as a feature extraction exercise, objects are defined as solutions to equations, often in terms of low-level visual primitives like voxels or contours. These two formalisms constitute entirely different languages. However, integration of these two approaches can provide a powerful mechanism for description and manipulation of complex visual data. This paper explores issues involved with this integration. We introduce the notion of a visual data modeling language (VDML), which supports data definition and data manipulation operations over complex visual data characteristic of video database systems. We discuss this data-modeling effort in the context of our multiple perspective interactive video system which generates three-dimensional data sets using input from multiple video cameras.
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users with a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. That is accomplished by assimilating important information from each video stream into a comprehensive, dynamic, 3D model of the environment. Using this 3D digital video, interactive viewers can then move around the remote environment and observe the events taking place from any desired perspective. Our Immersive Video System currently provides interactive viewing and `walkthrus' of staged karate demonstrations, basketball games, dance performances, and typical campus scenes. In its full realization, Immersive Video will be a paradigm shift in visual communication that will revolutionize television and video media and become an integral part of future telepresence and virtual reality systems.
A video database provides content-based access to video. This is achieved by organizing or indexing video data based on some set of features. This paper defines the problem of video indexing based on video data models. The procedure required to index video data is outlined. The use of semi-automatic techniques to speed up the indexing process is explored. These techniques use image motion features to aid in the indexing process. The techniques developed have been applied to video data from a cable television feed.
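As one hedged example of a motion feature that can aid indexing, a mean frame-difference profile flags shot boundaries as spikes and static segments as near-zero runs; the actual techniques in the paper may be more elaborate:

```python
# Hedged sketch: a cheap motion proxy over a (T, H, W) stack of gray frames.
import numpy as np

def motion_profile(frames):
    """Mean absolute frame difference per step: spikes suggest shot
    boundaries, near-zero runs suggest static segments."""
    f = np.asarray(frames, dtype=float)
    return np.abs(np.diff(f, axis=0)).mean(axis=(1, 2))
```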
Similarity between images is used for storage and retrieval in image databases. In the literature, several similarity measures have been proposed that may be broadly categorized as: (1) metric-based, (2) set-theoretic-based, and (3) decision-theoretic-based measures. In each category, measures based on crisp logic as well as fuzzy logic are available. In some applications such as image databases, measures based on fuzzy logic would appear to be naturally better suited, although so far no comprehensive experimental study has been undertaken. In this paper, we report results of some of the experiments designed to compare various similarity measures for application to image databases. We are currently working with texture images and intend to work with face images in the near future. As a first step for comparison, the similarity matrices for each of the similarity measures are computed over a set of selected textures and are presented as visual images. Comparative analysis of these images reveals the relative characteristics of each of these measures. Further experiments are needed to study their sensitivity to small changes in images such as illumination, magnification, orientation, etc. We describe these experiments (sensitivity analysis, transition analysis, etc.) that are currently in progress. The results from these experiments offer assistance in choosing the appropriate measure for applications to image databases.
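To make the contrast concrete, here is one crisp (metric-based) and one fuzzy (set-theoretic) similarity on feature vectors normalized to [0, 1]; these are representative members of the categories above, not the paper's full set of measures:

```python
# Hedged sketch: one measure from each of two categories compared in the paper.
import numpy as np

def crisp_similarity(x, y):
    # Metric-based: similarity decays with Euclidean distance.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 / (1.0 + np.linalg.norm(x - y))

def fuzzy_similarity(x, y):
    # Set-theoretic, fuzzy: ratio of fuzzy intersection to fuzzy union.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()
```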
A low-flying autonomous rotorcraft traveling in an unknown domain must use passive sensors to detect obstacles and form a three-dimensional model of its environment. As the rotorcraft travels toward a predefined destination, it acquires images of stationary objects in its field of view. Several texture-classifying operators are applied to the original intensity images to obtain "texture" images. The application of each operator to the sequence of images will form an alternate sequence of images in which pixel values encode a measure of texture. In our approach to reconstructing the environment, we divide the three-dimensional space of interest (i.e., the environment) into small cubic volumetric elements (voxels). It is assumed that the position and orientation of the camera with respect to the environment is known. Thus, for every pixel in each image in a sequence, we can compute a ray originating at the camera center and extending through and beyond the pixel. The value observed at the pixel is assigned as an observation for all the voxels through which the ray passes. Then, using the mean and variance of the observations for each voxel, one can determine whether the voxel is full or empty. Each sequence of images is used to form a three-dimensional model of the environment. The reconstruction obtained using the sequence of intensity images is not necessarily the same as that obtained using texture images. While intensity images may do well in one area of the scene, texture images may do well in others. Thus, by fusing the different environment models together, a more robust model of the environment is formed. We discuss various methods of fusing the environment models obtained using intensity as well as texture measures and discuss the advantages and disadvantages of each as related to our application. Finally, we present experimental results obtained using real image sequences.
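A simplified sketch of the voxel evidence accumulation described above, with the camera model and ray traversal omitted; the thresholds are illustrative:

```python
# Hedged sketch: per-voxel mean/variance accumulation of ray observations.
import numpy as np

class VoxelGrid:
    def __init__(self, shape):
        self.n = np.zeros(shape)    # observations per voxel
        self.s = np.zeros(shape)    # running sum of observed values
        self.s2 = np.zeros(shape)   # running sum of squared values

    def observe(self, ray_voxels, value):
        # Credit one pixel's value to every voxel its viewing ray crosses.
        for v in ray_voxels:
            self.n[v] += 1
            self.s[v] += value
            self.s2[v] += value * value

    def full_voxels(self, var_thresh=25.0, min_obs=3):
        # Consistent (low-variance) observations over enough rays -> full.
        mean = self.s / np.maximum(self.n, 1)
        var = self.s2 / np.maximum(self.n, 1) - mean ** 2
        return (self.n >= min_obs) & (var < var_thresh)
```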
Real-time process monitoring has been a primary concern of the semiconductor industry for a number of years. In an attempt to provide higher yield and performance, it has become accepted that monitoring of the etch step is critical. This is the primary motivation for the development of a real-time process monitor, with particular attention paid to monitoring the wafer at its surface. To this end, the use of the results of diffraction, in particular diffraction from 3-dimensional gratings, is proving to be a viable technique for real-time process monitoring. Thus, the primary focus of this paper is to present the progress made toward the development of a non-destructive, real-time, optical-metrology-based system for direct wafer monitoring, utilizing the results of diffraction from a grating structure. The results of experiment and simulation will also be discussed.
In this paper we present an overview of the state of the art in SXM with emphasis on image processing techniques for SXM. We outline the principle of operation of different scanning probe microscopes. Issues related to sensor technology are discussed. Commercially available scanning probe microscopes are listed and their features summarized. We review in detail the image processing work that has been done to date in relation to SXM and raise relevant issues. Existing and potential applications of SXM are discussed. Finally, we point out directions for future research in image processing related to SXM.
Simulation is used in robot navigation for testing control algorithms such as obstacle avoidance and path planning. Simulation is also being used for generating expectations to guide sensory processing for robots operating in the real world. In this paper, we present the tradeoffs in designing an environment model for outdoor environments. The models for outdoor environments are significantly different from indoor environment models. Outdoor environments are inherently unstructured due to changing lighting conditions, variation in the form of objects, and the dynamic nature of the environment. We present the design tradeoffs involved in building models of such environments, from the perspective of simulating passive sensors like vision. We point out that a powerful approach to outdoor navigation is to have a detailed model of the environment that can provide expectations both in terms of the spatial location of scene entities and the operators suitable for detecting these entities in an image.
One of the most important technologies needed across many traditional and emerging applications is the management of visual information. Every day we are bombarded with information presented in the form of images. So important are images in our world of information technology that we generate literally millions of images every day, and this number keeps escalating with advances in imaging, visualization, video, and computing technologies. Advances in video technology and its marriage with computing are resulting in the video-computing discipline. High Performance Computing and Communications (HPCC) is emerging as a key technology for asserting international leadership in industrial, medical, scientific, defense, and environmental areas. Improved computational methods and information management tools are critical in order to enhance the national competitive edge across broad sectors of the economy. The Federal HPCC initiative will address the development of technologies that are essential for building the infrastructure to strengthen our position in meeting the challenges posed by global developments in industry, political situations, and the environment. It would be impossible to cope with this explosion of image information unless the images were organized for rapid retrieval on demand. A similar situation occurred in the past for numeric and other structured data, and led to the creation of computerized database management systems. In these systems, large amounts of data are organized into fields, and important or key fields are used to index the databases, making search very efficient. These information management systems have changed several aspects of modern society. These systems, however, are limited by the fact that they work well only with numeric data and short alphanumeric strings. Since so much information is in non-alphanumeric form (such as images, video, and speech), researchers started exploring the design and implementation of image databases to deal with such information. But the creation of mere image repositories is of little value unless there are methods for fast retrieval of images based on their content, ideally with the efficiency that we find in today's databases. We should be able to search image databases with image-based queries, in addition to alphanumeric queries. The fundamental problem is that images, video, and other similar data differ from numeric data and text in format, and hence they require a totally different technique of organization, indexing, and query processing. We need to consider the issues in visual information management, rather than simply extending the existing database technology to deal with images. We must treat images as one of the central sources of information rather than as an appendix to the main database. A few researchers have addressed problems in image databases. Most of these efforts, however, focused either on only a small aspect of the problem, such as data structures or pictorial queries, or on a very narrow application, such as databases for pottery articles of a particular tribe. Other researchers have developed image processing shells which use several images. Clearly, visual information management systems encompass not only databases, but also aspects of image processing and image understanding, very sophisticated interfaces, knowledge-based systems, and compression and decompression of images.
Moreover, memory management and organization issues become much more serious than in the largest alphanumeric databases. By failing to address any of these topics, one may either address only theoretical issues or work in a microcosm that will, at best, be extremely narrow in its utility and extensibility. It is clear that the tremendous progress in processing speed and memory technology has made it not only possible but also attractive to design Visual Information Management Systems (VIMS) for many disparate applications. People already call the 90s the decade of imaging. On considering any of the Grand Challenge problems, such as weather forecasting, air pollution, the earth's biosphere, genome research, or the education network, it becomes clear that existing database technology must be extended in several new dimensions, from managing tertiary memory to representing an object at varying degrees of detail. Many of the current issues in databases, such as interconnecting heterogeneous databases, are also important to VIMS. Moreover, VIMS have several problems of their own that must be addressed to make progress in challenging industrial and medical applications. Considering the growing need and interest in the organization and retrieval of visual and other non-alphanumeric information, and the insufficient number of academic projects in this area, a workshop on visual information management systems was sponsored by the Robotics and Machine Intelligence and Database and Expert Systems Programs of the National Science Foundation. The aim of the workshop was to bring together active researchers in databases, object-oriented systems, image and signal processing, multimedia, and other related areas to discuss important issues in managing the large amount of visual information that will play a key role in designing information systems of the future. In addition to researchers in the above and related areas, a few researchers and practitioners interested in applying these systems were also invited.
Visual information systems require a new insertion process. Prior to storage within the database, the system must first identify the desired objects (shots and episodes), and then calculate a descriptive representation of these objects. This paper discusses the steps in the insertion process, and some of the tools we have developed to semi-automatically segment the data into domain objects which are meaningful to the user. Image processing routines are necessary to derive features of the video frames. Models are required to represent the desired domain, and similarity measures must compare the models to the derived features.
Scanning probe microscopy (SXM), which includes techniques such as scanning tunneling microscopy (STM) and scanning force microscopy (SFM), is becoming increasingly popular for analyzing surface structure at the sub-micron level. As the probe used for scanning is non-ideal, the image output by SXM is dependent on the shape and size of the probe. The use and success of SXM strongly depend on methods for ensuring the accuracy of the images produced by SXM. In this paper, we derive models of the effects of the probe shape geometry on the image produced by SXM. Methods are formulated for recovering the true surface from the imaged surface and for indicating where the surface reconstruction is exact and where it is uncertain. We formulate these methods both for images scanned in a `contact' mode and those scanned in a `non-contact' mode. It is shown that scanning in a non-contact mode by a non-ideal probe is equivalent to scanning in a non-contact mode by an ideal probe followed by scanning in a contact mode by the non-ideal probe. The methods developed in this paper can be used to recover a surface scanned by a scanning probe microscope, given the shape of the probe used for scanning, and for visualizing the scanning and recovery of surfaces by different probe shapes.
We present an efficient algorithm to compute the critical dimensions of aligned rectangular and trapezoidal wafer structures using images generated by a Fourier imaging system. We show that the Fourier images of aligned rectangular and trapezoidal structures are separable functions. This allows us to project them onto the x and y coordinates and simplifies the computation. We compute the critical dimensions of rectangular structures by estimating the distance between either peaks or zeros, and those of trapezoidal structures by estimating the distance between zeros. For each projected 1-dimensional signal, we apply a zero-crossing technique to find the peaks or zeros, and then compute the critical dimensions at sub-pixel resolution.
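A minimal version of the sub-pixel zero-crossing step might look as follows, locating crossings by linear interpolation between the two samples that straddle zero; the peak-finding and calibration details of the full algorithm are omitted:

```python
# Hedged sketch: sub-pixel zero-crossing localization on a projected 1-D signal.
import numpy as np

def zero_crossings(y, x=None):
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float) if x is None else np.asarray(x, float)
    i = np.where(np.sign(y[:-1]) * np.sign(y[1:]) < 0)[0]
    # Linear interpolation between the straddling samples -> sub-pixel position.
    return x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])

def critical_dimension(y, pixel_pitch):
    z = zero_crossings(y)
    return (z[1] - z[0]) * pixel_pitch   # distance between the first two zeros
```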
Simulations have traditionally been used as off-line tools, for examining process models and for experiments that would have been impossible, too dangerous, too expensive, or too time-consuming to perform with the physical systems. We propose a novel way of regarding simulations as part of both the development and the working phases of systems. In our approach, simulation is used within the processing and control loop of the system to provide sensor and state expectations. This minimizes the inverse sensory data analysis and model maintenance problems. We refer to this mode of operation as the verification mode, in contrast to the traditional discovery mode. This paper describes the integration of control, simulation, and planning within the mobile platform control and simulation interface program (MOSIM). MOSIM is a program which supports the combination of control and simulation of disparate platforms and environments. The main features of MOSIM are its sensor simulations and its provision for capturing real sensory data and registering the simulated data with it. In order to provide simulations and planning that are intertwined with the control of a physical system, temporal issues have to be considered. By limiting the focus of the system to small portions of complex models which are temporally relevant to the system's operation, the system is able to maintain its models and respond faster. For this we employ the context-based caching (CbC) mechanism within MOSIM. CbC is a novel knowledge management technique which maintains large knowledge bases by making the necessary information available at the right time.
A method for automating the analysis of images generated from the Fourier imaging (FI) process is described. Fourier imaging is used to observe the evolving topography of wafers as they are being etched in a reactive ion etcher. The focus of the research described in this paper is the segmentation and analysis of images generated from FI. A brief overview of the theory of FI is presented. Segmentation of a Fourier image is done in three steps: local maxima detection, Fourier component extent identification, and boundary tracking. Quantitative descriptions for each component are then generated from the segmented image. The algorithm presented demonstrates how subpixel resolution can be obtained automatically in image analysis to collect data that may be used to compute critical dimensions of the microtopography of the wafer substrate.
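The first two segmentation steps can be sketched with standard tools as follows (boundary tracking and the sub-pixel quantification are omitted; the neighborhood size and threshold are our assumptions):

```python
# Hedged sketch of local-maxima detection and component-extent identification.
import numpy as np
from scipy import ndimage

def segment_fourier_image(img, size=5, rel_thresh=0.5):
    img = np.asarray(img, float)
    # Step 1: local maxima above the global mean.
    peaks = (img == ndimage.maximum_filter(img, size=size)) & (img > img.mean())
    # Step 2: each Fourier component's extent as a connected region above a
    # threshold relative to the brightest peak.
    labels, n = ndimage.label(img > rel_thresh * img.max())
    return peaks, labels, n
```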
This paper presents a stereo algorithm to recursively compute a boundary-level structural description of a scene, from a sequence of stereo images. The majority of existing stereo algorithms deal with individual points as the basic primitive to match between two or more images. While this keeps the implementation simple, the output description, which is a depth/disparity map, is represented as a composition of individual points. This is often undesirable as no semblance of the underlying structure of the scene is explicitly represented. A stereo matching algorithm is presented, based on connected line segments as the basic match primitive, which yields a description composed primarily of boundaries of objects in the scene. A description of this nature is very useful for obstacle avoidance and path planning for mobile robots. The stereo matching algorithm is integrated into a dynamic stereo vision system to compute and incrementally refine such a structural description recursively, using belief functions. The stereo camera motion between two viewpoints, which is necessary to register the two views, is recovered as part of the stereo computations. The approach is illustrated with a real dynamic stereo sequence acquired from a mobile robot.
STM (Scanning Tunneling Microscopy) and its variants, collectively called SXM (not including electron microscopy), are being increasingly used in electronics research and industry to study and inspect semiconductor surfaces. While it is possible to obtain quite high depth resolutions, lateral spatial resolution is chiefly determined by the probe size and shape and is typically much coarser than the depth resolution. Thus, an SXM image is not a reflection of the true surface shape but rather a 'convolution' of the surface and probe shapes. This paper reviews the theoretical and experimental work done in reconstructing surface shape from SXM images. The authors present a computational model of the SXM imaging process that encompasses previous models and show that the imaging process ('convolution') is essentially a nonlinear operation that can be approximated mathematically by a morphological dilation of the surface by the probe shape. The authors address the problem of inverting this process to estimate the true surface shape. A general method is developed by which a surface can be reconstructed from a composition of SXM images produced by different scanning probes. A multi-resolution version of this composite method is then described using a set of multiscaled probes that can recursively and efficiently reconstruct the entire surface as though it had been scanned entirely by the smallest probe. Some other useful and interesting results of this computational model are presented. The authors conclude by discussing the theoretical and practical importance of the model and directions for future work.
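The dilation approximation and its standard inverse are easy to state with grey-scale morphology. In the sketch below (a toy height map and a symmetric probe, so probe reflection can be ignored), the recovered surface is the morphological closing: an upper bound on the truth that is exact wherever the probe actually touched:

```python
# Hedged sketch: SXM imaging as grey-scale dilation, recovery as erosion.
import numpy as np
from scipy import ndimage

surface = np.random.default_rng(1).random((64, 64)) * 10.0   # toy "true" height map
probe = np.array([[0., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 0.]])                              # toy probe height profile

# Imaging approximated as grey-scale dilation of the surface by the probe.
image = ndimage.grey_dilation(surface, structure=probe)
# Recovery by the matching erosion; the result is the morphological closing.
estimate = ndimage.grey_erosion(image, structure=probe)
# The estimate never dips below the true surface and equals it at contact points.
assert np.all(estimate >= surface - 1e-9)
```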
We address the problem of building a map of the environment utilizing sensory depth information obtained from multiple viewpoints. The desired representation of the environment is in the form of a finite-resolution three-dimensional grid of voxels. Each voxel within the grid is assigned a binary value corresponding to its occupancy state. We present an approach for multi-sensory depth information assimilation based on Dempster-Shafer theory for evidential reasoning. This approach provides a mechanism to explicitly model ignorance which is desirable when dealing with an unknown environment. A fundamental requirement for such an approach to be used is accurate knowledge of the camera motion between two viewpoints. We present a robust least median of squares (LMS) based algorithm to recover this motion which provides a self-calibration mechanism. We present results obtained from this approach on a laboratory stereo sequence.
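For a single voxel with frame of discernment {occupied, empty}, the Dempster-Shafer update mentioned above combines two mass functions as follows; the mass on Θ ('theta') is what explicitly models ignorance:

```python
# Hedged sketch: Dempster's rule of combination for one voxel's occupancy.
# Masses are dicts over {'occ', 'emp', 'theta'}; this is the evidential
# update only, not the paper's full assimilation pipeline.
def combine(m1, m2):
    conflict = m1['occ'] * m2['emp'] + m1['emp'] * m2['occ']
    k = 1.0 - conflict                       # normalization constant
    occ = (m1['occ'] * m2['occ'] + m1['occ'] * m2['theta']
           + m1['theta'] * m2['occ']) / k
    emp = (m1['emp'] * m2['emp'] + m1['emp'] * m2['theta']
           + m1['theta'] * m2['emp']) / k
    return {'occ': occ, 'emp': emp, 'theta': m1['theta'] * m2['theta'] / k}

print(combine({'occ': .6, 'emp': .1, 'theta': .3},
              {'occ': .5, 'emp': .2, 'theta': .3}))
```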
Typical stereo algorithms produce sparse depth maps. The depth points often lie on object boundaries, and thus obstacles cannot be distinguished from holes. Surface reconstruction algorithms work poorly with such sparse and irregularly spaced data points. We present an algorithm which uses intensity images and a sparse depth map to produce a depth map which can be used for navigation. Our approach is based on an intensity image segmentation algorithm followed by local surface fitting. The algorithm is fast and thus suitable for high-speed navigation. Results using laboratory images are presented.
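The local surface fitting step can be sketched as a least-squares plane fit per intensity segment, which then fills in dense depth for the whole segment; the plane model is our simplification of 'local surface fitting':

```python
# Hedged sketch: fit a plane z = a*x + b*y + c to the sparse depth points
# inside one intensity segment, then fill the segment's dense depth.
import numpy as np

def fill_segment_depth(xs, ys, zs, segment_mask):
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    (a, b, c), *_ = np.linalg.lstsq(A, np.asarray(zs, float), rcond=None)
    # Evaluate the plane over every pixel of the segment.
    yy, xx = np.nonzero(segment_mask)
    depth = np.full(segment_mask.shape, np.nan)
    depth[yy, xx] = a * xx + b * yy + c
    return depth
```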
An autonomous intelligent agent working in a real, unstructured, dynamic environment must have very close interactions among its perceptual, cognitive, and motor components. We believe that placing the environment model at the heart of such systems can facilitate this interaction significantly. In our approach, the environment model is responsible for interaction among the different components, providing temporal coherence, combining information from multiple sensors, and giving the system its purposive behavior. Information about the environment is acquired using multiple disparate sensors, from multiple viewpoints, and at multiple time instants. We believe that the combination of information from disparate sensors should be viewed as a problem of information assimilation rather than sensor integration. The focus in information assimilation is on the physical world being modeled; sensory information is just a means to that end. Sensor integration treats the goal implicitly, misplacing the focus on the processing of sensed information. Existing approaches to autonomous systems tend to follow exclusively reactive or exclusively deliberative operation. We present an approach that provides a balance between reaction and deliberation.
Developing techniques for interpreting the structure of volumetric data is useful in many applications. A key initial stage is volumetric segmentation, involving the processes of partitioning and identification. Here we present a parallel algorithm which, through the use of α-partitioning and volume filtering, segments volumetric image data such that the grey-level variation within each volume can be described by a regression model. Experimental results demonstrate the effectiveness of this algorithm on several real-world 3D images.
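A minimal homogeneity test in this spirit, written under our own assumptions: a sub-volume is accepted as one segment when a linear regression of grey level on voxel coordinates leaves a small residual, and is split further otherwise:

```python
# Hedged sketch: regression-model homogeneity test for a candidate sub-volume.
import numpy as np

def fits_regression(volume, tol=4.0):
    z, y, x = np.indices(volume.shape)
    A = np.column_stack([z.ravel(), y.ravel(), x.ravel(),
                         np.ones(volume.size)])
    coef, *_ = np.linalg.lstsq(A, volume.ravel().astype(float), rcond=None)
    residual = volume.ravel() - A @ coef
    return residual.std() < tol       # split the volume further if this fails
```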
There is growing interest in using the complex logarithmic mapping for depth determination in motion stereo applications. This has led to a need for a comprehensive error analysis. Rather than just giving an analytic description of the errors inherent in the approach, an attempt will be made to characterize the errors that occur when using the mapping with real images. Techniques to reduce the impact of these errors will also be discussed.
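For reference, the mapping under analysis resamples the image so that each row is a ring of exponentially growing radius and each column a ray from the center; the nearest-neighbor interpolation in this sketch is itself one of the error sources such an analysis must consider:

```python
# Hedged sketch of the complex logarithmic (log-polar) mapping.
import numpy as np

def log_polar(img, n_rho=64, n_theta=64):
    cy, cx = (np.asarray(img.shape[:2]) - 1) / 2.0
    rho = np.exp(np.linspace(0.0, np.log(min(cx, cy)), n_rho))
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    # Nearest-neighbor sampling (a deliberate simplification).
    ys = np.clip((cy + rho[:, None] * np.sin(theta)).round().astype(int),
                 0, img.shape[0] - 1)
    xs = np.clip((cx + rho[:, None] * np.cos(theta)).round().astype(int),
                 0, img.shape[1] - 1)
    return img[ys, xs]
```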
The automation of visual inspection in semiconductor wafer processing is a very challenging task. In this paper we address the automatic description and measurement of surface textures in semiconductor wafers. Texture plays a critical role in inspecting surfaces produced at the various stages of semiconductor device fabrication. We describe a novel scheme to characterize surface textures that arise in semiconductor wafer processing, with an emphasis on quantitative measures that allow for accurate characterization of surface texture. The fractal dimension is a quantitative measure of surface roughness, and we have developed an algorithm to measure it automatically. We also present an algorithm to compute the orientation field of a given texture, which can be used to characterize defects such as 'orange peel'. Furthermore, we have used the qualitative theory of differential equations to devise a symbol set for oriented textures in terms of singularities, and an algorithm has been devised to process an image of a defect and extract qualitative descriptions based on this theory. We present the results of applying our algorithms to representative defects that arise in semiconductor wafer processing.
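The orientation-field computation can be sketched with the standard gradient-based estimator below (angles are doubled before smoothing so opposite gradient directions reinforce rather than cancel); the paper's exact estimator and the fractal-dimension algorithm are not shown:

```python
# Hedged sketch: gradient-based texture orientation field.
import numpy as np
from scipy import ndimage

def orientation_field(img, sigma=3.0):
    gx = ndimage.sobel(img.astype(float), axis=1)
    gy = ndimage.sobel(img.astype(float), axis=0)
    # Double-angle representation, smoothed so opposing gradients reinforce.
    sin2 = ndimage.gaussian_filter(2 * gx * gy, sigma)
    cos2 = ndimage.gaussian_filter(gx ** 2 - gy ** 2, sigma)
    return 0.5 * np.arctan2(sin2, cos2)    # dominant orientation per pixel
```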