Large video databases, as well as many other applications involving video, such as multimedia, training, and `movie-on-demand' systems, require efficient steps to manipulate the enormous amount of data associated with full motion video. In this paper the incoming video is systematically and efficiently reduced via a frame selection procedure which takes advantage of the fact that the incoming video is encoded using one of several existing DCT-based standards. The procedure is performed in the frequency domain prior to video decoding. Further refinement in the frame selection step is achieved using a robust metric based upon the color histogram of the selected subset of decoded frames. The procedure is presented in detail and several examples are exhibited.
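The histogram-based refinement step can be illustrated with a minimal sketch. The abstract does not specify the metric, so the L1 distance between normalized joint RGB histograms, the bin count, the threshold, and the function names `color_histogram`, `histogram_distance`, and `select_key_frames` are all assumptions made for illustration only.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (0-255) into a normalized joint histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel).
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (assumed metric), in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(frames, threshold=0.5):
    """Keep frame 0, then each frame whose histogram differs from the
    last kept frame by more than `threshold` (a hypothetical cutoff)."""
    if not frames:
        return []
    kept = [0]
    reference = color_histogram(frames[0])
    for index in range(1, len(frames)):
        candidate = color_histogram(frames[index])
        if histogram_distance(reference, candidate) > threshold:
            kept.append(index)
            reference = candidate
    return kept


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
print(select_key_frames([red, red, blue, blue, red]))
```

Here each frame is a flat list of RGB tuples; in practice the input would be the subset of frames already selected in the frequency domain and then decoded.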
Visual information systems require a new insertion process. Prior to storage within the database, the system must first identify the desired objects (shots and episodes), and then calculate a descriptive representation of these objects. This paper discusses the steps in the insertion process, and some of the tools we have developed to semi-automatically segment the data into domain objects which are meaningful to the user. Image processing routines are necessary to derive features of the video frames. Models are required to represent the desired domain, and similarity measures must compare the models to the derived features.
With the progression of multimedia technology and the trend of visualization of man-machine interface, video data will become a kind of fundamental resource of data as general as text and graphics. When the data size is large, retrieving data will become a time-consuming process. Much research has been done to facilitate the retrieval of data, e.g., indexes in a database. Video data encounter the same problem. It is very difficult to generate video indexes automatically, because the index is application-dependent and only the user knows what indexes he actually needs. In this paper a mechanism for video data indexing is proposed which is based on the concept of objects and object motion with interactive annotation. This mechanism provides an efficient method to access video substance without browsing the whole video data. A motion representation for the track of a moving object is presented. The access methods of video queries are introduced. A prototype of the video indexing system is implemented.
Autosophy, an emerging new science, explains `self-assembling structures,' such as crystals or living trees, in mathematical terms. This research provides a new mathematical theory of `learning' and a new `information theory' which permits the growing of self-assembling data network in a computer memory similar to the growing of `data crystals' or `data trees' without data processing or programming. Autosophy databases are educated very much like a human child to organize their own internal data storage. Input patterns, such as written questions or images, are converted to points in a mathematical omni dimensional hyperspace. The input patterns are then associated with output patterns, such as written answers or images. Omni dimensional information storage will result in enormous data compression because each pattern fragment is only stored once. Pattern recognition in the text or image files is greatly simplified by the peculiar omni dimensional storage method. Video databases will absorb input images from a TV camera and associate them with textual information. The `black box' operations are totally self-aligning where the input data will determine their own hyperspace storage locations. Self-aligning autosophy databases may lead to a new generation of brain-like devices.
The DE-1 satellite has gathered over 500,000 images of the Earth's aurora. This data set provides a realistic testbed for developing algorithms for scientific image databases. Scientists studying the aurora currently need to browse through large numbers of images to find events suitable for further scientific studies. They select or reject an image based on a variety of visual cues, including shape, size, and intensity. This paper describes a system currently under development for selecting interesting events based on image content. We use boundaries from the images to outline the aurora, and then to extract features that relate to shape, size, and intensity. These features are then input into a supervised decision tree classifier. The system retrieves images of potential interest to the user. The user makes the final decision regarding the use of the images retrieved. The algorithm is applied to hundreds of DE-1 satellite images to find `quiet' versus `active' auroras, after being initially trained by the user. The system's advantage over neural networks is that the scientists may inspect the event selection process by studying the decision tree generated.
An attributed relational graph (ARG) is introduced into our NOAA satellite image database system. The node and the branch of an ARG denote a classified region and a spatial relationship between adjacent regions, respectively. Furthermore, a few attributes of a node/branch help to express numerical shape features of regions. Similarity retrieval thereby turns out to be equivalent to graph matching. The similarity retrieval process of the system is as follows: (1) select a visual example image as a query and generate its graph structure, (2) calculate an optimal graph matching cost between a query graph and an archived graph in the database, using the A* algorithm with heuristic information, (3) choose archived images in the ascending order of a corresponding matching cost.
In this paper, a method of image information retrieval is presented. The method employs a new similarity measure between graph representations of images. The measure is effective for drawing images that describe logical meaning by their structure. Most of the currently available image database systems offer retrieval functions called key word retrieval, where users specify key words such as titles, attributes, and categories of themes. But it is not easy for the users to select suitable key words according to the purpose of retrieval. So recently some retrieval functions called similarity retrieval have been proposed, where users specify key images by means of examples, sketches, and icons. We are developing a drawing image database system that stores plant diagrams. The system scans, recognizes, and stores plant diagrams. Then users can refer to any parts of the diagrams according to their needs. The system is used as a help to plant observation and control. To realize similarity retrieval for logically structured drawings like plant diagrams, we introduced a graph representation of drawings, which is suitable to deal with their logical structure. Then we defined a similarity measure between them. In this paper, effectiveness of the similarity measure and applicability to plant diagrams are discussed and some experimental results are shown.
We address some of the problems of accessing database images which do not contain any indexing information and investigate methods of automating search strategies which currently rely on human operators to match the target against a number of images in the database. Such problems might include the extraction of facial photographs from a library given a suspect or the registration of new trademarks whose uniqueness must be assured. The object of the retrieval mechanism is to narrow down the search space for final perusal by the human operator. We present a neural network based coding scheme to retrieve images from a database according to the degree of similarity with a target image. The code represents each image with respect to a set of feature archetypes learned by the neural network during a training phase. We introduce a novel neural network learning law which performs an extremely efficient implementation of principal component analysis and maximizes the amount of information conveyed by the code. We present results using a database of machine printed fonts and discuss how the image size, the database diversity, and code length affect the efficacy of the retrieval mechanism.
The general problem of object recognition is difficult and often requires a large amount of computing resources, even for locating an object within a single image. How, then, can it be possible to build a tool for indexing into a large database of, say, thousands of images, which works effectively in `interactive time' on affordable hardware? One important optimization is to take advantage of interaction with the user to find out what types of variation are expected in the database, and to rely on the user to discriminate between similar-looking objects. Another is to create appropriate data structures off-line to speed on-line searches. We are building a tool, called FINDIT, for locating the image of an object from within a large number of images of scenes which may contain the object. The user outlines an object in an image that he wants to find in the database, and specifies the constraints on the transformations of the object that are expected to occur. The program acts as a filter to quickly reduce the possible number of candidates to a number small enough to be perused by the user. FINDIT chooses an appropriate search algorithm depending on the selection of constraints by the user.
We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database; in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.
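To make the idea of a Hough-group operator concrete, the following is a minimal sketch of Hough voting that returns a ranked list of high scoring bins. It is not MVQL syntax, which the abstract does not show. The straight-line (theta, rho) parameter space, the one-vote-per-point-per-angle voting rule, and the name `hough_group` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def hough_group(points, n_theta=180, rho_step=1.0, top_k=3):
    """Vote in a (theta, rho) line space and return the top_k bins.

    Each point casts one vote per theta bin (assumed voting rule).
    Returns a list of ((theta_index, rho_index), votes), highest first.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes.most_common(top_k)


# Ten feature points on the horizontal line y = 5.
line_points = [(x, 5) for x in range(0, 100, 10)]
ranked = hough_group(line_points)
print(ranked[0])
```

With `n_theta=180`, theta index 90 corresponds to a normal direction of 90 degrees, so the horizontal line y = 5 collects all ten votes in bin (90, 5). Applying the same function to points pooled from several images would yield bins associated with different images, as the abstract describes for database-wide queries.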
Professionals in various fields such as medical imaging, biology, and civil engineering require rapid access to huge amounts of uncompressed pixmap image data. In order to fulfill these requirements, a parallel image server architecture is proposed, based on arrays of intelligent disk nodes, each disk node being composed of one processor and one disk. Pixmap image data is partitioned into rectangular extents, whose size and distribution among disk nodes minimize overall image access times. Disk node processors are responsible for maintaining both the data structure associated with their image file extents and an extent cache offering fast access to recently used data. Disk node processors may also be used for applying image processing operations to locally retrieved image parts. This contribution introduces the concept of an image oriented file system, where the file system is aware of image size, extent size, and extent distribution. Such an image oriented file system provides a natural way of combining parallel disk accesses and processing operations. The performance of the proposed multiprocessor-multidisk architecture is bounded either by communication throughput or by disk access speed. However, when disk accesses are combined with low-level local processing operations such as image size reduction (zooming), close to linear speedup factors can be obtained by increasing the number of intelligent disk nodes.
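The extent-based layout described above can be sketched as follows. The abstract does not give the actual distribution rule, so the round-robin assignment of extents to disk nodes, and the names `extent_to_node`, `extents_for_window`, and `plan_window_read`, are hypothetical and serve only to show how a window read decomposes into per-node requests.

```python
def extent_to_node(ex, ey, extents_per_row, num_nodes):
    """Assumed rule: linear extent index modulo the number of disk nodes."""
    return (ey * extents_per_row + ex) % num_nodes


def extents_for_window(x0, y0, width, height, extent_w, extent_h):
    """Return (ex, ey) indices of every extent overlapping a pixel window."""
    first_ex, last_ex = x0 // extent_w, (x0 + width - 1) // extent_w
    first_ey, last_ey = y0 // extent_h, (y0 + height - 1) // extent_h
    return [(ex, ey)
            for ey in range(first_ey, last_ey + 1)
            for ex in range(first_ex, last_ex + 1)]


def plan_window_read(x0, y0, width, height,
                     image_w, extent_w, extent_h, num_nodes):
    """Group the extents needed for a window by the disk node that holds them."""
    extents_per_row = -(-image_w // extent_w)  # ceiling division
    plan = {}
    for ex, ey in extents_for_window(x0, y0, width, height, extent_w, extent_h):
        node = extent_to_node(ex, ey, extents_per_row, num_nodes)
        plan.setdefault(node, []).append((ex, ey))
    return plan


# A 200 x 200 window in a 1024-pixel-wide image, 256 x 256 extents, 3 nodes.
print(plan_window_read(200, 200, 200, 200, 1024, 256, 256, 3))
```

Each key of the returned plan is a disk node that can fetch, and optionally process, its extents in parallel with the others.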
Advanced visual information retrieval systems supporting both video and images need to have flexible system design so that their system configurations can easily be enhanced. It is therefore desirable to separate the features of a central system into three parts: storage servers, communication servers, and a back-end network that combines these. In this architecture, unscheduled arrivals of data blocks at a back-end network cause two problems: unacceptable fluctuation of video frames and overly long delays of image transfer. To solve these problems, we have designed a new multimedia integrated switching system (MISS) that uses a fully connected crossbar switch to combine servers. MISS treats a time interval of a few hundred microseconds (called a `time-slot') as the basic unit of data block transfer, and allocates appropriate time-slots to all transfer requests in order to simultaneously meet the requirements for each kind of visual information transfer. According to simulation results and estimates based on queuing theory, MISS greatly reduces video frame fluctuation and halves the average image transfer delay. These effects have been confirmed in an experimental visual communication system built around MISS. This system supports JPEG compressed video and images, and six terminals can simultaneously retrieve visual information through an FDDI network.
In this paper, a top-down data placement methodology for a large interactive multimedia information system (MMIS) on a single spindle multi-disk environment such as a Jukebox is presented. The objective of this work is to minimize average disk seek time as well as the number of platter switches for the Jukebox. A large data placement problem can be divided into a number of small data placement problems by weighted graph decomposition. The Kernighan-Lin partitioning algorithm is recursively applied for this purpose. Once the graph is fully partitioned, the objects in the same subgraph are assigned to the same disk. The data placement within a disk is divided into two stages, global data placement and detailed data placement. The expected access patterns of global data placement are modeled as a time-homogeneous ergodic Markov Chain, from which the stationary probability for each node of the browsing graph can be found. Based on these probabilities, we define an expected access cost. Then, the problem of global data placement is posed as an optimization problem, and various clustering and storage layout algorithms are proposed.
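The Markov chain step can be illustrated with a short sketch that finds the stationary probabilities of a browsing graph by power iteration and then evaluates one possible cost. The abstract does not define the expected access cost, so the cost used here (stationary probability times transition probability times the distance between storage positions) is an assumption, as are the names `stationary_distribution` and `expected_seek_cost`.

```python
def stationary_distribution(P, iterations=1000, tol=1e-12):
    """Power iteration for pi = pi * P, where P is a row-stochastic matrix.

    Converges for an ergodic chain, which is what the abstract assumes.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_seek_cost(P, positions):
    """Assumed cost: expected distance travelled per browsing transition."""
    pi = stationary_distribution(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * abs(positions[i] - positions[j])
               for i in range(n) for j in range(n))


# Two objects; the user tends to stay on object 1 once there.
P = [[0.5, 0.5],
     [0.25, 0.75]]
print(stationary_distribution(P))
print(expected_seek_cost(P, positions=[0, 10]))
```

A layout algorithm would then search over assignments of objects to `positions` to reduce this cost; the search itself is not sketched here.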
In past decades, many storage schemes for large images on parallel computers have been proposed to provide simultaneous access to various subsets of the pixels. The existing storage schemes have the following limitations: (1) The address generation mechanism is dependent on the size of the image to be processed. (2) Many schemes have limitations on the machine size and image size (e.g., for N × N images, N must be an even power of 2). (3) As more and more frequently used data patterns have been recognized, most schemes can only provide parallel access to a limited range of data patterns. (4) The data alignment (connecting each memory module to a proper processor) may require special hardware. In this study, we investigate the combination of several storage schemes. They mainly employ exclusive-or operations for address generation which can be completed in constant time. The address generation mechanism is independent of the image size so that different sized images can be processed efficiently on a fixed-size machine. The system uses N memory modules where N is any (even or odd) power of two. With schemes combined together, this system covers more data patterns than any single scheme yet proposed.
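As a small illustration of exclusive-or address generation, the sketch below uses a simple textbook XOR skewing rule, module = (i XOR j) mod N, and checks which access patterns hit N distinct modules. This is not the combined scheme of the paper, whose details the abstract does not give; the names `module_index` and `is_conflict_free` are hypothetical.

```python
def module_index(i, j, n_modules):
    """Memory module for pixel (i, j) under a simple XOR skewing rule.

    n_modules must be a power of two; the rule does not depend on image size.
    """
    if n_modules <= 0 or n_modules & (n_modules - 1):
        raise ValueError("n_modules must be a power of two")
    return (i ^ j) & (n_modules - 1)


def is_conflict_free(pixels, n_modules):
    """True if every pixel in the pattern maps to a different module."""
    modules = [module_index(i, j, n_modules) for i, j in pixels]
    return len(set(modules)) == len(modules)


N = 8
row_segment = [(5, j) for j in range(16, 24)]   # 8 aligned pixels of row 5
column_segment = [(i, 3) for i in range(8)]     # 8 aligned pixels of column 3
main_diagonal = [(k, k) for k in range(8)]      # 8 pixels of the diagonal

print(is_conflict_free(row_segment, N))
print(is_conflict_free(column_segment, N))
print(is_conflict_free(main_diagonal, N))
```

Under this single rule, aligned row and column segments are conflict free while the main diagonal maps entirely to module 0, which illustrates why one scheme alone covers only a limited range of data patterns.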
The realization of an image/video database requires a specific design for both database structures and mass storage management. This issue has been addressed in the project of the digital image/video database system that has been designed at IBM SEMEA Scientific & Technical Solution Center. Proper database structures have been defined to catalog image/video coding techniques with the related parameters, and the description of image/video contents. User workstations and servers are distributed along a local area network. Image/video files are not managed directly by the DBMS server. Because of their large size, they are stored outside the database on network devices. The database contains the pointers to the image/video files and the description of the storage devices. The system can use different kinds of storage media, organized in a hierarchical structure. Three levels of functions are available to manage the storage resources. The functions of the lower level provide media management. They allow devices to be cataloged and device status and device network location to be modified. The medium level manages image/video files on a physical basis. It manages file migration between high capacity media and low access time media. The functions of the upper level work on image/video files on a logical basis, as they archive, move and copy image/video data selected by user defined queries. These functions are used to support the implementation of a storage management strategy. The database information about characteristics of both storage devices and coding techniques is used by the third level functions to fit delivery/visualization requirements and to reduce archiving costs.
Content-based retrieval is founded on neural networks. This technology allows automatic filing of images and a wide range of possible queries of the resulting database. This is in contrast to methods such as entering SQL keys manually for each image as it is filed and later correctly re-entering those keys to retrieve the same image. An SQL-based approach does not take into account information that is hard to describe with text, such as sounds and images. Neural networks can be trained to translate `noisy' or chaotic image data into simpler, more reliable feature sets. By converting the images into the level of abstraction necessary for symbolic processing, standard database indexing methods can then be applied, or used in layers of associative database neural networks directly.
This paper describes various image database archiving solutions implemented by high volume semiconductor manufacturers. The solutions address the primary need of the manufacturing environment to have pictorial information and associated data instantly available throughout the manufacturing process. The problems explored in this paper include the interfacing of various devices that produce data, management of the large space requirements that image storage requires, and handling the mixing of different image formats such as NTSC and PAL. The discussion covers the collection and archiving of information from a variety of input devices such as analog video cameras, digital cameras, and scanning electron microscopes (SEM). The discussion also includes the problems and solutions of establishing a common format to allow one system to collect, merge, store, and recover a variety of pictorial information and data in one video medium. The solutions include a mix of analog and digital technology. Arguments are presented in favor of archiving images in their native format, analog or digital, and the advantages and disadvantages of each of the different formats.
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color, texture, shape, and sketch query that we use, show example query results, and discuss future directions.
Carboniferous Foraminifers are a specific type of microfossil which are manifest in plane sections of rock and are used by geologists for dating rock samples. The images contain a high degree of visual noise and currently must be interpreted by human experts. We are studying the classification problem in the context of intelligent image databases. Here we present a technique for automatic identification of microfossil structures and for classification of the structures according to which type of 3-D section they represent. This is achieved by using: (1) A specialized filter to detect local curves in the gray level image data; and (2) Hough transform processing of the resulting feature point vectors. An interesting aspect of our approach is that the processing of the features is not embedded in a program but is instead specified using a visual query language. This allows us to experiment quickly with different types of grouping criteria. The detection performance of our system is comparable with that of a trained geologist. We store the information obtained in a database together with the raw image data. The system can then present the user with only those images which contain structures of interest.
One of the most important technologies needed across many traditional and emerging applications is the management of visual information. Every day we are bombarded with information presented in the form of images. So important are images in our world of information technology, that we generate literally millions of images every day, and this number keeps escalating with advances in imaging, visualization, video, and computing technologies. Advances in video technology and its marriage with computing are resulting in the video-computing discipline. High Performance Computing and Communications (HPCC) is emerging as a key technology for asserting international leadership in industrial, medical, scientific, defense, and environmental areas. Improved computational methods and information management tools are critical in order to enhance the national competitive edge across broad sectors of the economy. The Federal HPCC initiative will address the development of technologies which are essential for building the infrastructure to strengthen our position in meeting the challenges posed by the global developments in industry, political situations, and the environment areas. It would be impossible to cope with this explosion of image information, unless the images were organized for rapid retrieval on demand. A similar situation occurred in the past for numeric and other structured data, and led to the creation of computerized database management systems. In these systems, large amounts of data are organized into fields and important or key fields are used to index the databases making search very efficient. These information management systems have changed several aspects of the modern society. These systems, however, are limited by the fact that they work well only with numeric data and short alpha-numeric strings.
Since so much information is in non-alphanumeric form (such as images, video, speech), to deal with such information, researchers started exploring the design and implementation of image databases. But creation of mere image repositories is of little value unless there are methods for fast retrieval of images based on their content, ideally with an efficiency that we find in today's databases. We should be able to search image databases with image-based queries, in addition to alphanumeric queries. The fundamental problem is that images, video, and other similar data differ from numeric data and text in format, and hence they require a totally different technique of organization, indexing, and query processing. We need to consider the issues in visual information management, rather than simply extending the existing database technology to deal with images. We must treat images as one of the central sources of information rather than as an appendix to the main database. A few researchers have addressed problems in image databases. Most of these efforts in image databases, however, focussed either on only a small aspect of the problem, such as data structures or pictorial queries, or on a very narrow application, such as databases for pottery articles of a particular tribe. Other researchers have developed image processing shells which use several images. Clearly, visual information management systems encompass not only databases, but aspects of image processing and image understanding, very sophisticated interfaces, knowledge-based systems, compression and decompression of images. Moreover, memory management and organization issues start becoming much more serious than in the largest alphanumeric databases. In failing to address any of these topics, one may either address only theoretical issues, or may work in a microcosm that will, at best, be extremely narrow in its utility and extensibility.
It is clear that the tremendous progress in processing speed and memory technology has made it not only possible, but also attractive to design Visual Information Management Systems (VIMS) for many disparate applications. People already call the 90s the decade of imaging. On considering any of the Grand Challenge problems, such as weather forecasting, air pollution, the earth's biosphere, genome research, or the education network, it becomes clear that the existing database technology must be extended in several new dimensions, from managing tertiary memory to representing an object at varying degrees of detail. Many of the current issues in databases, such as interconnecting heterogeneous databases, are also important to VIMS. Moreover, VIMS have several of their own problems that must be addressed for making progress in challenging industrial and medical applications. Considering the growing need and interest in the organization and retrieval of visual and other non-alphanumeric information, and the insufficient number of academic projects in this area, a workshop on visual information management systems was sponsored by the Robotics and Machine Intelligence, and Database and Expert Systems Programs of the National Science Foundation. The aim of the workshop was to bring together active researchers in databases, object-oriented systems, image and signal processing, multi-media, and other related areas to discuss important issues in managing the large amount of visual information that will play a key role in designing information systems of the future. In addition to the researchers in the above and related areas, a few researchers and practitioners interested in applying these systems were also invited.
This paper describes a technique for interactive, computer-assisted boundary tracking from two-dimensional images. Once the boundaries of an object are known, it is possible to extract any number of features to be used for subsequent search or retrieval based on image content. The technique combines manual inputs from the user with machine inputs, generated from edge detection algorithms, for assisting in the extraction of the boundaries. This allows for the quick extraction of boundaries by combining the capabilities of the user and the computer. The user is adept at quickly locating an object of interest and at drawing a very rough outline of the object. The computer is adept at quickly making a large number of calculations that refine the rough outline generated by the user. The performance of the technique was tested using both computer simulations and real images. The performance of the technique degrades gracefully in the presence of noise.
The purpose of our work is to outline objects on images in an interactive environment. We use an improved method based on energy minimizing active contours or `snakes.' Kass et al. proposed a variational technique; Amini used dynamic programming; and Williams and Shah introduced a fast, greedy algorithm. We combine the advantages of the latter two methods in a two-stage algorithm. The first stage is a greedy procedure that provides fast initial convergence. It is enhanced with a cost term that extends over a large number of points to avoid oscillations. The second stage, when accuracy becomes important, uses dynamic programming. This step is accelerated by the use of alternating search neighborhoods and by dropping stable points from the iterations. We have also added several features for user interaction. First, the user can define points of high confidence. Mathematically, this results in an extra cost term and, in that way, the robustness in difficult areas (e.g., noisy edges, sharp corners) is improved. We also give the user the possibility of incremental contour tracking, thus providing feedback on the refinement process. The algorithm has been tested on numerous photographic clip art images and extensive tests on medical images are in progress.