Managers of photographic collections in libraries and archives are exploring digital image database systems, but they usually have few sources of technical guidance and analysis available. Correctly digitizing photographs places high demands on the imaging system and on the human operators involved. Pictures are dense with information and require high-quality scanning procedures. To advise libraries and archives seeking to digitize photographic collections, it is necessary to understand thoroughly the nature of the various originals and the purposes of digitization; only with this understanding is it possible to choose an adequate image quality for the digitization process. The higher the quality, the more expertise, time, and cost are likely to be involved in generating and delivering the images. Despite all the possibilities for endless copying, distribution, and manipulation of digital images, the image quality choices made when the files are first created have the same 'finality' that they have in conventional photography: they profoundly affect project cost, the value of the final product to researchers, and the usefulness of the images as preservation surrogates. Image quality requirements therefore have to be established carefully before a digitization project starts.
To obtain an extra-high-quality electronic imaging system, important new physical factors and characteristics are considered and partly tested, in addition to traditional measures such as frequency response and quantization error. We discuss the problem systematically on the basis of the Uniform Lightness-Chromaticness Scale system and a notion of picture quality derived from psychophysics.
We have been studying a medical tele-education support system built from an individual tutoring system called CALAT and a super high definition (SHD) image processing system called SuperFM-III. We are now running a trial in which SuperFM-III serves as an SHD image viewer on the CALAT client side, and have created courseware based on pathological images. In this paper, we present the concept and implementation of this system.
This paper describes VIPS (VASARI Image Processing System), an image processing system developed by the authors in the course of the EU-funded projects VASARI (1989-1992) and MARC (1992-1995). VIPS implements a fully demand-driven dataflow image IO (input-output) system. Evaluation of library functions is delayed for as long as possible. When evaluation does occur, all delayed operations evaluate together in a pipeline, requiring no space for storing intermediate images and no unnecessary disc IO. If more than one CPU is available, VIPS operations automatically evaluate in parallel, giving an approximately linear speed-up. The evaluation system can be controlled by the application programmer. We have implemented a user interface for the VIPS library that uses expose events in an X window, rather than disc output, to drive evaluation. This makes it possible, for example, for the user to rotate an 800 MByte image by 12 degrees and immediately scroll around the result.
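The demand-driven idea described above can be illustrated with a minimal sketch (this is not the VIPS API; the class and method names are invented for illustration): operations only record what should happen, and pixels are computed row by row when an output sink finally pulls them, so fused operations never materialize an intermediate image.

```python
# Minimal sketch of demand-driven (lazy) image evaluation, in the spirit
# of the pipeline described above. Names are illustrative, not VIPS's.

class LazyImage:
    def __init__(self, compute_row):
        # compute_row: row index -> list of pixel values for that row
        self.compute_row = compute_row

    def map(self, fn):
        # Delay the operation: no pixels are touched here, we only
        # compose functions, which fuses this op with earlier ones.
        return LazyImage(lambda y: [fn(v) for v in self.compute_row(y)])

    def write(self, height):
        # Evaluation happens only now, one row at a time, so no space
        # is ever needed for whole intermediate images.
        return [self.compute_row(y) for y in range(height)]

source = LazyImage(lambda y: [y * 10 + x for x in range(4)])
result = (source.map(lambda v: v + 1)   # delayed
                .map(lambda v: v * 2))  # delayed, fused with the previous op
print(result.write(2))  # [[2, 4, 6, 8], [22, 24, 26, 28]]
```

A real system would pull rectangular regions rather than rows and could hand different regions to different CPUs, which is how the approximately linear parallel speed-up arises.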
We have developed a 60 frame/s 2K × 2K progressive-scan color camera system. Two new key technologies have been applied in the design. One is high-data-rate imager operation, in which each of four charge modulation device (CMD) chips is driven at 167 Mpixel/s in progressive-scan mode; one chip consists of 1920 (H) × 1035 (V) pixels. The other is the four-imager pickup method, in which two CMD imagers are used for green and the other two for red and blue. Spatial offset imaging is applied in the vertical direction to the two green imagers, so that the equivalent number of vertical lines reaches 2070, twice that of one CMD. These technologies enable the construction of a very-high-resolution camera with a data rate of 334 Mpixel/s and a vertical limiting resolution on a color monitor of more than 1500 lines.
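The vertical spatial-offset principle can be sketched as follows (a simplified illustration, not the camera's actual signal processing): two imagers whose sampling grids are shifted by half a line pitch contribute alternating scanlines, doubling the effective vertical sampling.

```python
# Sketch of vertical spatial-offset imaging: two imagers of N lines each,
# offset vertically by half a line pitch, interleave into 2N lines.

def interleave_offset(green_a, green_b):
    """green_a and green_b are lists of scanlines from two imagers whose
    sampling grids are offset by half a line vertically; interleaving
    them doubles the effective number of vertical lines."""
    out = []
    for a_line, b_line in zip(green_a, green_b):
        out.append(a_line)  # line sampled by imager A
        out.append(b_line)  # line sampled half a pitch below by imager B
    return out

# Two 3-line imagers -> a 6-line combined green channel.
a = [[1, 1], [3, 3], [5, 5]]
b = [[2, 2], [4, 4], [6, 6]]
print(interleave_offset(a, b))
# [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]
```

This is how two 1035-line CMDs yield the equivalent of 2070 vertical lines in the green channel.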
The progress of computer graphics and display technology gives us access to ever more advanced visual images. However, we now feel the limits of color reproduction under the present 24 bits/pixel quantization (8 bits each for R, G, and B) when pursuing higher image quality. We are therefore developing an 'extra high quality imaging system' with 36 bits/pixel quantization (12 bits each for R, G, and B). The system comprises an MO disk drive, a controlling computer, a frame buffer, and two 21-inch displays. The 2048 × 2048 pixel (36 bits/pixel) image data are read from the MO disk drive and sent to the frame buffer. A purpose-built 16 MByte frame buffer outputs the 36 bits/pixel video signal at a 200 MHz clock rate. The two displays, which use shadow-mask CRTs, are driven at a 78.7 kHz horizontal frequency. The system outputs 36 bits/pixel and 24 bits/pixel video signals concurrently, making it possible to compare the image quality of the two. Many characteristics and physical factors, including noise, that cause no serious problem in conventional 24 bits/pixel systems have a much more serious effect at 36 bits/pixel. We have so far achieved a color depth of up to 33 bits/pixel.
In this paper, we propose a novel image sensor that compresses the image signal on the sensor plane. Because compression exploits the parallel nature of image signals directly on the sensor plane, the amount of signal read out from the sensor can be significantly reduced. The proposed sensor can therefore be applied to high-pixel-rate cameras and processing systems that require very high speed or very high resolution real-time imaging, where the very high bandwidth is the fundamental limit to feasibility. Conditional replenishment is employed as the compression algorithm: in each pixel, the current pixel value is compared with its value in the last replenished frame, and the value and address of the pixel are extracted and coded if the magnitude of the difference exceeds a threshold. Analog circuits have been designed both for per-pixel processing and for controlling the overall data rate. A first prototype VLSI chip has been fabricated, and some experimental results obtained with it are shown in this paper.
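The conditional replenishment rule can be sketched in a few lines (a behavioral model only; the actual chip implements this with per-pixel analog comparators, and the function and parameter names here are illustrative):

```python
# Behavioral sketch of conditional replenishment: a pixel is read out and
# coded only when it differs from its value in the last replenished frame
# by more than a threshold; the stored value is then replenished.

def conditional_replenishment(frame, reference, threshold):
    """Return (address, value) events for changed pixels, updating the
    reference frame in place as the sensor's per-pixel storage would."""
    events = []
    for addr, value in enumerate(frame):
        if abs(value - reference[addr]) > threshold:
            events.append((addr, value))  # code value + address
            reference[addr] = value       # replenish stored value
    return events

ref = [10, 10, 10, 10]
print(conditional_replenishment([10, 14, 10, 11], ref, threshold=2))
# [(1, 14)]  -- only pixel 1 exceeds the threshold and is read out
```

For largely static scenes the event list is far shorter than a full frame, which is the source of the bandwidth reduction; raising the threshold lowers the data rate at the cost of fidelity.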
Current proposals for advanced television in the United States are based on the premise that temporal and resolution layering is inefficient. These proposals therefore provide only a menu of individual formats, each of which encodes and decodes a single resolution and frame rate. In addition, some have suggested that interlace is required, claiming that thousand-line images at high frame rates cannot be compressed within the available 18 Mbit/s. This paper discusses an approach to image compression that demonstrably achieves thousand-line compression at high frame rates with high quality, and that provides both temporal and spatial scalability at this resolution within the available 18 Mbit/s. The technique efficiently encodes 2-megapixel images at 72 frames per second, achieving over twice the compression ratio proposed by ACATS for advanced television. It is also more robust than the current unlayered ACATS format proposal, since all of the bits may be allocated to the lower-resolution base layer when stressful image material is encountered. This proposal thus offers a number of key technical attributes giving substantial improvement over the ACATS proposal: the replacement of numerous resolutions and frame rates with a single layered resolution and frame rate; no need for interlace to achieve a thousand lines (two million pixels) at high frame rates; and compatibility with computer displays through the use of 72 frames per second.
In the past decade, various museums and galleries around Europe have been developing digital imaging as a tool for archiving and analysis. Accurate digital images can replace conventional film archives, which are neither stable nor accurate yet remain the standard record of art. Digital archives open up new research possibilities and become resources for CD-ROM production, damage analysis, research, and publishing. In the VASARI project, new scanners were devised to produce colorimetric images directly from paintings using multispectral (six-band) imaging. These can produce images in CIE Lab format with resolutions over 10k × 10k, and have been installed in London, England; Munich, Germany; and Florence, Italy. They are based around a large stepper-motor-controlled scanner that moves a high-resolution CCD camera to obtain patches of 3k × 2k pels, which are mosaicked together. The scanners can also be used for infra-red imaging with a different camera. The MARC project produced a portable scan-back RGB camera capable of similar output, together with techniques for calibrated printing. The Narcisse project produced a fast high-resolution scanner for X-radiographs and film, and many projects have worked on networking the growing number of image databases. This paper presents a survey of some key European projects, particularly those funded by the European Union, involved in high-resolution and colorimetric imaging of art. The design of the new scanners and examples of the applications of these images are presented.
The distortion caused by image compression is typically measured pixelwise, for example by mean square error. Such measures, however, do not capture global artifacts like blockiness or blurring. In the present paper, an attempt is made to model the distortion by a blockwise measure. A three-parameter distortion function is proposed that measures how well the average intensity, the contrast, and the spatial structure are preserved in the reconstructed image. The distortion in the spatial structure is described by the response of an edge detection operation. The blockwise measure operates on a sliding block of 3 × 3 pixels so that it can detect distortion beyond block boundaries.
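The three-parameter idea can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the combining weights and the central-difference edge operator are assumptions, but the structure (per-block comparison of average intensity, contrast, and an edge response, over overlapping 3 × 3 blocks) follows the description above.

```python
import math

# Sketch of a three-parameter blockwise distortion measure: per sliding
# 3x3 block, compare average intensity, contrast (standard deviation),
# and spatial structure (a simple gradient-based edge response).
# Weights and the edge operator are illustrative assumptions.

def block_stats(img, x, y):
    block = [img[y + dy][x + dx] for dy in range(3) for dx in range(3)]
    mean = sum(block) / 9.0
    contrast = math.sqrt(sum((v - mean) ** 2 for v in block) / 9.0)
    # crude edge response: central differences at the block centre
    edge = (abs(img[y + 1][x + 2] - img[y + 1][x])
            + abs(img[y + 2][x + 1] - img[y][x + 1]))
    return mean, contrast, edge

def blockwise_distortion(orig, recon, w=(1.0, 1.0, 1.0)):
    h, wid = len(orig), len(orig[0])
    total, count = 0.0, 0
    for y in range(h - 2):
        for x in range(wid - 2):  # sliding blocks overlap, so distortion
            mo, co, eo = block_stats(orig, x, y)   # beyond block
            mr, cr, er = block_stats(recon, x, y)  # boundaries is seen
            total += w[0]*abs(mo - mr) + w[1]*abs(co - cr) + w[2]*abs(eo - er)
            count += 1
    return total / count

identical = [[i + j for j in range(4)] for i in range(4)]
print(blockwise_distortion(identical, identical))  # 0.0 for a perfect reconstruction
```

A perfect reconstruction scores zero, while a change confined to one pixel raises the measure in every overlapping block that contains it.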
In the present work, we study image quality evaluation from an empirical point of view, asking (1) what kinds of distortion existing compression algorithms generate; (2) how annoying a human observer finds certain types of distortion; and (3) what the possibilities are for analyzing the distortion mathematically. A set of test images was compressed by several different algorithms, and the visual quality of the compressed images was rated by a group of human observers. The distortion caused by compression can be classified into blockiness, blurring, jaggedness, ghost figures, and quantization errors. Blockiness is the most annoying type of distortion; for the other types, the amount matters more than the type. Blurring and quantization errors can be detected by local analysis of the image. Blockiness, jaggedness, and ghost figures are harder to detect, since they consist of structural changes in the image.
In image coding applications, quantitative quality metrics that approximate the perceived quality of an image can be used for evaluating the performance of coders, designing new coders, and optimizing existing ones. In this paper, we consider images of high quality for which most errors are in the threshold region of perception and perceptible errors are sparsely distributed. Two different methodologies are used. First, we use an objective picture quality scale (PQS) that combines partial measures, denoted distortion factors, of random as well as structured perceived distortions due to coding. We also consider an alternative approach, applicable at high quality, based on a multi-channel vision model: the visible differences predictor (VDP) proposed by Scott Daly. The VDP produces a probability detection map identifying regions in which errors are sub-threshold, threshold, and supra-threshold. For the PQS, since distortion factors due to structured errors and to errors in the vicinity of high-contrast image transitions are most important at high quality, these two factors are analyzed to compute spatial distortion maps of their contributions. This paper compares the spatial distortion maps produced by both methods for high-quality still images with experimental observations. Global quality values, obtained by integrating the factor images from the PQS, yield a correlation of 0.9 with mean opinion score values for JPEG, subband, and wavelet coded images.
Moving images are the main media in multimedia communication, and an objective quality evaluation method for color moving images is strongly desired. In this research, as a preparatory step toward such a method, a subjective evaluation experiment using the semantic differential (SD) method was conducted to clarify the factors of subjective quality evaluation for moving images encoded with motion compensation plus DCT (MC + DCT). These factors are the basic components from which picture quality can be evaluated objectively. The differences in evaluation factors were then analyzed by comparing the results of subjective evaluation experiments on intra-frame and inter-frame coding. Finally, a subjective evaluation experiment was conducted using the EBU method, and the relation between the subjective evaluation factors derived from the SD method and the quality degradation scale (MOS) was investigated.
Applications of super-high-definition (SHD) television are likely to require fast coding and decoding of large, full-color multimedia images, typically with stills and videos embedded in graphical surrounds. Binary tree predictive coding (BTPC) efficiently codes both photographs and graphics, and is therefore suitable as a general-purpose multimedia compressor. It codes successive 'bands' of error difference signals which progressively refine image predictions. This paper reports the design of a new predictor for BTPC which improves performance on graphics without significantly affecting the coding of photographs. At each point, the predictor uses the local luminance surface shape to choose a combination of surrounding points to form the prediction; shapes suggestive of graphics are handled differently from those suggestive of photographs. The new BTPC approach is compared with the JPEG standard and with a state-of-the-art embedded zerotree wavelet coder. Though it falls between the standard and the state of the art for compressing pure photographic imagery, BTPC is superior to both for multimedia image coding. It is also much faster than the embedded coder.
Applications that demand very high resolution imagery are well suited to multi-resolution, progressive image compression. Applying embedded zerotree coding to a wavelet decomposition is currently popular. Though state-of-the-art, this approach is prone to visual artifacts caused by an uneven introduction of detail. Maintaining perceptually acceptable results at every bit rate in the lossy-to-lossless progression requires a reordering of the wavelet information. This paper discusses some modifications and generalizations to zerotree coding; the approach is to rearrange image information according to how humans perceive quality.
Existing image coding systems can be categorized into lossy and lossless methods. Nearly lossless compression, which attains both higher compression performance than lossless coding and higher image quality than lossy coding, is required in applications such as medicine and printing. This paper proposes a nearly lossless coding method in which distortion is restricted to a predetermined range. RGB input signals are transformed into one luminance and two chrominance components, and the magnitudes of these components are adjusted in the encoder so that the reconstructed RGB signals have minimum distortion after inverse color transformation in the decoder. The prediction residual signals of these level-adjusted components are entropy coded. As a result, the reconstructed RGB signals have coding distortion of only plus or minus one, and this restricted distortion never accumulates even when encoder/decoder pairs are connected in tandem, because all encoders and decoders operate in synchronization.
The encoding of images at high quality is important in a number of applications. We have developed an approach to coding that produces no visible degradation, which we denote perceptually transparent. Such a technique achieves modest compression, but still significantly more than error-free coding. Maintaining full image quality is not important in the early stages of a progressive scheme, when only a reduced-resolution preview is needed. In this paper, we describe a new method for the progressive transmission of high-quality still images that efficiently uses the lower-resolution images in the encoding process. Analysis-based interpolation is used to estimate the higher-resolution image, reducing the incremental information transmitted at each step. This methodology for high-quality image compression also aims to obtain a compressed image of higher perceived quality than the original.
Coding techniques such as JPEG and MPEG result in visibly degraded images at high compression. The coding artifacts are strongly correlated with image features and produce objectionable structured errors. Among structured errors, the reduction of end-of-block effects in JPEG encoding has received recent attention, taking advantage of the known location of block boundaries. However, end-of-block errors are not apparent in subband or wavelet coded images, and even for JPEG coding they are not perceptible at moderate compression, while other artifacts remain quite apparent and disturbing. In previous work, we have shown that image quality can in general be improved by analysis-based processing and interpolation. In this paper, we present a new approach that addresses the reduction of end-of-block errors as well as other visible artifacts that persist at high image quality. We demonstrate that a substantial improvement in image quality is possible by analysis-based post-processing.
A motion compensation scheme suitable for super high definition (SHD) video, consisting of hierarchical motion estimation and background motion vector processing, is proposed. The scheme can compensate motion from a few pixels up to well over one hundred, and is designed to cope with the very large motion (in terms of number of pixels) present in SHD video. The background motion processing technique identifies more than one background motion vector and saves a great deal of motion vector overhead.
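Hierarchical (coarse-to-fine) motion estimation can be sketched as follows. This is a 1-D toy illustration, not the paper's algorithm: a wide but cheap search is done on a downsampled signal, and the result, scaled up, seeds a small refinement search at full resolution, so very large displacements are found without an exhaustive full-resolution search.

```python
# Toy sketch of hierarchical (coarse-to-fine) motion estimation on 1-D
# signals: search a downsampled level first, then refine around the
# scaled-up coarse vector. Names and parameters are illustrative.

def sad(a, b):
    # sum of absolute differences: the matching cost
    return sum(abs(x - y) for x, y in zip(a, b))

def best_shift(cur, ref, center, radius):
    best, best_cost = center, float("inf")
    for s in range(center - radius, center + radius + 1):
        if 0 <= s and s + len(cur) <= len(ref):
            cost = sad(cur, ref[s:s + len(cur)])
            if cost < best_cost:
                best, best_cost = s, cost
    return best

def hierarchical_motion(cur, ref, radius=2):
    cur2, ref2 = cur[::2], ref[::2]                  # coarse level (2:1)
    coarse = best_shift(cur2, ref2, 0, len(ref2))    # wide search, cheap
    return best_shift(cur, ref, 2 * coarse, radius)  # small refinement

ref = list(range(20))
cur = ref[7:12]                       # block displaced by 7 samples
print(hierarchical_motion(cur, ref))  # 7
```

With more pyramid levels the same idea covers displacements of well over a hundred pixels at a small multiple of the cost of a short-range search.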
The MPEG video coding standard is an important step in the commercial development of digital media, but it constrains further performance improvements that might be possible with innovative algorithms. The approach described in this paper keeps the basic structure of the MPEG2 coder but allows for pre- and post-processing of the video sequence. Two broad options are available. The first is to perform image processing prior to encoding, so as to improve the compression-quality trade-off in the subsequent MPEG2 coder. The second is to process the decoded video to improve its quality; this includes the widely recognized desire to reduce the end-of-block impairments of DCT-based coders. In this paper, we report some of our work on both pre- and post-processing to improve performance, focusing on data-dependent inhomogeneous filtering that preserves the structure, and thus the quality, of images while reducing unwanted random noise.
The 'Class Library for Image Processing' (CLIP) provides object-oriented programming facilities in a framework that supports user migration from C. CLIP augments the C/C++ built-in types with just three additions: the picture, the integer range, and the value range. Associated with these are overloaded operators for arithmetic on and between the types. Range limiting is implemented as a modification of conventional indexing. Pel-by-pel and block-by-block iterations are conveniently handled within the picture and range objects, and via callbacks. Each type incorporates error handling. The target users for CLIP will accept object orientation only if it allows them to save development time, after minimal learning, without compromising program execution speed. CLIP programs are compact, the required knowledge of C++ is elementary, and the class library's public interface is small. Speed is kept high by minimizing the amount of data in temporary objects, reducing the dynamic memory management overhead. Picture- or block-wide operations and callbacks are efficiently supported by ordering iterations to minimize the number of counter and pointer increments.
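The flavor of a picture type with overloaded arithmetic and range-limited indexing can be shown with a loose analogue (CLIP itself is a C++ library; this Python sketch only mirrors the style of its public interface, and the class name is illustrative):

```python
# Loose analogue of a picture type with overloaded operators and
# range limiting via a modified indexing operation.

class Picture:
    def __init__(self, pels):
        self.pels = list(pels)

    def __add__(self, other):
        # pel-by-pel arithmetic between pictures, or picture + scalar
        if isinstance(other, Picture):
            return Picture(a + b for a, b in zip(self.pels, other.pels))
        return Picture(a + other for a in self.pels)

    def __getitem__(self, rng):
        # range limiting as a modification of conventional indexing
        return Picture(self.pels[rng])

p = Picture([1, 2, 3, 4]) + Picture([10, 20, 30, 40])
print((p + 1).pels)  # [12, 23, 34, 45]
print(p[1:3].pels)   # [22, 33]
```

In the C++ original, keeping such expressions fast means minimizing temporary-object data and dynamic allocation, as the abstract notes.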
The extensible WELL (Window-based Elaboration Language) has been developed around the concept of a common platform, in which client and server communicate with each other through a communication manager. The language is based on an object-oriented design that introduces constraint processing: every kind of service in the language, including imaging, is controlled by constraints. Interactive functions between client and server are extended by introducing agent functions, including a request-respond relation, and the necessary service integrations are achieved by cooperative processes using constraints. Constraints are treated much like data, because the system must remain flexible in executing many kinds of services; the corresponding control process is defined using intensional logic. There are two kinds of constraints, temporal and modal. By rendering the constraints, the predicate format, as a relation between attribute values, can warrant the validity of entities as data. As an imaging example, a procedure for handling interaction between multiple objects is shown as an image application for the extensible system. This paper describes how the procedure proceeds in the system and how the constraints work in generating moving pictures.
Microsoft's Visual Basic (VB) and Borland's Delphi provide an extremely robust programming environment for delivering multimedia solutions for interactive kiosks, games, and titles. Their object-oriented use of standard and custom controls enables a user to build extremely powerful applications. A multipurpose, database-enabled programming environment that provides an event-driven interface functions as a multimedia kernel. This kernel can support a variety of authoring solutions (e.g., a timeline-based model similar to Macromedia Director, or a node-authoring model similar to IconAuthor). At the heart of the kernel is a set of low-level multimedia components providing object-oriented interfaces for graphics, audio, video, and imaging. Data preparation tools (e.g., layout, palette, and sprite editors) can be built to manage the media database. VB's flexible interface allows the construction of an unlimited number of user models. The proliferation of these models within a popular, easy-to-use environment will allow the vast developer segment of 'producer' types to bring their ideas to market. This is the key to building exciting, content-rich multimedia solutions, and Microsoft's VB and Borland's Delphi environments, combined with multimedia components, make them possible.
Images, in a general sense, include natural images, such as photographs, medical images, and satellite images; artificial images, such as computer graphics, paintings, and drawings; and scientific pictures, such as statistical charts and visualization patches. We introduce an object-oriented scheme for image processing in multimedia systems, in which images are represented as objects in a hierarchical structure. Several issues are addressed. We discuss multimedia object composition, including spatial and temporal composition. Hierarchical storage, which has good properties for image retrieval, display, and composition, is presented. We also develop a content-based retrieval scheme using shape, texture, and color information, with a two-stage retrieval method.
This video-on-demand service is constructed from distributed servers, including video servers that supply real-time MPEG-1 video and audio, real-time MPEG-1 encoders, and an application server that supplies additional text information and retrieval agents. The system has three distinctive features that enable multi-viewpoint access to real-time visual information: (1) the terminal application uses an agent-oriented approach that allows the system to be extended easily; the agents are implemented using a commercial authoring tool plus additional objects that communicate with the video servers over TCP/IP. (2) The application server manages the agents, automatically processes text information, and can handle unexpected alterations to the contents. (3) The distributed system has an economical, flexible architecture for storing long video streams; the real-time MPEG-1 encoder system is based on multi-channel phase-shifting processing. We also describe a practical application of this system, a prototype TV-on-demand service called TVOD, which provides access to the broadcast television programs of the previous week.
In this paper, the design and implementation of a multimedia-on-demand file server on an inexpensive PC is presented. The server can efficiently record multiple media streams in the UNIX file format and play back multiple media streams simultaneously. The objective of the design is to provide high throughput and highly predictable response times. These claims are verified by comparing the throughput, data loss rates, and response-time deviations of the proposed file server with those of the original file system. When multiple user requests for continuous streams access the file system simultaneously, the low-level I/O scheduler alone is not sufficient to keep I/O throughput at full raw power; a high-level access scheduler aimed at maximizing I/O throughput is therefore presented. Applications of a read/write multimedia file server include video conference recording and playback, multimedia program store-and-forward hubs, and multi-level multimedia storage systems.
This paper describes the design and performance of an object-oriented communication framework we developed to meet the demands of next-generation distributed electronic medical imaging systems. Our framework combines the flexibility of high-level distributed object computing middleware (like CORBA) with the performance of low-level network programming mechanisms (like sockets). In the paper, we outline the design goals and software architecture of our framework, illustrate its performance over ATM, and describe how we resolved the design challenges we faced when developing an object-oriented communication framework for distributed medical imaging.
The Electronic Visualization Laboratory (EVL) at the University of Illinois at Chicago (UIC) specializes in virtual reality (VR) and scientific visualization research. EVL is a major potential beneficiary of guaranteed latency and bandwidth promised by cell switch networking technology as the current shared Internet lines are inadequate for doing VR-to-VR, VR-to- supercomputer, and VR-to-supercomputer-to-VR research. EVL's computer scientists are working with their colleagues at Argonne National Laboratory (ANL) and the National Center for Supercomputing Applications (NCSA) to develop an infrastructure that enables computational scientists to apply VR, networking and scalable computing to problem solving. ATM and other optical networking schemes usher in a whole new era of sustainable high-seed networking capable of supporting telecollaboration among computational scientists and compute scientists in the immersive visual/multi-sensory domain.