The science and technology of image transmission and display have evolved primarily from the disciplines of computer science, electrical engineering, and physics. Thus it is only natural that the techniques and attitudes which have developed are characterized by the style in which people in these areas approach the subject of images and their transmission, display, and processing. For example, one finds many important differences between the methods of television and those of photography. An even greater contrast appears when the methods of electrical engineering, physics, television, and photography are compared with those now understood to be employed by the human eye and the human visual system. Specifically, let us contrast the ways in which television, photography, and the human visual system represent image information. In television (particularly digital imaging), image values are represented by signals analogous to quantities of light; they are called intensities. In photography, on the other hand, the representing quantities are concentrations of silver or dyes. Because of the natural exponential laws governing the interaction of light with these media, they are analogous to the logarithm of quantities of light; they are called densities. The human visual system, while generally logarithmically sensitive, moves a large step further away from representation by physical quantities of light: at very early stages in its processing it produces highly modified versions of the patterns of light or their logarithms. The natural question then arises: why should the human visual system do this, and, since it does, what consequences are implied for the television and photographic presentations normally employed?
The channel capacity required for long-distance digital transmission of video signals can be substantially reduced through a system using interframe coding. This is possible because of the frame-to-frame redundancy present in a video source output. Through appropriate signal processing it is feasible to send only the changed area of each frame rather than the full frame, provided that the previous frame values are retained in memory.
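The changed-area idea above can be sketched in a few lines of Python; the block size, the fixed threshold, and the block-mean change test are illustrative assumptions, not parameters of the system described:

```python
import numpy as np

def conditional_replenishment(prev_frame, curr_frame, threshold=8, block=8):
    """Transmit only the blocks that changed beyond a threshold.

    Returns the list of (row, col, block_data) "transmitted" blocks and
    the frame the receiver reconstructs from its stored previous frame.
    """
    h, w = curr_frame.shape
    recon = prev_frame.copy()            # receiver's frame memory
    sent = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            cur = curr_frame[r:r+block, c:c+block]
            old = prev_frame[r:r+block, c:c+block]
            if np.abs(cur.astype(int) - old.astype(int)).mean() > threshold:
                sent.append((r, c, cur.copy()))
                recon[r:r+block, c:c+block] = cur
    return sent, recon

# Static background with one bright moving square: only the changed
# block is sent, yet the receiver reconstructs the full frame.
prev = np.zeros((32, 32), dtype=np.uint8)
curr = prev.copy()
curr[8:16, 8:16] = 200
sent, recon = conditional_replenishment(prev, curr)
ratio = len(sent) / ((32 // 8) ** 2)     # fraction of blocks transmitted
```

Here only 1 of 16 blocks crosses the threshold, so the channel carries one sixteenth of the frame while the receiver's frame memory supplies the rest.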
Coding methods suitable for bandwidth compression of multispectral imagery are considered. These methods are compared using various criteria of optimality such as MSE, signal-to-noise ratio, and recognition accuracy, as well as computational complexity. Based on these results the candidate methods are reduced to three recommended methods: KL-two-dimensional DPCM, KL-cosine-DPCM, and a cluster coding method. The performance of the recommended methods is examined in the presence of a noisy channel and when concatenated with a variable-rate (Huffman) encoder.
Recent advances in analog semiconductor technology have made possible the direct sensing and processing of television images. By combining a charge transfer device (CTD) imager and a CTD transversal filter, real time image sensing and encoding have been achieved with low power integrated circuits so that digital transmission and bit rate reduction are made possible using differential pulse code modulation (DPCM). Good mean square error performance and freedom from DPCM artifacts are possible in a hybrid intraframe image encoder. The hybrid transform encoder performs a discrete cosine transform (DCT) on each line of the television image as it is scanned. This compacts the variance into low frequency coefficients and the DPCM encodes the corresponding DCT coefficients between successive lines. Computer simulation of this hybrid coding technique has shown good performance on 256 x 256 pixel images at 0.5 bits/pixel and channel bit error rates of 10^-2. An experimental system using a low resolution General Electric 100 x 100 charge injection device camera and a Texas Instruments bucket brigade transversal filter as part of the DCT processor has been constructed and provides good low resolution image quality at 1 bit/pixel and bit error rates of 10^-3. A high resolution vidicon compatible system is also being constructed.
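A minimal sketch of this hybrid intraframe scheme, assuming an orthonormal DCT per scan line and a single uniform quantizer for the line-to-line coefficient differences (the actual bit allocation and quantizer design of the system are not reproduced):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0] /= np.sqrt(2)
    return m

def hybrid_encode_decode(img, step=16.0):
    """DCT each scan line, then DPCM the coefficients between lines.

    `step` is a hypothetical uniform quantizer step applied to every
    coefficient difference.
    """
    n = img.shape[1]
    C = dct_matrix(n)
    coeffs = img @ C.T                     # line-by-line DCT
    prev = np.zeros(n)                     # decoder's previous-line memory
    recon_coeffs = np.zeros_like(coeffs)
    for i in range(img.shape[0]):
        diff = coeffs[i] - prev            # predict from reconstructed line
        q = np.round(diff / step)          # quantized DPCM symbols
        prev = prev + q * step             # closed-loop reconstruction
        recon_coeffs[i] = prev
    return recon_coeffs @ C                # inverse DCT of each line

rng = np.random.default_rng(0)
img = rng.normal(128, 10, (16, 16))
out = hybrid_encode_decode(img)
rmse = float(np.sqrt(np.mean((img - out) ** 2)))
```

Because prediction uses the decoder's reconstructed previous line, quantization error does not accumulate from line to line: the per-coefficient error stays bounded by half the quantizer step.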
Charge-coupled devices (CCD's) are analog sampled-data delay lines which can be used to implement many of the filtering functions required for transform encoding in video bandwidth reduction. The CCD transversal filter is a fundamental building block which is particularly cost effective in terms of its simplicity and versatility. In particular, it can be used to perform the Fourier transform of video signals by means of the chirp z-transform (CZT) algorithm. CCD's have performance limitations compared to digital filters, but for applications which fall within their performance ranges, CCD's offer advantages of lower cost, smaller size, lighter weight, lower power, and improved reliability over digital filters.
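The CZT identity that lets a fixed-tap transversal filter compute a DFT can be checked numerically. The sketch below uses the decomposition nk = (n^2 + k^2 - (k - n)^2)/2, which turns the DFT into a chirp pre-multiplication, a convolution with a chirp (the part a CCD transversal filter realizes with fixed tap weights), and a chirp post-multiplication:

```python
import numpy as np

def czt_dft(x):
    """N-point DFT via the chirp z-transform decomposition."""
    n = len(x)
    idx = np.arange(n)
    chirp = np.exp(-1j * np.pi * idx**2 / n)   # W**(k**2/2), W = exp(-2j*pi/n)
    a = x * chirp                              # pre-multiplication
    lags = np.arange(-(n - 1), n)
    h = np.exp(1j * np.pi * lags**2 / n)       # W**(-(k-n)**2/2) filter taps
    conv = np.convolve(a, h)                   # chirp convolution
    out = conv[n - 1:2 * n - 1]                # keep lags k = 0 .. n-1
    return chirp * out                         # post-multiplication

x = np.random.default_rng(1).normal(size=16)
ok = bool(np.allclose(czt_dft(x), np.fft.fft(x)))
```

The result agrees with the FFT to machine precision; the point of the CZT route is that the hard part is a fixed convolution, which the analog hardware can do cheaply.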
A real-time digital video processor using Hadamard transform techniques to reduce video bandwidth is described. The processor can be programmed with different parameters to investigate various algorithms for bandwidth compression. The processor is also adaptive in that it can select different parameter sets to trade off spatial resolution for temporal resolution in the regions of the picture that are moving. Algorithms used in programming the system are described along with results achieved at various levels of compression. The algorithms relate to spatial compression, temporal compression, and the adaptive selection of parameter sets.
An advanced imaging communication system (AICS) for planetary exploration is presented. The system offers "end-to-end" information rate improvements of 3 to 5 times over existing systems, in addition to extensive flexibility for the user to adapt rate/fidelity priorities to fit a particular mission. AICS contains two major system elements. The first is a concatenated Reed-Solomon/Viterbi coded channel, which provides a powerful yet practical solution to the usual "error vulnerability" problem associated with compressed data. The second major element is an extremely adaptive image data compression algorithm called RM2. This algorithm, as presently simulated, is discussed in considerable detail. Used in conjunction with the virtually error-free performance of the Reed-Solomon/Viterbi channel, it yields the stated AICS advantages.
Laser, acousto-optic, and dry silver material technologies were designed into an advanced facsimile system to achieve high recording speeds, high copy resolution, and chemical-free operation. The basic system was designed so that its performance specifications can be modified over a wide range with only minor changes, enabling a broad spectrum of applications to be addressed.
An operational data compression system has been developed and implemented for transmission of digitized ATS and ITOS-VHRR satellite video data over the wideband communication link between the Wallops Island, Va. Command and Data Acquisition Station and the National Environmental Satellite Service at Suitland, Md. This system uses mini-computers for the coding and decoding of the data to achieve maximum flexibility, together with specially designed interface equipment for greater efficiency. No loss in data quality occurs due to the compression, and in certain cases data are transmitted which would otherwise be unavailable because of the limited channel capacity. This paper describes the method of compression, the equipment used, and the compression results attained.
A bandwidth compression system for the transmission of video images from remotely piloted vehicles has been built and demonstrated. Novel features of this system are the use of the Constant Area Quantization (CAQ) technique to obtain spatial bit rate reduction of 6:1 and a rugged and compact scan converter, based on a core memory, to accommodate temporal frame rate reduction. Based on the ability of the human eye to perceive more detail in high contrast regions than in low, the CAQ method transmits higher resolution in the former areas. The original six-bit digitized video is converted to a three-level signal by the quantizing circuit and then Huffman-encoded to exploit its statistical properties and reduce it further to one bit per pixel. These circuits operate on one line of the picture at a time, and can handle information at full video (10 MHz) rate. The compressed information, when received on the ground, is stored in coded form in a two-frame (500,000 bit) digital core memory. One frame of the memory is filled while the other is being displayed, and then the two are interchanged. Decoding and reconstruction of the video are performed between the memory and the display.
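The three-level-plus-Huffman stage can be sketched as follows; the fixed difference threshold is a hypothetical stand-in for CAQ's contrast-dependent quantization, which is not reproduced here:

```python
import heapq
from collections import Counter
import numpy as np

def three_level(line, thresh=10):
    """Reduce a scan line to symbols -1, 0, +1 according to whether each
    pixel-to-pixel difference falls below, inside, or above a fixed
    threshold band (an illustrative simplification of CAQ)."""
    diffs = np.diff(line.astype(int), prepend=int(line[0]))
    return np.where(diffs > thresh, 1, np.where(diffs < -thresh, -1, 0))

def huffman_code(symbols):
    """Build a Huffman codebook for the observed symbol frequencies."""
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# A mostly flat line with two sharp edges: nearly all symbols are 0,
# so the entropy coder spends far less than 2 bits per three-level symbol.
line = np.array([128] * 10 + [180] * 10 + [128] * 10)
syms = three_level(line).tolist()
book = huffman_code(syms)
bits = sum(len(book[s]) for s in syms)
rate = bits / len(syms)          # bits per pixel after Huffman coding
```

The skewed symbol statistics are what make the final reduction to about one bit per pixel plausible: the common "no change" symbol gets a one-bit code.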
This paper describes work performed as an extension of the techniques of transform coding with block quantization. These techniques have been well defined theoretically, and implementations have achieved the predicted optimum results. The purpose of this work was to examine the techniques in practice and to define adaptive coding techniques which could be practically applied to the coding of still and moving pictures.
This paper describes a new technique which jointly applies clustering and source encoding concepts to obtain data compression. The cluster compression technique basically uses clustering to extract features from the measurement data set which are used to describe characteristics of the entire data set. In addition, the features may be used to approximate each individual measurement vector by forming a sequence of scalar numbers which define each measurement vector in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. A description of a practical cluster compression algorithm is given and experimental results are presented to show trade-offs and characteristics of various implementations. Examples are provided which demonstrate the application of cluster compression to multispectral image data of the Earth Resources Technology Satellite.
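A toy illustration of the clustering stage, with a plain k-means standing in for the paper's feature extraction: the cluster centers play the role of the extracted features, and the label array plays the role of the feature map that a source encoder would then represent efficiently.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means over measurement vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                   # the "feature map"
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels

# Synthetic 2-band "multispectral" pixels drawn from three clusters.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in (0.0, 3.0, 6.0)])
centers, labels = kmeans(X, 3)
recon = centers[labels]              # each vector replaced by its feature
err = float(np.mean((X - recon) ** 2))
```

Replacing every two-band measurement by a small cluster index plus a shared codebook of centers is the compression; the approximation error `err` is far below the variance of the raw data.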
A dual-mode nonlinear interpolative compressor for synthetic aperture radar (SAR) images is described. The compression algorithm takes advantage of (1) the strong contrast between the target and the background and (2) the small percentage of target area compared with the total SAR image. The algorithm operates in two modes: an exact data transmission mode and an interpolative mode. The image is first segmented into small sub-blocks. The compressor then selects one of the two modes according to the pixel values inside each block. If any pixel inside the block is above a predetermined threshold, all pixels in the block are transmitted. On the other hand, if all pixels inside the block are below the threshold, only one pixel is transmitted. At the receiver a two-dimensional interpolation is then conducted to estimate all the pixels in the block. Under the assumption that the background is stationary, second-order statistics of the SAR image are used to derive the interpolative scheme. The compressor not only offers a high degree of exact target recovery but also a good recovery of the background information. Computer simulation results indicate that a good-quality reconstruction can be accomplished at 1.5 bits per pixel.
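The two-mode block selection can be sketched as below, with a simple block mean standing in for the statistically derived interpolator; the block size, threshold, and bit accounting are illustrative choices, not values from the paper:

```python
import numpy as np

def dual_mode_compress(img, block=4, thresh=100):
    """Mode 1: any pixel above `thresh` -> transmit the whole block exactly.
    Mode 2: otherwise transmit a single value per block (here the block
    mean, a simplification of the interpolative mode)."""
    h, w = img.shape
    recon = np.empty_like(img, dtype=float)
    bits = 0
    for r in range(0, h, block):
        for c in range(0, w, block):
            blk = img[r:r+block, c:c+block]
            if (blk >= thresh).any():          # target: exact mode
                recon[r:r+block, c:c+block] = blk
                bits += blk.size * 8 + 1       # 8 bits/pixel + mode flag
            else:                              # background: one value
                recon[r:r+block, c:c+block] = blk.mean()
                bits += 8 + 1
    return recon, bits / img.size              # reconstruction, bits/pixel

rng = np.random.default_rng(4)
img = rng.integers(0, 40, (32, 32)).astype(float)   # dim background
img[12:16, 12:16] = 255                             # one bright target block
recon, bpp = dual_mode_compress(img)
```

The single target block is recovered exactly while 63 background blocks each cost one value, which is why the rate lands well under one bit per pixel in this toy case.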
One frequently used image compression method is based on transform coding. In terms of RMS error, the best transform is the Karhunen-Loeve (Principal Components). This method is not generally used due to computational complexity. In this paper we show that under isotropicity conditions the Karhunen-Loeve is almost separable and that an approximate fast principal components transform exists. Our results indicate that the fast K-L is nearly as good as the true K-L and that it yields better results than other discrete transforms such as DLB, SLANT, or Hadamard. The approximations and errors are discussed in terms of the RMS and RMS correlated error.
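The separability that makes a fast transform possible can be illustrated exactly in the special case of a Kronecker-product covariance: the 2-D Karhunen-Loeve transform then factors into two 1-D passes over rows and columns, which is the structure the approximate fast transform exploits.

```python
import numpy as np

n = 8
rho = 0.9
# First-order Markov (AR(1)) covariance for one dimension.
R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
w, E = np.linalg.eigh(R)            # 1-D KL basis: eigenvectors of R

img = np.random.default_rng(9).normal(size=(n, n))

# Separable evaluation: one 1-D KL pass on rows, one on columns.
sep = E.T @ img @ E

# Full 2-D evaluation with the Kronecker-product basis, for comparison.
full = (np.kron(E, E).T @ img.reshape(-1)).reshape(n, n)
ok = bool(np.allclose(sep, full))
```

When the 2-D covariance is exactly the Kronecker product R ⊗ R the two evaluations agree identically; the paper's point is that under isotropicity the true K-L is close enough to this separable form for the fast version to lose almost nothing.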
The numerical algorithm known as singular value decomposition has been applied to image processing with interesting consequences [2]. Initially the algorithm was used to effect a pseudoinverse restoration [3] for better object estimation from degraded imagery. However, the singular value decomposition (SVD) technique can also be effectively utilized in image compression problems when large compression ratios are desired. This abstract refers to some experimental work developed at The Aerospace Corporation in which imagery has been compressed using SVD methods. Pictorial and computational results indicate that 8-bit imagery can be compressed to 2 bits with 0.5% mean square error, while the same imagery can be compressed to 1/2 bit with 1.5% mean square error. These results will be presented along with various algorithms which make use of the SVD domain of an image for subsequent image compaction.
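SVD compression in its simplest form keeps only the largest singular triplets; a minimal sketch (the test image and rank are illustrative, not taken from the experiments described):

```python
import numpy as np

def svd_compress(img, k):
    """Rank-k approximation: keep the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k]

rng = np.random.default_rng(5)
# A low-rank scene plus mild noise compresses very well under SVD.
base = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 2, 64)))
img = base + 0.01 * rng.normal(size=(64, 64))
approx = svd_compress(img, 4)
rel_err = float(np.linalg.norm(img - approx) / np.linalg.norm(img))
```

For an n x n image, storing k triplets costs k(2n + 1) numbers instead of n^2, which is where the large compression ratios come from when the image energy is concentrated in a few singular values.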
There are three possibilities for reducing the bandwidth of a television signal: (1) reduce the temporal resolution by transmitting fewer than 30 frames per second; (2) reduce the spatial resolution by coding fewer than all of the picture elements in the frames transmitted; (3) reduce the spatial redundancy in the remaining picture elements.
A method for optimally reducing the error incurred during the quantization operation in DPCM image coding is presented. The method is optimum with respect to a mean-square error criterion and is applied a posteriori. The technique is based on knowledge of the multidimensional probability density function of the DPCM difference signal. The quantizer operates on this difference signal to achieve data compression. The coarser the quantization, the greater the compression, but also the greater the degradation of the encoded image. To minimize the degradation, a reconstruction should utilize all of the knowledge that is available about the difference samples, such as their distribution, the quantization levels, and any correlation which remains after the differencing operation. An estimation equation which embodies this information is derived and solved. The solution is applied to images which have been either DPCM or delta-modulation encoded. The resultant images have lower mean-square error and exhibit an improvement in subjective quality.
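A one-dimensional stand-in for the a posteriori estimate (the paper works with the multidimensional density of the difference signal; here each quantizer bin's output is simply replaced by the conditional sample mean, the minimum-mean-square choice given the transmitted symbol):

```python
import numpy as np

rng = np.random.default_rng(6)
d = rng.laplace(0, 5, 100_000)        # stand-in for DPCM differences
step = 8.0
bins = np.floor(d / step)             # transmitted quantizer symbols
mid = (bins + 0.5) * step             # conventional midpoint reconstruction

# A posteriori reconstruction: replace each bin's output with the
# conditional mean of the samples that fell in that bin.
centroid = np.empty_like(d)
for b in np.unique(bins):
    mask = bins == b
    centroid[mask] = d[mask].mean()

mse_mid = float(np.mean((d - mid) ** 2))
mse_cent = float(np.mean((d - centroid) ** 2))
```

Because the difference distribution is peaked rather than uniform within each bin, the conditional mean sits away from the bin midpoint, and the centroid reconstruction always achieves a lower mean-square error.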
Interframe coding of television images encompasses techniques which make use of correlations between pixel amplitudes in successive frames. Intraframe coding techniques that exploit spatial correlations can, in principle, be extended to include correlations in the temporal domain. In this paper, successive frames of digital images are coded using two-dimensional spatial transforms combined with DPCM in the temporal domain. Specific transform techniques investigated are the two-dimensional cosine and Fourier transforms. Due to DPCM encoding in the temporal domain, the hybrid transform/DPCM encoders require storage of only the single previous frame of data. Hardware implementation of the Fourier transform involves manipulation of complex numbers, whereas the cosine transform does not. However, the Fourier transform is attractive because frame-to-frame motion compensation can be introduced directly in the phase plane by application of appropriate phase correction factors. Results are presented in terms of coding efficiency, storage requirements, computational complexity, and sensitivity to channel noise.
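The phase-plane motion compensation rests on the Fourier shift theorem: for a pure translation, the previous frame's spectrum multiplied by the appropriate phase factor predicts the new frame exactly, so the temporal DPCM residual collapses. A small numerical check (the frame content and displacement are illustrative):

```python
import numpy as np

frame = np.random.default_rng(8).normal(size=(16, 16))
shifted = np.roll(frame, shift=(2, 3), axis=(0, 1))   # pure translation

F0 = np.fft.fft2(frame)
F1 = np.fft.fft2(shifted)

# Shift theorem: F1[k] = F0[k] * exp(-2j*pi*(k . d)/N) for displacement d.
ky = np.fft.fftfreq(16)[:, None]
kx = np.fft.fftfreq(16)[None, :]
phase = np.exp(-2j * np.pi * (2 * ky + 3 * kx))

residual_raw = float(np.abs(F1 - F0).mean())          # no compensation
residual_comp = float(np.abs(F1 - phase * F0).mean()) # phase-corrected
```

The uncompensated frame difference is large while the phase-corrected difference is zero to machine precision, which is why the complex-arithmetic cost of the Fourier transform can buy something the cosine transform does not offer.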
Image processing by computer or otherwise requires cost-effective means of scanning and recording high quality images and of interfacing the imaging devices with computers and communication channels. Based on our previous work in newspaper facsimile and computer editing of news photos, we have developed a large-format high-resolution system, interfaced to a PDP-11 processor as well as the Bell DDS Communications channel which operates at 56 kB/s. One configuration, suitable for medical X-rays, features variable resolution up to 250 pels/inch over 14 inches and 1000 pels/inch over 4 inches and a 256 step density scale. Dry silver film with on-line development produces ready-to-use output of high diagnostic usefulness. The technology used includes low-power internally modulated HeNe lasers, pre-objective scanning with a galvanometer-driven mirror, a flat-field paper/film drive, and silicon photodetectors. This work was supported by NIGMS Grant 5 P01-GM19428-03S1 and by USPHS Grant GM018674 (through the Departments of Radiology of Peter Bent Brigham Hospital and Beth Israel Hospital).
Aerial photoreconnaissance with optoelectronic sensors generates video data at formidable rates. All facets of the image data handling and assimilation problem are brought together in presentation of a viable means for its solution. A real-time image data acquisition/retrieval (RIDAR) system is defined and described. System architecture is that of a ground-based data hub which serves operational requirements of reconnaissance and surveillance activities. The RIDAR system functions to enable human beings to screen wideband video signals continuously incoming from high-resolution large-format sensors of remote survey stations as they sweep rapidly over scenes of interest. A basic system feature is its immediate display of a widefield index picture which shifts through the observation field synchronously with sensor transit of the object scene. Pointer designation of an image detail selects its subfield for magnified display and/or speed-print of a high-resolution document. Capture of incoming image data in permanent mass storage permits subsequent playback and selective recall through the same display and document generator system. The proposed RIDAR system can be extended toward a more universal image data processor by incorporating optical and electronic means for parallel computation over two- and three-dimensional fields.
The mean square error (MSE) is a classical measure of image distortion. However, this metric is generally not a faithful indication of subjective image quality. We attempt to correct this deficiency by addressing the individual error sources in transform image coding. Specifically, the component of the MSE introduced by transform coefficient deletion is separated from requantization effects. Results are demonstrated in both numerical and pictorial form.
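The separation of the two error components can be verified with Parseval's relation: for an orthonormal transform, deletion and requantization errors live on disjoint coefficient sets, so their mean-square contributions add exactly. A one-dimensional sketch (the signal, retained-coefficient count, and quantizer step are illustrative):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0] /= np.sqrt(2)
    return m

rng = np.random.default_rng(7)
n, keep, step = 64, 16, 0.5
x = np.cumsum(rng.normal(size=n))            # correlated 1-D "scan line"
C = dct_matrix(n)
X = C @ x

Xq = np.zeros_like(X)
Xq[:keep] = np.round(X[:keep] / step) * step # retain + requantize low coeffs
recon = C.T @ Xq

# The two error sources occupy disjoint coefficient sets, so their
# mean-square contributions add exactly under an orthonormal transform.
mse_total = float(np.mean((x - recon) ** 2))
mse_deleted = float(np.sum(X[keep:] ** 2) / n)
mse_requant = float(np.sum((X[:keep] - Xq[:keep]) ** 2) / n)
```

This decomposition is what allows the coefficient-deletion component of the MSE to be reported separately from the requantization component.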
Several different hardware configurations for implementing the Hadamard transform in real time are discussed. The discussion is referenced to a 64 point (8 x 8) transform that is used in image coding for bandwidth reduction. A successive implementation of two 8 point transformations with intermediate random access memory using the straightforward N^2 computation is compared to implementations of the Fast Hadamard Transform (FHT). The speed of computing transform coefficients is compared to the hardware requirements of the various techniques. The relationship between computational speed and the resolution and frame rate of television imagery that may be transform encoded using the different methods is illustrated. It is shown that for picture element (pixel) rates up to 5 MHz the N^2 computation is the more economical to implement.
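The operation-count comparison can be made concrete for the 8-point case; the butterfly recursion below computes the Sylvester-ordered Hadamard transform in N log2 N additions, against N^2 multiply-adds for the direct matrix product:

```python
import numpy as np

def fht(x):
    """Fast Hadamard transform (natural/Sylvester order) via the
    butterfly recursion: N*log2(N) additions."""
    x = np.array(x, dtype=float)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def hadamard(n):
    """Dense Sylvester-construction Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

x = np.arange(8, dtype=float)
fast = fht(x)
direct = hadamard(8) @ x
ops_fast = 8 * 3      # N log2 N additions for the FHT
ops_direct = 8 * 8    # N**2 operations for the direct product
```

At N = 8 the gap (24 versus 64 operations) is modest, which is consistent with the finding that the straightforward computation can be the more economical hardware choice at moderate pixel rates.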
A new medical training instrument has been developed which provides a means whereby microsurgery procedures performed using an optical microscope can be viewed by any number of medical students. This is accomplished by picking up a television picture from the operating microscope and displaying the picture so the operation can be seen "live" on TV monitors at remote locations. The picture can also be recorded on video tape for later playback for medical teaching purposes. This instrument has the principal advantage over other means of teaching microsurgery in that it can provide a three-dimensional color picture, so that the student sees the same view the surgeon saw as he performed the operation.
In an age when man is inundated with information, his natural ability to selectively assimilate the data presented to him has become an indispensable tool for survival. By the same token, man's visual perception limitations have been used to reduce the amount of data needed to reproduce pictorial information designed for his consumption. A comparative study of two divergent approaches to the problem of providing an optimum amount of video information to a human viewer is discussed. Video communication systems exemplify one area where the approach consists of reducing the data from real-world scenes by video data compression algorithms. The opposite approach is found in visual simulators, where scenes are constructed synthetically to approach real-world realism by adding cues to the basic structure of the digital image representation used. Such simulators are used in ground-based trainers designed to reduce the cost of training operators of expensive equipment. In both situations there is a need to provide realistic video to a human observer. In the quest for optimum pictorial information transmission, simulated scenes are shown to provide some rather unusual, hitherto unexplored, insights and alternatives.
This paper examines the three-dimensional perception of human subjects who observe two two-dimensional geometric patterns at brief exposure times and time intervals without the aid of a stereoscope. In general, presenting two two-dimensional geometric patterns at the proper spacing, exposure times, and time intervals produces two-dimensional movement perception in human observers. The observer sees a set of patterns moving in a lateral direction, from its first location to its second. This lateral two-dimensional movement perception occurs over a wide range of exposure times and time intervals of the stimulus patterns. Some geometric patterns presented at certain limited exposure times and time intervals, however, produce three-dimensional percepts. The present research investigates the range of conditions necessary for producing three-dimensional pattern perception while presenting two-dimensional patterns. The research may provide an understanding of human pattern recognition processes and aid in improving man-machine systems.