This paper presents an overview of recent work on the velocity filter approach to target detection and track initiation in infrared surveillance systems for strategic defense applications. This approach has particular advantages when the signal-to-noise ratio is very low or when the number of objects to be detected and tracked is extremely high. Furthermore, it can be implemented with a highly efficient, parallel-processing computer architecture.
Modern cellular automata, especially those configured to process the three dimensions x, y, t (two spatial coordinates and time), are shown to furnish several decibels of processing gain against white Gaussian noise while monitoring millions of potential target tracks at several frames per second for the purpose of detecting weak, subpixel targets. Three-dimensional architectures employing the multidimensional binary ranking filter permit these operations to be executed efficiently. Calculations of processing gain are furnished, along with a discussion of the three-dimensional cellular automaton architecture itself.
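The shift-and-add idea behind a velocity filter can be sketched in a few lines. This is an illustrative serial version only, not the paper's cellular-automaton or binary-ranking-filter implementation; the frame size, candidate velocity set, and integer pixel velocities are all assumptions made for the sketch.

```python
import random

def velocity_filter(frames, velocities):
    """Shift-and-add velocity filter: for each hypothesized (vx, vy), sum
    each pixel's values along the corresponding straight-line track through
    the frame stack. A target moving at the matched velocity integrates
    linearly with the number of frames, while noise grows only as sqrt(N)."""
    T = len(frames)
    H, W = len(frames[0]), len(frames[0][0])
    best = None
    for vx, vy in velocities:
        for y0 in range(H):
            for x0 in range(W):
                s = 0.0
                for t in range(T):
                    x, y = x0 + vx * t, y0 + vy * t
                    if 0 <= x < W and 0 <= y < H:
                        s += frames[t][y][x]
                    else:
                        break  # track leaves the frame: skip this candidate
                else:
                    if best is None or s > best[0]:
                        best = (s, (x0, y0), (vx, vy))
    return best  # (score, starting pixel, velocity) of the strongest track
```

With eight 16x16 noise frames and a weak injected target drifting one pixel per frame, the matched track's integrated score stands well above every mismatched track.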
This paper presents an integrated parallel architecture for image analysis and computer vision. The architecture consists of an SIMD array processor with multiple broadcast buses, an MIMD multiprocessor with orthogonal access buses, and a two-dimensional memory array shared by both processor systems. Low-level image processing is performed on the SIMD array, while intermediate- and high-level image analysis is performed on the multiprocessor system. The interaction between the two levels is supported by the common shared memory. To illustrate the power of such a two-level system, we show that a wide class of image and vision analysis tasks can be performed efficiently on the architecture. Representative problems include transitive closure, histogramming, image component labeling, pyramid computation, the Hough transform, pattern clustering, and scene labeling.
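As an illustration of the low-level/intermediate-level split, histogramming decomposes naturally into data-parallel partial histograms followed by a merge through shared memory. A minimal serial sketch follows; the round-robin stripe decomposition and bin count are illustrative choices, not the paper's actual mapping.

```python
def partial_histograms(image, nbins, nstripes):
    """Each 'processor' histograms one stripe of rows (the data-parallel,
    low-level step); the partial histograms are then merged bin-wise,
    modeling the reduction through the shared memory array."""
    stripes = [image[i::nstripes] for i in range(nstripes)]  # round-robin rows
    partials = []
    for stripe in stripes:
        h = [0] * nbins
        for row in stripe:
            for v in row:
                h[v] += 1
        partials.append(h)
    # merge step: bin-wise reduction over all partial histograms
    return [sum(p[b] for p in partials) for b in range(nbins)]
```

The merged result is identical to a direct single-pass histogram, whatever the number of stripes.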
Current image analysis and image understanding applications in DoD systems require very high performance image pixel processing in real time. To attain the necessary performance within stringent system size, weight, and power constraints requires special-purpose parallel processing hardware architectures. At the same time, it is desirable to retain as much programmability as possible in order to rapidly adapt the hardware to new applications or evolving system requirements. The Parallel Recirculating Pipeline processor uses techniques adopted from image algebra and mathematical morphology to provide a low-cost, low-complexity, high-performance architecture that is suitable for silicon implementation and programmable in high-order languages.
We present a scheme for macropipelining in multicomputer systems to achieve high speeds in processing multiple images. Most image processing applications consist of a sequence of tasks, e.g., preprocessing, detection, segmentation, feature extraction, and classification. This sequence lends itself to a pipelining strategy. To minimize the effects of bottlenecks in this pipeline, we introduce a performance model for data partitioning that includes both the computation and the communication aspects of parallel processing. With the help of this model, we assign the appropriate number of processors to each task so that the workloads are well balanced. We then generate a problem graph describing the relationships among tasks and subtasks. We use an estimator of the frame processing time of the image processing system as an objective function for choosing a mapping of the problem graph to a system graph. This estimator takes account of computation times and communication intensities among the subtasks in the problem graph, and accounts for link contention. To find an efficient mapping, we use a heuristic optimization technique in which possible bottlenecks are given high priority in the mapping procedure. We tested our macropipelining scheme on a typical image processing application in a simulated hypercube computer system. The results support our belief that this scheme yields effective architectures for high-speed processing of long sequences of images.
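The workload-balancing step can be illustrated with a simple greedy allocator that repeatedly grants the next processor to the current bottleneck stage. This is a sketch of the general idea only; the paper's performance model also weighs communication costs and link contention, which this ignores.

```python
import heapq

def assign_processors(task_times, P):
    """Give each pipeline stage one processor, then repeatedly grant the
    next processor to the current bottleneck (the stage with the largest
    per-stage time t_i / n_i). Pipeline throughput is set by the slowest
    stage, so shrinking the maximum stage time balances the macropipeline."""
    n = [1] * len(task_times)
    # max-heap on current stage time (negated, since heapq is a min-heap)
    heap = [(-t, i) for i, t in enumerate(task_times)]
    heapq.heapify(heap)
    for _ in range(P - len(task_times)):
        _, i = heapq.heappop(heap)          # current bottleneck stage
        n[i] += 1
        heapq.heappush(heap, (-task_times[i] / n[i], i))
    return n
```

For stage times 8, 4, 2, 2 and eight processors, the allocator drives every stage down to a two-unit service time, so no single stage throttles the pipeline.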
A new Fourier technique, the Arithmetic Fourier Transform (AFT), was recently developed for signal processing. This approach is based on the number-theoretic method of Mobius inversion. The AFT needs only additions, except for a small number of multiplications by prescribed scale factors, and it is well suited to parallel processing; moreover, there is no accumulation of rounding errors in the AFT algorithm. In this paper, the AFT is used to compute the discrete cosine transform and is also extended to 2-D cases for image processing. A 2-D Mobius inversion formula is proved and then applied to the computation of the Fourier coefficients of a periodic 2-D function. It is shown that the output of an array of delay-line (or transversal) filters is the Mobius transform of the input harmonic terms. The 2-D Fourier coefficients can therefore be obtained through Mobius inversion of the output of the filter array.
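The number-theoretic core can be sketched for the 1-D even (cosine) case: the average of n equally spaced samples of a zero-mean cosine series sums exactly those coefficients whose index n divides, and Mobius inversion untangles them. The following is a minimal sketch under those assumptions (zero-mean, band-limited, even signal); the function and coefficient names are illustrative.

```python
def mobius(n):
    """Mobius function mu(n) by trial factorization: 0 if n has a squared
    prime factor, else (-1)^(number of prime factors)."""
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared prime factor
            result = -result
        else:
            p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def aft_cosine_coeffs(f, N):
    """Recover a_1..a_N of f(t) = sum_k a_k cos(2*pi*k*t) (zero mean,
    harmonics up to N). The sample average S(n) = (1/n) sum_m f(m/n)
    equals sum of a_k over all k divisible by n; Mobius inversion then
    isolates each a_n. Only the averages involve multiplications."""
    S = {n: sum(f(m / n) for m in range(n)) / n for n in range(1, N + 1)}
    return {n: sum(mobius(l) * S[l * n] for l in range(1, N // n + 1))
            for n in range(1, N + 1)}
```

For f(t) = cos(2*pi*t) + 0.5*cos(6*pi*t), the inversion recovers a_1 = 1, a_2 = 0, a_3 = 0.5 from plain sample averages.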
Recently, Tufts and Sadasiv presented a method for computing the coefficients of a Fourier series of a periodic function using the Mobius inversion of series. They called this method of analysis the Arithmetic Fourier Transform (AFT). The advantage of the AFT over the FFT is that this method of Fourier analysis needs only addition operations, except for multiplications by scale factors at one stage of the computation. The disadvantage of the AFT as originally formulated is that it could be used effectively only to compute the finite Fourier coefficients of a real even function. To remedy this, the original AFT has been extended to compute the Fourier coefficients of both the even and odd components of a periodic function.
In this paper, the improved AFT is extended to a two-dimensional (2-D) Arithmetic Fourier Transform for calculating the Fourier transform of two-dimensional discrete signals. This new algorithm is based on both the number-theoretic method of Mobius inversion of double series and the complex-conjugate property of Fourier coefficients. The advantage of this algorithm over the conventional 2-D FFT is that the corner-turning problem of a conventional 2-D Discrete Fourier Transform (DFT) is avoided. Therefore, this new 2-D algorithm is readily suitable for VLSI implementation as a parallel architecture. Comparing the operations of the 2-D AFT on an M×M data array with those of the conventional 2-D FFT, the number of multiplications is significantly reduced from (2 log2 M)M^2 to (9/4)M^2. Hence, this new algorithm is faster than the FFT algorithm. Finally, simulation results of this new 2-D AFT algorithm for artificial and real images are given.
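The quoted multiplication counts are easy to check numerically; for M = 1024 the ratio works out to nearly nine in favor of the 2-D AFT.

```python
import math

def mults_fft2d(M):
    # conventional 2-D FFT: about 2*log2(M) real multiplications per point
    return int(2 * math.log2(M) * M * M)

def mults_aft2d(M):
    # 2-D AFT count quoted above: (9/4)*M^2 scale-factor multiplications
    return int(9 / 4 * M * M)
```

For M = 1024 this gives roughly 21 million multiplications for the FFT versus 2.4 million for the AFT.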
The Arithmetic Fourier Transform is a number-theoretic method for calculating the Fourier coefficients of continuous-time signals. In this paper, we present an efficient realization of the Arithmetic Fourier Transform algorithm on an optical parallel processor consisting of fiber-optic tapped delay lines. The performance of the algorithm in the presence of optical errors and noise is analyzed. From the results of numerical experiments on a simple test signal, we observe that the performance of the algorithm is affected only minimally by the roundoff errors of the optical system.
This paper is a study of block structures for high-throughput digital filtering. These structures achieve high throughput rates by using a large number of processors working in parallel. We investigate their finite-precision behavior and show how finite-wordlength errors can turn a nominally time-invariant filter into a time-varying system. To avoid this problem, we develop a new high-throughput structure that does not suffer from the time-varying behavior normally induced in block structures by parameter quantization.
In this paper, we examine state space methods for estimating the direction of arrival of plane waves from a linear equispaced sensor array. From a computational point of view, the approach only involves matrix operations and is suitable for systolic/wavefront implementation.
We introduce a new Bayesian/geometric method for estimating orientation and depth from shading and/or texture. The method is based on a prior distribution on the space of all possible (random) surfaces and a variant of the annealing algorithm for constrained global optimization problems. The method is also useful for partially classifying image contours.
This paper is concerned with the optimisation of unconstrained non-linear problems in which the number of variables is large and the second-derivative (Hessian) matrix of the function is sparse. We also present a simple data structure in Ada that we used in automatic differentiation to obtain first- and second-order derivative information for the function. Finally, we present numerical results obtained in solving large problems with the Truncated Newton method.
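Forward-mode automatic differentiation of the kind such a data structure supports can be sketched with a dual-number type; this Python sketch shows first-order propagation only (the paper's Ada structure also extracts sparse second derivatives), and the class and function names are illustrative.

```python
class Dual:
    """Forward-mode automatic-differentiation value carrying (f, f').
    Operator overloading propagates the derivative through every
    arithmetic operation by the usual calculus rules."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)   # sum rule
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at the 'seed' Dual(x, 1) and read off f'(x)."""
    return f(Dual(x, 1.0)).der
```

Unlike finite differencing, the derivative comes out exact (to machine precision) in a single evaluation of the function.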
Several important problems in diverse application areas such as image restoration, code design, and VLSI design contain at their core an optimization problem whose solution crucially determines the performance of the resulting engineering system. Standard descent algorithms for such optimization problems, however, typically get trapped in local minima and fail to reach solutions at or near the global minimum. Motivated by the problem of determining the global minima of such optimization problems, the simulated annealing algorithm has been proposed. Here we present recent results on the performance of this algorithm in reaching the global minimum of combinatorial optimization problems.
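The mechanics of simulated annealing are a Metropolis acceptance rule under a decreasing temperature: uphill moves are accepted with probability exp(-dE/T), which lets the search escape local minima while T is high. A minimal sketch on a toy bit-string problem follows; the geometric schedule, step count, and test problem are illustrative choices, and real combinatorial instances are far more rugged.

```python
import math, random

def simulated_annealing(energy, neighbor, x0, T0=2.0, alpha=0.995, steps=4000):
    """Generic annealing loop: propose a neighbor, always accept downhill
    moves, accept uphill moves with probability exp(-dE/T), and cool T
    geometrically. Returns the best state and energy seen."""
    x, e = x0, energy(x0)
    best, best_e = x, e
    T = T0
    for _ in range(steps):
        y = neighbor(x)
        de = energy(y) - e
        if de <= 0 or random.random() < math.exp(-de / T):
            x, e = y, e + de
            if e < best_e:
                best, best_e = x, e
        T *= alpha  # geometric cooling schedule
    return best, best_e
```

On a 12-bit toy energy (Hamming distance to a hidden target, single-bit-flip neighborhood) the annealer reliably reaches the global minimum of zero.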
Spotlight mode SAR and ISAR are microwave imaging systems that are used to obtain high resolution reflectivity images by processing data obtained from many different perspective views of a target. Usually a chirp signal with a high time-bandwidth product is transmitted and the returned signal from the target is coherently demodulated and then sampled, providing polar samples of the 2-D Fourier transform of the target reflectivity. The most commonly used image reconstruction algorithm then interpolates the polar data to a Cartesian grid and applies a 2-D FFT. An alternative reconstruction algorithm, called convolution-back-projection, first filters and inverse transforms each demodulated return using a 1-D FFT and then back-projects the result. In this paper we propose a simplified back-projection algorithm in which the filtered projections are obtained automatically by choosing the radar waveform to be the impulse response of the desired filter. The necessary filtering is then accomplished through the physical mechanism of the waveform reflecting off the target, which is described by a convolution. A parallel processing architecture is proposed for the back-projection step and its computational and memory requirements are analyzed. Finally, we describe a second potential method for obtaining projections, which assumes a conventional linear FM waveform.
It has recently been shown that linearly indexed Assignment Codes (ACs) can be used efficiently to code several problems, especially in signal processing and matrix algebra. In fact, the mathematical expressions for many algorithms are directly in the form of linearly indexed codes; examples include the formulas for matrix multiplication, any m-dimensional convolution/correlation, matrix transposition, and solving the matrix Lyapunov equation. Systematic procedures for converting linearly indexed Assignment Codes to localized algorithms that are closely related to Regular Iterative Algorithms (RIAs) have also been developed. These localized algorithms can often be scheduled efficiently by modeling them as RIAs; however, it is not always efficient to do so. In this paper we analyze and develop systematic procedures for determining efficient schedules directly for linearly indexed ACs and the localized algorithms. We also illustrate our procedures by determining schedules for examples such as matrix transposition and the Gauss-Jordan elimination algorithm.
The weighted checksum technique has been demonstrated to be effective for multiple error detection. It has been shown that, in order to guarantee error detection, the chosen weight vectors must satisfy specific linear-independence properties. Previous weight-generating methods that fulfill the independence criteria suffer from numerical overflow. We present a new scheme that generates weight vectors which meet the independence requirements while avoiding overflow.
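The mechanics can be illustrated with the two classic weight vectors (1, 1, ..., 1) and (1, 2, ..., n): their syndromes detect a single error, and their ratio locates it. These simple integer weights are exactly the kind that overflow for large n in fixed precision, which is the difficulty a new weight-generation scheme must avoid; the sketch below shows only the detection and location principle.

```python
def encode(vec):
    """Append two weighted checksums, computed with weight vectors
    (1,...,1) and (1,2,...,n). Their pairwise linear independence is what
    allows a single error to be both detected and located."""
    c1 = sum(vec)
    c2 = sum((i + 1) * v for i, v in enumerate(vec))
    return vec + [c1, c2]

def check(enc):
    """Return None if the data is consistent with its checksums, else the
    index of the (single) corrupted element."""
    vec, c1, c2 = enc[:-2], enc[-2], enc[-1]
    s1 = sum(vec) - c1                                   # plain syndrome
    s2 = sum((i + 1) * v for i, v in enumerate(vec)) - c2  # weighted syndrome
    if s1 == 0 and s2 == 0:
        return None
    return round(s2 / s1) - 1  # syndrome ratio gives the error position
```

A corruption of element k by delta produces syndromes (delta, (k+1)*delta), so the ratio recovers k regardless of the error magnitude.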
The purpose of this paper is to describe the implementation of a super-resolution (or spectral extrapolation) procedure on a neural network based on the Hopfield model, as first proposed by Abbess et al.1 We show the computational advantages and disadvantages of such an approach for different coding schemes and for networks consisting of very simple two-state elements as well as those made up of more complex nodes capable of representing a continuum. With the appropriate hardware, there is a computational advantage in using the Hopfield architecture over some alternative methods for computing the same solution. We also discuss the relationship between a particular mode of operation of the neural network and the regularized Gerchberg-Papoulis algorithm.
A high-speed neural network simulator and its use for the dynamics and performance analysis of feedback neural architectures are described. The simulator is based on a semi-parallel, analog-digital hybrid architecture which utilizes digital memories to store synaptic weights and analog hardware for high-speed computation. A breadboard system with 8-bit gray-scale synapses, designed and built at JPL, is successfully serving as a valuable design test-bed for the development of fully parallel, analog, custom VLSI neurochips currently underway at JPL. The breadboard hybrid simulator allows a detailed evaluation of the hardware potential and limitations of implementing fully analog operations in such chips. As an example, the paper presents an analysis of the stability and convergence behavior of a feedback neural network applied to the "Concentrator Assignment Problem" in combinatorial optimization, as studied on the analog-digital hybrid simulator. This work has already resulted in a custom VLSI design of a fully parallel, analog neuroprocessor with a powerful "analog prompting" feature for high-speed, multiparameter optimization.
This paper presents an overview of a proposed machine vision system which uses artificial neural network computing to track rapidly moving objects in space. Artificial neural networks theoretically have inherent advantages for this task, such as real-time computing speed and relatively small hardware size. The system is required to recognize the object in the presence of other patterns, determine its location relative to a frame of reference, and compute its velocity. Trainable ANNs are proposed to solve this computation problem.
This paper proposes using a generalized Hopfield model,1 also known as the McCulloch-Pitts model,2 as an associative memory. The system always converges to the nearest stored word; this is not true of Hopfield's original model. We also determine the information capacity of this generalized model. Our result is better than those previously reported in the literature.
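For reference, the standard Hopfield associative memory that the generalized model improves upon can be sketched as Hebbian outer-product weights plus threshold updates. The pattern sizes below are illustrative, and, as the abstract notes, this standard rule does not guarantee convergence to the nearest stored word.

```python
def train(patterns):
    """Hebbian outer-product learning rule with zero diagonal: for +/-1
    patterns p, W = sum over p of p p^T, W[i][i] = 0."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, sweeps=10):
    """Deterministic asynchronous updates: each unit takes the sign of its
    local field until a full sweep produces no change (a fixed point)."""
    n = len(state)
    s = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            h = sum(W[i][j] * s[j] for j in range(n))
            v = 1 if h >= 0 else -1
            if v != s[i]:
                s[i], changed = v, True
        if not changed:
            break
    return s
```

Storing two orthogonal 8-bit patterns and probing with a one-bit-corrupted copy recovers the stored pattern.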
Designs that combine simple high-order neural circuits, Feed Back Shunting (FBS)1 and Feed Forward Shunting (FFS),2 are presented. A cascaded FFS-FBS design approach yields two new networks. These networks are capable of extracting edge features from moving images in noisy environments. The use of FFS to detect motion gives these networks robustness in the presence of noise; FBS gives them the ability to extract edge features.
The backpropagation algorithm for error-gradient calculations in multilayer, feed-forward neural networks is derived in matrix form involving inner and outer products. It is demonstrated that these calculations can be carried out efficiently using systolic processing techniques, particularly on the SPRINT, a 64-element systolic processor developed at Lawrence Livermore National Laboratory. This machine holds one million synapses and forward-propagates 12 million connections per second using 100 watts of power. When executing the algorithm, each SPRINT processor performs useful work 97% of the time. The theory and applications are confirmed by nontrivial examples involving seismic signal recognition.
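The matrix form of backpropagation, with weight gradients expressed as outer products of layer deltas and activations, can be sketched for a single hidden layer. The squared-error loss and sigmoid units here are assumptions made for the sketch, and the SPRINT mapping itself is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W1, W2, x):
    """Two-layer feed-forward pass; returns both layers' activations."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in W2]
    return h, y

def backprop(W1, W2, x, target):
    """Gradients in matrix form: delta2 = (y - t) .* y', then
    delta1 = (W2^T delta2) .* h' (an inner-product step), and the weight
    gradients are the outer products delta2 h^T and delta1 x^T."""
    h, y = forward(W1, W2, x)
    d2 = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, target)]
    back = [sum(W2[k][j] * d2[k] for k in range(len(d2))) for j in range(len(h))]
    d1 = [back[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    g2 = [[dk * hj for hj in h] for dk in d2]   # outer product: d2 h^T
    g1 = [[dj * xi for xi in x] for dj in d1]   # outer product: d1 x^T
    return g1, g2
```

A central-difference check on the half-squared-error loss confirms the analytic gradient.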
With the current need for computers far faster than conventional digital-chip-based systems, the future of optics-based signal processing is very bright. Attention has turned to a different application of optics: mathematical operations that are numerical, sometimes discrete, and often algebraic in nature. Interest has been so vigorous that many view it as a small revolution in optics, whereby optical signal processing is beginning to encompass what is frequently described as optical computing. The term is fully intended to imply a close comparison with the operations performed by scientific digital computers. This paper describes the application of single-element 2-D acousto-optic deflectors to optical multiplication systems.
A new concept, which utilizes a 2-D data structure, is introduced in optical systems for computing and interconnects. Architectural designs using this new concept have yielded more efficient, higher-throughput systems than conventional designs based on a 1-D data structure.
An optical associative processor realizing the versatile Faddeeva algorithm can be used for several computations such as matrix multiplications, matrix decompositions or inversions, and singular value decompositions just by changing the structure of the data stored. Thus such an optical processor is programmable and efficient. In this paper the realization of this algorithm on various optical structures and their performance are described. For unstable data structures it is necessary to use a preprocessed Faddeeva algorithm on the optical processors for accurate results. The procedure of preprocessing and the performance of the preprocessed Faddeeva algorithm are also described.
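The Faddeeva algorithm's versatility comes from a single elimination: reducing the block matrix [[A, B], [-C, D]] so that the lower-left block vanishes leaves D + C A^{-1} B in the lower-right, so the choice of stored blocks B, C, D selects multiplication, inversion, or a linear solve. A minimal numerical sketch follows (plain Gaussian elimination without pivoting, so it assumes a well-conditioned A; the optical realization is of course very different).

```python
def faddeeva(A, B, C, D):
    """Compute D + C A^{-1} B by forward elimination on [[A, B], [-C, D]]:
    zeroing everything below the pivots of the first n columns turns the
    lower-right block into the Schur-type complement D - (-C) A^{-1} B."""
    n, p = len(A), len(B[0])
    # build the (n + m) x (n + p) block matrix
    M = [list(Ai) + list(Bi) for Ai, Bi in zip(A, B)]
    M += [[-c for c in Ci] + list(Di) for Ci, Di in zip(C, D)]
    for k in range(n):                     # eliminate column k below row k
        for i in range(k + 1, len(M)):
            f = M[i][k] / M[k][k]
            for j in range(k, n + p):
                M[i][j] -= f * M[k][j]
    return [row[n:] for row in M[n:]]      # lower-right block
```

Setting B = C = I and D = 0 specializes the same elimination to matrix inversion, illustrating how the processor is "programmed" purely through the stored data.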
This paper presents an overview of the work on optically interconnected computer systems being conducted at the University of Delaware. It is our belief that the successful implementation of optical interconnect strategies requires not only the development of new technology, but also the development of new models to describe computer behavior. This is necessary because of the strong interactions between the design of the optical interfaces and their placement within the computer hierarchy. An analytical model of computer interconnection delays is presented which enables the performance of new interconnection strategies to be evaluated and compared with traditional metallic systems. The results of this work are being used to develop an integrated electrical/optical interface circuit based on the selective growth of GaAs-on-Si. The growth process is described and results on device fabrication are presented.
A novel redundant-number-system divider chip is described. The design incorporates an SRT division algorithm with internal signed-digit representations in the division recurrence loop. The design promises to be faster and to require less physical space.
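The recurrence behind radix-2 SRT division, with quotient digits drawn from {-1, 0, 1}, can be sketched in floating point. The chip operates on signed-digit hardware registers rather than floats, and the normalization ranges assumed here (0 <= x < d, 1/2 <= d < 1) are illustrative.

```python
def srt_divide(x, d, n=40):
    """Radix-2 SRT division. Each step inspects only the shifted residual
    against +/-1/2 to pick the digit -- a low-precision comparison, which
    is what lets hardware select digits before a full subtraction
    completes -- then updates w <- 2w - q*d. The residual stays bounded
    by d, and the quotient error after n steps is at most 2^-n."""
    w, q, scale = x, 0.0, 0.5
    for _ in range(n):
        w *= 2
        if w >= 0.5:
            qd = 1
        elif w <= -0.5:
            qd = -1
        else:
            qd = 0            # redundancy: zero digit, no subtraction
        w -= qd * d
        q += qd * scale
        scale /= 2
    return q
```

Note the overlap between digit choices (the redundancy) is what tolerates the imprecise comparison: a "wrong" borderline digit is corrected by later digits of the opposite sign.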
An on-line CORDIC implementation for computing the Generalized Singular Value Decomposition (GSVD) is presented. Among several algorithms, the implementation shown is based on Luk's parallel version for a triangular processor array, using odd-even ordering. To implement the GSVD, the CORDIC approach is attractive compared with using conventional arithmetic units such as square-root units, dividers, and multipliers. However, the CORDIC module is relatively slow because full-precision computation is required to determine the direction of the angle and to drive the variable shifter in the basic step. To avoid this, the use of redundant and on-line CORDIC has been proposed previously for the SVD and matrix triangularization. This yields a significant speedup at some additional cost because of the variable scaling factor introduced. Here, the on-line CORDIC approach is extended to the GSVD. The advantages of this approach are more significant for the GSVD because of its longer sequence of dependent operations, which makes the combination of the short step time of on-line CORDIC and the overlapping capability of on-line arithmetic very attractive. By comparing it with the conventional approach as well as a CORDIC approach using full-precision computation, we show that a speedup of about 5 can be achieved.
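A conventional (full-precision, non-redundant) CORDIC rotation, the baseline that the on-line variant accelerates, can be sketched as shift-and-add iterations with a precomputed gain. The iteration count is an illustrative choice.

```python
import math

def cordic_rotate(theta, n=40):
    """Rotation-mode CORDIC: drive the residual angle z toward zero with
    micro-rotations by atan(2^-i), using only shifts and adds in the x/y
    update. The fixed gain K = prod(1/sqrt(1 + 2^-2i)) is folded into the
    starting vector. Returns (cos(theta), sin(theta)); converges for
    |theta| <= sum(atan(2^-i)) ~= 1.743 rad."""
    K = 1.0
    for i in range(n):
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0        # full-precision sign decision
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y
```

The sign decision on z is exactly the full-precision step the abstract identifies as the bottleneck; redundant on-line CORDIC replaces it with a decision on a few leading digits.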
Simple bit-serial processors are useful in the construction of highly parallel computing architectures, where thousands of individual processors may be combined to create very powerful machines. Conventional bit-serial processors are limited by several factors, including a von Neumann bottleneck associated with their local memory segment, the limited speed of each processor, and very high control-bandwidth requirements. The memoryless Bit-Serial Pipeline (BISEP) processing element replaces the conventional local memory segment with a parallel pipeline structure through which data is piped in systolic fashion. In addition to overcoming some of the conventional processor's limitations, the BISEP architecture has higher performance than conventional processors in many applications and is more easily fabricated with some silicon technologies, such as gate arrays.
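The flavor of bit-serial processing can be sketched with an LSB-first serial adder: one full adder handles a single bit pair per clock while the carry sits in a one-bit register. This is illustrative only; a BISEP-style pipeline composes many such single-bit elements systolically rather than running them in a software loop.

```python
def bit_serial_add(a_bits, b_bits):
    """Bit-serial addition, least-significant bit first. Each 'clock'
    consumes one bit from each operand stream through a single full adder;
    the carry flip-flop is the element's only state. The final carry is
    emitted as the extra most-significant bit."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                       # full-adder sum bit
        carry = (a & b) | (carry & (a | b))     # full-adder carry-out
        out.append(s)
    out.append(carry)
    return out
```

For example, 11 + 6 = 17: streaming the LSB-first operands [1,1,0,1] and [0,1,1,0] through the adder emits [1,0,0,0,1].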