This paper attempts to provide a computational perspective for matrix-based signal processing. Representative areas of modern signal processing are discussed with respect to their dominant computational needs. Parallel algorithms and architectures for the required matrix and vector operations are then surveyed. Issues of interest include modularity and regularity, communication requirements, simplicity of control, and numerical stability.
Eigendecomposition and Singular Value Decomposition of matrices have become important computational tools in signal processing systems. This may be a "natural evolution" since, for some time now, linear algebra (i.e., the algebra of vector spaces) has been providing processing tools for problems such as direction finding (DF) and spectral analysis. The concept of the Signal Subspace has emerged as a fruitful means of characterizing useful structure in sensor data and has led to new methods and algorithms in signal processing. Since the Signal Subspace is an invariant subspace which can be computed via the Eigendecomposition of data matrices, such methods are often referred to as "Eigenvector methods of Signal Processing". The purpose of this paper is to present and discuss such methods as applied to signal processing problems such as:
- Multiple signal detection
- Multiple signal parameter estimation and demodulation
- Multiple source location (e.g., direction finding)
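The subspace idea this abstract refers to can be illustrated with a small numerical sketch (not from the paper; the dimensions, noise level, and data below are invented): the eigendecomposition of a sample covariance matrix splits sensor data into a dominant "signal" subspace and a complementary noise subspace that is nearly orthogonal to the source directions.

```python
# Illustrative sketch of a signal-subspace estimate via eigendecomposition,
# in the spirit of eigenvector (subspace) methods for direction finding.
# All parameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 8, 200, 2                        # sensors, snapshots, sources

A = rng.standard_normal((m, d))            # stand-in "steering" matrix
S = rng.standard_normal((d, n))            # source signals
X = A @ S + 0.1 * rng.standard_normal((m, n))  # sensor data + noise

R = X @ X.T / n                            # sample covariance matrix
w, V = np.linalg.eigh(R)                   # eigenvalues in ascending order
signal_subspace = V[:, -d:]                # d dominant eigenvectors
noise_subspace = V[:, :-d]                 # remaining m - d eigenvectors

# The noise subspace is (approximately) orthogonal to the columns of A,
# which is what subspace direction-finding methods exploit.
residual = np.linalg.norm(noise_subspace.T @ A)
print(residual)
```

The small residual is the key structural fact: projecting candidate steering vectors onto the noise subspace and searching for near-zero projections is the essence of such subspace source-location methods.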
Systolic architectures for eigenvalues and singular values are discussed. One "triangular" processor array and associated algorithms for computing the QR-decomposition, the singular value decomposition, the generalized singular value decomposition and the CS-decomposition are described in detail.
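For reference, the QR-decomposition computed by such triangular arrays is typically formulated with plane (Givens) rotations, each cell annihilating one matrix entry. Below is a minimal sequential sketch of that rotation-based QR; the function name, loop order, and test matrix are illustrative and not taken from the paper.

```python
# Sequential sketch of QR decomposition by Givens rotations: the same
# elementary operation a triangular systolic array applies in parallel.
import numpy as np

def givens_qr(A):
    """Zero subdiagonal entries one at a time with 2x2 plane rotations."""
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):      # annihilate R[i, j] from below
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])       # rotation on rows i-1, i
            R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T  # accumulate Q
    return Q, R

A = np.arange(12.0).reshape(4, 3) + np.eye(4, 3)
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A))               # Q R reconstructs A
```

Givens rotations suit systolic implementation because each step touches only two adjacent rows, giving the local, regular data flow these arrays require.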
In this paper, we show that every systolic array executes a Regular Iterative Algorithm with a strongly separating hyperplane and, conversely, that every such algorithm can be implemented on a systolic array. This characterization provides us with a unified framework for describing the contributions of other authors. It also exposes the relevance of many fundamental concepts that were introduced in the sixties by Hennie, Waite, and Karp, Miller, and Winograd to the present-day concerns of systolic array design.
This overview paper describes techniques for fault tolerance which can be applied to highly parallel signal processing architectures. Classical techniques are outlined and shown applicable to memories and data communications. The recent approach of algorithm-based fault tolerance, which tailors the fault tolerance to the systolic algorithm and processor architecture, is shown to be a natural one for such systems. Various data encoding techniques and resulting fault-tolerant systems are described and critiqued.
We review the use of bit-serial circuit techniques for parallel machine architectures. We identify the key concept of functional parallelism and present some examples of its application, including its use in the FIRST silicon compiler. We introduce a new comparison of LSB-first and MSB-first serial techniques, and find that the latter approach may have distinct advantages for more complex arithmetic functions (like divide and square root), and for a general floating-point element. Finally, we review the serial programmable array processors and find significant future potential as a general computing substrate.
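The LSB-first style this abstract compares can be sketched in a few lines: bits arrive least significant first, and a single full-adder cell with one carry register produces the sum serially, one bit per cycle. This is a generic illustration of the technique, not code from the FIRST compiler or the paper.

```python
# Sketch of an LSB-first bit-serial adder: one full-adder cell plus a
# carry flip-flop, modeled here as a loop over bit-arrival cycles.
def serial_add(a_bits, b_bits):
    """a_bits, b_bits: equal-length LSB-first lists of 0/1 bits."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)              # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))    # carry held for next cycle
    out.append(carry)                          # final carry-out bit
    return out

def to_bits(x, width):                         # integer -> LSB-first bits
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):                           # LSB-first bits -> integer
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(serial_add(to_bits(13, 8), to_bits(250, 8))))  # 263
```

The appeal of LSB-first operation is visible here: carries propagate naturally forward in time, so the cell needs only one bit of state. MSB-first schemes give up this simplicity in exchange for earlier availability of significant digits, which is why the trade-off the abstract mentions arises for division and square root.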
Today's architectural challenge is to design processors for very large computations that take advantage of the cost savings of VLSI with large production volume parts. Using a Boolean hypercube interconnection of microprocessors is one approach that has been studied at Caltech. Many applications have been implemented and many architectural lessons have been learned. These are summarized and their indications of future directions are discussed.
The Saxpy-1M is a practical, programmable, systolic computer system. Its key features are: 1030 Mflop peak performance in single precision floating point; memory of 8 to 128 Mwords; memory bandwidth of 320 Mbytes/second; I/O bandwidth of up to 40 Mbytes/second; the VAX/VMS operating system; a unique architecture that combines a control processor, a vector processor, and a programmable, systolic matrix processor; and an extensive set of application software libraries and application software development tools.
This paper is a review of arithmetic algorithms and circuitry used for VLSI implementation of digital signal processing systems. We also briefly review recent introductions of digital signal processor ICs and provide examples of system specific ICs as well. Performance of the ICs is compared where data is available.
A formal algorithmic notation and programming language will be critical in developing VLSI systems. Due to the difficulty of keeping track of several simultaneously occurring events, parallel programming is significantly more complicated than sequential programming. Therefore, it is desirable to consider new notations or languages better suited to array processors, instead of always adhering to conventional languages. The advent of algorithm-oriented VLSI architectures also necessitates new programming language theory to precisely describe the concurrency embedded in computational algorithms. For example, it is of great interest and importance to develop a complete set of programming techniques and software packages for wavefront/systolic type arrays. Some guidelines for algorithmic notations for such VLSI arrays will be addressed in this paper. We shall also discuss in great detail several illustrative programming examples based on a wavefront language and the Occam language.
Although a systolic array is often thought of as a "hard wired" device, there are many reasons to want to program systolic algorithms. In this paper the problem of providing an efficacious programming environment is addressed. The difficulties of programming complex parallel algorithms are shown to be reduced by using a new concept of a parallel "program" which maximizes the use of graphical abstractions and minimizes the need for symbolic text. This concept is illustrated by the Poker Parallel Programming Environment which, although designed for a broader class of algorithms, illustrates the main features that a programming environment specialized to systolic computation should have.
During the past decade or so, numerous schemes have been proposed and developed for performing vector-matrix-based algebraic operations optically. This paper reviews the major methods, providing the basis for an understanding of more complicated opto-electronic architectures and algorithms.
A generic optical linear algebra processor architecture is described with attention to components and unique optical features. Various architectures, number representations, and laboratory systems are then reviewed and applications for such processors are briefly highlighted.
The comparison between the capabilities of optical versus electronic signal processors is usually the first issue that is addressed when one considers using optics in a particular signal processing application. In general, optics must offer a clear advantage over electronics in order to justify opting for an optical approach. The comparison between optics and electronics is based on a number of related criteria whose relative importance depends on the application. These include processing power (e.g., the number of elementary operations performed per unit time), algorithmic flexibility, accuracy, power consumption, weight, and cost. The primary advantage of analog optical systems is the extremely high processing power that can be achieved (in excess of 10^4 analog multiplications per second is possible with acoustooptic technology) with relatively small power and size requirements. The major disadvantages of conventional analog optical processors are lack of algorithmic flexibility (limited to linear transformations) and low accuracy. These two limitations, along with immature device technologies, have restricted the utilization of optical signal processors to only a small number of areas, such as synthetic aperture radar and acoustooptic spectrum analyzers. Thus there is strong motivation to find ways to improve the accuracy and extend the range of signal processing operations that can be optically performed in order to extend the applicability of optical techniques. In this paper we examine the method of digital multiplication by analog convolution (DMAC) [1], which has received attention in recent years as a technique for building accurate optical processors. We will show that DMAC is fundamentally limited because it can provide accuracy only at the expense of using post-detection electronics that are more sophisticated than the electronics required to build the entire system electronically in the first place.
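The core of DMAC can be sketched numerically (a generic illustration of the technique, not the paper's analysis; all values are invented): the product of two numbers equals the linear convolution of their digit strings, which an analog correlator can form, but the resulting mixed-radix digits exceed one and must be resolved by post-detection electronics.

```python
# Sketch of digital multiplication by analog convolution (DMAC):
# convolve two LSB-first binary digit strings (the analog step), then
# resolve the mixed-radix output digits electronically.
def convolve_digits(a_digits, b_digits):
    """Full linear convolution of two LSB-first digit sequences."""
    out = [0] * (len(a_digits) + len(b_digits) - 1)
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            out[i + j] += a * b            # analog accumulation step
    return out

def resolve_carries(mixed, base=2):
    """Post-detection step: collapse mixed-radix digits to an integer."""
    return sum(d * base**i for i, d in enumerate(mixed))

def to_bits(x, width):                     # integer -> LSB-first bits
    return [(x >> i) & 1 for i in range(width)]

a, b = 45, 27
mixed = convolve_digits(to_bits(a, 6), to_bits(b, 5))
print(mixed)                               # digits can exceed 1
print(resolve_carries(mixed) == a * b)     # True
```

The mixed-radix output is exactly where the accuracy burden lands: the analog detector must distinguish digit values larger than one, and the carry-resolving electronics grow with operand width, which is the limitation the paper argues against.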