Increasingly complex electronic systems consisting of multiple programmable processors executing multiple processes in real-time require the use of modern software tools for effective design and analysis. Moreover, design and analysis methodologies suitable for realizing such complex systems must be comprehensive and integrated to provide insight into the myriad technical and cost issues attendant on such systems. These issues include system performance, reliability, and testability, all of which must be related to the specified system requirements to ensure that user needs are met. This paper identifies formal methods and supporting tools that are useful in describing, designing, and analyzing complex digital systems, beginning with user requirements and progressing to specific hardware and software implementation issues.
We are developing an expert system to facilitate the CAD of sophisticated and complex chips for real-time non-linear signal processing. The software is called DELPHI which is an acronym for DEsign Laboratory for Processing Hidden Information. The system combines an AI engine, symbolic algebra, and multiprocessor numerical schemes. Sophisticated reasoning, mathematical, and computational tools are provided in a form suitable for immediate use by systems engineers. One of the major advantages of DELPHI is its ability to interact symbolically with the user. The architecture of the DELPHI system is shown in Figure 1. The current architecture can be classified as a shallow coupled system since the knowledge-based system has little knowledge of numeric routines. The numeric routines, which are treated as 'black boxes', are managed by the system to solve the given problem and interpret the results. The system can be used as: (a) a tool for integrated system design, (b) a tool for integration of symbolic and numeric computing, and (c) an advanced teaching aid.
In this paper, we present a procedure to transform algorithms into equivalent regular algorithms. Then, starting from these regular algorithms, we show how to synthesize systolic/wavefront arrays that can be programmed to solve problems of arbitrary size. The buffer memory and control of a resulting array are regular and simple. Also, the throughput of the array is balanced with the I/O speed of the host to which it is to be attached. The methods and tools presented are consistent with, and embedded in, our hierarchical and interactive flow graph integration system HIFI.
This paper describes the extension of the maximum likelihood spectrum estimation procedure to the multichannel case and compares its properties experimentally with other high resolution methods for multichannel 2-D spectrum analysis [2,3]. The procedure begins with a multichannel optimum filtering problem and extends the original procedure to the estimation of the entire spectral matrix. In so doing, it formally encompasses both the original single channel problem and various separate attempts to develop maximum likelihood cross spectral estimates for two random processes [4,5]. The paper also considers the so-called improved maximum likelihood method of Lim and Dowla and develops it for the multichannel case. The methods are applicable to both one-dimensional and multidimensional multichannel spectral estimation.
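For readers unfamiliar with the single-channel starting point that the abstract above extends, the following is a minimal sketch of the classical maximum likelihood (Capon) spectral estimate, P(f) = M / (e(f)^H R^{-1} e(f)), where R is the sample autocorrelation matrix and e(f) a steering vector. The function name, the regularization term, and all parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def capon_spectrum(x, order, freqs):
    """Single-channel maximum likelihood (Capon) spectral estimate.

    x      : 1-D array of samples
    order  : dimension M of the estimated autocorrelation matrix
    freqs  : normalized frequencies in [0, 0.5)
    """
    n = len(x)
    # Biased sample autocorrelation lags r(0) .. r(order-1)
    r = np.array([np.vdot(x[:n - k], x[k:]) / n for k in range(order)])
    # Hermitian Toeplitz autocorrelation matrix
    R = np.empty((order, order), dtype=complex)
    for i in range(order):
        for j in range(order):
            R[i, j] = r[i - j] if i >= j else r[j - i].conj()
    Rinv = np.linalg.inv(R + 1e-9 * np.eye(order))  # small regularization
    p = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        e = np.exp(2j * np.pi * f * np.arange(order))  # steering vector
        p[i] = order / np.real(np.vdot(e, Rinv @ e))   # P(f) = M / e^H R^-1 e
    return p
```

The multichannel extension in the paper replaces the scalar lags by spectral-matrix blocks; this scalar version only fixes the notation.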
In this paper, we study three fault tolerance techniques: data redundancy, algorithm-based fault tolerance, and pair and spare. These techniques, each individually useful, produce remarkably high levels of protection at relatively low cost when used in combination. We first discuss the techniques separately, extending triple data redundancy to band matrix multiplication while applying algorithm-based fault tolerance to dense matrix multiplication. We then combine double data redundancy and algorithm-based fault tolerance to produce a linear array that corrects transient errors at minimal cost. Finally, we show how the above techniques can be used to implement pair and spare at half the normal cost.
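Algorithm-based fault tolerance for matrix multiplication is commonly realized with row/column checksums in the style of Huang and Abraham; the sketch below (function names and the error tolerance are illustrative, not the paper's) shows how the checksum structure locates and corrects a single erroneous product element.

```python
import numpy as np

def abft_multiply(A, B):
    """Checksum-style algorithm-based fault tolerance for C = A @ B.

    A is augmented with a column-checksum row and B with a row-checksum
    column; the product of the augmented matrices is a full-checksum
    matrix whose row/column sums locate a single erroneous element.
    """
    Ac = np.vstack([A, A.sum(axis=0)])                  # column-checksum matrix
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row-checksum matrix
    return Ac @ Br                                      # full-checksum product

def check_and_correct(C, tol=1e-8):
    """Locate and fix a single faulty element of a full-checksum matrix C.

    Returns the (row, col) of the corrected element, or None if consistent.
    """
    data = C[:-1, :-1]                                  # view into C
    row_err = np.abs(data.sum(axis=1) - C[:-1, -1]) > tol
    col_err = np.abs(data.sum(axis=0) - C[-1, :-1]) > tol
    if row_err.any() and col_err.any():
        i, j = np.argmax(row_err), np.argmax(col_err)
        data[i, j] -= data[i, :].sum() - C[i, -1]       # restore row consistency
        return (i, j)
    return None
```

A transient error flips exactly one row checksum and one column checksum, so their intersection both detects and corrects it, which is what makes the combination with double data redundancy in the abstract inexpensive.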
A programmable two-dimensional (2D) processor array is fault-tolerant if faulty processors can be detected and then avoided during program execution. With VLSI the number of processors that can be implemented in a 2D array increases, and as a result more and more cells can be devoted to fault tolerance. The literature contains many schemes for detecting faulty processors and reconfiguring data routing to avoid them. However, an efficient implementation of these schemes on a 2D array can be an extremely difficult task if application, fault detection, and reconfiguration must all be considered at the same time. The virtual channels mechanism of this paper allows these concerns to be dealt with separately and efficiently. An application or fault detection program may assume that every logical connection between processors is implemented by a dedicated physical connection. A physical connection is composed of a sequence of virtual channels. Since the number of virtual channels between any two processors is not bounded by the number of available physical channels, all dedicated physical connections required by the program can be implemented. The mapping of logical connections to physical connections and the scheduling of a physical channel to implement multiple virtual channels are totally transparent to a program, and can be optimized independently. Various fault tolerance schemes are now readily implementable without programming difficulty. For example, it is straightforward to have concurrent execution of application and fault detection programs on the same 2D array. A switch architecture, suitable for VLSI implementation, is presented for implementing the virtual channels mechanism. The switch captures all the architectural features needed to implement the mechanism and can be used with different processors. By using multiple copies of this switch, a variety of fault-tolerant 2D arrays can be formed.
In particular the switch architecture is planned to be used in building a fault-tolerant 2D Warp array.
Maximum entropy method (MEM) and balanced correlation method were used to reconstruct the images of low-intensity x-ray objects obtained experimentally by means of a Uniformly Redundant Array coded aperture system. The reconstructed images from MEM are clearly superior. However, the MEM algorithm is computationally more time consuming because of its iterative nature. On the other hand, both the inherently two-dimensional character of images and the iterative computations of MEM suggest the use of parallel processing machines. Accordingly, computations were carried out on the Massively Parallel Processor (MPP) at Goddard Space Flight Center as well as on the serial processing machine VAX 8600, and the results are compared.
We describe some preliminary results in the development of a general and systematic methodology to design arrays of processing elements (PEs) for matrix computations, with the capability to handle algorithm and implementation in a unified manner. This is a transformational methodology, based on the dependence graph of the algorithms. It provides mechanisms to deal with issues such as data broadcasting, data synchronization, interconnection structure, I/O bandwidth, number of PEs, throughput, delay, and utilization of PEs. We show that different transformations may lead to entirely different computing structures and that the selection of suitable transformations is directed by the specific restrictions imposed on the implementation. We apply a preliminary version of this methodology to the algorithms for matrix multiplication and LU-decomposition. The approach produces structures which correspond to proposed systolic arrays for these computations, as well as structures that exhibit better efficiency than those arrays.
STRUCTFLOW is an approach to efficient and flexible processing of structured and continuous data. It combines the easy programming and efficient exploitation of parallelism of data flow computers with the flow principles of pipelines and systolic arrays. This leads to a novel static data flow architecture that needs neither associative mechanisms nor local flow control. The presented system is in principle able to process arbitrarily extended streams of data with an unlimited number of processors working in parallel. It is possible to achieve nearly deterministic behaviour and synchrony of data streams, which has not yet been seen in data flow processing. This paper presents the architecture and instruction set of the processor and discusses the features of this system in the light of signal processing.
Many Adaptive Phased Array Radar (APAR) techniques provide farfield signal power and location. Once this information is known, placing nulls at these locations to cancel jammers can be accomplished through a proper choice of antenna weights. The weight and angular domains are related through Fourier transformation. To obtain a fine sampling in the angular domain that accurately specifies the desired nulls, the weight aperture must be extended by padding it with zeros. However, in the final weight vector applied to the antenna output, the contribution of these extra elements must be zero, since they do not correspond to available antenna elements. This provides two sets of constraints on the solution: the set of desired nulls in the angular domain and the available aperture in the weight domain. A method of finding a solution that matches constraints in a pair of Fourier-conjugate domains is the Gerchberg-Saxton error reduction algorithm, which is often applied to image reconstruction. This paper describes an investigation into the behavior of this algorithm as applied to the discrete antenna pattern synthesis case. The algorithm is presented in matrix/vector form and its transient and steady state response is derived. To assist in this analysis, we introduce a new matrix operator which greatly simplifies the required derivations. Computer simulations and numerical evaluations of the analytical results are included to demonstrate the applicability of the algorithm to pattern synthesis.
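The error reduction iteration described above alternates between enforcing the nulls in the angular (FFT) domain and the aperture support in the weight domain. The sketch below is a minimal illustration of that alternating-projection loop, not the paper's matrix/vector formulation; array sizes, the starting taper, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def error_reduction_nulls(n_elems, n_fft, null_bins, n_iter=200):
    """Gerchberg-Saxton-style alternating projections for null synthesis.

    n_elems   : number of physical antenna elements (weight-domain support)
    n_fft     : zero-padded length giving fine angular sampling
    null_bins : FFT bins where the pattern must vanish
    """
    w = np.zeros(n_fft, dtype=complex)
    w[:n_elems] = 1.0                      # start from a uniform taper
    for _ in range(n_iter):
        p = np.fft.fft(w)                  # angular-domain pattern
        p[null_bins] = 0.0                 # impose the desired nulls
        w = np.fft.ifft(p)
        w[n_elems:] = 0.0                  # impose the aperture constraint
    return w[:n_elems]
```

Both constraint sets are linear subspaces here, so the iteration converges geometrically toward a weight vector satisfying both, which is the transient/steady-state behavior the paper analyzes.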
The Systolic Processor with a Reconfigurable Interconnection Network of Transputers (SPRINT) is a sixty-four-element multiprocessor developed at Lawrence Livermore National Laboratory to evaluate systolic algorithms and architectures experimentally. The processors are interconnected in a reconfigurable network which can emulate networks such as the two-dimensional mesh, the triangular mesh, the trapezoidal mesh, the tree, and the shuffle-exchange network. The SPRINT's computation capability surpasses its communication capability. Techniques have been developed to perform the Faddeev Algorithm utilizing most of its computing capability by operating on block matrices. These techniques reduce communication bandwidth requirements for a given computation rate and increase efficiency to close to 100%. The Faddeev algorithm calculates the quantity CX+D, where X is the solution to AX=B and where A, B, C and D are given. All quantities are square matrices. Several linear algebra operations such as the matrix-matrix product and matrix inversion can be calculated by loading appropriate values for A, B, C, and D. The Faddeev algorithm is executed on the SPRINT to compare theory with experiment.
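The Faddeev computation named above has a compact statement: eliminate the -C block of the compound matrix [[A, B], [-C, D]] by row operations; the lower-right block then becomes D + C A^{-1} B = C X + D. The sketch below shows the scalar (non-blocked) form; the SPRINT implementation operates on block matrices, and the pivoting-free elimination here is a simplifying assumption.

```python
import numpy as np

def faddeev(A, B, C, D):
    """Compute C X + D, where A X = B, via the Faddeev scheme.

    Forward elimination (no pivoting, so A must be safely factorable)
    annihilates the -C block of [[A, B], [-C, D]]; the lower-right block
    is then the Schur complement D + C A^{-1} B = C X + D.
    """
    n = A.shape[0]
    M = np.block([[A, B], [-C, D]]).astype(float)
    for k in range(n):
        # Subtract multiples of pivot row k from every row below it
        M[k + 1:, k:] -= np.outer(M[k + 1:, k] / M[k, k], M[k, k:])
    return M[n:, n:]
```

Loading identities or zeros for A, B, C, D specializes this single routine to matrix products, inversion, and linear-system solution, which is why it suits a fixed array.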
This paper describes a new wavefront array architecture which can efficiently execute sequences of matrix operations, as are typically found in many real-world signal processing problems. The systolic algorithms described in the literature have usually been derived for single matrix operations, without regard for how one might combine them with other matrix operations in a useful sequence. A modified wavefront array architecture allows a sequence of operations to be systolized without sacrificing the efficiency of the individual systolic algorithms in the sequence. The approach consists of dividing the sequence of matrix operations into tasks (each task is one operation) and systolizing each task such that the positioning of data values for input and output is compatible with that required by the preceding and following tasks. The wavefront array then pipelines a sequence of tasks, with the head element of the array beginning a new task as soon as it completes the preceding task. Each wavefront initiates a task iteration when it arrives at a processing element. To increase the speed of the array, wavefronts are passed on when an element begins executing a task iteration rather than after it finishes the task iteration. When this is done, dataflow rather than wavefront propagation determines the execution time of the array for each task. Several examples of matrix operation sequences have been tested on the architecture in simulation, and the results are reported.
We present sequential detection for diffusion-type signals in both the fixed-probability-of-error formulation and the Bayesian formulation. The optimal strategy in both cases is a threshold policy with explicitly computable thresholds. We provide numerical schemes for approximating the relevant likelihood ratio and an architecture for real-time signal processing.
The design of a high-speed (250 million 32-bit floating point operations per second) two dimensional systolic array composed of 16 bit/slice microsequencer structured processors will be presented. System design features such as broadcast data flow, tag bit movement, and integrated diagnostic test registers will be described. The software development tools needed to map complex matrix-based signal processing algorithms onto the systolic processor system will be described.
A new design of a multi-dimensional real-time VLSI convolver is presented. A custom VLSI chip is proposed which, when accompanied by memory buffers, can be used to assemble a convolver of arbitrary dimension and with arbitrary input size. The convolver is optimal with respect to the size of memory and has very small latency. Numerous modifications of the basic design are introduced in a framework of a unified graph-theoretic transformation called retiming. This approach guarantees functional equivalence of the original and modified systems.
In coherent pulse-Doppler radars, the signal processing is frequently performed by a digital processor, or by the combination of an analog pulse compressor followed by a digital processor. Traditionally, digital processing is attractive for one or both of the following reasons: complicated signal processing algorithms can be implemented using commercially available building blocks (i.e., digital ICs) and signals with a large dynamic range can be accommodated. However, in applications involving modest dynamic ranges (≈8 bits), analog signal processing offers the potential for higher throughput rates in a smaller, lower power processor than would be possible with a digital implementation. Except in the area of pulse compression, almost all past attempts to realize the potential advantage of analog signal processing have met with only limited success. In particular, conventional charge-coupled devices (CCDs) have not gained widespread acceptance in commercial or military systems because they have not provided sufficient cost, power and throughput advantages over digital technology.
We have designed and built a portable real-time speech processing system that incorporates a TMS 32010 (used as a co-processor) within an IBM personal computer. The system design is discussed, as is the speech therapy software that has been implemented. Displays of loudness, pitch, and vocal tract cross-section as computed by the system are illustrated. Preliminary results show that estimates of the glottal excitation, as extracted using shift-and-add, vary between individuals. We indicate why the estimate of the glottal excitation may be useful in the diagnosis of glottal disorders.
Implementation of iterative algorithms in a real-time signal processing environment is described in this paper. The implementation considered here differs from the usual application of these algorithms in that the data flow is allowed to drive the iterations, providing effective real-time performance. The particular signal processing application addressed is adaptive noise cancellation. We allow the data flow to continuously update the noise minimization problem, introducing perturbations in the problem. A perturbation analysis is given for the steepest descent algorithm. Numerical results are given for the steepest descent and conjugate gradient algorithms showing how the solution responds to perturbations in the data. An architecture is proposed for the steepest descent algorithm in a real-time adaptive acousto-optic noise canceller.
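The idea of letting the data flow drive the descent iterations can be illustrated with the LMS stochastic approximation of steepest descent, in which each new sample perturbs the instantaneous gradient and the iteration tracks the moving minimum. This is an illustrative stand-in for the paper's formulation, not its algorithm; the tap count and step size below are arbitrary assumptions.

```python
import numpy as np

def adaptive_cancel(primary, reference, n_taps=8, mu=0.01):
    """Data-driven steepest descent noise cancellation (LMS form).

    primary   : signal + noise that leaked through an unknown channel
    reference : noise-only reference input
    Each sample contributes one descent step, so the iteration is
    continuously perturbed by the incoming data, as in the paper.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # reference tap vector
        e = primary[n] - w @ x                     # canceller output (error)
        w += mu * e * x                            # one steepest descent step
        out[n] = e
    return out, w
```

After the transient, the weights hover around the Wiener solution and the output approximates the clean signal; the residual jitter is exactly the perturbation effect the paper analyzes.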
Research in the use of guided wave optics for signal processing is reviewed, and advantages and limitations of the technology are discussed. The signal processors employ electrooptic and fiber optic components to perform such functions as spectral analysis of radio-frequency signals, correlation and matched filtering, code and waveform synthesis, signal delay and storage, and analog-to-digital and digital-to-analog conversion. In most cases the guided-wave approach is distinguished by the ability to perform a particular function at very high analog bandwidths or digital data rates. In concluding remarks, an effort is made to provide some perspective on competing technologies and to indicate some areas where future research might prove fruitful.
An optical system is described which provides a set of microwave signals to be applied to elements in a phased array employing solid state TR modules. The set is globally controlled by two sinusoids to permit beam formation in the 2D angle space of the antenna for both transmission and reception. Experimental results are shown. The basic conceptual design is described in terms of lightwave components and is extended to monopulse beam formation and wideband waveform beam steering, which counteracts the intrinsic dispersion of arrays.
The need for high performance A/D converters, the advantages of optics, and current approaches to A/D conversion are reviewed. A novel method is proposed that uses comparators, optical logic, and a table look-up to provide optical digital output from an optical analog input. The advantages of the new method are the ability to produce any digital code, optical input/output, and longer word lengths at high speed. Implementations of the optical logic are proposed that use 1-D SLM devices under development. The optical A/D could make digital multiplication by analog convolution competitive with electronic systems.
Bistable nonlinear optical interfaces have potential advantages of lower absorption and higher switching speed over resonator-based bistable optical gates. The design of parallel vector multipliers based on all-optical logic gates on a nonlinear optical interface is proposed in this letter. These schemes can be extended to parallel multiplications of matrices and to photonic switching.
Besides being excellent communication media, optical fibers are attractive for real-time, high-speed signal processing because they offer low loss, a large bandwidth-delay product, light weight, and immunity to electromagnetic interference. A typical optical fiber signal processing device is composed of light sources, delay lines, attenuators, directional couplers, and photodetectors. Various functions such as frequency filtering, convolution, pulse compression, high-speed pulse generation, encoding, and decoding have been reported in the literature for incoherent optical fiber signal processing devices. Recently, lattice optical fiber structures have been investigated intensively owing to their convenience for mathematical formulation and implementation. Although optical fiber signal processing devices have many attractive features, they have inherent constraints owing to positive system properties, i.e. nonnegative signal quantities (optical intensities), attenuations, and coupling coefficients. The constraints of finite impulse response (FIR) optical fiber filters have been presented previously. In this paper, we establish the constraints of infinite impulse response (IIR) optical fiber filters by investigating the possibility of designing filters with desirable properties such as maximally flat or equiripple responses. In addition, the characteristics and design principles of optical fiber filters can be clearly understood from these processes. The mathematical derivation uses the state-variable analysis technique, which is suitable for describing both the FIR and IIR cases. Hence the confirmed constraints can be directly applied to the FIR case.
An optical systolic finite impulse response (FIR) filter (or convolution operation) implementation using barrel shifters and a Modified Signed-Digit (MSD) adder is proposed in this paper. The computational element used in electronic systolic FIR filters consists of a multiplier and an accumulator. A speed-up in the throughput data rate can be achieved, along with a high degree of regularity and concurrency, by replacing the multiplier with barrel shifters and accumulators. The basic cell for the optical implementation consists of a barrel shifter, an intensity-to-polarization converter, and an optical MSD adder. The principle underlying barrel shifting is the same as that of an optical matrix-vector multiplier. A liquid crystal light valve (LCLV) structure forms the switching matrix whose elements are determined by the number of shifts specified and also the specified output precision. All barrel shifters in the architecture are implemented using different areas of the same LCLV structure. The MSD adder is implemented using symbolic substitution logic (SSL), and the input operands in the various cells are arranged on the same input data plane to give all the required summation terms. The optical implementation of the above architecture offers reconfigurability together with the inherent speed and massive parallelism of optics. It is shown that a FIR filter of order eight can be implemented using one LCLV and one optical MSD adder.
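Replacing the multiplier with barrel shifters and accumulators amounts to shift-and-add multiplication: each set bit of a fixed-point coefficient selects one shifted copy of the sample. The sketch below shows this arithmetic in software terms only; the fixed-point word length and function names are illustrative assumptions, not the optical cell's specification.

```python
def shift_add_multiply(x, coeff, frac_bits=8):
    """Multiply integer sample x by a coefficient using only shifts and adds,
    as a barrel-shifter/accumulator cell would.

    coeff is quantized to frac_bits fractional bits; each set bit of the
    quantized coefficient selects one barrel-shifted copy of x.
    """
    q = round(coeff * (1 << frac_bits))  # fixed-point coefficient
    neg = q < 0
    q = abs(q)
    acc, bit = 0, 0
    while q:
        if q & 1:
            acc += x << bit              # barrel shift by 'bit', accumulate
        q >>= 1
        bit += 1
    acc = -acc if neg else acc
    return acc >> frac_bits              # rescale (arithmetic shift)

def fir_filter(samples, coeffs, frac_bits=8):
    """Direct-form FIR using the shift-and-add cell for every tap."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += shift_add_multiply(samples[n - k], c, frac_bits)
        out.append(acc)
    return out
```

The regularity comes from every tap using the same shift network with different selected shift amounts, matching the shared-LCLV arrangement described above.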
The purpose of this paper is to review a new direct construction of phase-only filters and to further document its potential use for threshold optical correlation detectors. This construction is a descendant of the constructions introduced in Kallman and leads to a significant improvement in signal-to-noise ratio (SNR) over previous methods. Simulations suggest that the resulting filters and their optimized binarizations can be designed to contain a great deal of information, to be stable under perturbations in the training set, to have a very low false alarm rate, and to be insensitive to large amounts of additive zero mean Gaussian or uniform noise.
An optical device, the variable wavelength laser line scanner (VWLS), is described. Using two diffractive optical elements, this structure enables a laser beam to be deflected in angle, according to a prescribed and desired scan pattern. The application of this device to the fabrication and playback of holographic optical elements is described and a numerical result presented.
A comparison of real-time acousto-optic processors for synthetic aperture radar (SAR) image formation has been performed. These processors take advantage of the high processing speed and large time bandwidth product of acousto-optic devices (AOD's) in combination with the multichannel correlation capability of charge coupled devices (CCD) to form the SAR image in real time. They offer significant size, weight and power consumption advantages compared to conventional optical or digital processors. The required two dimensional matched filtering operation is performed as a series of one dimensional operations. First, the matched filtering is performed in range using an AOD. Then the azimuth correlation is performed using a reference mask and a CCD operating as time delay and integrate correlator. The first operation is performed coherently on each radar pulse return while the second is an incoherent correlation performed over the several pulses required to form the synthetic aperture. Two features common to this type of architecture which might limit their applicability are the presence of unwanted signal-dependent bias terms and the inability to perform true complex processing. Architectures utilizing both spatial carriers and subtraction schemes for eliminating the unwanted bias terms have been analyzed. Also, multichannel architectures for complex (quadrature) processing have been addressed. In addition to imaging performance, the impact of these approaches on system complexity, real-time processing speed and required component capabilities will also be discussed. Results from both our analysis and the experimental implementation of a selected group of these architectures will be presented.
The Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. It is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is discussed.
This paper presents a systolic optical processor for matrix multiplication with the property that the results of one matrix multiplication can re-enter the processor and be used in the next multiplication without storing the intermediate results. Such a design is said to be a result-reusable optical processor (R.R.O.P.). The applications of the reusable systolic optical matrix multiplication processor include the calculation of matrix powers and the evaluation of matrix polynomials. Some specific applications are the iterative solution of linear systems of equations, improved performance for eigenvalue-eigenvector computations, iterative matrix inversion, and the calculation of the matrix exponential as well as many other matrix functions.
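The iterative solution of linear systems mentioned above is a natural use of result reuse: each product's output immediately feeds the next multiplication. The sketch below uses the Richardson iteration as one concrete example of such a product-recycling loop; the function name, iteration count, and convergence condition stated in the docstring are illustrative assumptions, not the paper's design.

```python
import numpy as np

def reuse_multiply_solve(A, b, n_iter=100):
    """Solve A x = b by repeated matrix-vector products (Richardson iteration).

    Each iteration's output re-enters as the next multiplication's input,
    with no intermediate storage beyond the current vector. Converges when
    the spectral radius of (I - A) is below one.
    """
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = x + (b - A @ x)  # result of one product feeds the next
    return x
```

Matrix powers and matrix polynomials follow the same pattern, with the full product matrix rather than a vector circulating through the processor.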
Residue position-coded LED/LD look-up tables are reviewed. The design of a miniaturized LED look-up table is discussed. Emphasis is given to its volume characteristics. Minimum performance requirements are identified via a comparison of GaAs and residue look-up table multipliers.
Multiplicative acousto-optic architectures using time and space dimensions for spectrally resolving long 1-D signals and images are described. The Discrete Fourier Transform algorithm is implemented optically to provide fine frequency resolving power. Experimental results are presented for 1-D and 2-D input signals. Bias removal techniques are discussed, including an approach using a photorefractive crystal as a time integrating bias-free detector.
A multiobject shift invariant pattern recognition system using binary phase-only correlation is presented. The system computes the binary correlation between an input pattern and a generalized set of pattern functions. This technique uses a filter which consists of a set of binary phase-only code division multiplexed reference pattern functions. There are many advantages in binarizing the filter function. Binary spatial light modulators (SLM) have been developed that work well in a binary phase-only mode and can be used to synthesize the spatial filters of this type. Binarization also permits the recording of filters of images with larger samples on currently available binary SLMs that have a limited number of pixels. The functions in the reference set may correspond to either different objects or different variations of the object under study. A computer simulation of the correlator is used to study the performance of the pattern recognition system. The correlation SNR is evaluated as the criterion for the system performance.