One of the main goals of the STAP-BOY program has been the implementation of a space-time adaptive processing (STAP) algorithm on graphics processing units (GPUs) in order to reduce processing time. Within the context of GPU implementation, we have further developed algorithms that exploit data redundancy inherent in particular STAP applications. Integrating these algorithms with the GPU architecture is of primary importance for achieving fast processing times. STAP algorithms involve solving a linear system whose coefficient matrix is a covariance matrix. A standard method estimates the covariance matrix from a data matrix, computes its Cholesky factors by one of several methods, and then solves the system by substitution. Some STAP applications have redundancy in the successive data matrices from which the covariance matrices are formed. In STAP applications where a data matrix is updated by adding a new data row at the bottom and eliminating the oldest data at the top of the matrix, successive data matrices have multiple rows in common. Two methods have been developed for exploiting this type of data redundancy when computing Cholesky factors. These two methods are referred to as
1) Fast QR factorizations of successive data matrices
2) Fast Cholesky factorizations of successive covariance matrices.
We have developed GPU implementations of these two methods. We show that these two algorithms exhibit reduced computational complexity when compared to benchmark algorithms that do not exploit data redundancy. More importantly, we show that when these algorithmic improvements are optimized for the GPU architecture,
the processing times of a GPU implementation of these matrix factorization algorithms may be greatly improved.
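To make the sliding-window redundancy concrete, the following is a minimal CPU sketch in pure Python, not the GPU implementation described above. It assumes real-valued data, an unnormalized covariance estimate X^T X, and the textbook rank-one Cholesky update and downdate as a stand-in for the fast factorization methods, whose details are not given here. All function and variable names (`gram`, `cholesky`, `rank_one`, `solve`, `steer`) are illustrative choices.

```python
import math
import random


def gram(x):
    """Unnormalized covariance estimate X^T X for a data matrix given as a list of rows."""
    n = len(x[0])
    return [[sum(r[i] * r[j] for r in x) for j in range(n)] for i in range(n)]


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def rank_one(l, v, sign):
    """Cholesky factor of L L^T + sign * v v^T (sign=+1 adds a row, sign=-1 removes one)."""
    l = [row[:] for row in l]  # work on copies so the inputs stay unchanged
    v = list(v)
    n = len(v)
    for k in range(n):
        r = math.sqrt(l[k][k] ** 2 + sign * v[k] ** 2)
        c = r / l[k][k]
        s = v[k] / l[k][k]
        l[k][k] = r
        for i in range(k + 1, n):
            l[i][k] = (l[i][k] + sign * s * v[i]) / c
            v[i] = c * v[i] - s * l[i][k]
    return l


def solve(l, b):
    """Solve (L L^T) w = b by forward substitution followed by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (y[i] - sum(l[k][i] * w[k] for k in range(i + 1, n))) / l[i][i]
    return w


# Synthetic data: a window of `rows` snapshots that slides down by one row.
random.seed(0)
rows, n = 12, 4
stream = [[random.gauss(0, 1) for _ in range(n)] for _ in range(rows + 1)]
old_window, new_window = stream[:rows], stream[1:]

# Benchmark path: rebuild the covariance and refactor it from scratch.
l_ref = cholesky(gram(new_window))

# Redundancy-exploiting path: reuse the old factor, add the new row, drop the oldest.
l_old = cholesky(gram(old_window))
l_fast = rank_one(rank_one(l_old, stream[rows], +1), stream[0], -1)

max_err = max(abs(l_fast[i][j] - l_ref[i][j]) for i in range(n) for j in range(n))

# Weights from the updated factor for a hypothetical all-ones steering vector.
steer = [1.0] * n
w = solve(l_fast, steer)
```

In this sketch the benchmark path costs on the order of rows·n² operations to form the covariance plus n³ to factor it, while each rank-one step costs on the order of n², which is the kind of saving that reuse of shared rows makes possible. Radar data are complex-valued in practice, so a real implementation would need conjugate transposes throughout.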
This paper reviews the implementation of the DARPA MTO STAP-BOY program, for both Phase I and Phase II, conducted
at Science Applications International Corporation (SAIC). The STAP-BOY program develops fast covariance
factorization and tuning techniques for space-time adaptive processing (STAP) algorithm implementation on graphics
processing unit (GPU) architectures for embedded systems.
The first part of our presentation on the DARPA STAP-BOY program will focus on GPU implementation and
algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm will be implemented on the
GPU using stream programming tools (such as those from PeakStream and NVIDIA, and ATI Technologies' CTM)
and traditional graphics APIs. This algorithm will include fast range adaptive STAP weight updates and
beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.
Proc. SPIE. 6979, Independent Component Analyses, Wavelets, Unsupervised Nano-Biomimetic Sensors, and Neural Networks VI
KEYWORDS: Detection and tracking algorithms, Data modeling, Visualization, Fourier transforms, Computer programming, 3D modeling, Signal processing, Object recognition, Computer architecture, 3D image processing
This paper reviews the DARPA MTO STAP-BOY program for both Phase I and Phase II. The STAP-BOY program
develops fast covariance factorization and tuning techniques for space-time adaptive processing (STAP) algorithm
implementation on graphics processing unit (GPU) architectures for embedded systems.
Emerging capabilities in stream and multi-core computation, along with high memory bandwidth in
commercial GPU architectures, are enabling breakthrough low-cost and low-power teraflop computing solutions to
DoD embedded computing challenges. Under the DARPA MTO STAP-BOY program, SAIC and Duke University,
in cooperation with commercial graphics processor companies, have been mapping complex signal processing
algorithms to GPU architectures. Algorithms undergoing implementation include STAP applications for radar
adaptive beamforming and spin-image surface matching applications for object recognition in 3-D range-image