Since the spring of 1988, Carnegie Mellon University and the Naval Air Development Center have been working together to implement several large signal processing systems on the Warp parallel computer. In the course of this work, we have developed a prototype of a software tool that can automatically and efficiently map signal processing systems to distributed-memory parallel machines, such as Warp. We have used this tool to produce Warp implementations of small test systems. The automatically generated programs compare favorably with hand-crafted code. We believe this tool will be a significant aid in the creation of high speed signal processing systems. We assume that signal processing systems have the following characteristics: •They can be described by directed graphs of computational tasks; these graphs may contain thousands of task vertices. • Some tasks can be parallelized in a systolic or data-partitioned manner, while others cannot be parallelized at all. • The side effects of each task, if any, are limited to changes in local variables. • Each task has a data-independent execution time bound, which may be expressed as a function of the way it is parallelized, and the number of processors it is mapped to. In this paper we describe techniques to automatically map such systems to Warp-like parallel machines. We identify and address key issues in gracefully combining different parallel programming styles, in allocating processor, memory and communication bandwidth, and in generating and scheduling efficient parallel code. When iWarp, the VLSI version of the Warp machine, becomes available in 1990, we will extend this tool to generate efficient code for very large applications, which may require as many as 3000 iWarp processors, with an aggregate peak performance of 60 gigaflops.