Programmable media processors have emerged to meet, at an affordable cost, the continually increasing computational demands of complex digital media applications such as HDTV and MPEG-4. These processors combine high performance with the flexibility to implement a variety of image computing algorithms, unlike the hardwired approach, which delivers high performance for a particular algorithm but lacks flexibility. Achieving high performance on these media processors, however, requires careful and sometimes innovative algorithm design. In addition, programming techniques such as software pipelining and loop unrolling are needed to speed up the computation, while the data flow can be optimized using a programmable DMA controller. In this paper, we describe an algorithm for two-dimensional convolution that can be implemented efficiently on many media processors. Implemented on a new media processor called the MAP1000, it convolves a 512x512 image with a 7x7 kernel in 7.9 ms, which is much faster than previously reported software-based convolution implementations and comparable to hardwired implementations. The high performance of two-dimensional convolution and other algorithms on the MAP1000 demonstrates the feasibility of software-based solutions in demanding imaging and video applications.
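
For orientation, the operation being accelerated can be written as a plain scalar loop nest. The sketch below is a generic direct two-dimensional convolution in portable C, not the MAP1000 algorithm described in this paper; the function name `conv2d_u8`, the 8-bit pixel format, the integer kernel with a right-shift scale factor, and the choice to leave border pixels untouched are all assumptions made for illustration. With a 7x7 kernel, each output pixel costs 49 multiply-accumulates, or roughly 12.8 million for a 512x512 image, and the innermost loop is the kind of loop that unrolling and software pipelining are typically applied to.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) direct 2D convolution; a hypothetical helper for
 * illustration only, not the MAP1000 implementation.
 * src, dst : w x h 8-bit images stored row by row
 * kernel   : kw x kh integer coefficients stored row by row (kw, kh odd)
 * shift    : right shift applied to the accumulated sum before clamping
 * Pixels whose kernel window would overhang the image are left untouched. */
void conv2d_u8(const unsigned char *src, unsigned char *dst,
               int w, int h, const short *kernel, int kw, int kh, int shift)
{
    assert(kw % 2 == 1 && kh % 2 == 1);
    int rx = kw / 2;
    int ry = kh / 2;

    for (int y = ry; y < h - ry; y++) {
        for (int x = rx; x < w - rx; x++) {
            long acc = 0;
            for (int j = 0; j < kh; j++) {
                /* True convolution flips the kernel: coefficient (i, j)
                 * multiplies the pixel at (x + rx - i, y + ry - j). */
                const unsigned char *row =
                    src + (size_t)(y + ry - j) * (size_t)w + (size_t)(x + rx);
                const short *krow = kernel + (size_t)j * (size_t)kw;
                for (int i = 0; i < kw; i++)
                    acc += (long)krow[i] * (long)row[-i];
            }
            /* Clamp negatives first so the shift acts on a non-negative value. */
            if (acc < 0)
                acc = 0;
            else
                acc >>= shift;
            if (acc > 255)
                acc = 255;
            dst[(size_t)y * (size_t)w + (size_t)x] = (unsigned char)acc;
        }
    }
}
```

A call such as `conv2d_u8(src, dst, 512, 512, kernel, 7, 7, shift)` corresponds to the 512x512 image and 7x7 kernel case quoted above, with `shift` chosen to match the scaling of the kernel coefficients.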