Convolution is a fundamental operation in many image processing algorithms and applications. One such algorithm is unsharp masking, which is widely used in medical imaging. A major component of unsharp masking is the computation of a lowpass-filtered image, e.g., via generalized convolution with a Gaussian filter or via specialized convolution with a boxcar filter. Generalized convolution is computationally expensive; for example, convolving a 512 × 512 image with a 3 × 3 kernel takes 1.45 s on a Sun SPARCstation 20/71. To achieve faster convolution, hardwired solutions based on ASICs and/or fixed-function chips with little programmability have traditionally been used. The disadvantages of hardwired implementations are that they are rigid, uni-functional and not upgradable. Our approach has been programmable convolution, which is flexible, multi-functional and easily upgradable, and has performance comparable to that of hardwired implementations. This paper describes efficient software implementations of both generalized and boxcar convolution on a programmable multimedia processor, the Texas Instruments TMS320C80, also known as the Multimedia Video Processor (MVP). Using the MVP's advanced digital signal processors (ADSPs), instruction-level parallelism and intelligent input/output interface, we have been able to significantly improve the performance of both generalized and boxcar convolution. For a 512 × 512 8-bit image, generalized convolution takes 19.5 ms for a 3 × 3 kernel. While boxcar convolution has similar performance for a 3 × 3 kernel, a performance improvement by a factor of up to 13 has been achieved for large kernels such as 21 × 21. Our implementation of convolution algorithms on a programmable mediaprocessor clearly demonstrates the feasibility of a software-based approach.