Convolution is widely used to enhance image features such as points, lines, and edges, and to smooth noise. A major challenge in implementing convolution in real time has been its large computational requirement. For example, convolving a 512 × 512 image with a 7 × 7 kernel requires about 50 million operations. To achieve the performance needed in real-time applications, hardwired solutions based on ASICs and/or fixed-function chips with little programmability have therefore been used. Hardwired implementations, however, are rigid, single-function, and not upgradable. Our approach is programmable convolution, which is flexible, multifunctional, and easily upgradable, and which delivers performance comparable to that of hardwired implementations. This paper describes an efficient convolution algorithm that can be implemented in software on the new generation of VLIW mediaprocessors. These processors can perform multiple multiplication, addition, and load/store operations in a single instruction, a capability that convolution can exploit to reduce execution time. We have implemented this algorithm on a new mediaprocessor, the MAP1000™, where convolving a 512 × 512 image with a 7 × 7 kernel takes 8.6 ms. This is 7 times faster than the previously reported software-based convolution on the Texas Instruments TMS320C80 mediaprocessor and is comparable to hardwired implementations for the same image and kernel size. The algorithm and its implementation on a next-generation programmable mediaprocessor demonstrate the feasibility of software-based convolution.
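To make the computational load concrete, the following is a minimal reference sketch of direct 2D convolution in plain Python, together with a tap count for the 512 × 512 image and 7 × 7 kernel cited above. It is not the optimized VLIW algorithm this paper describes. The function names `convolve2d` and `tap_count`, the zero-padded border handling, and the figure of four operations per tap (one multiply, one add, two loads) are illustrative assumptions; the abstract does not say how its operation count was derived, so this is only one plausible accounting.

```python
def convolve2d(image, kernel):
    """Direct 2D convolution; zero padding (an assumption) keeps the output the input's size."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2  # kernel center
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0
            for i in range(krows):
                for j in range(kcols):
                    # Convolution flips the kernel: tap (i, j) reads the
                    # pixel offset by -(i - cr), -(j - cc) from (y, x).
                    sy = y - (i - cr)
                    sx = x - (j - cc)
                    if 0 <= sy < rows and 0 <= sx < cols:
                        acc += kernel[i][j] * image[sy][sx]
            out[y][x] = acc
    return out


def tap_count(width, height, kwidth, kheight):
    """Kernel taps (multiply-accumulates) needed for a full-size output."""
    return width * height * kwidth * kheight


taps = tap_count(512, 512, 7, 7)  # 12,845,056 taps
# Assumed accounting, not stated in the source:
# 1 multiply + 1 add + 2 loads per tap.
ops_assumed = 4 * taps            # 51,380,224, roughly 50 million
```

Under this accounting, every tap is an independent multiply, add, and pair of loads, which is the kind of work a processor that performs several such operations per instruction can overlap.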