The demand for real-time MPEG decoding is growing in multimedia applications. This paper discusses a hardware-software co-design for MPEG-2 Video decoding and describes an efficient parallel implementation of the software module. We advocate performing variable-length decoding (VLD) in hardware, since it is inherently serial and efficient hardware implementations are available. The software module is a macro-block level parallel implementation of the IDCT and Motion Compensation. Parallelism is achieved by dividing the picture into two halves for the 2-processor implementation and into four quadrants for the 4-processor implementation, and assigning the macro-blocks in each partition to one processor. The processors perform IDCT and Motion Compensation in parallel for the macro-blocks in their allotted sections, so each processor displays 1/N of a picture frame, where N is the number of processors. This partitioning minimizes the data dependency among processors during Motion Compensation, since dependencies occur only at the edges of the divided sections. It also balances the load, as all processors operate on an equal number of macro-blocks. Finally, the major advantage is that the time taken to perform the IDCT and Motion Compensation falls in inverse proportion to the number of processors, i.e. the speedup is linear.
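The partitioning described above can be illustrated with a minimal sketch. It is not the implementation from this paper: the function names, the 704x576 picture size, and the choice of a top/bottom split for two processors are assumptions made only for illustration. The sketch assigns each 16x16 macro-block to a processor, counts the load per processor, and counts the macro-blocks that touch a partition edge. The edge count is only a rough proxy for inter-processor dependencies, since motion vectors can reach further than one macro-block.

```python
from collections import Counter

MB_SIZE = 16  # an MPEG-2 macro-block covers 16x16 luma samples


def assign_macroblocks(width, height, num_procs):
    """Map each macro-block (row, col) to a processor index.

    Assumption: 2 processors split the picture into top/bottom halves and
    4 processors into quadrants; the split direction is hypothetical.
    """
    if num_procs not in (2, 4):
        raise ValueError("this sketch only covers 2 or 4 processors")
    mb_cols = width // MB_SIZE
    mb_rows = height // MB_SIZE
    row_parts = 2
    col_parts = num_procs // 2  # 1 for halves, 2 for quadrants
    owner = {}
    for r in range(mb_rows):
        for c in range(mb_cols):
            part_row = r * row_parts // mb_rows
            part_col = c * col_parts // mb_cols
            owner[(r, c)] = part_row * col_parts + part_col
    return owner


def boundary_macroblocks(owner):
    """Return macro-blocks with a 4-neighbour owned by another processor."""
    edge = set()
    for (r, c), proc in owner.items():
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            other = owner.get((r + dr, c + dc))
            if other is not None and other != proc:
                edge.add((r, c))
                break
    return edge


# Hypothetical 704x576 picture: 44 x 36 = 1584 macro-blocks.
owner4 = assign_macroblocks(704, 576, 4)
load4 = Counter(owner4.values())       # macro-blocks per processor
edge4 = boundary_macroblocks(owner4)   # macro-blocks on a partition edge

owner2 = assign_macroblocks(704, 576, 2)
load2 = Counter(owner2.values())
edge2 = boundary_macroblocks(owner2)
```

For this picture size every processor receives the same number of macro-blocks, and only a small fraction of them lie on a partition edge. A picture whose macro-block dimensions are not divisible by the partition count would give slightly unequal partitions in this sketch.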