Optimized video decoder architecture for TMS320C64x DSP generation
7 May 2003
Abstract
The TMS320C64x is a generation of high-speed DSPs with a rich instruction set and an efficient memory system for multimedia processing. Digital video decoding is one of the key applications in multimedia processing. It is computationally intensive and requires high bandwidth to external memory as well as an efficient DMA engine. Reference models for video decoders typically follow a simple data flow that operates sequentially on one macroblock (MB) at a time. In real-time implementations, this structure leads to inefficiencies, including suboptimal utilization of the program cache and of DMA bandwidth. These issues become more significant on high-performance devices such as the C64x DSP, because its CPU efficiency and high clock rate allow the core processing to complete much faster than on other processors. At the same time, bandwidth to external memory has not increased at the same rate as processing performance. Unless the system data flow is carefully designed, the performance bottleneck can therefore shift from processing to I/O bandwidth. This paper describes an optimized flow for MPEG-2 decoding that processes multiple blocks at a time to obtain optimum cache performance and DMA bandwidth efficiency. With this approach, system overhead is reduced from as high as 100% for worst-case B frames with the conventional flow to less than 20%.
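To illustrate why processing several macroblocks per pass can help the program cache, here is a minimal C sketch. It is not the authors' implementation, and it rests on assumptions not stated in the abstract. Decoding is split into three hypothetical stages (for example VLD, IDCT, and motion compensation). Every switch from one stage's code to another's is counted as a potential program-cache reload, as if the stages did not fit in the L1 program cache together. The function names and the batch size are illustrative only.

```c
#include <stdio.h>

/* Hypothetical stage count, e.g. VLD, IDCT, motion compensation. */
#define NUM_STAGES 3

/* Conventional flow: run every stage on one macroblock before moving on.
   Counts transitions between stage code, used here as a rough proxy for
   program-cache reloads (an assumption, not a measured cache model). */
int stage_switches_mb_major(int num_mbs) {
    int switches = 0, prev = -1;
    for (int mb = 0; mb < num_mbs; ++mb) {
        for (int s = 0; s < NUM_STAGES; ++s) {
            if (s != prev) ++switches;
            prev = s;
        }
    }
    return switches;
}

/* Batched flow (hypothetical): run each stage over a group of `batch`
   macroblocks, so each stage's code is entered once per group. */
int stage_switches_batched(int num_mbs, int batch) {
    int switches = 0, prev = -1;
    for (int start = 0; start < num_mbs; start += batch) {
        int end = (start + batch < num_mbs) ? start + batch : num_mbs;
        for (int s = 0; s < NUM_STAGES; ++s) {
            for (int mb = start; mb < end; ++mb) {
                if (s != prev) ++switches;
                prev = s;
            }
        }
    }
    return switches;
}

/* Print both counts for a frame of num_mbs macroblocks. */
void print_comparison(int num_mbs, int batch) {
    printf("MB-major: %d stage switches\n", stage_switches_mb_major(num_mbs));
    printf("batch %d:  %d stage switches\n", batch,
           stage_switches_batched(num_mbs, batch));
}
```

For a 720x480 frame (45 x 30 = 1350 macroblocks), `print_comparison(1350, 8)` reports 4050 stage switches for the MB-at-a-time flow and 507 for batches of 8. The real gain on a C64x depends on code sizes, cache geometry, and DMA scheduling, none of which this toy model captures.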
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jeremiah Golston, Satish Arora, Ratna Reddy, "Optimized video decoder architecture for TMS320C64x DSP generation", Proc. SPIE 5022, Image and Video Communications and Processing 2003, (7 May 2003); doi: 10.1117/12.476330; https://doi.org/10.1117/12.476330
Proceedings paper, 8 pages