As wireless video products evolve, they demand more sophisticated processing at higher resolutions and frame rates, making computational performance and energy efficiency critical design issues. This paper presents the Quantized Color Pack eXtension (QCPX), combined with a loop unrolling (LU) technique, to improve the execution performance and energy efficiency of color image and video processing applications. Applied to a processor with a 32-bit datapath, QCPX supports parallel operations on two packed 16-bit YCbCr (Y: luminance; Cb and Cr: chrominance) color pixels, increasing subword-level parallelism by packing more, smaller color pixels into each word. Loop unrolling further enhances instruction-level parallelism. Together, these techniques improve performance and energy efficiency for multimedia workloads on mobile systems. Experimental results on a set of media benchmark applications indicate that the LU plus QCPX-optimized version achieves speedups of 3.8× to 7.9× and reduces energy consumption by 76% to 87% relative to the baseline version on identically configured, dynamically scheduled ILP superscalar processors. The LU plus QCPX-optimized version also outperforms the LU plus MDMX-like (MIPS’s multimedia extension) version.
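The two ideas in the abstract, packed-pixel subword parallelism and loop unrolling, can be illustrated with a minimal software sketch in plain C. It is not the QCPX instruction set itself. Several details are assumptions made only for illustration: the 6:5:5 bit split of a 16-bit YCbCr pixel (Y in bits 15..10, Cb in bits 9..5, Cr in bits 4..0), the choice of a per-component average as the example operation, the unroll factor of 4, and the hypothetical names `pack_pixel`, `pack_word`, `avg_packed`, and `blend_frames`.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 16-bit pixel = Y(6 bits) | Cb(5 bits) | Cr(5 bits).
 * The abstract does not specify the split. This mask marks the
 * least-significant bit of each of the six fields in a 32-bit word. */
#define FIELD_LSB_MASK UINT32_C(0x04210421)

/* Pack one quantized YCbCr pixel into 16 bits (hypothetical helper). */
static inline uint16_t pack_pixel(unsigned y, unsigned cb, unsigned cr)
{
    return (uint16_t)(((y & 0x3Fu) << 10) | ((cb & 0x1Fu) << 5) | (cr & 0x1Fu));
}

/* Pack two 16-bit pixels into one 32-bit word (hypothetical helper). */
static inline uint32_t pack_word(uint16_t p0, uint16_t p1)
{
    return (uint32_t)p0 | ((uint32_t)p1 << 16);
}

/* Per-component floor average of two packed words, computed on all six
 * fields at once. Uses avg = (a & b) + ((a ^ b) >> 1); clearing each
 * field's LSB before the shift stops bits leaking into the neighbor. */
static inline uint32_t avg_packed(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & ~FIELD_LSB_MASK) >> 1);
}

/* Blend two frames stored as packed words; the main loop is unrolled
 * by 4 to expose independent operations to a superscalar core. */
void blend_frames(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                  size_t n_words)
{
    size_t i = 0;
    for (; i + 4 <= n_words; i += 4) {
        dst[i]     = avg_packed(a[i],     b[i]);
        dst[i + 1] = avg_packed(a[i + 1], b[i + 1]);
        dst[i + 2] = avg_packed(a[i + 2], b[i + 2]);
        dst[i + 3] = avg_packed(a[i + 3], b[i + 3]);
    }
    for (; i < n_words; ++i) {          /* remainder iterations */
        dst[i] = avg_packed(a[i], b[i]);
    }
}
```

In this sketch each 32-bit operation processes two pixels (six color components), and the four statements in the unrolled loop body have no dependences on one another, which is the kind of independent work a dynamically scheduled processor can issue in parallel.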