SIMD processor arrays are becoming popular for their fast parallel executions of low- to medium-complexity image and video processing algorithms, and most stages of the compression standards. In many existing techniques, visual data processing algorithms and compression standards possess a high degree of parallelism. In particular, the processing of a certain pixel/block does not generally require data from a distant pixel/block, and the instructions for processing these pixels/blocks are usually identical. Thus, these algorithms map naturally onto the architecture of the SIMD processor arrays. In this paper, the architectures of the recently SIMD processor arrays will be reviewed together with algorithms demonstrating their superior features. Due to the length of the paper, only processor arrays implemented at the chip-level are discussed, especially those whose logic circuits are merged/embedded in the SRAM or DRAM memory process. While some processor arrays are designed by embedding the memory modules onto the existing processors, a significant number of processor arrays are realized by integrating logic circuits onto existing RAM to save the inherently large memory bandwidth, and thus achieving a performance in the order of Tera instructions per second.