Motion estimation (ME) and motion compensation (MC) are critical to the performance of an encoder because they are computationally intensive. To reduce the amount of computation, a number of fast search algorithms for motion estimation have been developed, and they improve performance dramatically. This paper instead accelerates the calculation itself, using the MMX and XMM registers of Intel Pentium CPUs together with Single Instruction Multiple Data (SIMD) instructions, in particular the Prescott New Instructions (PNI). With these registers, the values of several pixels can be loaded into one register at the same time. With PNI’s LDDQU instruction, 16 bytes can be loaded into an XMM register even when they cross a cache-line boundary. As a result, multiple samples can be processed (added, subtracted, averaged, or reduced to absolute differences) in a single operation. These parallel operations significantly increase the speed of ME and MC, irrespective of the search algorithm used.
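As an illustration of the kind of parallel operation described above, the following is a minimal sketch, not the paper's actual code. It assumes that the block-matching cost is the sum of absolute differences (SAD) over a 16x16 block, and that PNI is available through the compiler's SSE3 intrinsics (`_mm_lddqu_si128` for LDDQU, `_mm_sad_epu8` for PSADBW). The function names `sad16x16` and `sad16x16_scalar`, the block size, and the `stride` parameter are hypothetical choices made for the example. When the compiler is not targeting SSE3, the sketch falls back to the scalar loop.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE3__)
#include <pmmintrin.h>  /* SSE3 (PNI) intrinsics, including _mm_lddqu_si128 */
#endif

/* Scalar reference: SAD over a 16x16 block, one pixel at a time. */
unsigned sad16x16_scalar(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sum;
}

/* SIMD version: one 16-pixel row per iteration (hypothetical helper). */
unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
#if defined(__SSE3__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        /* LDDQU: unaligned 16-byte loads; the reference block may sit at
           any byte offset and may straddle a cache-line boundary. */
        __m128i c = _mm_lddqu_si128((const __m128i *)(cur + y * stride));
        __m128i r = _mm_lddqu_si128((const __m128i *)(ref + y * stride));
        /* PSADBW: absolute differences of 16 byte pairs, summed into
           two partial results (one per 64-bit half). */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(c, r));
    }
    /* Each half holds at most 16 * 8 * 255 = 32640, so 16 bits suffice. */
    return (unsigned)_mm_cvtsi128_si32(acc)
         + (unsigned)_mm_extract_epi16(acc, 4);
#else
    /* Assumption: without SSE3 support, fall back to the scalar loop. */
    return sad16x16_scalar(cur, ref, stride);
#endif
}
```

A search routine would call `sad16x16(cur_block, ref_frame + dy * stride + dx, stride)` for each candidate displacement, whichever search pattern selects the candidates. With GCC or Clang, the SIMD path is typically enabled by compiling with `-msse3`.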