Recently some efficient parallel architectures for turbo decoder have been proposed. In these architectures the bottleneck for the parallelization of the decoder is the interleaver. On the other hand, turbo codes achieve a very impressive near Shannon-capacity performance in which the interleaver plays a crucial role. Therefore, it is of great interest to design interleavers that are good from both performance and parallelization point of views. In this paper we have proposed an interleaver structure that is suitable for parallelization of turbo decoders. It is shown that such an interleaver can be designed to have good BER performance as well. By this structure not only fast decoders with very low latency can be built, but also the regularity of the decoder and the simplicity of the interleaver structure make them promising for VLSI implementation. We also present a fast algorithm to design such an interleaver, which can be used to design S-random interleavers as well. Such interleavers have been designed and the performances are compared to that of S-random interleavers by simulations.