A new DCT/IDCT architecture capable of handling higher input/output data rates is proposed. In the proposed architecture, the 8-point input data vector for the DCT/IDCT is divided into two 4-point data vectors, the even part and the odd part, and the two parts are processed in parallel. As a result, the 8-point DCT/IDCT is completed in 4 clock cycles, whereas conventional DCT/IDCT processors need 8 clock cycles. The proposed architecture therefore achieves twice the data rate, which is useful for applications such as real-time HDTV. To reduce the hardware size, we replaced the Modified Booth Multiplier with the Pre-Rounded Multiplier, in which some of the less significant bits of the partial sums are rounded before summation. To achieve high data rates, the multipliers and accumulators are built from Carry Save Adders and Pipeline Registers. Although the proposed DCT/IDCT architecture has a larger chip size than one based on the Distributed Arithmetic method, the size is reasonable in 1.0 µm CMOS technology. Despite the larger chip size, the proposed architecture achieves higher data rates and high accuracy. Its high regularity also makes it well suited to VLSI implementation.
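
The even/odd split described above can be illustrated with a short software sketch. It assumes the standard symmetry-based decomposition of the 8-point DCT-II with orthonormal scaling; the abstract does not give the exact formulation, so the function names (`dct8_direct`, `dct8_even_odd`) and the scaling are illustrative assumptions, and the code models only the arithmetic, not the multipliers, pipelining, or cycle timing of the hardware.

```python
import math

N = 8  # transform length


def scale(k):
    """Orthonormal DCT-II scale factor (an assumed convention)."""
    return 0.5 * (math.sqrt(0.5) if k == 0 else 1.0)


def dct8_direct(x):
    """Reference 8-point DCT-II computed as a full 8x8 matrix-vector product."""
    return [
        scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                       for n in range(N))
        for k in range(N)
    ]


def dct8_even_odd(x):
    """8-point DCT-II computed as two independent 4x4 products.

    The sums u[n] = x[n] + x[7-n] feed the even-indexed outputs and the
    differences v[n] = x[n] - x[7-n] feed the odd-indexed outputs, so the
    two halves share no data and could be evaluated in parallel.
    """
    u = [x[n] + x[7 - n] for n in range(4)]  # input to the even part
    v = [x[n] - x[7 - n] for n in range(4)]  # input to the odd part
    out = [0.0] * N
    for k in range(N):
        src = u if k % 2 == 0 else v
        out[k] = scale(k) * sum(
            src[n] * math.cos((2 * n + 1) * k * math.pi / 16)
            for n in range(4)
        )
    return out


if __name__ == "__main__":
    sample = [12.0, -3.5, 7.25, 0.0, 4.0, -8.0, 1.5, 9.0]
    for a, b in zip(dct8_direct(sample), dct8_even_odd(sample)):
        print(f"{a: .6f}  {b: .6f}")
```

In this sketch each half needs 16 multiplications instead of the 64 of the direct form, and the two halves are independent. That independence is what would allow two 4-point units to run side by side, though how the proposed hardware schedules them is not specified here.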