Efficient VLSI architectures for the multi-dimensional (m-D) discrete wavelet transform (DWT), e.g. m = 2, 3, are presented, in which the lifting scheme of the DWT is used to reduce hardware complexity. The parallelism among the 2^m subband transforms in the lifting-based m-D DWT is exploited, which efficiently increases the throughput rate of the separable m-D DWT. The proposed architecture is composed of m·2^(m-1) 1-D DWT modules working in parallel and in a pipelined fashion. It is designed to process 2^m input samples per clock cycle and to generate the coefficients of all 2^m subbands synchronously. The total time to compute one level of decomposition for a 2-D image (3-D image sequence) of size N^2 (MN^2) is approximately N^2/4 (MN^2/8) clock cycles (ccs). An efficient line-based architecture framework for both 2D+t and t+2D 3-D DWT is also proposed, for the first time. Compared with similar works reported in previous literature, the proposed architecture performs well in terms of the product of computation time and hardware cost. The proposed architecture is simple, regular, and scalable, and is well suited for VLSI implementation.
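As a rough software model of the data flow described above, the sketch below applies a 1-D lifting step along each of the m axes in turn, so one decomposition level yields 2^m subbands. It also reproduces the stated module count m·2^(m-1) and the cycle estimate (total samples divided by the 2^m samples consumed per clock cycle). The abstract does not name the wavelet filter. The integer Le Gall 5/3 lifting steps with symmetric boundary extension are an assumption chosen for illustration, and the function names (`lifting_53_1d`, `dwt_md_one_level`, `num_1d_modules`, `cycles_per_level`) are hypothetical. The sketch models the arithmetic only, not the pipelined hardware, and the cycle count ignores pipeline fill latency.

```python
import numpy as np


def lifting_53_1d(x, axis):
    """One level of integer Le Gall 5/3 lifting along `axis` (assumed filter).

    Returns (low, high) with the length along `axis` halved.
    Uses whole-sample symmetric extension at both boundaries.
    """
    x = np.moveaxis(np.asarray(x, dtype=np.int64), axis, 0)
    if x.shape[0] % 2:
        raise ValueError("length along axis must be even")
    even, odd = x[0::2], x[1::2]
    # Predict step: d[n] = odd[n] - floor((even[n] + even[n+1]) / 2),
    # with even[K] mirrored to even[K-1] at the right edge.
    even_next = np.concatenate([even[1:], even[-1:]], axis=0)
    d = odd - (even + even_next) // 2
    # Update step: s[n] = even[n] + floor((d[n-1] + d[n] + 2) / 4),
    # with d[-1] mirrored to d[0] at the left edge.
    d_prev = np.concatenate([d[:1], d[:-1]], axis=0)
    s = even + (d_prev + d + 2) // 4
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)


def dwt_md_one_level(x):
    """Separable m-D DWT, one level: filter every current band along each axis.

    Returns a dict of 2^m subbands keyed by strings such as "LH" (one letter per axis).
    """
    x = np.asarray(x, dtype=np.int64)
    bands = {"": x}
    for axis in range(x.ndim):
        new_bands = {}
        for key, band in bands.items():
            low, high = lifting_53_1d(band, axis)
            new_bands[key + "L"] = low
            new_bands[key + "H"] = high
        bands = new_bands
    return bands


def num_1d_modules(m):
    """1-D DWT modules in the architecture: m stages of 2^(m-1) modules each."""
    return m * 2 ** (m - 1)


def cycles_per_level(shape):
    """Approximate clock cycles per level when 2^m samples enter per cycle."""
    m = len(shape)
    return int(np.prod(shape)) // (2 ** m)
```

For example, `dwt_md_one_level(np.full((8, 8), 5))` returns the four subbands `LL`, `LH`, `HL`, `HH`, each of shape (4, 4). `cycles_per_level((512, 512))` gives 65536 = 512^2/4, matching the N^2/4 estimate. Each 1-D lifting module handles one even/odd sample pair per cycle, which is why 2^m samples per cycle call for 2^(m-1) modules per dimension.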