This paper proposes efficient line-based very-large-scale integration (VLSI) architectures for the 2-D discrete wavelet transform (DWT), based on the lifting scheme and the 9/7 wavelet filters adopted in the JPEG 2000 proposal. An embedded decimation technique based on folding and time multiplexing is used to optimize the architecture; it reduces the required buffer memory and the number of RAM accesses, and hence the occupied area and power consumption of the device. Using this technique, a single-input, single-output architecture (SISOA) and a two-input, two-output architecture (TITOA) are proposed. The SISOA generates one output per clock cycle; the TITOA generates two outputs per clock cycle with the same memory requirement as the SISOA, with the four subband coefficients of the transformed signal available in interleaved form. Because only one line of data is required at a time, a single-port memory can be used. Performance analysis and comparison results show that the proposed architectures are economical in both hardware cost and computation time. Further advantages of the design include short output latency, simple data flow, regularity, and scalability, as well as suitability for VLSI implementation.
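For orientation, the lifting computation that such architectures implement can be sketched in software. The following is a minimal 1-D illustration only, not the proposed SISOA or TITOA hardware: it does not model folding, time multiplexing, or line buffering. The function names, the restriction to even-length inputs, the symmetric boundary handling, and the scaling convention (low-pass divided by `K`, high-pass multiplied by `K`) are assumptions made for this sketch; the lifting constants are the commonly published 9/7 factorization values. A 2-D transform applies the same 1-D steps along rows and then along columns, which yields the four subbands.

```python
# Commonly published lifting constants for the 9/7 filter pair.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.230174105  # scaling convention assumed for this sketch


def _predict(s, d, c):
    """Lift odd samples from their even neighbours (mirror at right edge)."""
    n = len(d)
    return [d[i] + c * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _update(s, d, c):
    """Lift even samples from their odd neighbours (mirror at left edge)."""
    return [s[i] + c * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]


def dwt97_1d(x):
    """One level of the 1-D 9/7 DWT by lifting; returns (low, high)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch assumes an even-length input")
    s = list(x[0::2])  # even samples
    d = list(x[1::2])  # odd samples
    d = _predict(s, d, ALPHA)
    s = _update(s, d, BETA)
    d = _predict(s, d, GAMMA)
    s = _update(s, d, DELTA)
    return [v / K for v in s], [v * K for v in d]


def idwt97_1d(low, high):
    """Inverse transform: undo the scaling, then the lifting steps in reverse."""
    s = [v * K for v in low]
    d = [v / K for v in high]
    s = _update(s, d, -DELTA)
    d = _predict(s, d, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2] = s
    x[1::2] = d
    return x
```

Because every lifting step is undone by the same step with the opposite sign, `idwt97_1d(*dwt97_1d(x))` reproduces `x` up to floating-point rounding. Each step touches only a sample and its two nearest neighbours, which is the locality that allows a hardware design to keep intermediate storage small.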