The wavelet transform is a popular signal processing technique, particularly due to its impressive results in data compression. Its usefulness extends to two-dimensional data in image processing and three-dimensional data in video processing. In image processing, current trends are toward image sizes that demand substantial computing power: an application processing a 1024 by 1024 standard-quality image requires many millions of processing steps per image frame. When sequences of these images are processed as video, considerable throughput is required to attain even low display rates. Tree-based architectures have been proposed to provide this throughput by processing pixels in a data-parallel fashion. Each level of the wavelet transform is processed by an array or a plane of processing elements operating in parallel on shared or distributed data. The largest of these architectures, the plane-based H-tree design, provides a real-time, pipelineable implementation of the 2DWT, but is costly in terms of VLSI area because it requires O(n^2) processors for an n by n data set. In this paper, we look at methods for improving the practicality of these architectures by reducing the area required for a given problem size. This is achieved by adding extra processors at the root of the tree, which allows larger images to be processed with an insignificant addition of hardware, at the cost of reduced processing speed. We conclude the paper by presenting area/time trade-offs that can be used to evaluate cost/performance specifications.