This paper presents a flexible hardware architecture for performing the Discrete Wavelet Transform (DWT) on a digital image. The proposed architecture uses a variation of the lifting scheme technique and provides advantages that include small memory requirements, fixed-point arithmetic implementation, and a small number of arithmetic computations. The DWT core may be used for image processing operations, such as denoising and image compression. For example, the JPEG2000 still image compression standard uses the Cohen-Daubechies-Favreau (CDF) 5/3 and CDF 9/7 DWT for lossless and lossy image compression respectively. Simple wavelet image denoising techniques resulted in improved images up to 27 dB PSNR. The DWT core is modeled using MATLAB and VHDL. The VHDL model is synthesized to a Xilinx FPGA to demonstrate hardware functionality. The CDF 5/3 and CDF 9/7 versions of the DWT are both modeled and used as comparisons. The execution time for performing both DWTs is nearly identical at approximately 14 clock cycles per image pixel for one level of DWT decomposition. The hardware area generated for the CDF 5/3 is around 15,000 gates using only 5% of the Xilinx FPGA hardware area, at 2.185 MHz max clock speed and 24 mW power consumption.