The more a priori knowledge we encode into a signal processing algorithm, the better the performance we can expect. In this paper, we survey several approaches to capturing the structure of singularities (edges, ridges, etc.) in wavelet-based signal processing schemes. Drawing on results from approximation theory, we discuss nonlinear approximations on trees and point out that an optimal tree approximant exists and is easily computed. The optimal tree approximation inspires a new hierarchical interpretation of the wavelet decomposition and a tree-based wavelet denoising algorithm that suppresses spurious noise bumps.
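To make the tree idea concrete, the following minimal sketch shows one simple form of tree-constrained thresholding. It is an illustration under stated assumptions, not the optimal tree approximant or the denoising algorithm of the paper: it assumes an orthonormal Haar wavelet, a signal whose length is a power of two, and a "hereditary" rule in which a wavelet coefficient is kept only if it exceeds the threshold and its parent at the next coarser scale was also kept. The function names (`haar_forward`, `haar_inverse`, `tree_threshold`, `tree_denoise`) are hypothetical and chosen for this example only.

```python
import math


def haar_forward(x):
    """Orthonormal Haar transform of a length-2**J signal.

    Returns (scaling, details), where details[j] holds the 2**j wavelet
    coefficients at level j, ordered from coarse (j = 0) to fine.
    """
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    s = math.sqrt(2.0)
    approx = [float(v) for v in x]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]
    details.reverse()  # coarse to fine
    return approx[0], details


def haar_inverse(scaling, details):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    approx = [scaling]
    for level in details:
        finer = []
        for a, d in zip(approx, level):
            finer.extend([(a + d) / s, (a - d) / s])
        approx = finer
    return approx


def tree_threshold(details, thresh):
    """Keep coefficient k at level j only if |d| > thresh and its parent
    (coefficient k // 2 at level j - 1) was kept. Level 0 has no wavelet
    parent, so it is tested against the threshold alone.

    Assumption: this hereditary rule is a simple stand-in for a tree
    approximation; it yields a connected subtree but is not optimal.
    """
    mask, pruned = [], []
    for j, level in enumerate(details):
        keep = [
            abs(d) > thresh and (j == 0 or mask[j - 1][k // 2])
            for k, d in enumerate(level)
        ]
        mask.append(keep)
        pruned.append([d if m else 0.0 for d, m in zip(level, keep)])
    return pruned


def tree_denoise(x, thresh):
    """Transform, prune to a rooted subtree, and reconstruct."""
    scaling, details = haar_forward(x)
    return haar_inverse(scaling, tree_threshold(details, thresh))


# A step edge produces a chain of large coefficients across scales,
# while the isolated bump at the last two samples affects only one
# fine-scale coefficient whose parent is zero.
noisy = [0, 0, 0, 4, 4, 4, 5, 3]
denoised = tree_denoise(noisy, thresh=1.0)
```

In this example the fine-scale coefficient created by the bump exceeds the threshold, so plain coefficient-wise thresholding would retain it. The tree constraint discards it because its parent is zero, while the coefficients along the edge survive because they form a connected chain from coarse to fine scales.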