Hi Jack, the 2D DWT is basically a subband filtering of the image: the rows and the columns are each lowpass or highpass filtered (and downsampled by 2), and the four combinations give four subbands. For example, at level 1 you get 4 images: LL, LH, HL, and HH (L = lowpass, H = highpass).
So LL is lowpass in the x-direction and lowpass in the y-direction, LH is lowpass in the x-direction and highpass in the y-direction, HL is highpass in x and lowpass in y, and HH is highpass in both directions.
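To make the four subbands concrete, here is a minimal one-level sketch using the Haar wavelet, the simplest choice (the answer above does not name a particular wavelet). haar_dwt2 is a hypothetical helper, not a library function. It assumes x is the column index and y is the row index, and that the highpass filter takes (first sample - second sample); libraries may use the opposite sign and different names for the detail bands.

```python
def haar_dwt2(img):
    """One level of a 2D Haar DWT (hypothetical helper, orthonormal scaling).

    img: list of rows (lists of numbers) with even height and width.
    Returns (LL, LH, HL, HH), each half the height and width of img.
    Assumed convention: x = column index, y = row index, and the highpass
    filter takes (first sample - second sample) / sqrt(2).
    """
    if len(img) % 2 or len(img[0]) % 2:
        raise ValueError("height and width must be even")
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, len(img), 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            # Separable filtering: each 1D Haar step scales by 1/sqrt(2),
            # so the combined 2D scale factor is 1/2.
            ll.append((a + b + c + d) / 2)  # lowpass x, lowpass y
            lh.append((a + b - c - d) / 2)  # lowpass x, highpass y
            hl.append((a - b + c - d) / 2)  # highpass x, lowpass y
            hh.append((a - b - c + d) / 2)  # highpass x, highpass y
        for band, row in zip((LL, LH, HL, HH), (ll, lh, hl, hh)):
            band.append(row)
    return LL, LH, HL, HH

# A 4x4 image with a vertical edge between columns 0 and 1.
img = [[0, 10, 10, 10] for _ in range(4)]
LL, LH, HL, HH = haar_dwt2(img)
print(HL)      # → [[-10.0, 0.0], [-10.0, 0.0]]
print(LH, HH)  # → [[0.0, 0.0], [0.0, 0.0]] [[0.0, 0.0], [0.0, 0.0]]
```

The vertical edge is a change along x, so it shows up only in HL, the band that is highpass in x; LH and HH are zero because nothing changes along y. With this orthonormal scaling the sum of squared coefficients equals the sum of squared pixel values.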
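The wavelet transform then iterates on the LL image to obtain narrower subbands at successive levels: at each level the current LL image is split into another 4 images (a new LL plus the three detail images LH, HL, and HH), and only the new LL is decomposed further.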
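The iteration can be sketched by re-applying a one-level transform to the LL band, again assuming the Haar wavelet and the same x = column, y = row convention. haar_dwt2 and haar_multilevel are hypothetical helpers written for illustration; in practice a library routine such as PyWavelets' pywt.wavedec2 does this job. After J levels you keep 3J detail images plus one final LL, and (for Haar with power-of-two sizes) the total number of coefficients equals the number of pixels.

```python
def haar_dwt2(img):
    """One level of a 2D Haar DWT; returns (LL, LH, HL, HH). Even sizes assumed."""
    bands = ([], [], [], [])
    for i in range(0, len(img), 2):
        new_rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            vals = ((a + b + c + d) / 2, (a + b - c - d) / 2,   # LL, LH
                    (a - b + c - d) / 2, (a - b - c + d) / 2)   # HL, HH
            for row, v in zip(new_rows, vals):
                row.append(v)
        for band, row in zip(bands, new_rows):
            band.append(row)
    return bands

def haar_multilevel(img, levels):
    """Apply haar_dwt2 repeatedly to the LL band (hypothetical helper).

    Returns (final LL, [(LH, HL, HH) for level 1, 2, ..., levels]).
    """
    ll, details = img, []
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)  # only the new LL is split again
        details.append((lh, hl, hh))
    return ll, details

img = [[r * 8 + c for c in range(8)] for r in range(8)]
ll, details = haar_multilevel(img, 3)
print([len(d[0]) for d in details], ll)  # → [4, 2, 1] [[252.0]]
```

The detail images shrink from 4x4 to 2x2 to 1x1, and the final 1x1 LL is the pixel sum scaled by 1/8 (one factor of 1/2 per level).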
Because the wavelet basis functions are localized in both position and scale, features in many real-world signals and images are captured by relatively few large coefficients in these subbands. In other words, the wavelet transform can localize the features of interest and represent them more sparsely than other representations. Of course, this depends on the type of image (signal).
As one example, images that are piecewise smooth, punctuated by abrupt transitions (edges), are sparse in the wavelet domain: they are represented by relatively few wavelet coefficients. They are not represented nearly as sparsely in the Fourier domain, because the abrupt transitions require a large number of high-frequency terms to approximate.
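A rough numerical check of this claim, under stated assumptions: a synthetic 16x16 piecewise-constant image (a small bright square on a dark background), a full 4-level Haar decomposition, and a tolerance of 1e-9 for counting a coefficient as nonzero. haar_dwt2, haar_coeffs, dft2 and count_significant are hypothetical helpers, and the naive DFT is for illustration only (in practice you would use an FFT).

```python
import cmath

def haar_dwt2(img):
    """One level of a 2D Haar DWT; returns (LL, LH, HL, HH). Even sizes assumed."""
    bands = ([], [], [], [])
    for i in range(0, len(img), 2):
        new_rows = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            vals = ((a + b + c + d) / 2, (a + b - c - d) / 2,
                    (a - b + c - d) / 2, (a - b - c + d) / 2)
            for row, v in zip(new_rows, vals):
                row.append(v)
        for band, row in zip(bands, new_rows):
            band.append(row)
    return bands

def haar_coeffs(img, levels):
    """All coefficients of a `levels`-deep Haar decomposition, as one flat list."""
    coeffs, ll = [], img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        coeffs.extend(v for band in (lh, hl, hh) for row in band for v in row)
    coeffs.extend(v for row in ll for v in row)
    return coeffs

def dft2(img):
    """Naive 2D DFT: a 1D DFT along every row, then along every column."""
    def dft1(seq):
        n = len(seq)
        return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                for k in range(n)]
    rows = [dft1(r) for r in img]
    cols = [dft1([r[j] for r in rows]) for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(rows))]

def count_significant(values, tol=1e-9):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for v in values if abs(v) > tol)

# 16x16 piecewise-constant image: a 4x4 bright square (rows/cols 5..8).
N = 16
img = [[1.0 if 5 <= r <= 8 and 5 <= c <= 8 else 0.0 for c in range(N)]
       for r in range(N)]

haar_nnz = count_significant(haar_coeffs(img, 4))  # 4 levels: 16 -> 1
dft_nnz = count_significant(v for row in dft2(img) for v in row)
print(haar_nnz, dft_nnz, N * N)  # → 44 169 256
```

The nonzero Haar coefficients sit only where a filter support straddles the square's boundary, while the edges spread energy over most Fourier frequencies. Haar has a single vanishing moment, so its detail coefficients are exactly zero only on constant pieces; for smooth (non-constant) pieces, wavelets with more vanishing moments, such as the Daubechies family, give small rather than exactly zero detail coefficients away from the edges.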