Learning-based approaches to image compression have demonstrated comparable, or even superior performance when compared to conventional approaches in terms of compression efficiency and visual quality. A typical approach in learning-based image compression is through autoencoders, which are architectures consisting of two main parts: a multi-layer neural network encoder and a dual decoder. The encoder maps the input image in the pixel domain to a compact representation in a latent space. Consequently, the decoder reconstructs the original image in the pixel domain from its latent representation, as accurately as possible. Traditionally, image processing algorithms, and in particular image denoising, are applied to the images in the pixel domain before compression, and eventually even after decompression. The combination of the denoising operation with the encoder might reduce the computational cost while achieving the same performance in accuracy. In this paper, the idea of fusing the image denoising operation with the encoder is examined. The results are evaluated both by simulating the human perspective through objective quality metrics, and by machine vision algorithms for the use case of face detection.
|