Nowadays, images and video are the data types that consume most of the resources of modern communication channels, in both fixed and wireless networks. It is therefore vital to compress visual data as much as possible, while maintaining a target quality level, to enable efficient storage and transmission. Deep learning (DL) image coding solutions, typically based on an auto-encoder architecture, promise significant improvements in compression efficiency. These methods adopt a novel coding approach in which the encoder-decoder architecture is mostly built from neural networks, notably with analysis and synthesis transforms learned from a large amount of training data using an appropriate loss function. Only a limited number of works have targeted the subjective evaluation of the compression performance of DL-based image coding solutions. Since learning-based image codecs use complex and highly non-linear generative models, the decoded images exhibit artifacts very different from conventional ones, such as the blockiness, blurring and ringing distortions typical of traditional DCT block-based and wavelet image coding. In this context, the main objective of this paper is to review, characterize and evaluate some of the most relevant learning-based image coding solutions in the literature. The subjective quality assessment tests were conducted during the 84th JPEG meeting in Brussels, Belgium, by a mix of expert and naive observers. These tests evaluated the performance of five state-of-the-art learning-based image coding solutions against four conventional image coding standards (HEVC, WebP, JPEG 2000 and JPEG), applied to eight natural images at four different coding bitrates. The experimental results show that the subjective quality obtained with the selected learning-based image coding solutions is competitive with that of conventional codecs.
Moreover, a thorough inspection of the visual results has revealed some of the typical artifacts encountered in learning-based image coding.
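Learned image codecs of the kind described above are typically trained by minimizing a rate-distortion objective of the form L = R + λ·D, where R is the estimated bitrate from an entropy model over the latents and D is a reconstruction distortion. The following is a minimal, hypothetical sketch of such a loss (the function name and the use of MSE as distortion are illustrative assumptions, not taken from any specific codec in the paper):

```python
import numpy as np

def rate_distortion_loss(x, x_hat, latent_likelihoods, lam=0.01):
    """Simplified rate-distortion loss L = R + lambda * D.

    Hypothetical sketch: R is the estimated bits per pixel derived
    from the entropy model's likelihoods of the quantized latents,
    and D is the mean squared error of the reconstruction.
    """
    rate = -np.sum(np.log2(latent_likelihoods)) / x.size  # bits per pixel
    distortion = np.mean((x - x_hat) ** 2)                # MSE
    return rate + lam * distortion
```

Varying λ trades bitrate against quality, which is how a single architecture is trained to produce models for the different target bitrates used in the evaluation.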
In this paper, two new end-to-end image compression architectures based on convolutional neural networks are presented. The proposed networks employ 2D wavelet decomposition as a preprocessing step before training and extract features for compression from wavelet coefficients. Training is performed end-to-end and multiple models operating at different rate points are generated by using a regularizer in the loss function. Results show that the proposed methods outperform JPEG compression, reduce blocking and blurring artifacts, and preserve more details in the images especially at low bitrates.
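The preprocessing step above can be illustrated with a one-level 2D Haar decomposition, which splits an image into four subbands (approximation plus horizontal, vertical and diagonal details) that can be stacked as input channels for a network. This is a minimal sketch using an unnormalized averaging/differencing Haar variant, not the exact transform or wavelet family used in the paper:

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar decomposition (illustrative, unnormalized).

    Returns the four subbands stacked as channels with shape
    (4, H/2, W/2): approximation cA and details cH, cV, cD.
    """
    a = (image[0::2, :] + image[1::2, :]) / 2  # row-wise low-pass
    d = (image[0::2, :] - image[1::2, :]) / 2  # row-wise high-pass
    cA = (a[:, 0::2] + a[:, 1::2]) / 2         # low-low (approximation)
    cH = (a[:, 0::2] - a[:, 1::2]) / 2         # low-high (horizontal detail)
    cV = (d[:, 0::2] + d[:, 1::2]) / 2         # high-low (vertical detail)
    cD = (d[:, 0::2] - d[:, 1::2]) / 2         # high-high (diagonal detail)
    return np.stack([cA, cH, cV, cD], axis=0)
```

Feeding subbands rather than raw pixels concentrates most of the signal energy in the approximation band, which is one reason wavelet front-ends can help reduce blocking and blurring at low bitrates.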
The Joint Photographic Experts Group (JPEG) is currently in the process of standardizing JPEG XL, the next-generation image coding standard that offers substantially better compression efficiency than existing image formats. In this paper, the quality assessment framework for proposals submitted to the JPEG XL Call for Proposals is presented in detail. The proponents were evaluated using objective metrics and subjective quality experiments in three different laboratories, on a dataset constructed for JPEG XL quality assessment. Subjective results were analyzed using statistical significance tests and presented with correlation measures between the results obtained from different labs. Results indicate that a number of proponents outperformed the JPEG standard and performed at least as well as the state-of-the-art anchors in terms of both subjective and objective quality on SDR and HDR contents, at various bitrates.
Quality assessment of images is of key importance for multimedia applications. In this paper we present a new full-reference objective metric to predict the quality of images using deep neural networks. The network makes use of both the color and frequency information extracted from reference and distorted images. Our method comprises extracting a number of equal-sized random patches from the reference image and the corresponding patches from the distorted image, then feeding the patches themselves as well as their 3-scale wavelet transform coefficients as input to a neural network. The architecture of the network consists of four branches, with the first three generating frequency features and the fourth extracting color features. Feature extraction is carried out using 12 to 15 convolutional layers and one pooling layer, while two fully connected layers are used for regression. The overall image quality is computed as a weighted sum of patch scores, where local weights are also learned by the network using two additional fully connected layers. The network was trained using TID2013 and tested on TID2013, CSIQ and LIVE image databases. Our results show high correlations with subjective test scores, are generalizable for certain types of distortions and are competitive with respect to the state-of-the-art methods.
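The final pooling step described above can be sketched as a normalized weighted sum of per-patch quality scores. In the paper the weights are predicted by fully connected layers; the sketch below assumes the scores and weights are already given and only shows the aggregation (the function name and the non-negativity clamp are illustrative assumptions):

```python
import numpy as np

def aggregate_patch_scores(patch_scores, patch_weights, eps=1e-8):
    """Pool per-patch quality scores into one image-level score.

    Hypothetical simplified form of weighted-average patch pooling:
    weights are clamped to be non-negative and normalized so the
    result stays in the range of the patch scores.
    """
    w = np.maximum(patch_weights, 0)  # keep weights non-negative
    return float(np.sum(w * patch_scores) / (np.sum(w) + eps))
```

Learning the weights lets the metric emphasize patches where distortions are perceptually most visible, instead of averaging all patches uniformly.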