Adversarial and adaptive tone mapping operator: multi-scheme generation and multi-metric evaluation

Abstract. Tone mapping is one of the main techniques to convert high-dynamic range (HDR) images into low-dynamic range (LDR) images. We propose to use a variant of generative adversarial networks to adaptively tone map images. We designed a conditional generative adversarial network composed of a U-Net generator and a PatchGAN discriminator to adaptively convert HDR images into LDR images. We extended previous work to include additional metrics such as tone-mapped image quality index (TMQI), structural similarity index measure, Fréchet inception distance, and perceptual path length. In addition, we applied face detection on the Kalantari dataset and showed that our proposed adversarial tone mapping operator generates the best LDR image for the detection of faces. One of our training schemes, trained via 256 × 256 resolution HDR-LDR image pairs, results in a model that can generate high-TMQI LDR images at both low resolution (256 × 256) and high resolution (1024 × 2048). Given 1024 × 2048 resolution HDR images, the TMQI of the generated LDR images reaches a value of 0.90, which outperforms all other contemporary tone mapping operators.


Introduction
The dynamic range of an image is described as the variation of luminance in different parts of the image. 1 The majority of real-life images are of low dynamic range (LDR) and are generally represented by an 8-bit integer per pixel format. 2 In contrast, high dynamic range (HDR) uses more bits (16/32) to quantify the pixel values. Even though HDR images can better describe a scene, most common 8-bit display methods are not compatible with HDR images. A cost-effective method of displaying HDR images is to convert them into LDR images as opposed to using a 16-bit display setting.
Many tone mapping operators (TMOs) have been proposed and have shown incredible progress in many scenarios. Even though tone mapping is one of the most common ways to perform HDR to LDR conversion, TMOs have many limitations, such as poor generalization, the need for parameter tuning and expert knowledge, and model instability.
The main research question of this work is: Is it possible to propose a TMO that can adaptively tone-map all HDR images with different contents? In this paper, we seek to answer this question by exploring deep learning techniques. We propose a specific deep learning network, a conditional generative adversarial network (cGAN), 3 to adaptively convert an HDR image into an LDR image. Our proposed model is trained via HDR-LDR image pairs containing assorted content, including natural scenarios, indoor/outdoor scenes, regular/irregular geometric shapes, colorful/monochrome objects, and drastic luminance changes.
*Address all correspondence to Svetlana Yanushkevich, syanshk@ucalgary.ca
In general, the implementation of any generative adversarial network (GAN) requires an objective loss function. In deep learning networks, the loss function measures the difference between the network output and the target image. Common loss functions are the absolute error (called L1) or the squared error (called L2). In this work, we implement an objective composed of the general cGAN loss, a feature matching loss, and a perceptual loss. Combining these losses allows the proposed adversarial tone mapping operator (adTMO) to learn the distribution of ideally tone-mapped images.
For low-resolution image-to-image translation tasks, cGAN has shown great success in generating high-quality target images. 4 However, for high-resolution image-to-image translation tasks, many problems exist. These problems require complex models to combat tiling patterns, local blurring, and saturated artifacts. 5,6 One of the main deterrents to using high-resolution images is the amount of resources required for training, specifically the amount of time required for convergence. In our work, we explore the possibility of using low-resolution images to train a cGAN model ("U-Net" G and PatchGAN D). We extended the work on adTMO 7 to include additional metrics such as structural similarity index measure (SSIM), perceptual path length (PPL), Fréchet inception distance (FID), and multi-scale structural similarity index measure (MS-SSIM), as well as the performance metrics for face detection. We show that adTMO outperforms most other TMOs when testing on low- and high-resolution HDR images.
This paper aims to design a smart TMO that can adaptively convert complex scenic HDR images into LDR images. The main contributions of our work are listed as follows.
1. We propose adTMO, a variant of cGAN capable of adaptively generating high-resolution and high-quality LDR images.
2. We explore different training and testing schemes, in order to find the best possible combination to generate the highest quality images.
3. We evaluate the performance of adTMO and other TMOs using metrics such as SSIM and FID. In addition, we look at the performance of face detection applied to the different tone-mapped images.
This paper is organized as follows: Section 2 provides a literature review related to TMOs, cGAN, and metrics used for evaluating image-to-image translation tasks. Section 3 describes the architecture of adTMO and the different training/testing schemes we apply. Section 4 details the databases used for training and the preprocessing and postprocessing steps applied to the images. Section 5 summarizes the results of adTMO. Section 6 concludes our paper.

Related Work
In this section, we provide a short review of tone-mapping literature, cGAN, and metrics used for evaluating image-to-image translation tasks.

TMOs
Over the past 20 years, different TMOs have been designed to convert HDR images into LDR images. They can be divided into two categories, global TMOs and local TMOs, based on how they work on image pixels. Global TMOs, such as Larson et al. 8 and Drago et al., 9 apply the same function on all pixels of an image. Global TMOs take less time to convert HDR images, but the output LDR images have reduced contrast. Local TMOs, e.g., Chiu et al. 10 and Tumblin et al., 11 calculate the output pixel value based on the input and its neighboring pixels. Local TMOs can preserve the local structure and generate good contrast but at a cost of more computation time. In addition, most TMOs can only deal with some specific scenarios and do not generalize well with regard to image content.

Generative Adversarial Networks
First proposed by Goodfellow in 2014, 12 GAN has shown great success in many fields. GAN consists of a generator model (G) and a discriminator model (D). The goal of G is to generate fake samples that are real enough to fool D. For D, its goal is to distinguish real samples from collected databases and fake samples generated by G. By training G and D simultaneously, they can compete with each other and achieve an equilibrium allowing G to implicitly learn the distribution of real samples from the collected databases, without the need of complex loss functions.
In this paper, we adopt cGAN, 3 so that the goal of G changes to generating fake samples under new conditions. Many low-resolution image-to-image translation tasks, such as semantic labels to photos and architectural labels to photos, adopt cGAN to generate target images and achieve satisfactory results. 4 Patel et al. 13 conducted a similar work using cGAN to convert HDR images into LDR images, but they only tested with 256 × 256 resolution image crops. Complex multi-scale architectures for high-resolution image-to-image tasks were proposed by Wang et al. 5 and Rana et al. 6 Those networks required high-resolution training images and took many resources, including memory and time, to train. It took a week to train the multi-scale network 6 using a 12-GB NVIDIA Titan-X GPU on an Intel Xeon e7 core i7 machine.
Due to the downsampling process in the generation part of cGAN, it is challenging to preserve the fine details of the input images. A bilateral filter is a common method to perform edge-preserving and noise-reducing operations, which can be adopted to preserve the finer details of an image. 14 A method that optimizes bilateral filtering to run in constant time O(1) was proposed by Porikli. 15 Other methods proposed to preserve edges in images include global image smoothing based on weighted least squares (WLS) 16 and the guided image filter. 17 Extended work on WLS was conducted by Min et al. 18 to create a fast variant, achieving comparable results but requiring much less computational time. The guided image filtering technique was optimized by incorporating an edge-aware weighting into the guided filter, which greatly reduced the halo artifacts in images. 19 Zheng et al. 20 proposed a hybrid model that combines a model-driven and a data-driven approach to generate a higher quality image. In this paper, we have mainly focused on the data-driven approach via the use of cGAN. However, there is immense value in a hybrid model; thus we plan to create such a hybrid model in future work by integrating the model-driven portion into our data-driven model.

Evaluation for Image-to-Image Translation Task
Evaluation of image-to-image translation tasks remains an open question. SSIM was proposed by Wang et al. 21 to compare structural information based on the human visual system. SSIM is commonly used to compare the similarity between the generated images and the ground-truth images. It is defined by Wang et al. 21 as follows:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \tag{1}$$
where μ_x and μ_y are the means of x and y, σ_x^2 and σ_y^2 are their variances, σ_xy is their covariance, and C_1 and C_2 are constants defined as (0.01L)^2 and (0.03L)^2, respectively (L is the dynamic range of the pixels). Based on SSIM, a metric called multi-scale structural similarity (MS-SSIM) 22 was designed to incorporate the variations of viewing conditions. FID 23 was proposed to capture the similarity between the generated and ground-truth images. To compute FID, both the generated and real images are propagated through a pretrained Inception V3 model, 24 and the statistics of their activations at the last pooling layer are compared. A smaller FID represents higher similarity; an FID of 0 means that the two sets of images have identical feature statistics. The FID is defined as follows:
$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right), \tag{2}$$
where μ represents the mean for the real (r) and generated (g) images, Σ represents the covariance for the real (r) and generated (g) images, and tr is the trace linear function.
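As an illustration of the two definitions above, the following minimal Python sketch computes a single-window SSIM and the FID between two sets of feature vectors. It is a sketch under stated assumptions, not the evaluation code used in this paper: practical SSIM implementations average the index over local windows, and the FID features would come from the last pooling layer of Inception V3, whereas here they are arbitrary arrays. The function names `global_ssim` and `fid_from_features` are hypothetical.

```python
import numpy as np

def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM computed over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def fid_from_features(feat_r, feat_g):
    """FID between two feature sets of shape (n_samples, n_features >= 2)."""
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    sigma_r = np.cov(feat_r, rowvar=False)
    sigma_g = np.cov(feat_g, rowvar=False)
    # tr((Sigma_r Sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g, which are real and non-negative for
    # positive semi-definite covariances (up to round-off, hence the clip).
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

Comparing a feature set with itself gives an FID of approximately 0, and comparing an image with itself gives an SSIM of 1, which matches the interpretation of both metrics given in the text.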
Similar to FID, PPL 25 uses the pretrained VGG16 26 as embeddings to calculate the perceptual similarity between two images. As with FID, a smaller PPL means that two images have a greater perceptual similarity.
Evaluating the performance of TMOs is also an open issue. One intuitive solution is a subjective evaluation, which involves human participants ranking LDR images generated by different TMOs based on their subjective preference. Such subjective evaluation takes a lot of time and energy, and the results are unstable across different participant groups. 27 Another solution is objective metrics, e.g., the tone-mapped image quality index (TMQI) 28 and TMQI-II, 29 which are widely used in tone-mapping optimization studies. 6,30 TMQI is an index that considers the naturalness of tone-mapped LDR images and the structural fidelity between the HDR and tone-mapped LDR images, expressed as 28
$$\mathrm{TMQI}(H, L) = a\,[S(H, L)]^{\alpha} + (1 - a)\,[N(L)]^{\beta}, \tag{3}$$
where H and L denote the original HDR image and the tone-mapped LDR image, and S and N denote the structural fidelity and statistical naturalness measures, respectively. α and β control the sensitivities of S and N, and 0 ≤ a ≤ 1 adjusts the relative weights between S and N. In this paper, we use the default α, β, and a recommended by Yeganeh and Wang. 28
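The weighted combination of structural fidelity and naturalness described above can be sketched in a few lines of Python. The function name `tmqi_score` is hypothetical, and the default values a = 0.8012, α = 0.3046, and β = 0.7088 are quoted from memory of the TMQI reference and should be treated as an assumption to verify against Ref. 28. Computing S and N themselves requires the full multi-scale TMQI pipeline and is not shown.

```python
def tmqi_score(s, n, a=0.8012, alpha=0.3046, beta=0.7088):
    """Combine structural fidelity s and naturalness n (both in [0, 1]).

    The default weights are assumed values, not taken from this paper.
    """
    if not (0.0 <= s <= 1.0 and 0.0 <= n <= 1.0):
        raise ValueError("s and n are expected to lie in [0, 1]")
    return a * s ** alpha + (1.0 - a) * n ** beta
```

With a close to 1, the score is dominated by structural fidelity; the exponents below 1 compress differences at the high end of each measure.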

Proposed Method
In this section, we will detail our proposed adTMO to convert HDR images into LDR images, the architecture of our G and D, the objective function we use, and the different training/testing schemes we deploy.

cGAN-Based adTMO
In this paper, we construct adTMO based on the principle of cGAN 3 so that it can translate HDR images into LDR images. Figure 1 shows the training pipeline of our proposed adTMO. We train D using (HDR, LDR) pairs, where D tries to predict the (HDR, RealLDR) pair as real and the (HDR, FakeLDR) pair as fake. G tries to generate a FakeLDR that is realistic enough that D is unable to distinguish FakeLDR from RealLDR. We train G and D simultaneously. Specifically, in each iteration, we train D twice with the loss weight set to 0.5 [once using the (HDR, RealLDR) pair and once using the (HDR, FakeLDR) pair].

Network Architectures
We adopt the network architectures from Isola et al., 4 where G is a U-Net 31 and D is a 70 × 70 PatchGAN, 32 both using convolution-BatchNorm-LeakyRelu 33 blocks with α = 0.2.
Fig. 1 Training pipeline of cGAN. D is trained to distinguish ground truth LDR image from the generated LDR image. G is trained to generate LDR image that is real enough to fool D.
Figure 2 shows the architecture of our G, which is a U-Net consisting of one input block, seven encoding blocks, one bottleneck, seven decoding blocks, and one output block. Each encoding block down-samples the feature map to 1/4 of the area of the previous block (1/2 of the width and 1/2 of the height) using strides = 2, and each decoding block up-samples the previous block by a factor of 4 in area. We added direct connections between the encoding and decoding blocks in order to preserve some of the finer details that may have been lost during the downsampling process. This direct connection, also called a skip connection, allows the gradient of the later layers to propagate back to the earlier layers. Such propagation prompts the model to learn the mapping between the input and output layers more efficiently, allowing the finer details to be recovered after the downsampling process. For the i'th decoding block, we add a direct skip from the i'th-from-last encoding block and concatenate the two blocks along the channel dimension before applying the LeakyRelu activation function. The filter size is set to 4 × 4 for all blocks. The filter number is set to 64 for the first encoding block and doubles for each subsequent encoding block until it reaches 512, after which it remains unchanged. The filter number for each decoding block is the same as that of the encoding block with which it connects. For the bottleneck block, the filter number is set to 512, and the activation function is ReLU. For the output block, the filter number is set to 1 and the activation function is sigmoid. Because G is fully convolutional, it can be fed images of different sizes.
Figure 3 shows the architecture of our D. This is a 70 × 70 PatchGAN consisting of one input layer, five encoding blocks, and one output block. The input layer concatenates the input HDR and LDR images along the color channel. Each of the first four encoding blocks down-samples the feature map to 1/4 of the area of the previous block using strides = 2. For the last encoding block, we set strides = 1, leaving the size unchanged. The numbers of filters for the five encoding blocks are 64, 128, 256, 512, and 512. The output block has 1 filter with strides = 1 and a sigmoid activation, and it outputs a 16 × 16 matrix. Each value in the output matrix maps to a 70 × 70 receptive field in the input layer, identifying this patch as either real or fake.
Fig. 2 Architecture of the U-Net generator with one input block, seven encoding blocks, one bottleneck block, seven decoding blocks, and one output block. There is a direct skip connecting each encoding-decoding pair.
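The filter and resolution schedule of the encoder described above can be traced with a short Python sketch. It assumes that the input block leaves the resolution unchanged and that every encoding block halves the width and height exactly; the function name `unet_encoder_schedule` is hypothetical and the sketch only tracks shapes, it does not build a network.

```python
def unet_encoder_schedule(height, width, n_blocks=7, base_filters=64, max_filters=512):
    """Return (filters, height, width) after each stride-2 encoding block."""
    schedule = []
    filters = base_filters
    for _ in range(n_blocks):
        if height % 2 or width % 2:
            raise ValueError("spatial size must stay divisible by 2")
        height, width = height // 2, width // 2
        schedule.append((filters, height, width))
        # Double the filter count until it saturates at max_filters.
        filters = min(filters * 2, max_filters)
    return schedule

# 256 x 256 training resolution versus 1024 x 2048 test resolution.
low_res = unet_encoder_schedule(256, 256)
high_res = unet_encoder_schedule(1024, 2048)
```

Under these assumptions the same seven blocks reduce a 256 × 256 input to 2 × 2 and a 1024 × 2048 input to 8 × 16, which illustrates why a fully convolutional G can accept both sizes without any change to its weights.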
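The size of the discriminator output map follows from the strides listed above. The sketch below assumes "same"-style padding, so that each layer divides the spatial size by its stride (rounded up); the padding mode and the function name `patchgan_output_size` are assumptions, not details given in the text.

```python
def patchgan_output_size(height, width, strides=(2, 2, 2, 2, 1, 1)):
    """Spatial size of the D output map, assuming 'same'-style padding.

    The default strides are four stride-2 encoding blocks, the stride-1
    fifth encoding block, and the stride-1 output block.
    """
    for s in strides:
        height = -(-height // s)  # ceiling division
        width = -(-width // s)
    return height, width
```

For a 256 × 256 input this gives the 16 × 16 matrix mentioned in the text; for a 1024 × 2048 input the same layers would produce a 64 × 128 map of patch decisions.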

Objective Function
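As discussed earlier, the goal of G is to convert an HDR image into its tone-mapped LDR version, and the goal of D is to distinguish the generated LDR image from the ground-truth LDR image. The objective of cGAN 3 can therefore be written as
$$\mathcal{L}_D(G, D) = -\mathbb{E}_{x,y}[\log D(x, y)] - \mathbb{E}_{x}[\log(1 - D(x, G(x)))], \qquad \mathcal{L}_G(G, D) = \mathbb{E}_{x}[\log(1 - D(x, G(x)))], \tag{4}$$
where x is the input HDR image, y is the ground-truth LDR image, G tries to minimize L_G(G, D), and D tries to minimize L_D(G, D).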
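A minimal NumPy sketch of the two adversarial losses follows, assuming the binary cross-entropy (minimax) form of the conditional GAN objective from Ref. 3 and a sigmoid-activated D whose output is a map of per-patch probabilities. The function names and the clipping constant `eps` are hypothetical; a training framework would normally supply numerically stable versions of these losses.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """L_D: D wants d_real -> 1 on (HDR, RealLDR) and d_fake -> 0 on (HDR, FakeLDR)."""
    d_real = np.clip(np.asarray(d_real, dtype=np.float64), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1.0 - eps)
    return float(-(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))))

def generator_loss(d_fake, eps=1e-7):
    """L_G (minimax form): G wants d_fake -> 1, which drives log(1 - d_fake) down."""
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(np.log(1.0 - d_fake)))
```

When D cannot tell the two pairs apart and outputs 0.5 everywhere, `discriminator_loss` equals 2 ln 2, the equilibrium value of this formulation.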
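In addition to the cGAN loss, we incorporated a feature matching loss L_FM based on D. We extract features from multiple layers of D and attempt to match these intermediate representations between the real and generated LDR image, i.e., we minimize the difference between the features via the L1 norm:
$$\mathcal{L}_{FM}(G, D) = \mathbb{E}_{x,y} \sum_{i=1}^{M} \frac{1}{U_i} \left\lVert D^{(i)}(x, y) - D^{(i)}(x, G(x)) \right\rVert_1, \tag{5}$$
where x and y denote the input HDR image and the ground-truth LDR image, D^(i) denotes the i'th layer of D with U_i activations, and M is the number of layers of D.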
In this experiment, we chose five convolution layers in the five encoding blocks of D.
Additionally, we appended the perceptual loss L_prp used by Johnson et al., 34 which consists of the features computed from every single layer of the pretrained Inception V3 network, 24 given by
$$\mathcal{L}_{prp}(G) = \mathbb{E}_{x,y} \sum_{i=1}^{N} \frac{1}{V_i} \left\lVert F^{(i)}(y) - F^{(i)}(G(x)) \right\rVert_1, \tag{6}$$
where y is the ground-truth LDR image, G(x) is the generated LDR image, F^(i) denotes the i'th layer of the Inception V3 network with V_i activations, and N is the selected number of layers in the Inception V3 network. In this experiment, we empirically choose five activation layers of the Inception V3 network as F to calculate L_prp. With L_FM and L_prp, we are able to keep both low-level image characteristics and high-level perceptual information. Combining these losses together, our final objective is expressed as
$$\mathcal{L} = \mathcal{L}_{cGAN} + \alpha \mathcal{L}_{FM} + \beta \mathcal{L}_{prp}, \tag{7}$$
where α and β control the weight of L_FM and L_prp with respect to L_cGAN. Here we set α = 10 and β = 10, as recommended by Rana et al. 6
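The feature matching and perceptual terms share the same structure: a sum over selected layers of the L1 distance between two sets of activations, each normalized by the number of activations in that layer. The sketch below illustrates that shared structure and the weighted combination with α = β = 10. It assumes the per-layer normalization suggested by the U_i and V_i symbols in the text; the activations are plain arrays standing in for features of D or Inception V3, and the function names are hypothetical.

```python
import numpy as np

def feature_l1(feats_real, feats_fake):
    """Sum over layers of (1 / n_activations) * L1 distance between activations."""
    total = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        f_real = np.asarray(f_real, dtype=np.float64)
        f_fake = np.asarray(f_fake, dtype=np.float64)
        if f_real.shape != f_fake.shape:
            raise ValueError("activations of the same layer must have the same shape")
        total += np.abs(f_real - f_fake).sum() / f_real.size
    return total

def total_generator_objective(l_cgan, l_fm, l_prp, alpha=10.0, beta=10.0):
    """Weighted sum of the adversarial, feature matching, and perceptual terms."""
    return l_cgan + alpha * l_fm + beta * l_prp
```

With both weights set to 10, a unit change in either feature-space distance moves the objective ten times as much as a unit change in the adversarial term.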

Training and Testing
We deploy different training and testing scheme combinations to achieve better performance.

Training
We adopt three training schemes.
• Training scheme A (see purple box in Fig. 4). All HDR images were resized into 256 × 256 resolution, and TMOs were used to generate tone-mapped LDR images. The generated 748 HDR-LDR image pairs were used to train our adTMO.
• Training scheme B (see blue box in Fig. 4). This scheme required resizing the HDR images into 1024 × 1024 resolution and using TMOs to generate tone-mapped LDR images. The next step was to randomly crop the corresponding 256 × 256 resolution regions from HDR images and LDR images. We generated 23,936 HDR-LDR image pairs to train the adTMO.
• Training scheme C. The resized and cropped 256 × 256 resolution images were combined from training schemes A and B to provide 24,684 training pairs altogether.
All training schemes used 256 × 256 resolution images as the training database, so the training process took less time and fewer resources than using high-resolution images. The Adam optimizer 35 was used for all three schemes, with learning rate = 0.0002, β_1 = 0.5, and β_2 = 0.999. We set the batch size to 1 and trained until the loss converged. The training process was deployed on an NVIDIA GeForce RTX 2080, and each training process can be finished within 30 h, which is much shorter than the 1-week training time of the multi-scale network proposed by Rana et al. 6
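The paired random cropping of training scheme B can be sketched as follows. The key point is that the same window is cut from the HDR image and from its tone-mapped LDR counterpart so that each pair stays aligned. The number of crops per image is not stated in the text; 32 is an inference from the reported counts (23,936 / 748 = 32), and the function name `random_paired_crops` is hypothetical.

```python
import numpy as np

def random_paired_crops(hdr, ldr, n_crops=32, size=256, seed=None):
    """Cut the same random size x size windows from an aligned HDR/LDR pair."""
    if hdr.shape[:2] != ldr.shape[:2]:
        raise ValueError("HDR and LDR images must have the same height and width")
    h, w = hdr.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_crops):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        window = (slice(top, top + size), slice(left, left + size))
        pairs.append((hdr[window], ldr[window]))
    return pairs
```

Under the 32-crops assumption, 748 source images yield 748 × 32 = 23,936 pairs for scheme B, and adding the 748 resized pairs of scheme A gives the 24,684 pairs of scheme C.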

Testing
We deploy different testing schemes to evaluate the performance of our proposed adTMO.
• Testing scheme W (see the red box of Fig. 5). Test with resized 256 × 256 resolution images: we resized the original HDR images into 256 × 256 resolution, then fed them into G and generated the target LDR images.
• Testing scheme X (see the blue box of Fig. 5). Test with resized 1024 × 2048 resolution images: we resized the original HDR images into 1024 × 2048 resolution, then fed them into G and generated the target LDR images.
• Testing scheme Y (see the brown box of Fig. 5). Test with cropped 256 × 256 resolution images: we cropped 1024 × 2048 resolution HDR images into 256 × 256 resolution pieces, then fed them into G and generated the target LDR pieces.
• Testing scheme Z (see the purple box of Fig. 5). Test with 4 × 8 concatenated cropped 256 × 256 resolution images: we cropped 1024 × 2048 resolution HDR images into 32 pieces of 256 × 256 resolution, fed them into G to generate the target LDR pieces, and then concatenated the pieces into the complete 1024 × 2048 resolution images.
Fig. 5 The red, blue, brown, and purple boxes, respectively, show the process of test schemes W, X, Y, and Z.
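Testing scheme Z amounts to tiling the image, tone-mapping each tile independently, and stitching the results back together. The sketch below shows that procedure with NumPy; `generator` is a placeholder for any callable that maps a tile to a tile of the same size, and the function name `tone_map_by_tiles` is hypothetical.

```python
import numpy as np

def tone_map_by_tiles(hdr, generator, tile=256):
    """Apply `generator` to non-overlapping tiles and concatenate the outputs."""
    h, w = hdr.shape[:2]
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    rows = []
    for top in range(0, h, tile):
        row = [generator(hdr[top:top + tile, left:left + tile])
               for left in range(0, w, tile)]
        rows.append(np.concatenate(row, axis=1))
    return np.concatenate(rows, axis=0)
```

A 1024 × 2048 input produces a 4 × 8 grid, i.e., 32 generator calls. Because each call sees only its own tile, any operator whose output depends on tile-level statistics can produce visible seams at the tile borders.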

Experimental Setup
In this section, we detail the HDR image databases we collected and how we pre- and postprocessed them.

Databases
From the many open-source HDR image databases accessible online, we selected our databases based on their content diversity, usability, resolution, and quality. Table 1 summarizes the HDR image databases we used, with the majority being high-resolution. We used 105 images from Kalantari and Ramamoorthi 45 to test adTMO, and 748 images from the other 10 databases in Table 1 to train adTMO.

Resizing
We used two collections of 256 × 256 resolution images for training. The first set consisted of the original images resized to 256 × 256 resolution (training scheme A), whereas the second set consisted of images randomly cropped from the resized 1024 × 1024 images (training scheme B). For testing purposes, we resized HDR images into two resolutions: 256 × 256 and 1024 × 2048.

Target LDR Images Generation
All the collected HDR images were unlabeled, i.e., the ground-truth LDR images were unknown. To solve this problem, for each HDR image, we applied 30 different TMOs to get 30 LDR image candidates using the MATLAB HDR TOOLBOX 46 and followed the suggestion to apply GammaTMO after tone-mapping, as some specific TMOs require gamma encoding. From these 30 LDR image candidates, we selected the one with the highest TMQI as the ground-truth LDR image. Table 2 summarizes the performance of each TMO when applied to the resized 256 × 256 HDR images. In Table 2, we provide the average TMQI for each TMO after applying it to the whole training set, and the number of LDR images with the highest TMQI among the 30 candidates. The last row tabulates the average TMQI of the selected 748 target LDR images. Among the TMOs provided by the MATLAB HDR TOOLBOX, WardHistAdjTMO reaches the highest average TMQI and provides the most ground-truth LDR images (124 images). Apart from RamanTMO, which contributed 0 ground-truth images, all other TMOs provide at least one image for the ground-truth set. This approach to generating target LDR images is similar to the one proposed by Cai et al. 47 to generate high-contrast images. Both our work and theirs aim to reproduce satisfactory natural LDR images. Whereas we focus on keeping the structural similarity to the HDR images and retaining color naturalness, Cai et al. aimed to produce a high-contrast image from an under-/over-exposed image. The two approaches also differ in how the "ground-truth" target image is selected. We use an objective metric, TMQI, to select a ground-truth LDR image, whereas Cai et al. used a subjective ranking to select a ground-truth high-contrast image.
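The target selection step described above is an arg-max over candidate tone-mapped images. A minimal Python sketch is given below; `candidates` maps a TMO name to its LDR output and `score_fn` is a placeholder for a TMQI evaluation against the source HDR image. Both names, and the function `select_target_ldr`, are hypothetical and not part of the MATLAB HDR TOOLBOX.

```python
def select_target_ldr(candidates, score_fn):
    """Return (tmo_name, ldr_image, score) for the highest-scoring candidate.

    candidates: dict mapping a TMO name to its tone-mapped LDR image.
    score_fn:   callable returning a quality score (e.g., TMQI) for an image.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    scores = {name: score_fn(image) for name, image in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores[best]
```

Counting how often each TMO name is returned over the training set would reproduce the per-TMO counts reported in Table 2.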

Normalization
We linearly normalized the pixel values of input HDR and LDR images into [0, 1]. For input HDR images, min/max normalization was applied:
$$v_{out} = \frac{v_{in} - v_{min}}{v_{max} - v_{min}}, \tag{8}$$
where v_max and v_min are the maximum and minimum pixel values of the input HDR image, respectively. For input LDR images, we applied v_out = v_in / 255 so that the pixel values of the input LDR image are also in the range [0, 1].
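Both normalizations can be written directly in NumPy, as in the sketch below. The handling of a constant image (v_max equal to v_min), which is mapped to zeros here, is an assumption made to avoid division by zero; the text does not discuss that case. The function names are hypothetical.

```python
import numpy as np

def normalize_hdr(v):
    """Min/max normalization of an HDR image into [0, 1]."""
    v = np.asarray(v, dtype=np.float64)
    v_min, v_max = v.min(), v.max()
    if v_max == v_min:
        return np.zeros_like(v)  # assumed behavior for a constant image
    return (v - v_min) / (v_max - v_min)

def normalize_ldr(v):
    """Scale 8-bit LDR pixel values into [0, 1]."""
    return np.asarray(v, dtype=np.float64) / 255.0
```

Note that the HDR normalization is image dependent: the same absolute radiance can map to different values in two images with different extrema.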

Luminance Extraction and Color Reproduction
When training and testing our proposed adTMO, we used the luminance channel rather than the RGB channels of the input images to ease the computational complexity and reduce the memory requirement. Before training, we calculated the weighted sum of the RGB channels to extract the luminance channel, with the weights taken from Ref. 6. After generating the luminance channel from G, we used C_out = C_in · L_out / L_in to reproduce the RGB channels, where L_in and L_out are the input and output luminance channels, respectively, and C_in and C_out are the RGB channels of the original HDR image and the generated LDR image after color reproduction. After color reproduction, some pixel values would be larger than 255, and they were clipped to 255 to maintain the 8-bit RGB range.
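The luminance extraction and color reproduction steps can be sketched as follows. The luminance weights used in this paper come from Ref. 6 and are not reproduced here; the Rec. 709 weights in the code are a stand-in chosen only for illustration. The guard against zero input luminance (`eps`) and the function names are likewise assumptions.

```python
import numpy as np

# Stand-in weights (Rec. 709); the weights actually used are those of Ref. 6.
ASSUMED_WEIGHTS = (0.2126, 0.7152, 0.0722)

def luminance(rgb, weights=ASSUMED_WEIGHTS):
    """Weighted sum of the R, G, B channels of an (H, W, 3) image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ np.asarray(weights, dtype=np.float64)

def color_reproduce(c_in, l_in, l_out, eps=1e-8):
    """C_out = C_in * L_out / L_in per channel, clipped to the 8-bit range."""
    c_in = np.asarray(c_in, dtype=np.float64)
    ratio = np.asarray(l_out, dtype=np.float64) / np.maximum(l_in, eps)
    return np.clip(c_in * ratio[..., None], 0.0, 255.0)
```

Scaling all three channels by the same luminance ratio preserves the ratios between R, G, and B at each pixel, which is why the hue of the original HDR image carries over to the LDR output until clipping occurs.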

Results
In this section, we discuss the results of our proposed adTMO in terms of multiple metrics of the generated LDR images under the different training/testing schemes. Figure 6 shows one example scene in the RGB channels after color reproduction for the different training/testing schemes. We omit the LDR output of testing scheme Y because those pieces are the ones concatenated to form the images in testing scheme Z. LDR images in testing scheme W [(a), (d), and (g)] have higher TMQI, but such a conversion is meaningless, as many details are lost in the resizing operation. LDR images of testing schemes X and Z in training scheme A [(b), (c)] have lower TMQI with shadows around the flowers, because adTMO was trained only with resized 256 × 256 images, in which many fine details of the original images were lost. After we added cropped images to the training database, adTMO was able to learn how to keep the details of the original images. Therefore, the LDR images of testing scheme X in training schemes B and C [(e) and (h)] look more natural and have higher TMQI. The LDR images of testing scheme Z [(c), (f), and (i)] show "concatenated" edges, because cropping a complete image into pieces and generating their tone-mapped LDR images individually breaks the internal connections between these pieces. Future work is required to generate these individual images and combine them in such a way that these edges are removed while maintaining the high contrast in each individual image. Some finer details are not preserved well by the proposed adTMO. It should be noted that edge-preserving techniques such as bilateral filtering or guided image filtering have shown great promise in alleviating this problem. Further experimentation is required, and we plan to incorporate these techniques into a deep-learning-based TMO to create a more robust operator.
We chose training scheme C to train the proposed adTMO, testing scheme W to tone-map 256 × 256 resolution images, and testing scheme X to tone-map 1024 × 2048 resolution images, given that training scheme C has the larger data set for training and the resulting LDR images [(g) and (h)] have higher TMQI.
In Fig. 7, we show qualitative comparisons of adTMO and the nine other top-ranked TMOs that produce the highest TMQI for four different scenarios, in generating 1024 × 2048 resolution images. In most scenarios, including indoor/outdoor scenes, irregular geometric shapes, large color ranges, and drastic luminance changes, our adTMO outperforms all other TMOs on the TMQI metric. In addition, the LDR images generated by adTMO do not suffer from the contrast problems seen in the other LDR images. Tables 3 and 4 list the metrics mentioned in Sec. 2 for the test dataset tone-mapped by 30 TMOs and the proposed adTMO. We modify the PPL so that it can be used to evaluate TMOs. Specifically, the PPL is calculated as follows:
$$\mathrm{PPL} = \mathbb{E}\left[\frac{1}{\epsilon^2}\, d\Big(g\big(\mathrm{lerp}(f(z_1), f(z_2); t)\big),\; g\big(\mathrm{lerp}(f(z_1), f(z_2); t + \epsilon)\big)\Big)\right], \tag{10}$$
where f(z) represents the function mapping the latent space to the style vector in adTMO, t is uniformly distributed between 0 and 1, lerp denotes linear interpolation, g is the generator function that creates the image, d measures the perceptual distance between the images, and ϵ is set to 10^-4 here. In generating 256 × 256 resolution images, our proposed adTMO outperforms all other TMOs with regard to FID and outperforms most TMOs with regard to the other metrics. In generating 1024 × 2048 resolution images, our proposed adTMO outperforms all other TMOs with regard to TMQI, SSIM, and MS-SSIM and outperforms most other TMOs with regard to FID and PPL. We also divided the images into two sets, one for indoor scenes and another for outdoor scenes. Both reach high TMQI (0.89 and 0.90) for 1024 × 2048 resolution images. Our deep learning-based tone mapping algorithm uses a mixture of the best features from other TMOs. When interactive parameter adjustment is not available, which is often the case, our approach offers the best TMQI.
In addition to the above-mentioned metrics, we also applied a face detection technique to the generated 1024 × 2048 LDR images to measure the face detection accuracy, as HDR-LDR translation is often used in security and healthcare applications. The face detection accuracy is defined as acc = TP / (TP + FN), where TP and FN represent the number of faces that are detected and not detected, respectively. The face detector used in this paper is the Haar cascades face detector, 48 and the test set we used for evaluation is by Kalantari and Ramamoorthi, 45 which consists of HDR images containing human faces. Our proposed adTMO reaches the highest face detection accuracy compared with other TMOs. The main reason is that we use the pretrained Inception V3 network 24 to derive the perceptual loss, so our generated LDR images look more natural, and a face detector trained on natural images can achieve higher accuracy on LDR images generated by our adTMO. Overall, adTMO output has the highest quality for high-resolution 1024 × 2048 images and is comparable to the other TMOs for 256 × 256 images.

Conclusion
We propose an adTMO, which can adaptively generate high-resolution and high-quality LDR images. We explore different training and testing schemes and find the best possible combination to generate the highest quality images. We use multiple metrics including TMQI, SSIM, MS-SSIM, and face detection accuracy to measure the performance of the proposed adTMO. When testing on low-resolution LDR images, our adTMO has the highest performance on the FID metric among all TMOs. When testing on high-resolution LDR images, our adTMO has the highest performance on TMQI, SSIM, MS-SSIM, and face detection accuracy among all TMOs. Looking specifically at the TMQI metric, the proposed adTMO achieves a TMQI of 0.90 ± 0.06, which is superior to DeepTMO's 6 0.88 ± 0.06. In addition, we have an advantage in training time: we spend 30 h on training, which is much shorter than DeepTMO's 1 week.

engineering and customized real-time digital signal processing algorithms in the context of mobile embedded systems and biomedical instrumentation. He is a senior member of IEEE.
Svetlana Yanushkevich received her Dr.Tech.Sc. (Dr. Habilitated) degree from the Warsaw University of Technology in 1999. She is currently a professor in the Department of Electrical and Software Engineering at the University of Calgary. She is directing the Biometric Technologies Laboratory and conducting research in the area of biometric-based authentication technologies. She is a senior member of IEEE.