## 1.

## Introduction

To obtain high visual quality in digital camera systems, a high spatial resolution is required. The most direct way to increase spatial resolution in a camera is to reduce the size of the sensor pixel elements or to increase the chip size. However, there are applications for which this may not be feasible. For example, to overcome sensor noise at low exposure levels, smaller pixels have to be exposed for a longer duration. Consequently, in terms of hardware, a high spatial resolution is in contradiction to having a high temporal resolution, which is a requirement for high-quality video systems. Additionally, a longer exposure duration means that frames become susceptible to motion blur effects. Super-resolution reconstruction (SRR), as an alternative, is a signal processing approach to enhance spatial resolution. Multiple low-resolution (LR) images, captured in a sequence, of the same original scene are used to construct one high-resolution (HR) image. A further introduction to super resolution is given in Sec. 1.3.

Camera sensors that use a single fixed-exposure duration for each captured image have dynamic range limitations. This is particularly the case for compact cameras that have a smaller pixel size. The limitation is experienced e.g., when capturing an image of an outdoor environment on a sunny day or an indoor scene with a window. Certain image areas will then be overexposed or underexposed, which has a large impact in terms of visual quality. The human visual system (HVS) is able to perceive detailed information in bright and dim areas simultaneously, and thus it is undesired of the camera system to impose this limitation. By taking two or more images of the same scene, using different exposure settings, a single high dynamic range (HDR) image can be created, which is free from saturation. Digital display devices, for their part, also have a low dynamic range (LDR) compared to real-world scenes. The dynamic range limitation for display devices, however, can practically be overcome by tonemapping HDR image information to a LDR image in such a way that it, to the HVS, resembles the HDR image that it was created from.^{1} Before proceeding to a review of relevant literature on HDR as well as SRR, the HVS and how light is interpreted is briefly introduced.

## 1.1.

### The Human Visual System

Understanding the main principles of the HVS is of crucial importance to researchers in image processing. When dealing with standard 8-bit pixel image formats, much is abstracted into how those images are coded. For example, in the standard red, green, and blue (RGB) color spaces, data is coded linearly with respect to the perception of brightness in the HVS. This implies that image operations or filters naturally work as expected. Also, storage, compression or filtering of image data in this domain makes sense, because of the property that pixel value errors have the same perceptual impact across the whole range of pixel values.

Light that is incident on the camera sensor is nonlinear with regard to the HVS in several respects. As will be presented, various camera components are designed to mimic the HVS in order to produce an output image, which is perceived to be of high fidelity. Consider a light spectrum observed by the human eye. The incident energy is registered by the photoreceptors of the eye, the three different types of cone cells (corresponding to, roughly, red, green and blue light), that are responsible for human color vision. The three cone types each have their respective spectral responses. Their combined spectral response can be described in terms of grayscale vision by a single function of wavelength, the luminous efficacy curve. For simplicity as well as to stay within the main scope of the paper, the discussion of the HVS is from here on reduced to grayscale vision. The luminous efficacy curve is a dimensionless function that tells what fraction of radiant power at any given wavelength contributes to the luminous properties of the HVS. It is nonzero for wavelengths between, roughly, 380 to 700 nm, the so-called visible spectrum.^{2}

The term for the incident power spectrum of light is irradiance. The mathematical inner product of irradiance with the luminous efficacy curve is denoted illuminance. As opposed to irradiance, a radiometric unit, illuminance is a term defined based on the HVS, and is thus a photometric unit. While illuminance is the light registered by the human photoreceptors, the cones, it is still nonlinear with respect to perceived brightness. Illuminance changes of an absolute measure have a much higher perceived impact at low illuminance levels as compared to at high illuminance levels. Perceived brightness is subjective. Based on tests of human vision on large sample sizes, standardized units such as lightness and luma are defined and used in color spaces for LDR image processing. Table 1 summarizes the terms presented in this section and how they relate to the HVS and digital cameras. An incident light spectrum, irradiance, on a camera sensor is filtered according to the same principle as that of the cones of the eye. Thus, light contributes differently, depending on its wavelength, to the amount of current that is excited from sensor elements. The term illuminance is used here also when referring to the sensor output due to the filtered light, even though the term is really defined for the HVS. An image domain that is an approximation of perceived brightness, i.e., which is approximately linear to perceived brightness, will henceforth be called a perceptually uniform (PU) domain. Examples of (grayscale) PU domains are lightness and luma; however, they are only designed for LDR images. Any PU domain will correspond to a concave nonlinear mapping from the illuminance domain.

## Table 1

Description of light units in the context of the HVS and digital cameras.

Unit | Description | HVS | Camera |
---|---|---|---|

Irradiance (Radiometric) | Incident power spectrum I(λ) | Sensitive to light of wavelength λ∈ [380,700] nm | Filter I(λ) according to approximated luminous efficacy |

Illuminance (Photometric) | ∫−∞∞I(λ)V(λ)dλ, where V(λ) is the luminous efficacy curve of the HVS | The power registered by the photoreceptors | Filtered light exposes the sensor to approximate illuminance of the HVS |

Perceived brightness, e.g., Lightness, Luma (Photometric) | A nonlinear function of illuminance | A subjective measure | Standardized units such as lightness and luma are approximations of perceived brightness |

## 1.2.

### High Dynamic Range Images

The HVS can adapt to illuminance levels of about 10 orders of magnitude, and at any given adaptation level about five orders of magnitudes are perceived. As mentioned, for a given exposure setting, camera sensors have a lower dynamic range than that of many natural scenes (and of the HVS), which will cause saturation of sensor elements, either from over- or underexposure. Thus, the sensors are said to have a LDR. In HDR image reconstruction, a single HDR image is constructed from two or more LDR images captured with different exposure settings. The LDR images must be aligned and should be captured in a quick sequence, such that local motion in the scene is kept to a minimum. The images are merged by weighted average in a PU image domain. Weights of saturated pixels are naturally set to zero. Given a set of LDR images in the pixel value domain, their corresponding sensor exposures are retrieved by an inverse mapping with the camera response function (CRF). Illuminance domain images are subsequently obtained by scaling the images with the inverse of their respective exposure durations. Finally, the images are transformed to e.g., the log-illuminance domain, which is an approximation of lightness (because perceived brightness of the HVS is much closer to logarithmic than linear in illuminance), where the weighted average is taken.^{3}

To reproduce an image of an HDR scene on a display device, e.g., a computer monitor, it has to be tonemapped to the lower dynamic range of the display device.^{1}^{,}^{4} The tonemapping essentially compresses the dynamic range uniformly in a suitable domain, but more sophisticated tonemappers also do local processing. For example, image contrasts may be compressed more on edges between image segments than on details within image segments. Additionally, suitable color spaces are used in order to maintain high color fidelity. The output LDR image, typically stored in a standard 8-bit format such as *sRGB*, is perceived by the HVS to still contain HDR information. HDR applications are recently starting to reach the general public. For example, smartphone applications that take two images in a sequence and generate a merged HDR image have reached widespread use, and similar features are common in modern commercial cameras. The result is often satisfactory, although artifacts sometimes occur due to spatial misalignment.

## 1.3.

### Super-Resolution Reconstruction

The signal processing technique SRR makes use of multiple degraded LR images of the same original scene.^{5}^{,}^{6} In order to be useful, each image must contain new information of the scene, which is the case if the images are shifted relative to each other on a subpixel scale (e.g., from natural camera movement) or if they are blurred by different blur functions. In the corresponding mathematical model, this means that the images each contribute with linearly independent observations of the desired HR image representation. To reconstruct an HR image, an inverse problem is solved, including image registration (alignment) with high precision as well as blur estimation, either as pre-processing or jointly with estimating the HR image. A property of SRR is that the resulting inverse problem is fundamentally ill-conditioned, which presents big challenges.^{7}^{,}^{8} If a larger number of observations are used, the conditioning improves, but only slightly. Furthermore, because complex motion between frames is undesired, a relatively small number of images is preferred so that they are close temporally. Thus, to improve the conditioning, a regularization term is often added to the problem formulation. The regularization function should enforce a solution that matches the statistics of the image, or rather of natural images in general, since the statistics of the unknown image are unavailable.

Bayesian interpretations, in which the unknown image is viewed as stochastic, of the SRR problem are common.^{9}^{,}^{10} Similarly to deterministic methods, they lead to minimizing a (regularized) objective function. A prior is assumed on the HR image and often also on other system parameters that should be estimated. For example, fine-scale image registration can be cast as a stochastic problem, with a prior selected for the registration error that remains after a rough pre-processing registration. Blind super resolution (BSR) addresses the joint estimation of HR image and blur kernels (that include subpixel level shifts).^{11} BSR has clear similarities with multichannel blind deconvolution,^{12} with the added extension of downsampling. BSR is very challenging in terms of finding the global optimal point, due to its nonconvex structure. Often, simple parameterizations of the blur kernel are made, or it is assumed static to model only the blur effect of the sensor. Applications of SRR exist both for offline processing, such as astronomy and forensics, and for real-time processing, as for video streaming. Work on real-time implementations are increasingly common, including SRR implementations on custom hardware such as Field Programmable Gate Arrays. Custom sensors may also be used, that can e.g., increase pixel size for desired improvement of temporal resolution in situations with moving objects that are susceptible to motion blur.^{13} The spatial resolution is then maintained by performing SRR on the frames with reduced resolution. Optical flow models are implemented to handle local object motion within the image frames.

## 1.4.

### Super-resolution Reconstruction of HDR Images

Traditionally, SRR is performed on similarly exposed LDR pixel valued images. However, during the past few years SRR has also been applied to HDR images.^{14}^{,}^{15} The combined dynamic range of the reconstructed HR, HDR image has no restrictions, as the exposure durations of the input images can be selected freely. A good reconstruction result requires accurate image registration, both for spatial shifts and to photometrically align the images in an HDR image domain. Facing both geometric and photometric registration adds some extra challenges, as compared to only geometric registration.^{16} Ambitious methods implement optical flow techniques to handle local spatial motion within the images.^{17} If the input images to the HDR SRR are given as 8-bit pixel valued images, the CRF is needed in order to retrieve the illuminance information from the earlier stage of the camera processing chain. However, if the device was designed to perform HDR SRR, the illuminance information (of a higher bit depth) could be more readily available.

A common factor for the published work on SRR for HDR images is that the reconstruction takes place in the illuminance domain, which, as discussed in Sec. 1.1, is highly nonlinear to human perception.^{18} This is opposed to the case of LDR SRR methods, where the reconstruction is performed in the PU pixel value domain. Because illuminance errors at low illuminance levels have such a high perceived impact, reconstruction errors in dim image areas cause noticeable artifacts. Therefore, in this work, the HDR SRR is performed in a tonemapped, PU domain. The HDR illuminance information is thus mapped by a nonlinear tonemapping function, modifying the inverse SRR problem from a least squares (LS) problem to a nonlinear problem of equal dimension. Note that the choice of tonemapping function is nontrivial. They all aim to mimic the HVS but should merely be seen as approximations.^{1}^{,}^{4} In the formulation of the proposed method, a simple tonemapping function is used, which nevertheless captures the main characteristics of the HVS well (but should not be the preferred choice for the application of visualizing HDR data). Methods for image registration or the estimation of other system parameters are outside the scope of this paper, the focus is on the formulation of the inverse problem used in the SRR. The remainder of this paper is organized as follows. Section 2 describes the camera model for the image data acquisition. In Sec. 3, a method for SRR on HDR images in a PU domain is proposed. Section 4 provides experimental results for the proposed method and a related illuminance domain approach. Finally, Sec. 5 concludes the paper.

## 2.

## Camera Model

When acquiring an image, the camera sensor is illuminated by a real-world scene for a given exposure duration. Let the image $\mathbf{X}$, of size ${X}_{1}\times {X}_{2}$, denote an HR representation with dynamic range ${X}_{\mathrm{max}}/{X}_{\mathrm{min}}$ of an original continuous scene, and let $\mathbf{x}$ be its corresponding vectorized representation of length $({X}_{1}{X}_{2})\times 1\triangleq n\times 1$. It is assumed that the camera provides a set of low-resolution and LDR images ${\mathbf{y}}_{k}$ according to the expression

## (1)

$${\mathbf{y}}_{k}=f[\mathrm{\Delta}{t}_{k}(\mathbf{D}{\mathbf{C}}_{{\mathbf{H}}_{k}}\mathbf{x}+{\mathbf{n}}_{k})]+{\mathbf{q}}_{k},\phantom{\rule[-0.0ex]{2em}{0.0ex}}k=1,\dots ,K.$$For each of the multiple observations, ${\mathbf{C}}_{{\mathbf{H}}_{k}}$, of size $n\times n$, performs 2-D convolution on the vectorized HR image $\mathbf{x}$. Its convolution kernel ${\mathbf{H}}_{k}$, of support ${H}_{1}\times {H}_{2}$, represents blurring as well as planar small-scale spatial shifts (assuming that rough image registration can be performed as pre-processing) for ${\mathbf{y}}_{k}$ relative to a reference image (e.g., ${\mathbf{y}}_{1}$). The downsampling matrix $\mathbf{D}$, of size $(n/{L}^{2})\times n$, decimates the spatial resolution a factor $L$ in the $x$- and $y$-directions, and $\mathrm{\Delta}{t}_{k}$ is the exposure duration. Additive Gaussian noise ${\mathbf{n}}_{k}$ of variance ${\sigma}_{n}^{2}$ models the noise in the image sensor. In addition, by somewhat exaggerating the value of ${\sigma}_{n}^{2}$, other error sources can be contained in ${\mathbf{n}}_{k}$. The ${\mathbf{y}}_{k}$ images are quantized to 8-bit pixel values, with uniformly spaced quantization levels $Y\in \{0,\dots ,1\}$. This quantization effect is modeled by ${\mathbf{q}}_{k}$, while quantization in the A/D converter to a higher bit depth is considered to be part of ${\mathbf{n}}_{k}$.

The low-resolution exposure of the camera sensor is ${\mathbf{e}}_{k}=\mathrm{\Delta}{t}_{k}(\mathbf{D}{\mathbf{C}}_{{\mathbf{H}}_{k}}\mathbf{x}+{\mathbf{n}}_{k})$. The ${\mathbf{e}}_{k}$ vector is, similar to ${\mathbf{y}}_{k}$, ${\mathbf{n}}_{k}$, and ${\mathbf{q}}_{k}$, of size $(n/{L}^{2})\times 1$. At each pixel $i$, the exposure ${[{\mathbf{e}}_{k}]}_{i}$ is mapped by the pixelwise, nonlinear CRF, $f$, to a pixel value in the output image. A reasonable (simplified) model of a real CRF, adopted for the simulations in this paper, is

## (2)

$$f(E)=\{\begin{array}{ll}0& ,E\le {E}_{\mathrm{min}}\\ {\left(\frac{E-{E}_{\mathrm{min}}}{{E}_{\mathrm{max}}-{E}_{\mathrm{min}}}\right)}^{{\gamma}_{\mathrm{LDR}}}& ,{E}_{\mathrm{min}}\le E\le {E}_{\mathrm{max}}\\ 1& ,E\ge {E}_{\mathrm{max}}\end{array},$$## 2.1.

### Alternative Camera Models

It is interesting to note that the camera model used for LDR SRR on pixel domain images differs from the camera model for HDR SRR methods, without exception. The forward model assumed for LDR SRR is

## (3)

$${\mathbf{y}}_{k}=\mathbf{D}{\mathbf{C}}_{{\mathbf{H}}_{k}}f(\mathrm{\Delta}t\mathbf{x})+{\mathbf{n}}_{k},\phantom{\rule[-0.0ex]{2em}{0.0ex}}k=1,\dots ,K,$$^{5}From the set of observations, $\{{\mathbf{y}}_{k}\}$, the desired pixel valued HR image $\mathbf{z}=f(\mathrm{\Delta}t\mathbf{x})$ is to be reconstructed. The model in Eq. (3) is thus used in the formulation of the inverse problem, where it relates the unknown HR image to the actual observed LR data. The inconsistency in the choice of camera model gives rise to the question of what model, Eq. (1) or Eq. (3), is closer to the real physics of the camera. One of the two, LDR SRR or HDR SRR, has a larger model mismatch that comes into play when real-world data is considered. The choice of camera model is not investigated further in this paper.

## 3.

## Image Reconstruction in a Perceptually Uniform Domain

To reconstruct, or rather to find a good estimate of $\mathbf{x}$, an inverse problem should be formulated and solved. To do so, knowledge of the system parameters in ${\mathbf{H}}_{k}$, including blur functions and subpixel-precision relative motion between image frames, is necessary. Assuming these, as well as the CRF $f$, to be known or to have been estimated, a compact notation is introduced in the beginning of this section to aid the inverse problem formulations. Then, in the remainder of the section, three different HDR SRR formulations are described, one which has an objective function that is expressed in the illuminance domain, and the other two in which a PU domain is employed. For mathematical convenience, the impact of the quantization noise ${\mathbf{q}}_{k}$ will be neglected in the inverse problem formulation (quantization is nevertheless used when generating ${\mathbf{y}}_{k}$), as it typically has little impact on reconstruction results compared to other error sources, e.g., ${\mathbf{n}}_{k}$ (and in the more general case, to errors in system parameter estimates).

To align the observed images ${\mathbf{y}}_{k}$ photometrically, the corresponding illuminance domain images [of the same size $(n/{L}^{2})\times 1$]

are recovered by mapping each pixel value in ${\mathbf{y}}_{k}$ with the inverse (not in the strict sense, due to the clipping effect) CRF,## (5)

$$g(Y)={Y}^{(1/{\gamma}_{\mathrm{LDR}})}({E}_{\mathrm{max}}-{E}_{\mathrm{min}})+{E}_{\mathrm{min}},$$## (6)

$${\mathbf{W}}_{k}{\mathbf{i}}_{k}={\mathbf{W}}_{k}[g({\mathbf{y}}_{k})/\mathrm{\Delta}{t}_{k}]=\phantom{\rule{0ex}{0ex}}={\mathbf{W}}_{k}(\mathbf{D}{\mathbf{C}}_{{\mathbf{H}}_{k}}\mathbf{x}+{\mathbf{n}}_{k}),\phantom{\rule[-0.0ex]{2em}{0.0ex}}k=1,\dots ,K,$$More generally, $\mathbf{W}$ could be designed to weigh observed data differently, as a function of e.g., its pixel value and exposure duration, according to the noise properties of Eq. (1).^{14} In the image reconstruction, the objective is to minimize the difference, in some sense, between $\mathbf{W}\mathscr{H}\mathbf{x}$ and the faithfully (unsaturated) observed information $\mathbf{W}\mathbf{i}$. Thus, a distance measure of the residual function $\mathbf{r}(\mathbf{x})$ should be minimized in order to obtain

## (8)

$$\begin{array}{c}\widehat{\mathbf{x}}=\underset{\mathbf{x}}{\mathrm{arg}\mathrm{min}}\rho [\mathbf{r}(\mathbf{x})]=\sum _{i}h[{r}_{i}(\mathbf{x})],\phantom{\rule[-0.0ex]{2em}{0.0ex}}{h}_{i}[{r}_{i}(\mathbf{x})]\ge 0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall i\end{array},$$## 3.1.

### The Least Squares Solution

Under the assumption of additive Gaussian noise $\mathbf{n}$, the LS estimate of $\mathbf{x}$ is optimal in the maximum likelihood sense. However, the corresponding inverse problem is equivalent to minimizing the sum of squared reconstruction errors in the illuminance domain, a domain that is nonlinearly related to perceived error by the HVS, yet it is still used for SRR of HDR images.^{14}^{,}^{15}^{,}^{17} The LS estimate is included mainly as a benchmark for the proposed method. In order to improve the condition number of the inverse problem and provide a stable solution, a regularization term is appended. Thus, the LS estimate is

## (9)

$$\begin{array}{c}{\widehat{x}}_{\mathrm{LS}}=\underset{x}{\mathrm{arg}\mathrm{min}}{\Vert \left[\begin{array}{c}\mathbf{W}\mathscr{H}\\ \sqrt{\lambda}\mathrm{\Gamma}\end{array}\right]\mathbf{x}-\left[\begin{array}{c}\mathbf{Wi}\\ \mathbf{0}\end{array}\right]\Vert}_{2}^{2}\triangleq \underset{\mathbf{x}}{\mathrm{arg}\mathrm{min}}{\parallel {\mathbf{r}}_{\mathrm{LS}}(\mathbf{x})\parallel}_{2}^{2},\end{array}$$## (10)

$$\begin{array}{c}{\mathcal{O}}_{1}={\parallel {\mathbf{r}}_{\mathrm{LS}}(\mathbf{x})\parallel}_{2}^{2}=\sum _{i=1}^{m+n}{r}_{\mathrm{LS},i}^{2}(\mathbf{x}),\end{array}$$## 3.2.

### The Proposed Objective Function

For the proposed objective function, the residual function $\mathbf{r}(\mathbf{x})$ is formulated in such a way that errors perceived by the HVS are penalized uniformly in the reconstruction. Thus, for the estimate of $\mathbf{x}$,

## (11)

$$\begin{array}{ccc}{\widehat{x}}_{\mathrm{PU}}& =\underset{x}{\mathrm{arg}\mathrm{min}}{\Vert \begin{array}{c}W[\stackrel{\sim}{f}(\mathscr{H}x)-\stackrel{\sim}{f}(i)]\\ \sqrt{\lambda}\mathrm{\Gamma}\stackrel{\sim}{f}(\mathbf{x})\end{array}\Vert}_{2}^{2}& \triangleq \underset{\mathbf{x}}{\mathrm{arg}\mathrm{min}}\parallel {\mathbf{r}}_{\mathrm{PU}}(\mathbf{x}){\parallel}_{2}^{2},\end{array}$$^{3}the illuminance information, $\mathbf{i}$, is normalized before applying $\stackrel{\sim}{f}$. The same notation is kept. The normalized illuminance information is then tonemapped to a PU domain by $\stackrel{\sim}{f}(\xb7)={(\xb7)}^{{\gamma}_{\mathrm{HDR}}}$. In this work, no special care is taken to customize the value of ${\gamma}_{\mathrm{HDR}}$ depending on the dynamic range of $\mathbf{i}$. It should be noted that, to instead minimize $\stackrel{\sim}{f}[\mathbf{W}(\mathscr{H}\mathbf{x}-\mathbf{i})]$, i.e., the tonemapped difference, would be incorrect, since the absolute illuminance level determines how sensitive the perception of the HVS is. The objective function of Eq. (11) is

## (12)

$$\begin{array}{c}{\mathcal{O}}_{2}={\parallel {\mathbf{r}}_{\mathrm{PU}}(\mathbf{x})\parallel}_{2}^{2}=\sum _{i=1}^{m+n}{r}_{\mathrm{PU},i}^{2}(\mathbf{x}),\end{array}$$To solve the nonlinear minimization problem in Eq. (11), an iterative nonlinear programming method must be employed. An initial estimate, ${\mathbf{x}}^{(0)}$, may be taken as an interpolated version of an ${\mathbf{i}}_{k}$ image. Furthermore, let ${\mathbf{x}}^{(k)}$ be an estimate of $\mathbf{x}$ at iteration $k$, then the estimate is updated according to

where ${\mathbf{d}}^{(k)}$ is the search direction and ${\alpha}^{(k)}$ is the step length at iteration $k$. The search direction is of great importance with regard to the convergence rate. The 1st order gradient descent method is typically very slow to converge for the SRR problem. Because of that, higher order information is considered in order to determine ${\mathbf{d}}^{(k)}$. A 2nd order Taylor approximation of $\mathcal{O}=\rho [\mathbf{r}(\mathbf{x})]$ in the neighborhood of ${\mathbf{x}}^{(k)}$, as used in Newton methods, is## (14)

$$\mathcal{O}({\mathbf{x}}^{(k)}+{\mathbf{d}}^{(k)})\approx \mathcal{O}({\mathbf{x}}^{(k)})+{\mathbf{G}}^{T}({\mathbf{x}}^{(k)}){\mathbf{d}}^{(k)}+{\mathbf{d}}^{(k)T}\mathbf{H}({\mathbf{x}}^{(k)}){\mathbf{d}}^{(k)},$$## (15)

$$\mathbf{G}({\mathbf{x}}^{(k)})=[{\nabla}_{x}\mathcal{O}(\mathbf{x})]{|}_{{\mathbf{x}}^{(k)}}=2\sum _{i=1}^{m+n}{r}_{i}(\mathbf{x}){\nabla}_{x}{r}_{i}(\mathbf{x})=2{\mathbf{J}}_{\mathbf{r}}^{T}({\mathbf{x}}^{(k)})\mathbf{r}({\mathbf{x}}^{(k)}),$$## (16)

$$\mathbf{H}({\mathbf{x}}^{(k)})=[{\nabla}_{x}^{2}\mathcal{O}(\mathbf{x})]{|}_{{\mathbf{x}}^{(k)}}=2\sum _{i=1}^{m+n}[{\nabla}_{x}{r}_{i}(\mathbf{x}){\nabla}_{x}^{T}{r}_{i}(\mathbf{x})+{r}_{i}(\mathbf{x}){\nabla}_{x}^{2}{r}_{i}(\mathbf{x})]{|}_{{\mathbf{x}}^{(k)}}=2{\mathbf{J}}_{\mathbf{r}}^{T}({\mathbf{x}}^{(k)}){\mathbf{J}}_{\mathbf{r}}({\mathbf{x}}^{(k)})+\mathbf{S}({\mathbf{x}}^{(k)}).$$Note that $\mathcal{O}(\mathbf{x})$, $\mathbf{r}(\mathbf{x})$, $\mathbf{G}(\mathbf{x})$, and $\mathbf{H}(\mathbf{x})$ all depend on $\mathbf{x}$, and thus need to be re-evaluated for each update in the iterative optimization method. However, for compactness, this dependency will not always be emphasized by writing out the argument. The commonly used Gauss-Newton method fits nicely for the optimization problem in Eq. (11). To see this, consider the nonlinear residual function

## (17)

$$\begin{array}{cc}{\mathbf{r}}_{\mathrm{PU}}(\mathbf{x})& =\left[\begin{array}{c}{W}_{11}[{\left(\sum _{j=1}^{n}{\mathscr{H}}_{1j}{x}_{j}\right)}^{\gamma}-{i}_{1}^{\gamma}]\\ \vdots \\ {W}_{\mathrm{mm}}[{\left(\sum _{j=1}^{n}{\mathscr{H}}_{mj}{x}_{j}\right)}^{\gamma}-{i}_{m}^{\gamma}]\\ \sqrt{\lambda}\sum _{j=1}^{n}{\mathrm{\Gamma}}_{1j}({x}_{j}^{\gamma})\\ \vdots \\ \sqrt{\lambda}\sum _{j=1}^{n}{\mathrm{\Gamma}}_{nj}({x}_{j}^{\gamma})\end{array}\right]\end{array},$$## (18)

$$\begin{array}{cc}{\mathbf{J}}_{\mathbf{r}}& =\left[\begin{array}{ccc}\frac{\partial {r}_{1}}{\partial {x}_{1}}& \dots & \frac{\partial {r}_{1}}{\partial {x}_{n}}\\ \vdots & & \vdots \\ \frac{\partial {r}_{p}}{\partial {x}_{1}}& \dots & \frac{\partial {r}_{m}}{\partial {x}_{n}}\\ \frac{\partial {r}_{m+1}}{\partial {x}_{1}}& \dots & \frac{\partial {r}_{m+n}}{\partial {x}_{n}}\\ \vdots & \ddots & \vdots \\ \frac{\partial {r}_{m+1}}{\partial {x}_{1}}& \dots & \frac{\partial {r}_{m+n}}{\partial {x}_{n}}\end{array}\right]=\left[\begin{array}{ccc}{W}_{11}\gamma {\mathscr{H}}_{11}{\left(\sum _{j=1}^{n}{\mathscr{H}}_{1j}{x}_{j}\right)}^{\gamma -1}& \dots & {W}_{11}\gamma {\mathscr{H}}_{1n}{\left(\sum _{j=1}^{n}{\mathscr{H}}_{1j}{x}_{j}\right)}^{\gamma -1}\\ \vdots & & \vdots \\ {W}_{\mathrm{mm}}\gamma {\mathscr{H}}_{m1}{\left(\sum _{j=1}^{n}{\mathscr{H}}_{1j}{x}_{j}\right)}^{\gamma -1}& \dots & {W}_{\mathrm{mm}}\gamma {\mathscr{H}}_{\mathrm{mn}}{\left(\sum _{j=1}^{n}{\mathscr{H}}_{1j}{x}_{j}\right)}^{\gamma -1}\\ \sqrt{\lambda}{\mathrm{\Gamma}}_{11}\gamma {x}_{1}^{\gamma -1}& \dots & \sqrt{\lambda}{\mathrm{\Gamma}}_{1n}\gamma {x}_{n}^{\gamma -1}\\ \vdots & \ddots & \vdots \\ \sqrt{\lambda}{\mathrm{\Gamma}}_{n1}\gamma {x}_{1}^{\gamma -1}& \dots & \sqrt{\lambda}{\mathrm{\Gamma}}_{nn}\gamma {x}_{n}^{\gamma -1}\end{array}\right]\end{array}$$## (19)

$${\mathbf{d}}_{\mathrm{GN}}^{(k)}=-[{\mathbf{J}}_{\mathbf{r}}^{T}({\mathbf{x}}^{(k)}){\mathbf{J}}_{\mathbf{r}}{({\mathbf{x}}^{(k)})]}^{-1}[{\mathbf{J}}_{\mathbf{r}}^{T}({\mathbf{x}}^{(k)})\mathbf{r}({\mathbf{x}}^{(k)})],$$## 3.3.

### Reconstruction Using Robust Norm and Robust Regularization

So far, the methods of reconstructing $\mathbf{x}$ have involved minimizing an objective function that is the L2-norm of a residual vector. The LS solution is suitable when the residual, at the optimum, has a Gaussian distribution. This was the case for the (unsaturated) observations $\mathbf{i}$ in the LS formulation of ${\mathbf{r}}_{\mathrm{LS}}$ in Eq. (9). The residual function ${\mathbf{r}}_{\mathrm{PU}}(\mathbf{x})$ in Eq. (11), however, does not follow a Gaussian distribution. Thus, using another norm function could give a more suitable estimate. A second reason to consider other functions than the L2-norm is that any errors in the system parameters of $\mathbf{\mathscr{H}}$ (which would be the case for a practical setup where system parameters themselves need to be estimated), as used in the inverse reconstruction model, give a model mismatch with the observation model.

As mentioned, a regularization term is usually added to avoid noise amplification and thus stabilizes the solution. Depending on the amount of noise, the weight parameter $\lambda $ for the regularization term should be adjusted accordingly. The larger $\lambda $ is, the more important it becomes that the solution preferred by the regularization term matches the true statistical properties of $\mathbf{x}$. For this purpose, large values of ${\mathbf{r}}_{m+1:m+n}(\mathbf{x})$ should be interpreted as edges, which are a natural part in images, and should not be penalized. The Lorentzian “norm” (not really a norm since it violates the triangle inequality),

## (20)

$$\begin{array}{c}{\rho}_{\mathrm{Lor}}(\mathbf{r})=\sum _{i}{h}_{\mathrm{Lor},i}({r}_{1})=\sum _{i}\mathrm{log}[1+\frac{1}{2}{(\frac{{r}_{i}}{{T}_{i}})}^{2}],\end{array}$$^{19}The ${T}_{i}$ are interpreted as threshold values. For ${r}_{i}$ smaller than ${T}_{i}$, ${\rho}_{\mathrm{Lor}}(\xb7)$ acts as the L2-norm and penalizes residual components quadratically, while for ${r}_{i}$ greater than ${T}_{i}$, ${\rho}_{\mathrm{Lor}}(\xb7)$ acts as the L1-norm, making the reconstruction less sensitive to large residuals that correspond to outliers in the data term or edges in the regularization term. In this regard, the simple linear Tikhonov regularization with the Lorentzian norm achieves the same edge-preserving property, which is the motivation for the use of the bilateral total variation regularization.

^{6}

^{,}

^{19}

Using the same residual function $\mathbf{r}={\mathbf{r}}_{\mathrm{PU}}$ as for ${\mathcal{O}}_{2}$, combined with the Lorentzian norm, a third estimate of $\mathbf{x}$ is

## (21)

$$\begin{array}{c}{\widehat{x}}_{\mathrm{Lor}}=\underset{x}{\mathrm{arg}\mathrm{min}}{\rho}_{\mathrm{Lor}}[{r}_{\mathrm{PU}}(x)].\end{array}$$The corresponding objective function is

## (22)

$$\begin{array}{c}{\mathcal{O}}_{3}={\rho}_{\mathrm{Lor}}[{\mathbf{r}}_{\mathrm{PU}}(\mathbf{x})]=\left\{\sum _{i=1}^{m}\mathrm{log}\right[1+\frac{1}{2}{\left(\frac{{r}_{\mathrm{PU},i}(\mathbf{x})}{{T}_{1}}\right)}^{2}]\phantom{\rule{0ex}{0ex}}+\sum _{i=m+1}^{m+n}\mathrm{log}[1+\frac{1}{2}{\left(\frac{{r}_{\mathrm{PU},i}(\mathbf{x})}{{T}_{2}}\right)}^{2}\left]\right\},\end{array}$$## (23)

$${\mathbf{x}}^{(k+1)}={\mathbf{x}}^{(k)}+{\mathbf{P}}^{(k)}{\mathbf{d}}_{\mathrm{Grad}}^{(k)},$$## (24)

$${\mathbf{d}}_{\mathrm{Grad}}^{(k)}={\nabla}_{x}{\mathcal{O}}_{3}(\mathbf{x})=\sum _{i=1}^{m+n}{{h}_{i}}^{\prime}({r}_{i}){\nabla}_{x}{r}_{i}(\mathbf{x})={\mathbf{J}}_{\mathbf{r}}^{T}(\mathbf{x})\psi (\mathbf{x})$$## (25)

$$\psi ={[{h}_{\mathrm{Lor},1}^{\prime}({r}_{1}),\dots ,{h}_{\mathrm{Lor},m+n}^{\prime}({r}_{m+n})]}^{T}={[\frac{2{r}_{1}}{2{T}_{1}^{2}+{r}_{1}^{2}},\dots ,\frac{2{r}_{m+n}}{2{T}_{2}^{2}+{r}_{m+n}^{2}}]}^{T}.$$If ${\mathbf{P}}^{(k)}$ is the Hessian matrix

## (26)

$${\mathbf{H}}_{{\mathcal{O}}_{3}}(\mathbf{x})={\nabla}_{x}^{2}{\mathcal{O}}_{3}(\mathbf{x})=\sum _{i=1}^{m+n}\{{h}_{\mathrm{Lor},i}^{\prime \prime}({r}_{i}){\nabla}_{x}{r}_{i}(\mathbf{x}){\nabla}_{x}^{T}{r}_{i}(\mathbf{x})\phantom{\rule{0ex}{0ex}}+{h}_{\mathrm{Lor},i}^{\prime}({r}_{i}){\nabla}_{x}^{2}{r}_{i}(\mathbf{x})\},$$## 4.

## Experimental Results and Discussion

In this section, the three SRR formulations presented in Sec. 3 are evaluated in terms of experimental results. To compare the quality of the estimates, ${\widehat{\mathbf{x}}}_{\mathrm{LS}}$, ${\widehat{x}}_{\mathrm{PU}}$, and ${\widehat{x}}_{\mathrm{Lor}}$, their respective tonemapped results are presented. This provides a subjective comparison. As objective measures, PSNR and MSSIM values are given for the tonemapped results relative to the tonemapped original image $\mathbf{x}$.^{20} The tonemapping function in MATLAB is used to display all HDR images, as it gives more visually appealing images than when tonemapping with $\stackrel{\sim}{f}$.

Two original images are used in the presented experiments, Memorial Church and Mount Tam (Images are courtesy of Greg Ward, available at Ref. 21), displayed in Figs. 1(a) and 2(a), respectively. From each of these images, a set of semisynthetical LR, LDR observations are generated according to the model in Eq. (1). Parameters used for the CRF are ${\gamma}_{\mathrm{LDR}}=1/2.2$ and $[{E}_{\mathrm{min}},{E}_{\mathrm{max}}]=[0.1,100]$. The pixel values of ${\mathbf{y}}_{k}$ are quantized to 8 bits. The corresponding illuminance domain images ${\mathbf{i}}_{k}$, contained in $\mathbf{i}$, are obtained as in Eq. (4). For both experiments, $K=4$ images are used to reconstruct HR images with a resolution enhancement factor of $L=2$. This setup would give a full rank inverse problem if all images were free from saturation, which is however not the case here due to the original images being HDR. The differently exposed ${\mathbf{y}}_{k}$ images are shown in Fig. 1(b) as well as in Fig. 2(b), with saturation visible in all of them. The blur kernels used to generate ${\mathbf{y}}_{k}$, contained in the respective ${\mathbf{H}}_{k}$, all have a Gaussian shape with standard deviation of 1 and a support of $4\times 4$ pixels. The kernels of ${\mathbf{H}}_{2}$, ${\mathbf{H}}_{3}$, and ${\mathbf{H}}_{4}$ are shifted spatially [0.5,0], [0,0.5], and [0.5,0.5] pixels, respectively, on the HR pixel grid relative to that of ${\mathbf{H}}_{1}$, such that the corresponding ${\mathbf{y}}_{k}$ is shifted accordingly, providing linearly independent observations of $\mathbf{x}$. Finally, ${\sigma}_{n}^{2}=0.01$ is the noise variance of ${\mathbf{n}}_{k}$. In experiment 1 on the Memorial Church image (of size $384\times 256$), the exposure durations are $\mathrm{\Delta}t=\phantom{\rule{0ex}{0ex}}\{{2}^{-11},{2}^{-6},{2}^{-6},{2}^{-1}\}$. For experiment 2 on the Mount Tam image (of size $182\times 302$), $\mathrm{\Delta}t=\{{2}^{-5},{2}^{-5},{2}^{-1},{2}^{-1}\}$.

The pixelwise function $\stackrel{\sim}{f}(\xb7)={(\xb7)}^{{\gamma}_{\mathrm{HDR}}}$ is used for ${\mathcal{O}}_{2}$ and ${\mathcal{O}}_{3}$ in order to tonemap the normalized illuminance domain data, thus achieving a residual function ${\mathbf{r}}_{\mathrm{PU}}$ that is PU with respect to the HVS. There is no established value for the exponent ${\gamma}_{\mathrm{HDR}}$ of $\stackrel{\sim}{f}$. Instead, it is selected here as ${\gamma}_{\mathrm{HDR}}=\phantom{\rule{0ex}{0ex}}1/6$ based on empirical tests. Thus, it is lower than that of the LDR case, ${\gamma}_{\mathrm{LDR}}=1/2.2$. It is noted that the authors of the image appearance model iCAM, similarly, suggest that an exponent of $1/6$ is required in order to clearly perceive details in dim areas (that is, to make that data roughly perceptually uniformly coded), when using a power law function such as $\stackrel{\sim}{f}$ for visualization of an HDR image.^{22} In their updated model, iCAM06, the HDR data is raised by an exponent of of $1/3$ at one step and later by an additional exponent of 0.7. Due to use of several color spaces and additional manipulation, however, no clear value for the total gamma is given.^{1}

As a lowest acceptable performance limit for SRR methods, consider the interpolated images in Figs. 1(c) and 2(c). These have been obtained by first creating an LR, HDR image using perfectly registered LR, LDR images, one for each unique exposure duration, and the exact $g$ function for photometric alignment, followed by bicubic interpolation in order to increase the resolution a factor $L=2$. Figure 1(d)–1(f) shows the reconstructed images, ${\widehat{x}}_{\mathrm{LS}}$, ${\widehat{x}}_{\mathrm{PU}}$, and ${\widehat{x}}_{\mathrm{Lor}}$, obtained from minimizing ${\mathcal{O}}_{1}$, ${\mathcal{O}}_{2}$, and ${\mathcal{O}}_{3}$, for the Memorial Church image. The regularization weight parameter used is $\lambda =0.02$, 0.03, and 0.02 respectively. Similarly, the results for the Mount Tam image are shown in Fig. 2(d)–2(f), all acquired with $\lambda =0.05$. The threshold values for the Lorentzian norm in ${\mathcal{O}}_{3}$ are set to ${T}_{1}={T}_{2}=0.1$ for both experiments, selected to give the highest possible MSSIM value, the same criterion used as for selecting $\lambda $ (parameter values were selected empirically). PSNR and MSSIM values for the tonemapped reconstruction results are given in Table 2. Since PSNR is a crude measure for image quality, emphasis should be on the MSSIM values. Edge artifacts are clearly visible in the dim regions of the LS illuminance domain estimate ${\widehat{x}}_{\mathrm{LS}}$. These artifacts are not present for the SRR results in the PU domain $\stackrel{\sim}{f}(\xb7)$. Thus, visual inspection is needed to complement the PSNR or MSSIM values that do not fully convey the results of the reconstruction. However, the proposed objective function ${\mathcal{O}}_{2}$ and its alternated version ${\mathcal{O}}_{3}$ do result in higher objective quality measures as well. There is a slight improvement by using the Lorentzian norm of ${\mathbf{r}}_{\mathrm{PU}}$ as in ${\mathcal{O}}_{3}$ rather than the L2-norm in ${\mathcal{O}}_{2}$, but it is minor. Simulations using imperfectly estimated ${\mathbf{H}}_{k}$ in the SRR do not seem to further enhance the performance difference of using the robust Lorentzian norm as compared to using the L2-norm; however, more rigorous investigations for this case would be interesting.

## Table 2

Experimental results. PSNR and MSSIM values are given for tonemapped SRR estimates t(x^) relative to the tonemapped original image t(x), where t(·) denotes the MATLAB tonemapping function. The highest PSNR and MSSIM values for each of the original images are highlighted in bold.

Image | Method | Parameters | PSNR(t(x^),t(x)) | MSSIM(t(x^),t(x)) |
---|---|---|---|---|

Memorial Church | O1 | λ=0.02 | 35.659 | 0.8813 |

O2 | λ=0.03 | 36.435 | 0.9259 | |

O3 | λ=0.02, T1=0.1, T2=0.1 | 36.707 | 0.9275 | |

Mount Tam | O1 | λ=0.05 | 35.901 | 0.8771 |

O2 | λ=0.05 | 37.307 | 0.9054 | |

O3 | λ=0.05, T1=0.1, T2=0.1 | 37.290 | 0.9055 |

For the Mount Tam image, additional reconstruction results for the objective function ${\mathcal{O}}_{1}$ are shown in Fig. 2(g)–2(i), with $\lambda =10$, 0.1 and ${10}^{-6}$ used respectively. It is seen that the increased value of $\lambda $ from 0.05 to 0.1 gives a less noisy result compared to the LS reconstruction in Fig. 2(d); however, the edge artifacts are magnified. A too-high value for $\lambda $ leads to oversmoothing, while a too-low value for $\lambda $ results in severe noise amplification due to the poor condition number of the inverse problem. For the reconstruction result in Fig. 2(i) with $\lambda ={10}^{-6}$, $K=8$ images are used, four for each of the two different exposure durations, which means that the problem is full rank. Thus, it is clear that due to the ill-conditioned structure, regularization is necessary, which in turn gives numerical reconstruction errors particularly at image edges. These errors have a large perceptual impact for low-illuminance levels, as results show when performing SRR in the illuminance domain as for ${\mathcal{O}}_{1}$. For ${\mathcal{O}}_{2}$ and ${\mathcal{O}}_{3}$, as an example, setting $\lambda $ for SRR of the Mount Tam image to 0.01 or 0.1 instead of the current choice of 0.05 does not cause much noticeable difference in the results. The SRR method presented in this paper could straightforwardly be applied to color images, by treating each RGB color channel individually. This gives satisfactory results, even though it is not optimal in terms of color fidelity. The RGB channels should then be coded jointly, such as for the LDR L*a*b color space.

## 5.

## Conclusions

This paper treats SRR of differently exposed LDR images. A formulation of the HDR SRR problem in which perceived difference is minimized in an appropriate, PU image domain has been proposed. As shown in experimental results, even small numerical reconstruction errors will have a large perceptual impact in dim image areas if the illuminance domain is used in the objective function, due to the nonlinearity of the illuminance domain to perceived brightness of the HVS. This issue is overcome by using the proposed approach. Because the proposed residual function is non-Gaussian, an alternative to the L2-norm was suggested, which, however, only gave a minor improvement in the presented experiments.

## References

## Biography

**Tomas Bengtsson** is a PhD student in the signal processing research group at the Chalmers University of Technology, in Gothenburg, Sweden. He obtained a BS degree in electrical engineering and an MS degree in communication engineering at Chalmers University of Technology between 2004 and 2009. His research lies in signal processing and sensing, focusing on image processing, in particular high dynamic range imaging and super-resolution techniques. He is involved in a project on visual quality measures, with the aim to explore algorithms for the enhanced display of images and videos with applications toward the automotive industry, and in particular car safety.

**Tomas McKelvey** obtained an MS degree in electrical engineering at Lund University between 1987 and 1991 and received his PhD in automatic control at Linköping University in 1995. Between 1995 and 1999, he held research and teaching positions at Linköping University and became docent in 1999. Between 1999 and 2000 he was a visiting researcher at University of Newcastle, Australia. Since 2000, he has been with Chalmers University of Technology, becoming a full professor in 2006 and the head of the signal processing group at Chalmers in 2011. Professor McKelvey’s research interests are model based in statistical signal processing, system identification, and image processing with applications to biomedical engineering and combustion engines.

**Irene Yu-Hua Gu** received her PhD degree in electrical engineering from Eindhoven University of Technology, Eindhoven, The Netherlands, in 1992. From 1992 to 1996, she was a research fellow at Philips Research Institute IPO, Eindhoven, The Netherlands, and did post-doctorate work at Staffordshire University, Staffordshire, United Kingdom. She was also a lecturer at the University of Birmingham, Birmingham, United Kingdom. Since 1996, she has been with the Department of Signals and Systems, Chalmers University of Technology, Göteborg, Sweden, where she has been a full professor since 2004. Her research interests include statistical image and video processing, object tracking and video surveillance, pattern classification, and signal processing with applications to electric power systems. She has been an associate editor with the EURASIP *Journal on Advances in Signal Processing* since 2005, and on the editorial board for the *Journal on Ambient Intelligence and Smart Environments (JAISE)* since 2011.