24 May 2013 Super-resolution reconstruction of high dynamic range images in a perceptually uniform domain
Author Affiliations +
Optical Engineering, 52(10), 102003 (2013). doi:10.1117/1.OE.52.10.102003
Abstract
Super resolution is a signal processing method that utilizes information from multiple degraded images of the same scene in order to reconstruct an image with enhanced spatial resolution. The method is typically employed on similarly exposed pixel valued images, but it can be extended to differently exposed images with a high combined dynamic range. We propose a novel formulation of the joint super-resolution, high dynamic range image reconstruction problem, using an image domain in which the residual function of the inverse problem relates to the perception of the human visual system. Simulated results are presented, including a comparison with a conventional method, demonstrating that the proposed approach avoids some severe reconstruction artifacts.
Bengtsson, McKelvey, and Gu: Super-resolution reconstruction of high dynamic range images in a perceptually uniform domain

1.

Introduction

To obtain high visual quality in digital camera systems, a high spatial resolution is required. The most direct way to increase spatial resolution in a camera is to reduce the size of the sensor pixel elements or to increase the chip size. However, there are applications for which this may not be feasible. For example, to overcome sensor noise at low exposure levels, smaller pixels have to be exposed for a longer duration. Consequently, in terms of hardware, a high spatial resolution is in contradiction to having a high temporal resolution, which is a requirement for high-quality video systems. Additionally, a longer exposure duration means that frames become susceptible to motion blur effects. Super-resolution reconstruction (SRR), as an alternative, is a signal processing approach to enhance spatial resolution. Multiple low-resolution (LR) images, captured in a sequence, of the same original scene are used to construct one high-resolution (HR) image. A further introduction to super resolution is given in Sec. 1.3.

Camera sensors that use a single fixed-exposure duration for each captured image have dynamic range limitations. This is particularly the case for compact cameras that have a smaller pixel size. The limitation is experienced e.g., when capturing an image of an outdoor environment on a sunny day or an indoor scene with a window. Certain image areas will then be overexposed or underexposed, which has a large impact in terms of visual quality. The human visual system (HVS) is able to perceive detailed information in bright and dim areas simultaneously, and thus it is undesired of the camera system to impose this limitation. By taking two or more images of the same scene, using different exposure settings, a single high dynamic range (HDR) image can be created, which is free from saturation. Digital display devices, for their part, also have a low dynamic range (LDR) compared to real-world scenes. The dynamic range limitation for display devices, however, can practically be overcome by tonemapping HDR image information to a LDR image in such a way that it, to the HVS, resembles the HDR image that it was created from.1 Before proceeding to a review of relevant literature on HDR as well as SRR, the HVS and how light is interpreted is briefly introduced.

1.1.

The Human Visual System

Understanding the main principles of the HVS is of crucial importance to researchers in image processing. When dealing with standard 8-bit pixel image formats, much is abstracted into how those images are coded. For example, in the standard red, green, and blue (RGB) color spaces, data is coded linearly with respect to the perception of brightness in the HVS. This implies that image operations or filters naturally work as expected. Also, storage, compression or filtering of image data in this domain makes sense, because of the property that pixel value errors have the same perceptual impact across the whole range of pixel values.

Light that is incident on the camera sensor is nonlinear with regard to the HVS in several respects. As will be presented, various camera components are designed to mimic the HVS in order to produce an output image, which is perceived to be of high fidelity. Consider a light spectrum observed by the human eye. The incident energy is registered by the photoreceptors of the eye, the three different types of cone cells (corresponding to, roughly, red, green and blue light), that are responsible for human color vision. The three cone types each have their respective spectral responses. Their combined spectral response can be described in terms of grayscale vision by a single function of wavelength, the luminous efficacy curve. For simplicity as well as to stay within the main scope of the paper, the discussion of the HVS is from here on reduced to grayscale vision. The luminous efficacy curve is a dimensionless function that tells what fraction of radiant power at any given wavelength contributes to the luminous properties of the HVS. It is nonzero for wavelengths between, roughly, 380 to 700 nm, the so-called visible spectrum.2

The term for the incident power spectrum of light is irradiance. The mathematical inner product of irradiance with the luminous efficacy curve is denoted illuminance. As opposed to irradiance, a radiometric unit, illuminance is a term defined based on the HVS, and is thus a photometric unit. While illuminance is the light registered by the human photoreceptors, the cones, it is still nonlinear with respect to perceived brightness. Illuminance changes of an absolute measure have a much higher perceived impact at low illuminance levels as compared to at high illuminance levels. Perceived brightness is subjective. Based on tests of human vision on large sample sizes, standardized units such as lightness and luma are defined and used in color spaces for LDR image processing. Table 1 summarizes the terms presented in this section and how they relate to the HVS and digital cameras. An incident light spectrum, irradiance, on a camera sensor is filtered according to the same principle as that of the cones of the eye. Thus, light contributes differently, depending on its wavelength, to the amount of current that is excited from sensor elements. The term illuminance is used here also when referring to the sensor output due to the filtered light, even though the term is really defined for the HVS. An image domain that is an approximation of perceived brightness, i.e., which is approximately linear to perceived brightness, will henceforth be called a perceptually uniform (PU) domain. Examples of (grayscale) PU domains are lightness and luma; however, they are only designed for LDR images. Any PU domain will correspond to a concave nonlinear mapping from the illuminance domain.

Table 1

Description of light units in the context of the HVS and digital cameras.

UnitDescriptionHVSCamera
Irradiance (Radiometric)Incident power spectrum I(λ)Sensitive to light of wavelength λ∈ [380,700] nmFilter I(λ) according to approximated luminous efficacy
Illuminance (Photometric)∫−∞∞I(λ)V(λ)dλ, where V(λ) is the luminous efficacy curve of the HVSThe power registered by the photoreceptorsFiltered light exposes the sensor to approximate illuminance of the HVS
Perceived brightness, e.g., Lightness, Luma (Photometric)A nonlinear function of illuminanceA subjective measureStandardized units such as lightness and luma are approximations of perceived brightness

1.2.

High Dynamic Range Images

The HVS can adapt to illuminance levels of about 10 orders of magnitude, and at any given adaptation level about five orders of magnitudes are perceived. As mentioned, for a given exposure setting, camera sensors have a lower dynamic range than that of many natural scenes (and of the HVS), which will cause saturation of sensor elements, either from over- or underexposure. Thus, the sensors are said to have a LDR. In HDR image reconstruction, a single HDR image is constructed from two or more LDR images captured with different exposure settings. The LDR images must be aligned and should be captured in a quick sequence, such that local motion in the scene is kept to a minimum. The images are merged by weighted average in a PU image domain. Weights of saturated pixels are naturally set to zero. Given a set of LDR images in the pixel value domain, their corresponding sensor exposures are retrieved by an inverse mapping with the camera response function (CRF). Illuminance domain images are subsequently obtained by scaling the images with the inverse of their respective exposure durations. Finally, the images are transformed to e.g., the log-illuminance domain, which is an approximation of lightness (because perceived brightness of the HVS is much closer to logarithmic than linear in illuminance), where the weighted average is taken.3

To reproduce an image of an HDR scene on a display device, e.g., a computer monitor, it has to be tonemapped to the lower dynamic range of the display device.1,4 The tonemapping essentially compresses the dynamic range uniformly in a suitable domain, but more sophisticated tonemappers also do local processing. For example, image contrasts may be compressed more on edges between image segments than on details within image segments. Additionally, suitable color spaces are used in order to maintain high color fidelity. The output LDR image, typically stored in a standard 8-bit format such as sRGB, is perceived by the HVS to still contain HDR information. HDR applications are recently starting to reach the general public. For example, smartphone applications that take two images in a sequence and generate a merged HDR image have reached widespread use, and similar features are common in modern commercial cameras. The result is often satisfactory, although artifacts sometimes occur due to spatial misalignment.

1.3.

Super-Resolution Reconstruction

The signal processing technique SRR makes use of multiple degraded LR images of the same original scene.5,6 In order to be useful, each image must contain new information of the scene, which is the case if the images are shifted relative to each other on a subpixel scale (e.g., from natural camera movement) or if they are blurred by different blur functions. In the corresponding mathematical model, this means that the images each contribute with linearly independent observations of the desired HR image representation. To reconstruct an HR image, an inverse problem is solved, including image registration (alignment) with high precision as well as blur estimation, either as pre-processing or jointly with estimating the HR image. A property of SRR is that the resulting inverse problem is fundamentally ill-conditioned, which presents big challenges.7,8 If a larger number of observations are used, the conditioning improves, but only slightly. Furthermore, because complex motion between frames is undesired, a relatively small number of images is preferred so that they are close temporally. Thus, to improve the conditioning, a regularization term is often added to the problem formulation. The regularization function should enforce a solution that matches the statistics of the image, or rather of natural images in general, since the statistics of the unknown image are unavailable.

Bayesian interpretations, in which the unknown image is viewed as stochastic, of the SRR problem are common.9,10 Similarly to deterministic methods, they lead to minimizing a (regularized) objective function. A prior is assumed on the HR image and often also on other system parameters that should be estimated. For example, fine-scale image registration can be cast as a stochastic problem, with a prior selected for the registration error that remains after a rough pre-processing registration. Blind super resolution (BSR) addresses the joint estimation of HR image and blur kernels (that include subpixel level shifts).11 BSR has clear similarities with multichannel blind deconvolution,12 with the added extension of downsampling. BSR is very challenging in terms of finding the global optimal point, due to its nonconvex structure. Often, simple parameterizations of the blur kernel are made, or it is assumed static to model only the blur effect of the sensor. Applications of SRR exist both for offline processing, such as astronomy and forensics, and for real-time processing, as for video streaming. Work on real-time implementations are increasingly common, including SRR implementations on custom hardware such as Field Programmable Gate Arrays. Custom sensors may also be used, that can e.g., increase pixel size for desired improvement of temporal resolution in situations with moving objects that are susceptible to motion blur.13 The spatial resolution is then maintained by performing SRR on the frames with reduced resolution. Optical flow models are implemented to handle local object motion within the image frames.

1.4.

Super-resolution Reconstruction of HDR Images

Traditionally, SRR is performed on similarly exposed LDR pixel valued images. However, during the past few years SRR has also been applied to HDR images.14,15 The combined dynamic range of the reconstructed HR, HDR image has no restrictions, as the exposure durations of the input images can be selected freely. A good reconstruction result requires accurate image registration, both for spatial shifts and to photometrically align the images in an HDR image domain. Facing both geometric and photometric registration adds some extra challenges, as compared to only geometric registration.16 Ambitious methods implement optical flow techniques to handle local spatial motion within the images.17 If the input images to the HDR SRR are given as 8-bit pixel valued images, the CRF is needed in order to retrieve the illuminance information from the earlier stage of the camera processing chain. However, if the device was designed to perform HDR SRR, the illuminance information (of a higher bit depth) could be more readily available.

A common factor for the published work on SRR for HDR images is that the reconstruction takes place in the illuminance domain, which, as discussed in Sec. 1.1, is highly nonlinear to human perception.18 This is opposed to the case of LDR SRR methods, where the reconstruction is performed in the PU pixel value domain. Because illuminance errors at low illuminance levels have such a high perceived impact, reconstruction errors in dim image areas cause noticeable artifacts. Therefore, in this work, the HDR SRR is performed in a tonemapped, PU domain. The HDR illuminance information is thus mapped by a nonlinear tonemapping function, modifying the inverse SRR problem from a least squares (LS) problem to a nonlinear problem of equal dimension. Note that the choice of tonemapping function is nontrivial. They all aim to mimic the HVS but should merely be seen as approximations.1,4 In the formulation of the proposed method, a simple tonemapping function is used, which nevertheless captures the main characteristics of the HVS well (but should not be the preferred choice for the application of visualizing HDR data). Methods for image registration or the estimation of other system parameters are outside the scope of this paper, the focus is on the formulation of the inverse problem used in the SRR. The remainder of this paper is organized as follows. Section 2 describes the camera model for the image data acquisition. In Sec. 3, a method for SRR on HDR images in a PU domain is proposed. Section 4 provides experimental results for the proposed method and a related illuminance domain approach. Finally, Sec. 5 concludes the paper.

2.

Camera Model

When acquiring an image, the camera sensor is illuminated by a real-world scene for a given exposure duration. Let the image X, of size X1×X2, denote an HR representation with dynamic range Xmax/Xmin of an original continuous scene, and let x be its corresponding vectorized representation of length (X1X2)×1n×1. It is assumed that the camera provides a set of low-resolution and LDR images yk according to the expression

(1)

yk=f[Δtk(DCHkx+nk)]+qk,k=1,,K.

For each of the multiple observations, CHk, of size n×n, performs 2-D convolution on the vectorized HR image x. Its convolution kernel Hk, of support H1×H2, represents blurring as well as planar small-scale spatial shifts (assuming that rough image registration can be performed as pre-processing) for yk relative to a reference image (e.g., y1). The downsampling matrix D, of size (n/L2)×n, decimates the spatial resolution a factor L in the x- and y-directions, and Δtk is the exposure duration. Additive Gaussian noise nk of variance σn2 models the noise in the image sensor. In addition, by somewhat exaggerating the value of σn2, other error sources can be contained in nk. The yk images are quantized to 8-bit pixel values, with uniformly spaced quantization levels Y{0,,1}. This quantization effect is modeled by qk, while quantization in the A/D converter to a higher bit depth is considered to be part of nk.

The low-resolution exposure of the camera sensor is ek=Δtk(DCHkx+nk). The ek vector is, similar to yk, nk, and qk, of size (n/L2)×1. At each pixel i, the exposure [ek]i is mapped by the pixelwise, nonlinear CRF, f, to a pixel value in the output image. A reasonable (simplified) model of a real CRF, adopted for the simulations in this paper, is

(2)

f(E)={0,EEmin(EEminEmaxEmin)γLDR,EminEEmax1,EEmax,
where γLDR<1 makes the curve concave in the unsaturated region. What causes the yk images to be of LDR is the fixed operational range of exposure values of the CRF, [Emin,Emax]. Exposure values outside of this interval are clipped and can not be recovered (from that single image). The CRF is said to perform gamma compression on the exposure image, to an LDR, PU output domain.

2.1.

Alternative Camera Models

It is interesting to note that the camera model used for LDR SRR on pixel domain images differs from the camera model for HDR SRR methods, without exception. The forward model assumed for LDR SRR is

(3)

yk=DCHkf(Δtx)+nk,k=1,,K,
for which the order of f and DCHk is interchanged as compared to Eq. (1).5 From the set of observations, {yk}, the desired pixel valued HR image z=f(Δtx) is to be reconstructed. The model in Eq. (3) is thus used in the formulation of the inverse problem, where it relates the unknown HR image to the actual observed LR data. The inconsistency in the choice of camera model gives rise to the question of what model, Eq. (1) or Eq. (3), is closer to the real physics of the camera. One of the two, LDR SRR or HDR SRR, has a larger model mismatch that comes into play when real-world data is considered. The choice of camera model is not investigated further in this paper.

3.

Image Reconstruction in a Perceptually Uniform Domain

To reconstruct, or rather to find a good estimate of x, an inverse problem should be formulated and solved. To do so, knowledge of the system parameters in Hk, including blur functions and subpixel-precision relative motion between image frames, is necessary. Assuming these, as well as the CRF f, to be known or to have been estimated, a compact notation is introduced in the beginning of this section to aid the inverse problem formulations. Then, in the remainder of the section, three different HDR SRR formulations are described, one which has an objective function that is expressed in the illuminance domain, and the other two in which a PU domain is employed. For mathematical convenience, the impact of the quantization noise qk will be neglected in the inverse problem formulation (quantization is nevertheless used when generating yk), as it typically has little impact on reconstruction results compared to other error sources, e.g., nk (and in the more general case, to errors in system parameter estimates).

To align the observed images yk photometrically, the corresponding illuminance domain images [of the same size (n/L2)×1]

(4)

ik=g(yk)/Δtk
are recovered by mapping each pixel value in yk with the inverse (not in the strict sense, due to the clipping effect) CRF,

(5)

g(Y)=Y(1/γLDR)(EmaxEmin)+Emin,
followed by a scaling with 1/Δtk. To handle saturation in f, each observation yk has a corresponding diagonal weight matrix Wk, of size (n/L2)×(n/L2), which is zero for diagonal elements corresponding to saturated pixels in yk and one for the remaining diagonal elements. Saturated pixels are those where [ek]i[Emin,Emax]. However, since only the yk are at hand, saturated pixels are taken to be those where [yk]i is either 0 or 1, the lowest and the highest quantization levels respectively. Based on the camera model in Eq. (1),

(6)

Wkik=Wk[g(yk)/Δtk]==Wk(DCHkx+nk),k=1,,K,
where the second equality holds if the quantization effect is neglected, which means that g(f()) is the identity function for the unsaturated region of f. By stacking the individual LR, LDR illuminance domain observations in i=[i1T,,iKT]T, the noise vector n=[n1T,,nKT]T, both of size (nK/L2)×1, defining a block diagonal weight matrix W=diag(W1,,WK), of size (nK/L2)×(nK/L2), and [(DCH1)T,,(DCHK)T]T, of size (nK/L2)×nm×n, it follows that

(7)

Wi=W(x+n).

More generally, W could be designed to weigh observed data differently, as a function of e.g., its pixel value and exposure duration, according to the noise properties of Eq. (1).14 In the image reconstruction, the objective is to minimize the difference, in some sense, between Wx and the faithfully (unsaturated) observed information Wi. Thus, a distance measure of the residual function r(x) should be minimized in order to obtain

(8)

x^=argminxρ[r(x)]=ih[ri(x)],hi[ri(x)]0,i,
where ρ is a nonnegative, norm-like function. The objective functions to be minimized in this paper all have the structure above. The rank of W depends on the number of observations K, and specifically the unsaturated regions of the images yk, as well as the selection of L. For example, if there is no saturation (which however implies that x cannot be an HDR image) and L=2, K=L2=4 images or more implies that W is full rank. However, due to the structure of W, or more precisely due to the blur functions Hk contained within, the inverse problem is inherently ill-conditioned. Thus, even for a full-rank problem, the solution is very sensitive to noise. By adding a regularization term of a certain weight, the condition number of the overall problem improves, at the cost of removing fidelity from the data term. The remainder of this section presents three specific formulations of the general inverse problem in Eq. (8). These are then evaluated in Sec. 4.

3.1.

The Least Squares Solution

Under the assumption of additive Gaussian noise n, the LS estimate of x is optimal in the maximum likelihood sense. However, the corresponding inverse problem is equivalent to minimizing the sum of squared reconstruction errors in the illuminance domain, a domain that is nonlinearly related to perceived error by the HVS, yet it is still used for SRR of HDR images.14,15,17 The LS estimate is included mainly as a benchmark for the proposed method. In order to improve the condition number of the inverse problem and provide a stable solution, a regularization term is appended. Thus, the LS estimate is

(9)

x^LS=argminx[WλΓ]x[Wi0]22argminxrLS(x)22,
for some linear regularization matrix Γ of size n×n with regularization weight parameter λ. Because images are typically piecewise smooth, Γ is selected as a matrix that performs 2D convolution on the vectorized image x with a 3×3 Laplacian kernel, that enforces a smooth solution by penalizing the 2nd order x- and y-derivative. The residual vector here is denoted rLS(x). In the above case, the objective function is

(10)

O1=rLS(x)22=i=1m+nrLS,i2(x),
where rLS,i(x) denotes the i’th component function in rLS(x). Because the system of equations is very sparse, an analytic solution is computationally feasible for a fairly large size of the problem.

3.2.

The Proposed Objective Function

For the proposed objective function, the residual function r(x) is formulated in such a way that errors perceived by the HVS are penalized uniformly in the reconstruction. Thus, for the estimate of x,

(11)

x^PU=argminxW[f(x)f(i)]λΓf(x)22argminxrPU(x)22,
reconstruction errors are measured in a PU domain. Similar to the CRF, f, discussed in Sec. 2, f here is taken to be a power law function but with a different exponent, γHDR. Because the method that is used to generate an HDR ground truth image (from a set of LDR images) only provides relative illuminance information, and not the true absolute values,3 the illuminance information, i, is normalized before applying f. The same notation is kept. The normalized illuminance information is then tonemapped to a PU domain by f(·)=(·)γHDR. In this work, no special care is taken to customize the value of γHDR depending on the dynamic range of i. It should be noted that, to instead minimize f[W(xi)], i.e., the tonemapped difference, would be incorrect, since the absolute illuminance level determines how sensitive the perception of the HVS is. The objective function of Eq. (11) is

(12)

O2=rPU(x)22=i=1m+nrPU,i2(x),
where rPU,i(x) denotes the i’th component function in rPU(x).

To solve the nonlinear minimization problem in Eq. (11), an iterative nonlinear programming method must be employed. An initial estimate, x(0), may be taken as an interpolated version of an ik image. Furthermore, let x(k) be an estimate of x at iteration k, then the estimate is updated according to

(13)

x(k+1)=x(k)+α(k)d(k),
where d(k) is the search direction and α(k) is the step length at iteration k. The search direction is of great importance with regard to the convergence rate. The 1st order gradient descent method is typically very slow to converge for the SRR problem. Because of that, higher order information is considered in order to determine d(k). A 2nd order Taylor approximation of O=ρ[r(x)] in the neighborhood of x(k), as used in Newton methods, is

(14)

O(x(k)+d(k))O(x(k))+GT(x(k))d(k)+d(k)TH(x(k))d(k),
where G is the gradient vector and H is the Hessian matrix of O. For the specific case of a (possibly) nonlinear LS objective function, as for O2 (and O1), the gradient and Hessian can be written in terms of r as

(15)

G(x(k))=[xO(x)]|x(k)=2i=1m+nri(x)xri(x)=2JrT(x(k))r(x(k)),
and

(16)

H(x(k))=[x2O(x)]|x(k)=2i=1m+n[xri(x)xTri(x)+ri(x)x2ri(x)]|x(k)=2JrT(x(k))Jr(x(k))+S(x(k)).

Note that O(x), r(x), G(x), and H(x) all depend on x, and thus need to be re-evaluated for each update in the iterative optimization method. However, for compactness, this dependency will not always be emphasized by writing out the argument. The commonly used Gauss-Newton method fits nicely for the optimization problem in Eq. (11). To see this, consider the nonlinear residual function

(17)

rPU(x)=[W11[(j=1n1jxj)γi1γ]Wmm[(j=1nmjxj)γimγ]λj=1nΓ1j(xjγ)λj=1nΓnj(xjγ)],
which consists of a data term rPU,1m(x) and a regularization term rPU,m+1m+n(x). The data term elements have small residual values for x close to the true image. The regularization term is designed such that the true image should be penalized as little as possible, but in practice its contribution to the residual is certainly not zero. Because the residual elements are small near x, the Hessian H in Eq. (16) is well approximated by HGN=2JrTJr, dropping the term S, which contains the second order derivatives of ri. In order to obtain dGN, at each iteration

(18)

Jr=[r1x1r1xnrpx1rmxnrm+1x1rm+nxnrm+1x1rm+nxn]=[W11γ11(j=1n1jxj)γ1W11γ1n(j=1n1jxj)γ1Wmmγm1(j=1n1jxj)γ1Wmmγmn(j=1n1jxj)γ1λΓ11γx1γ1λΓ1nγxnγ1λΓn1γx1γ1λΓnnγxnγ1]
is computed. From here on, Jr refers to the Jacobian matrix of the residual function r=rPU. By minimizing the Taylor expansion in Eq. (14) with regard to d(k), and replacing H(k) with HGN(k),

(19)

dGN(k)=[JrT(x(k))Jr(x(k))]1[JrT(x(k))r(x(k))],
is obtained as the search direction to be used in the iterative minimization in Eq. (13). By taking the inner product of the gradient and the search direction dGN, it is seen that the GN search direction is indeed a descent direction. This is a nice property of the GN method, and it is a consequence of the positive definiteness of HGN. Standard GN does not include any step size α(k) in its updates. However, while HGN=2JrTJr is positive definite (by construction), Jr is only valid in a region around x(k) for which it was evaluated, thus the update step may be taken too large. By including a line search method to determine the step size at each iteration, the robustness of the method is increased. The extended method is famously known as damped Gauss-Newton, and it is used here to minimize O2.

3.3.

Reconstruction Using Robust Norm and Robust Regularization

So far, the methods of reconstructing x have involved minimizing an objective function that is the L2-norm of a residual vector. The LS solution is suitable when the residual, at the optimum, has a Gaussian distribution. This was the case for the (unsaturated) observations i in the LS formulation of rLS in Eq. (9). The residual function rPU(x) in Eq. (11), however, does not follow a Gaussian distribution. Thus, using another norm function could give a more suitable estimate. A second reason to consider other functions than the L2-norm is that any errors in the system parameters of (which would be the case for a practical setup where system parameters themselves need to be estimated), as used in the inverse reconstruction model, give a model mismatch with the observation model.

As mentioned, a regularization term is usually added to avoid noise amplification and thus stabilizes the solution. Depending on the amount of noise, the weight parameter λ for the regularization term should be adjusted accordingly. The larger λ is, the more important it becomes that the solution preferred by the regularization term matches the true statistical properties of x. For this purpose, large values of rm+1m+n(x) should be interpreted as edges, which are a natural part in images, and should not be penalized. The Lorentzian “norm” (not really a norm since it violates the triangle inequality),

(20)

ρLor(r)=ihLor,i(r1)=ilog[1+12(riTi)2],
where Ti>0 are constant parameters, has been introduced and showed promising results for SRR in which several different noise models were tested.19 The Ti are interpreted as threshold values. For ri smaller than Ti, ρLor(·) acts as the L2-norm and penalizes residual components quadratically, while for ri greater than Ti, ρLor(·) acts as the L1-norm, making the reconstruction less sensitive to large residuals that correspond to outliers in the data term or edges in the regularization term. In this regard, the simple linear Tikhonov regularization with the Lorentzian norm achieves the same edge-preserving property, which is the motivation for the use of the bilateral total variation regularization.6,19

Using the same residual function r=rPU as for O2, combined with the Lorentzian norm, a third estimate of x is

(21)

x^Lor=argminxρLor[rPU(x)].

The corresponding objective function is

(22)

O3=ρLor[rPU(x)]={i=1mlog[1+12(rPU,i(x)T1)2]+i=m+1m+nlog[1+12(rPU,i(x)T2)2]},
where T1 and T2 are the threshold values of the Lorentzian norm for the data term and the regularization term respectively. The nonlinear objective is minimized iteratively. The GN method used for Eq. (11) does not fit this problem. Instead, the update equation for the iterative minimization of Eq. (22) is

(23)

x(k+1)=x(k)+P(k)dGrad(k),
where P(k) is a linear transformation performed on the gradient vector

(24)

dGrad(k)=xO3(x)=i=1m+nhi(ri)xri(x)=JrT(x)ψ(x)

(25)

ψ=[hLor,1(r1),,hLor,m+n(rm+n)]T=[2r12T12+r12,,2rm+n2T22+rm+n2]T.

If P(k) is the Hessian matrix

(26)

HO3(x)=x2O3(x)=i=1m+n{hLor,i(ri)xri(x)xTri(x)+hLor,i(ri)x2ri(x)},
the iterative Eq. (23) update is the Newton method. However, P(k) may be any linear transformation, as long as it gives a higher convergence rate as compared to that of the gradient direction it is beneficial. It is preferred to be positive definite, as this guarantees to give a descent direction in the neighborhood of x(k). The choice P(k)=(JrTJr)1 in Eq. (23) is empirically proven to give a high convergence rate for Eq. (21), and is thus used for the experiments.

4.

Experimental Results and Discussion

In this section, the three SRR formulations presented in Sec. 3 are evaluated in terms of experimental results. To compare the quality of the estimates, x^LS, x^PU, and x^Lor, their respective tonemapped results are presented. This provides a subjective comparison. As objective measures, PSNR and MSSIM values are given for the tonemapped results relative to the tonemapped original image x.20 The tonemapping function in MATLAB is used to display all HDR images, as it gives more visually appealing images than when tonemapping with f.

Two original images are used in the presented experiments, Memorial Church and Mount Tam (Images are courtesy of Greg Ward, available at Ref. 21), displayed in Figs. 1(a) and 2(a), respectively. From each of these images, a set of semisynthetical LR, LDR observations are generated according to the model in Eq. (1). Parameters used for the CRF are γLDR=1/2.2 and [Emin,Emax]=[0.1,100]. The pixel values of yk are quantized to 8 bits. The corresponding illuminance domain images ik, contained in i, are obtained as in Eq. (4). For both experiments, K=4 images are used to reconstruct HR images with a resolution enhancement factor of L=2. This setup would give a full rank inverse problem if all images were free from saturation, which is however not the case here due to the original images being HDR. The differently exposed yk images are shown in Fig. 1(b) as well as in Fig. 2(b), with saturation visible in all of them. The blur kernels used to generate yk, contained in the respective Hk, all have a Gaussian shape with standard deviation of 1 and a support of 4×4 pixels. The kernels of H2, H3, and H4 are shifted spatially [0.5,0], [0,0.5], and [0.5,0.5] pixels, respectively, on the HR pixel grid relative to that of H1, such that the corresponding yk is shifted accordingly, providing linearly independent observations of x. Finally, σn2=0.01 is the noise variance of nk. In experiment 1 on the Memorial Church image (of size 384×256), the exposure durations are Δt={211,26,26,21}. For experiment 2 on the Mount Tam image (of size 182×302), Δt={25,25,21,21}.

Fig. 1

Top row, from left to right: (a) Original image Memorial Church. (b) The observed images yk (c) Bicubic interpolation of an LR, HDR version of Memorial Church. Bottom row: Reconstruction results using (d) O1, (e) O2, and (f) O3.

OE_52_10_102003_f001.png

Fig. 2

Top row, from left to right: (a) Original image Mount Tam. (b) The observed images yk. (c) Bicubic interpolation of an LR, HDR version of Mount Tam. Middle row: Reconstruction results using (d) O1, (e) O2, and (f) O3. Bottom row: More reconstruction results using O1, with λ set to (g) 10, (h) 0.1, and (i) 106.

OE_52_10_102003_f002.png

The pixelwise function f(·)=(·)γHDR is used for O2 and O3 in order to tonemap the normalized illuminance domain data, thus achieving a residual function rPU that is PU with respect to the HVS. There is no established value for the exponent γHDR of f. Instead, it is selected here as γHDR=1/6 based on empirical tests. Thus, it is lower than that of the LDR case, γLDR=1/2.2. It is noted that the authors of the image appearance model iCAM, similarly, suggest that an exponent of 1/6 is required in order to clearly perceive details in dim areas (that is, to make that data roughly perceptually uniformly coded), when using a power law function such as f for visualization of an HDR image.22 In their updated model, iCAM06, the HDR data is raised by an exponent of of 1/3 at one step and later by an additional exponent of 0.7. Due to use of several color spaces and additional manipulation, however, no clear value for the total gamma is given.1

As a lowest acceptable performance limit for SRR methods, consider the interpolated images in Figs. 1(c) and 2(c). These have been obtained by first creating an LR, HDR image using perfectly registered LR, LDR images, one for each unique exposure duration, and the exact g function for photometric alignment, followed by bicubic interpolation in order to increase the resolution a factor L=2. Figure 1(d)1(f) shows the reconstructed images, x^LS, x^PU, and x^Lor, obtained from minimizing O1, O2, and O3, for the Memorial Church image. The regularization weight parameter used is λ=0.02, 0.03, and 0.02 respectively. Similarly, the results for the Mount Tam image are shown in Fig. 2(d)2(f), all acquired with λ=0.05. The threshold values for the Lorentzian norm in O3 are set to T1=T2=0.1 for both experiments, selected to give the highest possible MSSIM value, the same criterion used as for selecting λ (parameter values were selected empirically). PSNR and MSSIM values for the tonemapped reconstruction results are given in Table 2. Since PSNR is a crude measure for image quality, emphasis should be on the MSSIM values. Edge artifacts are clearly visible in the dim regions of the LS illuminance domain estimate x^LS. These artifacts are not present for the SRR results in the PU domain f(·). Thus, visual inspection is needed to complement the PSNR or MSSIM values that do not fully convey the results of the reconstruction. However, the proposed objective function O2 and its alternated version O3 do result in higher objective quality measures as well. There is a slight improvement by using the Lorentzian norm of rPU as in O3 rather than the L2-norm in O2, but it is minor. Simulations using imperfectly estimated Hk in the SRR do not seem to further enhance the performance difference of using the robust Lorentzian norm as compared to using the L2-norm; however, more rigorous investigations for this case would be interesting.

Table 2

Experimental results. PSNR and MSSIM values are given for tonemapped SRR estimates t(x^) relative to the tonemapped original image t(x), where t(·) denotes the MATLAB tonemapping function. The highest PSNR and MSSIM values for each of the original images are highlighted in bold.

ImageMethodParametersPSNR(t(x^),t(x))MSSIM(t(x^),t(x))
Memorial ChurchO1λ=0.0235.6590.8813
O2λ=0.0336.4350.9259
O3λ=0.02, T1=0.1, T2=0.136.7070.9275
Mount TamO1λ=0.0535.9010.8771
O2λ=0.0537.3070.9054
O3λ=0.05, T1=0.1, T2=0.137.2900.9055

For the Mount Tam image, additional reconstruction results for the objective function O1 are shown in Fig. 2(g)2(i), with λ=10, 0.1 and 106 used respectively. It is seen that the increased value of λ from 0.05 to 0.1 gives a less noisy result compared to the LS reconstruction in Fig. 2(d); however, the edge artifacts are magnified. A too-high value for λ leads to oversmoothing, while a too-low value for λ results in severe noise amplification due to the poor condition number of the inverse problem. For the reconstruction result in Fig. 2(i) with λ=106, K=8 images are used, four for each of the two different exposure durations, which means that the problem is full rank. Thus, it is clear that due to the ill-conditioned structure, regularization is necessary, which in turn gives numerical reconstruction errors particularly at image edges. These errors have a large perceptual impact for low-illuminance levels, as results show when performing SRR in the illuminance domain as for O1. For O2 and O3, as an example, setting λ for SRR of the Mount Tam image to 0.01 or 0.1 instead of the current choice of 0.05 does not cause much noticeable difference in the results. The SRR method presented in this paper could straightforwardly be applied to color images, by treating each RGB color channel individually. This gives satisfactory results, even though it is not optimal in terms of color fidelity. The RGB channels should then be coded jointly, such as for the LDR L*a*b color space.

5.

Conclusions

This paper treats SRR of differently exposed LDR images. A formulation of the HDR SRR problem in which perceived difference is minimized in an appropriate, PU image domain has been proposed. As shown in experimental results, even small numerical reconstruction errors will have a large perceptual impact in dim image areas if the illuminance domain is used in the objective function, due to the nonlinearity of the illuminance domain to perceived brightness of the HVS. This issue is overcome by using the proposed approach. Because the proposed residual function is non-Gaussian, an alternative to the L2-norm was suggested, which, however, only gave a minor improvement in the presented experiments.

References

1. 

J. KuangG. JohnsonM. Fairchild, “icam06: a refined image appearance model for HDR image rendering,” J. Vis. Comun. Image Represent. 18(5), 406–414 (2007).JVCRE71047-3203http://dx.doi.org/10.1016/j.jvcir.2007.06.003Google Scholar

2. 

E. AllenS. Triantaphillidou, The Manual of Photography, 10th edn., Focal Press, Oxford (2011).Google Scholar

3. 

P. DebevecJ. Malik, “Recovering high dynamic range radiance maps from photographs,” in Proc. 24th Ann. Conf. Computer Graphics and Interactive Techniques, SIGGRAPH ‘97, pp. 369–378, ACM Press/Addison-Wesley Publishing Co., New York (1997).Google Scholar

4. 

M. Cadiket al., “Evaluation of HDR tone mapping methods using essential perceptual attributes,” Comput. Graph. 32(3), 330–349 (2008).0097-8493http://dx.doi.org/10.1016/j.cag.2008.04.003Google Scholar

5. 

S. C. ParkM. K. ParkM. G. Kang, “Super-resolution image reconstruction: a technical overview,” Signal Process. Mag. 20(3), 21–36 (2003).ISPRE61053-5888http://dx.doi.org/10.1109/MSP.2003.1203207Google Scholar

6. 

S. Farsiuet al., “Fast and robust multiframe super resolution,” IEEE Trans. Image Process. 13(10), 1327–1344 (2004).IIPRE41057-7149http://dx.doi.org/10.1109/TIP.2004.834669Google Scholar

7. 

M. EladA. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Process. 6(12), 1646–1658 (1997).IIPRE41057-7149http://dx.doi.org/10.1109/83.650118Google Scholar

8. 

S. Farsiuet al., “Advances and challenges in super-resolution,” Int. J. Imaging Syst. Technol. 14(2), 47–57 (2004).IJITEG0899-9457http://dx.doi.org/10.1002/(ISSN)1098-1098Google Scholar

9. 

L. Pickup, “Bayesian methods for image super-resolution,” Comput. J. 52(1), 101–113 (2009).CMPJA60010-4620http://dx.doi.org/10.1093/comjnl/bxm091Google Scholar

10. 

S. D. BabacanR. MolinaA. K. Katsaggelos, “Variational Bayesian super resolution,” IEEE Trans. Image Process. 20(4), 984–999 (2011).IIPRE41057-7149http://dx.doi.org/10.1109/TIP.2010.2080278Google Scholar

11. 

F. SroubekG. CristobalJ. Flusser, “A unified approach to superresolution and multichannel blind deconvolution,” IEEE Trans. Image Process. 16(9), 2322–2332 (2007).IIPRE41057-7149http://dx.doi.org/10.1109/TIP.2007.903256Google Scholar

12. 

G. HarikumarY. Bresler, “Perfect blind restoration of images blurred by multiple filters: theory and efficient algorithms,” IEEE Trans. Image Process. 8(2), 202–219(1999).IIPRE41057-7149http://dx.doi.org/10.1109/83.743855Google Scholar

13. 

M. Angelopoulouet al., “Robust real-time super-resolution on FPGA and an application to video enhancement,” ACM Trans. Reconfig. Technol. Syst. 2(4), 22:1–22:29 (2009).1936-7406http://dx.doi.org/10.1145/1575779Google Scholar

14. 

J. ChoiM. K. ParkM. G. Kang, “High dynamic range image reconstruction with spatial resolution enhancement,” Comput. J. 52(1), 114–125 (2009).CMPJA60010-4620http://dx.doi.org/10.1093/comjnl/bxm080Google Scholar

15. 

F. SchubertK. SchertlerK. Mikolajczyk, “A hands-on approach to high-dynamic-range and superresolution fusion,” in IEEE Workshop on Applications of Computer Vision (WACV), pp. 1–8 (2009).Google Scholar

16. 

M. GevrekciB. Gunturk, “Superresolution under photometric diversity of images,” EURASIP J. Appl. Signal Process. 2007, 036076 (2007).1110-8657http://dx.doi.org/10.1155/2007/36076Google Scholar

17. 

H. ZimmerA. BruhnJ. Weickert, “Freehand HDR imaging of moving scenes with simultaneous resolution enhancement,” Comput. Graph. Forum 30(2), 405–414 (2011).CGFODY0167-7055http://dx.doi.org/10.1111/j.1467-8659.2011.01870.xGoogle Scholar

18. 

T. Bengtssonet al., “Regularized optimization for joint super-resolution and high dynamic range image reconstruction in a perceptually uniform domain,” in Int. Conf. on Acoustics, Speech, and Signal Proc., ICASSP 2012, pp. 1097–1100 (2012).Google Scholar

19. 

V. PatanavijitS. Jitapunkul, “A Lorentzian stochastic estimation for a robust iterative multiframe super-resolution reconstruction with Lorentzian-Tikhonov regularization,” EURASIP J. Adv. Signal Process. 2007(2), 1–21 (2007).http://dx.doi.org/10.1155/2007/34821Google Scholar

20. 

Z. Wanget al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004).IIPRE41057-7149http://dx.doi.org/10.1109/TIP.2003.819861Google Scholar

21. 

22. 

M. FairchildG. Johnson, “iCAM framework for image appearance, differences, and quality,” J. Electron. Imaging 13(1), 126–138 (2004).JEIME51017-9909http://dx.doi.org/10.1117/1.1635368Google Scholar

Biography

OE_52_10_102003_d001.png

Tomas Bengtsson is a PhD student in the signal processing research group at the Chalmers University of Technology, in Gothenburg, Sweden. He obtained a BS degree in electrical engineering and an MS degree in communication engineering at Chalmers University of Technology between 2004 and 2009. His research lies in signal processing and sensing, focusing on image processing, in particular high dynamic range imaging and super-resolution techniques. He is involved in a project on visual quality measures, with the aim to explore algorithms for the enhanced display of images and videos with applications toward the automotive industry, and in particular car safety.

OE_52_10_102003_d002.png

Tomas McKelvey obtained an MS degree in electrical engineering at Lund University between 1987 and 1991 and received his PhD in automatic control at Linköping University in 1995. Between 1995 and 1999, he held research and teaching positions at Linköping University and became docent in 1999. Between 1999 and 2000 he was a visiting researcher at University of Newcastle, Australia. Since 2000, he has been with Chalmers University of Technology, becoming a full professor in 2006 and the head of the signal processing group at Chalmers in 2011. Professor McKelvey’s research interests are model based in statistical signal processing, system identification, and image processing with applications to biomedical engineering and combustion engines.

OE_52_10_102003_d003.png

Irene Yu-Hua Gu received her PhD degree in electrical engineering from Eindhoven University of Technology, Eindhoven, The Netherlands, in 1992. From 1992 to 1996, she was a research fellow at Philips Research Institute IPO, Eindhoven, The Netherlands, and did post-doctorate work at Staffordshire University, Staffordshire, United Kingdom. She was also a lecturer at the University of Birmingham, Birmingham, United Kingdom. Since 1996, she has been with the Department of Signals and Systems, Chalmers University of Technology, Göteborg, Sweden, where she has been a full professor since 2004. Her research interests include statistical image and video processing, object tracking and video surveillance, pattern classification, and signal processing with applications to electric power systems. She has been an associate editor with the EURASIP Journal on Advances in Signal Processing since 2005, and on the editorial board for the Journal on Ambient Intelligence and Smart Environments (JAISE) since 2011.

Tomas B. Bengtsson, Tomas McKelvey, Irene Yu-Hua Gu, "Super-resolution reconstruction of high dynamic range images in a perceptually uniform domain," Optical Engineering 52(10), 102003 (24 May 2013). http://dx.doi.org/10.1117/1.OE.52.10.102003
Submission: Received ; Accepted
JOURNAL ARTICLE
11 PAGES


SHARE
KEYWORDS
High dynamic range imaging

Cameras

Super resolution

Sensors

Inverse problems

Data modeling

Image registration

RELATED CONTENT

Superresolution-of-text-from-nonideal-video
Proceedings of SPIE (January 19 2006)
Super-resolution-PMD-camera-for-applied-metrology
Proceedings of SPIE (March 03 2016)

Back to Top