Snapshot spectropolarimetric imaging using a pair of filter array cameras

Abstract. Spectral and polarization imaging (SPI) is an emerging sensing method that permits the analysis of both spectral and polarization information of a scene. The existing acquisition systems are limited by several factors, such as the space requirement, the inability to quickly capture the information, and the high cost. We propose an SPI acquisition system with a high spatial resolution that combines six spectral channels and four polarization channels. The optical setup employs two color-polarization filter array cameras and a pair of bandpass filters. We define the processing pipeline, consisting of preprocessing, geometric calibration, spectral calibration, and data transformation. We show that, around specular highlights, the spectral reconstruction can be improved by filtering the polarized intensity. We provide a database of 28 spectropolarimetric scenes with different materials for future simulation and analysis by the research community.

Based on a recent review that we conducted, 22 snapshot SPI acquisition systems have been categorized as division-of-amplitude [23][24][25] or polarization-grating devices. [26][27][28] Table 1 summarizes recent practical devices from the state-of-the-art systems. Many methods have significant complexity and require either modulation processes, high-cost optics, or moving parts that reduce the sampling resolution. Some imaging setups are also bulky to accommodate the optical paths. We believe that a high-resolution and simpler spectropolarimetric imaging system has yet to be investigated.
Polarization filter array (PFA) and color polarization filter array (CPFA) sensors, such as the Sony IMX250 MZR or IMX250 MYR, 33 have been integrated by several camera manufacturers in affordable commercial devices. However, these sensors do not have sufficient spectral resolution for accurate spectral sensing as they capture only three spectral channels. Based on these devices, this paper proposes a compact and cost-effective setup capable of a snapshot acquisition of multispectral polarization data. Compared with other snapshot systems presented in Table 1, this work has the advantage of a high spatial resolution, with a relatively good spectral resolution in the full visible range.
The paper is organized as follows. In Sec. 2, we present the hardware and the preprocessing steps of the developed prototype. In Sec. 3, we define the procedure to calibrate the spectropolarimetric data. We then describe the transformation of the data in Sec. 4, before providing a discussion in Sec. 5 and a conclusion in Sec. 6.

Hardware
The proposed SPI setup consists of two Triton TRI050S-QC 5.0 MP color polarization cameras, manufactured by Lucid Vision Lab. These cameras embed the Sony IMX250 MYR sensor, 33 the properties of which are shown in Fig. 1(a). This is a division-of-focal-plane sensor, in which each pixel senses one spectral channel and one polarization channel. The filter arrays are composed of three spectral channels (c ∈ {R, G, B}) spatially organized in a Quad-Bayer arrangement 34 and four polarization channels (p ∈ {0, 45, 90, 135} deg) arranged in a Chun pattern 35 [see Fig. 1(b)].
As shown in Fig. 1(d), the two cameras are in a stereo configuration. In front of each camera, a lens with a fixed focal length of 12 mm is added. They are configured not to perform any analog/digital gain or white balance corrections, to ensure that the raw data are linear and not corrupted by any preprocessing. MSI usually requires complex and expensive acquisition devices. One can reduce the complexity by assuming that the reflectance spectrum is smooth across the visible wavelengths and thus by limiting the number of spectral channels. Examples in the literature demonstrate that MSI in the visible can be achieved using six channels with a relatively good spectral estimation. 36 A practical method is to use conventional RGB cameras combined with a set of spectral bandpass filters. 1,37,38 This is usually done in multiple shots, placing one filter at a time. We use a similar technique by combining each of the two cameras with a fixed bandpass filter, so the two cameras have different spectral sensitivities. The first bandpass filter is a blue-green BG39 filter (noted bg), and the second one is a yellow GG475 filter (noted y), both manufactured by Schott Glass Technologies. We selected the same filters as those of Berns et al. 37 The bandpass filters are directly mounted on the objective lenses. From these combinations, we obtain six spectral channels in the visible range. Figure 2 shows the spectral sensitivities of the global system, along with the acquisition configuration.
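As an illustration, the six effective channel sensitivities of such a system can be sketched as the product of the camera's RGB sensitivities and each filter's transmittance. The Gaussian sensitivities and logistic transmittance curves below are illustrative placeholders, not the measured data of Fig. 2:

```python
import numpy as np

# Sketch: combining one RGB sensor with one of two bandpass filters
# yields six effective spectral channels. All curves below are
# illustrative placeholders, not measured sensitivities.
wavelengths = np.arange(380, 740, 10)          # 36 samples, 380-730 nm
n = wavelengths.size

# Hypothetical camera sensitivities (R, G, B), shape [3, 36].
cam = np.stack([
    np.exp(-0.5 * ((wavelengths - mu) / 40.0) ** 2)
    for mu in (600, 540, 460)
])

# Hypothetical filter transmittances: blue-green (bg) and yellow (y).
t_bg = 1.0 / (1.0 + np.exp((wavelengths - 560) / 15.0))   # passes short wavelengths
t_y  = 1.0 / (1.0 + np.exp((475 - wavelengths) / 15.0))   # passes long wavelengths

# Effective sensitivities of the six-channel system, shape [6, 36]:
# rows are (R_bg, G_bg, B_bg, R_y, G_y, B_y).
system = np.vstack([cam * t_bg, cam * t_y])
assert system.shape == (6, n)
```

In a real calibration, the measured sensitivities of each camera-filter pair would replace these placeholder curves.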
A uniform, unpolarized light source is positioned approximately normal to the base of the configuration.
In the next sections, we will describe the processing of the images from the setup. The steps are shown in Fig. 3.

Preprocessing
First, two preprocessing steps are applied to each raw image: a demosaicking algorithm and a flat-field correction.

Demosaicking
CPFA images have a resolution of 2448 × 2048 pixels, with each sensor pixel sensing only one spectral band among three and one polarization direction among four. Similar to color filter array (CFA) images, CPFA images need their spatial resolution to be reconstructed to avoid misinterpretation in channel registration for image analysis. This is especially important because CPFA images are composed of 12 channels, so the spatial distribution of channels is sparser than that of a CFA. Color and/or polarization artifacts may occur at the edges of objects. A dedicated demosaicking algorithm is therefore required to recover the missing color and polarization samples at each pixel position. To this end, we employ a recent state-of-the-art demosaicking algorithm dedicated to CPFA, the edge-aware residual interpolation. 39 It is adapted to the specific spatial arrangement of the Sony IMX250 MYR sensor. The full-resolution output is thus a 12-channel image. We call the pixel response ρ_i, where i = {c, p} indexes the channel, with c ∈ {R, G, B} and p ∈ {0, 45, 90, 135} deg.
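The channel extraction underlying demosaicking can be sketched as follows. The 4 × 4 tile layout and the nearest-neighbor filling are simplifying assumptions for illustration only; they are not the edge-aware residual interpolation used in the pipeline:

```python
import numpy as np

# Minimal nearest-neighbor "demosaicking" sketch for a CPFA mosaic.
# The 4x4 tile below (Quad-Bayer colors, Chun-pattern angles) is an
# assumed arrangement for illustration; the actual pipeline uses the
# far more sophisticated edge-aware residual interpolation.
COLORS = np.array([["R", "R", "G", "G"],
                   ["R", "R", "G", "G"],
                   ["G", "G", "B", "B"],
                   ["G", "G", "B", "B"]])
ANGLES = np.tile(np.array([[90, 45],
                           [135, 0]]), (2, 2))   # angle tile repeated over 4x4

def demosaic_nearest(raw):
    """raw: [H, W] mosaic (H, W multiples of 4) -> dict of 12 full-res planes."""
    planes = {}
    for dy in range(4):
        for dx in range(4):
            key = (str(COLORS[dy, dx]), int(ANGLES[dy, dx]))
            sub = raw[dy::4, dx::4]                              # sparse samples
            planes[key] = np.repeat(np.repeat(sub, 4, 0), 4, 1)  # nearest upsampling
    return planes

raw = np.arange(8 * 8, dtype=float).reshape(8, 8)
planes = demosaic_nearest(raw)
assert len(planes) == 12 and planes[("G", 0)].shape == (8, 8)
```

Each of the 12 (color, angle) channels is sampled once per 4 × 4 tile, which is why a dedicated interpolation is needed to recover full resolution without artifacts.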

Flat-field correction
The linear behavior of the cameras was verified using the procedure described in Ref. 40, and we found no need to correct for nonlinearity. Nevertheless, fixed-pattern noise, i.e., the dark noise and the spatial nonuniformity of the illuminant, has to be characterized and corrected. The dark noise is measured by capturing an image with the cap on the camera (called the dark image), producing a dark uniform field. To correct the spatial nonuniformity of the lighting, an image of the Xrite ColorChecker white balance chart is acquired (called the white image), which produces a white uniform field. The flat-field correction method is from Hardeberg 41 and is implemented in the Colour toolbox from Westland et al. 40 The correction is done for each pixel of the demosaicked image and for each channel i as follows:

ρ'_i = (ρ_i − ρ_{i,d}) / (ρ_{i,w} − ρ_{i,d}) × (ρ̄_{i,w} − ρ̄_{i,d}),   (1)

where ρ_{i,w} and ρ_{i,d} are the pixel responses of channel i in the white and dark images, respectively, and ρ̄_{i,w} and ρ̄_{i,d} are the corresponding mean values over all pixels.
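The flat-field correction above can be sketched per channel as follows (the numeric images are arbitrary examples, and the epsilon guard is an added safeguard against division by zero):

```python
import numpy as np

# Flat-field correction per channel: the dark image removes the
# fixed-pattern offset, the white image normalizes the spatial
# nonuniformity, and the mean white/dark levels restore the scale.
def flat_field(rho, white, dark, eps=1e-9):
    gain = white - dark
    corrected = (rho - dark) / np.maximum(gain, eps)   # eps guards against /0
    return corrected * (white.mean() - dark.mean())

rho   = np.array([[10.0, 20.0], [30.0, 40.0]])   # arbitrary example image
dark  = np.full_like(rho, 2.0)
white = np.array([[50.0, 60.0], [40.0, 52.0]])
out = flat_field(rho, white, dark)
assert out.shape == rho.shape
# A pixel reading exactly the white level maps to the mean white-minus-dark level.
assert np.isclose(flat_field(white, white, dark), white.mean() - dark.mean()).all()
```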

Calibration
Two calibrations are done after preprocessing: (1) a geometric stereo calibration to establish the correspondence between the image pair from the two cameras and (2) a spectral calibration to estimate the reflectance of the scene. (Using the polarimetric characterization in Ref. 42, we found that the polarization measurement is close to ideal for these sensors, with very high extinction ratios of the micropolarizers, so there is no need to calibrate polarimetrically.)

Geometric Stereo Calibration
The stereo camera calibration aims at finding a geometric transform between the two cameras in world coordinates, which is mathematically modeled by a projection matrix. The projection matrix consists of a pair of intrinsic and extrinsic matrices, which encode the camera position, orientation, focal length, and lens distortion coefficients. These parameters are determined by taking multiple image pairs of a checkerboard with known dimensions, oriented at different angles. Then, a rectification step projects the images onto a common image plane in such a way that corresponding points are located on the same row. 43 This is followed by the disparity map computation through semiglobal matching 44 to find the displacement between conjugate pixels in the stereo image pair. The channel i = {G, 0 deg} is used for stereo matching and the disparity map computation. After matching, we recombine the 12 channels of the left and right cameras (with the bg and y filters, respectively) to construct an image with a total of 24 channels. With this optical configuration (baseline, working distance, and focal length), the usable spatial resolution is 1330 × 1920 pixels (2.6 MP). We note the pixel response ρ_j, where j = {s, p} indexes the spectropolarimetric channel, with s ∈ {R_bg, R_y, G_bg, G_y, B_bg, B_y} and p ∈ {0, 45, 90, 135} deg.
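The matching principle behind the disparity computation can be sketched with a toy sum-of-absolute-differences search along one rectified row. This is only an illustration; the actual pipeline relies on checkerboard calibration, rectification, and semiglobal matching, which also regularizes across pixels:

```python
import numpy as np

# Toy per-pixel disparity search on one rectified row using
# sum-of-absolute-differences (SAD) block matching.
def disparity_row(left, right, max_disp, win=2):
    """left, right: 1D arrays (one rectified row) -> integer disparity per pixel."""
    n = left.size
    disp = np.zeros(n, dtype=int)
    for x in range(win, n - win):
        patch = left[x - win:x + win + 1]
        costs = []
        for d in range(min(max_disp, x - win) + 1):
            cand = right[x - d - win:x - d + win + 1]
            costs.append(np.abs(patch - cand).sum())   # SAD cost for shift d
        disp[x] = int(np.argmin(costs))
    return disp

rng = np.random.default_rng(0)
right = rng.random(60)
left = np.roll(right, 3)           # left row = right row shifted by 3 pixels
d = disparity_row(left, right, max_disp=8)
assert (d[5:58] == 3).all()        # interior pixels recover the true disparity
```

The need for texture noted in the discussion is visible here: on a flat (constant) row, all candidate shifts would have equal cost and the argmin would be arbitrary.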

Spectral reconstruction method
The spectral properties of surfaces are approximated by a few basis vectors and described by a low-dimensional linear model. 45 Here, we describe the spectral calibration to estimate the reflectance data r of a scene from the vector ρ containing the digital values ρ_j. We use a method based on linear regression that links the spectral reflectance r to the camera responses I directly 46 as follows:

r̂ = MI,   (2)

where the matrix M is a reconstruction operator and I is a vector containing the intensities measured by the spectral system. In our case, I is of size [1 × 6] and contains the camera intensities I_s, where s ∈ {R_bg, R_y, G_bg, G_y, B_bg, B_y}. This is equivalent to calculating the first Stokes component 47 from ρ_j as follows:

I_s = (ρ_{s,0} + ρ_{s,45} + ρ_{s,90} + ρ_{s,135}) / 2.   (3)

Here, M is estimated using a set of reference spectra, which are previously measured by a spectrometer. We selected the Xrite Macbeth ColorChecker PassPort (MCCPP), which is composed of two main charts with known spectra: (1) the classic chart of 24 patches, used for training, and (2) the creative enhancement chart of 26 patches, used for testing. We assume that the training set is representative enough of the surface reflectances that will be estimated a posteriori by the system. Another assumption is that the reflection from the charts is purely diffuse, so they do not reflect any specular component. We call R_train and R_test the sets of known training and testing reflectance data, respectively, which are obtained from the Spectral Library of the Chromaxion website. 48 The calibration of M is done using pseudoinversion as follows:

M̂ = R_train I_train^+,   (4)

where ^+ is the right Moore-Penrose pseudoinverse operator, I_train is a [6 × 24] matrix formed by column vectors containing averaged camera signals (over a 10 × 10 pixel area) of the 24 patches, R_train is a [36 × 24] matrix with the reference reflectance factors of the patches from 380 to 730 nm with a step of 10 nm, and M̂ is a [36 × 6] matrix that is an estimate of M.
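The calibration by pseudoinversion can be sketched with synthetic stand-in data (random matrices in place of the measured ColorChecker patch signals):

```python
import numpy as np

# Estimating the reconstruction operator M by right pseudoinversion:
# M_hat = R_train @ pinv(I_train). Synthetic stand-in data replace the
# 24 measured ColorChecker patches used in the actual calibration.
rng = np.random.default_rng(0)
M_true  = rng.random((36, 6))       # hypothetical ground-truth operator
I_train = rng.random((6, 24))       # camera signals of 24 patches, [6 x 24]
R_train = M_true @ I_train          # reference reflectances, [36 x 24]

M_hat = R_train @ np.linalg.pinv(I_train)
assert M_hat.shape == (36, 6)
assert np.allclose(M_hat @ I_train, R_train)   # training data reproduced exactly
```

Because I_train has full row rank here, the pseudoinverse recovers the operator exactly; with real, noisy measurements the fit is least-squares rather than exact.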

Spectral estimation evaluation
To evaluate the performance of the spectral estimation, we apply the trained linear model to a set of camera responses from the creative enhancement chart [see Fig. 4(c)]. Thus, the spectral estimation is done on a set of 26 patches, different from the set used for training. The computation is performed on I_test, which is a [6 × 26] matrix containing the averaged camera responses, as

R̂_test = M̂ I_test.   (5)

To quantify the reconstruction error, we compute the goodness-of-fit coefficient (GFC) for each estimated patch with reflectance vector r̂_test. This gives a scalar value, computed as follows:

GFC = |Σ_{n=1}^{N_w} r_test(λ_n) r̂_test(λ_n)| / (‖r_test‖ ‖r̂_test‖),   (6)

where N_w = 36 is the total number of samples in the considered wavelength range and ‖·‖ is the two-norm of a vector. A GFC of 0 indicates a bad fit, whereas 1 is a perfect fit. Figure 4(e) shows the comparison between the estimated and measured (reference) reflectance spectra of the 26 patches of the creative enhancement chart. We can see that the estimated spectra are close to the reference data. The worst GFC values occur especially when high reflectances are encountered (patches 1, 7, and 8 from the first line of patches). This can be due to the relatively low sensitivities (and thus a low SNR) of our system in this range of wavelengths.
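The GFC metric can be sketched as follows (the spectrum below is an arbitrary example):

```python
import numpy as np

# Goodness-of-fit coefficient: the normalized absolute inner product
# between measured and estimated reflectance spectra (1 = perfect fit).
def gfc(r, r_hat):
    return np.abs(r @ r_hat) / (np.linalg.norm(r) * np.linalg.norm(r_hat))

r = np.linspace(0.1, 0.8, 36)            # illustrative 36-sample spectrum
assert np.isclose(gfc(r, r), 1.0)        # identical spectra -> perfect fit
assert np.isclose(gfc(r, 2.0 * r), 1.0)  # GFC is invariant to a global scale
```

Note that the scale invariance means GFC measures spectral shape agreement only, which is why it suits reflectance-shape evaluation.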
As a comparison, we computed the GFC on the test chart using either three channels (s ∈ {R_bg, G_bg, B_bg}) or six channels. We found that our six-channel system increases the GFC by 0.16 on average over the 26 patches.

Data Transform
Here, we transform the calibrated data using three selected representations: (1) Stokes formalism, (2) reflectance computed from polarization filtered intensities, and (3) color image.

Stokes
The Stokes parameters are determined from an average of intensity measurements over area, wavelength, and solid angle. If we consider a multispectral system with relatively narrow bands, the spectral dependence of the polarization information can be considered additional useful information, so that each channel s senses a different Stokes vector. 49 Thus, from ρ_j, we estimate the linear Stokes vector S at each pixel position and for each spectral channel s as

S_{0,s} = (ρ_{s,0} + ρ_{s,45} + ρ_{s,90} + ρ_{s,135}) / 2,
S_{1,s} = ρ_{s,0} − ρ_{s,90},   (7)
S_{2,s} = ρ_{s,45} − ρ_{s,135}.

The degree of linear polarization (DOLP) is computed from the Stokes components as

DOLP_s = sqrt(S_{1,s}² + S_{2,s}²) / S_{0,s}.   (8)

The azimuth angle of linear polarization (AOLP) is also computed from the Stokes components. It represents the angular orientation of the main axis of the polarization ellipse with respect to the chosen angular reference of the system:

AOLP_s = (1/2) arctan(S_{2,s} / S_{1,s}).   (9)

In Fig. 5, we plot the luminance and DOLP_{G_y} of the 24 MCCPP patches used for spectral calibration. First, we can see that there is a low but significant polarization signature for the patches. Second, we observe an inverse relationship between luminance and DOLP, especially in the values of the last six neutral patches (19 to 24). This can be explained by the fact that darker patches have a much smaller diffuse component (and thus a relatively higher specular component, which is more polarized) due to a greater absorption of light. This effect is also shown in Fig. 6, in which darker balls exhibit a higher degree of polarization.
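The Stokes, DOLP, and AOLP computations for one spectral channel can be sketched with the standard relations for a division-of-focal-plane sensor:

```python
import numpy as np

# Linear Stokes components, DOLP, and AOLP from the four polarization
# channels of one spectral band (standard DoFP relations).
def stokes(i0, i45, i90, i135):
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dolp(s0, s1, s2):
    return np.sqrt(s1**2 + s2**2) / s0

def aolp(s1, s2):
    return 0.5 * np.arctan2(s2, s1)      # radians

# Fully linearly polarized light at 0 deg: i0 = 1, i90 = 0, i45 = i135 = 0.5.
s0, s1, s2 = stokes(1.0, 0.5, 0.0, 0.5)
assert np.isclose(s0, 1.0)
assert np.isclose(dolp(s0, s1, s2), 1.0)   # DOLP = 1 for fully polarized light
assert np.isclose(aolp(s1, s2), 0.0)       # polarization axis at 0 deg
```

Using arctan2 rather than arctan avoids the quadrant ambiguity when S_1 is negative.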
A visualization of the Stokes transform on the scene called "resin balls" is shown in Fig. 6.

Reflectance
The reflection of light from a surface can be analyzed with an additive model, in which the total intensity is the sum of a diffuse component (resulting from the subsurface reflection phenomenon) and a specular component (resulting from the surface reflection). Existing methods for separating reflection components use a polarization filter rotated in front of the image sensor. 8,50 It is often assumed that the diffuse component is weakly polarized compared with the specular component. In our case, we believe that polarization intensity filtering can benefit the reflectance estimation when significant specular reflection occurs.
To this end, we compute the polarization-filtered intensities, 51 called U_s, by the following equation:

U_s = I_s − sqrt(S_{1,s}² + S_{2,s}²).   (10)

This has the effect of removing the linearly polarized component from the total intensity I_s. Thus, we can compute the spectral reflectance at any pixel position from the vector U instead of the vector I [as in Eq. (5)]:

r̂' = M̂ U.   (11)

To verify the effect of removing the polarized intensity component, we computed the reflectance using either I or U. We selected 5 × 5 pixel regions near occluding boundaries of the objects, where specular reflection is assumed to have a greater influence on the DOLP of the reflected light. Results are shown in Fig. 7 for four regions of interest. We obtain a better fit when using U in all four cases.
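The polarization-intensity filtering can be sketched as follows:

```python
import numpy as np

# Polarization-filtered intensity: subtract the linearly polarized part
# (assumed mostly specular) from the total intensity before applying
# the spectral reconstruction operator.
def filtered_intensity(s0, s1, s2):
    return s0 - np.sqrt(s1**2 + s2**2)

# Unpolarized pixel: nothing is removed.
assert np.isclose(filtered_intensity(1.0, 0.0, 0.0), 1.0)
# Fully polarized pixel (e.g., a pure specular highlight): everything is removed.
assert np.isclose(filtered_intensity(1.0, 1.0, 0.0), 0.0)
```

The reflectance is then reconstructed by applying the calibrated operator to the vector of filtered intensities instead of the total intensities.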

Color
The estimated reflectance r̂ is first converted to CIE XYZ tristimulus values as

[X, Y, Z]^T = C^T r̂,   (12)

where C (36 × 3) are the CIE (1931) color matching functions (CMFs). 52,53 Then, we convert the CIE XYZ values to sRGB 54 for visualization. The XYZ-to-sRGB transform is achieved in two steps: a linear transform of the XYZ values to linear RGB values, followed by a gamma correction of 2.2 to adapt to the nonlinear behavior of monitors.
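The color transform can be sketched as follows. The color matching functions and the reflectance below are random placeholders, so the output is not colorimetrically meaningful; real CIE 1931 CMFs (380 to 730 nm, 10 nm step) must be substituted:

```python
import numpy as np

# Reflectance -> CIE XYZ -> sRGB sketch with placeholder CMFs.
M_XYZ2RGB = np.array([[ 3.2406, -1.5372, -0.4986],
                      [-0.9689,  1.8758,  0.0415],
                      [ 0.0557, -0.2040,  1.0570]])   # sRGB (D65) linear transform

def reflectance_to_srgb(r, cmf):
    xyz = cmf.T @ r                  # [3]; assumes an equal-energy illuminant
    rgb = M_XYZ2RGB @ xyz            # linear RGB
    rgb = np.clip(rgb, 0.0, 1.0)     # clip out-of-gamut values
    return rgb ** (1.0 / 2.2)        # simple 2.2 gamma, as described in the text

rng = np.random.default_rng(1)
cmf = rng.random((36, 3)) * 0.05     # placeholder stand-in for C [36 x 3]
r = rng.random(36)                   # placeholder reflectance spectrum
rgb = reflectance_to_srgb(r, cmf)
assert rgb.shape == (3,) and (rgb >= 0).all() and (rgb <= 1).all()
```

The official sRGB transfer function is piecewise (linear near black); the plain 2.2 gamma here follows the two-step description given in the text.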

Database
We provide a database of 28 scenes with different materials, including plastic, wood, and metal. The scene called "chart" consists of custom plastic patches manufactured with a 3D printer, with a varying layer height parameter. Figure 8 shows the database in sRGB representation, after the color transform from Sec. 4. The brand logos are removed from the database images. The images are cropped to select the relevant content of the objects. The code and database folder is organized as follows:
1. Data folder:
a. Pairs of preprocessed images for each scene, after demosaicking, flat-field correction, and geometric correction. They are multipage .tif files (one page per polarization angle).

2. Misc folder:
a. Spectral calibration matrix and
b. CIE XYZ values.
Three MATLAB scripts are provided to transform the data to Stokes, reflectance, and color images, respectively. The dataset and code are available in a GitHub repository via this link 55 or can be sent on request to the corresponding author.

Discussion
Our stereo setup offers the possibility to perform reflectometry by fusing observations of the polarization state of light reflected off a surface from two different views. 51 In a binocular stereo configuration, the specular intensity can differ between the two sensors, depending on the shape of the object under consideration and the lighting configuration. As an example, our stereo setup can be configured to optimize roughness measurement by modifying the vergence, the working distance, or the focal length of the camera lenses. In the database, this is unconstrained because the surfaces in the scenes are multiple and unknown. However, for measurements on samples with a priori knowledge (roughness range, convexity, and index of refraction), the setup can be slightly adapted. 56 This is not investigated in this work and will be considered in future work.
Some drawbacks can come from the disparity computation, especially when imaging thick objects. From the stereo point of view, typical matching problems occur when (1) there are occlusions (no corresponding data in one of the two images), (2) there are specular highlights, or (3) a pixel belongs to a more or less flat (textureless) region. To be successful, the matching algorithm needs texture (but not periodic patterns); if there is nothing to match inside an image pair, the algorithm cannot establish correspondences efficiently. Another potential factor comes from the use of an image pair taken with different spectral sensitivities, 57 which produces a noisy disparity map and thus noisy Stokes/spectral/color representations of the data. We believe that a more advanced, spectrally invariant disparity computation method is still to be developed.
To our knowledge, computing the reflectance spectra from polarimetric Stokes data (S_1 and S_2) has never been investigated in the literature. We verified that polarization information can help the reflectance estimation, whereas previous work assumed that all surfaces measured by a spectral camera are mainly diffuse. Some other works used a rotating linear polarizer in front of the spectral camera to filter the specular reflection globally. In our work, this is done per pixel because we measure the state of polarization at each pixel. Nevertheless, this method is limited by the amount of polarized intensity that can be filtered, which varies with the object and lighting configuration, i.e., the angle of incidence/reflection of the light. We believe that a dedicated diffuse/specular separation using spectral and polarization information, as in prior work, 8 can further improve the reflectance estimation for specular surfaces. This is not included in this work but can be investigated as an additional processing block in the framework of Fig. 3.
This work opens perspectives in several application fields, from the stereo acquisition of spectropolarimetric images to new image processing of the data. The output of our framework, which is the transformed data (Stokes, reflectance, and color), can be extended to specific surfaces and serve as input to computer vision tasks such as feature extraction using principal component analysis, semantic segmentation, or defect detection.

Conclusion
We developed a snapshot spectropolarimetric acquisition system. It has the advantages of a high spatial resolution and a relatively good spectral reconstruction precision. We designed a full processing pipeline that consists of preprocessing, geometric calibration, spectral calibration, and data transform. We also provide a method to better estimate the reflectance of specular surfaces using a filtering process based on the Stokes components. A database of 28 high-resolution spectropolarimetric images is made available online, along with the code to transform the data.
The configuration of the proposed system makes it possible to reconstruct the 3D spectral information using two complementary strategies: (1) the polarization information (the so-called "shape from polarization" method) or (2) the stereo depth computation. Image processing using both polarization and stereo information is to be investigated in future work.