## 1.

## Introduction

Three-dimensional (3-D) modeling techniques have been intensively investigated in the field of computer vision. The techniques used can be categorized into two types: the geometric approach, which uses the geometrical structure of the scene, and the photometric approach, which uses the light reflected from the scene. Shape-from-specularity has been extensively surveyed by Ihrke et al.^{1}

A smooth surface normal can be obtained using a photometric approach. Polarization^{2–4} is one of the characteristics that can be used to obtain a smooth surface normal. Koshikawa and Shirai^{5} used circular polarization to estimate the surface normal of a specular object. However, extending their method to a dense estimation of the surface normal introduces an ambiguity: the surface normal cannot be uniquely determined. Note that, throughout this paper, we use the term “ambiguity” when the surface normal cannot be uniquely determined and there are two or more candidate surface normals. Guarnera et al.^{6} extended their method to determine the surface normal uniquely, by changing the lighting conditions in two configurations. Morel et al.^{7} also disambiguated it using multiple illumination; however, they did not solve the ambiguity of the degree of polarization (DOP) because they did not use circular polarization. Saito et al.^{8} proposed the basic theory for estimating the surface normal of a transparent object using polarization. Barbour^{9} approximated the relation between the surface normal and the DOP and developed a commercial sensor for shape-from-polarization. Drbohlav and Sara^{10} and Ngo et al.^{11} solved the ambiguity problem of uncalibrated photometric stereo via polarization analysis and estimated both the light direction and the surface normal of a nonspecular object. Miyazaki et al.^{12} estimated the surface normal of a transparent object by analyzing the polarization state of the thermal radiation from the object. Miyazaki et al.^{13} attempted to estimate the surface normal of a diffuse object from a single view. Miyazaki et al.^{14} used a geometrical invariant to match the corresponding points from two views to estimate the surface normal of a transparent object. Miyazaki and Ikeuchi^{15} solved the inverse problem of polarization ray tracing to estimate the surface normal of a transparent object.
Wolff and Boult^{16} developed the basic theory for showing that polarization analysis can estimate a surface normal from two views if the corresponding points are known. Rahmann^{17} indicated that the surface normal can be obtained from polarization. Rahmann and Canterakis^{18} estimated the surface normal of a specular object from multiple views by iteratively finding the corresponding points of these views. Rahmann^{19} proved that polarization analysis can estimate quadratic surfaces only if the corresponding points are searched iteratively. Atkinson and Hancock^{20} analyzed the local structure of an object to find the corresponding points between two viewpoints in order to calculate the surface normal from the polarization of two views. Atkinson and Hancock^{21} also provided a detailed investigation of surface normal estimation for a diffuse object from a single view. Huynh et al.^{22} estimated not only the surface normal but also the refractive index. Some of these methods can be used for estimating the surface normal of a specular object; however, the corresponding points of multiple views are required for the estimation process.

Recently, researchers have integrated the geometric approach with the photometric approach to obtain rich information about the object shape. They combined the rough 3-D geometry obtained using multiview stereo or laser range sensors with the smooth surface normal obtained using the photometric stereo method.^{23} Ochiai et al.^{24} mapped the surface normal obtained from photometric stereo measurements onto the mesh model obtained from a 3-D laser sensor. Fua and Leclerc^{25} combined binocular stereo and shading information and obtained the shape of an object represented by facets. Maki et al.,^{26} Zhang et al.,^{27} Lim et al.,^{28} and Higo et al.^{29} observed an object using a single light source and a single camera and obtained the 3-D shape of a textureless diffuse object. Zickler et al.^{30} proposed a so-called Helmholtz stereo method, which can estimate the 3-D geometry and surface normal of an object that has an arbitrary bidirectional reflectance distribution function. These methods suggest that combining the geometric and photometric approaches is important; however, these photometric stereo methods, except for the Helmholtz stereo method, can obtain the surface normal of only a diffuse surface. The dense surface normal of a specular black object cannot be obtained using the Helmholtz stereo method because of the discretized sampling of the light source. Kadambi et al.^{31} combined the 3-D geometry obtained by a time-of-flight (ToF) sensor and the surface normal obtained from the DOP. Unlike space carving, which can be applied to a completely black object, a ToF sensor cannot measure such objects because the laser does not reflect at a black surface.

Johnson and Adelson^{32} pressed an elastomer slab onto a target object and applied the photometric stereo method to the elastomer slab. Kawasaki and Furukawa^{33} projected a shadow instead of stripe-pattern light to ensure that the measurement result would not depend on the reflection property of the target objects. Michel et al.^{34} proposed a method for estimating the shapes of objects composed of any material using user interaction as a clue. In contrast to these methods, which require additional human tasks, the shape-from-silhouettes (also called volumetric intersection, visual hull, or space carving) method^{35–38} is very useful in some cases. Yamazaki et al.^{39} used the shadow to apply the visual hull method to objects of any material and with any reflectance property. Typically, the silhouette of the target object in the image is sufficient in shape-from-silhouettes tasks, and the silhouette of the shadow is unnecessary in most situations.

In this study, we propose a method for creating a 3-D model using both polarization analysis and space carving. The principal target objects are smooth surfaces such as plastics and ceramics. We first calibrate multiple cameras to calculate the geometrical relationships among them. We observe the object from multiple viewpoints using a polarization imaging camera. First, we apply space carving to estimate the rough structure of the object. Space carving can obtain a visual hull of a textureless object, such as a black object with high specularity; however, it cannot obtain the shape of a concave portion of the object. The 3-D shape obtained by conventional space carving is usually not smooth; thus, we add polarization information. The shape-from-polarization method can estimate the shapes of black objects with high specularity, which cannot be estimated using the photometric stereo method because there are no diffuse reflections. The polarization information of the object is obtained from multiple viewpoints using a polarization imaging camera. The polarization data must be analyzed at identical points on the object surface when observed from multiple viewpoints; thus, the shape obtained by space carving can be used for estimation of the surface normal from the polarization data. We map the surface normal obtained from the polarization information onto the 3-D surface of the object.

A surface normal can be constrained by the DOP. For example, Miyazaki et al.,^{14} Kadambi et al.,^{31} and several other researchers used the DOP to estimate the surface normal from specular reflection. However, the DOP depends on the refractive index and surface roughness. We therefore use not the DOP but the phase angle, explained later, because a DOP-based method requires the refractive index and surface roughness to be known. The concept of our algorithm is the same as that of Rahmann and Canterakis;^{18} however, the computation process is completely different. They also computed the corresponding points, whereas our method uses the corresponding points obtained by space carving. Our method is based on singular value decomposition (SVD), which minimizes the squared error under the strong constraint provided by the shape information, namely, the corresponding points. Rahmann^{19} proved that a quadratic surface can be estimated only when the corresponding points are searched at the same time as the surface normal is estimated. This limitation is a crucial problem for shape estimation. We overcome this problem via polarization analysis in order to estimate a wide variety of shapes. The corresponding points obtained by space carving solve Rahmann’s problem (Fig. 1). In addition to a sphere, which is a quadratic surface, Sec. 3 shows the result for an object that is not a quadratic surface, namely, a rabbit-shaped object. We also show both successful and failed results for colored objects in Sec. 3.

We describe our method in Sec. 2 and present our results in Sec. 3. We discuss the advantages and disadvantages of our method and conclude the paper in Sec. 4.

## 2.

## Estimating the Surface Normal from Polarization Information Obtained from Multiple Views

## 2.1.

### Polarization

We explain only linear polarization because circular polarization is not related to our method. Light is an oscillating electromagnetic wave. An electromagnetic wave oscillating in only one direction is said to be perfectly linearly polarized, whereas an electromagnetic wave oscillating isotropically in all directions is called unpolarized light (Fig. 2). The intermediate state is called partially polarized light. The DOP is one of the metrics used to represent the polarization state of light. Its value varies from 0 to 1, with 1 representing perfectly polarized light and 0 representing unpolarized light. Light that has passed through a linear polarizer becomes perfectly polarized. The light is transmitted if the orientation of the linear polarizer and the oscillation direction of the incoming electromagnetic wave are collinear, and it is blocked if these two orientations are orthogonal.

The maximum light observed while rotating the polarizer is denoted as ${I}_{\mathrm{max}}$, and the minimum light is denoted as ${I}_{\mathrm{min}}$. The polarizer angle at which ${I}_{\mathrm{max}}$ is observed is called the phase angle $\psi $ (Fig. 3).
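As an illustration of these definitions, the following minimal sketch recovers ${I}_{\mathrm{max}}$, ${I}_{\mathrm{min}}$, and the phase angle $\psi $ from four intensity samples. It assumes polarizer angles of 0, 45, 90, and 135 deg and a sinusoidal intensity model with a 180-deg period; neither the four-angle sampling nor the function names come from the paper, so they should be read as assumptions.

```python
import math

def intensity(i_max, i_min, psi, nu):
    """Assumed sinusoidal model of the intensity seen through a polarizer at angle nu (rad)."""
    mean = (i_max + i_min) / 2.0
    amplitude = (i_max - i_min) / 2.0
    return mean + amplitude * math.cos(2.0 * (nu - psi))

def polarization_from_four_angles(i0, i45, i90, i135):
    """Return (I_max, I_min, psi) from samples at polarizer angles 0, 45, 90, 135 deg."""
    s0 = (i0 + i45 + i90 + i135) / 2.0   # equals I_max + I_min
    s1 = i0 - i90                        # equals (I_max - I_min) * cos(2 psi)
    s2 = i45 - i135                      # equals (I_max - I_min) * sin(2 psi)
    swing = math.hypot(s1, s2)           # equals I_max - I_min
    i_max = (s0 + swing) / 2.0
    i_min = (s0 - swing) / 2.0
    psi = (0.5 * math.atan2(s2, s1)) % math.pi   # phase angle, defined modulo 180 deg
    return i_max, i_min, psi

# Synthetic check: generate samples from known values and recover them.
true_psi = math.radians(30.0)
samples = [intensity(0.8, 0.2, true_psi, math.radians(a)) for a in (0, 45, 90, 135)]
print(polarization_from_four_angles(*samples))
```

The phase angle is returned modulo 180 deg because a polarizer rotated by 180 deg transmits the same light.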

Suppose that the surface of the target dielectric object is optically smooth. Figure 4 represents light traveling through the air and hitting the object. The angle between the surface normal and the incident light is denoted as $\theta $, and that between the surface normal and the reflected light is also denoted as $\theta $ since the surface is optically smooth.

The plane containing the incident light vector and the surface normal vector is called the reflection plane. The reflected light vector also lies in the reflection plane since the surface is optically smooth. The orientation of the reflection plane is denoted as $\varphi $, which is defined as the angle between the $x$-axis and the reflection plane projected onto a certain $xy$-plane.

The surface normal is represented in polar coordinates (Fig. 5), where the azimuth angle is denoted as $\phi $ and the zenith angle is denoted as $\theta $. The azimuth angle $\phi $ coincides with the angle of the reflection plane $\varphi $ ($\phi =\varphi $). The DOP is defined as follows:

## (1)

$$\rho =\frac{{I}_{\mathrm{max}}-{I}_{\mathrm{min}}}{{I}_{\mathrm{max}}+{I}_{\mathrm{min}}}.$$

If we denote the refractive index of the object as $n$, the DOP of the specularly reflected light is represented as follows:

## (2)

$$\rho =\frac{\sqrt{{\sin}^{4}\theta \,{\cos}^{2}\theta \,({n}^{2}-{\sin}^{2}\theta )}}{[{\sin}^{4}\theta +{\cos}^{2}\theta \,({n}^{2}-{\sin}^{2}\theta )]/2}.$$

The graph of the DOP is shown in Fig. 6.
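To make Eq. (2) concrete, the sketch below evaluates the DOP of specular reflection as a function of the zenith angle $\theta $ and the refractive index $n$. The function name and the value $n=1.5$ are illustrative assumptions, not values taken from the paper.

```python
import math

def dop_specular(theta, n):
    """DOP of specularly reflected light, Eq. (2); theta in radians, n = refractive index."""
    sin2 = math.sin(theta) ** 2
    cos2 = math.cos(theta) ** 2
    numerator = math.sqrt(sin2 * sin2 * cos2 * (n * n - sin2))
    denominator = (sin2 * sin2 + cos2 * (n * n - sin2)) / 2.0
    return numerator / denominator

n = 1.5                      # assumed refractive index, for illustration only
brewster = math.atan(n)      # Brewster angle, where the reflected light is fully polarized
for deg in (0, 30, 60, 90):
    print(deg, dop_specular(math.radians(deg), n))
print("Brewster angle (deg):", math.degrees(brewster), dop_specular(brewster, n))
```

The DOP vanishes at $\theta =0$, rises to 1 at the Brewster angle, and falls again toward 90 deg. One DOP value can therefore correspond to two zenith angles, and the DOP is small near $\theta =0$.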

## 2.2.

### Calculating the Surface Normal from Two Viewpoints

Section 2.1 described the relationship between the surface normal and the phase angle. However, we cannot determine the surface normal uniquely because only the orientation of the reflection plane including the surface normal is obtained. We must observe the object from two viewpoints to solve this problem.

Figure 7 illustrates our problem setting. Each camera has its own coordinate system with $x$-, $y$-, and $z$-axes, and the $z$-axis lies along the optical axis. The azimuth angle $\phi $ and the reflection plane angle $\varphi $ ($\phi =\varphi $) are the angle between the $x$-axis of the camera coordinate system and the line formed by the intersection of the reflection plane and the $xy$-plane. The phase angle $\psi $ is rotated by 90 deg from the azimuth angle.

We analyze the two phase angles at the same surface point, which is identified from the known 3-D geometry. Our method assumes that the approximate 3-D geometry of the target object is known from space carving, which we explain later (Sec. 2.4). For the time being, we assume that the true 3-D geometry of the object is known, to simplify the explanation of the fundamental theory. The relationship between the surface normal vector and the azimuth angle is shown in Fig. 8; the azimuth angle is rotated by 90 deg from the phase angle. The relationship between the azimuth angles for the two cameras, ${\phi}_{1}$ and ${\phi}_{2}$, and the normal vectors of the reflection planes, ${\mathbf{a}}_{1}$ and ${\mathbf{a}}_{2}$, is given by Eqs. (3) and (4):

## (3)

$${\mathbf{a}}_{1}=\left[\begin{array}{c}\cos ({\varphi}_{1}+{90}^{\circ})\\ \sin ({\varphi}_{1}+{90}^{\circ})\\ 0\end{array}\right]=\left(\begin{array}{c}\cos {\psi}_{1}\\ \sin {\psi}_{1}\\ 0\end{array}\right),$$

## (4)

$${\mathbf{a}}_{2}=\left[\begin{array}{c}\cos ({\varphi}_{2}+{90}^{\circ})\\ \sin ({\varphi}_{2}+{90}^{\circ})\\ 0\end{array}\right]=\left(\begin{array}{c}\cos {\psi}_{2}\\ \sin {\psi}_{2}\\ 0\end{array}\right).$$

Because the surface normal $\mathbf{n}={({n}_{x},{n}_{y},{n}_{z})}^{T}$ lies in both reflection planes, it is orthogonal to ${\mathbf{a}}_{1}$ and ${\mathbf{a}}_{2}$ once each vector is rotated by the rotation matrix of its camera, ${\mathbf{R}}_{1}$ or ${\mathbf{R}}_{2}$. Stacking the two constraints gives

## (7)

$$\left(\begin{array}{c}{\mathbf{a}}_{1}^{T}{\mathbf{R}}_{1}\\ {\mathbf{a}}_{2}^{T}{\mathbf{R}}_{2}\\ \mathbf{0}\end{array}\right)\left(\begin{array}{c}{n}_{x}\\ {n}_{y}\\ {n}_{z}\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right).$$

The reflection plane angle has a 180-deg ambiguity because $\varphi $ and $\varphi +{180}^{\circ}$ describe the same plane. The corresponding normal vectors of the reflection plane are

## (8)

$$\mathbf{a}=\left[\begin{array}{c}\cos (\varphi +{90}^{\circ})\\ \sin (\varphi +{90}^{\circ})\\ 0\end{array}\right],$$

## (9)

$$\tilde{\mathbf{a}}=\left[\begin{array}{c}\cos (\varphi +{90}^{\circ}+{180}^{\circ})\\ \sin (\varphi +{90}^{\circ}+{180}^{\circ})\\ 0\end{array}\right]=-\mathbf{a}.$$

The vectors $\mathbf{a}$ and $\tilde{\mathbf{a}}=-\mathbf{a}$ impose the same orthogonality constraint on $\mathbf{n}$. Therefore, the 180-deg ambiguity of the reflection plane angle does not matter in our algorithm.
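The two-view constraint of Eq. (7) can be sketched numerically as follows. With two independent rows, the vector orthogonal to both is their cross product. The sketch assumes that ${\mathbf{R}}_{k}$ maps the common (world) coordinates to the coordinates of camera $k$ and that the reflection plane contains the camera $z$-axis; the helper names `rot_y`, `simulate_phase`, and `normal_from_two_views` are hypothetical.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis; used here as an assumed world-to-camera rotation."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def simulate_phase(n_world, R):
    """Phase angle seen by a camera: the azimuth of the normal in camera coordinates plus 90 deg."""
    n_cam = R @ n_world
    azimuth = np.arctan2(n_cam[1], n_cam[0])
    return azimuth + np.pi / 2.0

def normal_from_two_views(psi1, R1, psi2, R2):
    """Unit vector orthogonal to both rows of Eq. (7); the sign is left undetermined."""
    row1 = np.array([np.cos(psi1), np.sin(psi1), 0.0]) @ R1   # a_1^T R_1
    row2 = np.array([np.cos(psi2), np.sin(psi2), 0.0]) @ R2   # a_2^T R_2
    n = np.cross(row1, row2)
    length = np.linalg.norm(n)
    if length < 1e-12:
        raise ValueError("reflection planes coincide; the normal is not unique")
    return n / length

n_true = np.array([0.3, -0.2, 0.9])
n_true /= np.linalg.norm(n_true)
R1, R2 = rot_y(0.0), rot_y(np.radians(30.0))
n_est = normal_from_two_views(simulate_phase(n_true, R1), R1,
                              simulate_phase(n_true, R2), R2)
print(n_est, n_true)   # equal up to sign
```

Adding 180 deg to either phase angle only flips the sign of the corresponding row, so the recovered line is unchanged, in agreement with Eqs. (8) and (9).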

## 2.3.

### Calculating the Surface Normal from Multiple Viewpoints

This section explains the estimation process for the surface normal from the phase angle obtained from multiple viewpoints. The fundamental theory is similar to that explained in Sec. 2.2.

Figure 9 shows the relationship between the surface normal $\mathbf{n}$ of the surface point $p$ and the phase angle obtained from $K$ viewpoints. In Fig. 9, ${\phi}_{k}$ represents the azimuth angle of the surface point $p$ observed by the camera $k=(1,2,\dots ,K)$, and ${\mathbf{a}}_{k}$ represents the vector orthogonal to the reflection plane under the coordinate system of the camera $k$. Because ${\mathbf{a}}_{k}$ is orthogonal to the reflection plane, we obtain Eq. (12) using the phase angle ${\psi}_{k}$ or azimuth angle ${\phi}_{k}$:

## (12)

$${\mathbf{a}}_{k}=\left[\begin{array}{c}\cos ({\varphi}_{k}+{90}^{\circ})\\ \sin ({\varphi}_{k}+{90}^{\circ})\\ 0\end{array}\right]=\left(\begin{array}{c}\cos {\psi}_{k}\\ \sin {\psi}_{k}\\ 0\end{array}\right).$$

Because the surface normal $\mathbf{n}$ lies in every reflection plane, it is orthogonal to each ${\mathbf{a}}_{k}$ after rotation by the rotation matrix ${\mathbf{R}}_{k}$ of camera $k$:

## (13)

$$({\mathbf{R}}_{k}^{T}{\mathbf{a}}_{k})\cdot \mathbf{n}=0,\qquad (k=1,2,\dots ,K).$$

Stacking these $K$ equations gives

## (14)

$$\left(\begin{array}{c}{\mathbf{a}}_{1}^{T}{\mathbf{R}}_{1}\\ {\mathbf{a}}_{2}^{T}{\mathbf{R}}_{2}\\ \vdots \\ {\mathbf{a}}_{K}^{T}{\mathbf{R}}_{K}\end{array}\right)\left(\begin{array}{c}{n}_{x}\\ {n}_{y}\\ {n}_{z}\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ \vdots \\ 0\end{array}\right),\qquad \mathbf{A}\mathbf{n}=\mathbf{0}.$$

Applying SVD to the matrix $\mathbf{A}$ gives

## (15)

$$\left(\begin{array}{c}{\mathbf{a}}_{1}^{T}{\mathbf{R}}_{1}\\ {\mathbf{a}}_{2}^{T}{\mathbf{R}}_{2}\\ \vdots \\ {\mathbf{a}}_{K}^{T}{\mathbf{R}}_{K}\end{array}\right)=\mathbf{U}\mathbf{W}{\mathbf{V}}^{T}=\mathbf{U}\left(\begin{array}{ccc}{w}_{1}& & \\ & {w}_{2}& \\ & & 0\end{array}\right)\left(\begin{array}{c}{\mathbf{v}}_{1}\\ {\mathbf{v}}_{2}\\ {\mathbf{v}}_{3}\end{array}\right).$$

The surface normal is then obtained as follows:^{40}

## (16)

$$\mathbf{n}=s{\mathbf{v}}_{3}^{T},$$

which can be calculated from the singular vector that has the smallest singular value, namely, the third row of ${\mathbf{V}}^{T}$ in Eq. (15). In the general case, $s$ is an arbitrary scalar coefficient; however, since the surface normal and the singular vectors are normalized vectors, $s$ is either $+1$ or $-1$. The sign of $s$ is easily determined by requiring that the surface normal face toward the camera. The surface normal estimated by Eq. (16) is the optimal value that minimizes the squared error of Eq. (14), which consists of $K$ equations. The input data must be obtained from two or more viewpoints since the rank of the matrix $\mathbf{A}$ is 2. If we obtain the input data from more viewpoints, the influence of input noise decreases.

If the reflection planes of the two cameras used are coplanar, as shown in Fig. 10, then the surface normal cannot be uniquely determined. In this degenerate case, the rank of the matrix $\mathbf{A}$ is 1. As shown in Fig. 11, an extra camera can solve this problem. If we have three or more cameras that are not collinear, we can uniquely determine the surface normal at any point on the object surface that is observed by these cameras.
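The multiview estimation described in this section can be sketched with NumPy's SVD routine. This is a minimal illustration under stated assumptions rather than the authors' implementation: ${\mathbf{R}}_{k}$ is taken to map world coordinates to camera $k$, the cameras are simulated by rotations about the $y$-axis, and the names `estimate_normal`, `rot_y`, `simulate_phase`, and the `toward` argument are hypothetical.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y-axis; used here as an assumed world-to-camera rotation."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def simulate_phase(n_world, R):
    """Phase angle seen by a camera: the azimuth of the normal in camera coordinates plus 90 deg."""
    n_cam = R @ n_world
    return np.arctan2(n_cam[1], n_cam[0]) + np.pi / 2.0

def estimate_normal(phase_angles, rotations, toward):
    """Least-squares solution of A n = 0 over K views.

    `toward` is a world-space direction from the surface point to a camera;
    it is used only to fix the sign of the result.
    """
    rows = [np.array([np.cos(p), np.sin(p), 0.0]) @ R      # a_k^T R_k
            for p, R in zip(phase_angles, rotations)]
    A = np.vstack(rows)
    _, w, Vt = np.linalg.svd(A)        # singular values are returned in descending order
    n = Vt[-1]                         # third row of V^T: smallest singular value
    if np.dot(n, toward) < 0.0:
        n = -n
    return n, w

rng = np.random.default_rng(0)
n_true = np.array([0.3, -0.2, 0.9])
n_true /= np.linalg.norm(n_true)
rotations = [rot_y(np.radians(20.0 * k)) for k in range(8)]
phases = [simulate_phase(n_true, R) for R in rotations]
noisy = [p + rng.normal(0.0, 0.005) for p in phases]       # assumed phase noise (rad)
n_est, _ = estimate_normal(noisy, rotations, toward=np.array([0.0, 0.0, 1.0]))
print(np.arccos(np.clip(n_est @ n_true, -1.0, 1.0)))      # angular error in rad
```

With noise-free phase angles the smallest singular value is numerically zero and the normal is recovered exactly up to sign. With noisy input, the same vector is the one that minimizes the squared residual of the stacked constraints.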

## 2.4.

### Space Carving

The space carving method can be used to reconstruct the 3-D shape of an entire object. Suppose that a scene is captured by cameras whose positions and orientations are known. The object shape is contained in the visual hull, which is generated by projecting the silhouettes into a global coordinate system. Here, a silhouette image is a binary image that distinguishes the target object region from the background. Because the object shape is contained in the visual hull, the visual hull gives an approximate shape.

Compared with the stereo matching method, the space carving method has several advantages. For example, unlike stereo matching, space carving does not need to search for corresponding surface points between multiple viewpoints. On the other hand, owing to the characteristics of the space carving method, concave portions are not carved away, and the resulting 3-D shape resembles a convex hull; the shortcoming is that the reconstructed shape is larger than the true shape. Figure 12 shows one example for which the result of reconstruction using the space carving method is a convex hull.
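The silhouette test at the core of space carving can be sketched as follows. This is a simplified illustration, not the implementation used in the paper: it assumes $3\times 4$ projection matrices, boolean silhouette images, and a point-sampled voxel grid, and it uses three synthetic orthographic cameras looking along the coordinate axes at a sphere of radius 0.5. The function name `carve` and all numerical values are assumptions.

```python
import numpy as np

def carve(points, projections, silhouettes):
    """Return a mask of the 3-D points whose projection lies inside every silhouette."""
    keep = np.ones(len(points), dtype=bool)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    for P, sil in zip(projections, silhouettes):
        uvw = homogeneous @ P.T
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)    # pixel column
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)    # pixel row
        height, width = sil.shape
        inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        keep &= hit                                        # carve away points outside any silhouette
    return keep

# Voxel centers on a 21 x 21 x 21 grid covering [-1, 1]^3.
axis = np.linspace(-1.0, 1.0, 21)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Synthetic orthographic cameras: pixel = 20 * coordinate + 20 on a 41 x 41 image.
projections = [
    np.array([[20.0, 0, 0, 20], [0, 20.0, 0, 20], [0, 0, 0, 1.0]]),   # view along z
    np.array([[0, 20.0, 0, 20], [0, 0, 20.0, 20], [0, 0, 0, 1.0]]),   # view along x
    np.array([[20.0, 0, 0, 20], [0, 0, 20.0, 20], [0, 0, 0, 1.0]]),   # view along y
]
vv, uu = np.mgrid[0:41, 0:41]
disk = (uu - 20) ** 2 + (vv - 20) ** 2 <= 10 ** 2          # silhouette of a sphere of radius 0.5
silhouettes = [disk, disk, disk]

kept = carve(grid, projections, silhouettes)
in_sphere = np.sum(grid ** 2, axis=1) <= 0.25 + 1e-9
print(kept.sum(), in_sphere.sum())
```

In this toy example every grid point inside the sphere survives the carving, but so do extra points such as (0.3, 0.3, 0.3), which illustrates why the carved shape is larger than the true shape when only a few views are available.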

## 2.5.

### Algorithm Flow

Figure 13 shows the algorithm flow of the proposed method, including the input and output for each process. In Fig. 13, the angular rectangle represents the process and the rounded rectangle represents the input and output.

We first calibrate the cameras, illuminate the object using a lighting dome, and obtain the polarization images from multiple viewpoints. The camera parameters are obtained from the camera calibration procedure. Next, we extract the silhouette of the target object from each image using background subtraction and obtain the 3-D shape of the visual hull from the camera parameters and the silhouette images using the space carving method. We calculate the phase angle from the polarization data. Because the corresponding points in each image can be calculated from the camera poses obtained by camera calibration and the 3-D shape obtained by space carving, we can analyze the phase angles at the same surface point. We thus obtain the surface normal over the entire object surface using the phase angles obtained from multiple viewpoints.

To obtain a detailed representation of the surface shape of the object, we use both the geometrical and photometrical approaches. We use the space carving method for the geometrical approach and the shape-from-polarization method for the photometrical approach. The space carving method can estimate the 3-D shape of a textureless object; however, it cannot estimate the detailed smooth structure of the object surface. We therefore use the shape-from-polarization technique to estimate the detailed smooth structure of the object surface. Similar to the space carving method and unlike the photometric stereo method, the shape-from-polarization method can estimate the surface normal of a highly specular object, even when it is black.

## 3.

## Experiment

## 3.1.

### Simulation Results

First, we estimate the surface normal using simulation-generated input data. The target object is a smooth sphere, which is assumed to have only specular reflection. The object is illuminated from every direction.

## 3.1.1.

#### Simulation results for a sphere

In our simulation, 12 cameras are set horizontally to the object, and 12 more cameras are set 30 deg above the object. The arrangement of the simulation is shown in Fig. 14. The angle between cameras is set to 15 deg. The distance between each camera and the object is the same in this experiment.

The result of space carving is shown in Fig. 15(a). The length of the voxel space is 200. A rough estimate of the shape is obtained using this process. The smooth detailed structure of the surface shape is obtained by introducing the shape-from-polarization technique. Throughout this paper, we show the 3-D shape of the object as a shading image, where the light is illuminated from the frontal direction.

The result for the surface normal obtained through polarization analysis is shown in Fig. 15(b). The smooth surface of the sphere is clearly estimated. Table 1 shows the error values for the results, as shown in Fig. 15. The error is calculated as an angle (rad) between the estimated surface normal and the surface normal of the true shape. Table 1 shows the average, maximum, and minimum of this angle over all surface points. Table 1 indicates that the error for our result [Fig. 15(b)] is less than that for space carving [Fig. 15(a)].

## Table 1

Comparison between the estimated and true surface normals.

| | | Space carving result | Our result |
|---|---|---|---|
| Angle between two vectors (rad) | Average | 0.100811 | 0.016366 |
| | Maximum | 0.369145 | 0.121151 |
| | Minimum | 0.000000 | 0.000000 |
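The error measure reported in Table 1, the angle between the estimated and true surface normals, can be computed as in the following sketch. The function name and the three example vectors are illustrative assumptions and are unrelated to the data behind Table 1.

```python
import numpy as np

def angular_errors(estimated, reference):
    """Angle (rad) between corresponding rows of two (N, 3) arrays of normals."""
    est = estimated / np.linalg.norm(estimated, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    cosines = np.clip(np.sum(est * ref, axis=1), -1.0, 1.0)   # guard against rounding above 1
    return np.arccos(cosines)

estimated = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
reference = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
errors = angular_errors(estimated, reference)
print(errors.mean(), errors.max(), errors.min())
```

The average, maximum, and minimum of these per-point angles correspond to the three rows of the table.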

## 3.1.2.

#### Evaluating robustness to noise level

In this section, we add random Gaussian noise to the input phase angle. From this phase angle data, we estimate the surface normal, as shown in Fig. 16. The number of cameras used is 24. The variation of the Gaussian noise is 0.01 (rad) for Fig. 16(a), 0.05 (rad) for Fig. 16(b), 0.1 (rad) for Fig. 16(c), and 0.2 (rad) for Fig. 16(d). Figure 16(d) shows that the estimated surface normal is contaminated by the noise artificially added to the polarization data.

Figure 17 shows the relationship between the added noise and the estimation error. The error is calculated using the procedure described in Sec. 3.1.1. The red line shows our result, and the blue line shows the space carving result. The error increases with increasing noise. For a noise level $<0.07$, our result is better than the space carving result.

## 3.1.3.

#### Evaluating the error dependence on the number of cameras

In this section, we perform a simulation in which the number of cameras changes. We use from 2 to 24 cameras. The noise added to the input phase angle is 0.05 (rad). Figure 18 shows the relationship between the number of cameras and the estimation error (rad). The red line is our result, which uses from 2 to 24 cameras, and the blue line is the space carving result, which uses 24 cameras. The error decreases as the number of cameras increases. If we use more than seven cameras, we obtain better results than those of the space carving method, which are obtained using 24 cameras. Section 3.1.2 indicates that our results are sensitive to noise; however, this section indicates that our results improve if we increase the number of cameras used.

## 3.2.

### Experiments in Real Situations

## 3.2.1.

#### Experimental setup

The object is illuminated using a lighting dome, which produces unpolarized light, as shown in Figs. 19 and 20. The object is set in the middle of the dome and is rotated using the turntable. The dome is illuminated by a combination of spotlights, fluorescent roof lights, and a white wall. We use the polarization imaging camera shown in Fig. 21, which can measure the polarization state of the incoming light in real time and in 8-bit monochrome with $1120\times 868$ (px) resolution.

The lighting dome used in this experiment is made not of hard acrylic but of soft polyester cloth. Although the slight polarization at the wrinkles of the cloth is almost negligible, a soft material should be avoided for the lighting dome if possible. Our future setup would consist of a spherical diffuser, as used by Nayar et al.^{41} and Miyazaki et al.^{15} Another interesting future direction of this research project is to use natural lighting, such as cloudy outdoor illumination.^{6}

## 3.2.2.

#### Results for a black plastic sphere

We use a black plastic sphere, which has high specularity, as the target object, as shown in Fig. 22. The diameter of the sphere is 40 mm. The shading images of the shape obtained by space carving are shown in Fig. 23. The length of each side of the voxel space is 400. Owing to the sparse camera arrangement, space carving cannot represent the smooth surface of the sphere. The phase angle obtained by the polarization camera is shown in Fig. 24. Figure 24 indicates that the phase angle rotated by 90 deg clearly represents the orientation of the surface normal of the sphere. The center area of the sphere has an unreliable phase angle (Fig. 24) because the DOP is low in that area, where the zenith angle is close to zero (cf. Fig. 6). Because the surface normals in that area do not point toward the camera in the other views, integrating the information from multiple views overcomes this problem. Owing to the satisfactory input data, the smooth surface normal of the sphere is clearly estimated using our algorithm; the output shape is represented as shading images in Fig. 25 and as a needle map in Fig. 26. Some estimation errors can be found at the bottom part of the sphere. These errors are caused by the insufficient illumination of the bottom part due to the pedestal that supports the target object. This result indicates that our method can obtain successful results for smooth black objects.

## 3.2.3.

#### Results for a black plastic rabbit

In this section, we estimate the surface normal of a much more complex object, as shown in Fig. 27. The target object, shaped like a rabbit, was created by a 3-D printer from 3-D polygon data provided by Turk and Levoy.^{42} The target object is made from black plastic, which causes high specularity. To show how black the object is, we show the depth estimation result obtained using an active scanner, the Kinect v1 manufactured by Microsoft Corporation. The second object from the left in Fig. 28(a) is the color image of the black plastic rabbit captured by the Kinect sensor. The infrared image in Fig. 28(b) shows that the target object is black not only at visible wavelengths but also at near-infrared wavelengths. The depth map of the target object has a large number of defects owing to its blackness, as shown in Fig. 28(c).

The target object is observed from 24 directions. The phase angle is obtained using a polarization imaging camera (Fig. 29). Figure 30 represents the true data rendered from the 3-D polygon data. The space carving result is shown in Fig. 31, as a shading image. The length of each side of the voxel space is 400. Figure 31 indicates that space carving methods can estimate only a square-like, nonsmooth shape unless a sufficient number of cameras is supplied. The shading image of the shape estimated using our method is shown in Fig. 32, as well as the needle map in Fig. 33. The smooth curved surface and the detailed structure of the bulging muscles of the object surface are estimated well. On the other hand, the complex structure of the ear is not recovered clearly. The phase angles of multiple viewpoints must be analyzed at identical surface points; however, the corresponding point for multiple viewpoints is not correctly computed for the space carving results, which show low quality due to the sharp changes in the curvature. In addition to the error at the ear, the foot and neck of the rabbit were also not well estimated by our method. These parts are not well illuminated because the light is occluded by other parts of the object itself.

## 3.2.4.

#### Results for a colored porcelain fish

This section examines the performance of our method when applied to nonblack objects. The target object is a red porcelain fish (Fig. 34). The object is observed from 24 directions. The shading image of the shape obtained by space carving is shown in Fig. 35, and the phase angle obtained is shown in Fig. 36. Here, the length of each side of the voxel space is 400. The shading image calculated from the estimated surface normal is shown in Fig. 37, as well as the needle map shown in Fig. 38. The smooth curved surface and the bump of the yellow pattern of the actual object are reproduced as intended. However, the upper part of the top of the object has a defect due to the strong specular reflection.

We also show a result when the number of viewpoints is small. The space carving shape used here is the one calculated from 24 viewpoints. Using this 3-D geometry, we calculated the surface normal from the phase angle obtained from 24 viewpoints [Fig. 39(a)], 12 viewpoints [Fig. 39(b)], 6 viewpoints [Fig. 39(c)], and 3 viewpoints [Fig. 39(d)]. If a surface point does not have phase angles from two or more different viewpoints, the surface normal of the point cannot be calculated. For this reason, as shown in Fig. 39(d), some parts of the object surface retain the surface normal of the space carving result.

## 3.2.5.

#### Limitation: results for a diffuse paper mache bird

In this section, we apply our method to an object that has only a diffuse reflection. The inner structure of the paper mache shown in Fig. 40 is made of wood, and the paper is pasted on its surface. The object is observed from 24 directions. The shading image rendered using the shape obtained by space carving is shown in Fig. 41. The length of each side of the voxel space is 400. The phase angle of this object is shown in Fig. 42. The shading result calculated from the estimated surface normal is shown in Fig. 43, and the needle map is shown in Fig. 44. The results in Figs. 43 and 44 are clearly erroneous and far from the true shape. The reason is that the performance of our method is strongly affected by the input phase angle, which, as shown in Fig. 42, is inconsistent with the true shape in this experiment.

This erroneous result is due to the rough surface, which results in a low DOP. Our algorithm can also be applied to objects that have only diffuse reflection, if the object surface is smooth. The phase angle of diffuse reflection is rotated by 90 deg from the phase angle of specular reflection; thus, we can apply our method by rotating the phase angle in our software. Atkinson and Hancock^{21} estimated the surface normal of white smooth porcelain by analyzing the polarization state of the diffusely reflected light. We do not test our method on smooth diffuse objects because this is outside the scope of the paper. If the object is not smooth, as shown in this section, neither our method nor other existing methods, including that of Atkinson and Hancock,^{21} can estimate the surface normal.

Given the characteristics of our theory, it is not surprising that our method could not estimate the shape of an object that has only a diffuse reflection. Although this is a disadvantage of the proposed method, we are not pessimistic about it. Various conventional techniques, including photometric stereo and laser range sensors, can estimate the shape of an object that exhibits only diffuse reflection; thus, we consider shape estimation of diffuse-only objects to be beyond the scope of this paper.

## 4.

## Conclusion

We propose a method for estimating shape from polarization images obtained from multiple viewpoints. We have focused on fully integrating the advantages of the space carving and shape-from-polarization methods. The proposed method computes the surface normal using SVD to minimize the least-squared error. It can estimate the shapes of optically smooth objects, such as plastic and ceramic objects, as well as those of black and colored objects with high specularity.

The experiments show that our method can estimate the surface normals of optically smooth objects with high specularity. This property demonstrates the advantage of the proposed approach over the photometric stereo method, because the conventional photometric stereo method is designed to estimate the surface normal of objects that have only diffuse reflection.

The final result of our method is a 3-D geometrical surface obtained using the space carving method, with the surface normal mapped onto the surface. Although the final rendered image represents a shape similar to the ground truth, the geometrical coordinates of the surface points are still the same as those of the space carving result. Therefore, we must deform the 3-D geometrical surface so that its surface normal coincides with the obtained surface normal. In addition, we must recalculate the surface normal using the corresponding points calculated from the updated 3-D shape, because these are more precise than the corresponding points of the 3-D shape obtained by space carving. Our future work is to iterate this process.

In our current measurement system, we use one camera and observe the objects from multiple views. Our future plan is to use multiple cameras so that the target object can be captured from all views at the same time. Such a one-shot scan enables high-speed capture of the target objects and opens up various fields of application, especially in industry. For example, such a system could inspect industrial products moving on a conveyor belt. Developing such a multiple-camera system, and thereby broadening the application field of our measurement system, is one of the future goals of this research project.

## Appendices

## Appendix:

### Proof of Rank Two

The rank of the matrix $\mathbf{A}$ defined in Eq. (14) is at most 2. In this appendix, we present a mathematical proof of this fact.

A unit vector $\mathbf{n}$ is defined as the surface normal of an object. From the polarization data, in particular the phase angle, the orientation of the reflection plane is known. By definition, the reflection plane contains the surface normal. Therefore, the normal vector of the reflection plane is always orthogonal to the surface normal. This constraint is shown in Fig. 45 using a Gaussian sphere representation.

The constraint matrix $\mathbf{A}$ is a list of the normal vectors of reflection planes. Since the normal vectors of reflection planes are coplanar, as shown in Fig. 45, it is apparent that the rank of $\mathbf{A}$ is at most 2.

The size of the constraint matrix is $K\times 3$; thus, its rank never exceeds 3. In this appendix, we express the constraint matrix $\mathbf{A}$ as a $3\times 3$ matrix without loss of generality when proving the rank of this matrix. Assume that $\mathbf{A}$ is full rank, namely, rank 3. Since $\mathbf{A}$ is then a regular (nonsingular) matrix, its inverse exists. Therefore, $\mathbf{An}=\mathbf{0}$ is solved as $\mathbf{n}={\mathbf{A}}^{-1}\mathbf{0}=\mathbf{0}$. However, since the surface normal $\mathbf{n}$ is defined as a unit vector, it is a nonzero vector. This contradiction proves that the rank of the constraint matrix $\mathbf{A}$ never becomes 3. The same conclusion holds for a general $K\times 3$ matrix by the rank-nullity theorem: the nonzero vector $\mathbf{n}$ lies in the null space of $\mathbf{A}$, so the null space has dimension at least 1 and the rank is at most $3-1=2$.

Next, we discuss the particular case in which the surface normal is $\mathbf{n}=(0,0,1)$ and the normal vectors of reflection planes are (1, 0, 0), (0, 1, 0), and ($-1$, 0, 0). In this example, $\mathbf{An}=\mathbf{0}$ becomes the following equation:

## (17)

$$\left(\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ -1& 0& 0\end{array}\right)\left(\begin{array}{c}0\\ 0\\ 1\end{array}\right)=\left(\begin{array}{c}0\\ 0\\ 0\end{array}\right).$$

Multiplying the matrix $\mathbf{A}$ on the left by the regular matrix shown below gives the matrix shown in the right-hand side of the following equation:

## (18)

$$\left(\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ 1& 0& 1\end{array}\right)\left(\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ -1& 0& 0\end{array}\right)=\left(\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ 0& 0& 0\end{array}\right).$$

Thus, the rank of the constraint matrix $\mathbf{A}$ becomes 2 in this particular example, proving that there exists at least one case in which the rank of the constraint matrix $\mathbf{A}$ becomes 2.
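As a consistency check with the SVD-based computation of the surface normal, the singular values of the example matrix in Eq. (17) can be worked out by hand. The following short derivation is an illustrative sketch rather than part of the proof; the symbols $\sigma_1$, $\sigma_2$, and $\sigma_3$, denoting the singular values of $\mathbf{A}$, are introduced here only for this check.

$$\mathbf{A}^{\top}\mathbf{A}=\left(\begin{array}{ccc}1& 0& -1\\ 0& 1& 0\\ 0& 0& 0\end{array}\right)\left(\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ -1& 0& 0\end{array}\right)=\left(\begin{array}{ccc}2& 0& 0\\ 0& 1& 0\\ 0& 0& 0\end{array}\right),\qquad {\sigma }_{1}=\sqrt{2},\quad {\sigma }_{2}=1,\quad {\sigma }_{3}=0.$$

Exactly two singular values are nonzero, which agrees with rank 2, and the right singular vector associated with ${\sigma }_{3}=0$ is $(0,0,1)$, the surface normal assumed in this example.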

Consequently, this appendix has proved that the rank of the constraint matrix $\mathbf{A}$ is at most 2. The degenerate case in which its rank becomes 1 (Fig. 10) is discussed in Sec. 2.3 (Fig. 11).

## Acknowledgments

This research was supported in part by the Microsoft Research Asia eHeritage Program under the project, “Polarization stereo for modeling the small-scale structure of cultural assets,” in part by the Konica Minolta Imaging Science Encouragement Award (Outstanding) from the Konica Minolta Science and Technology Foundation, Japan, in part by the Grant-in-Aid for Young Scientists from the Japan Society for the Promotion of Science under the project no. 24700176 from JSPS, Japan, and in part by the Grant-in-Aid for Scientific Research on Innovative Areas under the project, “Shitsukan” (No. 15H05925), from MEXT, Japan. Gratitude is also extended to the anonymous reviewers for their careful reviews of the paper. A short version of this paper was previously published at the Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization & Transmission (3DIMPVT’12).

## References

## Biography

**Daisuke Miyazaki** received his MS and PhD degrees in information science and technology from the University of Tokyo in 2002 and 2005, respectively. Currently, he is an associate professor at Hiroshima City University. His research interests include physics-based vision. He is a member of ACM and IEEE.

**Takuya Shigetomi** received his MS degree from Hiroshima City University in 2011. Currently, he is employed at Nomura Research Institute. His research interests include polarization-based shape estimation.

**Masashi Baba** received his MS and PhD degrees in engineering from Hiroshima University in 1992 and 2004, respectively. Currently, he is a lecturer at Hiroshima City University. His research interests include computer graphics and computer vision, especially reflection models, camera models, and image synthesis algorithms. He is a member of IPSJ, IEICE, and VRSJ.

**Ryo Furukawa** received his MS and PhD degrees in engineering from Nara Institute of Science and Technology in 1993 and 1995, respectively. Currently, he is an associate professor at Hiroshima City University. His research interests include shape-capturing, 3-D modeling, appearance sampling, and image-based rendering. He is a member of IPSJ, IEICE, and IEEE.

**Shinsaku Hiura** received his MS and PhD degrees in engineering from Osaka University in 1995 and 1997, respectively. Currently, he is a professor at Hiroshima City University. His research interests include computer vision, 3-D image analysis, and computational photography. He is a member of VRSJ, IPSJ, and IEICE.

**Naoki Asada** received his MS degree and PhD degree in engineering from Kyoto University in 1981 and 1987, respectively. Currently, he is a professor and vice president at the University of Hyogo. His research interests include computer vision, computer graphics, medical image understanding, and document image understanding. He is a member of IEICE, IPSJ, and JSAI.