Surface reconstruction from multiview projection data employing a microlens array-based optical detector: a simulation study

Abstract. This article proposes a surface reconstruction method from multiview projectional data acquired by means of a rotationally mounted microlens array-based light detector (MLA-D). The technique is adapted for in vivo small animal imaging, specifically for imaging of nude mice, and does not require an additional imaging step (e.g., by means of a secondary structural modality) or additional hardware (e.g., laser-scanning approaches). Any potential point within the field of view (FOV) is evaluated by a proposed photo-consistency measure, utilizing sensor image light information as provided by elemental images (EIs). As the superposition of adjacent EIs yields depth information for any point within the FOV, the three-dimensional surface of the imaged object is estimated by a graph cuts-based method through global energy minimization. The proposed surface reconstruction is evaluated on simulated MLA-D data, incorporating a reconstructed mouse data volume as acquired by x-ray computed tomography. Compared with a previously presented back projection-based surface reconstruction method, the proposed technique yields a significantly lower error rate. Moreover, while the back projection-based method may not be able to resolve concave surfaces, this approach does. Our results further indicate that the proposed method achieves high accuracy at a low number of projections.

Surface reconstruction from multiview projection data employing a microlens array-based optical detector: a simulation study 1 Introduction Recently, a noncontact light detector utilizing a purposely designed microlens array (MLA-D) has been developed. 1 The detector consisted of a MLA, a septum mask, and a photon sensor [cf. Fig. 1(a)]. In contrast to conventional lens-based imaging systems, an MLA-D possesses a thin construction size (<10 mm including cooling). Since its imaging performance is optimal for objects positioned close to its optics, a single or a multitude of MLA-Ds can be arranged circumferentially in close proximity to the imaged object, hence allowing for confined system assembly. Furthermore, the whole optical imaging system can potentially be encircled by another imaging system such as positron emission tomography or magnetic resonance imaging (MRI), in which MLA-Ds are compatible or nearly "invisible." 2 An MLA-D does not form an immediate observer image. Corresponding to each microlens, a local lattice of sensor pixels forms a so-called elemental image (EI), as shown in Fig. 1(b). Therefore, an image needs to be calculated by inverse mapping 3 or an iterative algorithm, as depicted in Ref. 4. For noncontact optical in vivo imaging, the procedure for localizing emitted photons within the imaged objectwith the purpose to further solve the inverse problem involved in image reconstruction 5,6 -is crucially affected by the exact knowledge of the three-dimensional (3-D) surface of the imaged object. As of today, surface information is mainly derived from secondary data as provided by (1) imaging the object with another modality such as x-ray computed tomography (CT) 5 or MRI 7 or (2) conducting a structured light 3-D scanning procedure. 8 Alternatively, direct optical methods such as back-projection-based (silhouette or visual hull) based approaches have also been proposed. [9][10][11] While the first class of methods requires additional hardware and scanning steps, the second class is generally considered less accurate, as concave areas cannot always be resolved.
As both classes of approaches operate independently from the light camera being used, they have been proposed in the literature (a survey is given in Ref. 12) as either conventional or multilens-based camera approaches such as the MLA-D employed here. 13,14 For the latter systems though, it has been shown recently that 3-D imaging, including surface reconstruction, can be achieved by so-called integral imaging techniques. 15,16 Using MLAs to record and recover light fields has become a recent research highlight in the fields of optics and computer vision. [17][18][19] Shape reconstruction or depth retrieval using MLAs can be performed from a single projection by finding corresponding points among neighboring EIs or adjacent focal point images. 15,20 Due to the limited number of microlenses and/or light trajectories, the reconstructed surface possesses a rather low-depth resolution and is insufficient for the downstream task of image reconstruction from optical data (tomography).
Considering that the MLA-D is either part of a multidetector imaging setup or is positioned or rotated around the imaged object, the aforementioned problem could be improved upon by combining multiple projections from multiview data (at unchanged illumination condition), which has been extensively studied in computer vision. [21][22][23] However, the direct application of these algorithms in the MLA-D system is difficult for two main reasons. First, global illumination either by environmental light or by fixed distributed light sources is difficult to achieve due to the enclosed design. 1 Second, the EIs of the MLA-Ds possess a comparatively low resolution so the visibility computation is difficult to perform.
Therefore, an integrated instrumentation setup for in vivo small animal (mice) imaging is presented by means of a simulation study employing MLA-Ds of various sizes as well as by a new light source setup for object illumination. Furthermore, a dedicated surface reconstruction algorithm is presented for this setup, which is based on a newly defined photo-consistency measure (the concept of photo-consistency measure can be referred in Ref. 22) and volumetric graph cuts. It is particularly tailored to the comparatively large number of EIs in our design, albeit each EI possesses a rather low-spatial resolution.

MLA-D
As depicted in Fig. 2, the binary sensor data of an MLA-D represents a two-dimensional (2-D) matrix B ¼ ½b m;n ðm ¼ 1; · · · M; n ¼ 1; · · · NÞ with square pixel size p, where M and N are the number of pixels in x and y directions on the photosensor plane, respectively. The arrangement of sensoraligned G × H microlenses with lens pitch d l is represented by L ¼ ½l g;h ðg ¼ 1; · · · G; h ¼ 1; · · · HÞ. An elemental image E is formed with local pixels corresponding to microlens l g;h , represented as E ¼ ½e u;v ðu ¼ 1; · · · U; v ¼ 1; · · · VÞ, where U and V are the numbers of pixels in 2-D for each EI in x and y directions, respectively.
Currently, two types of MLA-D setups have been proposed, 1,14 for which the parameters are summarized in Table 1, respectively. The first design has been assembled and is being used in practice, whereas the second is still under construction.

System Simulation
The simulation is performed within the physically based rendering (PBRT) framework, which includes interfaces for detector, lighting, and scenario (imaged object) descriptions. 24 The configuration of the MLA-Ds is defined by the parameters in Table 1. As a result of the previously mentioned illumination source confinement, a local illumination setup as illustrated in Fig. 3 is selected. As can be seen, adjacent MLA-Ds share the same illumination. The whole assembly is also rotatable around the long axis of the imaged object to acquire the data at A angular views. At each projectional view, two sensor data, B 1 a and B 2 a ða ¼ 1; : : : ; AÞ, are obtained from the detectors.
In order to define a realistic phantom representing an imaged object, a triangulated mesh extracted from a nude mouse CT dataset 25 is adopted as input for the simulation,   as shown in Fig. 4. The actual (identical) fields of view (FOVs) of the simulated MLA-Ds are marked by the red box. Once the surface data have been derived, its reflectance is set to be ideally diffusive (Lambertian surface). 26,27 Considering the size of normal mice (approximately 25 mm in diameter) as well as the depth of field for the individual microlenses, the radius of rotation is set to be 40 mm. The oblique angle between the two detectors is set to be 40 deg in order to maintain the condition that any point on the surface of the imaged object is being seen on either detector (photoconsistency). Further details about the simulation can be found in Ref. 14. The simulation is carried out by employing a high-performance computer cluster with 128 CPU cores (Intel E5450, 3.0 GHz, 16 GB memory for each core) for parallel implementation of different microlens units.

Surface Reconstruction
Centrally important to the surface reconstruction procedure, a photo-consistency measure is proposed by comparing the formed EIs following the system shown in Fig. 3. By calculating and super-positioning all photo-consistency measures, a photo-consistency volume of the imaged object is obtained. The problem of surface reconstruction is applied onto the formed volume, concretely to decide whether any voxel belongs to the imaged object, also known as labeling problem. 28 The labeling problem can be expressed as an energy minimization form and is further solved by the graph cutsbased method. 29

Photo-consistency measure
To solve the problem of surface reconstruction from multiview stereo in computer vision, a number of approaches have been proposed for evaluating the visual compatibility of a reconstruction with a collection of input images. 30 Most of these measures (often entitled photo-consistency measures) operate by comparing pixels in one image to pixels in other images to see how well they correlate. 21 To define a suitable measure for the surface reconstruction employing the MLA-D configuration, notations are first introduced.
In the process of image formation, any point Pðx; y; zÞ on the surface of the imaged object, which is within the FOV of either MLA-D, can potentially be detected by a number of microlenses. A ray originating from Pðx; y; zÞ that forms a trajectory forward an EI is called a valid ray. A set of EIs containing all valid raysT of a given Pðx; y; zÞ is referred to asÊ, as shown in Fig. 5. The intensities as collected by the sensor of the formed trajectoriesT tend to be similar since the corresponding sensor pixels inÊ measure the same point P in space, only from slightly different angles. Instead of comparing the absolute differences among the collected trajectories directly, a more robust method is to compare the similarities between two pixel windows since area matching (pixel window) compares regularly sized regions of pixel values in two images and, hence, is well suited for a texture scene. 31 For example, two arbitrary trajectories T r and T p fromT can be extended to form corresponding pixel windows W p and W r , respectively (for setup 1, a 3 × 3 pixel window is used, while for setup 2, a 7 × 7 pixel window is used). The collection of pixel windows is represented asŴ. Following common strategies assessing similarities between data formed by two pixel windows, the normalized cross-correlation (NCC) is employed in this article as a similarity measure. It is defined as 31 (1) whereW p andW r represent the average value of two corresponding pixel windows, respectively. This function yields return values between −1 and 1. A perfect match would reach the maximum of 1. In our context, a high correlation is defined (NCC > 0.7), which is used in the following part. Employing the NCC function as described decreases the effect of noise in the data due to, among others, the inhomogeneous responses of sensor pixels. In surface reconstruction procedure, a volume model is adopted to represent the surface for its simplicity and uniformity. 21 A reasonable simplification can be made, though. Any voxel V i;j;k (the concept "voxel" here refers to a point in a spatial grid) can form trajectories onto the photon sensor plane assuming that there would be no self-occlusion caused by other voxels. 22 Using the proposed design as illustrated in Fig. 3, a voxel V i;j;k can be seen through Q 1 and Q 2 microlenses in the two MLA-Ds simultaneously. Hence,Ê,T, as well asŴ are formed by two subsets:Ê 1 andÊ 2 ;T 1 andT 2 ; andŴ 1 andŴ 2 , as shown in Fig. 6.
Because of the specific design and configuration of MLA-Ds, the problem of defining a photo-consistency measure is extended from pair comparison between pixel windows, as seen in Eq. (1), to formalize a new measure over two subsets of EIs (Ê 1 andÊ 2 ) and the corresponding pixel windows (Ŵ 1 andŴ 2 ). A simple method is to choose one reference pixel window W r within the elemental image E r fromŴ and to compare it with the other formed pixel windows from the same subsetŴ, one by one. For an arbitrary V i;j;k , it might occur that some of the pixel windows possess a high degree of correlation with W r , while some other pixel windows possess a low degree of correlation, as seen in Fig. 6(a). Although the direct average of all NCC values could be applied, the resolvability among adjacent voxels might be hampered. 22 Therefore, an alternative method is used. The assumption is made that there exists an optic ray between V i;j;k and the optical center C of a chosen microlens, as depicted in Fig. 6(b). Correspondingly, the formed pixel window of this microlens is chosen as reference pixel window W r . As depicted in Fig. 6(b), an arbitrary point RðdÞ on the supposed optic ray defined by V i;j;k and C could be expressed according to where d is a variable to control the position of RðdÞ. RðdÞ would form the same pixel window W r within the corresponding E r as d changes, as shown in Fig. 6(b). However, in any other EI, e.g., a fixed E f fromÊ that can detect V i;j;k , RðdÞ would form different pixel windows as compared with the pixel window formed by V i;j;k within E f . The newly formed pixel window series by RðdÞ is represented asŴ f as d changes. When comparing W r with the formed setŴ f , a maximum NCC value can be found with respect to d. If V i;j;k is on the surface (or very close to the surface of the object), the degree of correlation becomes a maximum at d ¼ 0, as seen in Fig. 7. Contrarily, if V i;j;k is not on the surface, the degree of correlation is lower at d ¼ 0 or becomes a maximum at d ≠ 0, as shown in Fig. 6(b). In other words, the maximum value of NCC at d ¼ 0 can be used as a strong indicator for determining whether V i;j;k is a point on the surface or not.
Having discussed the case of mapping V i;j;k with one microlens, we extend this principle to combine V i;j;k with multiple microlenses. Supposing N a microlenses are chosen in the a-th projection, the number of occurrences, M a , in which a maximum NCC value is derived at d ¼ 0, would be counted. The ratio M a ∕N a can be used to describe the degree whether V i;j;k is on the surface concerning the normalization effect.
In order to reduce the computational complexity, the following strategies are employed. First, W r andŴ f are chosen from two different detectors. Reasoning is due to the fact that the MLA-Ds configuration is symmetric, as seen in Fig. 3. If there is a case that N a microlenses are chosen from the first MLA-D detector while M a occurrences with maximum NCC are obtained on the second MLA-D, equivalently, then it could occur that N 0 a microlenses are chosen from the second MLA-D detector while M 0 a occurrences are obtained on the first MLA-D. Second, only EIs within limited s rows (Ê 1 ) are chosen for W r , and t rows of EIs (Ê 2 ) are chosen forŴ f (s and t are set as 3, in this article). The photo-consistency measure φðV i;j;k Þ is defined as in which μ is a rate-of-decay parameter (set to be 0.5, in this article), and V A is the number of effective projectional views (M a ∕N a > 0.001 and M 0 a ∕N 0 a > 0.001). The details about the implementation could be found in Algorithm 1.

Volumetric surface reconstruction based on graph cuts
By calculating the proposed photo-consistency for each voxel employing Eq. (3), a volume of the imaged object is derived. The surface reconstruction task in this context is to assign a label to each voxel of the image object, i.e., a voxel either belongs to the background or to the imaged object. In accordance with labeling a voxel, an energy cost is paid. The objective is to find a discrete labeling of voxels with minimum systematic total energy, known as labeling problem. 28 In our context of a binary labeling problem (only two labels for the background and the imaged object), two energy parts are included. One part of cost concerns the label itself (data cost), i.e., the volume of imaged object in this context. This part of total cost can be seen as a volume integration of the imaged object. The other part of cost Fig. 6 An arbitrary point V i;j;k forms pixel windows on MLA-Ds represented asŴ (Ŵ 1 andŴ 2 on two detectors, respectively). One pixel window W r is chosen as reference window (correspondingly, the EI formed is represented as E r ). Some pixel windows inŴ possess high correlations with W r , while others do not (a). Supposing that there exists an optic ray (dashed line marked in red) between V i;j;k and the optical center C of the corresponding microlens forming E r , any point RðdÞ on the optic ray would form the same pixel window W r within E r . When Rðd Þ is on the surface, the formed pixel windows inŴ have high correlations with W r , as seen in concerns the regularity of labels among neighboring voxels (regularity cost). When two neighboring voxels have the same label, the regularity cost is zero, whereas two neighboring voxels with the different labels yield nonzero regularity cost values. The regularity cost in this case is set to be the value of the photo-consistency measure calculated by Eq. (3). Because regularity cost is valid across the boundary between the imaged object and its background, the total cost of this part can be seen as a surface integration of the photoconsistency measure. The surface reconstruction problem is transformed to extract an optimal surface S opt , over which the surface integral of the photo-consistency measure φðVÞ is minimized as well as the volume VðSÞ enclosed by S opt is maximized. This can be expressed as minimization of the following equation: 22 in which λ is a parameter to control the weight of volume integration (λ is set to be within 0.05 and 0.25, in this article). In this context, the first term in Eq. (4) is comprehended to be a collapsing force, whereas the second term of volume integration is comprehended to be an expansion force. The optimization problem in Eq. (4) is solved by the graph-cuts-based method using the implementation by Boykov et al. with expansion moves and swap moves. 29 Each voxel is treated as a node in a graph. Given the initial estimation of the imaged object as shown in Fig. 8, i.e., voxels inside of surface S in , the total cost would decrease as the nodes expand. This expansion would stop at the surface of the imaged object, i.e., the balance between the collapsing and ballooning forces (provided that any voxel on the surface possesses a much lower photo-consistency measure, and a good choice of λ as indicated above). The convergence and efficiency of this approach has been validated in previous studies. 32

Comparison with Back-Projection Method
For comparison, we also applied a previously presented back-projection method 14 to reconstruct the object's surface using data in one detector from multiview data. The reconstructed surfaces from 6, 12, 18, 24, 30, and 36 views, respectively, are evaluated with respect to the two Algorithm 1 Photo-consistency measure calculation.

Input:
Begin: ChooseÊ 1 within s rows of EI's from first detector and corresponding lens center information ChooseÊ 2 within t rows of EI's from second detector Begin Pixel windows-comparison: Reference pixel windows set:Ŵ r ¼ ½W l r ðl ¼ 1 · · · N a Þ Optical center of N a microlenses: Initialize M a ðV i;j;k Þ ¼ 0 for each E f inÊ  Fig. 8 Illustration of how the optimal surface is obtained between the estimated initial outer boundary S out and the inner boundary S in . Given the initial estimation of the imaged object, the nodes expand until they reach the surface. A minimal systematic cost is achieved when voxels on the surface possess a lower photo-consistency measure value.
methods, and an error rate η ¼ jV phantom L V recon j∕jV phantom j is derived, where V phantom represents the binary volume of the phantom, and V recon is the binary volume after surface reconstruction.
L means exclusive or operation between two binary volumes, and j · j is a l 1 -norm operation of binary data.

Results
PBRT simulations have been carried out for both setups as listed in Table 1 to generate multiview projection data. Two exemplary MLA-D simulations showing a single view according to the configurations are depicted in Fig. 9. The total computational burden, for example to generate 36 views, was about 4 h for setup 1 and 16 h for setup 2.
Volumetric surface reconstruction is performed on a 256 Ã 256 Ã 249 voxel grid with 0.13 mm between two neighboring voxels. Calculated slices of photo-consistency measures according to the 6 and 36 views are compared, as shown in Fig. 10.
For the same slice, the results from back-projection and the proposed method are compared in Fig. 11. The first row   When 36 projectional views are used, setup 1 yields an error rate of 14.18%, while setup 2 results in an error rate of 1.95%. A quantitative evaluation of error rate comparing the back-projection method and the proposed method is performed for setup 2, as shown in Fig. 12. The curves indicate that the proposed method shows a significant improvement of accuracy as compared with the back-projection method.
Results of rendered surfaces for setup 2 are shown in Fig. 13. Considerable errors in shape restoration for results of the back-projection method, such as particularly in the leg areas of the mouse, are clearly evident [cf. Fig. 13(c)]. In contrast, these unresolved structures are being better preserved by using the proposed method, as shown in Fig. 13(b).

Discussion and Conclusion
In the field of in vivo optical imaging, the back-projection method is a frequently used approach for surface reconstruction. This method not only requires a high number of projections (100 to 300 in general) to produce acceptable results, 9,10 but it also possesses inherent shortcomings, mainly regarding its inability to resolve concave areas. Surface reconstruction from multiview projection data using the proposed method has been demonstrated to resolve concave surface areas better. Furthermore, the required number of projections can be significantly reduced when employing MLA-Ds, since MLA-Ds provide additional angular information per detector position. The simulation results indicate that satisfying results can be achieved by 6 to 36 views of projection using the proposed new setup. In contrast to other surface detection approaches used within the research field of in vivo optical imaging, such as using structured light or employing a secondary structural imaging procedure, the approach as proposed in this article does not require any other additional hardware.
Because experimentally acquired data were not available as the final MLA-D construction is still to be completed, results are obtained using the PBRT simulation approach. However, as we run the simulation on highly realistic object data, experimentally acquired by x-ray CT of a nude mouse scan, relying on a simulation study does not degrade its value. By incorporating ray-tracing techniques tailored specifically to the optical layout of the MLA-Ds investigated, the PBRT framework has once more proven to be a stateof-the-art tool for PBRT in computer graphics and, in our context, 3-D in vivo imaging. In addition, we neglected noise in the process of MLA-D image generation mainly due to the fact that the images are acquired under controlled illumination with high signal-to-noise ratio in practice [cf. Fig. 1(b)].
As described in Ref. 22, reference pixel windows were compared with images taken with six closest cameras to calculate the NCC value, which further worked as photoconsistency measure. In contrast to conventional single-lens cameras, MLA-Ds provide multitude EIs covering the same area of interest from (slightly) different views. Hence, a greater number of EIs can be chosen as reference or comparing images and further analyzed to calculate similarity measures. Rather than using the NCC value directly, the proposed measure as described herein combines these inherent characteristics of MLA-Ds by calculating the rate of occasions when the NCC value reaches a maximum at d ¼ 0. Statistically it uses more information from the neighboring views. However, as RðdÞ changes along the virtual optic ray, pairwise comparison of pixel windows from multitude EIs yields very high computational complexity. The calculation of similarity between one pair of pixel windows is independent from the others. Hence, the calculation of photo-consistency could be decomposed into several subtasks and could be further accelerated using parallel computing techniques in the future.  Increasing the number of projectional views does improve the reconstruction results, as seen in Fig. 10; cf. also Ref. 22. The performance of the two setups for 3-D surface reconstruction differs significantly. Because pixel size of setup 2 is about seven times smaller than that of setup 1, setup 2 features a much higher space sampling rate than that of setup 1. Hence, higher-frequency information can be resolved which could potentially make pixel window comparison even more robust. Moreover, the EIs of setup 2 possess higher spatial resolution, such that a bigger pixel window could be used when the similarity measure is calculated. These two advantages are helpful in suppressing falsepositive cases in the presence of noise. 33 In conclusion, this article verifies the feasibility of reconstructing 3-D surfaces from multiview data solely by using MLA-D data within the research field of in vivo small animal imaging. We presented a new imaging instrumentation design with respect to the alignment of MLA-Ds and light sources. Given this conceptional imaging system setup, a corresponding algorithm for 3-D surface reconstruction is presented and investigated. The results indicate that the proposed method does have the ability to resolve concave areas and to achieve a high accuracy, even when using a significantly low number of projection views, without the requirement of any additional hardware.