Recently, a noncontact light detector utilizing a purposely designed microlens array (MLA-D) has been developed.1 The detector consisted of a MLA, a septum mask, and a photon sensor [cf. Fig. 1(a)]. In contrast to conventional lens-based imaging systems, an MLA-D possesses a thin construction size ( including cooling). Since its imaging performance is optimal for objects positioned close to its optics, a single or a multitude of MLA-Ds can be arranged circumferentially in close proximity to the imaged object, hence allowing for confined system assembly. Furthermore, the whole optical imaging system can potentially be encircled by another imaging system such as positron emission tomography or magnetic resonance imaging (MRI), in which MLA-Ds are compatible or nearly “invisible.”2
An MLA-D does not form an immediate observer image. Corresponding to each microlens, a local lattice of sensor pixels forms a so-called elemental image (EI), as shown in Fig. 1(b). Therefore, an image needs to be calculated by inverse mapping3 or an iterative algorithm, as depicted in Ref. 4. For noncontact optical in vivo imaging, the procedure for localizing emitted photons within the imaged object—with the purpose to further solve the inverse problem involved in image reconstruction5,6—is crucially affected by the exact knowledge of the three-dimensional (3-D) surface of the imaged object. As of today, surface information is mainly derived from secondary data as provided by (1) imaging the object with another modality such as x-ray computed tomography (CT)5 or MRI7 or (2) conducting a structured light 3-D scanning procedure.8 Alternatively, direct optical methods such as back-projection-based (silhouette or visual hull) based approaches have also been proposed.9–11 While the first class of methods requires additional hardware and scanning steps, the second class is generally considered less accurate, as concave areas cannot always be resolved.
As both classes of approaches operate independently from the light camera being used, they have been proposed in the literature (a survey is given in Ref. 12) as either conventional or multilens-based camera approaches such as the MLA-D employed here.13,14 For the latter systems though, it has been shown recently that 3-D imaging, including surface reconstruction, can be achieved by so-called integral imaging techniques.15,16
Using MLAs to record and recover light fields has become a recent research highlight in the fields of optics and computer vision.17–19 Shape reconstruction or depth retrieval using MLAs can be performed from a single projection by finding corresponding points among neighboring EIs or adjacent focal point images.15,20 Due to the limited number of microlenses and/or light trajectories, the reconstructed surface possesses a rather low-depth resolution and is insufficient for the downstream task of image reconstruction from optical data (tomography).
Considering that the MLA-D is either part of a multidetector imaging setup or is positioned or rotated around the imaged object, the aforementioned problem could be improved upon by combining multiple projections from multiview data (at unchanged illumination condition), which has been extensively studied in computer vision.21–23 However, the direct application of these algorithms in the MLA-D system is difficult for two main reasons. First, global illumination either by environmental light or by fixed distributed light sources is difficult to achieve due to the enclosed design.1 Second, the EIs of the MLA-Ds possess a comparatively low resolution so the visibility computation is difficult to perform.
Therefore, an integrated instrumentation setup for in vivo small animal (mice) imaging is presented by means of a simulation study employing MLA-Ds of various sizes as well as by a new light source setup for object illumination. Furthermore, a dedicated surface reconstruction algorithm is presented for this setup, which is based on a newly defined photo-consistency measure (the concept of photo-consistency measure can be referred in Ref. 22) and volumetric graph cuts. It is particularly tailored to the comparatively large number of EIs in our design, albeit each EI possesses a rather low-spatial resolution.
As depicted in Fig. 2, the binary sensor data of an MLA-D represents a two-dimensional (2-D) matrix with square pixel size , where and are the number of pixels in and directions on the photosensor plane, respectively. The arrangement of sensor-aligned microlenses with lens pitch is represented by . An elemental image is formed with local pixels corresponding to microlens , represented as , where and are the numbers of pixels in 2-D for each EI in and directions, respectively.
Currently, two types of MLA-D setups have been proposed,1,14 for which the parameters are summarized in Table 1, respectively. The first design has been assembled and is being used in practice, whereas the second is still under construction.
Microlens array-based light detector (MLA-D) specification for the simulation setups.
|Parameter||Setup 1||Setup 2|
|Focal length, (mm)||2.2||2.4|
|Lens pitch, (μm)||480||520|
|Half-cone angle, (deg)||6.2||6.2|
|Pixel size, (μm)||48||6.5|
|Elemental image (EI) resolution,|
|Detector size ()|
The simulation is performed within the physically based rendering (PBRT) framework, which includes interfaces for detector, lighting, and scenario (imaged object) descriptions.24 The configuration of the MLA-Ds is defined by the parameters in Table 1. As a result of the previously mentioned illumination source confinement, a local illumination setup as illustrated in Fig. 3 is selected. As can be seen, adjacent MLA-Ds share the same illumination. The whole assembly is also rotatable around the long axis of the imaged object to acquire the data at angular views. At each projectional view, two sensor data, and , are obtained from the detectors.
In order to define a realistic phantom representing an imaged object, a triangulated mesh extracted from a nude mouse CT dataset25 is adopted as input for the simulation, as shown in Fig. 4. The actual (identical) fields of view (FOVs) of the simulated MLA-Ds are marked by the red box. Once the surface data have been derived, its reflectance is set to be ideally diffusive (Lambertian surface).26,27 Considering the size of normal mice (approximately 25 mm in diameter) as well as the depth of field for the individual microlenses, the radius of rotation is set to be 40 mm. The oblique angle between the two detectors is set to be 40 deg in order to maintain the condition that any point on the surface of the imaged object is being seen on either detector (photo-consistency). Further details about the simulation can be found in Ref. 14. The simulation is carried out by employing a high-performance computer cluster with 128 CPU cores (Intel E5450, 3.0 GHz, 16 GB memory for each core) for parallel implementation of different microlens units.
Centrally important to the surface reconstruction procedure, a photo-consistency measure is proposed by comparing the formed EIs following the system shown in Fig. 3. By calculating and super-positioning all photo-consistency measures, a photo-consistency volume of the imaged object is obtained. The problem of surface reconstruction is applied onto the formed volume, concretely to decide whether any voxel belongs to the imaged object, also known as labeling problem.28 The labeling problem can be expressed as an energy minimization form and is further solved by the graph cuts-based method.29
To solve the problem of surface reconstruction from multiview stereo in computer vision, a number of approaches have been proposed for evaluating the visual compatibility of a reconstruction with a collection of input images.30 Most of these measures (often entitled photo-consistency measures) operate by comparing pixels in one image to pixels in other images to see how well they correlate.21 To define a suitable measure for the surface reconstruction employing the MLA-D configuration, notations are first introduced.
In the process of image formation, any point on the surface of the imaged object, which is within the FOV of either MLA-D, can potentially be detected by a number of microlenses. A ray originating from that forms a trajectory forward an EI is called a valid ray. A set of EIs containing all valid rays of a given is referred to as , as shown in Fig. 5. The intensities as collected by the sensor of the formed trajectories tend to be similar since the corresponding sensor pixels in measure the same point in space, only from slightly different angles. Instead of comparing the absolute differences among the collected trajectories directly, a more robust method is to compare the similarities between two pixel windows since area matching (pixel window) compares regularly sized regions of pixel values in two images and, hence, is well suited for a texture scene.31 For example, two arbitrary trajectories and from can be extended to form corresponding pixel windows and , respectively (for setup 1, a window is used, while for setup 2, a window is used). The collection of pixel windows is represented as . Following common strategies assessing similarities between data formed by two pixel windows, the normalized cross-correlation (NCC) is employed in this article as a similarity measure. It is defined as31
In surface reconstruction procedure, a volume model is adopted to represent the surface for its simplicity and uniformity.21 A reasonable simplification can be made, though. Any voxel (the concept “voxel” here refers to a point in a spatial grid) can form trajectories onto the photon sensor plane assuming that there would be no self-occlusion caused by other voxels.22 Using the proposed design as illustrated in Fig. 3, a voxel can be seen through and microlenses in the two MLA-Ds simultaneously. Hence, , , as well as are formed by two subsets: and ; and ; and and , as shown in Fig. 6.
Because of the specific design and configuration of MLA-Ds, the problem of defining a photo-consistency measure is extended from pair comparison between pixel windows, as seen in Eq. (1), to formalize a new measure over two subsets of EIs ( and ) and the corresponding pixel windows ( and ). A simple method is to choose one reference pixel window within the elemental image from and to compare it with the other formed pixel windows from the same subset , one by one. For an arbitrary , it might occur that some of the pixel windows possess a high degree of correlation with , while some other pixel windows possess a low degree of correlation, as seen in Fig. 6(a). Although the direct average of all NCC values could be applied, the resolvability among adjacent voxels might be hampered.22
Therefore, an alternative method is used. The assumption is made that there exists an optic ray between and the optical center of a chosen microlens, as depicted in Fig. 6(b). Correspondingly, the formed pixel window of this microlens is chosen as reference pixel window . As depicted in Fig. 6(b), an arbitrary point on the supposed optic ray defined by and could be expressed according toFig. 6(b).
However, in any other EI, e.g., a fixed from that can detect , would form different pixel windows as compared with the pixel window formed by within . The newly formed pixel window series by is represented as as changes. When comparing with the formed set , a maximum NCC value can be found with respect to . If is on the surface (or very close to the surface of the object), the degree of correlation becomes a maximum at , as seen in Fig. 7. Contrarily, if is not on the surface, the degree of correlation is lower at or becomes a maximum at , as shown in Fig. 6(b). In other words, the maximum value of NCC at can be used as a strong indicator for determining whether is a point on the surface or not.
Having discussed the case of mapping with one microlens, we extend this principle to combine with multiple microlenses. Supposing microlenses are chosen in the -th projection, the number of occurrences, , in which a maximum NCC value is derived at , would be counted. The ratio can be used to describe the degree whether is on the surface concerning the normalization effect.
In order to reduce the computational complexity, the following strategies are employed. First, and are chosen from two different detectors. Reasoning is due to the fact that the MLA-Ds configuration is symmetric, as seen in Fig. 3. If there is a case that microlenses are chosen from the first MLA-D detector while occurrences with maximum NCC are obtained on the second MLA-D, equivalently, then it could occur that microlenses are chosen from the second MLA-D detector while occurrences are obtained on the first MLA-D. Second, only EIs within limited rows () are chosen for , and rows of EIs () are chosen for ( and are set as 3, in this article). The photo-consistency measure is defined as1.
Photo-consistency measure calculation.
|for each , in|
|Choose within rows of EI’s from first detector and corresponding lens center information|
|Choose within rows of EI’s from second detector|
|is visible through microlenses in|
|Reference pixel windows set:|
|Optical center of microlenses:|
|is visible through microlenses in|
|for each in|
|for each in|
|Calculate formed pixel window within by|
|Calculate NCC value between and|
|end loop for|
|if NCC between and reaches maximum at|
|end loop for|
|end loop for|
|end Pixel window-comparison:|
|Choose within rows of EI’s from second detector and corresponding lens center information|
|Choose within rows of EI’s from first detector|
|Do Pixel window-comparison again to obtain and|
|end loop for ,|
Volumetric surface reconstruction based on graph cuts
By calculating the proposed photo-consistency for each voxel employing Eq. (3), a volume of the imaged object is derived. The surface reconstruction task in this context is to assign a label to each voxel of the image object, i.e., a voxel either belongs to the background or to the imaged object. In accordance with labeling a voxel, an energy cost is paid. The objective is to find a discrete labeling of voxels with minimum systematic total energy, known as labeling problem.28 In our context of a binary labeling problem (only two labels for the background and the imaged object), two energy parts are included. One part of cost concerns the label itself (data cost), i.e., the volume of imaged object in this context. This part of total cost can be seen as a volume integration of the imaged object. The other part of cost concerns the regularity of labels among neighboring voxels (regularity cost). When two neighboring voxels have the same label, the regularity cost is zero, whereas two neighboring voxels with the different labels yield nonzero regularity cost values. The regularity cost in this case is set to be the value of the photo-consistency measure calculated by Eq. (3). Because regularity cost is valid across the boundary between the imaged object and its background, the total cost of this part can be seen as a surface integration of the photo-consistency measure. The surface reconstruction problem is transformed to extract an optimal surface , over which the surface integral of the photo-consistency measure is minimized as well as the volume enclosed by is maximized. This can be expressed as minimization of the following equation:224) is comprehended to be a collapsing force, whereas the second term of volume integration is comprehended to be an expansion force.
The optimization problem in Eq. (4) is solved by the graph-cuts-based method using the implementation by Boykov et al. with expansion moves and swap moves.29 Each voxel is treated as a node in a graph. Given the initial estimation of the imaged object as shown in Fig. 8, i.e., voxels inside of surface , the total cost would decrease as the nodes expand. This expansion would stop at the surface of the imaged object, i.e., the balance between the collapsing and ballooning forces (provided that any voxel on the surface possesses a much lower photo-consistency measure, and a good choice of as indicated above). The convergence and efficiency of this approach has been validated in previous studies.32
Comparison with Back-Projection Method
For comparison, we also applied a previously presented back-projection method14 to reconstruct the object’s surface using data in one detector from multiview data. The reconstructed surfaces from 6, 12, 18, 24, 30, and 36 views, respectively, are evaluated with respect to the two methods, and an error rate is derived, where represents the binary volume of the phantom, and is the binary volume after surface reconstruction. means exclusive or operation between two binary volumes, and is a -norm operation of binary data.
PBRT simulations have been carried out for both setups as listed in Table 1 to generate multiview projection data. Two exemplary MLA-D simulations showing a single view according to the configurations are depicted in Fig. 9. The total computational burden, for example to generate 36 views, was about 4 h for setup 1 and 16 h for setup 2.
Volumetric surface reconstruction is performed on a voxel grid with between two neighboring voxels. Calculated slices of photo-consistency measures according to the 6 and 36 views are compared, as shown in Fig. 10.
For the same slice, the results from back-projection and the proposed method are compared in Fig. 11. The first row shows reconstructed slices for setup 1 employing 6 and 36 views of data by the back-projection method [Figs. 11(a) and 11(b), respectively] and by the proposed method [Figs. 11(c) and 11(d)]. The second row of Fig. 11 shows the results using setup 2 following the same order as the first row of figures.
When 36 projectional views are used, setup 1 yields an error rate of 14.18%, while setup 2 results in an error rate of 1.95%. A quantitative evaluation of error rate comparing the back-projection method and the proposed method is performed for setup 2, as shown in Fig. 12. The curves indicate that the proposed method shows a significant improvement of accuracy as compared with the back-projection method.
Results of rendered surfaces for setup 2 are shown in Fig. 13. Considerable errors in shape restoration for results of the back-projection method, such as particularly in the leg areas of the mouse, are clearly evident [cf. Fig. 13(c)]. In contrast, these unresolved structures are being better preserved by using the proposed method, as shown in Fig. 13(b).
Discussion and Conclusion
In the field of in vivo optical imaging, the back-projection method is a frequently used approach for surface reconstruction. This method not only requires a high number of projections (100 to 300 in general) to produce acceptable results,9,10 but it also possesses inherent shortcomings, mainly regarding its inability to resolve concave areas. Surface reconstruction from multiview projection data using the proposed method has been demonstrated to resolve concave surface areas better. Furthermore, the required number of projections can be significantly reduced when employing MLA-Ds, since MLA-Ds provide additional angular information per detector position. The simulation results indicate that satisfying results can be achieved by 6 to 36 views of projection using the proposed new setup. In contrast to other surface detection approaches used within the research field of in vivo optical imaging, such as using structured light or employing a secondary structural imaging procedure, the approach as proposed in this article does not require any other additional hardware.
Because experimentally acquired data were not available as the final MLA-D construction is still to be completed, results are obtained using the PBRT simulation approach. However, as we run the simulation on highly realistic object data, experimentally acquired by x-ray CT of a nude mouse scan, relying on a simulation study does not degrade its value. By incorporating ray-tracing techniques tailored specifically to the optical layout of the MLA-Ds investigated, the PBRT framework has once more proven to be a state-of-the-art tool for PBRT in computer graphics and, in our context, 3-D in vivo imaging. In addition, we neglected noise in the process of MLA-D image generation mainly due to the fact that the images are acquired under controlled illumination with high signal-to-noise ratio in practice [cf. Fig. 1(b)].
As described in Ref. 22, reference pixel windows were compared with images taken with six closest cameras to calculate the NCC value, which further worked as photo-consistency measure. In contrast to conventional single-lens cameras, MLA-Ds provide multitude EIs covering the same area of interest from (slightly) different views. Hence, a greater number of EIs can be chosen as reference or comparing images and further analyzed to calculate similarity measures. Rather than using the NCC value directly, the proposed measure as described herein combines these inherent characteristics of MLA-Ds by calculating the rate of occasions when the NCC value reaches a maximum at . Statistically it uses more information from the neighboring views. However, as changes along the virtual optic ray, pairwise comparison of pixel windows from multitude EIs yields very high computational complexity. The calculation of similarity between one pair of pixel windows is independent from the others. Hence, the calculation of photo-consistency could be decomposed into several subtasks and could be further accelerated using parallel computing techniques in the future.
Increasing the number of projectional views does improve the reconstruction results, as seen in Fig. 10; cf. also Ref. 22. The performance of the two setups for 3-D surface reconstruction differs significantly. Because pixel size of setup 2 is about seven times smaller than that of setup 1, setup 2 features a much higher space sampling rate than that of setup 1. Hence, higher-frequency information can be resolved which could potentially make pixel window comparison even more robust. Moreover, the EIs of setup 2 possess higher spatial resolution, such that a bigger pixel window could be used when the similarity measure is calculated. These two advantages are helpful in suppressing false-positive cases in the presence of noise.33
In conclusion, this article verifies the feasibility of reconstructing 3-D surfaces from multiview data solely by using MLA-D data within the research field of in vivo small animal imaging. We presented a new imaging instrumentation design with respect to the alignment of MLA-Ds and light sources. Given this conceptional imaging system setup, a corresponding algorithm for 3-D surface reconstruction is presented and investigated. The results indicate that the proposed method does have the ability to resolve concave areas and to achieve a high accuracy, even when using a significantly low number of projection views, without the requirement of any additional hardware.
The first author gratefully acknowledges the financial support from the China Scholarship Council (CSC).