Long-range depth profiling of camouflaged targets using single-photon detection

Abstract. We investigate the reconstruction of depth and intensity profiles from data acquired using a custom-designed time-of-flight scanning transceiver based on the time-correlated single-photon counting technique. The system had an operational wavelength of 1550 nm and used a Peltier-cooled InGaAs/InP single-photon avalanche diode detector. Measurements were made of human figures, in plain view and obscured by camouflage netting, from a stand-off distance of 230 m in daylight using only submilliwatt average optical powers. These measurements were analyzed using a pixelwise cross correlation approach and compared to analysis using a bespoke algorithm designed for the restoration of multilayered three-dimensional light detection and ranging images. This algorithm is based on the optimization of a convex cost function composed of a data fidelity term and regularization terms, and the results obtained show that it achieves significant improvements in image quality for multidepth scenarios and for reduced acquisition times.

Introduction
Light detection and ranging (LiDAR) continues to be the technique of choice in a variety of remote sensing applications [1]. The time-correlated single-photon counting (TCSPC) technique has more recently emerged as a candidate technology for LiDAR, due to its high sensitivity and excellent surface-to-surface resolution [2]. The TCSPC approach has been successfully demonstrated in a number of LiDAR applications, such as long-range depth imaging [3-5], underwater depth imaging [6,7], and multispectral depth imaging [8]. The TCSPC technique is used to obtain both depth and intensity information for each pixel, building up a three-dimensional (3-D) image of the target scene by using scanning or multiple detector arrays [9]. The use of high-sensitivity single-photon detectors, such as InGaAs/InP and Si single-photon avalanche diode (SPAD) detectors [10-12], means that low average optical power levels can be used even at long distances, resulting in the potential for low-power eye-safe imaging.
The identification of targets that have been obscured by clutter is a subject of significant relevance, particularly for long-range field applications. Several experiments involving "seeing" behind or through various obscuring media have been previously performed using LiDAR systems [13-17]. References 13 and 14 present examples of 3-D laser radar imaging using a range-gated approach that can provide high-resolution gated images using very few laser pulses. This approach requires high-energy laser pulses (typically μJ) and does not give a full surface profile of the target, but instead provides a range-gated intensity image. Henriksson et al. [15] presented a scanning TCSPC system that was successful in imaging targets through foliage at a distance of ∼300 m. The slow scan speed of the system meant that, in the example demonstrated, the acquisition time was 30 min for a 5 deg × 1 deg scene. Also, their algorithm for depth estimation did not take into account spatial correlations between neighboring pixels, resulting in many of the pixels providing no depth information due to a lack of returned photons. Some previous work has been performed on the image processing of targets behind obscuring surfaces and media; for example, Refs. 16 and 17 described bespoke image processing algorithms designed for TCSPC data that reconstructed depth and intensity profiles. Wallace et al. [16] presented an algorithm based on a reversible jump Markov chain Monte Carlo technique, which successfully reconstructed the depth profile of an object behind a wooden trellis fence at a stand-off distance of 325 m. While this approach works well and provides good depth information, the algorithm required significant processing times. The algorithm presented by Shin et al. [17] was used to demonstrate high-resolution depth estimations for multiple surfaces with a low number of returned photons. The approach had relatively short processing times; however, the data were obtained in a laboratory-based trial at short target distances (around 4 m) and did not demonstrate the effects of high levels of ambient lighting or solar background.
In this paper, we explore the challenges of obtaining high-resolution depth images of objects obscured by camouflage netting using low laser powers at stand-off distances of hundreds of meters and in the presence of high ambient light levels. This paper also presents an advanced image processing algorithm specifically designed to reconstruct depth and intensity profiles of objects hidden in clutter or behind obscuring media. The new algorithm exploits spatial correlations in the photon data and was designed to be robust when used in the sparse photon regime under high levels of ambient background light. It relies on two main assumptions: (i) the observed target exhibits spatial correlations that can be exploited using a total variation (TV) approach and (ii) a small number of depths are active with respect to the observation range window. We describe the reconstruction of 3-D depth and intensity profiles from data acquired using a custom-designed time-of-flight (ToF) scanning transceiver based on the TCSPC technique. This active imaging system was successfully used to reconstruct depth and intensity profiles of human figures standing outdoors, both in plain view and obscured by camouflage netting, from a stand-off distance of 230 m in bright daylight. The system had an operational wavelength of 1550 nm and the illumination beam exiting the system had an average optical power of just under 1 mW. The wavelength of 1550 nm was selected for its high atmospheric transmission and because the adverse effect of solar background at this wavelength is significantly lower than at wavelengths below 1 μm [18-20]. Also, as this wavelength is outside the retinal hazard region, it permits, if circumstances require, the use of higher average optical powers in comparison to wavelengths in the visible and near-infrared regions of the spectrum.
Section 2 details the experimental setup used for these measurements, describing the transceiver and outlining the key system parameters. Section 3 describes the construction of the depth and intensity images from the acquired data using a pixelwise cross correlation algorithm. Section 4 shows results obtained using cross correlation for both an unobscured target and a target hidden behind camouflage netting. The proposed algorithm for restoration and noise reduction of both depth and intensity profiles is presented in Sec. 5, where it is applied to the data from Sec. 4 to permit a comparison with the pixelwise approach outlined in Sec. 3. The conclusions of this work are presented in Sec. 6.

Experimental Setup
A schematic of the single-photon depth imaging system setup is shown in Fig. 1. The system used the TCSPC technique, where the time difference is recorded between a laser pulse being emitted and the occurrence of a photon event. Typically, these measurements are recorded over many laser pulses for a given pixel, revealing the ToF, and hence the depth, of the target. The pulsed illumination was provided by a broadband supercontinuum laser source (SuperK EXTREME EXW-12, NKT Photonics) used in conjunction with a series of high-performance spectral filters. The filters used were a longpass filter with a cut-on wavelength of 1500 nm (LP1 as shown in Fig. 1), a shortpass filter (SP) with a cut-off wavelength of 1800 nm, and a 1550-nm bandpass filter (BP1) with a 10-nm full-width half-maximum (FWHM) passband. This combination of filters allowed a narrow band of light centered on 1550 nm to be transmitted, while providing good out-of-band rejection. The supercontinuum laser provided the electrical synchronous start signal for the TCSPC module (HydraHarp 400, PicoQuant). The pulse duration of the laser was less than 100 ps and the repetition rate was 19.5 MHz. The laser was coupled to the transmit channel of the transceiver unit via a 10-μm-diameter core optical fiber. The custom-built transceiver unit employed for these measurements was configured with optical components for operation at a wavelength of 1550 nm, with a layout similar to those used in our previous work reported in Refs. 3 and 21.
The transmit and receive channels of the transceiver unit were coaxial, resulting in a monostatic, parallax-free system, which allowed for operation over a wide range of target distances without the need for realignment. A polarizing beam splitter (PBS) was used to demultiplex the return signal from the common channel. Two galvanometer mirrors (GM1 and GM2 in Fig. 1) were used to raster scan the beam in X and Y across the scene. An objective lens with an aperture of 80 mm diameter and an effective focal length of 500 mm was used both to focus the outgoing light onto the target and to collect photons scattered back from the target. The collected return photons were routed to the receive channel and then coupled to the detector using a 10-μm-diameter core armored optical fiber.
Fig. 1 Schematic of the single-photon depth imaging system that was operated at a wavelength of 1550 nm. It comprises a custom-built transceiver unit, a supercontinuum laser source, a TCSPC module, and an InGaAs/InP SPAD detector. Optical components include: polarizing beam splitter, PBS; fiber collimation packages, FC1, FC2, FCR, and FCT; scanning galvanometer mirrors, GM1 and GM2; relay lenses, R1, R2, R3; objective lens, OBJ; longpass filters, LP1 and LP2; shortpass filter, SP; bandpass filters, BP1 and BP2.
An electrically gated InGaAs/InP SPAD detector module (Micro Photon Devices) was used in these measurements, which had an operating wavelength range of 900 to 1700 nm and an active-area diameter of 25 μm. The detector was set with a 5-V excess bias and had a single-photon detection efficiency of ∼30% at the operational wavelength [22]. Due to the monostatic configuration of the system, backreflections from the optical components within the transceiver unit could saturate the sensitive optical detection system. Hence, the detector was operated in an electrically gated mode in synchronization with the pulsed laser return, with the detector gate positioned to avoid these spurious backreflections. For the measurements described in this paper, a 14-ns detector gate duration was used. Afterpulsing can also present difficulties when using InGaAs/InP SPAD detectors, causing increased background levels. Afterpulsing is caused by charge carriers being trapped in defects and subsequently released, causing spurious avalanches [23,24]. To reduce the deleterious effects of detector afterpulsing, a hold-off time was used to deactivate the detector for a predetermined duration after a recorded event, allowing the traps to empty without triggering further avalanches. In the measurements described in this paper, a detector hold-off time of 40 μs was selected as a compromise between reducing the effects of afterpulsing and restricting the maximum possible count rate. More detailed descriptions of the electronic gating approach used for this detector are provided in Refs. 21 and 22. To reduce the effects of solar background, the receive channel was also spectrally filtered using a longpass filter (LP2) with a cut-on wavelength of 1500 nm, and a 10-nm FWHM bandpass filter (BP2), as shown in Fig. 1. The detector module provided the electrical stop signal for the TCSPC module, which was configured to output time-tagged detection events. The time-tagged event information was transferred to the control computer via a USB connection.
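The timing figures quoted for the detector can be related to one another with a few lines of arithmetic. The sketch below is only an illustration: it assumes one 14-ns gate per laser period and treats the reciprocal of the hold-off time as an upper bound on the count rate, ignoring the duration of the avalanche itself. The function names are hypothetical and are not taken from the system software.

```python
# Timing parameters quoted in the text.
REP_RATE_HZ = 19.5e6   # laser repetition rate
GATE_S = 14e-9         # detector gate duration
HOLD_OFF_S = 40e-6     # detector hold-off time


def laser_period(rep_rate_hz):
    """Time between consecutive laser pulses, in seconds."""
    return 1.0 / rep_rate_hz


def gate_duty_cycle(gate_s, rep_rate_hz):
    """Fraction of each laser period during which the detector is armed.

    Assumes exactly one gate per laser period (an assumption of this sketch).
    """
    return gate_s * rep_rate_hz


def max_count_rate(hold_off_s):
    """Upper bound on the detection rate imposed by the hold-off time alone."""
    return 1.0 / hold_off_s


def periods_blanked(hold_off_s, rep_rate_hz):
    """Number of laser periods skipped after each recorded event."""
    return hold_off_s * rep_rate_hz


if __name__ == "__main__":
    print(f"laser period      : {laser_period(REP_RATE_HZ) * 1e9:.2f} ns")
    print(f"gate duty cycle   : {gate_duty_cycle(GATE_S, REP_RATE_HZ):.3f}")
    print(f"max count rate    : {max_count_rate(HOLD_OFF_S):.0f} counts/s")
    print(f"periods blanked   : {periods_blanked(HOLD_OFF_S, REP_RATE_HZ):.0f}")
```

Under these assumptions, each recorded event blanks the detector for several hundred laser periods, which illustrates why the hold-off time is a compromise between afterpulsing suppression and the achievable count rate.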

Estimation of Depth and Intensity Images Using Cross Correlation
For each pixel, the time-tagged photons were used to construct timing histograms of the ToF information using 2 ps timing bins. Depth information was extracted from these histograms using a cross correlation method, described previously in Refs. 6 and 21. For each pixel, a cross correlation, c, was performed between an instrumental response, R, and the measured histogram, y:
$$c_t = \sum_{i=1}^{T} y_{t+i} R_i, \qquad (1)$$
where $y_t$ is the timing histogram value at the t'th bin, $R_i$ is the instrumental response value at the i'th bin, and T is the total number of timing bins. The timing position corresponding to the highest peak in the cross correlation was calculated for each pixel to provide target depth information.
For each pixel, the number of photons in a range of 200 timing bins around the centroid location was summed to obtain an estimate of the intensity (or reflectivity) of the target. For these measurements, the instrumental response function R was obtained by performing a single-point measurement of a uniform, flat surface, which was placed in the same nominal plane as the target position. An example of the instrumental response function for the measurements presented in this paper is shown in Fig. 2. Contributions to the timing jitter originate from the detector response, laser pulse duration, and other electronic components such as the TCSPC module. In this case, the overall system jitter was 226 ps (FWHM), with the largest contribution being the detector jitter.
Typically, targets with a single reflecting surface will result in one peak per histogram, which corresponds to the target position (not including any peaks arising from backreflections as previously discussed). This means that pixelwise cross correlation can give satisfactory results since there is only one distinct target return (it is worth noting that the presence of background noise can affect the accuracy of these estimates). However, for targets behind camouflage netting or in obscuring media, the histograms may include multiple peaks, with the largest peak not necessarily corresponding to the target position. In this case, the cross correlation will assign a single depth point to only the largest return peak.
Fig. 3 Example of an aggregated timing histogram of a measurement of a target placed ∼1 m behind camouflage netting. In this aggregated histogram, data from all 12,800 pixels in the image are summed and displayed in a single histogram. In the figure, the larger peak consists of the returns from the camouflage netting and the smaller peak represents the returns from the target. The zero point in the depth axis is chosen arbitrarily; the camouflage netting was ∼230 m from the transceiver.
Optical Engineering 031303-3 March 2018 • Vol. 57 (3) Tobin et al.: Long-range depth profiling of camouflaged targets using single-photon detection
To illustrate this, Fig. 3 shows an aggregated timing histogram constructed from an 80 × 160 pixel measurement of a target located ∼1 m behind a camouflage net. This histogram is the sum of the histograms from all 12,800 pixels in the depth image. The figure clearly shows that the return from the camouflage netting is considerably greater than the return from the target placed behind the camouflage. Preliminary results show that more advanced image processing algorithms, designed for multisurface targets that exploit spatial correlations between neighboring pixels, can be used to reduce noise and improve image quality, as described in Sec. 5.
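The pixelwise estimate described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the processing code used for the measurements: it assumes zero padding beyond the end of the histogram, uses zero-based lags, and centres the 200-bin intensity window on the estimated peak position (the lag of the correlation maximum plus the offset of the peak within the instrumental response). The function name `estimate_pixel` and its arguments are hypothetical.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def estimate_pixel(hist, irf, bin_width_s=2e-12, window_bins=200):
    """Estimate relative depth and intensity for one pixel.

    hist : timing histogram of the pixel (photon counts per bin)
    irf  : instrumental response, sampled with the same bin width
    Returns (depth_m, intensity, peak_bin); depth is relative to bin 0.
    """
    hist = np.asarray(hist, dtype=float)
    irf = np.asarray(irf, dtype=float)

    # c[t] = sum_i hist[t + i] * irf[i], for lags t = 0 .. len(hist) - 1.
    c = np.correlate(hist, irf, mode="full")[irf.size - 1:]

    # Lag of the highest correlation peak, shifted to the IRF peak position.
    lag = int(np.argmax(c))
    peak_bin = lag + int(np.argmax(irf))

    # Intensity: photons summed in a window around the estimated peak.
    lo = max(peak_bin - window_bins // 2, 0)
    hi = min(peak_bin + window_bins // 2, hist.size)
    intensity = float(hist[lo:hi].sum())

    # Round-trip time to one-way distance.
    depth_m = 0.5 * C_M_PER_S * peak_bin * bin_width_s
    return depth_m, intensity, peak_bin


if __name__ == "__main__":
    # Synthetic check: a Gaussian response placed at a known position.
    bins = np.arange(101)
    irf = np.exp(-0.5 * ((bins - 50) / 10.0) ** 2)
    hist = np.zeros(2000)
    hist[700:801] += 10.0 * irf
    print(estimate_pixel(hist, irf))
```

Because only the single highest correlation peak is kept, a histogram containing both a netting return and a weaker target return yields the netting depth, which is the limitation discussed above.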

Depth Imaging Using the Cross Correlation Approach
A series of measurements were performed in daylight at a stand-off distance of 230 m from the transceiver unit. The weather was dry, with bright daylight and overcast cloud coverage, with conditions remaining stable for the duration of the measurements. This section presents preliminary results from these trials with depth and intensity estimation of targets obtained using the pixelwise cross correlation algorithm discussed in the previous section. The target scene comprised an actor holding one item in different positions. The first set of measurements was performed with an unobstructed view of the actors and the second set was performed with a double layer of commercially available camouflage netting placed ∼1 m in front of the actors' standing position (see Fig. 4).
The first set of targets was imaged at a stand-off distance of 230 m, unobscured by camouflage. The scanned area (1 m × 2 m) was mapped by 80 × 160 pixels (X × Y). This was equivalent to a pixel-to-pixel pitch of 12.5 mm in both X and Y at the target plane. The focused beam diameter at the target was ∼1 cm, meaning that there was little or no overlap between adjacent pixels for each scan position. A per-pixel acquisition time of ∼3.2 ms was used, which gave a total image scan time of 41.0 s. An average optical power level of just less than 1 mW at the target was used at a laser repetition rate of 19.5 MHz. Figure 5 shows the results from two measurement scenarios: the first scenario consisted of an actor holding a rocket-propelled grenade (RPG) across his chest; the second scenario is a different actor holding a plank of wood in the same position. Both intensity and depth profiles were obtained using pixelwise cross correlation. In the results shown in Fig. 5, a threshold has been applied to the data to exclude pixels with very low levels of photon returns, since they were unlikely to originate from target returns. The corresponding pixels in the depth profile were subsequently excluded.
Due to the inherent problem of range ambiguity in high repetition rate ToF systems [25], the depth range was taken from an arbitrary zero point. In a fixed repetition rate LiDAR system, range ambiguity occurs when there is more than one possible position for a reflecting surface, which happens when, instantaneously, there is more than one optical pulse in transit. The maximum unambiguous distance ($d_{\mathrm{rep}}$) is dependent on the fixed repetition rate ($f_{\mathrm{rep}}$) of the laser as
$$d_{\mathrm{rep}} = \frac{c}{2 f_{\mathrm{rep}}}, \qquad (2)$$
where c is the speed of light in a vacuum. Given that a laser repetition rate of 19.5 MHz was used in these measurements, the maximum range for unambiguous determination of target distance was only ∼7.7 m. Range ambiguity can be removed by a reduction in repetition rate (which can significantly increase measurement time), by using techniques such as laser pulse trains composed of pseudorandom patterns, or by the use of multiple sequential repetition rates [26,27]. Some of the background noise exhibited around the target in Fig. 5 originates from photon returns from foliage far behind the target illuminated by earlier laser pulses. Differences in the material reflectivity and dimensions of both the RPG and the wooden plank are evident in the intensity and depth profiles, making the two objects easily discernible in this example. The number of photon returns is dependent on a variety of factors such as acquisition time, optical power level, and the reflectivity of the target material at the illumination wavelength [21,26,28]. It is evident from these images that at λ = 1550 nm, the clothes of the actors yielded a significant quantity of photon returns, whereas the gun handle and the actors' dark eye-wear yielded considerably fewer photon returns. Low photon returns from the face and hands demonstrate the low reflectivity of human skin at λ = 1550 nm, as previously shown in Ref. 29. In Fig. 5, the overall depth range of the image is ∼0.5 m, and the depth profile appears to show subcentimeter depth features for most of the target. The long acquisition time of the entire scan (41.0 s) used for these measurements was chosen in order to acquire an image with far more photon returns than required. In these measurements, both the macrotime (time from start of scan) and the microtime (time between the corresponding start signal and the recorded photon arrival time) were recorded for each detection event. This meant that we could emulate per-pixel acquisition times shorter than the original measurement by using shorter-duration sections of each pixel's full measurement data. The resulting depth profiles, for the scenario shown in Fig. 5(f), for per-pixel acquisition times of 3.2, 1.0, 0.5, and 0.1 ms, which correspond to image acquisition times of 41.0, 12.8, 6.4, and 1.3 s, respectively, are shown in Fig. 6.
As seen in Fig. 6, the quality of the depth profile degrades with decreasing acquisition time as the number of photons arriving back from the target decreases.
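The reuse of one long measurement to emulate shorter acquisitions can be sketched as follows. The data layout is an assumption made for illustration: each detection event is taken to carry a pixel index, a macrotime in seconds from the start of the scan, and a microtime expressed as a histogram bin index, and the scan is assumed to dwell on pixels in index order with no dead time between them. The function names are hypothetical.

```python
import numpy as np


def truncate_events(pixel_idx, macrotime_s, microtime_bin,
                    full_dwell_s, short_dwell_s):
    """Keep only events recorded in the first short_dwell_s of each pixel dwell."""
    pixel_idx = np.asarray(pixel_idx)
    macrotime_s = np.asarray(macrotime_s, dtype=float)
    microtime_bin = np.asarray(microtime_bin)

    # Start time of each event's pixel, assuming back-to-back dwells.
    pixel_start_s = pixel_idx * full_dwell_s
    keep = (macrotime_s - pixel_start_s) < short_dwell_s
    return pixel_idx[keep], microtime_bin[keep]


def build_histograms(pixel_idx, microtime_bin, n_pixels, n_bins):
    """Accumulate events into an (n_pixels, n_bins) array of photon counts."""
    hist = np.zeros((n_pixels, n_bins), dtype=np.int64)
    np.add.at(hist, (pixel_idx, microtime_bin), 1)  # handles repeated indices
    return hist


def image_time_s(n_pixels, dwell_s):
    """Total scan time for a given per-pixel dwell, ignoring scanner overheads."""
    return n_pixels * dwell_s


if __name__ == "__main__":
    n_pixels = 80 * 160
    for dwell in (3.2e-3, 1.0e-3, 0.5e-3, 0.1e-3):
        print(f"{dwell * 1e3:.1f} ms/pixel -> {image_time_s(n_pixels, dwell):.2f} s")
```

Truncating in this way keeps the scene, background, and alignment identical across the compared acquisition times, so differences between the resulting depth profiles can be attributed to photon count alone.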
Using the same experimental parameters as used for the unobstructed scenarios, a series of measurements were performed with the target obscured by camouflage netting. The target scene consisted of an actor holding the object of interest (in this case a wooden plank held across the chest) ∼1 m behind two layers of commercially available camouflage netting (see Fig. 4). Figure 7 shows the results of this scenario processed using the pixelwise cross correlation approach.
As can be seen in Fig. 7(b), when processed using pixelwise cross correlation, the intensity map only shows the camouflage netting due to its significantly higher returns. Note that the netting moved slightly throughout the entire 41.0 s measurement duration due to a slight breeze. The depth map [Fig. 7(c)] shows a limited amount of detail from the obscured target where light has propagated through gaps in the camouflage net. In Fig. 7(c), the camouflage netting can be observed at the front of the depth profile (colored in blue) at a distance of 0.5 m from the reference point, whereas small regions of the target can be seen at a depth of ∼1.5 m, a distance of 1 m behind the camouflage netting. Therefore, in order to more fully profile the target behind the camouflage, data were selected from only the 1900 timing bins that correspond to a 0.6-m depth range centered around the target. The results [shown in Figs. 7(d) and 7(e)] demonstrate that even behind a double layer of camouflage, our approach can provide depth and intensity reconstruction with approximately centimeter resolution. Such high-quality depth and intensity profiling allows the targets to be identified in these examples. The "missing" pixels in the depth profile shown in Fig. 7(e) are where there were insufficient photon returns to provide depth estimations. In order to improve the quality of these depth and intensity images and permit use with low photon returns, a bespoke image processing algorithm was developed and will be described in the next section.
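The distances quoted in this section follow from the round-trip relation between time and range. The sketch below checks the maximum unambiguous range for the 19.5 MHz repetition rate and the depth span of a 1900-bin gate, and shows one way a histogram could be time gated around a chosen bin. It assumes a bin width of exactly 2 ps, and the helper names are hypothetical.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def max_unambiguous_range(rep_rate_hz):
    """Maximum unambiguous distance c / (2 f_rep), in meters."""
    return C_M_PER_S / (2.0 * rep_rate_hz)


def bins_to_depth(n_bins, bin_width_s=2e-12):
    """One-way depth span covered by n_bins timing bins, in meters."""
    return 0.5 * C_M_PER_S * n_bins * bin_width_s


def time_gate(hist, centre_bin, n_bins):
    """Select n_bins bins around centre_bin along the last axis.

    Returns the gated histogram and the index of its first bin, so that
    positions found inside the gate can be mapped back to the full histogram.
    """
    hist = np.asarray(hist)
    lo = max(centre_bin - n_bins // 2, 0)
    hi = min(lo + n_bins, hist.shape[-1])
    return hist[..., lo:hi], lo


if __name__ == "__main__":
    print(f"unambiguous range : {max_unambiguous_range(19.5e6):.2f} m")
    print(f"1900-bin gate     : {bins_to_depth(1900):.2f} m")
```

With a nominal 2 ps bin, 1900 bins span roughly 0.57 m, in line with the 0.6-m depth range quoted for the gate around the target.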

Restoration of Depth and Intensity Images Using an Algorithm Based on a Total Variation Approach
As highlighted in the results presented in Fig. 6, imaging a cluttered target (or reducing the acquisition time) can result in a large proportion of pixels being either empty or containing very few photons. This challenging problem has already been tackled by the image processing community and several algorithms, based on the Poissonian statistics of single-photon data, have been designed [15,17]. For example, Shin et al. [17] presented a reconstruction algorithm that restores multiple depths from an object behind a scattering medium by solving a convex optimization problem accounting for the Poisson statistics and the sparsity of the data. However, this algorithm does not consider the possible spatial correlation of the hidden object and was only demonstrated using single-photon data obtained in indoor conditions over a range of 4 m. Alternatively, Henriksson et al. [15] demonstrated a simple multisurface Gaussian fitting algorithm used in outdoor trials over tens of meters. This algorithm (i) filters the raw photon data to obtain a smaller number of peaks and (ii) uses a simple Gaussian fitting on the filtered histograms in order to obtain depth information [15]. Again, this approach does not account for the spatial correlations of the hidden object and may give poor results when the measurement time is reduced. In the presence of multiple surfaces and at low acquisition times, a reduced number of photon counts is collected, resulting in no depth data or highly erroneous depth information being assigned to a significant number of pixels using pixelwise approaches. The resulting data can be improved using image processing algorithms that take into account the spatial correlation of the observed targets [30]. In this paper, we consider a new algorithm that has two main objectives: (i) reducing the effect of Poisson noise affecting the observed histograms and (ii) reconstructing the different target surfaces.
This is achieved by adopting a statistical approach that restores the LiDAR data while accounting for the Poisson data statistics and introducing prior information to improve the algorithm's performance. In this paper, we consider two prior assumptions: the first regularizes intensities by accounting for spatial correlations between adjacent pixels; the second assumes a reduced number of detected peaks that are located in close depth regions, which regularizes the depths. Denoting by Y the (T × N) observed histograms gathering the T bins and N pixels, the algorithm is based on the minimization of a cost function C with respect to X, as follows:
$$C(X) = L(Y, X) + \tau_1 \phi_1(X) + \tau_2 \phi_2(X), \qquad (3)$$
where $\tau_1 > 0$, $\tau_2 > 0$, and X is a (T × N) matrix representing the point cloud after denoising and restoration of the observed targets. More precisely, the elements of the n'th pixel, $x_{t,n}$, $\forall t$, are zero except in the presence of a target at distance $d_n$, where the value $x_{d_n,n} = I_n$ is associated with the target's intensity $I_n$. This cost function accounts for the Poisson statistics of the observed histograms (Y) through the use of the log-likelihood of the data $L(Y, X)$. However, since the problem is ill-posed, additional information should be included to improve its results, which justifies the presence of the regularization terms $\phi_1$ and $\phi_2$. The latter promote the following properties: (i) a small number of depths are active with respect to the observation range window; (ii) the observed objects present spatial correlations between adjacent pixels. Due to the fine depth resolution and the large observed range window, the first property assumes that the number of layers is lower than the number of available time bins, which is introduced using a collaborative sparse prior $\phi_1$ associated with an $L_{21}$ mixed norm [31]. The second property is promoted using a convex TV regularization term $\phi_2$, which is of great interest in the image processing community since it promotes smoothness while preserving edges. To deal with sparse data and because of the fine depth resolution, $\phi_2$ assumes spatially correlated pixels after the sum of a predefined set of range bins [32] (a four-neighborhood structure is considered for these results). The resulting algorithm is therefore denoted by TV + $L_{21}$ to highlight the importance of these regularization terms. The cost function in Eq. (3) is convex and can be optimized using different convex algorithms, including the alternating direction method of multipliers algorithm considered in this paper [33,34]. More details on the algorithm are presented in Ref. 35.
Figure 8 compares the pixelwise cross correlation with the TV + $L_{21}$ approach when all the data are used, i.e., with no time gating of the data. The figure clearly shows that the TV + $L_{21}$ algorithm has extracted more information from both the main reflecting surfaces, with the weaker signal from the target much more evident than in the case of cross correlation, which only displays the highest amplitude reflection in the histogram. The field trial data were then processed using the following steps: (i) filter the histograms using the TV + $L_{21}$ algorithm (as in the case of the point cloud shown in Fig. 8), (ii) time gate the histograms to extract the temporal region of interest of the target, and (iii) determine the position and amplitude of the maximum of each pixel, which correspond to the depth and intensity of the target. Figures 9 and 10 show the estimated depth and intensity images obtained by the cross correlation and TV + $L_{21}$ algorithms for different acquisition times, while considering downsampled images of 40 × 80 pixels from the 80 × 160 acquired pixels.
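To make the structure of the cost function in Eq. (3) concrete, the sketch below evaluates one plausible instance of it for a given estimate X. It is not the authors' implementation, and the exact forms of the three terms are given in Ref. 35 rather than in this text, so every choice here is an assumption: the data term is a negative Poisson log-likelihood with rate X plus a constant background and no convolution with the instrumental response, the sparsity term sums over time bins the L2 norm taken across pixels, and the TV term is the anisotropic four-neighborhood total variation of images formed by summing X over predefined groups of range bins. No solver is included; the paper uses the alternating direction method of multipliers for the minimization.

```python
import numpy as np


def poisson_nll(Y, X, background=1e-3):
    """Negative Poisson log-likelihood up to a constant (assumed rate: X + background)."""
    rate = X + background
    return float(np.sum(rate - Y * np.log(rate)))


def l21_norm(X):
    """Sum over time bins (rows) of the L2 norm taken across pixels (columns)."""
    return float(np.sum(np.sqrt(np.sum(X ** 2, axis=1))))


def tv_of_summed_bins(X, image_shape, bin_groups):
    """Anisotropic TV of the images obtained by summing X over each bin group.

    bin_groups is a list of (start, stop) bin ranges, a hypothetical stand-in
    for the predefined set of range bins mentioned in the text.
    """
    total = 0.0
    for start, stop in bin_groups:
        img = X[start:stop].sum(axis=0).reshape(image_shape)
        total += np.abs(np.diff(img, axis=0)).sum()  # vertical neighbors
        total += np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbors
    return float(total)


def cost(Y, X, image_shape, bin_groups, tau1, tau2, background=1e-3):
    """C(X) = L(Y, X) + tau1 * phi1(X) + tau2 * phi2(X) under the assumptions above."""
    return (poisson_nll(Y, X, background)
            + tau1 * l21_norm(X)
            + tau2 * tv_of_summed_bins(X, image_shape, bin_groups))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, shape = 50, (4, 4)
    X_true = np.zeros((T, shape[0] * shape[1]))
    X_true[20, :] = 5.0                      # one flat surface at bin 20
    Y = rng.poisson(X_true + 0.05)           # noisy observed histograms
    print(cost(Y, X_true, shape, [(0, T)], tau1=1.0, tau2=1.0))
```

In this reading, a smooth surface confined to a few depth bins incurs a small penalty from both regularizers, whereas isolated noise counts scattered over many bins and pixels are penalized, which is the behavior the two prior assumptions are meant to encode.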
As expected, there is a decrease in the quality of the reconstructed image for both algorithms as the acquisition time is reduced, due to the photon returns being correspondingly lower. With reduced acquisition time, the cross correlation depth estimates exhibit a higher level of noise. However, the TV approach offers better restoration results: the noise surrounding the target is reduced, and the missing pixels in the part of the image comprising the human figure and the object of interest are restored. This performance was achieved as a result of considering the spatial correlation between pixels, and the use of collaborative sparsity to limit the number of active depths, which are mainly due to noise. A similar behavior is observed for the intensities, where smoother and less noisy results are obtained by the image processing algorithm, especially at t = 0.1 ms, where the average photon return from the human target is well below one photon per pixel. These results highlight the value of image processing algorithms in improving the results obtained from sparse single-photon data.

Conclusion
This paper presents reconstructions of high-resolution depth and intensity profiles of distant targets obscured by camouflage. The data used were acquired at outdoor field trials using a single-photon ToF scanner, and the images were reconstructed using extremely low levels of photon return, down to a level of under 1 photon per pixel, on average. All measurements were taken at a stand-off distance of 230 m in daylight, and the scanning transceiver operated at a wavelength of 1550 nm. The pulsed laser used had an average optical power of just under 1 mW, although considerable reductions in measurement acquisition time are possible with a modest increase in laser power. Overall, a good level of target identification can be observed, even for the camouflaged targets. This paper also presented an algorithm to restore the 3-D data cube representing histograms of single-photon data. The proposed method is based on an optimization of a convex cost function composed of a data fidelity term and regularization terms. The proposed formulation and algorithm showed good restoration results when processing field trial data representing a human figure standing behind a camouflage net. Such algorithm development and characterization will contribute to a more complete depth imaging model to inform next-generation single-photon transceiver design and to test the performance limits in terms of maximum stand-off distance, optical power requirements, and frame rate.