Performance assessment of time-domain optical brain imagers, part 2: nEUROPt protocol

The nEUROPt protocol is one of two new protocols developed within the European project nEUROPt to characterize the performance of time-domain systems for optical imaging of the brain. It was applied in joint measurement campaigns to compare the various instruments and to assess the impact of technical improvements. This protocol addresses the ability of optical brain imaging to detect, localize, and quantify absorption changes in the brain. It was implemented with two types of inhomogeneous liquid phantoms based on Intralipid and India ink with well-defined optical properties. First, small black inclusions were used to mimic localized changes of the absorption coefficient. The position of the inclusions was varied in depth and lateral direction to investigate contrast and spatial resolution. Second, two-layered liquid phantoms with variable absorption coefficients were employed to study the quantification of layer-wide changes and, in particular, to determine depth selectivity, i.e., the ratio of sensitivities for deep and superficial absorption changes. We introduce the tests of the nEUROPt protocol and present examples of results obtained with different instruments and methods of data analysis. This protocol could be a useful step toward performance tests for future standards in diffuse optical imaging.


Introduction
In biomedical optics, the characterization of instrumentation with the help of appropriate tissue-simulating phantoms [1] plays an important role, ranging from proof-of-principle tests at early stages of development to quality assurance during routine clinical application [2]. Numerous performance studies have been carried out by individual research groups; however, there are only a few multilaboratory efforts to characterize and compare multiple instruments. Such approaches are essential to provide a sound basis for achieving quantitatively comparable results in clinical studies. They require the development of standardized guidelines to perform the same measurements on phantoms with the same properties in different laboratories. In the field of photon migration instruments, the first systematic study of different instruments was performed on the basis of the MEDPHOT protocol [3]. Recently, the development and testing of phantoms has been reported for quality assurance in a multicenter clinical trial to measure the response of breast tumors to neoadjuvant chemotherapy by means of diffuse optical spectroscopic imaging [4]. In the context of fluorescence imaging of tissues, phantoms were developed to characterize and compare imaging systems and to train surgeons [5].
The present work is related to the characterization of instruments for time-domain optical brain imaging. This effort was part of the tasks of the European project nEUROPt (FP7-HEALTH-F5-2008-201076) that aimed to develop new time-domain systems for optical imaging of the brain based on technological advances as well as novel methodological approaches. Common protocols were developed to provide guidelines for the assessment and comparison of instruments in coordinated measurement campaigns, for the estimation of the impact of technological and methodological advances on their performance, and for supporting quality assurance by routine tests, in particular, during clinical studies. This work represents a contribution to the ongoing efforts toward quality control [6] and standardization in the emerging field of functional near-infrared spectroscopy (fNIRS) [7,8], in general, and more specifically, for time-domain fNIRS imaging [9].
The nEUROPt protocol was specifically developed to address the ability of instruments for optical brain imaging to detect, localize, and quantify changes in the optical properties of the brain and to strive to eliminate the influence of changes in extracerebral tissues on the measurement. It is focused on the assessment of sensitivity, spatial resolution, and quantification of an absorption change $\Delta\mu_a$ in the cerebral cortex as the most relevant physical quantity related to neurological applications of diffuse optical imaging. The complete performance assessment of the time-domain optical brain imagers also included other relevant aspects covered by the following protocols: (1) the previously developed MEDPHOT protocol [3] that evaluates the capability of an instrument to measure the optical properties (absorption coefficient $\mu_a$ and reduced scattering coefficient $\mu_s'$) of a homogeneous turbid medium and (2) the new Basic Instrumental Performance (BIP) protocol that measures instrumental characteristics in a direct way, as described in a companion paper [10].
*Address all correspondence to: Heidrun Wabnitz, E-mail: heidrun.wabnitz@ptb.de
Given the nature of the problems concerned with neurological applications of diffuse optical imaging, inhomogeneous turbid media had to be considered and specific quantities (measurands) for the individual tests had to be defined. There are numerous accounts of the construction and application of phantoms to model various aspects of the tissue structure of the head and of brain activation, e.g., layered gel phantoms [11], liquid phantoms with variable absorption in simple or more realistic geometries [12,13], and dynamic solid phantoms with absorption changes induced by mechanical [14], thermochromic [15], or electrochromic [16] methods. The nEUROPt protocol was implemented with liquid phantoms based on aqueous dilutions of Intralipid and India ink, taking into account the ease of preparation and replication as well as the achievable accuracy and reproducibility of optical properties. To model localized absorption changes related to a localized brain activation, solid black inclusions were used. This choice was supported by the derivation of an equivalence relation between a realistic absorption change in a finite volume and the perturbation produced by a small black object [17]. Further, the liquid phantom also allowed a layered medium to be mimicked by separating two compartments by means of a thin Mylar foil [18]. We defined and applied tests and measurands, such as contrast, contrast-to-noise ratio, spatial resolution, and depth selectivity. The explicit specification of the operational parameters for the implementation of the protocol was another prerequisite to ensure the comparability of results across various laboratories.
This paper is organized as follows. First, the measurands and individual tests are introduced, followed by the description of the liquid phantoms and the measurement conditions relevant for the implementation of the protocol. A number of instruments that were assessed during the joint measurement campaigns within the nEUROPt project are introduced, and finally, typical results are reported, which were obtained with the two types of phantoms and two different approaches for data analysis.

Definition of the Tests
The nEUROPt protocol is designed to characterize the performance of instruments for optical brain imaging by focusing on application-oriented figures while treating the whole instrument as a black box. Thus, the performance assessment is directed toward the final results of the measurement related to the clinical application, mimicked by appropriate phantom measurements. The results of these measurements are influenced by instrumental features and settings as well as by the analysis procedure. If the influence of hardware aspects of different instruments is to be compared, the experimental conditions and the analysis need to be kept the same. Similarly, the protocol can also be used to assess the performance of different algorithms of data analysis, based either on forward simulations or on a set of experimental data obtained in phantom measurements.
The protocol consists of a total of six tests that address three main features of direct relevance for the performance of optical brain imaging in clinical applications, namely (1) sensitivity, (2) spatial resolution, and (3) quantitation of absorption changes in the brain. Each feature is characterized by two tests. In particular, the sensitivity is described by contrast and contrast-to-noise ratio, the spatial resolution by depth selectivity and lateral resolution, and the quantitation by accuracy and linearity.
The nEUROPt protocol was developed in the context of time-domain brain imaging, which relies on the measurement of time-resolved diffuse reflectance, most commonly by time-correlated single photon counting (TCSPC). The measurement provides the temporal profile $N(t)$ of the histogram of collected photons, also denoted as distribution of times of flight of photons (DTOF). Although the results presented in this paper refer to this specific kind of measurement, it is worth noting that the nEUROPt protocol is not restricted to time-resolved measurements, but can also be applied to continuous wave (cw) or frequency-domain instruments.
To ensure the applicability of the tests to any mode of measurement and data analysis, the tests were formulated in a generalized way. The tests of the nEUROPt protocol can be applied to any measurand $M$ that reflects absorption changes in the brain. $M$ can represent, in particular, (1) absorption changes $\Delta\mu_a$ retrieved by specific reconstruction algorithms and (2) semiempirical quantities such as photon counts in parts of a measured DTOF (time windows) or moments of the DTOF (integral, mean time of flight, variance). These semiempirical quantities can be regarded as primary steps in the analysis of the measured DTOFs that have been routinely used within the nEUROPt consortium. Here, they serve as examples of algorithms with different features.
In the following, the individual tests are introduced and the related measurands are first defined in an abstract manner. In order to apply the protocol in practice, a specific implementation identifying suitable phantoms and measurement conditions has to be specified. A possible implementation (the one actually used to compare different brain imagers within the nEUROPt project) will be presented in Sec. 3.

Sensitivity
The goal of this section is to define tests for the assessment of the detection capabilities of the techniques with respect to small absorption changes in the cortex. Two tests are identified, i.e., contrast and contrast-to-noise ratio.

Contrast
The primary assessment of the effect of a given absorption change in the compartment $i$ on the measurand $M$ is to consider the difference with respect to the baseline state $M_0$ (prior to the absorption change),
$$\Delta M_i = M_i - M_0, \tag{1}$$
or the contrast, expressed as a relative difference,
$$C_i = \frac{\Delta M_i}{M_0}. \tag{2}$$
Which of these options is more appropriate depends on the particular measurand $M$. For example, for the photon count $N_k$ in the $k$th time window, only the ratio $\Delta N_{k,i}/N_{k,0}$ is relevant, whereas the attenuation related to a time window requires that the absolute difference $\Delta A_{k,i} = -\ln(N_{k,i}/N_{k,0})$ be considered. In other cases, e.g., for moments of DTOFs, both options can be applied.
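As a minimal sketch of the two contrast options, the following Python functions compute the absolute difference, the relative contrast, and the attenuation change for a time window. The function names and the photon counts are hypothetical and serve only as an illustration; the sign convention simply follows the differences as written in the text.

```python
import math

def difference(m, m0):
    """Absolute change of a measurand with respect to baseline: M - M0."""
    return m - m0

def relative_contrast(m, m0):
    """Relative change of a measurand with respect to baseline: (M - M0) / M0."""
    return (m - m0) / m0

def attenuation_change(n_k, n_k0):
    """Attenuation change in a time window: -ln(N_k / N_k0)."""
    return -math.log(n_k / n_k0)

# Hypothetical photon counts in one time window (not measured data).
n_baseline = 10000.0
n_perturbed = 9000.0

c = relative_contrast(n_perturbed, n_baseline)    # relative loss of photons
dA = attenuation_change(n_perturbed, n_baseline)  # positive for added absorption
```

For small changes, the attenuation change and the magnitude of the relative contrast of the photon count nearly coincide, since $-\ln(1 + x) \approx -x$.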
A valid comparison of contrast is feasible only within measurands of the same kind, for example, within various ratios of photon counts in time windows. Comparison of measurands of different kinds is possible on the basis of the following tests: contrast-to-noise ratio, depth selectivity, lateral spatial resolution, and linearity.

Contrast-to-noise ratio
The detectability of a small absorption change $\Delta\mu_{a,i}$ in the compartment $i$ depends on the contrast-to-noise ratio $\mathrm{CNR}_i$, i.e., the ratio between the corresponding change in the measurand $\Delta M_i$ and the uncertainty of $M$ due to random effects:
$$\mathrm{CNR}_i = \frac{\Delta M_i}{\sigma(M_0)}. \tag{3}$$
Here, $\sigma(M_0)$ is the standard deviation of a series of repeated measurements for the unperturbed (baseline) state. Typically, the major source of noise is photon noise. Hence, the contrast-to-noise ratio depends on the detected intensity and, in particular, on the injected laser power, on the source-detector separation, on the responsivity of the detection system [10], as well as on the measuring time $t_\mathrm{meas}$ of the individual measurements. In addition, $\sigma(M_0)$ can be influenced by various random fluctuations due to instabilities in the measurement system, e.g., of the laser power.
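The following Python sketch illustrates how a contrast-to-noise ratio could be estimated from a series of repeated baseline measurements. The function names and numbers are hypothetical. The second function assumes pure Poisson (photon) noise, which is an idealization; under that assumption the CNR of a photon count grows with the square root of the collected counts and thus of the measuring time.

```python
import math
import statistics

def contrast_to_noise_ratio(m_perturbed, baseline_series):
    """CNR = (M - mean(M0)) / sigma(M0), sigma from repeated baseline measurements."""
    m0 = statistics.mean(baseline_series)
    sigma0 = statistics.stdev(baseline_series)  # sample standard deviation
    return (m_perturbed - m0) / sigma0

def photon_noise_limited_cnr(relative_contrast, n0):
    """CNR of a photon count N0 if Poisson noise dominates: sigma(N0) = sqrt(N0)."""
    return relative_contrast * math.sqrt(n0)

# Hypothetical repeated baseline readings of a measurand and one perturbed reading.
baseline = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
cnr = contrast_to_noise_ratio(960.0, baseline)

# Quadrupling the collected counts (e.g., four times the measuring time)
# doubles the photon-noise-limited CNR.
cnr_short = photon_noise_limited_cnr(-0.01, 1.0e4)
cnr_long = photon_noise_limited_cnr(-0.01, 4.0e4)
```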

Spatial Resolution
Depth discrimination and lateral localization of absorption changes are major tasks in optical brain imaging. Due to strong scattering in the tissue (typically $\mu_s' \sim 1\,\mathrm{mm}^{-1}$), the spatial region in which an absorption change influences a measurement of diffuse reflectance has extensions on the order of 1 cm. In addition, the attenuation of light in tissue makes the detection of absorption changes in the cortex a challenging task. Depth selectivity is a practicable test related to depth localization and depth resolution. The spatial resolution parallel to the surface is addressed by the test lateral spatial resolution.

Depth selectivity
This test addresses the capability of instruments and/or methods of data analysis to distinguish between absorption changes occurring in different compartments of the head, in particular, in the cortex versus the overlying tissues. It compares the sensitivity of the measurand $M$ to a small absorption change $\Delta\mu_a$ in the lower compartment (index 2, corresponding to the cortex) with the sensitivity to an absorption change in the upper compartment (index 1, corresponding to extracerebral, superficial tissue) by considering the ratio
$$S_{2,1} = \frac{\Delta M_2 / \Delta\mu_{a,2}}{\Delta M_1 / \Delta\mu_{a,1}}. \tag{4}$$
For ease of implementation, the reduced scattering coefficient $\mu_s'$ is kept constant and equal in both compartments.
The sensitivity ratio $S_{2,1}$ is derived from three measurements: $M(\mu_{a,1}, \mu_{a,2}) = M_0$ (baseline state), $M(\mu_{a,1}, \mu_{a,2} + \Delta\mu_{a,2})$ with a change in the lower compartment only, and $M(\mu_{a,1} + \Delta\mu_{a,1}, \mu_{a,2})$ with a change in the upper compartment only. For $\Delta\mu_{a,1} = \Delta\mu_{a,2}$, the ratio $S_{2,1}$ equals the ratio of the corresponding contrasts, which is valid for both contrast definitions, see Eqs. (1) and (2). The ratio $S_{2,1}$ is dimensionless and can be used to compare measurands of any kind and dimension. Ideally, a measurand that is sensitive only to absorption changes in the cortex should exhibit an infinitely large value of $S_{2,1}$. The magnitude of $S_{2,1}$ critically depends on the thickness of the upper compartment.
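A minimal Python sketch of the depth selectivity derived from the three measurements named in the text is given below. The function name and the numerical values are hypothetical and chosen only to illustrate the arithmetic.

```python
def depth_selectivity(m0, m_lower, m_upper, dmua_lower, dmua_upper):
    """Ratio of sensitivities to absorption changes in the lower (2)
    and upper (1) compartment, derived from three measurements."""
    sensitivity_lower = (m_lower - m0) / dmua_lower
    sensitivity_upper = (m_upper - m0) / dmua_upper
    return sensitivity_lower / sensitivity_upper

# Hypothetical photon counts in one time window (not measured data):
# baseline, change in lower layer only, change in upper layer only.
s21 = depth_selectivity(m0=1000.0, m_lower=980.0, m_upper=900.0,
                        dmua_lower=0.001, dmua_upper=0.001)  # changes in mm^-1
```

In this example the measurand responds five times more strongly to the superficial change than to the deep one, so the ratio is well below 1.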

Lateral spatial resolution
Lateral spatial resolution is determined from a spatial point spread function (PSF), which is obtained by measuring the diffuse reflectance while a small absorber is moved across the region probed. It is quantified by the full width at half maximum (FWHM) of the PSF related to a measurand $M$ along two perpendicular directions parallel to the surface, at a predefined depth $z$ below the surface and for a source-detector separation $r$. The source is located at (0, 0, 0) and the detector at ($r$, 0, 0). Since the resolution is most relevant with respect to changes in the cortex, $z$ is preferably chosen to represent a typical depth of the cortex.
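As an illustration, the FWHM of a scanned profile could be estimated as sketched below, using linear interpolation at the two half-maximum crossings. The function is a hypothetical helper, and the Gaussian profile stands in for a measured PSF, e.g., the magnitude of the change in the measurand versus lateral position of the inclusion.

```python
import math

def fwhm(x, y):
    """FWHM of a single-peaked, non-negative profile y(x); x must be increasing.
    The half-maximum crossings are located by linear interpolation."""
    peak = max(y)
    if peak <= 0:
        raise ValueError("profile has no positive peak")
    half = peak / 2.0
    i_peak = y.index(peak)

    i = i_peak
    while i > 0 and y[i] > half:          # walk left to first point <= half
        i -= 1
    j = i_peak
    while j < len(y) - 1 and y[j] > half:  # walk right to first point <= half
        j += 1
    if y[i] > half or y[j] > half:
        raise ValueError("profile does not fall below half maximum within the scan")

    x_left = x[i] + (half - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
    x_right = x[j] - (half - y[j]) * (x[j] - x[j - 1]) / (y[j - 1] - y[j])
    return x_right - x_left

# Hypothetical lateral scan in 1-mm steps with a Gaussian-shaped PSF (sigma = 8 mm).
positions = list(range(-30, 31))
psf = [math.exp(-p ** 2 / (2 * 8.0 ** 2)) for p in positions]
width = fwhm(positions, psf)  # close to 2*sqrt(2*ln 2)*8 mm
```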

Quantification of Absorption Changes
The goal of quantification of concentration changes of oxy- and deoxyhemoglobin can only be achieved if the absorption changes in a certain compartment of the head can be accurately retrieved. In this case, the accuracy test is mandatory for performance assessment. In many cases, data analysis relies on a linear approximation that is only valid for small absorption changes. Therefore, the linearity test is important to determine the range of applicability of a linear method of analysis. This test can be applied to any measurand $M$.

Accuracy
The accuracy of a measured absorption change $\Delta\mu_{a,i}^\mathrm{meas}$ is characterized as a relative measurement error, i.e., the relative deviation $(\Delta\mu_{a,i}^\mathrm{meas} - \Delta\mu_{a,i}^\mathrm{true}) / \Delta\mu_{a,i}^\mathrm{true}$ from its (conventional) true value $\Delta\mu_{a,i}^\mathrm{true}$.
The index $i$ denotes the compartment in which a known absorption change $\Delta\mu_{a,i}^\mathrm{true}$ was realized. The value of $\Delta\mu_{a,i}^\mathrm{meas}$ is available only if an absorption change can be derived by the application of a reconstruction algorithm to the measured DTOFs.
The accuracy depends on the measurement as well as on the reconstruction algorithm. Practically, the conventional true value $\Delta\mu_{a,i}^\mathrm{true}$ realized in an inhomogeneous phantom can be obtained by prior accurate characterization of the individual turbid materials in homogeneous phantoms. The absolute (baseline) optical properties of the materials used should also be known and kept fixed when comparing the accuracy of $\Delta\mu_a$ measurements performed with different instruments and/or reconstruction algorithms.

Linearity
To assess the linearity of the change in the measurand $M$ with respect to an underlying absorption change $\Delta\mu_{a,i}$ in the compartment $i$, the values for $\Delta M$ (together with their uncertainty) are plotted as a function of $\Delta\mu_{a,i}$. The linearity range is defined as the maximum value of $\Delta\mu_{a,i}$ for which the deviation from proportionality does not exceed a certain percentage of the respective $\Delta M$. This test requires several absorption changes to be realized, starting from small values, and a sufficiently low uncertainty of $\Delta M$ due to random effects.
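One possible way to evaluate the linearity range is sketched below in Python. It is an assumption of this sketch, not a prescription of the protocol, that the proportionality constant is taken from the smallest absorption change and that a tolerance of 10% is used; the data are hypothetical.

```python
def linearity_range(dmua, d_m, tolerance=0.10):
    """Largest absorption change up to which the deviation from proportionality
    stays within `tolerance` times the respective change in the measurand.
    dmua must be sorted in increasing order; the slope is taken from the
    smallest change (an assumption of this sketch)."""
    slope = d_m[0] / dmua[0]
    limit = None
    for x, y in zip(dmua, d_m):
        if abs(y - slope * x) > tolerance * abs(y):
            break
        limit = x
    return limit

# Hypothetical absorption changes (mm^-1) and changes in a measurand
# that saturates for larger absorption changes.
dmua = [0.001, 0.002, 0.004, 0.008]
d_m = [-1.0, -1.95, -3.7, -6.4]
lin = linearity_range(dmua, d_m)
```

In practice, the uncertainty of each $\Delta M$ should be small compared with the chosen tolerance; otherwise noise rather than nonlinearity limits the result.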

Implementation
This section describes the particular phantoms and measurement conditions to perform the tests of the nEUROPt protocol in a standardized and reproducible manner. The intention was to model the optical properties and geometry of the adult human head in a largely simplified, but as far as possible realistic, manner. The tests rely on two types of inhomogeneous turbid phantoms: (1) a localized inclusion in an otherwise homogeneous medium and (2) a two-layered medium.

Liquid phantoms
The decision to use well-characterized liquid phantoms for the performance characterization of instruments in the present study was based on the following considerations:
• Liquid phantoms offer a very high flexibility to (1) gradually change optical properties and (2) realize various inhomogeneous geometries, in particular, to vary the position of an inclusion.
• Liquid phantoms can be replicated much more easily than solid phantoms. Common measurement campaigns can be performed in parallel at several institutions.
• The optical properties can be adjusted with good reproducibility (within a few percent) based on well-characterized Intralipid and prediluted ink as base materials. The optical properties of these base materials (1) remain stable for a long time, (2) are almost identical when taken from the same batch, and (3) are very similar for samples from different batches [19,20].
A detailed procedure was developed at the University of Florence (UNIFI) to prepare liquid phantoms with known optical properties based on water dilutions of Intralipid as a diffusively scattering component and India ink as an absorber. The amounts of components to be mixed were solely determined by accurate weighing, which ensured that the concentrations were known with high accuracy. UNIFI provided the other partners with Intralipid and India ink from the same batches and with optical properties characterized with an uncertainty of <3%. The mixtures had to be freshly prepared on the day of the experiment. Accurate weighing and homogeneous mixing of the components are mandatory. It should be noted that these phantom materials were also characterized in an interlaboratory comparison of nine laboratories of the nEUROPt consortium and beyond. The intrinsic absorption coefficient of India ink and the intrinsic reduced scattering coefficient of Intralipid-20% were determined with an uncertainty of ∼2% or better [21].
Modular scattering cells made of black polyvinyl chloride (PVC) with small plexiglass windows were provided by UNIFI for measurements of time-resolved reflectance in homogeneous and layered geometry (see Refs. 18 and 22). For the implementation of the nEUROPt protocol, black front walls of 2 mm thickness were used, with three transparent windows (diameter 7 mm, adapted to the diameter of the fiber bundles used) for a source optode and a detector optode positioned 20 and 30 mm apart. For the cell, spacers of 10 and 30 mm thickness were provided. The effects of the finite lateral dimensions of the cell, the shape, and the refractive index mismatch of the inclusion were studied in detail by Monte Carlo (MC) simulations [22]. The inner dimensions of the cell were 120 mm width, 145 mm height, and 70 mm thickness (with three spacers). A Mylar foil of 30 μm thickness could optionally be inserted as a separator with minimal perturbation of light propagation to realize the two-layer geometry [18]. During data analysis of the measurement campaign, it turned out to be difficult to maintain a constant and reproducible thickness of the upper layer (see Sec. 5.2). Therefore, a dedicated mounting plate for the Mylar foil was designed and manufactured by Physikalisch-Technische Bundesanstalt (PTB) for future use, with a smaller area (60 mm × 80 mm) to be covered by the foil. This arrangement was used when characterizing the instrument POLIMI_2.

Absorbing objects
To mimic the perturbation due to small localized variations of the absorption coefficient for the nEUROPt protocol, UNIFI proposed the use of small black PVC cylinders immersed in the homogeneous liquid phantom [22]. They are far easier to manufacture and reproduce than small inhomogeneities made of a scattering and moderately absorbing material with well-known optical properties. The major idea behind this approach is the equivalence of the perturbation by a small black inclusion and by a certain moderate absorption change in a given volume. For details, see Martelli et al. [17]. Thus, the small black cylinders can be used as a kind of universal inclusion, provided their position is sufficiently far (>10 mm) from the source and detector and from the boundary of the medium. PVC cylinders of various sizes were provided by UNIFI, with dimensions (diameter equal to height) of 3.2, 4, 5, 6.8, and 8.6 mm, corresponding to volumes $V_\mathrm{incl}$ = 25, 50, 100, 250, and 500 mm$^3$. They were held by thin, rigid metallic wires (0.5 mm music wire) that were painted white in order to reduce their influence on the measurement. The perturbation by the black cylinders can be directly matched to equivalent finite $\Delta\mu_a$ changes over a certain larger volume, within the restrictions regarding the position mentioned above. For instance, for a reference volume $V_0 = 1000\,\mathrm{mm}^3$, the equivalent values for the five cylinders were found to be $\Delta\mu_a$ = 0.0056, 0.0087, 0.015, 0.037, and 0.094 mm$^{-1}$. Thus, linearity plots can be drawn against the equivalent $\Delta\mu_a$.
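The stated cylinder dimensions and nominal volumes can be cross-checked with a few lines of Python. For a cylinder whose height equals its diameter $d$, the volume is $\pi d^3/4$; the variable names below are chosen for illustration only.

```python
import math

def cylinder_volume(d):
    """Volume (mm^3) of a cylinder with diameter d (mm) and height equal to d."""
    return math.pi * d ** 3 / 4.0

diameters_mm = [3.2, 4.0, 5.0, 6.8, 8.6]
nominal_volumes_mm3 = [25, 50, 100, 250, 500]

volumes = [cylinder_volume(d) for d in diameters_mm]
# Relative deviation of each computed volume from the nominal value.
deviations = [(v - n) / n for v, n in zip(volumes, nominal_volumes_mm3)]
```

The computed volumes agree with the nominal values to within a few percent, so the quoted volumes are rounded values.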

Specifications of Measurement Conditions
The measurement conditions were explicitly defined in order to ensure the comparability of the results of performance assessment across various devices and institutions. A consolidated table was prepared that specified all relevant parameters for the measurements pertaining to the BIP protocol [10], the MEDPHOT protocol [3], as well as the nEUROPt protocol. These specifications included (1) the phantom configuration and measurement geometry as well as the baseline optical properties, (2) the geometrical parameters or optical properties to be changed, and (3) the parameters of data acquisition (measuring time, number of repetitions, count rates). In addition, a template lab report was prepared to facilitate the exchange of raw measured and preprocessed data between the institutions.
Table 1 reproduces a part of the complete implementation table. It contains the parameters of the phantoms to be changed in the measurements according to the nEUROPt protocol. For all tests, the target baseline optical properties were $\mu_s' = 1\,\mathrm{mm}^{-1}$ and $\mu_a = 0.01\,\mathrm{mm}^{-1}$, and the source-detector separation was $r = 30$ mm. The first column contains the tests addressed (see definitions in Sec. 2). A set of phantom measurements, i.e., a row in Table 1, can provide data for several tests. Conversely, the tests for accuracy and linearity can be realized by measurements on both types of phantoms. Section 5 gives examples for several tests on both phantoms.

Instruments and Data Analysis

Instruments
The instruments and configurations that have been characterized during the multilaboratory measurement campaign according to the nEUROPt protocol are listed together with their most relevant specifications in Table 2. In the discussion of the results in Sec. 5, they are referred to by their code. Table 2 specifies the laser, the detector types, and the parameters of the detection fiber bundles since these are the most relevant determinants of the instrument response function (IRF). The parameters of the source fibers (which are typically graded-index or low NA fibers and have less influence on the IRF) are omitted in this abbreviated table. The codes of the instruments in Table 2 are consistent with those of Table 3 in the companion paper [10], but not all instruments and configurations underwent all tests.
The instruments denoted by "brain imager" are compact, portable systems for clinical applications on adults. All of these instruments contain lasers operating at two or more wavelengths (mostly between 690 and 830 nm), compact fast photomultipliers, preamplifiers, and multiboard TCSPC systems. PTB performed the experiments on the same phantom with two types of detector modules (PTB_1 and PTB_2) installed in the brain imager and, in addition, either with an external hybrid detector (PTB_3) or with a setup based on a supercontinuum laser source with an acousto-optic tunable filter and a hybrid detector (PTB_4). Figure 1 shows selected IRFs whose temporal profiles differ markedly. Some of them exhibit afterpeaks at a level of several percent of the maximum at different temporal positions. The detector used with POLIMI_2 and PTB_4 has a narrower IRF without afterpeaks and an approximately exponential tail. The different widths of both IRFs result from the different pulse widths of the laser types used as well as from the different amounts of dispersion in the detection bundles.

Data Analysis
Two types of semiempirical algorithms have been routinely used within the nEUROPt consortium as the first steps for analyzing the measured DTOFs obtained in in vivo experiments. These algorithms are based on (1) moments and (2) time windows of the distributions. The moments considered are the integral $N_\mathrm{tot}$ (total photon count), the first moment $m_1$ (mean time of flight), and the variance $V$ (second central moment). They are derived from the counts $N_i$ in the time channels (width $\Delta t$, time $t_i$ of the $i$th channel) of the histogram memory between limits $a$ and $b$, according to
$$N_\mathrm{tot} = \sum_{i=a}^{b} N_i, \qquad m_1 = \frac{1}{N_\mathrm{tot}} \sum_{i=a}^{b} t_i N_i, \qquad V = \frac{1}{N_\mathrm{tot}} \sum_{i=a}^{b} (t_i - m_1)^2 N_i.$$
Photon counts in the $k$th time window with limits $[a_k, b_k]$ are obtained as
$$N_k = \sum_{i=a_k}^{b_k} N_i.$$
To estimate the expected results of measurements and to check for deviations from ideal conditions, dedicated simulations were performed. The perturbation of time-resolved diffuse reflectance due to black inclusions at varying positions was simulated by UNIFI using an MC code for photon migration through turbid media containing spherical absorbing objects [22,33]. Simulations for the two-layer geometry were carried out with the forward solver based on the solution of the diffusion equation for n-layered media [34].
Table footnote: Measurements reported here were performed with $z$ and $d_1$ values of 10 mm. Values of 15 mm are suggested for future measurements, to better match the typical geometry of the adult human head [23].
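The moments and time-window counts described in this section can be sketched in Python as follows. The function names are hypothetical, and it is an assumption of this sketch that the time assigned to channel $i$ is $i\,\Delta t$; a real implementation would also need background subtraction and a choice of integration limits, which are not covered here.

```python
def dtof_moments(counts, dt, a, b):
    """Integral, mean time of flight, and variance of a DTOF histogram,
    evaluated over channels a..b (inclusive). Channel i is assigned time i*dt."""
    channels = range(a, b + 1)
    n_tot = sum(counts[i] for i in channels)
    m1 = sum(i * dt * counts[i] for i in channels) / n_tot
    variance = sum((i * dt - m1) ** 2 * counts[i] for i in channels) / n_tot
    return n_tot, m1, variance

def window_count(counts, a_k, b_k):
    """Photon count in the time window spanning channels a_k..b_k (inclusive)."""
    return sum(counts[a_k:b_k + 1])

# Hypothetical five-channel histogram with 100-ps channel width.
counts = [0, 10, 20, 10, 0]
n_tot, m1, variance = dtof_moments(counts, dt=100.0, a=0, b=4)
n_early = window_count(counts, 1, 2)
```

For this symmetric toy histogram the mean time of flight falls on the central channel (200 ps) and the variance is 5000 ps$^2$.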

Results and Discussion
During the common measurement campaign, the tests according to the nEUROPt protocol were performed by four groups of the nEUROPt consortium, in part with several instruments. The measurements were analyzed by the time-window as well as the moments approach. A comprehensive set of meaningful data was obtained, and we cannot present examples of all individual tests here. The results shown below were selected to give an overview of the kind of data obtained and to illustrate the influence of particular features of instruments and methods of data analysis. The results are arranged according to the type of phantom. Examples for both approaches of data analysis are given. In the case of depth selectivity, we show a direct comparison of results of several instruments. Additional examples of results of various other tests, for one particular instrument, can be found in the consolidated reporting sheet shown in Sec. 5.3, e.g., for lateral resolution and linearity.
The depth-dependent contrast for instrument PTB_2 at late times (eighth time window, ranging from 3500 to 4000 ps) is similar to that for the third time window. The reason is the existence of a substantial late afterpeak in the IRF of PTB_2 (see Fig. 1). In contrast, the hybrid detector employed in the setup PTB_4 together with a supercontinuum laser has a good time resolution and an IRF without afterpeaks. The contrast for a deep inclusion, e.g., at $z = 20$ mm, is considerably larger for PTB_4 than for PTB_2.
Figure 3 provides another representation of the results of the same measurement. For the inclusion positioned at shallow depths, the contrast decreases with increasing time [Fig. 3(a)], whereas a deep inclusion becomes detectable only at later times [Fig. 3(b)]. For the configurations PTB_1 and PTB_2, a deviation from this behavior is observed at late times, i.e., too high a contrast for the shallow inclusion and too low a contrast for the deep inclusion. This finding can be explained by the fact that an afterpeak in the IRF (see Fig. 1) effectively transfers early photons to late times. At the same time, these falsely assigned photons lead to a decreased noise and deviations in the contrast-to-noise ratio [Fig. 3(c)]. A comparison of Figs. 3(c) and 3(d) reveals a considerably lower maximum CNR for the deep versus shallow position of the inclusion.
For the inclusion at $z = 6$ mm, the maximum contrast and CNR are both found at early times. The maximum CNR for $z = 16$ mm, however, is found at intermediate times. At late times, where the contrast is highest, the low photon count causes the CNR to drop. The results presented in Figs. 2 and 3 illustrate that contrast and CNR alone are not sufficient to characterize the discrimination between a superficial and a deep absorption change. For this characteristic, we refer to depth selectivity, see Sec. 5.2.
As an example of analysis of moments, the depth-dependent contrast for variance derived from DTOFs is shown in Fig. 4 for three of the black cylinders. Unlike in the case of a time-window analysis, the results for the three different experimental configurations applied in the same measurement almost exactly match. This finding is explained by the fact that measurands based on differences in variance (as well as mean time of flight) are virtually independent of the IRF [35]. In addition, the measured variance contrast agrees very well with the results of MC simulations performed according to Ref. 33 for black spheres of equal volume (data not shown, see Ref. 22). Note the changing sign of variance contrast between 10- and 12-mm depths, a behavior that is typical for normalized moments.
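The independence of moment differences from the IRF can be illustrated numerically. A measured DTOF is the convolution of the true DTOF with the IRF, and the mean and the variance of a convolution are the sums of the means and variances of its factors, so the IRF contribution cancels in differences between a perturbed and a baseline state. The Python sketch below uses hypothetical histograms (time in channel units) and an IRF with an afterpeak; the cancellation is exact only when the moments are integrated over the whole distribution.

```python
def convolve(f, g):
    """Discrete convolution of two histograms given as lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def mean_and_variance(h):
    """Mean and variance of a histogram, with time measured in channel units."""
    n = sum(h)
    m1 = sum(i * v for i, v in enumerate(h)) / n
    var = sum((i - m1) ** 2 * v for i, v in enumerate(h)) / n
    return m1, var

# Hypothetical true DTOFs (baseline and perturbed) and an IRF with an afterpeak.
baseline = [0.0, 4.0, 8.0, 6.0, 3.0, 1.0]
perturbed = [0.0, 4.0, 7.0, 5.0, 2.0, 0.5]
irf = [1.0, 3.0, 1.0, 0.0, 0.0, 0.2]

m_b, v_b = mean_and_variance(baseline)
m_p, v_p = mean_and_variance(perturbed)
m_b_irf, v_b_irf = mean_and_variance(convolve(baseline, irf))
m_p_irf, v_p_irf = mean_and_variance(convolve(perturbed, irf))

delta_var_true = v_p - v_b
delta_var_measured = v_p_irf - v_b_irf   # same value despite the afterpeak
delta_mean_true = m_p - m_b
delta_mean_measured = m_p_irf - m_b_irf  # same value as well
```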

Contrast
The measurements on the two-layered phantom allow the contrast of laterally extended absorption changes in a superficial and a deep compartment to be compared. Results of a time-window analysis of the measurements on the two-layer phantom with an upper layer of nominal thickness 10 mm are presented in Fig. 5, together with the results of two-layer simulations based on Ref. 34. The absorption is changed in each of the compartments separately; the corresponding results are plotted in the upper and lower rows. Note that the range of the absorption change is rather large: $\mu_a$ is changed from 0.01 mm$^{-1}$ (baseline) to 0.02 mm$^{-1}$. It is therefore not surprising that the contrast is not linear in $\Delta\mu_a = \mu_a - 0.01\,\mathrm{mm}^{-1}$ over the whole range.
The contrast for changes in the upper layer is comparable with the simulated data for all instruments, whereas the experimental results for changes in the lower layer show less contrast than expected from the simulations. A major reason is the finite width of the IRF that has not been accounted for in the simulations. In general, the sensitivity to absorption changes in the lower layer is higher for later photons, as observed in the case of POLIMI_2. The experimental data for the instruments PTB_1 and PTB_2 show, however, that the maximum contrast for the lower layer is achieved for intermediate time windows. For the latest time windows, the contrast decreases again. This behavior can be explained by the influence of afterpeaks in the IRF (see Fig. 1 and Ref. 10).
The contrasts for moments obtained on the two-layered phantom with the instrument IBIB_1 are depicted in Fig. 6, together with simulations with the same nominal optical properties and thickness of the upper layer. Measured and simulated contrasts agree very well despite the fact that the IRF has not been taken into account in the simulations. As already discussed in Sec. 5.1, this behavior is expected since differences of moments are virtually independent of the IRF. 35 In general, the contrast obtained from N_tot is larger in the upper layer (by about a factor of 3), and the contrast based on m_1 is approximately the same in both layers, whereas the contrast for V is larger (by about a factor of 2) for absorption changes in the lower layer. A quantitative comparison of these results is performed in the following section.
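The IRF independence of moment differences can be checked numerically. The following is a minimal sketch, not the analysis code used for Fig. 6. The DTOF shape (`toy_dtof`), the IRF with a small afterpeak, the bin width, and all numerical values are invented for illustration. The sketch relies only on the fact that mean and variance are additive under convolution when the moments are taken over the full time range.

```python
import math

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def moments(counts, dt):
    """Return (N_tot, m1, V): integral, mean time of flight, variance."""
    n_tot = sum(counts)
    m1 = sum(i * dt * c for i, c in enumerate(counts)) / n_tot
    var = sum((i * dt - m1) ** 2 * c for i, c in enumerate(counts)) / n_tot
    return n_tot, m1, var

def toy_dtof(n_bins, dt, decay_ns):
    """Hypothetical DTOF shape t^2 * exp(-t / decay), for illustration only."""
    return [(i * dt) ** 2 * math.exp(-i * dt / decay_ns) for i in range(n_bins)]

dt = 0.01  # bin width in ns (assumed)
baseline = toy_dtof(800, dt, 0.40)   # assumed baseline state
perturbed = toy_dtof(800, dt, 0.38)  # assumed state with increased absorption

# Toy IRF: Gaussian main peak plus a small afterpeak (invented shape).
irf = [math.exp(-0.5 * ((i - 30) / 5.0) ** 2)
       + 0.02 * math.exp(-0.5 * ((i - 150) / 8.0) ** 2)
       for i in range(250)]

_, m1_0, v_0 = moments(baseline, dt)
_, m1_p, v_p = moments(perturbed, dt)
_, m1_0c, v_0c = moments(convolve(baseline, irf), dt)
_, m1_pc, v_pc = moments(convolve(perturbed, irf), dt)

d_m1_ideal, d_m1_meas = m1_p - m1_0, m1_pc - m1_0c
d_v_ideal, d_v_meas = v_p - v_0, v_pc - v_0c
print(d_m1_ideal, d_m1_meas)
print(d_v_ideal, d_v_meas)
```

In this toy case, the changes in mean time of flight and variance agree with and without the IRF up to rounding error. With finite integration limits, such as the limits at 1% of the maximum used for Fig. 6, the additivity holds only approximately.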

Depth selectivity
The definition of depth selectivity [Eq. (4)] based on a ratio is a way to compare different measurands, even of different dimensions (see the different moments in Fig. 6), with respect to the discrimination between deep and superficial changes. This ratio is formed from data such as those displayed in Figs. 5 and 6 for the lower and upper compartments. In Fig. 7, the same quantity S_2,1 is plotted for analysis by time windows [Fig. 7(a)] and moments [Fig. 7(b)]. S_2,1 was obtained as the ratio of the slopes of the contrast curves for small absorption changes, i.e., for μ_a values in the interval [0.010 mm⁻¹, 0.012 mm⁻¹]. The consolidated plots in Fig. 7 include the results obtained for several different instruments and configurations together with results obtained from simulations. Comparing the plots for the instrumental configurations PTB_1, PTB_2, and PTB_3, which differ in the type of detector, reveals the different influence of the IRF with both approaches of analysis. For time windows [Fig. 7(a)], the maximum achievable depth selectivity for late photons is substantially degraded if the IRF exhibits pronounced afterpeaks, as in the case of the instruments PTB_1 and PTB_2. The existence of afterpeaks causes a fraction of early photons to be detected at late times. Such afterpeaks are not present in the case of the hybrid detectors in PTB_3 and POLIMI_2. These instruments provided the best depth selectivity at late times. Moreover, in these cases, the curves as a function of time essentially follow that for the simulated data where no IRF was taken into account.
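The slope-ratio evaluation described above can be sketched as follows. This is an illustrative sketch only: the function names, the least-squares fit, and the contrast values are assumptions made for this example and are not measured data or the authors' code.

```python
def slope(x, y):
    """Least-squares slope of y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def depth_selectivity(mu_a, contrast_lower, contrast_upper, mu_a_max=0.012):
    """Ratio of contrast slopes (lower layer / upper layer) for small changes.

    Only points with mu_a <= mu_a_max (in mm^-1) enter the fit.
    """
    idx = [i for i, m in enumerate(mu_a) if m <= mu_a_max + 1e-12]
    x = [mu_a[i] for i in idx]
    s_lower = slope(x, [contrast_lower[i] for i in idx])
    s_upper = slope(x, [contrast_upper[i] for i in idx])
    return s_lower / s_upper

# Invented contrast values for one measurand (not data from the paper).
mu_a = [0.010, 0.011, 0.012, 0.015, 0.020]      # mm^-1
contrast_upper = [0.00, 0.02, 0.04, 0.09, 0.15]  # change in upper layer
contrast_lower = [0.00, 0.05, 0.10, 0.22, 0.35]  # change in lower layer

s21 = depth_selectivity(mu_a, contrast_lower, contrast_upper)
print(s21)
```

Because the result is a ratio of two slopes of the same measurand, its value does not depend on the unit of that measurand, which is what makes time windows and moments comparable on one scale.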
For moments [Fig. 7(b)], the influence of the IRF is expected to vanish, as discussed above in Sec. 5.1. This can actually be observed when comparing the results for the three PTB detectors that were obtained together in the same phantom experiment. It should be noted that here the integration limits were set at 0.1% of the maximum (otherwise 1%) to avoid cutting within the region influenced by the afterpeak for PTB_2. Discrepancies between the measurements by different groups are most likely due to inaccuracies in the thickness of the upper layer caused by difficulty in flattening the inelastic Mylar foil that separates the layers. The comparison with corresponding two-layer simulations suggests that in the case of PTB, the thickness of the upper layer was most likely ∼9 mm instead of 10 mm. A decreased thickness of the upper layer mainly leads to an increase in contrast for changes in the lower layer and, thus, to an overestimation of depth selectivity. The inaccuracy of the thickness hampers the comparison between different experiments. An improved mounting of the Mylar foil employing the mounting plate mentioned in Sec. 3.1 could solve this problem. This mounting plate was applied in the later experiment with the instrument POLIMI_2, which yielded depth selectivities for moments rather close to the simulated values.
Overall, the highest depth selectivity is obtained for variance. It should be noted that the depth selectivity of the time-window approach can be substantially improved by considering ratios of photon counts in late to early time windows.

Consolidated Presentation of Results
In order to facilitate the presentation of the results of all tests, the comparison of different instruments or different configurations of the same instrument, as well as the archival storage of system specifications, a consolidated reporting sheet was prepared. The report is automatically generated from the data inserted by the user in a tabular form. Figure 8 shows an example for a particular instrument and method of data analysis.
The first page of the report is specific for the instruments tested. It is divided into three sections: a first section (Section I) with some information on the system (e.g., name, institution, operating conditions); a second section (Section II) with six figures reporting the results of the main tests (contrast, contrast-to-noise ratio, x and y lateral resolution, linearity, depth selectivity); and a third section (Section III) with several final synthetic descriptors derived automatically from the figures and used to summarize the key performances of the system in a short and quantitative way for easy grading of the instrument under test. The second page (not shown) describes the actual implementation of the nEUROPt protocol and the conditions for analysis and reporting. In particular, the specific measurands reported on the first page are defined and details on the synthetic descriptors are provided.
Fig. 8 Front page of reporting sheet for one of the instruments (POLIMI_1 at 830 nm) with data analysis by time windows (N_tot: total counts integrated from 0.0 to 4.0 ns, N_E: early time window integrated from 0 to 0.5 ns, N_L: late time window integrated from 2.5 to 3.0 ns).
In the example presented in Fig. 8, the measurands are photon counts in an early time window (N_E, 0 to 500 ps), containing information mainly on superficial regions, photon counts in a late time window (N_L, 2500 to 3000 ps), mainly probing deeper regions, and the time-integrated (N_tot) signal. The measurements were performed according to the specifications in Table 1. The figure related to relative contrast was obtained using a black PVC cylinder of volume V_cyl = 100 mm³, equivalent to Δμ_a = 0.015 mm⁻¹ within a spherical volume V_0 = 1000 mm³. The linearity plot was constructed from the change in the measurand produced by the black cylinders with increasing volumes at the specific depth z_0 = 10 mm. The contrast is plotted versus the equivalent increase in μ_a for V_0 = 1000 mm³. For the transformation from V_cyl to Δμ_a, we applied the nonlinear relation between the black object volume and a realistic equivalent absorption perturbation reported in a previous paper. 17 Finally, the depth selectivity assessed using the two-layer phantom is shown. Compared to the complete nEUROPt protocol, the accuracy test is missing here. The reason is that the specific measurands used in this example (photon counts in selected time windows) are not sufficient to retrieve absolute information on the absorption coefficient.
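The time-window measurands used in this example can be computed from a DTOF as sketched below. The window limits are those quoted for Fig. 8; everything else is an assumption made for illustration: the 10 ps bin width, the convention that each time value marks the left edge of its bin, the toy DTOF shapes, and the sign convention of `relative_contrast` (positive when the count decreases).

```python
import math

def window_counts(times_ps, counts, t_start_ps, t_stop_ps):
    """Sum photon counts in bins whose left edge lies in [t_start, t_stop)."""
    return sum(c for t, c in zip(times_ps, counts)
               if t_start_ps <= t < t_stop_ps)

def relative_contrast(n, n_baseline):
    """Relative change of a count-type measurand (assumed sign convention)."""
    return (n_baseline - n) / n_baseline

# Window limits in ps, as quoted for the reporting sheet example.
WINDOWS = {"N_E": (0, 500), "N_L": (2500, 3000), "N_tot": (0, 4000)}

# Toy DTOFs on a 10 ps grid (invented shapes, not measured data).
times_ps = [10 * i for i in range(500)]
baseline = [1e4 * (t / 1000.0) ** 2 * math.exp(-t / 400.0) for t in times_ps]
perturbed = [c * math.exp(-t / 20000.0) for t, c in zip(times_ps, baseline)]

contrast = {}
for name, (t0, t1) in WINDOWS.items():
    n0 = window_counts(times_ps, baseline, t0, t1)
    n = window_counts(times_ps, perturbed, t0, t1)
    contrast[name] = relative_contrast(n, n0)
print(contrast)
```

In this toy perturbation, the attenuation grows with time of flight, so the late window shows a larger relative contrast than the early window.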
The choice of the synthetic descriptors aims at extracting several numbers for a quick and quantitative comparison of the instruments at the expense of loss of detail. Particular thresholds or conditions are specified to extract these values from the plots of Section II. In the case illustrated in Fig. 8, five descriptors were extracted, namely (1) the contrast obtained at z = 15 mm; (2) the CNR at z = 10 mm; (3) and (4) the FWHM of the x and y lateral resolution plots; and (5) the depth selectivity for Δμ_a = 0.004 mm⁻¹.
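As an illustration of how one such descriptor could be derived automatically, the sketch below estimates the FWHM of a lateral scan profile by linear interpolation at half maximum. The interpolation scheme, the function name, and the triangular test profile are assumptions for this example; the reporting sheet may use a different procedure.

```python
def fwhm(x, y):
    """FWHM of a single-peaked profile y(x), interpolated linearly."""
    peak = max(y)
    half = peak / 2.0
    i_peak = y.index(peak)

    # Walk left from the peak to the last sample still at or above half max.
    i = i_peak
    while i > 0 and y[i - 1] >= half:
        i -= 1
    # Walk right from the peak in the same way.
    j = i_peak
    while j < len(y) - 1 and y[j + 1] >= half:
        j += 1
    if i == 0 or j == len(y) - 1:
        raise ValueError("profile does not fall below half maximum on both sides")

    # Interpolate the half-maximum crossings on both flanks.
    x_left = x[i - 1] + (half - y[i - 1]) * (x[i] - x[i - 1]) / (y[i] - y[i - 1])
    x_right = x[j] + (half - y[j]) * (x[j + 1] - x[j]) / (y[j + 1] - y[j])
    return x_right - x_left

# Invented triangular contrast profile versus lateral position (mm).
x_mm = [-20.0, -15.0, -10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0]
profile = [max(0.0, 1.0 - abs(xi) / 20.0) for xi in x_mm]
width_mm = fwhm(x_mm, profile)
print(width_mm)
```

The same function can be applied separately to the x and y scans to obtain descriptors (3) and (4).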

Conclusions
As a joint effort of the nEUROPt consortium, a standardized protocol (nEUROPt protocol) has been developed to assess and compare the performances of time-domain brain imagers with respect to the detection, localization, and quantification of absorption changes in the brain. The protocol is composed of six tests, namely contrast and contrast-to-noise ratio to assess sensitivity of detection, depth selectivity and lateral resolution to address capabilities related to localization, and linearity and accuracy to grade quantification. The instrument as a whole, together with data analysis, is subjected to these tests. They aim at the determination of performance parameters that are directly relevant in the context of the in vivo application. The tests provide a snapshot of the capabilities of a particular system and allow various comparisons to be made.
A specific implementation of the protocol was proposed, exploiting well-characterized and reproducible inhomogeneous liquid phantoms 17,21,22 together with a compilation of experimental conditions for the test measurements. Following the protocol and the implementations described above, a total of eight instruments developed by four different laboratories were tested. The results of these measurements were presented with several examples comparing different instruments and methods of data analysis. These examples illustrate, in particular, the suitability of the protocol to evaluate the effect of technical improvements, e.g., the use of different detectors. The shape of the IRF turned out to be crucial for the performance of a time-domain brain imager. In particular, afterpeaks in the IRF caused a substantial decrease in the contrast for photons with long flight times, thus hampering detection of deep absorption changes. In this context, the link to the BIP protocol 10 is important. A thorough characterization of the instrument itself, complemented by simulations taking into account its characteristics, facilitates the understanding and interpretation of the results of the tests according to the nEUROPt protocol.
The results of the tests presented in this work can also serve as an example for comparing different algorithms of data analysis and assessing their robustness with respect to experimental factors. We focused on straightforward measurands that were routinely obtained in primary steps of the analysis of measurement data, including those recorded in clinical studies. Two approaches were considered here based on time windows and moments of DTOFs, respectively. Both types of measurands are suitable to study absorption changes in the cortex, but exhibit a substantially different sensitivity to the shape of the IRF. It should be noted that these basic steps of analysis do not directly yield the absolute value of Δμ_a, which would require more sophisticated reconstruction methods. Hence, the accuracy test was not applied.
All test measurements started from a fixed and equal baseline photon count rate. This approach eliminates the influence of differing laser power, of the scheme of its distribution to various source optodes, and of the responsivity of the detection system. Thus, the overall sensitivity of the instruments is not reflected in the results. This decision was taken to avoid too much complexity. The BIP protocol 10 addresses these issues separately, specifically by means of the responsivity test together with recording the source parameters. The noise test of the MEDPHOT protocol, 3 i.e., evaluation of the noise level of optical properties as a function of input energy, illustrated the possibility of a combined approach.
The implementation of the nEUROPt protocol presented here was based on a fixed geometry (source-detector separation, layer thickness) and optical properties of the phantoms chosen to reflect realistic conditions of in vivo measurements. However, the tests could also be applied to simulated data to study the effect of these parameters. It is worth mentioning that three factors, i.e., (1) physics of light propagation in turbid media, (2) instrumental characteristics, and (3) features of data analysis methods, influence the various tests differently. Some tests are dominated by the physics of the problem, as in the case of the lateral resolution, which is only slightly affected by instrumental characteristics, e.g., the size of the optodes.
Although originally developed for time-domain brain imagers, the nEUROPt protocol is also applicable to instruments based on cw or frequency-domain technologies. The tests were formulated in a rather universal way in terms of an arbitrary measurand M, which can also represent, in particular, the phase shift and demodulation obtained from frequency-domain measurements. When applying the protocol to cw or frequency-domain technologies, the protocol itself can remain valid. Even the implementation in terms of phantom configuration and measurement geometry as well as the baseline optical properties could be adopted. However, the conditions of data acquisition (e.g., measuring time, signal magnitude) need to be specified with respect to the particular technology. With some modifications, the nEUROPt protocol can also be extended to address other applications of diffuse optical imaging, in particular, optical mammography. While most of the tests can also be applied in transmission geometry, the depth selectivity test, as described above, is primarily relevant to measurements in reflection geometry.
The work presented here can be useful when designing performance tests in future documentary standards in these fields. Liquid phantoms are most likely impracticable for test procedures in such a context, requiring easy and reliable means to be applied by the manufacturer under industrial conditions. In such a case, solid, easily accessible, and possibly commercially available phantoms need to be developed.

5.1.1 Depth-dependent contrast
Contrast and noise measured as a function of the depth of a black inclusion in the liquid phantom are presented in Fig. 2 for two instrumental configurations, for analysis of time windows. Figure 2(a) clearly illustrates the advantages of the time-resolved measurement to detect absorption changes in the brain. Photons with a short time of flight are only sensitive to shallow absorption changes. For late time windows, the highest contrast is found deeply below the surface, in this example at

Fig. 1 Instrument response functions. The measuring time was 20 s in each case; constant background was subtracted. The position t = 0 was obtained as the barycenter of each instrument response function in the interval between the points at half maximum.

Fig. 2 Contrast of photon counts in time windows measured as a function of depth position z of a black polyvinyl chloride (PVC) cylinder of 100 mm³ volume, for instrumental configurations PTB_4 (a) and PTB_2 (b). The numbers indicate consecutive time windows of 500 ps width; error bars represent noise as obtained from the standard deviation of repeated measurements.

Fig. 3 Contrast [(a) and (b)] and contrast-to-noise ratio [(c) and (d)] for the black cylinder of 100 mm³ volume as a function of time, for depths z = 6 mm [(a) and (c)] and z = 16 mm [(b) and (d)]. The results are plotted for the configurations PTB_1, PTB_2, and PTB_4.

Fig. 5 Contrasts of selected time windows of 500 ps width (number given in the legend) as a function of a change in μ_a,1 in the upper layer (upper row) and μ_a,2 in the lower layer (lower row) of the two-layered phantom with nominal thickness of the upper layer d = 10 mm. First column: two-layer simulation based on Ref. 34, second column: PTB_1, third column: PTB_2, fourth column: POLIMI_2.

Fig. 6 Contrasts of moments, (a) integral, (b) mean time of flight, (c) variance, as a function of a change in μ_a,1 in the upper layer (up triangles, gray) and μ_a,2 in the lower layer (down triangles, black) for the instrument IBIB_1. Both integration limits for moments were set at 1% of the maximum. Dotted lines: simulations based on Ref. 34.

Fig. 7 Depth selectivity, i.e., the ratio of contrasts for small absorption changes in the lower layer and in the upper layer, for time windows (a) and moments (b).The positions of the symbols in (a) correspond to the center of the respective time windows.

Table 1 Implementation of the tests of the nEUROPt protocol by phantom measurements (x, y, lateral; z, depth position coordinates; V_cyl, volume of black cylinder; d_1, thickness of upper layer).
a Indirect method, requires prior application of equivalence relation for black absorbers.
b Change in upper/lower layer separately.

Table 2 Instruments and configurations characterized. Codes (acronyms of institutions, for complete information, see author affiliations): POLIMI, Politecnico di Milano; IBIB, Nałęcz Institute of Biocybernetics and Biomedical Engineering; UCL, University College London; PTB, Physikalisch-Technische Bundesanstalt. Superscript lowercase letters indicate manufacturers of components. Parameters of collection fibers or fiber bundles: D, diameter; L, length; NA, numerical aperture.