Multisite concordance of apparent diffusion coefficient measurements across the NCI Quantitative Imaging Network

Abstract. Diffusion weighted MRI has become ubiquitous in many areas of medicine, including cancer diagnosis and treatment response monitoring. Reproducibility of diffusion metrics is essential for their acceptance as quantitative biomarkers in these areas. We examined the variability in the apparent diffusion coefficient (ADC) obtained from both postprocessing software implementations utilized by the NCI Quantitative Imaging Network and online scan time-generated ADC maps. Phantom and in vivo breast studies were evaluated for two (ADC2) and four (ADC4) b-value diffusion metrics. Concordance of the majority of implementations was excellent for both phantom ADC measures and in vivo ADC2, with relative biases <0.1% (ADC2) and <0.5% (phantom ADC4) but with higher deviations in ADC at the lowest phantom ADC values. In vivo ADC4 concordance was good, with typical biases of ±2% to 3% but higher for online maps. Multiple b-value ADC implementations were separated into two groups determined by the fitting algorithm. Intergroup mean ADC differences ranged from negligible for phantom data to 2.8% for ADC4 in vivo data. Some higher deviations were found for individual implementations and online parametric maps. Despite generally good concordance, implementation biases in ADC measures are sometimes significant and may be large enough to be of concern in multisite studies.


Introduction
The controlled sensitivity of nuclear magnetic resonance, and thus of MRI, to water diffusion provides medical researchers and clinicians a unique tool for measuring microscopic properties of tissue. In the realm of cancer in particular, quantitative diffusion-weighted MRI (DWI) is playing an ever-increasing role in both diagnosis and treatment response monitoring. In addition to providing information about tissue cellularity and microstructure, DWI has the advantages of not requiring the administration of an exogenous contrast agent and of requiring reasonably short acquisition times using standard echo-planar imaging techniques.
The simplest and most commonly used model for describing the MRI-sensitive diffusion process is a monoexponential MRI signal decay as a function of the diffusion weighting ("b-value"), typically achieved with a pair of field gradient pulses as described by Stejskal and Tanner 1 in 1965. This model assumes Gaussian diffusion behavior in isotropic tissue regions, characterized by an apparent diffusion coefficient (ADC) exponential decay constant. Despite the simplicity of this physical model, its practical implementation requires several choices that could affect the ADC measurements. These include masking of voxels for low signal-to-noise ratio (SNR) or poorness of fit; correction for nonideal imaging factors, such as low SNR effects, scanner nonlinearities, or diffusion weighting inaccuracies; and, for multi-b-value analysis, the choice of fitting algorithm.
*Address all correspondence to: David C. Newitt, E-mail: david.newitt@ucsf.edu
For validation, reproduction of results, meta-analyses in multicenter studies, and consistency across multiple exams in longitudinal studies, it is essential that different analysis implementations (AIs) produce concordant results. Numerous studies have been published addressing repeatability and reproducibility of ADC measurements, mostly addressing the important aspects of acquisition repeatability 2,3 and intra- and interreader reproducibility. 4,5 For this work, the Image Analysis and Performance Metrics Working Group of the NCI Quantitative Imaging Network (QIN) 6 undertook the ADC Mapping Collaborative Project (ADC-CP) to determine the effects of software platform and algorithm choices on ADC measurement through the analysis of common datasets by multiple institutions. The overall goal of the project is to quantify the cross-platform concordance of DWI parametric mapping software implementations. In this study, we present the results for ADC analyses performed on phantom and in vivo breast DWI, along with evaluation of the feasibility of centralized analysis of multicenter generated DWI parametric maps.

Materials and Methods
Overview: The ADC-CP was initiated and coordinated by the Breast Imaging Research Program (BIRP) at the University of California San Francisco (UCSF). Participants performed a prescribed set of DWI analyses on a common set of in vivo and phantom MRI datasets, generating derived parametric maps. These were submitted to the BIRP for centralized region-of-interest (ROI) and statistical analysis. Where available, parametric maps generated at scan time by on-scanner, manufacturer-provided software ("online" maps) were included in the central analysis.

Common DWI Datasets
Three groups of DWI datasets were analyzed in the ADC-CP: two b-value in vivo breast scans (Br2b), four b-value in vivo breast scans (Br4b), and four b-value phantom scans (Ph4b). Analysis metrics and MRI diffusion protocol details for all data are summarized in Table 1. All in vivo datasets were from the IRB approved American College of Radiology Imaging Network (ACRIN) 6698 trial 7 and were used with the permission of ACRIN. In vivo image files were deidentified as per the requirements of the Health Insurance Portability and Accountability Act [Digital Imaging and Communication in Medicine (DICOM) standard, supplement 142], while preserving private metadata attributes necessary for DWI processing. DICOM images were curated and shared via the Cancer Imaging Archive. 8 Each protocol group included scans from three MRI scanner manufacturers: Siemens Medical (SM), Philips Medical (PM), and General Electric Healthcare (GEHC). In vivo scans were multislice axial acquisitions with full biaxial breast coverage using standard two-dimensional (2-D) single-shot echo-planar imaging sequences. Group Br2b consisted of three studies: ID101 (GEHC, Signa HDxt, 3.0 T), ID102 (PM, Intera, 3.0 T), and ID103 (SM, Avanto, 1.5 T). Group Br4b consisted of four studies: ID201 (GEHC, Signa HDxt, 3.0 T), ID203 (GEHC, Signa HDxt, 1.5 T), ID205 (PM, Achieva, 1.5 T), and ID207 (SM, Avanto, 1.5 T). For all in vivo scans, a single b = 0 image was acquired and non-0 b-value images were acquired with three orthogonal diffusion encoding directions. For all cases except ID203, standard on-scanner processing was used, resulting in trace images for each non-0 b-value and online-generated ADC maps, and only the trace images were available for analysis. For ID203, the full set of directional DWI images was preserved, and no trace images or online ADC map were calculated.
The Ph4b datasets were of a diffusion phantom designed and constructed by the National Institute of Standards and Technology (NIST) and High Precision Devices (HPD Inc., Boulder, Colorado). 9,10 This phantom consisted of an array of thirteen 20-mL vials in a spherical vessel filled with an ice-water mixture to maintain a controlled temperature of 0°C. Three vials were filled with water and ten vials were filled with solutions of the polymer polyvinylpyrrolidone (PVP) in deionized water, 11 with two vials each at PVP mass fractions of 10%, 20%, 30%, 40%, and 50%. ADC values ranged from approximately 1.1 × 10⁻³ to 0.12 × 10⁻³ mm²/s. Scans were multislice coronal acquisitions at 3.0 T, using standard 2-D single-shot echo-planar imaging sequences. Diffusion encoding was applied on three orthogonal axes, with reconstruction of standard trace images at each b-value. Only the trace images were provided for analysis. Three datasets were provided: ID401 (GEHC, Discovery MR750, Memorial Sloan-Kettering Cancer Center, New York, New York), ID402 (SM, Trio, University of Colorado, Boulder, Colorado), and ID403 (PM, Ingenia, University of Michigan, Ann Arbor, Michigan). All phantom images used in this study were obtained by the DWI task force of the Quantitative Imaging Biomarkers Alliance (QIBA) of the Radiological Society of North America (RSNA).
Table note. Output parameters: ADC_⟨n⟩: monoexponential ADC using all ⟨n⟩ b-values; ADC_hi-low: monoexponential ADC using only highest and lowest b-values; ADC_slow: monoexponential ADC using three highest b-values; and PerfFrac: fraction of b = 0 signal attributed to fast-decaying perfusion component.

ADC-CP Parametric Maps
For the purpose of the ADC-CP, the basic monoexponential decay model for the MRI signal intensity from an isotropic tissue region was assumed:

$$S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}, \tag{1}$$

where $S(b)$ is the signal intensity at a diffusion weighting $b$, $S_0$ is the true signal for no diffusion weighting, and ADC is the apparent diffusion coefficient. For practical considerations, methods for the derivation of the estimated ADC from a DWI acquisition can be separated into two cases: two b-value analyses, wherein the ADC is solved explicitly via the following equation:

$$\mathrm{ADC} = \frac{\log[S(b_1)] - \log[S(b_2)]}{b_2 - b_1}, \tag{2}$$

and multi-b-value analyses, where fitting of the data to Eq. (1) must be done to determine the ADC. The choice of algorithm for fitting multi-b-value data, as well as the choice of any masking parameters, was left to the participating sites.
Table note. Multi-b fitting methods: NLS-GX = nonlinear least squares (NLS) using gradient expansion, NLS-TRF = NLS using trust-region-reflective, NLS-LM = NLS using Levenberg-Marquardt, and log-linear = linear fit or regression of log(S). Base software package function name is given where known.
Journal of Medical Imaging 011003-3, Jan-Mar 2018, Vol. 5(1)
Site analysis consisted of generating a set of parametric maps from pixel-by-pixel analysis of each DWI dataset. Analyses performed for each data group are listed in Table 1. For all cases, a monoexponential ADC map utilizing all images was computed: ADC_2 for Br2b, and ADC_4 for Br4b and Ph4b groups. In addition, for the Br4b group, a perfusion minimized analysis was performed. 20 For this analysis, the three nonzero b-values were used to estimate the "slow" or tissue diffusion signal using Eq. (1) for b ≥ 100 s/mm², giving $S_{0\mathrm{slow}}$ and ADC_slow as the fitted parameters characterizing the slow signal decay. The fraction of the signal attributable to a fast-decaying perfusion component was then calculated as

$$P_f = \frac{S(0) - S_{0\mathrm{slow}}}{S(0)}, \tag{3}$$

and parametric maps were generated for ADC_slow and P_f. For the Ph4b group, a two b-value decay coefficient, ADC_hi-low, was also calculated using only the b = 0 and 2000 s/mm² images. In addition to the parametric maps provided by the analysis sites, scanner manufacturers' software ("online") ADC maps were evaluated when they were provided with the original DWI data. This included the ADC_2 for the Br2b group, ADC_4 for the Br4b datasets with trace images (three of four studies), and ADC_4 for the Ph4b group.
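The relations in Eqs. (1) to (3) can be illustrated for a single voxel with a minimal Python sketch. It is not any site's implementation: the log-linear fit used for the slow component is only one of the fitting choices left to the sites, and the b-values above 100 s/mm² and the synthetic signal values are hypothetical numbers chosen for illustration.

```python
import math


def adc_two_point(s_b1, s_b2, b1, b2):
    """Eq. (2): explicit two b-value ADC (b in s/mm^2, ADC in mm^2/s)."""
    return (math.log(s_b1) - math.log(s_b2)) / (b2 - b1)


def log_linear_fit(b_values, signals):
    """Ordinary least-squares line through (b, log S); returns (S0, ADC)."""
    n = len(b_values)
    y = [math.log(s) for s in signals]
    b_mean = sum(b_values) / n
    y_mean = sum(y) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    sxy = sum((b - b_mean) * (yi - y_mean) for b, yi in zip(b_values, y))
    slope = sxy / sxx
    intercept = y_mean - slope * b_mean
    return math.exp(intercept), -slope


def perfusion_minimized(b_values, signals):
    """Return (ADC_slow, P_f): Eq. (1) fitted to b > 0 only, then Eq. (3)."""
    pairs = [(b, s) for b, s in zip(b_values, signals) if b > 0]
    s0_slow, adc_slow = log_linear_fit([b for b, _ in pairs],
                                       [s for _, s in pairs])
    s_at_b0 = signals[b_values.index(0)]
    return adc_slow, (s_at_b0 - s0_slow) / s_at_b0


# Hypothetical voxel: 10% of the b = 0 signal decays before b = 100 s/mm^2,
# and the remaining tissue signal has ADC = 1.0e-3 mm^2/s.
b_values = [0, 100, 600, 800]
signals = [1000.0] + [900.0 * math.exp(-b * 1.0e-3) for b in b_values[1:]]

adc_slow, p_f = perfusion_minimized(b_values, signals)
adc_two = adc_two_point(signals[0], signals[-1], b_values[0], b_values[-1])
print(adc_slow, p_f, adc_two)
```

In this synthetic case the two-point value that includes the b = 0 image is higher than ADC_slow, because the fast-decaying fraction is folded into the apparent decay rate.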

Centralized ROI Analysis
All parametric maps were submitted to UCSF through a secure box system. No restrictions were placed on the choice of file format, and formats included DICOM (N = 7), Neuroimaging Informatics Technology Initiative (NIfTI; N = 2), 21 Nearly Raw Raster Data (NRRD; N = 1), 22 Analyze (Mayo Clinic; N = 1), 23 and MATLAB® (N = 4). Prior to concordance analysis, all maps were converted to a UCSF in-house modified multiframe DICOM format allowing integer or floating point data, along with storage of an analysis mask. Slice order was detected automatically for file formats that do not include orientation information and was reversed if necessary to match the slice order of the source images. ADC scaling was detected automatically by comparison with a reference UCSF ADC map, and scaling factors were set in the metadata (DICOM rescale slope attribute) to produce ADC maps in common units of 10⁻⁶ mm²/s. No manipulation of the actual map pixel data was done except for floating point formats (MATLAB® implementations) in which pixels with a "not-a-number" value were reassigned to 0.0 and masked out for analysis.
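The text does not specify how the scaling comparison was carried out. One plausible approach is sketched below, under the assumption (ours, not the authors') that submitted maps differ from the reference units by a power of ten; the function names and pixel values are hypothetical.

```python
import math
from statistics import median


def mask_nan(pixels):
    """Set NaN pixels to 0.0 and return (cleaned pixels, validity mask)."""
    valid = [not math.isnan(p) for p in pixels]
    cleaned = [p if ok else 0.0 for p, ok in zip(pixels, valid)]
    return cleaned, valid


def detect_rescale_slope(submitted, reference, valid):
    """Estimate the power-of-ten factor mapping a map onto the reference units.

    Assumption: the true factor is an exact power of ten, so the median
    pixel ratio is snapped to the nearest one.
    """
    ratios = [r / s for s, r, ok in zip(submitted, reference, valid)
              if ok and s > 0 and r > 0]
    return 10.0 ** round(math.log10(median(ratios)))


# Hypothetical pixels: reference in 1e-6 mm^2/s, submission in mm^2/s.
reference_map = [1000.0, 1500.0, 800.0, 1200.0]
submitted_map = [1.002e-3, float("nan"), 0.8e-3, 1.188e-3]

cleaned_map, valid_mask = mask_nan(submitted_map)
rescale_slope = detect_rescale_slope(cleaned_map, reference_map, valid_mask)
# In the workflow described above the slope is stored as metadata; it is
# applied here only to show the resulting values in common units.
scaled_map = [p * rescale_slope for p in cleaned_map]
print(rescale_slope, valid_mask, scaled_map)
```

Using the median ratio keeps the estimate insensitive to the small genuine differences between implementations that the study set out to measure.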
ROI analysis was performed using standardized ROIs across all parametric maps (Fig. 1). For the in vivo breast cancer scans, a multislice, whole-tumor region defined for use in the primary study was used. For the phantom scans, ROIs were defined on the middle slice of each scan using 1-cm-diameter circular regions on each of the 13 sample vials. ROIs were applied to the parametric maps yielding mean values of the diffusion metrics for each analysis platform. All centralized analysis was done using software developed by the UCSF lab in IDL®.

Statistical Analysis
For each metric, pairwise within-subject coefficient of variation (wCV) was calculated between all implementation pairs to establish groups of implementations with similar results (intragroup wCV < 0.1% between all AI pairs). As no ground truth values could be established for the in vivo assessed DWI metrics, individual implementation concordance could only be evaluated from the percent difference of each ROI measurement from a consensus reference value for that measurement. This method was also used for the phantom scans even though reference ADC values were available, both for consistency of presentation and to avoid complications from scanner- and position-dependent ADC effects. A full analysis of the phantom ADC data relative to the ground truth reference values is presented by Malyarenko et al. 24 Reference value calculation for each of the metrics is described in Sec. 3. The two-tailed Student's t-test was used to test for significant differences among different implementations.
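The two concordance measures can be sketched as follows. The exact wCV estimator used is not spelled out in the text; the sketch assumes a common form for paired data, the root mean square over ROIs of the two-value standard deviation divided by the pair mean. The ROI values are hypothetical.

```python
import math


def pairwise_wcv(x, y):
    """wCV (as a fraction) between two implementations' paired ROI means.

    Assumed estimator: sqrt(mean_i(var_i / mean_i^2)), where var_i is the
    sample variance of the two values for ROI i, i.e. (x_i - y_i)^2 / 2.
    """
    terms = []
    for a, b in zip(x, y):
        pair_mean = (a + b) / 2.0
        pair_var = (a - b) ** 2 / 2.0
        terms.append(pair_var / pair_mean ** 2)
    return math.sqrt(sum(terms) / len(terms))


def percent_difference(value, reference):
    """Percent difference of one ROI measurement from its reference value."""
    return 100.0 * (value - reference) / reference


# Hypothetical ROI mean ADCs (1e-6 mm^2/s) from two implementations,
# the second reading 0.1% higher throughout.
impl_a = [1000.0, 1200.0, 900.0]
impl_b = [v * 1.001 for v in impl_a]

wcv = pairwise_wcv(impl_a, impl_b)
print(f"wCV = {100.0 * wcv:.4f}%")
print(f"difference = {percent_difference(1014.0, 1000.0):.2f}%")
```

With this estimator, a uniform 0.1% offset between two implementations gives a wCV of about 0.07%, which would fall just inside the 0.1% grouping threshold.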

Results
Practicalities
From the 12 participating institutions, monoexponential ADC maps for the Br2b and Br4b groups and perfusion minimized ADC_slow values for Br4b were provided for 13 analysis platforms. Nine platforms from eight institutions also provided perfusion-fraction maps for the Br4b group. The Ph4b data group was analyzed on 11 platforms, 10 generating both ADC_4 and ADC_hi-low parametric maps while one provided only ADC_4. All sites were able to process DICOM image sets from all three vendors, but interpretation of the no trace, full directional data (Br4b, ID203) was challenging for several sites due to unfamiliarity with this format. After specification of the image storage order for this case, all sites were able to program their implementations to process these data, though in some cases we noted discrepancies in the results as shown in Sec. 3.2.

Breast Scans
For the Br2b ADC_2 metric, a majority of the AIs (11 of 13) gave essentially identical results (maximum wCV < 0.003%). For each dataset, the median ADC value from all offline results was used for the reference value for concordance. Figure 2 shows the percent difference from these reference values for each AI's mean ROI ADC_2 measure for each of the three Br2b scans. AI-MAT3 had a consistent 0.12% positive bias relative to the median, while AI-Aegis varied from −0.04% to −0.06%. The GEHC and PM online maps were within 0.05% of the respective median values, but the SM map had a −1.4% bias.
Fig. 2 Concordance of two b-value in vivo ADC measurements across 13 offline AIs and online scanner-generated maps. Plotted is the percent difference for each ROI mean value from the median value for that measurement for all offline AIs. Eleven offline AIs had essentially identical results (wCV < 0.003%) and thus show no offsets on the plot. The SM online ADC had a −1.4% bias relative to the consensus median value.
More variations were observed among platforms in the Br4b analyses. Figure 3(a) shows graphically the pattern of agreement among platforms given by the pairwise wCV measures. A majority of implementations (9 of 13) fell into two groups when using a threshold of wCV < 0.1% among all group members. Group A consisted of three AIs (AI-IDL, AI-3DSl1, and AI-3DSl2) with wCV < 0.01%, while group B consisted of six AIs (AI-MAT2, AI-MAT3, AI-MAT5, AI-OsX1, AI-C++, and AI-Aegis) with wCV < 0.1%. For each dataset, a reference value was calculated as the average of the mean value for group A and the mean value for group B. Figure 3(b) shows the percent difference from these reference values for ADC_4 from each implementation for the Br4b datasets. ADC_4 values differed significantly between groups A and B [2.8% ± 0.2% (mean ± SD), p < 0.003], and up to 5% between nongrouped sites. Two of the four nongrouped implementations, AI-MAT4 and AI-MAT6, had only small variations (wCV < 0.13%) from the group B values, while AI-MAT1 and AI-OsX2 showed more variability both between scans from different vendors and from the reference values. Two implementations, AI-MAT1 and AI-MAT4, had slightly anomalous results for ID203 (GEHC), believed to be due to different handling of the full directional diffusion data. Scanner-generated ADC_4 maps were available for the three datasets with trace images. GEHC and SM maps gave mean ROI ADC values of +3.6% and −3.3%, respectively, from the reference values, while the PM online map had a 28% offset. Further investigation revealed that this large deviation was due to loss of the DICOM rescale slope data employed by PM for parametric map intensity scaling. This loss appeared to have occurred during data transfer between the scanner and the imaging site's PACS system.
Results for the perfusion minimized analysis tissue ADC (ADC_slow) were similar to the ADC_4 results [Figs. 4(a) and 4(b)]. For wCV < 0.1% grouping, AI-IDL switched from group A to B, and AI-MAT6 was also now included in group B. Overall differences were generally smaller than for ADC_4 but still statistically significant: 1.2% ± 0.2% (mean ± SD, p < 0.003) difference between groups A and B and maximum individual differences of any implementation < ±1.3% relative to the reference value.
Perfusion fraction (P_f) was a nonstandard metric and was implemented on nine platforms. Two groups were again evident, though with different membership [Fig. 4(c)]: group A (wCV = 0.04%) composed of AI-IDL and AI-MAT2 and group B (wCV < 0.01%) with MATLAB® implementations AI-MAT3, AI-MAT5, and AI-MAT6, with a small difference among the groups [0.29% ± 0.10% (mean ± SD), p < 0.03]. Figure 4(d) shows the concordance for the P_f metric results. P_f results from AI-MAT1 showed large deviations (−16% to −23%) from the consensus reference, indicating possible errors in the software implementation that was developed on-site for this CP. AI-C++ had a positive bias of 1.5% to 2.5%, which was found to be due to implementation of a biexponential decay model for this calculation. All other measures fell within ±0.25% of the reference values, except for the AI-MAT4 result for the GEHC directional diffusion dataset with a −0.9% deviation. No online parametric maps were available for the perfusion minimized analysis.

Phantom Scans
Analyses of the three Ph4b phantom datasets, ID401 (GEHC), ID402 (SM), and ID403 (PM), were submitted from 11 AIs for the four b-value ADC_4 metric and 10 AIs for the two b-value ADC_hi-low. For AI-C++, only the ID402 results were included, as a problem in the DICOM encoded ADC maps for ID401 and ID403 resulted in incorrect ROI ADC values in the centralized analysis. In a separate analysis completed after the encoding bug was fixed, these results were in concordance with the other implementations. 24 Online maps for ADC_4 were available for all three phantom datasets, but only ID403 (PM) included an online map for ADC_hi-low.
Results for the two b-value ADC_hi-low were practically identical across all implementations. The maximum pairwise wCV among postprocessing implementations using all 39 ROI measurements from the three datasets was 0.04%. Looking at the percent difference of each ROI measure from the nine site median values, AI-QIBA showed a similar clinically insignificant bias (0.05%) to that seen in the Br2b datasets for AI-MAT2. The results from the online PM ADC_hi-low map were very close to the offline reference.
Fig. 4 wCV and ROI mean concordance results for the Br4b data group perfusion minimized analysis. For ADC_slow, (a) shows the pairwise wCV matrix with groups with wCV < 0.1% indicated and (b) the corresponding data for differences in mean ROI ADC_slow. Group results showed smaller variations than for ADC_4. For P_f, (c, d) groups were less well defined, except for the three MATLAB® AIs indicated, which were nearly identical (wCV < 0.01%). The small positive biases for AI-C++ were identified as due to use of a biexponential model.
For the ADC_4 measures, paired wCV measurements over all phantom measurements gave similar groups to the Br4b results. Differences within and between the two postprocessing implementation groups were smaller than for the breast scans.
The maximum wCV was 0.04% for group A (AI-IDL, AI-Sl1, AI-Sl2, and AI-MAT6) and 0.01% for group B (AI-MAT2, AI-QIBA, AI-OsX1, and AI-MAT5), and the between-group root mean square percent difference in ADC values for all 13 ROIs was 0.29%, 0.30%, and 0.62% for GEHC, SM, and PM scans, respectively. There was no significant bias among the ROI mean ADC values from the two groups (p = 0.15, 0.07, and 0.19 for GEHC, SM, and PM scans, respectively). Figure 5 shows the differences from reference ADC_4 (average of the mean group A and mean group B results) for the three Ph4b datasets. While differences are in general very small (<0.5%), individual excursions were as high as 5.5%, with the highest differences on the lowest 2 ADC values (ADC < 0.25 × 10⁻³ mm²/s). Only the SM online map showed a statistically significant deviation from the reference values, with a small negative bias of −0.31% ± 0.25 (mean ± SD, p = 0.001). Only the PM dataset analysis showed a trend with ADC value in the difference among the analysis groups, with group A tending to underestimate ADC relative to group B for higher ADC values and overestimate at lower. A linear regression of the percent difference between the groups versus the mean ROI ADC gave a slope of 1% per 1.0 × 10⁻³ mm²/s with R² = 0.35.

Summary and Discussion
Overall, the QIN ADC Mapping Collaborative Project demonstrated good agreement between the majority of postprocessed ("offline") and scanner-generated ("online") ADC implementations, while revealing several sources of discrepancies among different platforms. With the exception of isolated outliers, mostly attributable to metadata errors rather than algorithmic differences, the largest discrepancies observed were between online and offline parametric maps. The most consistent bias was for Siemens scanner acquisitions, where the online maps gave ADC values lower than consensus reference values derived from the offline maps. These ranged from −0.3% (phantom 4b) to −1.4% (in vivo 2b) to −3.5% (in vivo 4b). Based on communication with Siemens, the most likely explanation is the use by the online ADC algorithm of detailed image sequence information to calculate a more accurate b-value than the nominal value stored in the DICOM metadata, which is used for all offline calculations. A higher true b-value, obtained by accounting for diffusion and imaging gradient cross terms, will result in a lower calculated ADC value, as we observed. The General Electric online maps for the in vivo four b-value ADC also showed a marked discrepancy from the consensus reference (+3.5%), though it agreed identically with one of the offline implementations (AI-OsX2, OsiriX IB Diffusion plugin).
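The direction and rough size of the b-value effect described for the Siemens online maps follow from Eq. (1). The worked sketch below assumes a two-point calculation with an unaffected b = 0 image and a single effective b-value; the symbols $b_{\mathrm{nom}}$ and $b_{\mathrm{true}}$ are introduced here for illustration and do not describe the vendor's actual algorithm.

```latex
\begin{align*}
\ln\frac{S(0)}{S(b)} &= b_{\mathrm{true}}\,\mathrm{ADC}_{\mathrm{true}}, \\
\mathrm{ADC}_{\mathrm{offline}} &= \frac{1}{b_{\mathrm{nom}}}\ln\frac{S(0)}{S(b)}
  = \mathrm{ADC}_{\mathrm{true}}\,\frac{b_{\mathrm{true}}}{b_{\mathrm{nom}}}, \qquad
\mathrm{ADC}_{\mathrm{online}} = \frac{1}{b_{\mathrm{true}}}\ln\frac{S(0)}{S(b)}
  = \mathrm{ADC}_{\mathrm{true}}, \\
\frac{\mathrm{ADC}_{\mathrm{online}} - \mathrm{ADC}_{\mathrm{offline}}}{\mathrm{ADC}_{\mathrm{offline}}}
  &= \frac{b_{\mathrm{nom}}}{b_{\mathrm{true}}} - 1 .
\end{align*}
```

Under these assumptions, the −1.4% online bias seen for the in vivo two b-value maps would correspond to $b_{\mathrm{true}}/b_{\mathrm{nom}} \approx 1/0.986 \approx 1.014$, that is, an effective b-value about 1.4% above the nominal one.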
The biases we report for the in vivo breast scans are of comparable magnitudes to measures of repeatability and reproducibility reported in breast ADC studies. Aliu et al. 2 reported a wCV of 11% in a repeatability study on normal volunteers, while Spick et al. 5 and Clauser et al. 25 found wCV values between 5.0% and 8.5% for breast tumor ADC measurements. In the ACRIN 6698 trial, whole-tumor ADC test-retest repeatability was 4.8%. 3 Our results indicate that choices in ADC analysis algorithm or between online and offline analysis platforms will have nonnegligible effects on breast ADC measures and should be considered in addition to biases arising from image acquisition when interpreting findings in breast DWI studies.
A consistent finding was a grouping of a majority of the implementations for multi-b ADC estimation into two groups with very similar results within-group but significant differences between the two groups. Based on the descriptions of the methods provided by each site, this appeared to be primarily driven by the choice between "log-linear" fitting, wherein a linear least-squares fit is done on the log of the image intensities, and a nonlinear least-squares fit of the untransformed data to the exponential diffusion equation. For the in vivo scans, the difference in implementations resulted in significant differences (p < 0.003) of 2.8% for ADC_4 and 1.2% for the ADC_slow in the perfusion minimized analysis. Our results are comparable to those reported by Zeilinger et al. 26 using different methods. While the grouping based on pairwise wCV was also apparent in the four b-value phantom ADC_4, no significant difference was found for the resulting ADC measures (p = 0.22). We speculate that this may be due to the higher noise level and heterogeneity within each ROI in the in vivo scans giving a greater sensitivity to the fitting algorithm selection, but further work is needed to identify the cause. Finally, given the lack of ground truth values for the in vivo scans, it is important not to equate discrepancies with errors in the presented work, except in those cases where specific error sources could be identified. In particular, while the choice of reference values for most of our ROI result plots as the average of the two prominent AI groups allows easy visualization of the differences between the AIs, it also can lend an appearance of preference to those AIs over the "nongrouped" results.
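Why the two fitting families can disagree is easy to see in a small sketch. The code below is illustrative only: the nonlinear fit uses a golden-section search over ADC with the optimal S_0 computed in closed form, swapped in to keep the example dependency-free, rather than the gradient-expansion, trust-region-reflective, or Levenberg-Marquardt solvers the sites used. The search assumes the squared error has a single minimum in the bracket, and the b-values, signals, and perturbations are hypothetical.

```python
import math


def fit_log_linear(b, s):
    """Linear least squares on log(S) versus b; returns (S0, ADC)."""
    n = len(b)
    y = [math.log(si) for si in s]
    b_mean, y_mean = sum(b) / n, sum(y) / n
    sxx = sum((bi - b_mean) ** 2 for bi in b)
    sxy = sum((bi - b_mean) * (yi - y_mean) for bi, yi in zip(b, y))
    slope = sxy / sxx
    return math.exp(y_mean - slope * b_mean), -slope


def fit_nonlinear(b, s, adc_lo=0.0, adc_hi=5.0e-3, tol=1e-12):
    """Least squares on the untransformed signal; returns (S0, ADC)."""
    def best_s0(adc):
        e = [math.exp(-bi * adc) for bi in b]
        return sum(si * ei for si, ei in zip(s, e)) / sum(ei * ei for ei in e)

    def sse(adc):
        s0 = best_s0(adc)
        return sum((si - s0 * math.exp(-bi * adc)) ** 2
                   for bi, si in zip(b, s))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = adc_lo, adc_hi
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if sse(c) < sse(d):
            hi = d
        else:
            lo = c
    adc = (lo + hi) / 2.0
    return best_s0(adc), adc


b_values = [0, 100, 600, 800]
clean = [1000.0 * math.exp(-bi * 1.0e-3) for bi in b_values]
# Fixed, hypothetical perturbations standing in for image noise.
noisy = [si + di for si, di in zip(clean, [20.0, -15.0, 10.0, 12.0])]

_, adc_ll_clean = fit_log_linear(b_values, clean)
_, adc_nls_clean = fit_nonlinear(b_values, clean)
_, adc_ll_noisy = fit_log_linear(b_values, noisy)
_, adc_nls_noisy = fit_nonlinear(b_values, noisy)
print(adc_ll_clean, adc_nls_clean, adc_ll_noisy, adc_nls_noisy)
```

On exact monoexponential data the two fits coincide. Once the signals are perturbed they diverge, because the log transform effectively weights relative rather than absolute residuals, which is consistent with the speculation above that noisier in vivo data are more sensitive to the fitting choice.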
The QIN ADC Mapping CP also highlighted some practical challenges of multicenter ADC analyses and centralized analysis of postprocessed parametric maps. For example, several sites had to implement code for the ADC-CP to analyze the less common full directional dataset, which may have resulted in somewhat higher variability in the results for those scans. While saving of directional data for DWI is not currently a common practice in clinical trials, it may become more so in the future given ongoing work on improving reproducibility of multiplatform DWI by gradient nonlinearity correction 27-29 and distortion correction. 30 Another lesson learned was the criticality of preservation of DICOM metadata for quantitative DWI. In particular, the case of lost scaling information in a Philips scanner-generated ADC map illustrates that significant errors can result from metadata corruption. While the nature of this project resulted in easy recognition of this problem, in a clinical trial setting, it might have gone unnoticed. Finally, the centralized analysis of parametric maps for this CP was greatly complicated by the multitude of file formats currently employed for the storage of these objects. Adoption of a common format, such as the parametric map DICOM object, 31 would aid meta-analysis of ADC data obtained from multicenter studies. Use of DICOM, specifically for ADC map storage, was addressed in a companion cooperative project. 24
A limitation of this study was the restriction to the monoexponential decay model, with the simple extension to a perfusion minimized ADC_slow/P_f calculation. For in vivo situations where the simple Gaussian diffusion model breaks down, several more complex models are currently employed such as biexponential models, 32 including intravoxel incoherent motion, 33,34 stretched exponentials, 35 and kurtosis. 36,37 As model complexity increases, dependency on AI choices will also increase.
An additional limitation of this study stems from the choice of a single organ, the breast, for the in vivo datasets. As breast DWI is challenging, due largely to limitations in SNR, fat suppression quality, motion, and other artifacts, we consider these datasets a challenging test of the fitting algorithms' robustness. However, the results presented are only indirectly relevant to other applications, such as neural and abdominal imaging.
In conclusion, we found that while agreement among the majority of ADC mapping implementations was good, the biases in in vivo ADC measures both between different offline implementations and between vendor-generated and offline maps are significant. Furthermore, these differences may, in some cases, be large enough to adversely affect the analysis of multisite diffusion data. For any given longitudinal (e.g., treatment response) or cross-sectional study, we would recommend that all analyses be performed on a common platform and that the output parametric map metadata reflect both the DWI data origin and the details of the applied calculation algorithm.

Disclosures
Jayashree Kalpathy-Cramer is a consultant for Infotech Soft. Jiachao Liang is an employee of Hologic Inc. Maggie Fung is an employee of GEHC. Kathleen Schmainda has ownership interest in Imaging Biometrics LLC. Other coauthors have nothing to disclose. Contribution of NIST is not subject to copyright in the United States. Certain commercial equipment, instruments, and software are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the NIST, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
David C. Newitt is a research specialist and an assistant director of the Breast Imaging Research Program at the University of California (UC), San Francisco. He received his PhD in solid state physics under Dr. Erwin Hahn at UC Berkeley in 1993, and has worked on MRI at UCSF since then. His current focus is on use of MR-DWI and DCE for treatment response monitoring of invasive breast cancer.
Paul E. Kinahan is a professor and vice chair for research in the Department of Radiology, University of Washington, with joint appointments in radiation oncology, physics and bioengineering. He is director of UWMC PET/CT imaging physics and head of the Imaging Research Laboratory. He received his PhD in biomedical engineering in 1994 at the University of Pennsylvania.
Wei Huang is an associate professor/scientist in the Advanced Imaging Research Center at Oregon Health and Science University. He is a magnetic resonance imaging (MRI) physicist by training and has more than twenty-five years' experience in MRI and MR Spectroscopy (MRS) research. His current research focuses on imaging of underlying tumor biological functions using quantitative MRI methods for cancer detection and therapeutic monitoring.
Thomas E. Yankeelov is the Moncrief professor of computational oncology and professor of biomedical engineering and diagnostics at the University of Texas in Austin. He serves as a director of the Center for Computational Oncology and a director of Cancer Imaging Research. The goal of his research is to improve patient care by employing advanced in vivo imaging methods for the early identification, assessment, and prediction of tumors' response to therapy.
Nola Hylton is a professor of radiology and biomedical imaging at UCSF and directs the Breast Imaging Research Program. She has been integrally involved in the development of MRI for breast cancer detection and diagnosis for over 20 years and works with academic and industry partners on the clinical optimization of breast MRI technologies. Her current research program focuses on the application of quantitative MRI methods to characterize breast cancer response to treatment.
Biographies for the other authors are not available.