Preterm neonatal lateral ventricle volume from three-dimensional ultrasound is not strongly correlated to two-dimensional ultrasound measurements

Abstract. The aim of this study is to compare longitudinal two-dimensional (2-D) and three-dimensional (3-D) ultrasound (US) estimates of ventricle size in preterm neonates with posthemorrhagic ventricular dilatation (PHVD) using quantitative measurements of the lateral ventricles. Cranial 2-D US and 3-D US images were acquired from neonatal patients with diagnosed PHVD within 10 min of each other one to two times per week and analyzed offline. Ventricle index, anterior horn width, third ventricle width, and thalamo-occipital distance were measured on the 2-D images and ventricle volume (VV) was measured from 3-D US images. Changes in the measurements between successive image sets were also recorded. No strong correlations were found between VV and 2-D US measurements (R² between 0.69 and 0.36). Additionally, weak correlations were found between changes in 2-D US measurements and 3-D US VV (R² between 0.13 and 0.02). A trend was found between increasing 2-D US measurements and 3-D US-based VV, but this was not the case when comparing changes between 3-D US VV and 2-D US measurements. If 3-D US-based VV provides a more accurate estimate of ventricle size than 2-D US measurements, moderate-to-weak correlations with 3-D US suggest that monitoring preterm patients with PHVD using 2-D US measurements alone might not accurately represent whether the ventricles are progressively dilating. A volumetric measure (3-D US or MRI) could be used instead to more accurately represent changes.


Introduction
Posthemorrhagic ventricular dilatation (PHVD) is characterized by an enlargement of the cerebral ventricles and commonly occurs in preterm neonates with moderate to severe intraventricular hemorrhages (IVHs), grades II-IV, as graded by the clinical standard system. 1 PHVD can lead to lifelong neurological complications such as cerebral palsy as well as visuospatial disorders and lower developmental quotient. 2,3 Transfontanel cranial ultrasound (US) is sensitive and specific for detection of PHVD and is used clinically to diagnose and monitor this condition. 4 However, 2-D images cannot provide accurate volume measurements; thus, ventricle volume (VV) is often estimated and monitored qualitatively. To address this, several quantitative measurements [ventricle index (VI), anterior horn width (AHW), thalamo-occipital distance (TOD), third ventricle width (3rd)] have been derived for 2-D US images. 5 These measurements rely on the linear widths of ventricles to estimate the changes in VV as the disease progresses. Such measurements have not been standardized across neonatal intensive care units (NICUs); therefore, there is wide variation in the interpretation of the utility of 2-D US-based measurements. This is further confounded by the user-dependent nature of 2-D US, as the technician records only some image planes. Furthermore, there has been no consensus on what should be measured by a radiologist to best illustrate VV. 6 The "gold standard" to quantify VV makes use of three-dimensional (3-D) MR images, but these can only be acquired once the patient is stable enough to be transferred to the imaging suite, often weeks after initial diagnosis and intervention for PHVD. Indeed, only a few studies directly comparing 2-D US and MRI measurements of the ventricles have been done. 7,8 Specifically, Horsch et al. 7 found that the AHW was strongly correlated with MRI VV (R² = 0.88) when preterm born infants with varying ventricle sizes were imaged at term equivalent age.
In contrast to the "gold standard" MRI-based measurements, 3-D US images can be acquired multiple times per week, beginning early after birth, with limited adverse effects to the patient, as the imaging procedure is done within the incubator and takes only a few minutes to complete. 9 Additionally, serial 3-D US is far less expensive than longitudinal monitoring with MRI.
3-D US imaging studies have been able to attain VV measurements from preterm neonates with both normal ventricles and those with mild ventricle dilatation throughout the course of stay in the NICU. 10-14 Recently, a study found that 2-D measurements can be made using a reconstructed 3-D US image with reasonable interobserver variability and with no significant differences from measurements made on 2-D US images from the same day, 9 and another study found excellent agreement for the diagnosis of different neonatal pathologies using 3-D US images compared with 2-D US obtained at the same time. 15 Additionally, when directly comparing VV made from 3-D US images and MRI, a high intraclass correlation coefficient (ICC) was found (ICC = 0.92) in term born patients with mild ventricle dilatation. 10 We have previously found very high correlations between MRI and 3-D US VV in preterm born infants with PHVD when imaged at term equivalent age (R² = 0.99). 14 However, in most centers, even those with 3-D US imaging capabilities, the only quantitative measurements of ventricle size routinely performed to guide treatment and therapy of patients with enlarged ventricles are 2-D US measurements. While there are commercially available 3-D US probes that perform well, their high cost may limit their use. We have previously developed and validated a low-cost 3-D US system. 14,16 Since this system acquires images using the same 2-D US transducer used in clinical scans, we are able to make direct comparisons with clinical images and the estimates of ventricle size from those images.
In this paper, we compare 3-D US VV with previously reported 2-D US-based measurements made on the same day. Additionally, we compare the change in 2-D US measurements with the changes in VV estimates from 3-D US since serial changes in 2-D US could determine whether treatment is performed on a neonate with symptoms of increased intracranial pressure stemming from PHVD.

Selection of Subjects
The research protocol was approved by the local human Research Ethics Board as part of a larger study investigating the development of patients with IVH. Patients with a positive diagnosis of IVH on an initial clinical head US exam were prospectively recruited following informed consent from their parents. Once enrolled, patients underwent serial US exams one to two times per week until discharge from the NICU or transfer to a secondary care center. Infants with congenital brain abnormalities were excluded from the study. Decisions on when to intervene in patients with severe PHVD were based on clinical assessment of the patient (neurological exam, palpation of the fontanelle, increase in head circumference measurements, monitoring for increased spells of apnea, and bradycardia) and qualitative viewings of 2-D US images. The care team was blinded to 3-D US images and measurements.

Two-dimensional ultrasound images
Patients were imaged by a trained US technician who performed a 2-D head US exam prior to a 3-D US scan. Imaging was performed at the bedside in the NICU using an HDI 5000 US system (Philips, Bothell, Washington) and a C8-5 curved array 5 to 8 MHz broadband transducer (Philips). 2-D US exams were performed using a standard technique, acquiring images in coronal and sagittal planes through the anterior fontanelle, with a few screen captures chosen by the technician over the relevant landmarks as per the standardized exam. 17 The 2-D US exams required approximately 10 to 15 min. As the neonates' heads grew during the study (due to both normal development and PHVD), depth settings were adjusted accordingly to capture the ventricular system. As such, images were acquired at depths between 7.1 and 9.9 cm, and 2-D US images were 640 × 476 pixels, ranging from 0.0200 × 0.0200 mm² per pixel for a 7.1-cm depth to 0.0266 × 0.0266 mm² per pixel for a 9.9-cm depth.

Three-dimensional ultrasound images
3-D US images were acquired using a 3-D US motorized attachment developed in our laboratory, which housed the US transducer [Fig. 1(a)]. Technical details of the 3-D US system were previously described. 16,18,19 The 3-D US images were acquired by locating the midline in the sagittal plane, then firmly holding the motor encasement to allow the device to mechanically tilt the transducer about the axis at the probe tip along parasagittal slices, while the technician's hand remained as still as possible.
This image can be viewed through any reconstructed plane (i.e., sagittal, axial, coronal, or oblique) using a multiplanar reformatted 3-D visualizer [see Fig. 1(b)] at the bedside to determine that the full VV was acquired with limited motion from either the patient or the technician during the scan. 20 An example of motion artifact that would require additional scans can be seen in Fig. 2(a), whereas Fig. 2(b) shows the same patient's "acceptable" 3-D US image from the same scanning session. The 2-D US images were acquired with an angular spacing of 0.3 deg at 25 frames/s over a scan angle of 60 deg to 72 deg, making the total 3-D image acquisition time between 8 and 9.6 s. 3-D US images ranged from 300 × 300 × 300 to 400 × 400 × 400 voxels with voxel sizes ranging from 0.2 × 0.2 × 0.2 mm³ for a 7.1-cm depth to 0.24 × 0.24 × 0.24 mm³ for a 9.9-cm depth. The same depth settings were used in 2-D and 3-D images on the same day for the same patient. Under ideal circumstances, a single sweep of the mechanical tilting device is able to capture the entire ventricular system in 3-D US. Generally, an acceptable image [Fig. 2(b)] was acquired in two to three sweeps of the device, taking between 2 and 10 min for the entire 3-D US examination. Some patients with extremely dilated ventricles required acquisition of two separate 3-D US images to capture the entire ventricular system, one of the right lateral ventricle and one of the left.
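The sweep times quoted above follow directly from the scan angle, angular spacing, and frame rate. The short sketch below reproduces that arithmetic; the function name `sweep_time_s` is a hypothetical helper introduced only for illustration and is not part of the acquisition software.

```python
def sweep_time_s(scan_angle_deg, angular_spacing_deg=0.3, frame_rate_hz=25.0):
    """Time for one mechanical sweep, assuming one 2-D frame per angular step."""
    n_frames = scan_angle_deg / angular_spacing_deg  # 200 frames at 60 deg, 240 at 72 deg
    return n_frames / frame_rate_hz


for angle in (60.0, 72.0):
    print(f"{angle:.0f} deg sweep: {sweep_time_s(angle):.1f} s")
```

With the stated settings this gives 8 s for a 60 deg sweep and 9.6 s for a 72 deg sweep, matching the range reported in the text.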

Two-dimensional ultrasound
Two observers (FS and WR) analyzed the 2-D US images and reported the VI and AHW for the left and right lateral ventricles as well as the third ventricle width (3rd) and the largest TOD. Reporting a single TOD was the standard at our institution and was used as there were often large differences between the left and right sides, with one much larger than the other. 21 Therefore, to attain the best correlation possible, we used the largest TOD, as we believed this would be more representative of both VV and the change in VV. Measurements are described in detail in Davies et al. 5 and examples are shown in Fig. 3. The differences (Δ) between consecutive time points (i.e., between scans 2 and 3 and between scans 3 and 4, but not between scans 2 and 4) were calculated for all 2-D US-based measurements.
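As a minimal sketch of the consecutive-difference calculation described above, the snippet below computes Δ between adjacent scans only. The function name and the example AHW values are hypothetical and are not data from this study.

```python
import numpy as np


def successive_changes(values):
    """Differences between adjacent time points (scan 2 - scan 1, scan 3 - scan 2, ...)."""
    values = np.asarray(values, dtype=float)
    return np.diff(values)


# Hypothetical AHW series (mm) for one patient over four imaging sessions.
ahw_mm = [4.0, 5.5, 5.0, 7.0]
print(successive_changes(ahw_mm))  # three changes from four scans
```

A patient with n imaging sessions contributes n - 1 change values, and non-adjacent pairs (for example, scans 2 and 4) are never compared.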

Three-dimensional ultrasound
VV was obtained by manually segmenting the ventricles from the 3-D US images by a single trained observer (JK) and verified by an experienced clinician (SdR). Due to the high volume of images to be segmented (N = 255), each image was segmented only once in order to mitigate user fatigue, which can increase the variance of segmentation. 22 The segmentation was performed on parallel sagittal slices of the reconstructed 3-D image. The manually segmented boundaries included the intraventricular blood clot as well as the choroid plexus, but not porencephalic cysts if they were present. Slice spacing was set at 0.5 or 1 mm depending on the amount of ventricular dilatation. Patients with smaller ventricles (that could be captured in a single image) required smaller slice spacing (0.5 mm) due to the highly irregular shape, as well as the thinness of the anterior and temporal horns. Patients with severe PHVD who required two 3-D US images to fully capture the ventricles (one image for the right and one for the left lateral ventricle) had slice spacing set at 1 mm, as the ventricles were much less irregular in shape and these patients required the most time and the most contours to segment for determination of VV. For these patients, the left and right lateral VVs were segmented from separate images and added together. Each image set required an average of 20 min to segment, with 30 to 70 contours segmented per 3-D US image set. Figure 4 shows the 3-D US segmented image and volume rendered ventricle surface for a patient with mild ventricle dilatation as well as a patient with PHVD. The difference in 3-D US VVs (ΔVV) between successive time points was calculated and recorded.
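The text does not state how the contours were converted to a volume. One common approach, shown here only as an assumed sketch, is to sum the enclosed area of each parallel contour and multiply by the slice spacing. The function names and the square test contour are hypothetical.

```python
import numpy as np


def polygon_area_mm2(contour_xy):
    """Area enclosed by a closed 2-D contour (shoelace formula), vertices in mm."""
    pts = np.asarray(contour_xy, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def volume_cm3(contours, slice_spacing_mm):
    """Assumed slice-summation estimate: sum of contour areas times slice spacing."""
    total_area_mm2 = sum(polygon_area_mm2(c) for c in contours)
    return total_area_mm2 * slice_spacing_mm / 1000.0  # 1 cm^3 = 1000 mm^3


# Hypothetical example: four identical 10 mm x 10 mm contours at 0.5-mm spacing.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(volume_cm3([square] * 4, slice_spacing_mm=0.5))
```

Under this assumption, halving the slice spacing doubles the number of contours needed for the same ventricle, which is consistent with the finer 0.5-mm spacing being reserved for small, irregular ventricles.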

Observer Agreement in Reported Two-Dimensional Ultrasound Measurements
As there were two observers (FS and WR) for the 2-D US images, a small subset of the images was measured by both observers to determine measurement agreement. Seventeen images were analyzed, and they were all from different neonates with varying severities of IVH and ventricle dilatation. VI and AHW for the left and right lateral ventricles as well as the 3rd and the largest TOD were reported by the two observers.

Two-Dimensional Ultrasound Measurements Made on Three-Dimensional Ultrasound Images
In order to further validate our 3-D US system, VI, AHW, and TOD were measured on approximately the same plane on the 3-D US images as would be used for 2-D US. The 3rd was excluded as images were taken too close to the midline to visualize this structure in most "acceptable" 3-D US images. A single observer (FS) performed measurements on both the 3-D US images and time-matched 2-D US images. Twenty-eight image sets from different neonates with varying severities of IVH and ventricle dilatation were analyzed for this study.

Statistical Analysis
Linear regression (R²) was performed using GraphPad Prism 6 (GraphPad Software, San Diego, California). An R² > 0.7 was considered strong, 0.5 to 0.7 moderate, and <0.5 weak. Agreement between the two observers on the same 2-D US images was assessed with an absolute agreement, two-way random ICC using SPSS v.20 (IBM Corp, Armonk, New York). An absolute agreement, two-way random ICC was additionally used to determine agreement between 2-D US measurements made on both 2-D and 3-D US images by the same observer.
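The regression analysis itself was run in GraphPad Prism; the sketch below is only an illustrative NumPy equivalent of the R² calculation and of the strength thresholds stated above. The function names are hypothetical, and the boundary handling (0.7 and 0.5 counted as moderate) is an assumption drawn from the wording of the thresholds.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a simple linear regression, equal to the squared Pearson correlation."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return r ** 2


def correlation_strength(r2):
    """Classify R^2 using the thresholds stated in the text."""
    if r2 > 0.7:
        return "strong"
    if r2 >= 0.5:
        return "moderate"
    return "weak"


# The range of R^2 values reported for VV against the 2-D US measurements.
for value in (0.69, 0.36):
    print(value, correlation_strength(value))
```

Applied to the reported range, 0.69 falls in the moderate band and 0.36 in the weak band, which is why none of the 2-D US measurements is described as strongly correlated with VV.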

Patient Characteristics
Forty-two neonates were enrolled into the study following an initial diagnosis of IVH and subsequent ventricle dilatation between April 2012 and February 2015. The median age of enrollment was 9 days of life (range 4 to 30 days). Patient characteristics are summarized in Table 1. A total of 265 image sets (2-D and 3-D US images acquired on the same day) were collected for this study. Two hundred and fifty-five 2-D/3-D US exams provided all the required US measurements, with eight image sets missing a 3rd measurement and two missing TODs. Scans were performed between 25 and 42 2/7 weeks corrected gestational age (GA) with the median GA at imaging being 30 3/7 weeks (Fig. 5).
To compare changes between images, patients required at least two 3-D/2-D US image sets. Seven patients were discharged to a secondary care center or home after only a single imaging session, leaving 35 patients for the change study. Their 248 image sets yielded a total of 213 comparison points, with a median of 6.5 days between scans (range 3 to 46 days). These change scans were over the same GA range as the direct comparisons (25 to 42 2/7 weeks). Of the 35 patients who had >1 imaging session, the median number of images per patient was 7 (range 2 to 14).
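The number of comparison points follows from counting adjacent pairs within each patient: a patient with n image sets contributes n - 1 changes. The sketch below checks the reported totals; the per-patient counts in the small example are hypothetical.

```python
def n_comparison_points(image_sets_per_patient):
    """Adjacent-scan comparisons: each patient with n image sets contributes n - 1."""
    return sum(n - 1 for n in image_sets_per_patient if n >= 2)


# Totals reported in the text: 35 patients, 248 image sets.
patients_in_change_study = 42 - 7
print(patients_in_change_study)        # patients with more than one session
print(248 - patients_in_change_study)  # total adjacent comparisons

# Hypothetical per-patient counts, for illustration only.
print(n_comparison_points([2, 7, 14, 1]))
```

Summed over all patients, the count equals the total number of image sets minus the number of patients, which gives 248 - 35 = 213.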

Observer Agreement
There was a high ICC (>0.95) between the two observers for all 2-D US measurements, which can be seen in Table 2.
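For readers unfamiliar with the absolute agreement, two-way random ICC used for these comparisons, the sketch below computes the single-measures form from the two-way ANOVA mean squares. The study used SPSS, and the text does not say whether single or average measures were reported, so the single-measures choice here is an assumption. The function name and example data are hypothetical.

```python
import numpy as np


def icc_absolute_agreement(ratings):
    """Single-measures, absolute-agreement, two-way random ICC.

    ratings: array of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between subjects
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical measurements (mm): the second observer reads 1 mm higher throughout.
print(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]))
```

Because this form penalizes systematic offsets between observers, a constant 1-mm bias lowers the ICC even when the two observers rank every image identically.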

Two-Dimensional Ultrasound Measurements Made on Three-Dimensional Ultrasound Images
There was a high ICC (>0.70) between measurements made by a single observer on both 2-D and 3-D US images for VI, AHW, and TOD, which is presented in Table 3.

Ventricle Volumes Compared with Two-Dimensional Ultrasound Parameters
Linear regressions were performed between 3-D US VV and the corresponding 2-D US-based measurements recorded at the same time. Figure 6 shows the linear regressions (bold line) and 95% confidence intervals (CIs) (thin lines) of VV as measured using 3-D US images against the corresponding 2-D US-based measurements. One patient was an outlier in the left-side VI and AHW regressions. This appeared to be due to a very large porencephalic cyst on the left side that was communicating with the ventricular system and was erroneously included in the coronal measurements. When this patient was removed from the analysis, the correlations for the left-side VI and AHW became 0.65 and 0.69, respectively; the linear regression line in Fig. 6 reflects this patient being removed from the analysis.

Changes in Ventricle Volumes Compared with Changes in Two-Dimensional Ultrasound Parameters
We investigated the differences in 2-D and 3-D measurements between successive imaging sessions as an indication of how ventricular size changes post-IVH. Since there can be substantial differences in rates of change from one patient to the next, and because the number of imaging points varied between patients, linear regression between adjacent time points was used instead of multilevel modeling or ANOVA. We believed that linear regression on the difference of measurements at two time points would be fairly representative of day-to-day use of the measurements, in which comparisons are made with the previous scan. Figure 7 shows the linear regressions (bold line) and 95% CIs (thin lines) of ΔVV as measured using 3-D US images against the corresponding change in 2-D US-based measurements. The R² values found between ΔVV estimated using 3-D US and the measured change in 2-D US parameters were between 0.13 and 0.02.

Discussion
As expected, there was a trend toward increasing 2-D US-based measurements with increased VV (Fig. 6), though there was no 2-D US measurement that "best" correlated with 3-D US VV. Additionally, due to the longitudinal nature of this study, we were able to show that changes in serial 2-D US measurements are weakly correlated to changes in VV and, indeed, there was not an obvious trend. This is possibly due to the variability in 2-D US-based measurements. Previously reported standard deviations of measurements ranged from 2 mm for the AHW 9 to 2.5 mm for the VI and 4 mm for the TOD, 8 whereas the mean absolute change between imaging sessions was 1.4 mm for AHW, 1.2 mm for VI, and 2.7 mm for TOD, each less than the corresponding measurement variability. In comparison, we previously found variance (standard error of measurement) to be 0.23 cm³ for intrauser and 0.24 cm³ for interuser through a study of 3-D US ventricle segmentation using three different observers. 16 An additional study found variability in 3-D US as measured through standard deviation to be an average of 0.95 cm³ (range 0.31 to 2.03 cm³) for preterm patients born at less than 32 weeks GA with ventricles that ranged in size from 5 to 34 cm³. 2 In the present study, 3-D US VV was measured by a single observer, the same observer as in our previously published work, 16,23 and the mean absolute change in VV of 4.1 cm³ was much higher than the intrauser variability reported in either previous study. Beyond high measurement error, attempting to estimate change in an irregular 3-D volume, such as the lateral ventricles, with a linear measurement obtained from 2-D US has obvious limitations. This could explain why, to our knowledge, no previous papers have found rates of change in 2-D US parameters that predict which patients will go on to develop PHVD and why there is a reliance on qualitative viewings of 2-D US scans to determine if an intervention is required.
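The comparison between session-to-session change and measurement variability can be made explicit as a simple ratio. The values below are those quoted in the paragraph above. The ratio is only a rough illustration, since the variability figures come from different studies and use different statistics (standard deviation for the 2-D measurements, standard error of measurement for VV).

```python
# (mean absolute change between sessions, reported measurement variability)
measurements = {
    "AHW (mm)": (1.4, 2.0),
    "VI (mm)": (1.2, 2.5),
    "TOD (mm)": (2.7, 4.0),
    "VV (cm^3)": (4.1, 0.23),  # intrauser standard error of measurement
}

ratios = {name: change / variability
          for name, (change, variability) in measurements.items()}

for name, ratio in ratios.items():
    status = "above" if ratio > 1.0 else "below"
    print(f"{name}: change/variability = {ratio:.2f} ({status} measurement variability)")
```

For all three linear measurements the typical change is smaller than the reported variability, whereas the typical change in VV is many times larger than its reported variability.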
A previous study 9 has shown that the 2-D US measurements of the VI and AHW can be made on 3-D US images with reasonable reproducibility and accuracy. While they looked specifically at the AHW and VI, we additionally found reasonable agreement with the TOD between 2-D and 3-D US. We found that the agreement (measured through ICC) is lower when compared between 2-D US and 3-D US than between users for the same 2-D US images, and we believe this could be because the measurements were not able to be made on exactly the same planes. While we are able to reproduce oblique planes using our 3-D US system, the plane selection is still a potential source of error in this study. Further research into how to best select and present parasagittal and paracoronal planes from 3-D US images is necessary to make robust 2-D US measurements as well as indicate other planes of interest such as those with cysts or other abnormalities. This is likely necessary to support translation of 3-D US imaging into the clinical setting.
In contrast to what had previously been shown using MRI VV and 2-D US images by Horsch et al., 7 we did not find strong correlations between VV and linear measurements. This could partially be due to a higher variance in volume measurements made on 3-D US in comparison with MR images, as the boundaries are less clear in US images. 14 In Horsch et al.'s MRI-based VV study, they were only able to make one comparison at a single time point per patient at term equivalent age, and only 4 of 28 patients analyzed had ventricle dilatation. As most of the patients in that study would have clustered near the lower "normal" end of VV, the four with ventricle dilatation would have driven the linear regression and strengthened the correlation in that study, especially with such a relatively small sample size. In comparison, in the present study, we had patients with ventricle sizes along the continuum from normal (2.7 cm³) to very large (96.2 cm³) and included a total of 255 image sets from 42 patients.
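The leverage argument above can be illustrated with a small synthetic example. All numbers below are invented for illustration and are not data from Horsch et al. or from the present study: 24 points form an uncorrelated "normal" cluster, and 4 distant points with large values are then added.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation, i.e., R^2 of a simple linear regression."""
    return np.corrcoef(x, y)[0, 1] ** 2


# Synthetic "normal" cluster: every x value is paired with the same set of y offsets,
# so x and y are uncorrelated within the cluster.
x_cluster = np.tile([2.0, 3.0, 4.0, 5.0], 6)
y_cluster = 10.0 + np.repeat([-1.0, 1.0, 0.0, 1.0, -1.0, 0.0], 4)

# Four synthetic "dilated" cases far from the cluster.
x_dilated = np.array([15.0, 20.0, 25.0, 30.0])
y_dilated = np.array([30.0, 40.0, 50.0, 60.0])

r2_cluster = r_squared(x_cluster, y_cluster)
r2_all = r_squared(np.concatenate([x_cluster, x_dilated]),
                   np.concatenate([y_cluster, y_dilated]))

print(f"R^2 within cluster: {r2_cluster:.3f}")
print(f"R^2 with four extreme cases added: {r2_all:.3f}")
```

In this toy setting the cluster alone shows essentially no linear relationship, yet adding four extreme cases produces a very high R², which is the mechanism suggested above for the stronger correlation seen in a small, mostly normal cohort.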
An additional strength of the present study is that we were able to make 3-D US-based VV measurements prior to term equivalent age in every preterm neonate studied, with varying severities of ventricle dilatation, throughout the course of their stay in the NICU, as opposed to only when stable enough to be transferred to an MRI suite. This is in keeping with when neonates would be monitored most closely to detect IVH as well as to monitor for PHVD and to determine potential initial interventional therapy. A previous study examining normal preterm patients' VV from 3-D US found that small increases in VV occurred in the preterm period (from a median of 0.25 cm³ at 25 weeks GA to a median of 0.56 cm³ at 37 weeks GA). 11 However, the main limitation in comparing those values with the ones reported in our study is that the previous paper excluded the choroid plexus from the segmentation, which would produce smaller-than-actual VV. To our knowledge, that is the only study on normal VV from 3-D US in neonates. In comparison, a review paper noted that over the course of GA 24 to 42 weeks, the VI was the only measurement that increased with GA, whereas AHW and TOD remained fairly constant, with the exception of a single paper, which the review's authors noted as having a remarkably small standard deviation that did not agree with their experience or other published works. 8 In a follow-up paper, 23 this group developed new reference values for VI, AHW, and TOD for patients without US abnormalities born at varying GA (25 to 42 weeks) and found that increased GA was related to increased VI and TOD, but not AHW. However, when studying neonates born at <30 weeks GA and comparing VI, AHW, and TOD near birth to those at term equivalent age (GA 37 to 40 weeks), the authors found increases over time for all US measurements. 23 Recent studies 6,24 have advised using the >97th percentile of 2-D US measurements as the "action line" for when interventional therapies should begin.
While this may provide guidance as to when the first intervention should occur, patients with severe PHVD will often remain over the 97th percentile even after intervention. Thus, subsequent interventions can no longer be guided by 2-D US measurements. The change in 3-D US-based VV, as opposed to 2-D US-based measurements, might be a better indicator as to when subsequent interventions are required and would provide another metric in addition to those already proposed. Although outside the scope of this study, a future study should be performed using 3-D US VV to determine thresholds for interventional therapy while the patient is in the NICU and during follow-up to determine whether or not a neonate requires shunting, or whether or not the shunt has failed, as there are also no consensus imaging guidelines for that intervention.
The main limitation in our study from a translational standpoint is that the methodology involves manual segmentation of the images offline and does not provide timely data for routine patient care. For 3-D US to become a viable method for determining the VV at the bedside, an automated or semiautomated approach must be developed. This was beyond the scope of this pilot study. Neonates with PHVD pose a particularly challenging segmentation problem as the ventricles change fairly significantly in shape as well as echogenicity over time. For example, the ventricles become rounder over time, and the bleed, initially hyperechoic, slowly breaks down and becomes indistinguishable from the hypoechoic cerebrospinal fluid (CSF) (Fig. 8). The change in echogenicity over time makes simple thresholding or local optimization-based methods like active contour or level set methods not useful for this application, while the dramatic change in shape makes atlas-based approaches challenging. Given the inherent variability and prohibitively long segmentation times in manual segmentation, our lab has developed a semiautomated approach based on convex optimization with some encouraging preliminary results, 25 and we have integrated this new software into our acquisition software and are currently piloting this new protocol in a small sample of patients. The semiautomated segmentation algorithm has been validated through a multiobserver study and compared with multiuser manual segmentation. 26 The segmentation time was reduced to approximately 2 min in comparison to an average of 20 min for manual segmentation. In Chen et al., 26 a new framework for future validation of both manual and algorithm-based ventricle segmentation was presented and gave us knowledge on areas of local segmentation variability that were not easily seen using global metrics such as VV agreement or the Dice coefficient.
In particular, the temporal and posterior horns have the largest segmentation variability in both manual and semiautomated algorithm segmentation, 26 and this information can be used to generate better segmentations in the future. Additionally, an offline, atlas-initialized automated segmentation method 27 was recently developed, though it has yet to be integrated into our protocols. Another limitation to direct translation to other centers is the use of a noncommercial system. As our 3-D US has been validated using both phantom 14 and patient images, 16 we would anticipate that there should not be a discrepancy between the VV attained from our system in comparison with a commercial system; however, we were not able to definitively prove this as our center does not have an appropriate commercial 3-D US transducer for this type of imaging. The HDI 5000 US system used in this study is the most easily accessible unit within the NICU at our center and there is no 3-D US transducer within the frequency range for neonatal head US. However, the semiautomated segmentation software we have developed would be able to accommodate 3-D US images from commercial systems, so there is potential for a future study to use this software along with a commercial 3-D US transducer. Any patient/technician motion [Fig. 2(a)] identified through our 3-D US system could be a problem for a commercial system as well and could cause some additional errors in VV measurement. While beyond the scope of this study, detecting and analyzing motion artifacts during acquisition, including those that cannot be detected upon visual inspection, would be interesting future work to understand both how motion can be adjusted for and how it impacts volume measurements from 3-D US.
If 3-D US-based measurements of VV are a more accurate and precise measure of ventricle size than 2-D-based measurements, then the moderate-to-weak correlations with 2-D US measurements shown here suggest that 2-D US measurements alone might not be enough to determine the progression of PHVD. This is further supported by the weak correlations between changes in serial 2-D US measurements and changes in VV for neonatal patients. This could help explain why there is no consensus image-based standard on when PHVD is occurring at a rate that merits intervention. Previous studies 9,15 have shown that the time per exam is reduced when acquiring a 3-D scan instead of 2-D imaging sets, and the diagnostic accuracy is maintained. Therefore, a 3-D US approach would preserve the ability to obtain the clinical standard set of measurements, while potentially enhancing the ability to determine whether or not a new clinical standard could be developed using 3-D US-based VV estimates. We have presented further evidence that 3-D cranial US has potential as a valuable tool; however, a future study with a larger cohort would be necessary before guidelines for intervention could be suggested for the clinical management of PHVD. 9