Task-based optimization of flip angle for fibrosis detection in T1-weighted MRI of liver

Abstract. Chronic liver disease is a worldwide health problem, and hepatic fibrosis (HF) is one of the hallmarks of the disease. The current reference standard for diagnosing HF is biopsy followed by pathologist examination; however, this is limited by sampling error and carries a risk of complications. Pathology diagnosis of HF is based on textural change in the liver as a lobular collagen network that develops within portal triads. The scale of collagen lobules is characteristically on the order of 1 to 5 mm, which approximates the resolution limit of in vivo gadolinium-enhanced magnetic resonance imaging (Gd-MRI) in the delayed phase. We use MRI of formalin-fixed human ex vivo liver samples as phantoms that mimic the textural contrast of in vivo Gd-MRI. We have developed a local texture analysis that is applied to phantom images, and the results are used to train model observers to detect HF. The performance of the observer is assessed with the area under the receiver-operator-characteristic curve (AUROC) as the figure-of-merit. To optimize the MRI pulse sequence, phantoms were scanned multiple times at a range of flip angles. The flip angle associated with the highest AUROC was chosen as optimal for the task of detecting HF.

Introduction
Chronic liver disease (CLD) is a widespread health concern that represents a common disease pathway for a number of important etiologies, including nonalcoholic steatohepatitis (NASH), alcoholic cirrhosis, and viral hepatitis. 1,2 These diseases lead to inflammation and damage, usually first involving the portal triad region surrounding the hepatic lobules, resulting in the deposition of collagen scar tissue in the extracellular matrix (ECM), a process diagnosed as hepatic fibrosis (HF). 1,3-5 HF is the hallmark of CLD. 2,6-8 Monitoring for the presence of HF and staging (quantifying) the severity and progression over time are essential for the diagnosis and therapeutic management of CLD.
The current reference standard for CLD diagnosis and HF staging is needle biopsy of the liver. 6,9-11 Biopsy provides cellular-resolution images that make it possible for a pathologist to identify fibrotic tissue and stage severity. When providing a diagnosis of HF, a pathologist will report severity using a numerical staging system based on one of several alternative scoring methods. Two of these are the "Ishak score," which uses a seven-point scale, and the "METAVIR score," which uses a five-point severity scale. 3,9 Each scale has metrics for determining the severity of HF, and each institution or medical organization adopts a particular scale.
Needle biopsy can provide diagnostic specificity for HF, but the technique suffers from multiple drawbacks. 2,8,10,12,13 The sample recovered in needle biopsy is ∼1 mm³ and is used to determine the health of an organ ∼50,000 times larger than the sample's volume; CLD is a nonuniform disease affecting different regions to different degrees, making biopsies prone to volume sampling errors. Additionally, pathologists require extensive training to make a quantitative assessment of HF, and there is a significant variance between scores, 6,9,10,14,15 either between pathologists within the same center or between centers, which may be aggravated when different scales are used. There is also risk for the patient, who must undergo an invasive procedure that has potential complications, including pain, bleeding, and/or infection.
HF caused by CLD can be treated with therapies that delay progression or reverse damage to the liver; therapy is most effective if HF is diagnosed in early-stage disease. Change of lifestyle is effective at delaying or preventing progression of CLD in the cases of alcoholic liver disease and NASH. 16-18 There are also antiviral treatments for viral hepatitis. 19,20 A quantitative, whole-liver, noninvasive MRI surrogate measure for HF would have a pivotal impact on the diagnosis and management of CLD and on the development of new therapies.
As mentioned, the METAVIR score is a method to stage HF from needle biopsy. 9 It has also been suggested as a reference scale for studies that use MRI techniques to stage HF. The METAVIR score is based on a five-point scale ranging from F0 to F4. F0 corresponds to a healthy liver with no detectable HF. F1 is diagnosed when collagen has formed around the portal triads, the veins that supply blood to the hepatocytes that perform the primary function of removing toxins from the blood; F2 is based on identifying HF extending from the portal triads with fibrosis branching out between the triads; F3 stage is based upon identifying fibrosis bridging across portal triads; and F4, also referred to as cirrhosis, is called when thickened fibrotic bands both bridge triads and encase liver lobules.
*Address all correspondence to: Jonathan F. Brand, E-mail: jbrand@optics.arizona.edu
Magnetic-resonance elastography (MRe) 13,21-24 is another MRI method that has been developed for HF detection and staging. MRe utilizes an externally triggered transducer that produces mechanical, pneumatically driven, longitudinal pressure distortion waves on the surface of the patient's body that transfer to the liver. 21,22,25,26 Deformation in response to the pressure waves is spatially and temporally measured using a phase-sensitive MRI technique, allowing conversion to a measure of liver tissue stiffness. 21,24,27-29 Liver stiffness has been shown to correlate with the presence of HF. 13,24,27,30-32 Limitations of MRe include the requirement for specialized equipment, additional time in the MR scanner, potential patient discomfort, and relative insensitivity to early stages of HF, 23,27,29 i.e., ≤F2 disease. Our goal is to develop a quantitative MR imaging technique that is directly sensitive to the textural changes the pathologist observes, with the added advantages of being noninvasive, fast, requiring only standard MRI systems, and providing whole-liver sampling.
Gadolinium (Gd) contrast agent (gadobenate dimeglumine) has previously been shown to accumulate in the extracellular space where collagen has emerged in the liver, providing contrast between healthy and fibrotic tissue. 2,6 Gadolinium reduces the T1 and T2 wherever it accumulates. The effect on T1 is much greater than on T2, making gadolinium an effective in vivo T1-imaging contrast agent. 33 In vivo images suitable for the detection of HF are collected with a T1-weighted MRI pulse sequence at the delayed phase of Gd-enhancement. The spatial resolution achievable in images acquired by clinical MRI is very near the scale of the characteristic size of the hepatic lobule and larger HF bands. However, the images are challenging to analyze reproducibly and quantitatively by unaided radiologists.
Because the statistics of the data are not known, we assume that the data obey normal statistics and use a Hotelling observer to assess local texture in MR images of formalin-fixed liver samples obtained at autopsy. Data were collected with a 3-D gradient-echo pulse sequence using a TR/TE/NA of 9.79 ms/4.44 ms/2, where NA is the number of averages. These settings were chosen, based on results outlined in Section 2.1, to recreate the contrast observed in vivo in Gd-enhanced images that radiologists read to assess HF.
In clinical MRI, the operator has control over the TR, TE, and flip angle, and receives radiologist feedback to confirm whether the sequence is collecting images with diagnostically acceptable contrast. The resulting sequence is not necessarily ideal for the task of separating F0 and F4 liver images with a mathematical observer. Our goal is to use task-based performance assessment with an ideal linear (Hotelling) observer to determine the optimal parameters for maximizing sensitivity to fibrotic structures in MR imaging.
The TR, TE, and flip angle parameters directly contribute to the contrast of an MR image. However, TR and TE also directly impact the scan time of the image sequence. Increasing the duration of the MR acquisition in the abdomen is not desirable due to increased artifacts from motion associated with the patient's breathing. Changing the flip angle has a similar effect on overall contrast without significantly impacting the length of the sequence. For this reason, we focus on determining the ideal flip angle for an MR sequence that will be used to assess HF.
We use area-under-the-receiver-operator-characteristic curve (AUROC) as the figure-of-merit, and present results of a study to find the optimal flip angle for detecting HF in liver phantoms. This optimization method is translatable to the clinical setting.

Liver Phantoms
Liver specimens were recovered from the University of Arizona's Department of Pathology for use as MRI phantoms. Slices one to two inches thick were sectioned during autopsy and fixed in formalin. After the liver was fixed in formalin, biopsies were collected and the phantom was placed in an air-tight container. The containers were then placed in the MRI scanner to collect images.
To confirm our observation that MRI of formalin-fixed tissue is comparable to clinical contrast-enhanced MRI, we compared the textures of the liver tissues using a technique introduced by Burgess et al. The method takes the Fourier transform of the data and measures the radially averaged power spectrum in frequency space. The results are reported on a log-log scale to determine the slope of the power spectrum. Imaging modalities with similar slopes in the spectral density have similar image features. 34,35 We compared healthy and cirrhotic patient data to healthy and cirrhotic phantom images. Patient data were collected at a TR/TE/FA of 4.36 ms/2.3 ms/10 deg and phantom data were collected at a TR/TE/FA of 9.79 ms/4.44 ms/10 deg. All data sets were collected with an in-plane resolution of 1.5 mm² at a slice thickness of 3 mm. Results for this experiment are shown in Fig. 1. All of the collected spectra exhibited similar slopes to within experimental error. From this result, we conclude that the formalin-fixed tissue produces images with textures similar to in vivo clinical images of fibrosis. Higher-resolution in vivo experiments are not yet possible due to the limitations of patient respiratory motion.
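The spectral comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `radial_power_spectrum` and `loglog_slope`, the mean subtraction, and the integer-radius binning are all assumptions made for the example.

```python
import numpy as np

def radial_power_spectrum(image):
    """Radially averaged power spectrum of a 2-D image.

    Assumption: the mean is subtracted first so the DC term does not dominate,
    and power is averaged in rings one frequency sample wide.
    """
    img = np.asarray(image, dtype=float)
    img = img - img.mean()
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    ny, nx = img.shape
    y, x = np.indices((ny, nx))
    # Integer radius (in frequency samples) of every pixel from the spectrum center.
    r = np.rint(np.hypot(y - ny // 2, x - nx // 2)).astype(int)
    ring_sum = np.bincount(r.ravel(), weights=power.ravel())
    ring_count = np.bincount(r.ravel())
    radial = ring_sum / ring_count
    r_max = min(ny, nx) // 2  # stay inside the largest full ring
    return np.arange(1, r_max), radial[1:r_max]

def loglog_slope(freq, power):
    """Slope of a straight-line fit to log10(power) versus log10(frequency)."""
    slope, _intercept = np.polyfit(np.log10(freq), np.log10(power), 1)
    return slope
```

Calling `loglog_slope(*radial_power_spectrum(slice_2d))` on a patient slice and on a phantom slice gives two slopes that can be compared directly, which is the comparison summarized in Fig. 1.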

Magnetic Resonance Imaging
All images in this optimization study were collected on a Siemens 3T Skyra MRI with the Siemens flex body imaging coil, using a 3-D gradient-echo T1-weighted imaging sequence (3-D VIBE, Siemens) with TR/TE = 9.79 ms/4.44 ms, 2 averages, and a range of FAs from ∼10 to ∼50 deg. The available FAs are constrained by hardware limitations. A field-of-view (FOV) of 26.5 × 26.2 × 3.36 cm with a sampling matrix of 768 × 760 × 96 was selected, resulting in images with an isotropic resolution of 0.35 mm. All images were collected at room temperature (22°C). The total scan time at one FA was ∼25 min. We collected enough data to fully train the covariance matrices required to calculate the Hotelling observer. This MRI optimization method will eventually be repeated with in vivo data sets once enough patient images are collected. Before the mathematical observer was trained to perform the task of HF detection on liver tissue, a basic threshold was implemented to segment out and remove areas of the image that contained blood vessels from the analysis. We found this to be a necessary step in developing the observer technique.
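As a quick check of the acquisition geometry and an illustration of the vessel-removal step, the sketch below computes the voxel size from the stated FOV and matrix and builds a simple intensity mask. The source says only that "a basic threshold" was used; the rule here (keep pixels brighter than a fraction of the median, since vessels appear dark) and the names `tissue_mask` and `frac` are hypothetical.

```python
import numpy as np

def voxel_size_mm(fov_cm, matrix):
    """Voxel edge lengths in mm from the field of view (cm) and sampling matrix."""
    return 10.0 * np.asarray(fov_cm, dtype=float) / np.asarray(matrix, dtype=float)

def tissue_mask(image, frac=0.5):
    """Boolean mask that is True for pixels kept in the analysis.

    Assumption: vessels are darker than tissue, so pixels below
    frac * median intensity are excluded. frac is a hypothetical setting.
    """
    img = np.asarray(image, dtype=float)
    return img > frac * np.median(img)

# Geometry stated in the text: 26.5 x 26.2 x 3.36 cm sampled at 768 x 760 x 96.
voxel = voxel_size_mm([26.5, 26.2, 3.36], [768, 760, 96])
```

With the stated values, each edge length comes out close to 0.35 mm, consistent with the isotropic resolution quoted above.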

Biopsy
Tissue biopsy was used as the gold standard to determine the METAVIR score for the phantoms prior to imaging. Eight scalpel biopsies of 10 × 10 × 1 mm³ size were collected from each formalin-fixed liver. Large biopsies were possible since the tissue was excised at autopsy. H&E stained slides were prepared for review by a pathologist, who evaluated each sample for the stage of fibrosis. Only phantoms with consistent METAVIR scores were selected to train and test the observers.

Local Texture Analysis
We tested texture analyses based on a local, normalized, 2-D discrete autocorrelation (2DAC) and a local, normalized, 2-D discrete circular autocorrelation (2DCC). We found that a Hotelling observer working with the 2DCC is capable of distinguishing whether local regions are drawn from an F0/F1 or an F4 liver while requiring a modest amount of training data. The 2DCC is given by

$$ C(k,l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n)\, f^{*}[\sigma(m+k, M),\, \sigma(n+l, N)], \tag{1} $$

where $f(m,n)$ is the pixel value at position $(m,n)$ in the ROI and σ is the modulo function that represents a circular shift of the conjugate vector. M and N are set to the ROI size; both were 7 in this experiment to encompass a little more than one typical ECM cell. The final vector is normalized to a maximum value of one.
The results of the 2DCC on many 7 × 7 pixel regions of interest (ROIs) were the data used to train the optimal linear observer. Training ROIs were nonoverlapping and included no major blood vessels as a result of the threshold mask. The symmetries of the 2DCC ensured that the observer had no orientational dependence.
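A minimal sketch of the local texture step follows, assuming the standard definition of a circular autocorrelation and computing it through the FFT (the inverse transform of the power spectrum). Whether the authors subtract the ROI mean or use a different normalization is not stated, so this version applies only the peak normalization described in the text. The function names are hypothetical.

```python
import numpy as np

def circular_autocorr_2d(roi):
    """Normalized 2-D circular autocorrelation of one ROI (maximum value 1).

    Assumption: no mean subtraction; the ROI must not be identically zero.
    """
    f = np.asarray(roi, dtype=float)
    spectrum = np.fft.fft2(f)
    # Circular autocorrelation equals the inverse FFT of the power spectrum.
    acf = np.real(np.fft.ifft2(np.abs(spectrum) ** 2))
    return acf / acf.max()

def nonoverlapping_rois(image, mask, size=7):
    """Yield size x size tiles on a regular grid that lie fully inside the mask."""
    ny, nx = image.shape
    for i in range(0, ny - size + 1, size):
        for j in range(0, nx - size + 1, size):
            if mask[i:i + size, j:j + size].all():
                yield image[i:i + size, j:j + size]

def texture_vectors(image, mask, size=7):
    """Stack the flattened 2DCC of every accepted ROI into an (n_rois, size*size) array."""
    return np.array([circular_autocorr_2d(r).ravel()
                     for r in nonoverlapping_rois(image, mask, size)])
```

Each row of the returned array is one data vector of length P = 49 for the 7 × 7 ROIs used here. Because the zero-lag element is always one after normalization and the autocorrelation is point-symmetric, several vector components are constant or duplicated, which matters when a covariance matrix built from these vectors is inverted.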

Optimal Linear Observer
To perform the classification task between a signal-absent class (H_0) corresponding to normal liver and a signal-present class (H_1) corresponding to fibrotic liver, an optimal linear observer that maximizes detection signal-to-noise ratio (SNR), also known as the Hotelling observer, was trained using local 2DCCs from 2-D slices from MR images of phantoms with METAVIR scores confirmed via biopsy. The set of pixels from the texture analysis was ordered as a P × 1 vector g. 36,37 The optimal linear observer template w is also a P × 1 vector, and a test statistic is calculated as the inner product between a data vector and the observer vector 36,37

$$ \tau(\mathbf{g}) = \sum_{p=1}^{P} w_p g_p = \mathbf{w}^{t}\mathbf{g}. \tag{2} $$

The Hotelling observer is given by 36,37

$$ \mathbf{w} = \left[\tfrac{1}{2}(\mathbf{K}_0 + \mathbf{K}_1)\right]^{-1} (\bar{\mathbf{g}}_1 - \bar{\mathbf{g}}_0), \tag{3} $$

where $\mathbf{K}_j$ is the covariance matrix of the training data from class j and $\bar{\mathbf{g}}_j$ is the corresponding mean data vector. The Hotelling observer is the ideal linear observer when the data vectors are described by multivariate normal statistics and the covariance matrices of the data in each class are equal or approximately equal. The recovered test statistic τ(g) is used to make a decision based on a threshold. If τ(g) is greater than τ_th, then it is decided that H_1 is true, whereas if τ(g) is less than τ_th, H_0 is true. 37
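The training step can be sketched as follows, using the standard form of the Hotelling template (the inverse of the average class covariance applied to the difference of the class means). The use of a least-squares solve, which behaves like a pseudo-inverse when the sample covariance is singular, is an assumption of this sketch and not something the source specifies. The names `train_hotelling` and `hotelling_statistic` are hypothetical.

```python
import numpy as np

def train_hotelling(X0, X1):
    """Hotelling template from training data.

    X0, X1: arrays of shape (n_samples, P) for the signal-absent and
    signal-present classes, one flattened texture vector per row.
    """
    mean0, mean1 = X0.mean(axis=0), X1.mean(axis=0)
    K = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    # Assumption: lstsq is used instead of an explicit inverse so that a
    # singular or ill-conditioned K does not raise an error.
    w, *_ = np.linalg.lstsq(K, mean1 - mean0, rcond=None)
    return w

def hotelling_statistic(w, X):
    """Scalar test statistic (inner product with the template) for each row of X."""
    return np.asarray(X) @ w
```

A decision for each ROI then follows by comparing `hotelling_statistic(w, X_test)` with a threshold, with larger values assigned to the signal-present class.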

Ideal Quadratic Observer
The Hotelling observer is the ideal observer only when K_0 ≅ K_1. However, if the statistics are multivariate normal but the covariance matrices are not equal or nearly equal, then the quadratic observer is the ideal observer. It is given by the log-likelihood ratio of the two multivariate normal distributions,

$$ \tau(\mathbf{g}) = \tfrac{1}{2}(\mathbf{g}-\bar{\mathbf{g}}_0)^{t}\mathbf{K}_0^{-1}(\mathbf{g}-\bar{\mathbf{g}}_0) - \tfrac{1}{2}(\mathbf{g}-\bar{\mathbf{g}}_1)^{t}\mathbf{K}_1^{-1}(\mathbf{g}-\bar{\mathbf{g}}_1) + \tfrac{1}{2}\ln\frac{\det \mathbf{K}_0}{\det \mathbf{K}_1}. \tag{4} $$

The ideal quadratic observer can be computed with the same data required to train the linear observer. 38 Decisions are made in the same manner as for the linear observer, using a comparison of τ(g) to τ_th to decide if g is a member of H_0 or H_1.
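For comparison with the linear case, a quadratic observer can be sketched as the log-likelihood ratio of two multivariate normal distributions fitted to the training data. This sketch assumes both sample covariance matrices are nonsingular; for texture vectors with constant or duplicated components, some form of dimension reduction or regularization would be needed first, and the source does not say how that was handled. The function names are hypothetical.

```python
import numpy as np

def train_quadratic(X0, X1):
    """Per-class mean, inverse covariance, and log-determinant from training data."""
    params = []
    for X in (X0, X1):
        K = np.cov(X, rowvar=False)
        _sign, logdet = np.linalg.slogdet(K)
        params.append((X.mean(axis=0), np.linalg.inv(K), logdet))
    return params

def quadratic_statistic(params, X):
    """Gaussian log-likelihood ratio ln[p(g|H1)/p(g|H0)] for each row of X."""
    (m0, K0_inv, logdet0), (m1, K1_inv, logdet1) = params
    d0, d1 = X - m0, X - m1
    # Quadratic forms d^t K^-1 d evaluated row by row.
    q0 = np.einsum("ij,jk,ik->i", d0, K0_inv, d0)
    q1 = np.einsum("ij,jk,ik->i", d1, K1_inv, d1)
    return 0.5 * (q0 - q1) + 0.5 * (logdet0 - logdet1)
```

As with the linear observer, larger values of the statistic favor the signal-present class.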

Receiver Operator Characteristic Analysis
We recovered the receiver-operator-characteristic (ROC) curves and calculated the AUROC as the figure-of-merit for the observer. To perform ROC analysis, τ_th was varied across the range of possible τ values spanned by the test statistics τ_0 and τ_1, where τ_0 are the test statistics from confirmed signal-absent testing data and τ_1 are the test statistics from confirmed signal-present testing data. At each threshold τ_th, the false-positive fraction (FPF) and true-positive fraction (TPF) were calculated 37 using

$$ \mathrm{FPF}(\tau_{th}) = \frac{\sum (\tau_0 > \tau_{th})}{\sum \tau_0} \tag{5} $$

and

$$ \mathrm{TPF}(\tau_{th}) = \frac{\sum (\tau_1 > \tau_{th})}{\sum \tau_1}, \tag{6} $$

where $\sum \tau_i$ denotes the total number of test statistics and $\sum (\tau_i \gtrless \tau_{th})$ denotes the number of scalar test statistics either above (>) or below (<) the threshold. For each threshold, the TPF was plotted as a function of the FPF, forming the ROC curve. The figure-of-merit for the ROC curve is the AUROC, which has a possible range from 0.5 to 1.0. An AUROC of 0.5 denotes a situation where the distributions of test statistics fully overlap one another and the observer can do no better than random guessing. An AUROC (or AUC for short) of 1.0 means a complete separation of the test-statistic distributions and perfect observer performance.
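The threshold sweep can be written compactly. The sketch below assumes the convention stated earlier (a test statistic above the threshold is called signal-present) and integrates the ROC curve with the trapezoid rule; the function names are hypothetical, and the broadcasted comparison is meant for modest sample sizes rather than millions of ROIs.

```python
import numpy as np

def roc_curve(t0, t1):
    """FPF and TPF at every observed threshold.

    t0: test statistics from signal-absent data; t1: from signal-present data.
    """
    t0 = np.asarray(t0, dtype=float)
    t1 = np.asarray(t1, dtype=float)
    # -inf gives the (1, 1) corner; the largest statistic gives the (0, 0) corner.
    thresholds = np.concatenate(([-np.inf], np.sort(np.concatenate((t0, t1)))))
    fpf = (t0[None, :] > thresholds[:, None]).mean(axis=1)
    tpf = (t1[None, :] > thresholds[:, None]).mean(axis=1)
    return fpf, tpf

def auroc(fpf, tpf):
    """Area under the ROC curve by the trapezoid rule."""
    x, y = fpf[::-1], tpf[::-1]  # reorder so that FPF increases
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

For example, `auroc(*roc_curve(t0, t1))` returns 1.0 when every signal-present statistic exceeds every signal-absent statistic and 0.5 when the two sets are identical.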

Curve Fitting
To determine the optimal FA, the AUC was plotted as a function of FA, A(θ). A smoothing spline was implemented to interpolate the AUC data using the MATLAB® smoothing toolbox. The optimal FA was selected at the maximum of the spline curve.
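The same idea can be sketched in Python with SciPy in place of MATLAB. This is a stand-in, not the authors' code: `scipy.interpolate.UnivariateSpline` is a different smoothing-spline implementation, and the smoothing factor and grid density below are hypothetical settings that would need tuning to the real AUC values.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def optimal_flip_angle(fa_deg, auc, smoothing=1e-4, n_grid=1001):
    """Fit a cubic smoothing spline A(theta) and return the FA at its maximum.

    fa_deg must be increasing and contain at least four values.
    smoothing is the SciPy smoothing factor s (0 gives pure interpolation);
    its default here is an arbitrary illustrative choice.
    """
    fa_deg = np.asarray(fa_deg, dtype=float)
    auc = np.asarray(auc, dtype=float)
    spline = UnivariateSpline(fa_deg, auc, k=3, s=smoothing)
    grid = np.linspace(fa_deg.min(), fa_deg.max(), n_grid)
    best = grid[np.argmax(spline(grid))]
    return float(best), spline
```

The function would be called with the measured flip angles and the corresponding AUC values; the maximum is searched only within the measured FA range, so the result is never an extrapolation.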

Liver Phantoms and Biopsy
We collected biopsies from four liver phantoms fixed in formalin for imaging with MRI. Each phantom had biopsies from eight regions assessed by a pathologist. Only phantoms with homogeneous biopsy results were used in this study. Two phantoms were reported as F4, one as F0, and one as F1.
Representative images of biopsy slides are provided for each phantom in Fig. 2.
The F0 sample showed no sign of fibrosis, whereas the F1 biopsy showed early fibrosis forming around the portal veins. The two F4 phantoms have a complete ECM, and lobules were clearly visible in the biopsy slides. The F0 and F1 livers were used to define the signal-absent class of the linear and quadratic observers, and the two F4 livers define the signal-present class used to train the model observers. Figure 3 provides a representative MR image of each phantom, acquired at TR/TE = 9.79 ms/4.44 ms and an FA of 19 deg, associated with each biopsy sample.

MRI of Liver Phantoms
Each phantom was imaged at five flip angles: 8, 15, 19, 30, and 45 deg, in that order. Selected slices from each phantom at 19 deg are shown in Fig. 3.
The images from the F0 and F1 phantoms appear relatively untextured and the liver tissue appears uniform in signal throughout a majority of the tissue. The F1 liver in Fig. 3(b) has some features associated with vasculature. This is dependent on the location the slice is removed from during autopsy. The vasculature features, which are dark, are ignored by our analysis. The F4 images suggest that there is visible contrast between the ECM and liver tissue in the cirrhotic livers that appears at the expected length scale associated with fibrosis.

Training Model Observers
The set of local 2DCCs from the F0 and F1 phantoms comprised our signal-absent data for training a model observer, and the 2DCCs from the two F4 phantoms comprised the signal-present data. To avoid bias in the results, only one phantom from each class was used to train the model observer; the other phantom in each class was used as the testing data. With four phantoms, two in each class, we could derive and test four independent observers to check for reproducibility. ROIs of 7 × 7 pixels were selected with independent gridding to calculate the means and covariance matrices. With this selection method, the liver in Fig. 3(a) had 248,439 ROIs, the liver in Fig. 3(b) had 56,865 ROIs, the liver in Fig. 3(c) had 49,858 ROIs, and the liver in Fig. 3(d) had 127,150 ROIs. The linear observers for each FA are shown in flattened 1-D form of length P in Fig. 4, based on the notation in Eq. (5). The observer index is the vector component. We find that the templates all detect the same features, regardless of the choice of training and testing data and of FA, namely the peaks in the 2DCC function associated with the ECM cell size. Figure 5 provides the 2-D representation M × N of the templates at each FA for one set of training data, based on the indexing in Eq. (1). The 2-D templates show a high degree of rotational symmetry, as expected for 2DCCs, which makes the results invariant to image rotation.
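The four training and testing combinations follow from choosing one phantom per class for training and holding out the other. A small sketch makes the bookkeeping explicit; the phantom labels are hypothetical placeholders, not identifiers from the study.

```python
from itertools import product

def train_test_splits(absent, present):
    """All ways to train on one phantom per class and test on the remaining one.

    absent, present: two phantom labels per class.
    """
    splits = []
    for a_train, p_train in product(absent, present):
        a_test = next(a for a in absent if a != a_train)
        p_test = next(p for p in present if p != p_train)
        splits.append({"train": (a_train, p_train), "test": (a_test, p_test)})
    return splits

# Hypothetical labels for the F0, F1, and two F4 phantoms.
splits = train_test_splits(["F0", "F1"], ["F4_a", "F4_b"])
```

No phantom appears in both the training and testing set of any split, which is the property that keeps the AUC estimates free of training bias.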
The Hotelling observer has a template form w that one can visualize, whereas the quadratic observer does not. The sample covariance matrices for each flip angle for a representative signal-absent and signal-present training combination are shown in Fig. 6.

ROC Analysis and Curve Fitting
The four phantoms allowed for four different combinations of training and testing data, for which ROC analysis was performed and the AUC for each combination was calculated as a function of flip angle. ROIs were selected with a sliding window to increase sensitivity to local changes in texture; the liver in Fig. 3(a) had 7,686,034 ROIs, the liver in Fig. 3(b) had 2,952,176 ROIs, the liver in Fig. 3(c) had 2,503,899 ROIs, and the liver in Fig. 3(d) had 6,053,726 ROIs. The AUC values are plotted as a function of flip angle in Fig. 7. The mean relative AUCs, after a minimal least squares adjustment to remove overall offsets between training and testing combinations, were computed as a function of flip angle, and plotted for the linear and quadratic observers in Figs. 8 and 9, respectively. The optimal flip angle was chosen based on maximizing the AUC and for both the linear and quadratic observers was found to be near 24 deg. The AUC values for the quadratic observer did not improve upon the AUC values of the linear observer.
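The source does not spell out the least-squares adjustment, so the following is only one plausible reading: an additive model in which each training and testing combination contributes a constant offset to a common AUC-versus-FA curve. For a complete table of AUC values, the least-squares offsets (constrained to sum to zero) are the row means minus the grand mean. The function name and array layout are assumptions.

```python
import numpy as np

def remove_combination_offsets(auc):
    """Remove per-combination offsets from an AUC table.

    auc: array of shape (n_combinations, n_flip_angles).
    Assumed model: auc[c, f] = a[f] + b[c] with sum(b) = 0, fitted by least squares.
    Returns the adjusted table, its mean curve, and its spread across combinations.
    """
    auc = np.asarray(auc, dtype=float)
    offsets = auc.mean(axis=1, keepdims=True) - auc.mean()
    adjusted = auc - offsets
    return adjusted, adjusted.mean(axis=0), adjusted.std(axis=0)
```

Under this model the mean curve is unchanged by the adjustment; what shrinks is the spread between combinations at each flip angle, which makes the location of the maximum easier to see.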

Conclusions and Future Work
Task-based optimization of MRI acquisition sequence parameters can be carried out whenever a model observer is applied to the MRI images, and this method should have extensive utility for a variety of clinical applications.
The method of optimization shown in this work was focused on phantoms, but the approach can be translated to clinical practice. Similar studies are planned that use data collected from in vivo scans and will thus be useful for improving patient sequences. The optimal flip angle determined for ex vivo phantoms at our resolution is not necessarily the optimal flip angle for the in vivo experiments.
Additionally, more moderate cases of HF will be collected to establish the AUCs for early detection. This will have the challenge of identifying intermediate cases, i.e., F1, F2, and F3, with gold standard verification. We expect to extend our techniques to repeat the optimization experiment for best multiple-class decisions.
We are acquiring more phantom data with early stage liver disease to further develop this tool, but this is difficult due to limited access to autopsy tissue samples. Even though only a limited number of phantoms were used in the current work, we were able to collect enough training data to calculate the required covariance matrices and test the Hotelling template. We are also considering alternatives to the 2DCC for local texture analysis.