An estimated 161,360 new cases of prostate cancer and 26,730 deaths from prostate cancer occurred in the USA in 2017.1 Magnetic resonance imaging (MRI) has become a routine modality for prostate examination.2–5 Accurate segmentation of the prostate and lesions from MRI has many applications in prostate cancer diagnosis and treatment. However, manual segmentation is time consuming and subject to inter- and intrareader variation. In this study, a deep learning method is proposed to automatically segment the prostate on T2-weighted (T2W) MRI.
Recently, deep learning has dramatically changed the landscape of computer vision. The initial work was proposed for image classification6 using a large data set of natural images called ImageNet.7 Currently, most deep learning models are designed for image-level classification.6,8 To obtain a pixel-wise segmentation, some researchers9–15 proposed a patch-wise segmentation method, which extracts small patches (e.g., ) from images and then trains a patch-wise convolutional neural network (CNN) model. In the training stage, each patch extracted from a training image is assigned a label, so that the patches can be fed directly into the image-level classification framework to learn a CNN model. If the center pixel of the patch belongs to the foreground, the label of the patch is 1. In the testing stage, patches are first extracted from the testing image. The learned CNN model is then used to infer the label of each testing patch, and this label is assigned to the center pixel of the patch.
Tajbakhsh et al.9 proposed a deep learning method for intima-media boundary segmentation from ultrasound images. They formulated the boundary segmentation task as a pixel-level classification problem. To train the CNNs, 200,000 small training patches were extracted from the images, and AlexNet was used in their study. Zhang et al.10 proposed to use deep CNNs for segmenting isointense stage brain tissues using multimodality MRI. For each subject, they generated patches centered at each pixel from T1, T2, and fractional anisotropy images. The patches were used as training and testing samples in their study. Ciresan et al.12 presented a deep neural network method to segment neuronal membranes in electron microscopy images. They used a special type of deep artificial neural network as a pixel classifier. The label of each pixel was predicted from raw pixel values in a square patch centered on it. Pereira et al.13 proposed a CNN-based method for segmentation of brain tumors in MRI. To train the CNNs for low grade gliomas (LGG) and high grade gliomas (HGG), they extracted around 450,000 and 335,000 small patches, respectively. Kooi et al.14 proposed a CNN method for breast lesion detection in mammography. 39,872 patches were extracted for training the CNNs, where the size of each patch is . Milletari et al.15 proposed a patch-wise deep learning method for segmentation of deep brain regions in MRI and ultrasound. They collected patches from both foreground and background and trained a CNN.
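The patch-wise scheme described above can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited works: the function name `extract_patches`, the reflect padding at the image border, and the choice of one patch per pixel are assumptions made only for illustration.

```python
import numpy as np


def extract_patches(image, mask, patch_size):
    """Extract one square patch per pixel and label it by its center pixel.

    Hypothetical helper: reflect padding at the border is an assumption,
    not something specified by the cited patch-wise methods.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so that a center pixel exists")
    half = patch_size // 2
    padded = np.pad(image, half, mode="reflect")
    patches, labels = [], []
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # The patch centered on (r, c) in the original image.
            patches.append(padded[r:r + patch_size, c:c + patch_size])
            # Label is 1 if the center pixel belongs to the foreground.
            labels.append(int(mask[r, c] > 0))
    return np.stack(patches), np.array(labels)
```

Even for a small image, this produces as many patches as pixels, and neighboring patches overlap almost entirely, which is the source of the redundant computation in patch-wise models.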
The performance of patch-wise CNN models can be affected by the patch size. A large patch size reduces the localization accuracy, whereas a small patch size captures only a small context. In addition, when the number of patches is large (each pixel/voxel is assigned a patch), a large amount of redundant computation must be performed for neighboring patches. To solve these problems, Long et al.16 proposed an end-to-end, pixel-wise, natural image segmentation method based on Caffe,17 a deep learning software framework. They converted an existing classification CNN into a fully convolutional network (FCN) for object segmentation. A coarse label map is obtained from the network by classifying every local region, and a simple deconvolution based on bilinear interpolation then produces the pixel-wise segmentation. This method does not make use of postprocessing or posthoc refinement by random fields.18,19
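As an illustration of deconvolution based on bilinear interpolation, the following NumPy sketch builds a bilinear kernel and applies it as a strided transposed convolution to a coarse score map. The function names, the kernel size rule, and the cropping are assumptions chosen for this sketch; they are not the Caffe implementation itself.

```python
import numpy as np


def bilinear_kernel(size):
    """Return a (size x size) bilinear interpolation kernel."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)


def upsample(coarse, factor):
    """Upsample a 2-D map by an integer factor with a transposed convolution."""
    size = 2 * factor - factor % 2      # kernel size for this stride
    pad = (size - factor) // 2          # border cropped from the full output
    kernel = bilinear_kernel(size)
    h, w = coarse.shape
    full = np.zeros(((h - 1) * factor + size, (w - 1) * factor + size))
    for i in range(h):
        for j in range(w):
            # Each coarse value spreads a scaled copy of the kernel.
            full[i * factor:i * factor + size,
                 j * factor:j * factor + size] += coarse[i, j] * kernel
    return full[pad:pad + h * factor, pad:pad + w * factor]
```

Because the kernel weights that overlap at any interior output position sum to one, a constant coarse map is upsampled to the same constant away from the border.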
Because the FCN algorithm achieves good performance, researchers have proposed various FCN-based methods for medical image segmentation.20–23 Ronneberger et al.20 took the idea of the FCN one step further and presented a framework called U-Net, which is a regular CNN followed by an up-sampling operation, where up-convolutions are used to increase the size of feature maps. Çiçek et al.24 extended the U-Net to obtain three-dimensional (3-D) segmentation. Yu et al.22 proposed a volumetric ConvNet to segment the prostate from MRI. They extended a two-dimensional (2-D) FCN into a volumetric fully ConvNet (3D-FCN) to enable volume-to-volume segmentation prediction. Milletari et al.23 proposed a 3-D variant of the U-Net architecture called V-Net for prostate segmentation. In deep neural networks with several convolutional layers, it is important to provide a good initialization, which can improve segmentation performance. In addition, achieving an efficient deep learning architecture is nontrivial. To address these problems, a model pretrained on a large number of natural images is used to initialize the parameters of the proposed network, which can also accelerate the convergence of the model to a minimum.
In this paper, we propose the use of FCN for the segmentation of the prostate on MRI. The contribution includes the modification of the FCN and its validation on prostate MRI. The preliminary study of the work was presented at the 2017 SPIE Medical Imaging Conference25 and the authors were requested to submit a full article to the Journal of Medical Imaging (JMI). As compared to the SPIE conference paper, this JMI article made major improvements: (1) we extended the method from 20 patients to 140 patients, (2) we added more literature review in Sec. 1, (3) we performed more segmentation experiments with three databases, and (4) we added significantly more results in this article. The rest of this paper is organized as follows: in Sec. 2, we present the details of the proposed algorithm. Section 3 evaluates the performance of the proposed algorithm and discusses the results of our experiments. We conclude the paper in Sec. 4.
Convolutional Neural Network
In practice, few people train an entire CNN from scratch, since it is difficult to collect a data set of sufficient size, especially for medical images. Instead of learning from scratch, it is common to use a CNN pretrained on a large data set as an initialization and then retrain a classifier on top of the CNN for the data set at hand, which is known as fine-tuning. Tajbakhsh et al.9 showed that knowledge transfer from natural images to medical images is possible with CNNs. Therefore, we take Long’s FCN model16 trained on the PASCAL VOC data set26 and fine-tune it on our medical images for prostate segmentation; we refer to the resulting network as PSNet. Figure 1 shows the proposed deep learning method.
The early layers of PSNet learn low-level generic features that are applicable to most tasks, whereas the late layers learn high-level features that are specific to the application at hand.27,28 Therefore, we only fine-tune the last three layers of the FCN in this work. Figure 2 shows the filters and the outputs of the first hidden layer. The first layer learns simple features, such as edges, junctions, and corners. Figure 3 shows the outputs of late hidden layers, which learn high-level features and yield highly abstracted output images.
The proposed PSNet predicts the probability of each voxel belonging to the prostate or the background. In each prostate MRI, the prostate occupies only a small region compared with the background, so there are far fewer foreground voxels than background voxels. With the softmax loss function in Caffe,17 this imbalance between the prostate and background regions can cause the learning algorithm to get trapped in local minima, so that the prostate is often missed and the prediction tends to classify prostate voxels as background. Therefore, in this work, we use a weighted cross entropy loss function.29 The loss function is formulated as follows:
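To make the idea of a class-weighted cross entropy concrete, a generic binary version can be sketched in NumPy. This is only an illustrative sketch: the function names, the inverse-frequency choice of weights, and the averaging over voxels are assumptions and may differ from the formulation used for PSNet.

```python
import numpy as np


def weighted_cross_entropy(prob_fg, target, w_fg, w_bg, eps=1e-7):
    """Class-weighted binary cross entropy averaged over all voxels.

    prob_fg: predicted foreground (prostate) probability per voxel.
    target:  1 for prostate voxels, 0 for background voxels.
    """
    p = np.clip(prob_fg, eps, 1 - eps)  # avoid log(0)
    loss = -(w_fg * target * np.log(p) + w_bg * (1 - target) * np.log(1 - p))
    return loss.mean()


def inverse_frequency_weights(target):
    """Hypothetical weighting: each class is weighted by the other's frequency."""
    fg_fraction = target.mean()
    return 1 - fg_fraction, fg_fraction  # (w_fg, w_bg)
```

With such weights, an error on a rare prostate voxel contributes more to the loss than an error on a background voxel, which discourages the trivial all-background prediction.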
Data augmentation has been shown to improve the performance of deep learning.6,23,30 To obtain robustness and increased precision on the test data set, we augment the original training data set using image translations and horizontal reflections.
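Translation and horizontal reflection can be sketched for 2-D slices as follows. The function names, the zero filling of vacated pixels, and the list of shifts are assumptions for illustration; the same transform is applied to the image and its label mask so that they stay aligned.

```python
import numpy as np


def translate(image, dy, dx):
    """Shift a 2-D array by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(image)
    h, w = image.shape
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


def augment(image, mask, shifts):
    """Yield translated and horizontally reflected (image, mask) pairs."""
    for dy, dx in shifts:
        t_img, t_mask = translate(image, dy, dx), translate(mask, dy, dx)
        yield t_img, t_mask
        # Horizontal reflection flips the left-right axis.
        yield np.fliplr(t_img), np.fliplr(t_mask)
```

For example, `list(augment(img, mask, [(0, 0), (5, 0), (0, -5)]))` would turn one training slice into six samples, including the unshifted original.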
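The proposed method was evaluated against the manually labeled ground truth. Four quantitative metrics were used for segmentation evaluation: Dice similarity coefficient (DSC), relative volume difference (RVD), Hausdorff distance (HD), and average surface distance (ASD).31–35 The DSC is calculated as follows:

$$\mathrm{DSC} = \frac{2\,|V_g \cap V_a|}{|V_g| + |V_a|},$$

where $V_g$ denotes the set of voxels in the manually labeled ground truth and $V_a$ denotes the set of voxels in the automatic segmentation.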
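To compute the HD and ASD, a distance from a pixel $p$ to a surface $S$ is defined first as $d(p, S) = \min_{q \in S} \lVert p - q \rVert_2$. The HD between two surfaces $S$ and $S'$ is calculated as

$$\mathrm{HD}(S, S') = \max\left\{ \max_{p \in S} d(p, S'),\; \max_{p' \in S'} d(p', S) \right\}.$$

The ASD is defined as

$$\mathrm{ASD}(S, S') = \frac{1}{|S| + |S'|} \left( \sum_{p \in S} d(p, S') + \sum_{p' \in S'} d(p', S) \right).$$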
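The four metrics can be sketched in NumPy under commonly used conventions. The sketch makes several assumptions that are not stated in the text: the sign convention of the RVD (positive for over-segmentation), the definition of surface voxels as foreground voxels with at least one background face neighbor, and a brute-force distance computation that is only practical for small masks.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())


def rvd(pred, gt):
    """Relative volume difference in percent (assumed sign: + = over-segmentation)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 100.0 * (pred.sum() - gt.sum()) / gt.sum()


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels that touch the background."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = mask.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if all of its face neighbors are foreground.
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(mask & ~interior) * np.asarray(spacing, dtype=float)


def directed_distances(a, b):
    """For every point in a, the Euclidean distance to the closest point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_distances(a, b).max(), directed_distances(b, a).max())


def average_surface_distance(a, b):
    """Symmetric average surface distance between two point sets."""
    d_ab, d_ba = directed_distances(a, b), directed_distances(b, a)
    return (d_ab.sum() + d_ba.sum()) / (len(a) + len(b))
```

Passing the voxel spacing to `surface_points` yields HD and ASD in millimeters, which matches the units in which these two metrics are reported.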
The proposed method was evaluated on three data sets of prostate MRI, which contain 140 T2W MRI volumes in total. The first data set consists of 41 in-house T2W prostate volumes. All subjects were scanned at 1.5 or 3.0 T at Emory Hospital without an endorectal coil. The voxel size varies from 0.625 to 1 mm. The size of the axial images is from to .
We also validated our method on two publicly available data sets, ISBI201336 and PROMISE12,32 which contain 60 and 50 prostate T2W MRI volumes, respectively. These two data sets share 11 subjects, and the duplicates were removed in our experiments. Therefore, 99 T2W MR volumes were collected from these two data sets. The subjects were scanned at 1.5 or 3 T, and some of them were scanned with an endorectal coil. The voxel size varies from 0.4 to 0.625 mm, whereas the image size varies from to . To better analyze the prostate MRI data, an isotropic volume was obtained for each case using windowed sinc interpolation.
We used a fivefold cross-validation procedure to evaluate the segmentation performance. Specifically, in each fold, we used 112 of the 140 subjects to train the network and the remaining 28 subjects to evaluate the performance. The average performance across folds is reported.
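The fold arithmetic can be checked with a short sketch using only the Python standard library. The function name, the random shuffling, and the seed are assumptions for illustration; how subjects were actually assigned to folds is not specified.

```python
import random


def five_fold_splits(subject_ids, n_folds=5, seed=0):
    """Yield (train, test) subject lists for each cross-validation fold."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # hypothetical: random fold assignment
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, test
```

With 140 subjects, each of the five folds holds out 28 subjects and trains on the other 112, and every subject is used for testing exactly once.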
In our experiments, training and inference were implemented in Python. All the experiments ran on an Ubuntu workstation equipped with 32 GB of memory, an Intel i7 6700 CPU, and an Nvidia GTX 1070 graphics card with 8 GB of video memory. Training the CNN model took 20 h with cuDNN acceleration. The learning rate was set as , and the number of iterations was 80,000. The weights in the network were initialized from the model pretrained on natural images. During training, the weights were updated by the stochastic gradient descent algorithm with a momentum of 0.99 and a weight decay of 0.0005. One advantage of using an FCN for image segmentation is that the entire image can be used directly as an input to the network in both the training and testing phases, which leads to efficient segmentation. Each prostate MRI was segmented in about 4 s. Caffe17 was used to implement the proposed method.
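The weight update can be sketched as a single step of stochastic gradient descent with momentum and L2 weight decay, restricted to the layers being fine-tuned. This is a generic sketch and not the Caffe solver itself: the function name, the dictionary representation of layers, the layer names, and the `lr` value passed by the caller are all hypothetical.

```python
import numpy as np


def sgd_momentum_step(weights, grads, velocity, lr,
                      momentum=0.99, weight_decay=0.0005, trainable=None):
    """Update weights in place; layers not listed in `trainable` stay frozen."""
    for name in weights:
        if trainable is not None and name not in trainable:
            continue  # frozen layer keeps its pretrained weights
        # L2 weight decay adds a term proportional to the weights.
        g = grads[name] + weight_decay * weights[name]
        velocity[name] = momentum * velocity[name] - lr * g
        weights[name] += velocity[name]
```

A call such as `sgd_momentum_step(w, g, v, lr=0.1, trainable={"late"})` updates only the layer named `late` and leaves every other entry of `w` untouched, mirroring the idea of fine-tuning only the last layers of a pretrained network.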
Qualitative Evaluation Results
The performance of the proposed deep learning method was evaluated qualitatively by visual comparison with the manually segmented contours. Figure 4 shows the qualitative results.
Our approach was compared with three prostate segmentation methods: the FCN model trained from scratch, U-Net, and V-Net. The implementation of U-Net was obtained from the webpage of its authors. For V-Net, we used the original implementation written by its authors, which is available in a GitHub repository: https://github.com/faustomilletari/VNet. To better fit the original implementations of FCN, U-Net, and V-Net, we used different image sizes for the different architectures. Because of memory constraints, V-Net uses the smaller image size given in the original paper. The comparison results of these three methods with our method are provided in Table 1.
Table 1 Quantitative comparison of the proposed method with three state-of-the-art methods. Reported metrics: DSC (%), RVD (%), HD (mm), and ASD (mm).
The average RVD of our method is 4.1%, which indicates that the proposed method achieves a good balance between over-segmentation and under-segmentation. Our method achieved the highest DSC and the lowest HD, each with the lowest standard deviation. In addition, we found that the results based on the weighted cross entropy are better than those obtained with the Dice loss function: the proposed method yielded a DSC of 85.0%, whereas the Dice loss version obtained a DSC of 82.3%.
We proposed an automatic deep learning method to segment the prostate on MRI. An end-to-end deep CNN model was trained on three prostate MR data sets and achieved good performance for prostate MRI segmentation. To the best of our knowledge, this is the first study to fine-tune an FCN that has been pretrained using a large set of labeled natural images for segmenting the prostate on MRI.
Based on the experimental results, we found that the use of a pretrained FCN with fine-tuning could yield satisfactory segmentation results. The performance is expected to improve further with more training data. The proposed algorithm is efficient and does not require any handcrafted features, because the deep learning algorithm learns the features automatically. Other works37–39 have developed new CNN models for segmentation, which can be used to improve our segmentation algorithm in future work. Deep learning-based segmentation methods rely not only on the selection of the neural network architecture but also on the selection of the loss function. In the future, we will investigate the behavior of custom loss functions and their performance on the segmentation task. The deep learning method can be generalized to segmentation problems for various organs and lesions beyond the prostate. It can be applied not only to MRI but also to other imaging modalities such as CT and ultrasound.
This research was supported in part by NIH Grant Nos. CA176684, CA156775, and CA204254.
Zhiqiang Tian is an associate professor at Xi’an Jiaotong University. He received his BS degree in control science and engineering from Northeastern University in 2004, and his MS and PhD degrees in control science and engineering from Xi’an Jiaotong University in 2007 and 2013, respectively. He was a postdoctoral fellow in the Department of Radiology and Imaging Sciences of Emory University. He is the author of more than 20 journal and conference papers. His current research interests include computer vision, medical image analysis, and deep learning.
Lizhi Liu is a professor and radiologist at Sun Yat-sen University Cancer Center. He received his BS degree from North Sichuan Medical University in 1996, and his MS and MD degrees from Sun Yat-sen University in 2008 and 2016, respectively. He is the author of more than 40 journal papers. His current research interests include oncologic imaging and medical image analysis.
Zhenfeng Zhang is a professor and vice chair in the Department of Radiology at the Second Affiliated Hospital of Guangzhou Medical University in China. He obtained his medical degree in China and his PhD degree in interventional radiology and oncology. His clinical work focuses on the diagnosis and minimally invasive treatment of lung cancer and liver cancer. His research focuses on radiomics and molecular imaging of cancers, and his clinical trials focus on NGS-based precision medicine studies of lung and liver cancers and on multiple CAR-T cell immunotherapy of solid cancers combined with interventional techniques.
Baowei Fei is a Georgia Cancer Coalition distinguished scholar and director of the Precision Imaging Research Division in the Department of Radiology and Imaging Sciences at the Emory University School of Medicine. He is also a faculty member in the Joint Department of Biomedical Engineering at Emory University and the Georgia Institute of Technology. He received his PhD from Case Western Reserve University, Cleveland, Ohio, USA. Currently, he is a director of the Quantitative BioImaging Laboratory (www.fei-lab.org).