PSNet: prostate segmentation on MRI based on a convolutional neural network
17 January 2018
Automatic segmentation of the prostate on magnetic resonance images (MRI) has many applications in prostate cancer diagnosis and therapy. We propose a deep fully convolutional neural network (CNN) to segment the prostate automatically. The CNN is trained end-to-end in a single learning stage, taking prostate MR images and the corresponding ground truths as inputs, and the learned model is then used to infer a pixel-wise segmentation. Experiments were performed on three data sets containing prostate MRI of 140 patients. The proposed CNN model for prostate segmentation (PSNet) achieved a mean Dice similarity coefficient of 85.0±3.8% relative to the manually labeled ground truth. These results show that the proposed model yields satisfactory segmentation of the prostate on MRI.

Introduction

An estimated 161,360 new cases of prostate cancer and 26,730 prostate cancer deaths were expected in the USA in 2017 [1]. Magnetic resonance imaging (MRI) has become a routine modality for prostate examination [2–5]. Accurate segmentation of the prostate and lesions on MRI has many applications in prostate cancer diagnosis and treatment. However, manual segmentation can be time-consuming and is subject to inter- and intrareader variability. In this study, we propose a deep learning method to automatically segment the prostate on T2-weighted (T2W) MRI.

Recently, deep learning has dramatically changed the landscape of computer vision. The initial work addressed image classification [6] using ImageNet, a large data set of natural images [7]. Most deep learning models are still designed for image-level classification [6, 8]. To obtain a pixel-wise segmentation, several researchers [9–15] proposed patch-wise methods, which extract small patches (e.g., 32×32) from the images and train a patch-wise convolutional neural network (CNN). In the training stage, each patch extracted from a training image is assigned a label, 1 if the center pixel of the patch belongs to the foreground, so the patches can be fed directly into an image-level classification framework to learn a CNN model. In the testing stage, patches are first extracted from the test image, the learned CNN infers the label of each test patch, and that label is assigned to the patch's center pixel. Tajbakhsh et al. [9] proposed a deep learning method for intima-media boundary segmentation in ultrasound images. They formulated boundary segmentation as a pixel-level classification problem and trained AlexNet on 200,000 small patches extracted from the images. Zhang et al. [10] used deep CNNs to segment isointense-stage brain tissues from multimodality MRI. For each subject, they generated more than 10,000 patches, each centered at a pixel, from T1, T2, and fractional anisotropy images, and used these patches as training and testing samples. Ciresan et al. [12] presented a deep neural network method to segment neuronal membranes in electron microscopy images. They used a special type of deep artificial neural network as a pixel classifier, predicting the label of each pixel from the raw pixel values in a square patch centered on it. Pereira et al. [13] proposed a CNN-based method for segmenting brain tumors in MRI; to train the CNNs for low-grade gliomas (LGG) and high-grade gliomas (HGG), they extracted around 450,000 and 335,000 small patches, respectively. Kooi et al. [14] proposed a CNN method for breast lesion detection in mammography, training the CNNs on 39,872 patches of size 250×250. Milletari et al. [15] proposed a patch-wise deep learning method for segmenting deep brain regions in MRI and ultrasound; they collected patches from both the foreground and the background to train a CNN.
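The patch-wise scheme can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not code from any of the cited works: a single 2-D slice, one patch per pixel (stride 1), reflection padding at the image border, and a binary mask. The function names `extract_patches` and `patches_to_label_map` are hypothetical.

```python
import numpy as np

def extract_patches(image, mask, size=32):
    """Extract one size x size patch centred on every pixel of a 2-D image.

    Each patch is labelled 1 if its centre pixel is foreground in `mask`,
    otherwise 0. Borders are handled by reflection padding (an assumption).
    """
    half = size // 2
    padded = np.pad(image, half, mode="reflect")
    h, w = image.shape
    patches = np.empty((h * w, size, size), dtype=image.dtype)
    labels = np.empty(h * w, dtype=np.int64)
    k = 0
    for r in range(h):
        for c in range(w):
            # padded[r + half, c + half] is image[r, c], i.e. the patch centre
            patches[k] = padded[r:r + size, c:c + size]
            labels[k] = int(mask[r, c] > 0)
            k += 1
    return patches, labels

def patches_to_label_map(patch_labels, shape):
    """Test stage: assign each predicted patch label to its centre pixel."""
    return np.asarray(patch_labels).reshape(shape)
```

For a 256×256 slice and 32×32 patches this yields 65,536 heavily overlapping patches, so neighboring patches reprocess mostly the same pixels. That redundancy is the motivation for the fully convolutional approach discussed in the next paragraph.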

The performance of patch-wise CNN models depends on the patch size: a large patch reduces localization accuracy, whereas a small patch captures only limited context. In addition, when the number of patches is large (one patch per pixel/voxel), much of the computation is redundant because neighboring patches overlap. To solve these problems, Long et al. [16] proposed an end-to-end, pixel-wise segmentation method for natural images based on the deep learning framework Caffe [17]. They converted an existing classification CNN into a fully convolutional network (FCN) for object segmentation. The network produces a coarse label map by classifying every local region and then upsamples it to a pixel-wise segmentation with a simple deconvolution based on bilinear interpolation. The method uses no postprocessing or post hoc refinement by random fields [18, 19].
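The bilinear upsampling step can be illustrated with a minimal NumPy sketch, assuming a single-channel coarse score map and an integer upsampling factor. The kernel follows the bilinear initialization commonly used for FCN deconvolution layers (kernel size 2f − f mod 2, stride f), and the transposed convolution is written out explicitly. The names `bilinear_kernel` and `upsample` are illustrative and not taken from the FCN code base.

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - np.abs(og[0] - center) / factor) * (1 - np.abs(og[1] - center) / factor)

def upsample(coarse, factor):
    """Upsample a 2-D map by an integer factor with a transposed convolution
    (stride = factor) whose weights are the bilinear kernel."""
    k = 2 * factor - factor % 2
    pad = int(np.ceil((factor - 1) / 2))
    kernel = bilinear_kernel(k)
    h, w = coarse.shape
    full = np.zeros(((h - 1) * factor + k, (w - 1) * factor + k))
    for i in range(h):
        for j in range(w):
            # each coarse value "stamps" a scaled copy of the kernel
            full[i * factor:i * factor + k, j * factor:j * factor + k] += coarse[i, j] * kernel
    # crop the padding so the output is exactly `factor` times larger
    return full[pad:pad + h * factor, pad:pad + w * factor]
```

With factor 2, a unit-step ramp becomes a ramp with steps of 0.5 away from the one-pixel border, where the implicit zero padding lowers the values. In a trained FCN these weights are only the starting point of a learnable layer.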

Because FCNs perform well, researchers have proposed various FCN-based methods for medical image segmentation [20–23]. Ronneberger et al. [20] took the idea of the FCN one step further and presented a framework called U-Net, which consists of a regular CNN followed by an up-sampling path in which up-convolutions increase the size of the feature maps. Çiçek et al. [24] extended U-Net to three-dimensional (3-D) segmentation. Yu et al. [22] proposed a volumetric ConvNet to segment the prostate from MRI; they extended a two-dimensional (2-D) FCN into a volumetric fully ConvNet (3D-FCN) to enable volume-to-volume segmentation prediction. Milletari et al. [23] proposed a 3-D variant of the U-Net architecture, called V-Net, for prostate segmentation. For deep neural networks with several convolutional layers, a good initialization is important and can improve segmentation performance. In addition, training efficiency is a nontrivial concern. To address both issues, we initialize the parameters of the proposed network with a model pretrained on a large number of natural images, which also accelerates convergence.

In this paper, we propose the use of an FCN for segmenting the prostate on MRI. Our contributions are a modification of the FCN and its validation on prostate MRI. A preliminary version of this work was presented at the 2017 SPIE Medical Imaging Conference [25], and the authors were invited to submit a full article to the Journal of Medical Imaging (JMI). Compared with the SPIE conference paper, this JMI article makes major improvements: (1) we extended the evaluation from 20 to 140 patients, (2) we expanded the literature review in Sec. 1, (3) we performed more segmentation experiments on three databases, and (4) we report substantially more results. The rest of this paper is organized as follows: Sec. 2 presents the details of the proposed algorithm, Sec. 3 evaluates its performance and discusses the experimental results, and Sec. 4 concludes the paper.

Methods

Convolutional Neural Network

In practice, few people train an entire CNN from scratch because it is difficult to collect a data set of sufficient size, especially for medical images. Instead of learning from scratch, it is common to initialize with a CNN pretrained on a large data set and then retrain a task-specific classifier on top of it for the data set at hand, a process called fine-tuning. Tajbakhsh et al. [9] showed that knowledge transfer from natural images to medical images is possible with CNNs. Therefore, we take Long’s FCN model [16], trained on the PASCAL VOC data set [26], and fine-tune it on our prostate MR images; we call the resulting network PSNet. Figure 1 shows the proposed deep learning method.

Fig. 1

The framework of the proposed deep CNN. There are seven hidden layers in the CNN. The numbers show the feature (channel) dimension of each hidden layer.


The early layers of PSNet learn low-level, generic features that are applicable to most tasks, whereas the later layers learn high-level features specific to the application at hand [27, 28]. Therefore, we fine-tune only the last three layers of the FCN in this work. Figure 2 shows the filters and outputs of the first hidden layer, which learns simple features such as edges, junctions, and corners. Figure 3 shows the outputs of later hidden layers, which learn high-level features and yield highly abstract output images.
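The idea of fine-tuning only the late layers can be shown with a toy update step. This is a minimal sketch assuming seven hidden layers with hypothetical names `layer1` to `layer7` (cf. Fig. 1) and a plain gradient step: layers outside the trainable set simply keep their pretrained weights. In Caffe itself the same effect is usually obtained by setting `lr_mult: 0` in the `param` blocks of the frozen layers.

```python
import numpy as np

# Hypothetical layer names for a seven-hidden-layer network (cf. Fig. 1).
LAYERS = [f"layer{k}" for k in range(1, 8)]
TRAINABLE = set(LAYERS[-3:])  # fine-tune only the last three layers

def fine_tune_step(params, grads, lr):
    """Plain SGD step that updates only the trainable (late) layers;
    the early layers keep their pretrained weights unchanged."""
    return {name: w - lr * grads[name] if name in TRAINABLE else w
            for name, w in params.items()}
```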

Fig. 2

Filters and outputs of the first hidden layer of the PSNet. (a) Filters of the first hidden layer (3×3 filters) and (b) outputs of the first hidden layer (first 36 only).


Fig. 3

Output of (a) the fourth layer and (b) the fifth layer of the PSNet.


The proposed PSNet predicts the probability that each voxel belongs to the prostate or the background. In each prostate MRI, the prostate occupies only a small region compared with the background, so foreground voxels are far fewer than background voxels. With the standard softmax loss function in Caffe [17], this imbalance between the prostate and background regions causes the learning algorithm to get trapped in local minima: the prostate is often missed, and prostate voxels tend to be classified as background. In this work, we therefore use a weighted cross-entropy loss function [29], formulated as follows:
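
$$L = -\frac{1}{N}\sum_{i=1}^{N} W_i^{C}\left[P_i \log \hat{P}_i + \left(1 - P_i\right)\log\left(1 - \hat{P}_i\right)\right],$$

where $N$ is the number of voxels, $P_i$ represents the ground truth (gold standard) label of voxel $i$, $\hat{P}_i$ denotes the predicted probability that voxel $i$ belongs to the prostate, and $W_i^C$ is the weight of voxel $i$, set to $1/|\{j : x_j = C\}|$, the reciprocal of the number of voxels in class $C$, the class of voxel $i$. With this weighted cross-entropy loss, the class imbalance problem can be alleviated.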
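The weighted loss can be sketched in NumPy as follows. This is a minimal illustration assuming binary labels, weights computed per image from inverse class counts, and averaging over all N voxels. The `eps` clipping is an added assumption for numerical stability, and `weighted_cross_entropy` is a hypothetical helper, not the Caffe layer used in the experiments.

```python
import numpy as np

def weighted_cross_entropy(p_true, p_pred, eps=1e-7):
    """Weighted binary cross-entropy with inverse class-frequency weights.

    p_true: binary ground-truth mask (1 = prostate, 0 = background).
    p_pred: predicted prostate probabilities, same shape as p_true.
    """
    p_true = np.asarray(p_true, dtype=np.float64).ravel()
    p_pred = np.clip(np.asarray(p_pred, dtype=np.float64).ravel(), eps, 1 - eps)
    n_fg = p_true.sum()
    n_bg = p_true.size - n_fg
    # W_i^C = 1 / (number of voxels in the class of voxel i)
    weights = np.where(p_true == 1, 1.0 / max(n_fg, 1), 1.0 / max(n_bg, 1))
    ce = p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred)
    return -np.mean(weights * ce)
```

Because each class's weights sum to 1, the prostate and the background contribute equally to the loss regardless of how few prostate voxels there are.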

Data augmentation has been shown to improve the performance of deep learning models [6, 23, 30]. To improve robustness and accuracy on the test data, we augment the original training data set using image translations and horizontal reflections.


Evaluation Metrics

The proposed method was evaluated against the manually labeled ground truth using four quantitative metrics: the Dice similarity coefficient (DSC), relative volume difference (RVD), Hausdorff distance (HD), and average surface distance (ASD) [31–35]. The DSC is calculated as

$$\mathrm{DSC} = \frac{2\,|S_a \cap S_b|}{|S_a| + |S_b|} \times 100\%,$$

where $|S_a|$ is the number of prostate pixels in the manually segmented ground truth, $|S_b|$ is the number of prostate pixels in the segmentation produced by the proposed method, and $|S_a \cap S_b|$ is the number of pixels they share. The RVD is computed as

$$\mathrm{RVD} = \frac{|S_b| - |S_a|}{|S_a|} \times 100\%.$$

To compute the HD and ASD, the distance from a pixel $x$ to a surface $Y$ is first defined as $d(x, Y) = \min_{y \in Y} \|x - y\|$. The HD between two surfaces $X$ and $Y$ is calculated as

$$\mathrm{HD}(X, Y) = \max\left\{\max_{x \in X} d(x, Y),\ \max_{y \in Y} d(y, X)\right\}.$$

The ASD is defined as

$$\mathrm{ASD}(X, Y) = \frac{1}{|X| + |Y|}\left(\sum_{x \in X} d(x, Y) + \sum_{y \in Y} d(y, X)\right),$$

where $|X|$ and $|Y|$ are the numbers of pixels on surfaces $X$ and $Y$, respectively.
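The four metrics can be computed for a pair of 2-D binary masks with the NumPy sketch below. It is illustrative only and rests on several assumptions. Surface pixels are taken to be foreground pixels with at least one 4-connected background neighbor. Distances are brute-force Euclidean distances scaled by a hypothetical `spacing` in mm. The DSC is the standard overlap form $2|S_a \cap S_b|/(|S_a|+|S_b|)$, and the RVD uses the ground truth as reference with the sign convention (segmentation minus ground truth). All function names are hypothetical.

```python
import numpy as np

def dsc(a, b):
    """Dice similarity coefficient (%) between ground truth a and segmentation b."""
    a, b = a.astype(bool), b.astype(bool)
    return 200.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def rvd(a, b):
    """Relative volume difference (%) of segmentation b w.r.t. ground truth a."""
    return 100.0 * (b.sum() - a.sum()) / a.sum()

def surface_points(mask, spacing=(1.0, 1.0)):
    """Coordinates (mm) of foreground pixels with a 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1)
    core = m[1:-1, 1:-1]
    interior = core & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return np.argwhere(core & ~interior) * np.asarray(spacing)

def directed_distances(X, Y):
    """d(x, Y) = min over y of ||x - y||, for every point x in X."""
    return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)

def hd_asd(a, b, spacing=(1.0, 1.0)):
    """Hausdorff distance and average surface distance between two masks."""
    X, Y = surface_points(a, spacing), surface_points(b, spacing)
    dxy, dyx = directed_distances(X, Y), directed_distances(Y, X)
    hd = max(dxy.max(), dyx.max())
    asd = (dxy.sum() + dyx.sum()) / (len(X) + len(Y))
    return hd, asd
```

For example, two 4×4 squares offset by one column have a DSC of 75% and an HD of one pixel. The brute-force distance matrix is fine for single slices, but a distance transform would be preferable for full volumes.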


Experimental Results


Data Set

The proposed method was evaluated on three prostate MRI data sets containing 140 T2W MRI volumes in total. The first data set consists of 41 in-house T2W prostate volumes. All of these subjects were scanned at Emory Hospital at 1.5 or 3.0 T without an endorectal coil. The voxel size ranges from 0.625 to 1 mm, and the axial image size ranges from 256×256 to 320×320.

We also validated our method on two publicly available data sets, ISBI2013 [36] and PROMISE12 [32], which contain 60 and 50 prostate T2W MR volumes, respectively. The two data sets share 11 subjects, which were removed in our experiments, leaving 99 T2W MR volumes from these two data sets. These subjects were scanned at 1.5 or 3 T, some of them with an endorectal coil. The voxel size ranges from 0.4 to 0.625 mm, and the image size ranges from 320×320 to 512×512. To facilitate analysis of the prostate MRI data, each volume was resampled to isotropic voxels using windowed sinc interpolation.
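The paper does not state which window its windowed sinc interpolation uses. The sketch below assumes a Lanczos window of radius 3, which is one common windowed sinc, and resamples along a single axis such as the slice direction. It uses NumPy's normalized `np.sinc`. The function name and the spacings in the usage note are hypothetical.

```python
import numpy as np

def windowed_sinc_resample_1d(signal, old_spacing, new_spacing, a=3):
    """Resample a 1-D signal sampled every `old_spacing` mm onto a grid with
    `new_spacing` mm, using a Lanczos-windowed sinc kernel of radius `a`."""
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.size
    n_out = int(np.floor((n - 1) * old_spacing / new_spacing + 1e-9)) + 1
    pos = np.arange(n_out) * (new_spacing / old_spacing)  # output positions in input-index units
    offsets = pos[:, None] - np.arange(n)[None, :]
    weights = np.sinc(offsets) * np.sinc(offsets / a)     # sinc times Lanczos window
    weights[np.abs(offsets) >= a] = 0.0                    # finite kernel support
    weights /= weights.sum(axis=1, keepdims=True)          # keep constants constant near the ends
    return weights @ signal
```

To make a volume with 3-mm slices and 0.625-mm in-plane voxels isotropic along the slice axis, one could call `np.apply_along_axis(windowed_sinc_resample_1d, 2, volume, 3.0, 0.625)`, assuming the slices lie along axis 2.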

We used a fivefold cross-validation procedure to evaluate segmentation performance. In each fold, 112 of the 140 subjects were used to train the network and the remaining 28 subjects were used for evaluation. The average performance across the folds is reported.
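A subject-level fivefold split of this kind can be generated as follows. This is a minimal sketch assuming a random assignment of subjects to folds with a fixed seed. The paper does not say how subjects were assigned, for example whether folds were balanced across the three data sets. `five_fold_splits` is a hypothetical helper.

```python
import numpy as np

def five_fold_splits(n_subjects=140, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for subject-level k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_subjects), n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

Splitting by subject rather than by slice keeps all images of one patient in the same fold, so no test patient is seen during training.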


Implementation Details

Training and inference were implemented in Python with Caffe [17]. All experiments ran on an Ubuntu workstation with 32 GB of memory, an Intel i7 6700 CPU, and an Nvidia GTX 1070 graphics card with 8 GB of video memory. Training the CNN model took 20 h with cuDNN acceleration. The learning rate was set to 1×10⁻⁹, and training ran for 80,000 iterations. The network weights were initialized with the model pretrained on natural images and were updated by stochastic gradient descent with a momentum of 0.99 and a weight decay of 0.0005. One advantage of using an FCN for image segmentation is that the entire image can be fed directly to the network in both the training and testing phases, which makes segmentation efficient: each prostate MRI was segmented in about 4 s.
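The sketch below shows how the solver hyperparameters interact, following the update rule of Caffe's SGD solver. The L2 weight-decay term is added to the gradient, a momentum buffer accumulates the learning-rate-scaled gradient, and the buffer is subtracted from the weights. It is written in NumPy for illustration. `sgd_momentum_step` is a hypothetical name, and the defaults assume the learning rate reads 1×10⁻⁹.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=1e-9, momentum=0.99, weight_decay=5e-4):
    """One SGD update in the form used by Caffe's SGD solver."""
    g = grad + weight_decay * w                  # L2 regularization term
    velocity = momentum * velocity + lr * g      # momentum history
    return w - velocity, velocity
```

With a momentum of 0.99, each gradient keeps influencing the update for roughly 100 iterations. This is one reason a very small base learning rate can still make progress.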


Qualitative Evaluation Results

The performance of the proposed deep learning method was evaluated qualitatively by visual comparison with the manually segmented contours. Figure 4 shows the qualitative results.

Fig. 4

The qualitative results of the proposed method. The red curves represent the prostate contours obtained by the proposed method, while the blue curves represent the contours obtained from manual segmentation by an experienced radiologist.



Quantitative Comparison

We compared our approach with three segmentation methods: an FCN model trained from scratch, U-Net, and V-Net. The U-Net implementation was obtained from the authors' GitHub repository, and for V-Net we used the original implementation written by its authors. To match the original implementations of FCN, U-Net, and V-Net, we used different image sizes for the different architectures; because of memory constraints, V-Net used the smaller image size given in its paper. Table 1 compares the results of these three methods with those of our method.

Table 1

Quantitative comparison of the proposed method with three state-of-the-art methods.

DSC (%) | RVD (%) | HD (mm) | ASD (mm)

The average RVD of our method is 4.1%, which indicates a good balance between over-segmentation and under-segmentation. Our method achieved the highest DSC and the lowest HD, each with the lowest standard deviation among the compared methods. We also found that training with the weighted cross-entropy loss gave better results than training with the Dice loss function: the proposed method yielded a DSC of 85.0%, whereas the Dice loss version obtained 82.3%.
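The text does not specify which Dice loss formulation was used in this comparison. One common choice is the soft Dice loss in the squared-denominator form popularized by V-Net, $1 - 2\sum_i p_i g_i / (\sum_i p_i^2 + \sum_i g_i^2)$. The sketch below assumes that form. `soft_dice_loss` is a hypothetical helper, not the implementation used in the experiments.

```python
import numpy as np

def soft_dice_loss(p_pred, g_true, eps=1e-7):
    """1 - soft Dice between predicted probabilities and a binary ground truth."""
    p = np.asarray(p_pred, dtype=np.float64).ravel()
    g = np.asarray(g_true, dtype=np.float64).ravel()
    dice = 2.0 * (p * g).sum() / ((p ** 2).sum() + (g ** 2).sum() + eps)
    return 1.0 - dice
```

Unlike the weighted cross-entropy, this loss needs no explicit class weights, because it is already a ratio of foreground overlap to foreground size.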

Conclusion

We proposed an automatic deep learning method to segment the prostate on MRI. An end-to-end deep CNN model was trained on three prostate MR data sets and achieved good performance for prostate MRI segmentation. To the best of our knowledge, this is the first study to fine-tune an FCN that has been pretrained using a large set of labeled natural images for segmenting the prostate on MRI.

The experimental results show that fine-tuning a pretrained FCN can yield satisfactory segmentation results, and performance is expected to improve further with more training data. The proposed algorithm is efficient and requires no handcrafted features, because the deep network learns features automatically. Newer CNN models for segmentation [37–39] could be used to improve our algorithm in future work. Deep learning-based segmentation depends not only on the choice of network architecture but also on the choice of loss function; in the future, we will investigate custom loss functions and their performance on segmentation tasks. The method could also be generalized to the segmentation of other organs and lesions beyond the prostate, and applied to imaging modalities other than MRI, such as CT and ultrasound.

Disclosures

No conflicts of interest, financial or otherwise, are declared by the authors.

Acknowledgments

This research was supported in part by NIH Grant Nos. CA176684, CA156775, and CA204254.

References

[1] R. L. Siegel, K. D. Miller and A. Jemal, “Cancer statistics,” CA Cancer J. Clin., 67 (1), 7–30 (2017).
[2] B. Fei, C. Kemper and D. L. Wilson, “A comparative study of warping and rigid body registration for the prostate and pelvic MR volumes,” Comput. Med. Imaging Graphics, 27 (4), 267–281 (2003).
[3] B. Fei et al., “Slice-to-volume registration and its potential application to interventional MRI-guided radio-frequency thermal ablation of prostate cancer,” IEEE Trans. Med. Imaging, 22 (4), 515–525 (2003).
[4] W. Qiu et al., “Prostate segmentation: an efficient convex optimization approach with axial symmetry using 3-D TRUS and MR images,” IEEE Trans. Med. Imaging, 33 (4), 947–960 (2014).
[5] G. Litjens et al., “Computer-aided detection of prostate cancer in MRI,” IEEE Trans. Med. Imaging, 33 (5), 1083–1092 (2014).
[6] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, 60, 84–90 (2017).
[7] J. Deng et al., “ImageNet: a large-scale hierarchical image database,” in IEEE Conf. on Computer Vision and Pattern Recognition, 248–255 (2009).
[8] K. He et al., “Deep residual learning for image recognition,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 770–778 (2016).
[9] N. Tajbakhsh et al., “Convolutional neural networks for medical image analysis: full training or fine tuning?,” IEEE Trans. Med. Imaging, 35 (5), 1299–1312 (2016).
[10] W. Zhang et al., “Deep convolutional neural networks for multi-modality isointense infant brain image segmentation,” NeuroImage, 108, 214–224 (2015).
[11] Y. Guo, Y. Gao and D. Shen, “Deformable MR prostate segmentation via deep feature learning and sparse patch matching,” IEEE Trans. Med. Imaging, 35 (4), 1077–1089 (2016).
[12] D. Ciresan et al., “Deep neural networks segment neuronal membranes in electron microscopy images,” in Proc. of the 25th Int. Conf. on Neural Information Processing Systems, 2843–2851 (2012).
[13] S. Pereira et al., “Brain tumor segmentation using convolutional neural networks in MRI images,” IEEE Trans. Med. Imaging, 35, 1240–1251 (2016).
[14] T. Kooi et al., “Large scale deep learning for computer aided detection of mammographic lesions,” Med. Image Anal., 35, 303–312 (2017).
[15] F. Milletari et al., “Hough-CNN: deep learning for segmentation of deep brain regions in MRI and ultrasound,” Comput. Vision Image Understanding, 164, 92–102 (2017).
[16] J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 3431–3440 (2016).
[17] Y. Jia et al., “Caffe: convolutional architecture for fast feature embedding,” in Proc. of the 22nd ACM Int. Conf. on Multimedia, 675–678 (2014).
[18] C. Farabet et al., “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., 35 (8), 1915–1929 (2013).
[19] B. Hariharan et al., “Simultaneous detection and segmentation,” Lect. Notes Comput. Sci., 8695, 297–312 (2014).
[20] O. Ronneberger, P. Fischer and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci., 9351, 234–241 (2015).
[21] G. Litjens et al., “A survey on deep learning in medical image analysis,” Med. Image Anal., 42, 60–88 (2017).
[22] L. Yu et al., “Volumetric ConvNets with mixed residual connections for automated prostate segmentation from 3D MR images,” in Annual Conf. of Association for the Advancement of Artificial Intelligence, 66–72 (2017).
[23] F. Milletari, N. Navab and S.-A. Ahmadi, “V-Net: fully convolutional neural networks for volumetric medical image segmentation,” in Fourth Int. Conf. on 3D Vision (3DV) (2016).
[24] Ö. Çiçek et al., “3D U-Net: learning dense volumetric segmentation from sparse annotation,” Lect. Notes Comput. Sci., 9901, 424–432 (2016).
[25] Z. Tian, L. Liu and B. Fei, “Deep convolutional neural network for prostate MR segmentation,” Proc. SPIE, 10135, 101351L (2017).
[26] M. Everingham et al., “The Pascal visual object classes (VOC) challenge,” Int. J. Comput. Vision, 88 (2), 303–338 (2010).
[27] J. Yosinski et al., “How transferable are features in deep neural networks?,” in Proc. of the 27th Int. Conf. on Neural Information Processing Systems, 3320–3328 (2014).
[28] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” Lect. Notes Comput. Sci., 8689, 818–833 (2014).
[29] P. F. Christ et al., “Automatic liver and lesions segmentation using cascaded fully convolutional neural networks and 3D conditional random fields,” Lect. Notes Comput. Sci., 9901, 415–423 (2016).
[30] D. Ciregan, U. Meier and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in IEEE Conf. on Computer Vision and Pattern Recognition, 3642–3649 (2012).
[31] Z. Tian et al., “Superpixel-based segmentation for 3D prostate MR images,” IEEE Trans. Med. Imaging, 35 (3), 791–801 (2016).
[32] G. Litjens et al., “Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge,” Med. Image Anal., 18 (2), 359–373 (2014).
[33] H. Akbari and B. Fei, “3D ultrasound image segmentation using wavelet support vector machines,” Med. Phys., 39 (6), 2972–2984 (2012).
[34] X. Yang et al., “Cupping artifact correction and automated classification for high-resolution dedicated breast CT images,” Med. Phys., 39 (10), 6397–6406 (2012).
[35] H. Wang and B. Fei, “An MR image-guided, voxel-based partial volume correction method for PET images,” Med. Phys., 39 (1), 179–194 (2012).
[36] N. Bloch et al., “NCI-ISBI 2013 challenge: automated segmentation of prostate structures,” (2015).
[37] K. He et al., “Mask R-CNN,” in IEEE Int. Conf. on Computer Vision (ICCV) (2017).
[38] H. Zhao et al., “Pyramid scene parsing network,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2017).
[39] Z. Wu, C. Shen and A. V. D. Hengel, “Wider or deeper: revisiting the ResNet model for visual recognition,” (2016).



Zhiqiang Tian is an associate professor at Xi’an Jiaotong University. He received his BS degree in control science and engineering from Northeastern University in 2004, and his MS and PhD degrees in control science and engineering from Xi’an Jiaotong University in 2007 and 2013, respectively. He was a postdoctoral fellow in the Department of Radiology and Imaging Sciences of Emory University. He is the author of more than 20 journal and conference papers. His current research interests include computer vision, medical image analysis, and deep learning.

Lizhi Liu is a professor and radiologist at Sun Yat-sen University Cancer Center. He received his BS degree from North Sichuan Medical University in 1996, and his MS and MD degrees from Sun Yat-sen University in 2008 and 2016, respectively. He is the author of more than 40 journal papers. His current research interests include oncologic imaging and medical image analysis.

Zhenfeng Zhang is currently a professor and vice chair in the Department of Radiology, the Second Affiliated Hospital of Guangzhou Medical University, China, and obtained a medical degree in China and a PhD degree in interventional radiology and oncology. Zhang's clinical work has focused on the diagnosis and minimally invasive treatment of lung and liver cancers; research, on radiomics and molecular imaging of cancers; and clinical trials, on NGS-based precision medicine studies of lung and liver cancers and on multiple CAR-T cell immunotherapies of solid cancers combined with interventional techniques.

Baowei Fei is a Georgia Cancer Coalition distinguished scholar and director of the Precision Imaging Research Division in the Department of Radiology and Imaging Sciences at the Emory University School of Medicine. He is also a faculty member in the Joint Department of Biomedical Engineering at Emory University and Georgia Institute of Technology. He received his PhD from Case Western Reserve University, Cleveland, Ohio, USA. Currently, he is director of the Quantitative BioImaging Laboratory.

© 2018 Society of Photo-Optical Instrumentation Engineers (SPIE) 2329-4302/2018/$25.00
Zhiqiang Tian, Lizhi Liu, Zhenfeng Zhang, and Baowei Fei "PSNet: prostate segmentation on MRI based on a convolutional neural network," Journal of Medical Imaging 5(2), 021208 (17 January 2018).
Received: 21 September 2017; Accepted: 20 December 2017; Published: 17 January 2018
