The human retinal circulation is composed of complex capillary networks that are responsible for satisfying the high metabolic requirements of the multiple neuronal populations within the retina.1 Retinal vascular diseases, such as diabetic retinopathy and vascular occlusions, contribute significantly to the burden of visual impairment worldwide.2 Fluorescein angiography (FA) has been considered the gold standard in the evaluation and diagnosis of retinal vascular diseases. Despite its widespread use, this technique cannot resolve the fine structural details of the multiple layers of retinal capillaries because of the background choroidal flush.3 In addition, FA requires the administration of intravenous contrast dye, which carries a small risk of significant adverse events.4 Optical coherence tomography angiography (OCT-A) is a new imaging technology that allows noninvasive, dye-free visualization of the retinal circulation.5 We have implemented a speckle-variance technique for OCT-A as a noninvasive imaging modality that uses the change in the speckle pattern due to red blood cell movement in sequentially acquired OCT images; the corresponding intensity variance in the structural images is used to identify the retinal microvasculature. Using OCT-A, we have been able to show that peripapillary,6–8 foveal,9 and perifoveal10 images have quantitative and qualitative characteristics comparable to cadaveric histological representation.
Macular capillary density is correlated to retinal thickness and visual functioning in patients with diabetic retinopathy.11 Hence, accurate serial quantification of the retinal microcirculation is a useful marker in evaluating the severity of retinal vascular diseases. Following OCT-A image acquisition, accurate segmentation of the retinal microvasculature is a critical step in the quantitative analysis of the retinal circulation. Retinal vessel segmentation has been demonstrated in multiple medical imaging modalities12,13 and is well documented in the literature. However, as the vasculature detail and appearance are different for each modality, optimal segmentation approaches may differ between modalities. For vessel segmentation in OCT-A images, only a limited body of work has been conducted.
Automated approaches of segmenting retinal vessels using OCT-A data are becoming more prevalent, yet manual segmentation remains the gold standard. Manual segmentation of the retinal blood vessels in OCT angiography images is a time-consuming and tedious task, which requires training. Reliable automated segmentation of these vessels is paramount for automated microvasculature quantification. The simplest automated approach, adaptive thresholding, has been used14 but is limited in its sensitivity to the selection of a suitable threshold as well as its insensitivity to the shape and morphology of the microvasculature. One group has skeletonized the OCT-A images of retinal vessels in order to obtain retinal vasculature perfusion density maps15 but this approach is still insensitive to the various widths of the vessels. Lastly, another group implemented automated blood vessel segmentation using a hybrid Hessian/intensity-based method while imaging wound healing in a mouse ear (pinna) with OCT-A.16 Although an accuracy of 0.94 was obtained when comparing the automated result to manual segmentations of human retinal fundus images, validation of the technique in human OCT-A retinal images still needs to be done.
This paper presents a new method for automated segmentation of blood vessels in retinal OCT-A images using deep neural networks (DNNs). DNNs have shown promising results in solving a variety of problems, such as object recognition in images,17,18 speech recognition,19 semantic segmentation of images,20,21 handwritten character recognition,22 and text analysis.23
The main contribution of this paper is to demonstrate the high effectiveness of the deep learning approach to the segmentation of blood vessels in OCT-A images. The automated segmentation results on the images acquired from a clinical prototype OCT-A system were compared with the manual segmentations from two separate trained raters and discussed.
All subject recruitment and imaging took place at the Eye Care Centre of Vancouver General Hospital. The project protocol was approved by the Research Ethics Boards at the University of British Columbia and Vancouver General Hospital, and the experiment was performed in accordance with the tenets of the Declaration of Helsinki. Written informed consent was obtained from all subjects.
Speckle Variance Optical Coherence Tomography Imaging
Speckle variance OCT images of the foveal region in 12 eyes from 6 healthy volunteers aged years were acquired using a graphics processing unit-accelerated OCT-A clinical prototype.24 In total, 80 images were acquired. Briefly, the OCT system uses a 1060-nm swept source (Axsun Inc.) with 100-kHz A-scan rate and a full-width half-maximum bandwidth of 61.5 nm, which corresponds to a coherence length of in tissue. For the speckle variance calculation, three repeat acquisitions were obtained at each B-scan location. The scan area was sampled in a grid with a field of view in 3.15 s. Images were acquired either directly superiorly, nasally, inferiorly, or temporally from the foveal avascular zone. Processing of the OCT intensity image data and en face visualization of the retinal microvasculature was performed in real time using our open source code.25,26
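The speckle variance calculation described above can be sketched as a per-pixel variance across the repeated B-scans at one location. This is a minimal illustration rather than the authors' GPU implementation: the function name `speckle_variance`, the nested-list image layout, the use of the population variance (dividing by the number of repeats), and the sample values are all assumptions made for the example.

```python
from statistics import pvariance


def speckle_variance(frames):
    """Per-pixel variance across repeated B-scans acquired at one location.

    frames: list of N equally sized 2-D intensity images (lists of rows).
    Assumption: population variance (divide by N) is used.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [pvariance([frame[r][c] for frame in frames]) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical data: three repeats of a 1x2 B-scan. The first pixel is static
# tissue; the second fluctuates, as it would over moving red blood cells.
repeats = [[[10.0, 4.0]], [[10.0, 10.0]], [[10.0, 16.0]]]
sv = speckle_variance(repeats)
```

Static tissue yields a variance near zero, while pixels over flowing blood yield a high variance, which is what makes the vasculature visible in the en face image.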
For comparison, two raters segmented OCT-A images using a Wacom Intuos 4 tablet and the GNU Image Manipulation Program (GIMP). For the cross-validation and training, Rater A segmented all 80 OCT-A images. For the repeatability analysis, 10 images were used and segmented by both Rater A and Rater B. Rater A segmented each image twice for intrarater agreement, while Rater B segmented each image once for interrater agreement.
Deep Neural Network Architecture
The automated segmentation of the blood vessels in the OCT-A images was performed by classifying each pixel into either the vessel or the nonvessel class using deep convolutional neural networks. Convolutional and max pooling layers are used as hierarchical feature extractors, which map raw pixel intensities into a feature vector, which is then classified using fully connected layers.
The convolutional layers in our algorithm are made of a sequence of square filters, which perform a two-dimensional convolution with the input image. To calculate the output of each map, convolutional responses are summed and passed through a nonlinear activation function. The nonlinear activation function used in this paper is a rectified linear unit (ReLU).
Max pooling layers generate their output by taking the maximum activation over nonoverlapping square regions. These layers do not have adjustable parameters and their size is fixed. By taking the maximum value of the activation function, the most prominent features are selected from the input image.
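To make the three building blocks concrete, the following sketch applies one square filter, a rectified linear unit, and nonoverlapping max pooling to a tiny image. It is illustrative only: the helper names, the filter values, the image, and the pooling size are assumptions, and the filter is applied as a sliding-window product (cross-correlation), which is how CNN toolkits commonly implement "convolution".

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over the image without padding ('valid' output)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size):
    """Maximum activation over nonoverlapping size x size regions."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 5x5 image containing a bright vertical "vessel" in column 2.
image = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]
# Hypothetical 2x2 filter that responds to a dark-to-bright horizontal step.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

pooled = max_pool(relu(conv2d_valid(image, kernel)), size=2)
```

The pooled map keeps the strong response on the left edge of the vessel and discards the negative response on its right edge, which illustrates how pooling retains the most prominent feature in each region.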
After six stages of varied convolutional and max pooling layers, a dropout layer was inserted, which can prevent network over-fitting and provide a way of combining an exponentially increasing number of different neural networks in an efficient manner.27 Then, two fully connected layers are used to classify the feature vector generated by the previous layers. The final fully connected layer contains two neurons where one neuron represents the vessel and other the nonvessel class. The network architecture used in this paper is very similar to the network architecture first used in Ref. 20. An overview is presented in Table 1 and graphically in Fig. 1.
Table 1. Network layers architecture.
|Layer||Type||Maps and size||Kernel size|
|0||Input||1 map of neurons|
|1||Convolutional||32 maps of neurons|
|2||Max pooling||32 maps of neurons|
|3||Convolutional||32 maps of neurons|
|4||Max pooling||32 maps of neurons|
|5||Convolutional||32 maps of neurons|
|6||Max pooling||32 maps of neurons|
|8||Fully connected||150 neurons|
|9||Fully connected||2 neurons|
Network Training Methods
To train our network, original OCT-A images and the corresponding manual segmentations were used as inputs. Each training example consists of a square window around the training pixel. Missing pixels in windows at the image border were set to zero. To have a balanced training set, an equal number of vessel and nonvessel pixels were extracted from each image. If the number of vessel pixels was larger than the number of nonvessel pixels in an image, then all nonvessel pixels were selected for the training set and an equal number of vessel pixels were randomly selected from the pool of vessel pixels. Similarly, if the number of nonvessel pixels was larger than the number of vessel pixels in an image, then all vessel pixels were selected for the training set and an equal number of nonvessel pixels were randomly selected from the pool of nonvessel pixels.
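The sampling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `half` parameter (half the window width), the nested-list image layout, and the tiny sample data are hypothetical, and the actual window size used in the paper is not assumed here.

```python
import random


def extract_window(image, row, col, half):
    """Square (2*half+1) window centered on (row, col).

    Pixels that fall outside the image are set to zero.
    """
    h, w = len(image), len(image[0])
    return [
        [image[r][c] if 0 <= r < h and 0 <= c < w else 0
         for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]


def balanced_pixels(mask, rng):
    """Return equally sized lists of vessel and nonvessel pixel coordinates.

    All pixels of the rarer class are kept; the same number is drawn at
    random from the more frequent class.
    """
    vessel = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    nonvessel = [(r, c) for r, row in enumerate(mask)
                 for c, v in enumerate(row) if not v]
    n = min(len(vessel), len(nonvessel))
    return rng.sample(vessel, n), rng.sample(nonvessel, n)


# Hypothetical 3x3 image and manual segmentation mask (1 = vessel).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

vessel_px, nonvessel_px = balanced_pixels(mask, random.Random(0))
corner_window = extract_window(image, 0, 0, half=1)
```

Each selected coordinate would then be turned into one training example by calling `extract_window` on the OCT-A image and pairing the window with the label of its center pixel.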
Network Segmentation Methods
The trained network was then used to segment the original OCT-A images. First, a square window of the same size used for the training purposes was extracted around each pixel of the test images. A forward pass using all test image pixels was performed using the trained network, and each pixel was assigned a grayscale value, with higher values representing higher confidence of the pixel being a vessel pixel. These pixel values were aggregated into the output grayscale images, and median filtering with a small window was performed in order to decrease the noise level in the image.
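The final smoothing step can be illustrated with a simple median filter over the grayscale output. The sketch below is an assumption-laden example, not the filter used in the paper: the name `median_filter`, the window half-width, the choice to truncate windows at the image border, and the sample values are all hypothetical.

```python
from statistics import median


def median_filter(image, half=1):
    """Replace each pixel by the median of its (2*half+1)^2 neighbourhood.

    Assumption: windows are truncated at the image border instead of padded.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(h, r + half + 1))
                for cc in range(max(0, c - half), min(w, c + half + 1))
            ]
            out_row.append(median(neighbours))
        out.append(out_row)
    return out


# Hypothetical network output with one isolated high-confidence pixel.
confidence = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]
smoothed = median_filter(confidence, half=1)
```

An isolated outlier is removed because it never forms the majority of any neighbourhood, whereas a connected vessel segment would survive the filter.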
Three-fold cross-validation on all images manually segmented by Rater A was performed. All 80 original images were randomly divided into three sets. Images from two of the sets were used to train the network, and images from the remaining set were used to test the network. This procedure was repeated three times with a different test set each time. Each set of the cross-validation was evaluated on a separate computer in order to decrease the total training and testing time. Each computer had a recent generation NVIDIA graphics card, which decreased computation time. The Caffe deep learning toolkit28 was used to efficiently use the processing power of the graphics card for computation of convolutional neural network parameters. Using parallel processing, all three sets were used to train the proposed neural network in approximately 30 h. Segmentation of a single image using the trained network took .
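The three-fold split can be sketched as below. The function name, the fixed random seed, and the round-robin assignment of shuffled images to folds are assumptions for illustration; the paper only states that the 80 images were randomly divided into three sets.

```python
import random


def three_fold_splits(n_images, seed=0):
    """Yield (train, test) index lists for three-fold cross-validation."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into three folds of near-equal size.
    folds = [indices[i::3] for i in range(3)]
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(three_fold_splits(80))
```

Because each fold is independent of the others, the three training runs can be executed on separate machines, as was done in this work.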
The segmentation performance was evaluated by pixel-wise comparison of the manually segmented images and the thresholded binary output of the neural network using varying thresholds. The numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) were calculated using pixel-wise comparison between a reference manual segmentation and a target, which was either another manual segmentation or the output of our automated method. In our context, a pixel is considered a TP if it is marked as a blood vessel in both the reference manual segmentation and the target. A pixel is considered an FN if it is marked as a blood vessel in the reference manual segmentation but missed by the target. A pixel is considered an FP if it is marked as a blood vessel in the target but not in the reference manual segmentation. A pixel is considered a TN if it is marked as a blood vessel in neither the reference manual segmentation nor the target. Using the TP, FP, FN, and TN numbers, we can calculate the accuracy, $(TP + TN)/(TP + FP + FN + TN)$; sensitivity, $TP/(TP + FN)$; specificity, $TN/(TN + FP)$; and positive predictive value (PPV), $TP/(TP + FP)$, of the segmentation.
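As a worked illustration of the pixel-wise comparison, the sketch below counts TP, FP, FN, and TN between a reference and a target mask and derives the four measures using their standard definitions. The function names and the two small masks are hypothetical, and the code assumes every denominator is nonzero.

```python
def confusion_counts(reference, target):
    """Count TP, FP, FN, TN between two binary masks of equal size."""
    tp = fp = fn = tn = 0
    for ref_row, tgt_row in zip(reference, target):
        for ref, tgt in zip(ref_row, tgt_row):
            if ref and tgt:
                tp += 1      # vessel in both reference and target
            elif tgt:
                fp += 1      # vessel in target only
            elif ref:
                fn += 1      # vessel in reference only (missed by target)
            else:
                tn += 1      # vessel in neither
    return tp, fp, fn, tn


def segmentation_metrics(tp, fp, fn, tn):
    """Standard definitions; assumes no denominator is zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical masks (1 = vessel, 0 = nonvessel).
reference = [[1, 1, 0, 0], [1, 0, 0, 0]]
target = [[1, 0, 1, 0], [1, 0, 0, 0]]

counts = confusion_counts(reference, target)
scores = segmentation_metrics(*counts)
```

For these masks the counts are TP = 2, FP = 1, FN = 1, and TN = 4, giving an accuracy of 0.75.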
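Using the PPV and sensitivity, we can calculate the F1 measure using Eq. (1):
$$F_1 = 2 \cdot \frac{\mathrm{PPV} \cdot \mathrm{sensitivity}}{\mathrm{PPV} + \mathrm{sensitivity}}. \tag{1}$$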
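Assuming the measure referred to here is the usual F1 score, that is, the harmonic mean of precision (PPV) and recall (sensitivity), it can be computed as in the short sketch below. The function name and the sample values are hypothetical.

```python
def f1_measure(ppv, sensitivity):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical values: perfect recall but only half of the detections correct.
example_f1 = f1_measure(ppv=0.5, sensitivity=1.0)
```

The harmonic mean penalizes an imbalance between the two inputs: here it gives about 0.67, below the arithmetic mean of 0.75.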
All of these measures can be calculated on individual images but can also be calculated for the whole dataset. In Fig. 2, the dotted blue line shows the accuracy for all pixels in the dataset against the threshold value used to binarize the output of the network. The accuracy of blood vessel detection increases from the threshold value at 0, peaks at 0.83 with threshold value of 0.78, and then begins to decline. It is important to note that similar results are obtained in a wide range of thresholds, which indicates that the performance is not sensitive to the threshold chosen.
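The threshold sweep behind such an accuracy curve can be sketched as follows. The function names, the flattened list layout, the convention that a pixel is a vessel when its value is greater than or equal to the threshold, and the sample values are assumptions for illustration and do not reproduce the reported numbers.

```python
def accuracy_at(probabilities, labels, threshold):
    """Accuracy after binarizing the network output at one threshold."""
    correct = sum(
        (p >= threshold) == bool(y) for p, y in zip(probabilities, labels)
    )
    return correct / len(labels)


def best_threshold(probabilities, labels, thresholds):
    """Threshold with the highest accuracy (first one wins on ties)."""
    return max(thresholds, key=lambda t: accuracy_at(probabilities, labels, t))


# Hypothetical flattened network outputs and manual labels (1 = vessel).
probabilities = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
labels = [0, 0, 1, 1, 1, 0]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

curve = [accuracy_at(probabilities, labels, t) for t in thresholds]
best = best_threshold(probabilities, labels, thresholds)
```

Plotting `curve` against `thresholds` gives the kind of curve described above; a flat region around the maximum indicates low sensitivity to the chosen threshold.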
In Fig. 3, the accuracy for each image was calculated and averaged over all images. One standard deviation below the mean values is marked with a green dotted line and one standard deviation above the mean values is marked with a blue dotted line. Qualitatively, the deviation of accuracies is reasonably small for different thresholds, with the maximum mean accuracy of at the threshold value of 0.76, signifying that the performance of the method is consistent over the whole dataset. The accuracy of the deeper capillary network [inner nuclear layer (INL) to outer plexiform layer] is 0.8247 while the accuracy of the superficial capillary networks (inner limiting membrane to INL) is 0.8389; the lower accuracy of the deeper layers is likely due to projection artifact from the superficial vascular layers.
Using the sensitivity and specificity measurements over the range of thresholds we can plot the receiver operator characteristic (ROC) for our method, as shown in Fig. 4 with blue dots. The sensitivity and specificity were calculated using all pixels from the dataset.
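The ROC points can be obtained from the same threshold sweep by recording the false-positive rate (1 - specificity) and the sensitivity at each threshold. The sketch below is illustrative: the function name, the flattened list layout, and the sample values are hypothetical, and it assumes both classes are present in the labels.

```python
def roc_points(probabilities, labels, thresholds):
    """Return (false-positive rate, sensitivity) pairs, one per threshold.

    Assumes the labels contain at least one vessel and one nonvessel pixel.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(probabilities, labels) if p >= t and y)
        fp = sum(1 for p, y in zip(probabilities, labels) if p >= t and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical flattened network outputs and manual labels (1 = vessel).
probabilities = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
labels = [0, 0, 1, 1, 1, 0]

points = roc_points(probabilities, labels, thresholds=[0.0, 0.5, 1.0])
```

A manual rater produces a single binary segmentation and therefore contributes a single point to this plot instead of a curve.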
In Fig. 5, the F1 measure was calculated for the machine output using all pixels from the dataset and shown with blue dots.
Intrarater and Interrater Agreement
As described in Sec. 2.3, among the 80 images segmented by Rater A, 10 images were additionally segmented a second time by Rater A and also by Rater B to assess the intra- and interrater agreement. For convenience, we used the accuracy measures discussed above and the original segmentation by Rater A (Rater A1) as the ground truth in order to assess its agreement with (1) the repeat segmentation of Rater A (Rater A2), (2) Rater B, and (3) the network. The machine segmentation accuracy results of (3) were obtained as part of the three-fold cross-validation in Sec. 3.1. The results are shown in Fig. 2 in dotted cyan, solid black, and solid red lines, respectively. The intra- and interrater accuracies for the manual raters are plotted as straight lines because they are independent of the threshold used for the machine-based segmentation. From Fig. 2, the intrarater, interrater, and machine-rater accuracies are comparable, suggesting that the automated segmentation is comparable to that of a human rater. As expected, the accuracy of the repeated segmentation is better than the accuracy of the second rater, but the difference is small.
In Fig. 4, the ROC curve of the automated segmentation is compared with Rater A1 (solid red line). In the same figure, the cyan star represents the sensitivity and specificity pair for Rater A2 compared with Rater A1 and the black cross represents the sensitivity and specificity pair for Rater B compared with Rater A1. The ROC curve was created by plotting the sensitivity against the false-positive rate (1-specificity) at various thresholds to depict relative trade-offs between true positives and false positives. A completely random result would be represented by a diagonal line. As seen in Fig. 4, the results from the automated DNN method are better than the manual segmentation results and well above the random result.
In Fig. 5, the F1 measure curve for the machine output is marked with a solid red curve, and the F1 measures for Rater A2 (dotted straight cyan line) and for Rater B (solid straight black line) are shown for comparison. The F1 measure depicts the trade-off between precision and recall with each variable weighted equally. As such, a higher F1 measure indicates a better balance between precision and recall. As seen in Fig. 5, there is a wide range of thresholds in which the F1 measure of the automated method is higher than that of the manual raters.
Capillary density (CD) is a clinical measure of quantifying retinal capillaries present in the OCT-A images. After segmentation of the vessels, CD can be calculated as the number of pixels in the segmented areas. Using the same 10 images from Sec. 2.3, we obtained the CD values from the segmentations by Rater A1, Rater A2, Rater B, and the network, and calculated the mean capillary density in order to evaluate the intrarater, interrater, and machine-to-rater repeatability of the CD measures. The result is presented in Table 2.
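The capillary density computation can be sketched from a binary segmentation as shown below. The pixel count follows the definition given above; the normalized fraction of the image area is an added assumption that is often convenient when comparing images of different sizes. The function name and the sample mask are hypothetical.

```python
def capillary_density(mask):
    """Return (vessel pixel count, vessel fraction of the image area).

    The count follows the definition in the text; the fraction is an
    assumed normalization and is not taken from the paper.
    """
    vessel_pixels = sum(1 for row in mask for v in row if v)
    total_pixels = sum(len(row) for row in mask)
    return vessel_pixels, vessel_pixels / total_pixels


# Hypothetical segmentation mask (1 = vessel).
mask = [[1, 0, 0, 1], [0, 1, 0, 0]]
cd_count, cd_fraction = capillary_density(mask)
```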
Table 2. Mean capillary density comparison.
|Mean (N=10)||Standard deviation||Standard error mean||p-value|
A paired-samples t-test was conducted to compare the capillary density of manual and automated segmentations. There was no significant difference in the scores for either of the manual raters or the machine.
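For readers unfamiliar with the test, the paired-samples t statistic is the mean of the per-image differences divided by its standard error. The sketch below computes only the statistic; the function name and the density values are hypothetical and are not the data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic for paired samples, with len(a) - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical capillary density values for the same four images, measured
# from a manual segmentation and from an automated segmentation.
manual = [12, 15, 11, 14]
automated = [10, 14, 12, 12]

t_value = paired_t_statistic(manual, automated)
```

Converting the statistic to a p-value requires the t distribution with n - 1 degrees of freedom; in practice a statistics package such as SciPy (`scipy.stats.ttest_rel`) can be used for that step.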
The problem of blood vessel segmentation in OCT-A images is challenging due to the low contrast and high noise levels in OCT-A images. We have presented a deep convolutional neural network-based segmentation method and its validation using 80 foveal OCT-A images. In the cross-validation in Sec. 3.1, the accuracy of the trained network fell in the range of 80% to 83%. From the results, we conclude that the machine-based segmentation was comparable to the manual segmentation by a human rater.
In the intra- and interrater comparison in Sec. 3.2, we found similar degrees of agreement for the repeated segmentations by a single rater, and segmentations from two different raters, showing substantial intra- and interrater variability in the manual segmentation. This suggests that the trained network may perform as well as a new human rater. Given the amount of time ( to 25 min) required for a human rater to perform the segmentation manually versus 2 min for the automated method, this represents a tool that could be useful in the clinical environment to save valuable human time and present results to the clinician in a shorter interval.
In addition to comparison with manual segmentation, the validity and merit of automated segmentation of medical images can be assessed by deriving clinical parameters such as capillary density. This approach is particularly appropriate if the quality of the derived parameters can be measured, e.g., by the correlation to other relevant clinical features, and if the quality of the manual segmentation ground truth is not reliable. In Sec. 3.3, capillary density was calculated for the manual and machine segmentations. A paired-samples t-test was conducted to compare the capillary density of manual and automated segmentations. There was no significant difference in the scores for either of the manual raters or the machine.
As the performance of a machine learning-based approach is closely linked to the quality of the training data, using high-quality data is important. However, the performance of a human rater, the ground truth for training the network, is limited by the difficulty of delineating the capillaries in some datasets. This was mainly due to poor contrast, vertical motion artifacts, and high noise levels. Figure 6 shows an example of a poor dataset, with an accuracy of 77.12%, and an example of a typical dataset, with an accuracy of 81.16%. We have observed performance variability in the vessel thickness due to the field of view and have chosen to train each field of view separately to take this into account. The dataset in this paper only contains images from one field of view (). The automated algorithm does segment the larger vessels (arterioles and venules) with a higher degree of certainty than the smaller vessels (capillaries).
This problem could be potentially mitigated by producing ground-truth data that is measurably better than data from a single expert by using images segmented by two or more trained volunteers as the input to the learning procedure. In this case, multiple segmentations of each image would be combined to select regions that are high in agreement by the raters, and the combined image would be then used for the learning procedure. A drawback to this approach would be the human labor cost of several trained raters segmenting a sufficiently large number of images for training purposes. Also, increasing the enface image quality in the acquisition stage would increase the quality of the manual rater accuracy and repeatability. This in turn can reduce the noise level in the ground truth data and make this method more robust.
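One simple way to combine several manual segmentations into a consensus ground truth is a per-pixel vote, sketched below. This is only one possible combination rule and was not implemented in this work; the function name, the `min_votes` parameter, and the sample masks are hypothetical.

```python
def consensus_mask(masks, min_votes):
    """Mark a pixel as vessel when at least min_votes raters marked it."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[r][c] for m in masks) >= min_votes else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical segmentations of the same 1x3 image by three raters.
rater_1 = [[1, 1, 0]]
rater_2 = [[1, 0, 0]]
rater_3 = [[1, 1, 1]]

majority = consensus_mask([rater_1, rater_2, rater_3], min_votes=2)
unanimous = consensus_mask([rater_1, rater_2, rater_3], min_votes=3)
```

A higher `min_votes` keeps only regions of strong agreement, at the cost of discarding fine capillaries that only some raters delineate.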
Segmentation of the retinal microvasculature is an important step in quantification of retinal images for clinical purposes. For OCT-A, a new method for retinal vasculature visualization, automated segmentation of the retinal vasculature remains a relatively unexplored area. Through comparisons of results from the DNN method and manual raters, the accuracy of our method is found to be comparable to a manual rater. For clinical applications, this is an important step in creating an automated segmentation usable for clinical analysis.
The authors would like to acknowledge funding support from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Brain Canada Foundation, Alzheimer Society Canada, the Pacific Alzheimer Research Foundation, the Michael Smith Foundation for Health Research (MSFHR), and Genome British Columbia. The authors would also like to acknowledge Vuk Bartulović, without whose contributions this work would not be possible.
Pavle Prentašić is a PhD candidate at the Faculty of Electrical Engineering and Computing, University of Zagreb. He received his BS and MEng degrees in computer science from the University of Zagreb in 2010 and 2012, respectively. His current research interests include computer vision, machine learning, and biomedical image processing and analysis. He is a member of IEEE.
Morgan Heisler is a MASc student in the Faculty of Applied Sciences at Simon Fraser University, Canada. She received her BASc (Hons.) from Simon Fraser University in 2015 and her current research interests include optical coherence tomography and biomedical image processing and analysis.