## 1. Introduction

Cardiovascular disease is the leading cause of death globally.^{1} Atherosclerosis of the coronary arteries results in remodeling and narrowing of the arteries that supply oxygenated blood to the heart, and thus may lead to myocardial infarction. Common interventional approaches include percutaneous coronary intervention and coronary artery bypass graft surgery.^{2} The choice of treatment will vary depending on a range of clinical factors, including the morphology of the vessel wall and the degree of stenosis as quantified by cross-sectional luminal area.

Imaging of the vasculature, specifically the coronary arteries, plays a critical role in the assessment of these treatment options. X-ray computed coronary angiography and cardiac magnetic resonance imaging allow noninvasive imaging but are very limited in their ability to assess the structure of the artery walls.^{3} Invasive techniques, such as intravascular ultrasound (IVUS),^{4} provide cross-sectional imaging of the artery walls, but with limited spatial resolution.^{5} Intravascular optical coherence tomography (IVOCT) lacks the image penetration depth of IVUS but provides far higher resolution imaging, allowing visualization and quantification of critical structures such as the fibrous cap of atherosclerotic plaques and delineation of the arterial wall layers.^{6}^{–}^{8} In addition, IVOCT is finding application in imaging coronary stents to assess vascular healing and potential restenosis.^{9}^{,}^{10}

Delineation of the vessel lumen in IVOCT images enables quantification of the luminal cross-sectional area. Such delineation has also been used as the first step toward plaque segmentation^{11}^{,}^{12} and the assessment of stent apposition.^{13} However, manual delineation is impractical because of the high number of cross-sectional images acquired in a single IVOCT pullback scan, typically $>100$ images. Automatic delineation of the lumen wall is challenging because of nonhomogeneous intensity, blood residue, the presence or absence of different types of stents, irregular lumen shapes, image artifacts, and bifurcations.^{14}

Previous delineation approaches have employed edge detection filters^{15} and spline-fitting to segment the lumen boundary and stent struts.^{16} Other approaches have included the use of wavelet transforms and mathematical morphology,^{17} Otsu’s automatic thresholding and intersection of radial lines with lumen boundaries,^{11}^{,}^{12} Markov random field models,^{18} and light back-scattering methods.^{19}

Deep learning is a type of machine learning that uses artificial neural networks (ANNs) and has in recent years been found to be useful for medical image processing. Input features are processed through a multilayered network, defined by a set of weights and biases, to produce a nonlinear output. During training, these weight and bias values are optimized by minimizing a loss function that compares the network output for each training input with its known target output. Convolutional neural networks (CNNs) are a particular subset of ANNs that operate on input with regular structure: they apply convolutional filters to the input of each layer, and have proven to be highly effective in image classification tasks.^{20}^{–}^{22}

Most neural network applications in image processing are image-based classification models, where the network is trained to classify each pixel in the input image into one of several classes. The use of this technique has been extended into a variety of medical image segmentation applications. For example, CNNs have been used to classify lung image patches in interstitial lung disease^{23} as well as head and neck cancer in hyperspectral imaging.^{24} CNNs have also been applied in retinal layer and microvasculature segmentation of retinal OCT images,^{25}^{,}^{26} and arterial layers segmentation in patients with Kawasaki disease.^{27} These CNN methods employ the commonly used feature classification approach.

An alternative approach is to train the network to perform linear regression, in contrast to feature classification. Recently, a linear-regression CNN model has been demonstrated to outperform a conventional CNN in cardiac left ventricle segmentation.^{28} CNN regression was used to infer the radial distances between the left ventricle centerpoint and the endo- and epicardial contours in polar space. This indicates the possibility of an alternative application of CNNs for image segmentation in comparable medical applications.

In this paper, we propose a method of coronary lumen segmentation for clinical assessment and treatment planning of coronary artery stenosis using a linear-regression CNN. We test the algorithm on *in vivo* clinical images and assess it against gold-standard manual segmentations. This is the first use of a linear-regression CNN approach to the automated delineation of the vessel lumen in IVOCT images. This paper is structured as follows: Sec. 2 provides experimental details and an explanation of the CNN architecture and implementation; Sec. 3 provides accuracy results benchmarked against interobserver variability of manual segmentation, and an assessment of the impact of varying the amount of training data; and Secs. 4 and 5 conclude with a discussion of the potential clinical impact and limitations of such an approach.

## 2. Materials and Method

### 2.1. IVOCT Data Acquisition and Preparation for Training and Testing

The data used for this study comprise IVOCT-acquired images of patients diagnosed with coronary artery disease. The IVOCT images were acquired from the University of Malaya Medical Center (UMMC) catheterization laboratory using two standard clinical systems: Ilumien and Ilumien Optis IVOCT Systems (St. Jude Medical). Both systems have an axial resolution of $15\,\mu\mathrm{m}$ and a scan diameter of 10 mm. The Ilumien system and the Ilumien Optis system have maximum frame rates of 100 and 180 fps, respectively. The study was approved by the University of Malaya Medical Ethics Committee (Ref: 20158-1554), and all patient data were anonymized.

In total, 64 pullbacks were acquired from 28 patients [25%/75% male/female, with mean age 59.71 ($\pm 9.61$) years] using a Dragonfly^{™} Duo Imaging Catheter with a 2.7 F crossing profile while the artery was under contrast flushing (Iopamiro^{®} 370). The internal rotating fiber optic imaging core performed rotational motorized pullback scans for a length of 54 or 75 mm in 5 s. These scans include multiple pre- and poststented images of the coronary artery at different locations. The pullbacks were randomly assigned to one of two groups with a ratio of 7:3, i.e., 45 pullbacks were randomly designated as training sets and the remaining 19 as test sets. Excluding images depicting only the guide catheter, each pullback contains between 155 and 375 polar images. These images are a heterogeneous mix, with or without stent struts (metal stents or bioresorbable stents or both), fibrous plaques, calcified plaques, lipid-rich plaques, ruptured plaques, thrombus, dissections, motion artifacts, bifurcations, and blood artifacts. The original size of each pullback frame was $984\times 496$ pixels (axial × angular dimension), and each frame was subsampled in both dimensions to $488\times 248$ pixels to reduce training and processing time. For each image, raw intensity values were converted from linear scale to logarithmic scale before normalizing by mean and standard deviation.
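The preprocessing described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function name `preprocess`, the use of simple decimation by a factor of two, the order of subsampling and log conversion, the base-10 logarithm, and the small `eps` offset are all assumptions, since the text does not specify them.

```python
import math

def preprocess(frame, eps=1e-6):
    """Subsample by 2 in both dimensions, log-compress, then z-score normalize.

    `frame` is a list of rows of non-negative linear intensities.
    Plain decimation, log10, and the `eps` offset are assumptions of this sketch.
    """
    # Keep every second row and every second column.
    small = [row[::2] for row in frame[::2]]
    # Linear -> logarithmic scale; eps guards against log(0).
    logged = [[math.log10(v + eps) for v in row] for row in small]
    flat = [v for row in logged for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var) or 1.0  # avoid division by zero on a constant image
    # Normalize the whole image by its own mean and standard deviation.
    return [[(v - mean) / std for v in row] for row in logged]
```

After this step each image has zero mean and unit variance, which keeps the input scale consistent across pullbacks acquired on the two systems.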

Gold-standard segmentations were generated on both training and test sets by manual frame-by-frame delineation using ImageJ^{29} in Cartesian coordinates, following the consensus document,^{14} whereby a contour was drawn between the lumen and the leading edge of the intima. The contour was also manually drawn across the guidewire shadow and bifurcations at locations that best represent the underlying border of the main lumen, as gauged from the adjacent slices. The manual contour of the lumen border for each image was subsequently converted to polar coordinates, smoothed, and interpolated to 100 points using cubic B-spline interpolation for CNN training and testing.
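The conversion of a manual Cartesian contour to 100 polar samples can be illustrated as follows. This sketch makes several assumptions that are not in the text: the function name `resample_contour_polar` is hypothetical, the polar origin is taken to be the catheter center, the contour is assumed to have a single radius per angle, and linear interpolation in angle is substituted for the smoothing cubic B-spline used in the study.

```python
import math

def resample_contour_polar(points, center, n_out=100):
    """Convert a closed Cartesian contour to n_out radii at equally spaced angles.

    Linear interpolation in angle stands in for the cubic B-spline of the paper.
    Assumes the contour has exactly one radius per angle about `center`.
    """
    cx, cy = center
    # (angle in [0, 2*pi), radius) for every contour point, sorted by angle.
    polar = sorted(
        (math.atan2(y - cy, x - cx) % (2 * math.pi), math.hypot(x - cx, y - cy))
        for x, y in points
    )
    # Wrap one sample around each end so interpolation works across the seam.
    first_t, first_r = polar[0]
    last_t, last_r = polar[-1]
    wrapped = ([(last_t - 2 * math.pi, last_r)] + polar
               + [(first_t + 2 * math.pi, first_r)])

    radii = []
    j = 0
    for k in range(n_out):
        theta = 2 * math.pi * k / n_out
        # Advance to the segment [wrapped[j], wrapped[j + 1]] that brackets theta.
        while wrapped[j + 1][0] < theta:
            j += 1
        (t0, r0), (t1, r1) = wrapped[j], wrapped[j + 1]
        w = 0.0 if t1 == t0 else (theta - t0) / (t1 - t0)
        radii.append(r0 + w * (r1 - r0))
    return radii
```

The output is a fixed-length vector of radii, which is the form of target that a regression network needs for training.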

### 2.2. Convolutional Neural Networks Regression Architecture and Implementation Details

Using our linear-regression CNN model, in each polar image we infer the radius parameter of the vessel wall at 100 equidistant radial locations, rather than the more conventional approach of classifying each pixel within the image. This has the advantage of avoiding the physiologically unrealistic results that may arise from segmentation of individual pixels. The lumen segmentation was parameterized in terms of radial distances from the center of the catheter in polar space.

The general flow of the proposed CNN model is shown in Fig. 1. Our network consists of a simple structure with four convolutional layers and three fully connected layers, including the final output layer. All polar images were padded circularly left and right before being windowed for input. The window dimension was $488\times 128$ pixels, centered on each individual radial point, therefore yielding 100 inputs and 100 evaluated radial distances per image.
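The circular padding and windowing step can be sketched with modulo indexing, which is equivalent to padding the polar image with copies of itself on the left and right. The helper names below are hypothetical, and the mapping of the 100 radial positions onto the 248 angular columns (nearest column by rounding) is an assumption; the text does not state how the window centers were chosen.

```python
def window_centers(n_points=100, n_angular=248):
    """Columns nearest to n_points equally spaced angular positions (rounding assumed)."""
    return [round(i * n_angular / n_points) % n_angular for i in range(n_points)]

def window_columns(center, width=128, n_angular=248):
    """Angular column indices of one window, wrapping around the image edges."""
    start = center - width // 2
    return [(start + k) % n_angular for k in range(width)]

def extract_windows(image, width=128, n_points=100):
    """Return n_points windows; each keeps all rows and `width` wrapped columns.

    `image` is a list of rows (axial dimension), each row indexed by angular column.
    """
    n_angular = len(image[0])
    windows = []
    for c in window_centers(n_points, n_angular):
        cols = window_columns(c, width, n_angular)
        windows.append([[row[j] for j in cols] for row in image])
    return windows
```

Each window spans roughly half of the angular range, so the network sees substantial context on both sides of the radial position whose distance it predicts.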

The details of the network architecture are presented in Table 1. In the network architecture, a filter kernel of size $5\times 5\times 24$ with boundary zero-padding was applied for all convolutional layers, yielding 24 feature maps at each layer. In the first layer, a stride of 2 was also applied along the angular dimension to reduce computational load. The first three layers were also max-pooled by size $2\times 2$. Each fully connected layer contains 512 nodes. Exponential linear units^{30} were used as the activation functions for all layers, including both convolutional and fully connected layers, except the final layer. Dropout with keep probability of 0.75 was applied to the fully connected layers FC1 and FC2, to improve the robustness of the network.^{31} The final layer outputs a single value representative of the radial distance between the lumen border and the center of the catheter for the radial position being evaluated.

## Table 1

Linear-regression CNN architecture for lumen segmentation at each windowed image. The output is the radial distance at the lumen border from the center of the catheter. CN, convolutional layer; FC, fully connected layer.

Layer | In | Weights | Pooling | Out |
---|---|---|---|---|
CN1^{a} | $488\times 128\times 1$ | $1\times 5\times 5\times 24$ | $2\times 2$ | $244\times 32\times 24$ |
CN2 | $244\times 32\times 24$ | $24\times 5\times 5\times 24$ | $2\times 2$ | $122\times 16\times 24$ |
CN3 | $122\times 16\times 24$ | $24\times 5\times 5\times 24$ | $2\times 2$ | $61\times 8\times 24$ |
CN4 | $61\times 8\times 24$ | $24\times 5\times 5\times 24$ | - | $61\times 8\times 24$ |
FC1 | 11712 | $11712\times 512$ | - | 512 |
FC2 | 512 | $512\times 512$ | - | 512 |
Out | 512 | $512\times 1$ | - | 1 |

^{a} A stride of size 2 was applied on the angular dimension to reduce computational load.
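The layer dimensions in Table 1 can be checked arithmetically. The sketch below traces the feature-map shapes and counts trainable parameters, assuming zero-padded convolutions that preserve spatial size and one bias per filter or node (the bias convention is an assumption; Table 1 lists weights only). The function names are hypothetical.

```python
def conv_out(shape, n_filters=24, stride=(1, 1), pool=(1, 1)):
    """Output shape after a zero-padded convolution, a stride, and max pooling."""
    h, w, _ = shape
    return (h // stride[0] // pool[0], w // stride[1] // pool[1], n_filters)

def count_parameters():
    shape = (488, 128, 1)  # axial x angular x channels of one input window
    # (stride, pool) per convolutional layer, read off Table 1.
    conv_specs = [((1, 2), (2, 2)), ((1, 1), (2, 2)),
                  ((1, 1), (2, 2)), ((1, 1), (1, 1))]
    total = 0
    shapes = []
    for stride, pool in conv_specs:
        in_channels = shape[2]
        total += 5 * 5 * in_channels * 24 + 24  # kernel weights + biases
        shape = conv_out(shape, 24, stride, pool)
        shapes.append(shape)
    n_in = shape[0] * shape[1] * shape[2]       # flattened input to FC1
    for n_out in (512, 512, 1):                 # FC1, FC2, Out
        total += n_in * n_out + n_out           # weights + biases
        n_in = n_out
    return shapes, total

shapes, total = count_parameters()
print(shapes)  # [(244, 32, 24), (122, 16, 24), (61, 8, 24), (61, 8, 24)]
print(total)   # 6304121
```

Under these assumptions the total is about 6.3 million parameters, dominated by the $11712\times 512$ weights of FC1.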

The objective function used for the network training is the standard mean-squared error. Starting from a random initialization, the weight and bias parameters are iteratively updated to minimize the mean-squared error between the gold-standard radial distance and the output of the CNN. The Adam stochastic optimization algorithm was used to minimize the objective function.^{32} The network was trained stochastically with a mini-batch size of 100 at a base learning rate of 0.005. The learning rate was halved every 50,000 runs. The training was stopped at 400,000 runs, where convergence was observed (i.e., when the observed losses had ceased to improve for at least 100,000 runs). The trained weights and biases of the network, amounting to $\sim 6.3$ million parameters, are subsequently used to predict the lumen contour on the test sets.
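The loss and the learning-rate schedule described above can be written compactly. This is a sketch only: the function names are hypothetical, and it assumes the halving takes effect exactly at each multiple of 50,000 runs, which the text does not specify.

```python
def mse(predicted, target):
    """Mean-squared error over a mini-batch of radial distances."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def learning_rate(step, base_lr=0.005, halve_every=50_000):
    """Step decay: the rate is halved after every `halve_every` training runs."""
    return base_lr * 0.5 ** (step // halve_every)
```

Under this schedule the rate during the final 50,000 of the 400,000 runs is $0.005\times 0.5^{7}\approx 3.9\times 10^{-5}$, so late updates make only small adjustments to the weights.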

The neural network was designed in a Python (Python Software Foundation, Delaware) environment using the TensorFlow v1.0.1 machine learning framework (Google Inc., California). The execution of the network was performed on a Linux-based Intel i5-6500 CPU workstation with NVIDIA GeForce GTX1080 8GB GPU. The training time for 45 training sets was 13.8 h and the complete inference time for each test image was 40.6 ms.

### 2.3. Validation

The accuracy of our proposed linear-regression CNN lumen segmentation was validated against the gold-standard segmentation of the test data pullback acquisitions, which were the aforementioned 19 manually delineated pullbacks. These pullbacks contain in total 5685 images. The accuracy was assessed in three ways: (1) on a point-by-point basis via distance error measure, (2) in the form of binary image overlaps, and (3) based on luminal area.

The first assessment involves point-by-point analysis on the 100 equidistant radial contour points from all images, whereby the mean absolute Euclidean distance error between the gold standard and predicted contours was computed for each image.
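A minimal sketch of this point-by-point measure follows. It assumes that both contours are sampled at the same equally spaced angles, so that the distance between corresponding points reduces to the absolute difference in radius; this reading, the function name, and the `pixel_size_um` parameter (the text does not state the pixel spacing) are assumptions.

```python
def mean_absolute_error_um(r_pred, r_gold, pixel_size_um):
    """Mean absolute radial error for one image, in micrometres.

    r_pred and r_gold hold radii in pixels at the same angular positions.
    pixel_size_um is a hypothetical axial pixel spacing supplied by the caller.
    """
    if len(r_pred) != len(r_gold):
        raise ValueError("contours must have the same number of points")
    diffs = [abs(p - g) for p, g in zip(r_pred, r_gold)]
    return pixel_size_um * sum(diffs) / len(diffs)
```

Summary statistics such as the median and interquartile range are then taken over the per-image values returned by this function.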

The second assessment was performed to evaluate the regions delineated as lumen. The amount of overlap between the binary masks as generated from the predicted contours and the corresponding gold standards was computed using the Dice coefficient and Jaccard similarity index.
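Both overlap measures follow directly from the sizes of the two masks and their intersection: Dice is $2|A\cap B|/(|A|+|B|)$ and Jaccard is $|A\cap B|/|A\cup B|$. The sketch below uses nested lists as masks; the function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_and_jaccard(mask_a, mask_b):
    """Dice coefficient and Jaccard index of two binary masks of equal size."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    inter = sum(x and y for x, y in zip(a, b))   # pixels set in both masks
    size_a, size_b = sum(a), sum(b)
    union = size_a + size_b - inter
    dice = 2 * inter / (size_a + size_b) if size_a + size_b else 1.0
    jaccard = inter / union if union else 1.0
    return dice, jaccard
```

For any single image the two measures are related by $J = D/(2-D)$, so they rank segmentations identically; a Dice coefficient of 0.985 corresponds to a Jaccard index of about 0.970.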

The third assessment targeted the luminal area, which is one of the clinical indices used to locate and grade the extent of coronary stenosis for treatment planning. Luminal area was computed from the binary mask produced from the predicted contours and compared against the corresponding gold standard. We also performed a one-tailed Wilcoxon signed-rank test on the errors of the estimated luminal areas at a significance level of 0.001. Three-dimensional (3-D) surface models of the lumen wall were also generated for all pullbacks to facilitate visual comparison of the segmentation by manual contouring and automated contouring using the proposed CNN regression model.
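As an illustration of the area comparison, the sketch below computes luminal area directly from a polar contour using the shoelace formula. Note that this is a substitute for the binary-mask pixel count used in the study, chosen here because it needs no rasterization; the function names are hypothetical and the radii are assumed to be in millimetres at equally spaced angles.

```python
import math

def luminal_area_mm2(radii_mm):
    """Shoelace area of the polygon through equally spaced polar contour points."""
    n = len(radii_mm)
    pts = [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
           for k, r in enumerate(radii_mm)]
    # Sum the cross products of consecutive vertices, closing the polygon.
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2

def percentage_error(area_pred, area_gold):
    """Signed error of the predicted area, normalized by the gold-standard area."""
    return 100.0 * (area_pred - area_gold) / area_gold
```

With 100 contour points the polygon underestimates the area of a perfect circle by less than 0.1%, which is small relative to the percentage errors being assessed.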

### 2.4. Dependency of Network Performance on Training Data Quantity

To understand the dependency of the network performance on the amount of training data, we assessed the variation in accuracy on the 19 test pullbacks against different numbers of training datasets. Tests were performed with 10, 15, 20, 25, 30, 35, 40, and 45 pullbacks. The training pullbacks for each group were selected randomly. The number of training runs with different training sets was kept constant at 400,000 runs, with a similar base learning rate and learning rate decay protocol.

### 2.5. Interobserver Variability Against Convolutional Neural Networks Accuracy

To quantify the allowable variation in segmentation, we performed an experiment to assess variation in the manual gold standard that would be generated by three independent observers.

One hundred images were selected randomly from five pullbacks of the test sets, and the lumen was manually delineated by three independent observers. The interobserver variability was assessed through Bland–Altman analysis, consistent with Celi and Berti in their study on the segmentation of coronary lesions.^{11} Specifically, the signed differences between all possible corresponding pairs of luminal areas from the three observers were plotted against the mean luminal area of each pair. Bland–Altman analyses were also performed on luminal areas evaluated by the CNN against the corresponding evaluation by all observers. These analyses provide an understanding of the total bias and limits of agreement (i.e., 95% confidence interval or $1.96\times $ standard deviation of the signed differences from the mean) among the observers themselves as well as between the CNN and the observers.
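The bias and limits of agreement in a Bland–Altman analysis can be computed as in the following sketch. The function name is hypothetical, and the use of the sample standard deviation (dividing by $n-1$) is an assumption, as the text does not say which estimator was used.

```python
import statistics

def bland_altman(areas_a, areas_b):
    """Bias and 95% limits of agreement between two sets of paired luminal areas.

    Returns (pair means, signed differences, bias, (lower limit, upper limit)).
    """
    diffs = [a - b for a, b in zip(areas_a, areas_b)]
    means = [(a + b) / 2 for a, b in zip(areas_a, areas_b)]
    bias = statistics.mean(diffs)                 # mean signed difference
    half_width = 1.96 * statistics.stdev(diffs)   # sample standard deviation
    return means, diffs, bias, (bias - half_width, bias + half_width)
```

Plotting `diffs` against `means` gives the Bland–Altman plot; for the interobserver analysis the same function would be applied to each pair of observers and the differences pooled.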

## 3. Results

### 3.1. Dependency of Network Performance on Training Data Quantity

The results assessing the impact of training data quantity on CNN accuracy are shown in Fig. 2. The value reported here is the mean positional accuracy of each point along the vessel wall. There was notable improvement in CNN accuracy with an increase in the training data quantity up until 25 training data sets. Beyond that, the mean absolute error per image varied little with increased data. However, the optimal CNN segmentation was obtained from training with the highest sample size, i.e., 45 pullbacks consisting of 13,342 training images, as summarized in Table 2. At 45 training pullbacks, the median of the mean absolute error per image as quantified using point-by-point analysis was 21.87 microns, whereas Dice coefficient and Jaccard similarity index were calculated as 0.985 and 0.970, respectively.

## Table 2

Accuracy of CNN segmentation with 45 training pullbacks (n=13,342). The values are obtained based on the segmentation on 19 test pullbacks (n=5685).

Measure | Median (interquartile range) |
---|---|
Mean absolute error per image (point-by-point analysis), $\mu \mathrm{m}$ | 21.87 (16.28, 31.29) |
Dice coefficient | 0.985 (0.979, 0.988) |
Jaccard similarity index | 0.970 (0.958, 0.977) |

Representative segmentation results are shown in Fig. 3. Apart from performing well on images with clear lumen border contrast [Fig. 3(a)], linear-regression CNN segmentation has shown robustness in segmenting images with inhomogeneous lumen intensity (b), severe stenosis (c), blood residue due to suboptimal flushing (d)–(f), multiple reflections (g), embedded stent struts (h) and (i), malapposed metallic stent struts (j), malapposed bioresorbable stent struts (k), and minor side branches [(c), (i), and (l)]. Acceptable lumen segmentation was found at the shadow behind the guide wire and metallic stent struts across all images. Errors were observed to occur most frequently at major bifurcations (angle spanning $>\sim 90\,\mathrm{deg}$), where the appropriate boundary for segmenting the main vessel was ambiguous [Figs. 4(c)–4(d)]. Seventy-two percent of the 100 worst performing segmentations were found to contain major bifurcations and, at these locations, overestimation of the area of the main vessels was noted.

Based on the results obtained with the optimal training quantity (45 pullback data sets), we calculated luminal area estimates in all 19 test pullbacks, as tabulated in Table 3. CNN segmentation yields a median (interquartile range) luminal area of 5.26 (3.93, 7.45) ${\mathrm{mm}}^{2}$, matching well with the manual (i.e., gold-standard) segmentation result of 5.28 (3.88, 7.45) ${\mathrm{mm}}^{2}$. The median (interquartile range) absolute error of luminal area was 1.38% (0.63%, 2.62%), which is statistically significantly below 2% ($p<0.001$) as tested by the one-tailed Wilcoxon signed-rank test. Figure 5 shows two representative examples of the 3-D reconstructed vessel wall from two different pullbacks for visual comparison of CNN regression (middle column) against gold-standard manual (left column) segmentation. The vessel wall is color-coded with the cross-sectional luminal area. The difference in luminal area between CNN regression and gold-standard segmentation is color-coded on the vessel wall in the right column.

## Table 3

Luminal area in 19 test pullbacks with optimal training.

Method | Median (interquartile range) |
---|---|
Luminal area (${\mathrm{mm}}^{2}$) | |
Manual segmentation area | 5.28 (3.88, 7.45) |
CNN segmentation area | 5.26 (3.93, 7.45) |
Percentage error^{a} (%) | |
Signed percentage error | 0.06 (−1.24, 1.53) |
Absolute percentage error^{b} | 1.38 (0.63, 2.62) |

^{a} Normalized by manual segmentation area.

^{b} Significantly below 2%, $p<0.001$.

### 3.2. Interobserver Variability Against Convolutional Neural Networks Accuracy

The Bland–Altman analysis among all three observers showed a bias (mean signed difference) of $0.0\,{\mathrm{mm}}^{2}$ and limits of agreement of $\pm 0.599\,{\mathrm{mm}}^{2}$ in terms of luminal area estimation [Fig. 6(a)]. Comparing the CNN with all observers, the bias was $0.057\,{\mathrm{mm}}^{2}$ and the variability in terms of limits of agreement was comparable at $\pm 0.665\,{\mathrm{mm}}^{2}$ [Fig. 6(b)]. These results suggest that the automated segmentation had only a small bias ($0.057\,{\mathrm{mm}}^{2}$) toward overestimating luminal area, and that the variation between automated and manual estimates of luminal area was only slightly greater than the interobserver variability among human observers.

## 4. Discussion

Lumen dimension is an important factor in the optimization of percutaneous coronary intervention. This measure allows the clinician to localize and measure the length of lesions along the vessel wall before making an optimum selection of stent for deployment. It also allows one to indirectly assess the quality of stenting (i.e., based on total expansion of the narrowed artery) and is the first step toward quantifying the amount of stent malapposition. Misinterpretation of lesion location and length results in both clinical and financial consequences as additional stents are required for redeployment, and overlapping of multiple stents is often associated with increased incidences of restenosis, thrombosis, and adverse clinical outcomes.^{33}

Manually quantifying coronary lumen dimension from IVOCT images over the entire extent of the imaged segment is currently not clinically feasible in view of the number of sample images available per pullback (i.e., $>100$ images). Automatic segmentation is desirable but challenging due to the significant variety of image features and artifacts obtained in routine scanning, restricting the operation of most image processing algorithms to a specific subset of good quality images. Deep learning techniques have been shown to be more robust in a pool of heterogeneous input images, and this has also been demonstrated in our results.^{28} Our study represents the first to employ such a technique, combined with a linear regression approach, to the automatic segmentation of lumen from IVOCT images.

Our results showed a notable increase in CNN accuracy up to 25 training pullbacks, and incremental improvements thereafter. The median error in luminal radius at each radial location, against a manual gold standard, was $21.87\,\mu \mathrm{m}$ at optimal training with 45 training pullbacks, which is comparable with the OCT system’s axial resolution ($15\,\mu \mathrm{m}$). The median luminal area was marginally greater by manual segmentation in comparison with CNN segmentation (i.e., 5.28 versus $5.26\,{\mathrm{mm}}^{2}$), yielding a median error of 1.38% (i.e., significantly $<2\%$ at $p<0.001$). The CNN also has good limits of agreement against all observers ($\pm 0.665\,{\mathrm{mm}}^{2}$), which is comparable with the limits of agreement among all observers ($\pm 0.599\,{\mathrm{mm}}^{2}$).

Published algorithms have required the prior removal of guide-wires or blood artifacts in the images as well as interpolation of output contours across guidewire shadow and bifurcation^{11}^{,}^{12}^{,}^{27}^{,}^{34} to complete an accurate segmentation. Our linear-regression CNN algorithm did not require additional pre- and postprocessing of the data, with the behavior across these features arising implicitly from the training data. In addition, the proposed method works on a wide spectrum of IVOCT images whether in the presence or absence of stent struts. We found this approach to be of utility in assessing patients both pre- and poststenting. Furthermore, the CNN segmentation was able to segment images regardless of stent types and no prior information on implanted type is needed, as can be required by some other segmentation techniques,^{35} making it applicable in a wider range of clinical settings.

We note that while training time was significant (13.8 h for 45 training pullbacks), this is all precomputed prior to clinical usage. The subsequent time to process a test image was extremely small (40.6 ms). Thus, the use of linear-regression CNNs offers the potential of intra-operative assessment of the vessel lumen during an intervention.

Limitations of the algorithm occur at areas with highly irregular lumen shapes, and at major bifurcations, where vessel lumen of the main branch is ambiguous even for manual segmentation. We note that this implementation of the algorithm has adopted a two-dimensional processing approach where each image is processed independently. Extending this to a volumetric approach, where adjacent slices influence the segmentation of each image, may result in more stable results in these situations. Alternatively, some form of energy minimization approach may be incorporated into the CNN cost function to enforce additional regularization of the lumen shape.

## 5. Conclusion

This paper has demonstrated a linear-regression CNN for the segmentation of vessel lumen in IVOCT images. The algorithm was tested on clinical data and compared against a manual gold standard. Results suggested that the CNN provided accurate estimates of the lumen boundary, with errors only slightly greater than the interobserver variability among multiple human observers. In addition, the algorithm was fast, processing each test image in 40.6 ms. Our results suggest that the linear-regression CNN-based approach has the potential to be incorporated into a clinical workflow and provide quantitative assessment of vessel lumen in an intraoperative time frame.

## Acknowledgments

This research was funded by the University of Malaya Research Grant (RP028A-14HTM) and the University of Malaya Postgraduate Research Grant (PG052-2015B). Prof. McLaughlin is supported by a Premier’s Research and Industry Fund grant provided by the South Australian Government Department of State Development, and by the Australian Research Council (CE140100003 and DP150104660).

## References

“*In-vivo* segmentation and quantification of coronary lesions by optical coherence tomography images for a lesion type definition and stenosis grading,” Med. Image Anal. 18(7), 1157–1168 (2014). http://dx.doi.org/10.1016/j.media.2014.06.011

## Biography

**Yan Ling Yong** received his bachelor’s degree in biotechnology from Pennsylvania State University, USA. Currently, he is a postgraduate student at the University of Malaya, Malaysia, performing research in image processing.

**Li Kuo Tan** received his master’s degree in biomedical engineering from Monash University, Australia. He is a lecturer in the University of Malaya, Malaysia. His research interests include medical imaging and image processing.

**Robert A. McLaughlin** is the chair of biophotonics at the University of Adelaide, Australia. He received his PhD from the University of Western Australia and subsequently was a postdoc at the University of Oxford.