Major calcifications are of great concern when performing percutaneous coronary interventions because they inhibit proper stent deployment. We created comprehensive software to segment calcifications in intravascular optical coherence tomography (IVOCT) images and to calculate their impact using the stent-deployment calcification score, as reported by Fujino et al. We segmented the vascular lumen and calcifications using SegNet, a pretrained convolutional neural network, which we refined for our task. We cleaned segmentation results using conditional random field processing. We evaluated the method on manually annotated IVOCT volumes of interest (VOIs) without lesions and with calcified, lipidous, or mixed lesions. The dataset included 48 VOIs taken from 34 clinical pullbacks, giving a total of 2640 in vivo images. Annotations were determined by consensus between two expert analysts. Keeping VOIs intact, we performed 10-fold cross-validation over all data. Following segmentation noise cleaning, we obtained sensitivities of 0.85 ± 0.04, 0.99 ± 0.01, and 0.97 ± 0.01 for calcified, lumen, and other tissue classes, respectively. From segmented regions, we automatically determined calcification depth, angle, and thickness attributes. Bland–Altman analysis suggested strong agreement between manually and automatically obtained lumen and calcification attributes. Agreement between manually and automatically obtained stent-deployment calcification scores was good (four of five lesions gave exact agreement). Results are encouraging and suggest that our segmentation approach could be applied clinically for assessment and treatment planning of coronary calcification lesions.
Major calcifications are of great concern when performing percutaneous coronary intervention (PCI) because they can hinder stent deployment. Approximately 700,000 PCIs are performed each year, and many involve the use of stents to open up obstructed coronary arteries.1 Calcified plaques are found in 17% to 35% of patients undergoing PCI.2–4 Calcifications can lead to stent underexpansion and strut malapposition, which in turn can lead to increased risk of thrombosis and in-stent restenosis.5–10 A cardiologist has several options when confronting a calcified lesion: high balloon pressures (up to 30 atm) to fracture the calcification, scoring balloon, Shockwave™ intravascular lithotripsy (IVL), rotational atherectomy, etc. In some cases, the lesion may not be treatable.
Intravascular optical coherence tomography (IVOCT) has significant advantages for characterizing coronary calcification, as compared to other imaging modalities commonly used by interventional cardiologists. Although clinicians routinely use x-ray angiography for treatment planning to describe the vessel lumen, angiography does not provide specific information regarding vascular wall composition except in the case of severely calcified lesions.11 Intravascular ultrasound (IVUS) can identify the location of coronary calcification, but cannot assess the thickness because the radiofrequency signal is reflected from the calcium tissue interface giving an acoustic shadow.12 IVOCT, however, provides the location and often the thickness of a calcification.13 IVUS has better penetration depth (IVUS: 5 to 10 mm; IVOCT: 1 to 2 mm)14,15 and does not require blood clearing for imaging. However, IVOCT has superior resolution (axial: 15 to ; lateral: 20 to ) as compared to IVUS (axial: 150 to ; lateral: 200 to ).16,17
Currently, the need for specialized training, uncertain interpretation, and image overload ( images in a pullback) have suggested a need for automated analysis of IVOCT images. There are multiple reports of automated IVOCT image analysis. Ughi et al.18 applied machine learning to perform pixelwise classification of fibrous, lipid, and calcified plaque. Athanasiou et al.19 segmented calcification and then classified lipid, fibrous, and mixed tissues using 17 features with k-means and postanalysis. Zhou et al.20 developed a classification and segmentation method using texture features described by the Fourier transform and discrete wavelet transform to classify adventitia, calcification, lipid, and mixed tissue. Our group developed machine learning21 and deep learning22,23 methods to automatically classify plaque regions. Rico-Jimenez et al.24 used linear discriminant analysis to identify normal and fibrolipidic A-lines. Yong et al.25 proposed a linear regression convolutional neural network to automatically segment the vessel lumen. Abdolmanafi et al.26 used deep learning to identify layers within the coronary artery wall and to identify Kawasaki disease.27 Recently, Gessert et al.28 used convolutional neural networks to identify IVOCT frames that contain plaque.
Although all of the aforementioned studies were promising, some limitations exist. (1) Many studies used a limited number of images, limiting the ability to generalize. (2) In some studies, the experimental design used the same lesion for training, validation, and testing, which can cause overfitting and inflate performance estimates. (3) One study performed only lumen segmentation without any plaque characterization. (4) Many studies performed slice-level or region-of-interest classification, and it was unclear how this information could be used clinically. In our study, we performed pixelwise segmentation and used the results to calculate the stent deployment calcification score, which identifies lesions that would benefit from plaque modification prior to stent implantation. (5) It is unclear whether all reports use a sufficiently large base of support (receptive field) in the image to capture a priori knowledge of calcified plaque distribution [e.g., calcified lesions have an “orientation” roughly parallel to the lumen in the (r,θ) representation].
In this paper, we focus on the important problem of segmenting calcifications in IVOCT images and assessing their impact on stent deployment. We build on previous studies and use deep learning to perform semantic segmentation of the lumen and calcification within IVOCT images. We use a large manually segmented training set with voxels labeled as lumen, calcification, and other. We use conditional random fields (CRF) to clean noisy segmentation results. Rather than simply reporting Dice or voxel sensitivity/specificity, as done in most previous publications, we compare automated and manual assessments of clinically relevant calcification attributes, including calcification depth, angle, and thickness. In addition, to assess the impact of calcification on stent deployment, we evaluated a previously reported stent deployment calcification score,13 computed from our automatically segmented calcifications. To our knowledge, this is the first publication focusing on segmentation and on clinically important analyses of calcified plaques.
Image Processing and Analysis
Preprocessing and Data Set Augmentation
Preprocessing steps are applied to the raw IVOCT images obtained in the polar (r,θ) domain. Data values are log transformed to convert multiplicative speckle noise into an additive form. Image speckle noise is then reduced by filtering with a normalized Gaussian kernel (standard deviation 2.5 pixels in a footprint).18 Optionally, IVOCT (r,θ) images are scan converted to create (x,y) images. We evaluate both the (r,θ) and (x,y) data representations for segmentation of IVOCT data. Images in the (r,θ) representation are ( by 0.75 deg). For (x,y) representations, images were ().
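The preprocessing chain above (log transform, normalized Gaussian smoothing, optional scan conversion) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB implementation: the filter footprint (`radius`), the small offset `eps` added before the logarithm, the circular padding along θ, and the nearest-neighbour interpolation in `scan_convert` are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma=2.5, radius=3):
    # radius is a hypothetical choice; the footprint is (2 * radius + 1) pixels
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()  # normalized, so a constant image stays constant

def preprocess_polar(raw, sigma=2.5, radius=3, eps=1e-6):
    """raw: (n_r, n_theta) linear-intensity frame in (r, theta)."""
    img = np.log(raw + eps)  # multiplicative speckle -> additive noise
    k = gaussian_kernel_1d(sigma, radius)
    # separable filtering: replicate edges along r ...
    img = np.pad(img, ((radius, radius), (0, 0)), mode="edge")
    img = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, img)
    # ... and wrap along theta, since A-lines are circular around the catheter
    img = np.pad(img, ((0, 0), (radius, radius)), mode="wrap")
    img = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, img)
    return img

def scan_convert(polar, out_size=None):
    """Nearest-neighbour (r, theta) -> (x, y) conversion, catheter at the image centre."""
    n_r, n_theta = polar.shape
    size = out_size or 2 * n_r
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    dx, dy = xx - c, yy - c
    r = np.sqrt(dx**2 + dy**2) * (n_r / (size / 2.0))
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi) * n_theta / (2 * np.pi)
    r_idx = np.round(r).astype(int)
    t_idx = np.round(theta).astype(int) % n_theta
    out = np.zeros((size, size), dtype=polar.dtype)
    inside = r_idx < n_r  # pixels beyond the imaging depth stay zero
    out[inside] = polar[r_idx[inside], t_idx[inside]]
    return out
```

Nearest-neighbour lookup keeps the sketch short. Note that far from the catheter, neighbouring (x,y) pixels increasingly fall on the same A-line, which is one concrete form of the radius-dependent interpolation introduced by scan conversion.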
During training, data are augmented to provide more examples and to change the locations of calcifications, so as to improve the spatial invariance of the methods. For anatomical (x,y) images, we rotate each image by a randomly chosen angle. To augment (r,θ) data, we concatenate all the (r,θ) images to form one large 2-D array, in which the r direction corresponds to tissue depth and the θ direction corresponds to catheter rotation, which spans 0 deg to 360 deg within each image. By changing the angular offset at which each image starts, we can resample new 360 deg (r,θ) images. In practice, we shifted the starting A-line five times in increments of 100 A-lines. Data augmentation steps for the (r,θ) representation are shown in Fig. 1. Note that all images in this report are shown after log conversion for improved visualization.
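The (r,θ) augmentation amounts to concatenating consecutive frames along θ and re-cutting 360 deg windows at a new starting A-line. A minimal sketch, assuming frames are stored as an `(n_frames, n_r, n_theta)` array; the function name `shift_augment` is hypothetical, and the same call would be applied to the label stack so that images and labels stay aligned.

```python
import numpy as np

def shift_augment(frames, shift=100, n_shifts=5):
    """frames: (n_frames, n_r, n_theta) stack of consecutive (r, theta) images.
    Returns new 360-deg windows whose first A-line is offset by shift, 2*shift, ..."""
    n_frames, n_r, n_theta = frames.shape
    strip = np.concatenate(list(frames), axis=1)  # (n_r, n_frames * n_theta)
    out = []
    for s in range(1, n_shifts + 1):
        offset = s * shift
        # number of complete 360-deg windows that fit after this offset
        n_win = (strip.shape[1] - offset) // n_theta
        for w in range(n_win):
            start = offset + w * n_theta
            out.append(strip[:, start:start + n_theta])
    return np.stack(out)
```

Each resampled window straddles two adjacent frames, so its first and last A-lines come from slightly different pullback positions; the calcification content is preserved but appears at a new angular location.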
Deep Learning Model Architecture and Implementation Details
We chose SegNet29 as our network architecture (Fig. 2). SegNet is an end-to-end, hourglass-shaped encoder–decoder convolutional neural network that was pretrained on the CamVid dataset.30 Each encoder/decoder convolution set consists of a convolution layer, a batch normalization layer,31 and a rectified linear unit (ReLU) layer.32 All convolution layers used the following hyperparameters: filter size of 3, stride of 1, and zero padding of size 1. These parameters were selected empirically using one fold of our training data, as described in Sec. 3.2. The small filter size was chosen to detect small features, including the edges of calcified plaques. The depth of the network was 5. In our implementation, we performed transfer learning with weights initialized from VGG-16.
The base of support (or receptive field) of each layer can be computed recursively from the kernel sizes and strides of the preceding layers.33,34
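A common form of this computation is the recurrence r_l = r_{l-1} + (k_l − 1)·j_{l-1} and j_l = j_{l-1}·s_l, with r_0 = j_0 = 1, where k_l and s_l are the kernel size and stride of layer l and j_l is the cumulative stride. The sketch below applies it to a VGG-16-style SegNet encoder (3×3 convolutions with stride 1; 2×2 max pooling with stride 2; 2, 2, 3, 3, 3 convolutions per block). It covers the encoder only; the decoder enlarges the receptive field further, so the printed numbers are not the value for the full network.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns the receptive field after each layer."""
    r, j = 1, 1  # receptive field and cumulative stride of the input pixel
    out = []
    for k, s in layers:
        r = r + (k - 1) * j
        j = j * s
        out.append(r)
    return out

# VGG-16-style encoder: n_conv 3x3/stride-1 convolutions, then 2x2/stride-2 pooling
encoder = []
for n_conv in (2, 2, 3, 3, 3):
    encoder += [(3, 1)] * n_conv + [(2, 2)]

rf = receptive_field(encoder)
print(rf[-2], rf[-1])  # -> 196 212 (last convolution, last pooling layer)
```

Padding does not change the receptive field; only kernel sizes and strides enter the recurrence, which is why the five pooling stages dominate the growth.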
We use a batch size of 2. Batch normalization layers normalize each input channel across a minibatch.31
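In the standard formulation of batch normalization, each channel is normalized as x̂ = (x − μ_B) / sqrt(σ_B² + ε) and then scaled and shifted as y = γ·x̂ + β, where μ_B and σ_B² are the mean and variance of that channel over the minibatch and all spatial positions, and γ and β are learned per channel. A minimal training-mode sketch in NumPy; ε = 1e-5 is an assumed default, and at inference time running estimates of the statistics would be used instead.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: (N, H, W, C) minibatch; gamma, beta: (C,) learned scale and shift."""
    mu = x.mean(axis=(0, 1, 2), keepdims=True)   # per-channel minibatch mean
    var = x.var(axis=(0, 1, 2), keepdims=True)   # per-channel minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta                  # broadcasts over the channel axis

# e.g., a minibatch of 2 feature maps with 3 channels
x = np.random.default_rng(0).normal(3.0, 2.0, size=(2, 4, 4, 3))
y = batch_norm_train(x, np.ones(3), np.zeros(3))
```

With a batch size of 2, the statistics are still pooled over every spatial position, so each channel estimate uses 2·H·W values rather than only two.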
Finally, in our implementation, convolutional and batch normalization layers are followed by a ReLU and a max pooling layer. A ReLU layer applies a threshold operation to each element, setting any input value less than zero to zero.32,35 All max pooling layers had a pool size of 2 pixels and a stride of 2 pixels. The max pooling layers pass both the maximum responses and their indices from the encoder to the decoder, so that values can be placed at the corresponding locations when upsampling. The model produces pixelwise probability scores for each class label (lumen, calcification, or other) with the same size and resolution as the input image.
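The index-passing mechanism can be illustrated on a single 2-D feature map: 2×2/stride-2 max pooling records where each maximum came from, and the decoder's unpooling writes each value back to exactly that location, leaving zeros elsewhere. A minimal NumPy sketch with hypothetical function names, assuming even height and width; in the network, this sparse map is then densified by the following decoder convolutions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)  # negative inputs are set to zero

def max_pool_with_indices(x):
    """2x2 max pooling, stride 2, on an (H, W) map with even H and W.
    Returns pooled values and the flat index in x of each maximum."""
    H, W = x.shape
    # blocks[i, j, :] holds the 2x2 window with top-left corner (2i, 2j)
    blocks = x.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3).reshape(H // 2, W // 2, 4)
    arg = blocks.argmax(axis=2)               # position 0..3 inside each window
    pooled = blocks.max(axis=2)
    rows = np.arange(H // 2)[:, None] * 2 + arg // 2
    cols = np.arange(W // 2)[None, :] * 2 + arg % 2
    return pooled, rows * W + cols

def max_unpool(pooled, indices, shape):
    """Place each pooled value back at its remembered location; zeros elsewhere."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 8]], dtype=float)
pooled, idx = max_pool_with_indices(relu(x))
up = max_unpool(pooled, idx, x.shape)
```

Because only indices (not full feature maps) are transferred, this form of unpooling is memory-efficient and preserves the location of sharp responses such as calcification edges.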
Segmentation Refinement Strategy
We use a CRF as a postprocessing step to refine the results from the deep learning model. A method for integrating network outputs into a fully connected CRF has been described previously.36 The deep learning model gives a score (a vector of class probabilities) at each pixel. The CRF uses these scores, the pixel intensities, and the corresponding spatial locations to generate crisp class labels. This process yields images with less noise than simply applying a classwise median filter to the image. The goal is to reduce noise by generating, from the network scores, a new labeling that favors assigning the same label to pixels that are spatially close to each other. For IVOCT images, the appearance kernel of the CRF is motivated by the observation that nearby pixels with similar intensity are likely to belong to the same class.
Overall, for each pixel, the CRF takes the class probability estimates and the image pixel intensity as input and outputs the final class label. Similar processing was applied in the network training experiments on both the (x,y) and (r,θ) images. Details of this implementation are described in A.2 in the Supplementary Material.
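To make the roles of the two kernels concrete, here is a deliberately naive fully connected CRF with a Potts label compatibility, solved by mean-field iterations. This is an illustrative sketch, not the implementation used in this work (see the Supplementary Material for that): it builds the full N×N kernel matrix, so it is only usable on tiny images, and the kernel weights and bandwidths (`w_app`, `w_smooth`, `theta_alpha`, `theta_beta`, `theta_gamma`) are hypothetical values chosen for intensities scaled to [0, 1].

```python
import numpy as np

def _softmax(energy):
    # row-wise softmax of an (N, C) array
    e = energy - energy.max(axis=1, keepdims=True)
    p = np.exp(e)
    return p / p.sum(axis=1, keepdims=True)

def dense_crf_meanfield(probs, intensity, n_iter=5, w_app=3.0, w_smooth=1.0,
                        theta_alpha=5.0, theta_beta=0.1, theta_gamma=1.0):
    """Naive fully connected CRF with Potts compatibility (O(N^2) memory).

    probs:     (H, W, C) class probabilities from the network
    intensity: (H, W) image intensities scaled to roughly [0, 1]
    Returns an (H, W) array of refined class labels.
    """
    H, W, C = probs.shape
    N = H * W
    unary = -np.log(np.clip(probs.reshape(N, C), 1e-8, 1.0))
    yy, xx = np.mgrid[0:H, 0:W]
    pos = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    val = intensity.reshape(N).astype(float)
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    d_val = (val[:, None] - val[None, :]) ** 2
    # appearance kernel (close AND similar intensity) + smoothness kernel (close)
    k = (w_app * np.exp(-d_pos / (2 * theta_alpha**2) - d_val / (2 * theta_beta**2))
         + w_smooth * np.exp(-d_pos / (2 * theta_gamma**2)))
    np.fill_diagonal(k, 0.0)  # a pixel sends no message to itself
    Q = _softmax(-unary)
    for _ in range(n_iter):
        msg = k @ Q  # (N, C): kernel-weighted label beliefs of all other pixels
        # Potts: cost of label l = kernel mass of pixels believing in any other label
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        Q = _softmax(-unary - pairwise)
    return Q.argmax(axis=1).reshape(H, W)

# Two-region toy image with one isolated mislabeled pixel at (4, 2)
probs = np.zeros((10, 10, 2))
probs[:, :5] = [0.8, 0.2]
probs[:, 5:] = [0.2, 0.8]
probs[4, 2] = [0.4, 0.6]
img = np.zeros((10, 10))
img[:, :5] = 0.2
img[:, 5:] = 0.8
labels = dense_crf_meanfield(probs, img)
```

In the toy example, the isolated pixel is outvoted by its similar-intensity neighbours, while the intensity edge keeps the two regions from bleeding into each other. Practical implementations replace the explicit N×N matrix with fast approximate Gaussian filtering.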
Computation of Calcification Attributes and Stent Deployment Calcification Score
We followed methods described previously37,38 to calculate the average thickness, average depth, and angle of each calcified plaque automatically. Figure 3 summarizes the method of calcified plaque quantification. First, the centroid of the lumen was determined (marked in Fig. 3). Next, rays were computed that originate at the lumen centroid and extend to the back border of the calcification. The average depth and thickness of the calcification are then computed along these rays.
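A simplified sketch of these measurements, assuming the averages are taken over rays: depth along a ray is the distance from the lumen border to the calcification front border, thickness is the distance from front to back border, and the angle is the angular extent of the rays that intersect calcium. For brevity, the sketch approximates rays by A-lines from the catheter in the (r,θ) image instead of rays from the lumen centroid, assumes one contiguous calcification per frame, and counts thickness inclusively in pixels; the function and parameter names are hypothetical.

```python
import numpy as np

def calcification_attributes(lumen_r, calc_mask, r_spacing_mm, theta_spacing_deg):
    """lumen_r:   (n_theta,) lumen border radius (pixels) on each A-line
    calc_mask: (n_r, n_theta) boolean calcification mask in (r, theta)
    Returns (average depth in mm, average thickness in mm, angle in deg)."""
    cols = np.flatnonzero(calc_mask.any(axis=0))  # A-lines (rays) that hit calcium
    if cols.size == 0:
        return 0.0, 0.0, 0.0
    depths, thicknesses = [], []
    for t in cols:
        rows = np.flatnonzero(calc_mask[:, t])
        front, back = rows[0], rows[-1]
        depths.append((front - lumen_r[t]) * r_spacing_mm)          # lumen -> front border
        thicknesses.append((back - front + 1) * r_spacing_mm)       # front -> back border
    angle = cols.size * theta_spacing_deg  # angular extent of calcified rays
    return float(np.mean(depths)), float(np.mean(thicknesses)), float(angle)

mask = np.zeros((20, 8), dtype=bool)
mask[5:10, 2:5] = True                    # calcium on rows 5..9 of A-lines 2..4
depth, thickness, angle = calcification_attributes(np.full(8, 3), mask, 0.01, 45.0)
```

Measuring from the lumen centroid, as described above, would make the rays perpendicular to the lumen border more often in eccentric vessels; the A-line approximation is exact only when the catheter is centered.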
We used the method described by Fujino et al.13 to determine the stent deployment calcification score. The idea of calcification scoring is to identify lesions that would benefit from plaque modification prior to stent implantation. The method is a cumulative score based on three calcification attributes: length, maximum angle, and maximum thickness. Quoting from their manuscript: “we assigned 1 or 2 points to each of three conditions: 2 points for maximum calcium angle >180°, 1 point for maximum calcium thickness >0.5 mm, and 1 point for calcium length >5 mm.”13 In their study, they found that lesions with a calcification score of 0 to 3 had “adequate stent expansion,” whereas lesions with a score of 4 had “poor stent expansion.”
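The scoring rule is simple enough to state directly as code. A minimal sketch using the thresholds quoted above (strict inequalities, as in the quotation); the function name is hypothetical. The example values are the manual and automated measurements for lesion 5 in Table 3.

```python
def calcification_score(length_mm, max_angle_deg, max_thickness_mm):
    """Stent-deployment calcification score (0 to 4) following Fujino et al."""
    score = 0
    if max_angle_deg > 180:      # 2 points for maximum calcium angle > 180 deg
        score += 2
    if max_thickness_mm > 0.5:   # 1 point for maximum calcium thickness > 0.5 mm
        score += 1
    if length_mm > 5:            # 1 point for calcium length > 5 mm
        score += 1
    return score

print(calcification_score(12.3, 146, 0.88))  # manual, lesion 5 -> 2
print(calcification_score(12.3, 227, 1.0))   # automated, lesion 5 -> 4
```

Because the angle criterion carries two points, a change in the maximum arc alone can move a lesion from the "adequate" range (0 to 3) to a score of 4, which is why the guidewire-shadow case in Table 3 flips category.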
Datasets and Labeling
The dataset included 48 VOIs taken from 34 clinical pullbacks, giving a total of 2640 in vivo images, an average of 55 images per VOI. In vivo IVOCT pullbacks were obtained from the University Hospitals Cleveland Medical Center (UHCMC) imaging library.39 The dataset includes calcified lesions, lipidous lesions, and mixed lesions with both calcified and lipidous regions, sometimes in the same image. VOIs not containing a calcification were also included in the dataset. All pullbacks were imaged prior to any stent implantation.
The in vivo IVOCT images were acquired using a frequency-domain OCT system (ILUMIEN OPTIS; St. Jude Medical, St. Paul, Minnesota). The system comprises a tunable laser light source sweeping from 1250 to 1360 nm. The system was operated at a frame rate of 180 fps and a pullback speed of 36 mm/s, and has an axial resolution around . The pullbacks were analyzed by two expert readers in the Cartesian (x,y) view. Labels from (x,y) images were converted back to the polar (r,θ) system for training on the polar data set.
The two expert readers manually labeled the VOIs using definitions given in the consensus document.37 Labels required consensus between the two readers. Calcifications appear in IVOCT images as signal-poor regions with sharply delineated front and/or back borders. When a calcification was extremely thick and its back border was unclear due to attenuation, the maximum thickness was limited to 1 mm. An additional class, “other,” was used for all pixels that could not be labeled as lumen or calcified plaque.
Network Training and Optimization
Our data were split into training, validation, and test sets, with each VOI kept intact within a single set. A tenfold cross-validation procedure was used to measure classifier performance and its variation across data samples. For each fold, we assigned roughly 80% of the VOIs to training, 10% to validation (used to determine the stopping criterion for training), and 10% to held-out testing. The VOIs were rotated until every VOI had been in the test set once. Mean and standard error of sensitivities over the ten folds were determined. Because the classes are imbalanced in pixel count, we use class weighting as described by Eigen and Fergus.40 Details are described in A.1 in the Supplementary Material.
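Two pieces of this procedure are easy to get subtly wrong: keeping whole VOIs together when forming folds, and computing class weights. The sketch below shows one way to do both. `voi_folds` rotates ten VOI groups so that fold k is the test set, fold k+1 (mod 10) is the validation set, and the rest is training; this particular rotation is an assumption. `median_frequency_weights` implements median frequency balancing, the weighting commonly attributed to Eigen and Fergus: w_c = median(f) / f_c, where f_c is the number of pixels of class c divided by the total number of pixels in images that contain class c. Both function names are hypothetical.

```python
import numpy as np

def voi_folds(n_vois, n_folds=10, seed=0):
    """Yield (train, val, test) arrays of VOI indices; a VOI never spans two sets."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n_vois), n_folds)
    for k in range(n_folds):
        v = (k + 1) % n_folds
        train = np.concatenate([g for i, g in enumerate(groups) if i not in (k, v)])
        yield train, groups[v], groups[k]

def median_frequency_weights(label_maps, n_classes):
    """label_maps: iterable of integer label images. Returns per-class weights."""
    counts = np.zeros(n_classes)   # pixels of each class
    totals = np.zeros(n_classes)   # pixels of images in which the class appears
    for lab in label_maps:
        present = np.bincount(lab.ravel(), minlength=n_classes)
        counts += present
        totals += (present > 0) * lab.size
    freq = counts / totals
    return np.median(freq) / freq   # rare classes (e.g., calcium) get weights > 1

folds = list(voi_folds(48))        # 48 VOIs -> test folds of 4 or 5 VOIs
weights = median_frequency_weights(
    [np.array([[0, 0], [0, 1]]), np.array([[0, 0], [2, 2]])], n_classes=3)
```

Splitting at the VOI level (rather than the image level) prevents nearly identical neighbouring frames from the same lesion from appearing in both training and test sets.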
Training details are as follows. We minimize the categorical cross-entropy loss using the Adam optimizer41 with weight decay. To reduce overfitting, we add a regularization term for the weights to the loss function. Training is stopped when the loss on the validation dataset does not improve by more than 0.01% for 10 consecutive epochs or when the network has been trained for 120 epochs. In practice, the maximum number of epochs was rarely reached.
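The stopping rule can be expressed as a small stateful helper that is called once per epoch with the validation loss. A minimal sketch, assuming "improve by more than 0.01%" means a relative decrease of more than 1e-4 with respect to the best loss so far; the class name and interface are hypothetical.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved by more than min_rel_delta
    (relative to the best loss so far) for `patience` consecutive epochs,
    or when max_epochs is reached."""

    def __init__(self, patience=10, min_rel_delta=1e-4, max_epochs=120):
        self.patience = patience
        self.min_rel_delta = min_rel_delta
        self.max_epochs = max_epochs
        self.best = float("inf")
        self.bad_epochs = 0
        self.epoch = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        self.epoch += 1
        if val_loss < self.best * (1 - self.min_rel_delta):
            self.best = val_loss       # meaningful improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience or self.epoch >= self.max_epochs

stopper = EarlyStopping()
losses = [1.0, 0.9] + [0.9] * 10       # improvement stalls after epoch 2
stops = [stopper.step(l) for l in losses]
```

In the example, training stops at epoch 12, after ten consecutive epochs without a relative improvement larger than 0.01%.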
Image preprocessing and deep learning models were implemented in the MATLAB R2017b environment (MathWorks Inc., Natick, Massachusetts). The network was executed on a Linux-based 64-bit Intel Xeon (x86_64) system with a CUDA-capable NVIDIA™ Tesla P100 16 GB GPU.
We now describe the semantic segmentation results. Figure 4 shows segmentation of the lumen and calcification prior to CRF refinement. Both lumen and calcification regions show good agreement with ground-truth labels. In Table 1, we compare segmentation performance when the same labeled data are arranged in (x,y) and (r,θ). Segmentation on the (r,θ) representation gave superior performance for all classes. Therefore, all figures and remaining analyses use the (r,θ) data representation; we simply map results to (x,y) for easier visual interpretation. We found that refinement of segmentation results using CRF was a beneficial step (Fig. 5). Deep learning segmentation after noise cleaning gave visually more accurate results in all test cases and improved performance (Table 2).
Comparison of segmentation performance when using the same labeled data arranged in (x,y) and (r,θ). Confusion matrices show classifier performance across all 10 folds of the training data. Numbers indicate the mean and standard deviation of segmentation sensitivity (in percent) across all folds. All results are after applying the noise-cleaning strategy. For (x,y) data, mean values ± standard deviation of (sensitivity, specificity, F1 score) for each class are: other: (0.95±0.02, 0.96±0.02, 0.97±0.03), lumen: (0.98±0.02, 0.98±0.01, 0.90±0.01), calcium: (0.82±0.06, 0.97±0.01, 0.42±0.03). For (r,θ) data, mean values ± standard deviation of (sensitivity, specificity, F1 score) for each class are: other: (0.97±0.01, 0.98±0.01, 0.98±0.01), lumen: (0.99±0.01, 0.99±0.006, 0.99±0.008), calcium: (0.85±0.04, 0.99±0.004, 0.73±0.01). Overall, in terms of sensitivity, specificity, and F1 score, the classifier trained on the (r,θ) data performed better. Using the Wilcoxon signed-rank test, we found a statistically significant difference (p<0.01) between the two methods for the calcification F1 score.
[Table 1 confusion matrices for (x,y) and (r,θ) data; columns: Predicted “other”, Predicted “lumen”, Predicted “calcification”]
Sensitivity and Dice coefficient calculated (A) before and (B) after segmentation noise cleaning using CRF for all classes for (r,θ) dataset. Improvement was not only observed visually but also numerically, as Dice coefficient for calcifications was improved from 0.42 to 0.76 with noise cleaning as in Table 2. CRF noise cleaning improved performance, and Wilcoxon signed-rank test suggested a significant difference (p<0.005) for calcifications.
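For reference, the pixelwise metrics reported in Tables 1 and 2 can all be derived from one confusion matrix per fold. A minimal sketch; the class indices (0 = other, 1 = lumen, 2 = calcification) and the function name are assumptions. For a single class treated one-versus-rest, the Dice coefficient equals the F1 score.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Pixelwise confusion matrix, sensitivity, specificity, and Dice (= F1) per class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true.ravel(), y_pred.ravel()), 1)  # rows: truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    dice = 2 * tp / (2 * tp + fp + fn)
    return cm, sens, spec, dice

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm, sens, spec, dice = per_class_metrics(y_true, y_pred, 3)
```

Per-fold values computed this way can then be summarized as mean ± standard deviation across the ten folds. Because true negatives dominate for a small class such as calcium, specificity stays high even when Dice is modest, which is why Dice/F1 is the more informative number for calcifications.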
We determined that lumen segmentation via deep learning was superior to our earlier dynamic programming lumen segmentation approach.38 Using the Wilcoxon signed-rank test, we found statistically significant differences between the two methods. Some clear examples of improvement are shown in Fig. 6. In particular, the dynamic programming approach can fail in the presence of thrombus or very eccentric lumens.
We used the automated semantic segmentations to compute calcification attributes (Fig. 7). We analyzed agreement between automated and manual measurements of lumen area [Fig. 7(a)], calcification angle [Fig. 7(b)], calcification thickness [Fig. 7(c)], and calcification depth [Fig. 7(d)]. We observed excellent agreement in lumen area, except for mismatches in images containing side branches. Calcification angle, thickness, and depth showed good agreement between manual and automated measurements across the range of calcifications observed. Mean values of agreement were (95% CI, to ); (95% CI, to 37 deg); (95% CI, to 0.25 mm); and 0.05 mm (95% CI, to 0.25 mm) for lumen area, calcification angle, calcification thickness, and calcification depth, respectively.
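Agreement of this kind is typically summarized with a Bland–Altman analysis: the mean of the paired differences (bias) and the 95% limits of agreement, bias ± 1.96 standard deviations of the differences. A minimal sketch, assuming differences are taken as automated minus manual and that the intervals reported above correspond to these limits; the function name is hypothetical.

```python
import numpy as np

def bland_altman(manual, automated):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    manual = np.asarray(manual, dtype=float)
    automated = np.asarray(automated, dtype=float)
    diff = automated - manual          # sign convention: automated minus manual
    bias = diff.mean()
    sd = diff.std(ddof=1)              # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

bias, lo, hi = bland_altman([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
```

Unlike a correlation coefficient, the limits of agreement are in the units of the measurement (e.g., mm for thickness), so they can be compared directly with clinically relevant thresholds such as the 0.5 mm thickness criterion of the calcification score.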
Finally, we used automated semantic segmentation to compute the stent deployment calcification score as described in Sec. 2.4 (Table 3). We assessed five representative lesions and found identical manual and automated scores in four of the five cases. The case with the least agreement between manual and automated assessment is shown in Fig. 8. This case is challenging because the calcification is separated by the guidewire shadow: manual analysts defined this lesion as two calcifications, whereas the automated results showed it as one.
IVOCT-based calcification scoring for representative lesions. We applied the calcification scoring system developed by Fujino et al.13 to five held-out lesions and compared manual and automated measurements. Scores are based on lesion length, maximum thickness, and maximum angle. The score is the cumulative sum of the following: two points for maximum angle >180 deg, one point for maximum thickness >0.5 mm, and one point for length >5 mm. The idea of calcium scoring is to identify lesions that would benefit from plaque modification prior to stent implantation. Lesions with a calcium score of 0 to 3 had adequate stent expansion, whereas lesions with a score of 4 had poor stent expansion. Each attribute is shown as attribute value (score). The calcification scores are identical between manual and automated results for the first four lesions. Lesion 5 is a challenging case and is shown in Fig. 8.
| Lesion | Name | Frames | Length (mm) | Manual: maximum calcium angle (deg) | Manual: maximum calcium thickness (mm) | Manual: score | Automated: maximum calcium angle (deg) | Automated: maximum calcium thickness (mm) | Automated: score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ca1 | 19 | 1.9 (0) | 45 (0) | 0.52 (1) | 1 | 89 (0) | 0.91 (1) | 1 |
| 2 | Ca2 | 20 | 2 (0) | 68 (0) | 0.67 (1) | 1 | 124 (0) | 0.91 (1) | 1 |
| 3 | Ca3 | 73 | 7.3 (1) | 330 (2) | 0.60 (1) | 4 | 328 (2) | 0.82 (1) | 4 |
| 4 | Ca4 | 32 | 3.2 (0) | 132 (0) | 1.1 (1) | 1 | 123 (0) | 1.4 (1) | 1 |
| 5 | Ca5 | 123 | 12.3 (1) | 146 (0) | 0.88 (1) | 2 | 227 (2) | 1.0 (1) | 4 |
We developed an automated method for calcification analysis, comprising semantic segmentation using deep learning, calculation of calcification attributes, and calculation of a previously developed stent-deployment calcification score. We used SegNet, with transfer learning from the pretrained VGG-16 weights and with a large receptive field that allows substantial contextual information to be used in determining areas containing calcifications, and trained/tested it on 48 VOIs (2640 IVOCT images). The dataset contained a variety of lesion types, including calcified, lipidous, and mixed segments with both calcified and lipidous regions, as well as segments devoid of these characteristics. A variety of disease states is key for any robust learning system. On held-out lesions not used in any optimization, we automatically computed the stent-deployment calcification score and obtained very good agreement with manual determinations. This suggests that our methods (with optional manual corrections, as argued below) could predict stent treatment outcomes from prestent IVOCT images and could help determine which lesions would benefit from prestent lesion preparation (e.g., atherectomy).
When we compared segmentation performance using the (x,y) and (r,θ) representations of the data, we found that (r,θ) gave better sensitivity, specificity, and F1 score across all classes. There are multiple potential reasons. First, data are originally acquired in the (r,θ) domain. To create the (x,y) representation, data must be geometrically transformed, which requires increasing interpolation with distance from the catheter center. This interpolation could negatively affect the performance of local kernels. Second, the (r,θ) data representation was amenable to an elegant data augmentation scheme, as described in Sec. 2.1, allowing us to create heavily augmented data. Third, we were able to process the (r,θ) images at full resolution but had to resize the (x,y) images in order to train the SegNet model. This could have affected the ability of the CNN to recognize features such as the sharp edges of calcifications. Fourth, in the (r,θ) domain, calcified lesions have a single “orientation,” with leading and trailing edges roughly parallel to the lumen. In the (x,y) representation, lesions appear at all possible orientations in the image array. Even though we augmented data by rotating the (x,y) images, the consistent appearance of lesions in (r,θ) may have comparatively enhanced learning.
We found it beneficial to implement CRF for refinement of initial segmentation results. We applied CRF to the vector of class probabilities and the input image intensity, at each pixel location. This enhanced the final segmentation and improved the performance of the downstream analysis. As shown in Fig. 5, CRF smooths the segmentation results and prevents isolated spots of calcification from appearing in our results. This causes a visual improvement in our results, and this improvement is reflected numerically by the increase in sensitivities and Dice coefficient following CRF implementation.
Our approach has advantages for segmenting the lumen as compared to previous methods such as dynamic programming38 (Fig. 6). Image artifacts (e.g., thrombus or improper blood clearing during image acquisition), as well as very eccentric lumens, create challenges for lumen segmentation algorithms that rely on edges, such as our previous dynamic programming approach. Our deep learning approach takes contextual area information into account, which reduces the impact of these artifacts on determining the lumen border.
We were able to quantify calcification attributes based on the automated segmentations, including lumen area, calcification arc, thickness, and depth (Fig. 7). For lumen area, the automated measurements agreed excellently with manual assessments (high precision and low bias). Most errors were in regions with side branches, which are ambiguous for analysts to label. Automated measurements of calcification arc also had strong agreement with manual assessments. Segmentation errors mostly involved calcification deposits with small arc angles, which have less impact on clinical decision-making. Agreement with manual analysis was high for large arc angles, which is encouraging, as these large calcifications are more likely candidates for plaque modification prior to stenting. Calcification thickness measurements had good agreement between manual and automated assessments, although our algorithm tended to overestimate calcification thickness. Our algorithm tends to agree with manual determination of the calcification front border but agrees less well on the back border. This is due to the limited depth penetration of the IVOCT signal, which makes the calcification back border difficult to determine, even for manual assessment. Finally, calcification depth showed a strong correlation between automated and manual measurements. We observe a trend that errors increase with larger depths. One reason is that calcification depth depends on both the lumen and calcification segmentations, so errors in lumen segmentation (observed in larger lumens) could propagate to the calcification depth measurement.
Ultimately, we want to use calcification segmentations to inform cardiologists about the need for calcification modification strategies (e.g., atherectomy or IVL with Shockwave™). Visualization of segmented calcifications is one approach; another is calculation of the stent-deployment calcification score. Automatically obtained scores were identical to manually obtained ones in four of five cases. The score identifies lesions that would benefit from plaque modification prior to stent implantation. It is a cumulative score based on calcification attributes (length, maximum angle, and maximum thickness). Lesions with a calcification score of 0 to 3 had “adequate stent expansion,” whereas lesions with a score of 4 had “poor stent expansion.” The case with disagreement is shown in Fig. 8. This case is challenging because the calcification is separated by the IVOCT guidewire shadow. Analysts chose not to label this region, but our automated method bridged the guidewire region, labeling it as one continuous calcification. It is highly likely that calcification is present behind the guidewire in this lesion, but we could only be certain if histology were acquired from this sample.42 Based on the scoring system presented in Table 3, if this region were calcified, lesion preparation would be necessary for treatment. Thus, interpreting what lies behind the guidewire would alter clinical decision-making. Although the automated stent deployment calcification score is promising, a clinical implementation would likely need to allow operator editing of calcifications, particularly at locations important to the score (e.g., the image with the maximum arc angle). Using current GPU hardware (NVIDIA GTX 1080 Ti), calcification semantic segmentation can be performed in under 1 s per frame. This suggests that use during the procedure would be possible in the clinic, especially if the operator identified VOIs for analysis.
Our study could be extended in several ways. Developing our segmentation method required manual labeling of thousands of IVOCT images. Some of our labels could be wrong (e.g., Fig. 8), and analysts might change their minds after viewing automated results. Thus, we could implement an active learning scheme in which analysts make a second pass through the dataset and possibly modify labels after viewing automated results. In this study, 48 VOIs from 34 pullbacks were used; more cases could improve generalizability. In addition, it would be interesting to include labeled lipidous regions. Finally, incorporating 3-D information might help with some determinations.
Coronary calcifications are a major determinant of the success of coronary stenting. We developed an automatic method for semantic segmentation of calcifications in IVOCT images using deep learning. Results can be applied to determine calcification attributes, and for computation of an IVOCT-based calcification score, which can help predict stent treatment outcome for target lesions.
Dr. Bezerra has received consulting fees from Abbott Vascular. Other authors report no relevant conflicts of interest with this manuscript.
This project was supported by the National Heart, Lung, and Blood Institute through U.S. National Institutes of Health (NIH) Grants R21HL108263, R01HL114406, and R01HL143484, by NIH construction Grant (C06 RR12463), and by the Choose Ohio First Scholarship. These grants were attained via collaboration between Case Western Reserve University and University Hospitals of Cleveland. The content of this report is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work made use of the High-Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. The veracity guarantor, David Prabhu, affirms to the best of his knowledge that all aspects of this paper are accurate.