Purpose: Quantitative lung measures derived from computed tomography (CT) have been demonstrated to improve prognostication in coronavirus disease 2019 (COVID-19) patients but are not part of clinical routine because the required manual segmentation of lung lesions is prohibitively time consuming. We aim to automatically segment ground-glass opacities and high opacities (comprising consolidation and pleural effusion).
Approach: We propose a new fully automated deep-learning framework for fast multi-class segmentation of lung lesions in COVID-19 pneumonia from both contrast and non-contrast CT images using convolutional long short-term memory (ConvLSTM) networks. Utilizing the expert annotations, model training was performed using five-fold cross-validation to segment COVID-19 lesions. The performance of the method was evaluated on CT datasets from 197 patients with a positive reverse transcription polymerase chain reaction test result for SARS-CoV-2, 68 unseen test cases, and 695 independent controls.
Results: Strong agreement between expert manual and automatic segmentation was obtained for lung lesions with a Dice score of 0.89 ± 0.07, with excellent correlations of 0.93 and 0.98 for ground-glass opacity (GGO) and high opacity volumes, respectively. In the external testing set of 68 patients, we observed a Dice score of 0.89 ± 0.06 as well as excellent correlations of 0.99 and 0.98 for GGO and high opacity volumes, respectively. Computations for a CT scan comprising 120 slices were performed in under 3 s on a computer equipped with an NVIDIA TITAN RTX GPU. Diagnostically, the automated quantification of the lung burden % discriminated COVID-19 patients from controls with an area under the receiver operating characteristic curve of 0.96 (0.95–0.98).
Conclusions: Our method allows for the rapid fully automated quantitative measurement of the pneumonia burden from CT, which can be used to rapidly assess the severity of COVID-19 pneumonia on chest CT.
Coronavirus disease 2019 (COVID-19) is a global pandemic and public health crisis of catastrophic proportions, with over 437 million confirmed cases worldwide as of March 2, 2022.1 Although vaccines are now available, they are not 100% effective; new strains are emerging, and immunization coverage varies significantly between world regions due to socio-economic differences. It is likely that vaccine boosters will be necessary, and continuous monitoring for the disease will be needed. Although the diagnosis of COVID-19 relies on a reverse transcription polymerase chain reaction (RT-PCR) test in respiratory tract specimens, computed tomography (CT) remains the central modality in disease staging.2–5 Specific CT lung features include peripheral and bilateral ground-glass opacities (GGOs), with round and other specific morphology as well as peripheral consolidations, and increasing extension of such opacities has been associated with the risk of critical illness.6–8 Although conventional visual scoring of the COVID-19 pneumonia extent correlates with clinical disease severity, it requires proficiency in cardiothoracic imaging and ignores lesion features, such as volumes, density, or inhomogeneity.9,10 On the other hand, CT-derived quantitative lung measures are not part of the clinical routine, despite being demonstrated to improve prognostication in COVID-19 patients, due to prohibitively time-consuming manual segmentation of the lung lesions required for computation.11–13 Chest CT is currently indicated in COVID-19 patients with moderate or severe respiratory symptoms and high pretest probability of infection, or any other clinical scenario requiring rapid triage.
Importantly, over 15 million chest CT scans (including cardiac CT) are performed a year in the United States for indications not related to COVID-19.14 Additionally, every thoracic positron emission tomography (PET)/CT and single photon emission computed tomography (SPECT)/CT scan (including myocardial perfusion imaging, MPI) will include computed tomography attenuation correction (CTAC) covering the lung area. Parenchymal opacification associated with COVID-19 can potentially be seen on these exams. Critically, in the coming months and years, it is likely that COVID-19 changes may often be an incidental finding on chest CT performed for other diseases in asymptomatic COVID-19 patients. These incidental findings may also appear on CTAC maps often acquired in conjunction with SPECT and PET MPI. Indeed, the first such incidental findings on PET/CT were reported in the Journal of Nuclear Medicine (April 2020) by Albano et al.15 in Italy, followed by others.16–19 It is worth noting that these CTAC scans are not routinely reviewed for other abnormalities and are often viewed with window and level settings not set for review of lung abnormalities. Thus, a rapid automated alert system for COVID-19 related abnormalities would be of great benefit in such situations.
Deep learning, a class of artificial intelligence (AI), has been shown to be very effective for automated object detection and image classification from a wide range of data. Myriad AI systems have been introduced to aid radiologists in the detection of lung involvement in COVID-19, with several presenting the potential to improve the performance of junior radiologists to the senior level.12,20 Bai et al.21 developed a classification network to differentiate between COVID-19 pneumonia and other pneumonia and achieved good performance in diagnosing the disease, with an area under the receiver operating characteristic curve (AUROC) of 0.95. They provided a heatmap in an effort to explain the model predictions, but it would be of great importance in disease staging and prognosis if the model could pinpoint the lesions accurately. This shortcoming was addressed by Zhang et al.,20 who developed a system that can diagnose the disease, segment the lungs and lesions into several classes, and be used to evaluate drug treatment effects. They developed a two-stage segmentation network for segmenting lesions in lungs from CT slices, experimenting with various segmentation frameworks, and adopted DeepLabv3 as the backbone for its better segmentation performance. The model was evaluated using the mean Dice coefficient and pixel accuracy in a five-fold cross-validation test, achieving a mean Dice score of 0.587.
On the other hand, Fan et al.22 developed a novel COVID-19 lung infection segmentation network that combines high-level features using a parallel partial decoder to generate a global map as initial guidance for further steps. To establish a relationship between lesion boundaries, they used their novel implicit recurrent reverse attention modules. The final training loss comprised weighted binary cross-entropy applied at different stages of the network and weighted intersection over union loss. The authors went further and addressed the shortage of expert annotations by modifying their training strategy to accommodate semi-supervised learning. Although this model does not perform multi-class segmentation by itself, it can separate the lesions into two classes using UNet and their model output as guidance for segmentation, achieving a mean Dice score of 0.541.
Similarly, Chaganti et al.11 also developed a system for binary segmentation of CT abnormalities related to COVID-19. They trained two different models: one for segmenting lung lobes and another for lesions. The lung segmentation model was trained using a deep image-to-image network, and the lesion segmentation model was trained using a UNet-like architecture. The lesion segmentation model performs binary segmentation, that is, all of the lesions (GGOs and consolidations) were treated as one class during training and later separated into two classes by thresholding the voxels at Hounsfield units (HU) during inference. Finally, they introduced two measures for evaluating the severity of the disease: percentage of opacity and percentage of high opacity. The overall performance was evaluated using Pearson correlation between the severity measures.
Gao et al.23 developed a dual-branch combination network for joint binary segmentation and classification of COVID-19 using CT images. They proposed a lesion attention module to improve the sensitivity of the model in detecting small lesions. The lesion attention module is also used to interpret model predictions for the assessment of classification results. They achieved a Dice score of 0.835 on an internal test set in segmenting the lesions and an AUROC of 0.9771 in classifying COVID-19 patients.
The work presented in this paper builds on previous research to explore quantitative prognostication and disease staging by segmenting the COVID-19 lesions into multiple classes. Earlier work focused on segmentation using one slice in the CT at a time, whereas we focus on benefiting from additional information about the anatomy and the lesions in several adjacent slices. However, most three-dimensional (3D) medical segmentation networks consume a lot of memory in storing the intermediate features for skip connections,24,25 making them difficult to implement in low-end clinical systems. To this end, we adopt the state-of-the-art segmentation network by Tao et al.26 and replace the attention from multi-scale input with attention from adjacent slices using long short-term memory (LSTM) recurrent networks,27 which are well known for their ability to process long data sequences. We do so to imitate a radiologist who reviews adjacent slices of a CT scan and aggregates lesion information while making manual annotations. We employ a specific variant of the LSTM network known as the convolutional long short-term memory (ConvLSTM) network,28 which operates directly on images, facilitating rapid segmentation and accurate 3D quantification of the disease involvement of lung lesions in COVID-19 pneumonia from both contrast and non-contrast CT images. ConvLSTM networks can preserve relevant features while simultaneously dismissing irrelevant ones through a feedback loop, which translates into a memory-sparing strategy for the holistic analysis of the images.
The cohort used in this study comprised 264 patients, who underwent chest CT and had a positive RT-PCR test result for SARS-CoV-2. A total of 197 patients were included in the training cohort (Ncov = 197), and 68 were used for external validation (Next = 68). Datasets for 187 out of 197 patients from the training cohort were collected from the prospective, international, multicenter registry involving centers from North America [Cedars-Sinai Medical Center, Los Angeles ()], Europe [Centro Cardiologico Monzino (), and Istituto Auxologico Italiano (); both Milan, Italy], Australia [Monash Medical Centre, Victoria, Australia ()], and Asia [Showa Medical University, Tokyo, Japan ()], where either non-contrast () or contrast-enhanced () chest CT was performed to aid in the triage of patients with a high clinical suspicion for COVID-19, in the setting of a pending RT-PCR test or comorbidities associated with severe illness from COVID-19. The population characteristics are given in Table 1. Datasets for the remaining 10 COVID-19 patients were derived from an open-access repository of non-contrast CT images; therefore, no clinical data were provided for this cohort. Out of 31,560 transverse slices available, 15,588 had lesions. The external testing cohort comprised 68 non-contrast CT scans of COVID-19 patients: about 50 from an open-access repository29 and 18 additional ones from Italy (Centro Cardiologico Monzino). There were 12,102 transverse slices available in this cohort, and 6,503 had lesions (Table 2). All data were deidentified prior to being enrolled in this study. The CT images from each patient and the clinical database were fully anonymized and transferred to one coordinating center for core lab analysis. The study was conducted with the approval of local institutional review boards (Cedars-Sinai Medical Center IRB# study 617), and written informed consent was waived for fully anonymized data analysis.
Patient baseline characteristics and imaging data in the training cohort.
Note: Data in the table are presented as n (%) or mean ± SD.
Ground Truth Generation
Images were analyzed at the Cedars-Sinai Medical Center core laboratory by two physicians (K.G. and A.L.) with 3 and 8 years of experience in chest CT, respectively, and who were blinded to clinical data. A standard lung window (width of 1500 HU and level of ) was used. Lung abnormalities were segmented using semi-automated research software (FusionQuant Lung v1.0, Cedars-Sinai Medical Center, Los Angeles, California). These included GGO, consolidation, or pleural effusion according to the Fleischner Society lexicon. Consolidation and pleural effusion were collectively segmented as high-opacity to facilitate the training of the model due to a limited number of slices involving these lesions. Chronic lung abnormalities, such as emphysema or fibrosis, were excluded from segmentation, based on correlation with previous imaging and/or a consensus reading. GGO was defined as hazy opacities that did not obscure the underlying bronchial structures or pulmonary vessels; consolidation as opacification obscuring the underlying bronchial structures or pulmonary vessels; and pleural effusion as a fluid collection in the pleural cavity. The total pneumonia volume was calculated by summing the volumes of the GGO and consolidation components. The total pneumonia burden was calculated as (total pneumonia volume/total lung volume) × 100%. Difficult cases of quantitative analysis were resolved by consensus.
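The burden definition above can be illustrated with a short sketch. It assumes a label volume in which 0 is background, 1 is GGO, and 2 is high opacity, a separate binary lung mask that includes the lesions, and a voxel spacing in millimetres. These encodings, the function name, and the choice to sum both lesion labels into the total pneumonia volume are illustrative assumptions, not the FusionQuant implementation.

```python
import numpy as np


def pneumonia_burden(labels, lung_mask, spacing_mm):
    """Return (ggo_ml, high_opacity_ml, burden_percent).

    labels: int array, 0 = background, 1 = GGO, 2 = high opacity (assumed encoding).
    lung_mask: boolean array of the same shape marking the whole lung.
    spacing_mm: voxel spacing (z, y, x) in millimetres.
    """
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # 1 ml = 1000 mm^3
    ggo_ml = np.count_nonzero(labels == 1) * voxel_ml
    high_ml = np.count_nonzero(labels == 2) * voxel_ml
    lung_ml = np.count_nonzero(lung_mask) * voxel_ml
    if lung_ml == 0:
        raise ValueError("lung mask is empty")
    # Burden = (total pneumonia volume / total lung volume) x 100%.
    burden = 100.0 * (ggo_ml + high_ml) / lung_ml
    return ggo_ml, high_ml, burden


# Toy example: 1000 lung voxels of 1 mm^3, 100 GGO and 50 high-opacity voxels.
flat = np.zeros(1000, dtype=np.int64)
flat[:100] = 1
flat[100:150] = 2
labels = flat.reshape(10, 10, 10)
lung_mask = np.ones((10, 10, 10), dtype=bool)
ggo_ml, high_ml, burden = pneumonia_burden(labels, lung_mask, (1.0, 1.0, 1.0))
print(f"GGO {ggo_ml:.2f} ml, high opacity {high_ml:.2f} ml, burden {burden:.1f}%")
```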
Additionally, to assess the diagnostic performance of the methods trained and tested with controls (without any lung abnormalities), we utilized a set of cases from the National Lung Screening Trial (NLST)30 with normal lung scans. The population characteristics are described in Table 3.
NLST controls baseline characteristics.
Note: Data in the table are presented as n (%) or mean ± SD.
The objective is to learn the function to classify each CT voxel into one of the following three classes: GGOs, high opacities, and background. This act of differentiating regions based on their semantic properties is called semantic segmentation. In Sec. 4.1, we introduce the data preprocessing technique used in our method. In Sec. 4.2, we explain in detail the functioning of each block of our network architecture. Finally, in Secs. 4.3 and 4.4, we introduce the loss functions31 and optimization techniques used in our method.
CT scans from different scanners or with different reconstruction parameters may have different appearance (as seen in column 1 of Fig. 1) and contain voxel values (HU) ranging between to for a 12-bit scan. Therefore, the data need to be homogenized before training or inference. The input stack of CT images is first cropped to the body region of the middle-most image and resized to . Because we have a very small dataset to train on, we randomly augment the data with rotation of , translation of up to 10 pixels in the x- and y-directions, and scaling of [0.9, 1.05] times. Finally, we normalize the data by clipping the Hounsfield units between to (expert reader’s lung window), followed by a voxel intensity scaling technique called standardization or z-score normalization.
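A minimal sketch of the clipping and z-score step described above is given here. The window bounds are passed in as parameters because the exact values are not restated; the bounds used in the example are placeholders, and computing the mean and standard deviation per input stack (rather than over the whole dataset) is an assumption.

```python
import numpy as np


def normalize_ct(stack_hu, hu_min, hu_max, eps=1e-8):
    """Clip a CT stack to [hu_min, hu_max] and apply z-score normalization.

    stack_hu: array of Hounsfield units, shape (slices, height, width).
    hu_min, hu_max: lung-window bounds (hypothetical parameters).
    """
    clipped = np.clip(stack_hu.astype(np.float32), hu_min, hu_max)
    # Standardization: zero mean and unit variance over the stack (assumed scope).
    return (clipped - clipped.mean()) / (clipped.std() + eps)


# Example with random 12-bit-like values and placeholder window bounds.
rng = np.random.default_rng(0)
stack = rng.integers(-1024, 3072, size=(3, 64, 64)).astype(np.int16)
normalized = normalize_ct(stack, hu_min=-1000.0, hu_max=500.0)
print(normalized.shape, float(normalized.mean()), float(normalized.std()))
```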
To crop the scan to the body region, we threshold the scan at and create a binary mask, apply a series of morphological operations (closing, erosion, dilation, etc.), and obtain a bounding box around the largest object in the thresholded scan. Applying this bounding box to the original scan yields the body-cropped input scan (shown in column 2 of Fig. 1).
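The cropping procedure can be sketched with SciPy's `ndimage` module as follows. The threshold is a hypothetical parameter, the number of morphological iterations is arbitrary, and the function names are illustrative; this is one plausible reading of the steps above, not the authors' code.

```python
import numpy as np
from scipy import ndimage


def body_bounding_box(slice_hu, threshold_hu):
    """Return (row_slice, col_slice) around the largest object above threshold."""
    mask = slice_hu > threshold_hu
    mask = ndimage.binary_closing(mask, iterations=2)  # fill small gaps
    mask = ndimage.binary_opening(mask, iterations=2)  # erosion, then dilation
    labeled, n_objects = ndimage.label(mask)
    if n_objects == 0:
        raise ValueError("no object above threshold")
    sizes = ndimage.sum(mask, labeled, index=np.arange(1, n_objects + 1))
    largest = int(np.argmax(sizes))  # 0-based index of the largest component
    return ndimage.find_objects(labeled)[largest]


def crop_to_body(stack_hu, threshold_hu):
    """Crop every slice to the body box found on the middle-most slice."""
    middle = stack_hu.shape[0] // 2
    rows, cols = body_bounding_box(stack_hu[middle], threshold_hu)
    return stack_hu[:, rows, cols]


# Toy scan: air everywhere, a rectangular "body", and one isolated bright pixel.
scan = np.full((5, 64, 64), -1000, dtype=np.int16)
scan[:, 10:50, 20:50] = 40
scan[:, 60, 5] = 40
cropped = crop_to_body(scan, threshold_hu=-500)
print(cropped.shape)
```

Using the middle-most slice to define one box for the whole stack keeps all slices aligned, which matters when adjacent slices are later processed as a sequence.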
The network architecture, shown in Fig. 2, is inspired by the hierarchical multi-scale attention for semantic segmentation26 with major changes in the attention branch. Instead of the attention branch looking at the input at various different scales as in Ref. 26, we formulate the attention branch to focus on adjacent slices to aggregate information about the lesions/anatomy from the neighboring slices using a ConvLSTM in the attention branch of the network to improve lesion recognition.
All of the larger and easy-to-classify lesions are segmented by this branch of the network. It consists of two trainable blocks: the dense block , also referred to as Trunk elsewhere in the paper, and the segmentation block . Throughout this paper, the subscript of represents the branch name, and the superscript represents the block in that branch.
This is the feature extraction block that extracts 256 feature maps of size from input . It is made up of the first dense block of DenseNet121.32 The reason for choosing a DenseNet for feature extraction is its ability to strengthen feature propagation and mitigate the vanishing-gradient problem, as well as its reduced number of trainable parameters.
Segmentation block 1
This block is downstream of the dense block. It uses the 256 up-scaled feature maps from as input and classifies each voxel into one of three classes. This block is composed of three convolutional sub-blocks: the first two are made up of convolutional layers followed by a batch normalization layer and a leaky ReLU layer, and the final sub-block is just a convolutional layer (see segmentation block in Fig. 2).
All of the errors made by the main branch in ambiguous and difficult to segment parts of the lesions are corrected by the attention branch using information from adjacent slices (shown in Fig. 3). The attention branch comprises a sequential processor , a segmentation block , and a self-attention block .
We used ConvLSTM33 for processing sequential data. The ConvLSTM block imitates a radiologist who reviews adjacent slices of a CT scan and aggregates lesion information from them to detect lung abnormalities and ensure appropriate annotations.
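To make the idea of a recurrent pass over adjacent slices concrete, the following is a minimal PyTorch sketch of a ConvLSTM cell. It is a commonly used simplified variant without peephole connections; the channel counts, kernel size, and the helper `run_over_slices` are illustrative assumptions and do not reproduce the configuration used in this work.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Simplified ConvLSTM cell (no peephole connections)."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # A single convolution produces the four gates at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state=None):
        if state is None:
            batch, _, height, width = x.shape
            zeros = x.new_zeros(batch, self.hidden_channels, height, width)
            state = (zeros, zeros)
        h_prev, c_prev = state
        stacked = self.gates(torch.cat([x, h_prev], dim=1))
        i, f, o, g = torch.chunk(stacked, 4, dim=1)
        # Forget part of the old memory and write new candidate features.
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


def run_over_slices(cell, slices):
    """slices: (batch, n_slices, channels, H, W); returns the last hidden state."""
    state = None
    for t in range(slices.shape[1]):
        state = cell(slices[:, t], state)
    return state[0]


torch.manual_seed(0)
cell = ConvLSTMCell(in_channels=8, hidden_channels=16)
features = torch.randn(2, 3, 8, 32, 32)  # 3 adjacent slices per sample
hidden = run_over_slices(cell, features)
print(hidden.shape)
```

The forget and input gates are what let the cell keep relevant features from earlier slices while discarding irrelevant ones, which is the behaviour the text attributes to the feedback loop.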
Segmentation block 2
This block is structurally identical to segmentation block 1, except for the input layer. It takes in the main segmentation slice concatenated with ConvLSTM output as the input.
As in Ref. 26, we also adopt an attention mechanism to combine multi-branch outputs ( and ) together at a pixel level. The attention block is identical to the segmentation block in structure with the only difference being that the final convolutional layer is followed by a sigmoid layer. This block takes in the output of the ConvLSTM block, as shown in Eq. (5), as input and learns to pixel-wise weight () the outputs from the two branches to produce the final prediction [Eq. (7)].
The final prediction is given by the following equation in which the argmax is taken over the channel dimension:
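As a hedged illustration of this kind of pixel-wise fusion, the sketch below assumes a convex combination in which the learned weight map scales the attention-branch output and its complement scales the main-branch output, followed by an argmax over the channel dimension. Which branch receives the weight, and all names used here, are assumptions made for illustration only.

```python
import numpy as np


def fuse_branches(main_scores, attention_scores, alpha):
    """Blend two (classes, H, W) score maps with a per-pixel weight map.

    alpha: (H, W) array in [0, 1], e.g. the output of a sigmoid layer.
    Returns the fused scores and the per-pixel class labels.
    """
    alpha = np.asarray(alpha, dtype=np.float64)
    if alpha.min() < 0.0 or alpha.max() > 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = alpha[None] * attention_scores + (1.0 - alpha[None]) * main_scores
    # The argmax is taken over the channel (class) dimension.
    return fused, np.argmax(fused, axis=0)


# Two pixels, three classes (background, GGO, high opacity).
main = np.array([[[0.7, 0.6]], [[0.2, 0.3]], [[0.1, 0.1]]])
attention = np.array([[[0.1, 0.2]], [[0.8, 0.1]], [[0.1, 0.7]]])
alpha = np.array([[0.0, 1.0]])  # pixel 0 trusts main, pixel 1 trusts attention
fused, labels = fuse_branches(main, attention, alpha)
print(labels)
```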
In our training, we utilize a combination of focal loss31 and Visual Geometry Group (VGG) loss.34 The focal loss compensates for the imbalance between background, GGO, and high opacity classes. The importance for each of the classes in focal loss was set to [0.1, 1.0, 1.0], respectively, and the focusing parameter was set to 3. This focusing parameter allows the model to penalize the hard-to-classify samples more than the easy ones. To compute the VGG loss, we tap into the low-level features of the VGG network, which represent edge information, for better segmentation output. These losses are weighted equally during training.
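The focal-loss term can be sketched in NumPy as follows, using the class weights [0.1, 1.0, 1.0] and focusing parameter 3 stated above. The flattened `(classes, voxels)` input layout and the mean reduction are assumptions, and the VGG loss is not included in this sketch.

```python
import numpy as np


def focal_loss(logits, target, class_weights=(0.1, 1.0, 1.0), gamma=3.0):
    """Multi-class focal loss.

    logits: (classes, n_voxels) raw scores; target: (n_voxels,) integer labels
    with 0 = background, 1 = GGO, 2 = high opacity (assumed encoding).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    voxel_index = np.arange(target.shape[0])
    log_pt = log_probs[target, voxel_index]  # log-probability of the true class
    pt = np.exp(log_pt)
    weights = np.asarray(class_weights, dtype=np.float64)[target]
    # (1 - pt)^gamma shrinks the contribution of well-classified voxels.
    return float(np.mean(-weights * (1.0 - pt) ** gamma * log_pt))


# A confidently correct GGO voxel contributes far less than a misclassified one.
easy = focal_loss(np.array([[0.0], [5.0], [0.0]]), np.array([1]))
hard = focal_loss(np.array([[5.0], [0.0], [0.0]]), np.array([1]))
print(easy, hard)
```

Setting `gamma=0` with unit class weights reduces this to ordinary cross-entropy, which is a convenient sanity check.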
The model parameters were optimized using an Adam (adaptive moment estimation) optimizer35 with initial learning rate of , weight decay of , and training batch size of 32. All of the model parameters were initialized using Xavier initialization,36 except for the dense block, which was initialized using the weights pre-trained on ImageNet.37 To avoid over-fitting while fully training the model, we use a popular learning rate scheduler called ReduceLROnPlateau (Fig. 4).38 In this technique, a metric (validation loss, accuracy, etc.) is continuously monitored throughout the training. If no improvement is seen in the tracked metric for “patience” number of epochs/iterations, the current learning rate is then reduced by the given “factor.” The training continues as usual until the learning rate is reduced beyond a certain minimum (). As soon as the learning rate hits this minimum, the training is stopped, saving the model at the last best validation metric step. In our experiment, the parameters factor and patience were set to 0.1 and 5, respectively.
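The schedule and stopping rule can be illustrated with a small framework-free sketch. It uses factor 0.1 and patience 5 as stated above; the initial and minimum learning rates in the example are placeholders, and stopping once the rate drops strictly below the minimum is one possible reading of the rule. In PyTorch, the reduction itself is provided by `torch.optim.lr_scheduler.ReduceLROnPlateau`; the early stop is an outer-loop decision.

```python
def plateau_schedule(val_losses, initial_lr, min_lr, factor=0.1, patience=5):
    """Simulate reduce-on-plateau with a stop once lr falls below min_lr.

    Returns (best_epoch, final_lr, last_epoch).
    """
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = None
    bad_epochs = 0
    epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch as the checkpoint to keep.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
            if lr < min_lr:
                break  # stop; the model saved at best_epoch is the result
    return best_epoch, lr, epoch


# Validation loss improves for two epochs and then stalls.
losses = [1.0, 0.9] + [0.95] * 20
best_epoch, final_lr, last_epoch = plateau_schedule(
    losses, initial_lr=1e-3, min_lr=5e-6)  # placeholder learning rates
print(best_epoch, final_lr, last_epoch)
```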
We trained the model using the PyTorch (v1.7.1) deep-learning framework and incorporated it into research CT lung analysis software (Deep Lung) written in C++. The training was performed on an NVIDIA TITAN RTX 24GB GPU with a tenth-generation Intel Core i9 CPU. Deep Lung can be used with or without GPU acceleration.
The primary endpoint of this study was the performance of the deep-learning method compared with the evaluation by the expert reader. The model is extensively evaluated using the Dice similarity coefficient for structural similarities. The reported Dice score is the mean of per-patient Dice scores computed over all slices in the scan. We also show the quantitative performance on volumes using the Bland–Altman plot and coefficient of determination (Pearson correlation). To perform a robust non-biased evaluation of the framework, five-fold cross-validation was used, with five independently trained identical models and five exclusive hold-out sets, each of 20%. The whole cohort of cases was split into five subsets called folds. For each fold of the five-fold cross-validation, the following data splits were used: (1) a training split (125 or 126 cases) was used to train the ConvLSTM; (2) a validation split (32 cases) was defined to tune the network, select optimal hyperparameters, and verify that there was no over-fitting; and (3) a test split (39 or 40 cases) was used for the evaluation of the method. The final results were obtained by concatenating the results from the five test subsets. Thus, the overall test population was 197, referred to as the internal test set further in the paper. We also test our model on an unseen external dataset consisting of Next = 68 patients.
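A minimal sketch of the per-patient Dice computation is shown below. Each patient's Dice is computed over the full 3D label volume for one class, and the reported value is the mean over patients. The label encoding and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np


def dice_score(pred, truth, label):
    """Dice similarity coefficient for one class over a whole scan."""
    p = pred == label
    t = truth == label
    denominator = int(p.sum()) + int(t.sum())
    if denominator == 0:
        return 1.0  # neither mask contains the class (assumed convention)
    return 2.0 * int(np.logical_and(p, t).sum()) / denominator


def mean_per_patient_dice(cases, label):
    """cases: iterable of (prediction, ground_truth) label volumes."""
    return float(np.mean([dice_score(p, t, label) for p, t in cases]))


# Toy patient: 4 predicted and 4 annotated GGO voxels, 2 of them overlapping.
truth = np.zeros((2, 4, 4), dtype=np.int64)
pred = np.zeros((2, 4, 4), dtype=np.int64)
truth[0, 0, :4] = 1
pred[0, 0, 2:4] = 1
pred[0, 1, :2] = 1
print(dice_score(pred, truth, label=1))
```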
Diagnostic Per-Patient Performance
To assess the diagnostic performance of the ConvLSTM on a per-patient basis, we trained our model utilizing an additional set of NLST controls (the number of controls in training) during the five-fold cross-validation, making the total training cases . An additional set of normal NLST cases (the number of controls in testing), added during testing, was evaluated with the best fold model from the five-fold cross-validation. Thus, the total number of normal NLST cases included in the experiment sums to Ncontrol = 695. Each normal case was evaluated with a model that did not include that case in training. We report the specificity at 95% sensitivity for the ConvLSTM models trained with and without additional controls. The diagnostic sensitivity and specificity were compared using McNemar’s test39 on paired measurements.
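Specificity at a fixed sensitivity can be computed as sketched below. The sketch assumes that a per-patient score such as the automatically quantified lung burden is used, that higher scores indicate COVID-19, and that a case is called positive when its score is at or above the threshold; the function name and toy values are illustrative.

```python
import numpy as np


def specificity_at_sensitivity(scores_pos, scores_neg, target_sensitivity=0.95):
    """Return (threshold, specificity) at the requested sensitivity.

    scores_pos: scores of COVID-19 patients; scores_neg: scores of controls.
    """
    pos = np.sort(np.asarray(scores_pos, dtype=float))
    neg = np.asarray(scores_neg, dtype=float)
    # Smallest number of positives that must be detected (tolerance for rounding).
    n_needed = int(np.ceil(target_sensitivity * pos.size - 1e-9))
    threshold = pos[pos.size - n_needed]
    specificity = float(np.mean(neg < threshold))
    return threshold, specificity


# 20 patients with burdens 1..20 and 5 controls.
patients = np.arange(1, 21)
controls = np.array([0.0, 0.5, 1.0, 1.5, 3.0])
threshold, specificity = specificity_at_sensitivity(patients, controls)
print(threshold, specificity)
```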
Table 4 shows how the results are affected by altering different building blocks of our model.40 We select the model with the best validation Dice score (mean of GGO and high opacity) for the final evaluation. The model configurations with the highest and lowest performances are highlighted in bold and italic, respectively. We experimentally found that the best results were obtained at buffer size .
Ablation study on fold-1 for model selection (Ncov=197). Highest and lowest performances are highlighted in bold and italic, respectively.
In Table 5, we show the performance of our model as compared with UNet2D and UNet3D across five folds (). For fair comparison, UNet2D and UNet3D were trained with an identical training setup to our model, i.e., the same loss function, optimizer, learning rate strategy, and training fold splits. The performance is measured with two main metrics: mean Dice score and compute resource utilization. The mean Dice score reported in Table 5 gives the binarized mean Dice score per class. The computation time and memory are calculated for 128 CT slices and 16 CT slices, respectively, on an NVIDIA TITAN RTX GPU and Intel i9 CPU using PyTorch Profiler.41 In Fig. 5, we show the significance of our results using the Wilcoxon signed-rank test. We see that our model outperforms UNet2D () and UNet3D () in segmenting high opacities, has a comparable performance to UNet2D () in segmenting GGOs, and significantly outperforms UNet3D () in segmenting GGOs. However, the major advantages of our model over the other two are in terms of computational resources as follows:
Hence, it can be easily deployed on less powerful machines in clinical setups.
Model comparisons on Ncov=197 (UNet2D, UNet3D, and ours). Best performance is highlighted in bold.
Note: The data preprocessing is the same for all models and takes about 2.52 s for 128 CT slices.
Model complexity in terms of number of trainable parameters and the required tera floating point operations (TFLOPs) is shown in Table 6.
Model complexity (UNet2D, UNet3D, and ours).
Note: For TFLOPs, the lower the better.
Lesion Quantification in the Internal Testing (Ncov = 197) and External Testing (Next = 68) Cohorts
In Table 7, we present the interquartile range (IQR) and coefficient of determination () on volumes between expert and automatic segmentation along with the overall per-patient mean Dice score for both internal as well as external test datasets. In the internal test set (), no significant difference between volumes of expert and automatic segmentations was observed for GGOs (). Similarly, no significant difference between volumes of expert and automatic segmentations was observed for GGOs () or high opacities in the external test set ().
Our model performance on Ncov=197 and Next=68.
Note: IQR, interquartile range; ml, milliliter; R2, coefficient of determination.
The Bland–Altman analysis on the internal test set demonstrated a low bias of 0.56 (Fig. 6) and 18.61 ml (Fig. 7) for GGOs and high opacities, respectively. Similarly, the Bland–Altman analysis on the external test set demonstrated a low bias of 7.16 (Fig. 8) and 2.92 ml (Fig. 9) for GGOs and high opacities, respectively. After further analyzing the anomalies (cases outside the limits of agreement) in the Bland–Altman plots (Figs. 6 and 7), we observed that some input scans were compromised for various reasons, including motion artefacts and errors in expert annotations, as shown in Fig. 10. As a result, a significant () difference in high opacity volumes between expert and automatic segmentations was observed in the internal test set.
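The bias and limits of agreement in a Bland–Altman analysis can be computed as in the following sketch. It assumes that the difference is taken as automatic minus expert volume and that the limits are the usual bias ± 1.96 standard deviations; the volumes in the example are made up for illustration.

```python
import numpy as np


def bland_altman(expert_ml, automatic_ml):
    """Return (bias, lower_limit, upper_limit, outlier_mask) for paired volumes."""
    expert = np.asarray(expert_ml, dtype=float)
    automatic = np.asarray(automatic_ml, dtype=float)
    diff = automatic - expert  # assumed sign convention
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    # Cases outside the limits of agreement deserve a manual review.
    outliers = (diff < lower) | (diff > upper)
    return bias, lower, upper, outliers


expert = [10.0, 20.0, 30.0, 40.0]
automatic = [12.0, 21.0, 33.0, 42.0]
bias, lower, upper, outliers = bland_altman(expert, automatic)
print(bias, lower, upper, outliers)
```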
The internal testing cohort consisted of 30 contrast-enhanced and 167 non-contrast CT scans. We observed no significant difference () between the mean Dice scores calculated for segmentations from non-contrast and contrast-enhanced CT scans, which were () and (), respectively.
We trained the same ConvLSTM model with and without additional controls and tested them with five-fold cross-validation. We also tested an additional set of unseen controls with the best performing model from five-fold cross-validation. The AUROC with and without NLST in training was 0.965 and 0.959, respectively, but the difference did not reach significance. However, McNemar’s test results (Table 8) show that training the model with additional NLST cases significantly increased its specificity at 95% sensitivity. Thus, adding NLST controls to the training decreased the false positive rate in diagnosis. The overall per-patient mean Dice score also improved drastically, as shown in Table 8.
Diagnostic performance on Ntotal=Ncov+Ncontrol=892 NLST patients.
We developed and evaluated a novel deep-learning ConvLSTM network approach for fully automatic quantification of the COVID-19 pneumonia burden from both non-contrast and contrast-enhanced chest CT. To the best of our knowledge, ConvLSTM has not been applied before for segmentation of medical imaging data. We demonstrated that automatic pneumonia burden quantification by the proposed method shows strong agreement with expert manual measurements and rapid performance that is suitable for clinical deployment. Although vaccines have been developed to protect from COVID-19, the incidental findings of COVID-19 abnormalities due to imperfect vaccination rates and new strains will be a mainstay of medical practice. This method will provide ‘real-time’ detection of parenchymal opacifications associated with COVID-19 to the physician and aid image-based triage to optimize the distribution of resources during the pandemic. Figure 11 shows the lesion annotations (expert and automatic) in 3D for one of the patients in the test set.
The evolution of deep-learning applications for COVID-19 is reflecting the changing role of CT imaging during the pandemic. Initially, when RT-PCR testing was unavailable or delayed, chest CT was used as a surrogate tool to identify suspected COVID-19 cases.42 AI-assisted image analysis could improve the diagnostic accuracy of junior doctors in differentiating COVID-19 from other chest diseases including community-acquired pneumonia and facilitate prompt isolation of patients with suspected SARS-CoV-2 infection.20,43
Currently, when RT-PCR testing is widely available with timely results, rapid quantification of the pneumonia burden from chest CT as proposed here can aid prognostication and disease staging in patients with COVID-19. As demonstrated in prior investigations, increasing attenuation of GGO and a higher proportion of consolidation in the total pneumonia burden had prognostic value, thus underscoring the importance of utilizing all CT information for training the model.13,44 Manual segmentation of the lung lesions is, however, a challenging and prohibitively time-consuming task due to complex appearances and ambiguous boundaries of the opacities.45 To automate the segmentation of respective lung lesions in COVID-19, several different segmentation networks have been introduced.11,20,22,46 Most of these tend to consume a lot of memory in storing the intermediate features for skip connections, and it may be favorable to use several input slices to improve the performance of semantic segmentation tasks.24,25 We propose the application of ConvLSTM, which has the potential to outperform other neural networks in capturing spatio-temporal correlations because its feedback loop preserves relevant features while dismissing irrelevant ones, providing a memory-sparing strategy for holistic analysis of the images.28 It has been found that ConvLSTM localized at the input end allowed for effectively capturing the global information and optimizing the model performance.
Automated segmentation of lung lesions with ConvLSTM networks offers a solution to generating big data with limited human resources and minimal hardware requirements. Because results of segmentation are presented to the human reader for visual inspection, eventual corrections enable the implementation of a human-in-the-loop strategy to reduce the annotation effort and provide high-volume training datasets to improve the performance of deep-learning models.45 Furthermore, objective and repeatable quantification of the pneumonia burden might aid the evaluation of the disease progression and assist the tomographic monitoring of different treatment responses.
Our study had several limitations. First, different patient profiles and treatment protocols between countries may have resulted in heterogeneity in COVID-19 pneumonia severity. Second, most of the CT scans were acquired during the hospital admission; therefore, availability of the slices with high opacity (consolidations and pleural effusion), representing a peak stage of the disease, was limited. Finally, training and external validation datasets comprised a relatively low number of patients manually segmented by two expert readers; however, to mitigate this, we have utilized repeated testing that has allowed us to evaluate expected average performance of the model.
In our experiments, we have a diverse multi-center cohort not typically available for training. But for future research, in experiments with limited availability of expertly annotated data, it is desirable to incorporate advanced data augmentation techniques as proposed in Refs. 47 and 48 and regularization techniques49 for better model generalization and for mitigating the issue of over-fitting.
We proposed and evaluated a deep-learning method based on convolutional LSTM and a hierarchical multi-scale attention network for fully automated quantification of the pneumonia burden in COVID-19 patients from both non-contrast and contrast-enhanced CT datasets. The proposed method provided rapid segmentation of lung lesions with strong agreement with manual segmentation and may represent a robust tool to generate big data with an accuracy similar to that of an expert reader. The model generalized very well on unseen external datasets. In our proposed method, the attention network using ConvLSTM helps substantially with error correction in segmentation and can be used in other segmentation tasks in which one can leverage information from adjacent slices of the scan.
The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.
This research was supported by Cedars-Sinai COVID-19 funding. This research was also supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (NIH; R01HL133616). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Kajetan Grodecki was supported by the Foundation for Polish Science (FNP). IRCCS Istituto Auxologico Italiano research was supported by the Italian Ministry of Health. We thank the National Lung Screening Trial (NLST) consortium for supporting our research by providing us with valuable data. A preliminary version50 of this work with a subset of patients was presented at SPIE Medical Imaging 2022.
https://doi.org/10.1148/ryct.2020200441
https://pytorch.org/docs/1.7.1/_modules/torch/optim/lr_scheduler.html#ReduceLROnPlateau
https://pytorch.org/docs/1.7.1/autograd.html#profiler
Aditya Killekar is a programmer/analyst at the Cedars-Sinai Medical Center, Los Angeles, California. He received his MS degree in electrical engineering from the University of Southern California in 2018. He specializes in computer vision and machine learning. His current research interests include applications of deep learning in cardiac imaging. Apart from research, he is passionate about teaching and has served as a volunteer to educate and inspire students from various parts of Los Angeles.
Kajetan Grodecki, MD, PhD, graduated from the Medical University of Warsaw and is currently working as a cardiology resident. He is interested in non-invasive modalities to optimize interventional procedures as well as developing AI-based solutions to improve risk stratification.
Piotr Slomka is the Director of Innovation in Imaging, Professor of Medicine and Cardiology, Division of Artificial Intelligence in Medicine, Cedars-Sinai, and Professor of Medicine In-Residence, UCLA School of Medicine. He received his PhD in medical biophysics from the University of Western Ontario. He serves as PI for an NIH R35 Outstanding Investigator Award aimed to transform the clinical utility of PET/CT in detection and management of high-risk coronary artery disease.