Purpose: To investigate the impact of image registration on deep learning-based synthetic CT (sCT) generation. Methods: Paired MR images and CT scans of the pelvic region of radiotherapy patients were obtained and non-rigidly registered. After manual verification of the registrations, the dataset was split into two groups containing either well-registered or poorly-registered MR-CT pairs. A patch-based U-Net deep learning architecture was trained for sCT generation in three scenarios: on (i) exclusively well-registered data, (ii) mixtures of well-registered and poorly-registered data, or (iii) poorly-registered data only. Furthermore, a failure case was designed by introducing a single misregistered subject into a training set of six well-registered subjects. Reconstruction quality was assessed using the mean absolute error (MAE) in the entire body and specifically in bone, and the geometric fidelity of cortical bone was evaluated using the Dice similarity coefficient (DSC). Results: The model trained on well-registered data had an average MAE of 27.6 ± 2.6 HU within the entire body contour and 79.1 ± 16.1 HU in bone. The average cortical bone DSC was 0.89. When patients with registration errors were added to the training set, MAEs were higher and DSCs lower, with the average MAEbone varying by up to 36 HU. The failure case demonstrated the potentially far-reaching consequences of a single misregistered subject in the training set, with MAEbone varying by up to 38 HU. Conclusion: Poor registration quality of the training set had a negative impact on paired, deep learning-based sCT generation. Notably, as few as one poorly-registered MR-CT pair in the training phase was capable of drastically altering a model.
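The two evaluation metrics named in the abstract above, MAE within a mask and the Dice similarity coefficient, can be illustrated with a minimal sketch. Several things here are assumptions for illustration only and are not taken from the study: the bone mask is obtained by thresholding the reference CT at a hypothetical `BONE_THRESHOLD_HU`, the volumes are flattened to plain Python lists of HU values, and the function names are invented.

```python
from typing import Sequence

# Assumption: illustrative threshold for "bone"; the study does not state one here.
BONE_THRESHOLD_HU = 200.0


def masked_mae(ct: Sequence[float], sct: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean absolute error (in HU) over the voxels selected by `mask`."""
    diffs = [abs(c - s) for c, s, m in zip(ct, sct, mask) if m]
    if not diffs:
        raise ValueError("mask selects no voxels")
    return sum(diffs) / len(diffs)


def dice(seg_a: Sequence[bool], seg_b: Sequence[bool]) -> float:
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|)."""
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    total = sum(1 for a in seg_a if a) + sum(1 for b in seg_b if b)
    # Two empty segmentations are treated as identical.
    return 2 * overlap / total if total else 1.0


# Toy flattened "volumes" in HU; the first voxel is air outside the body contour.
ct = [-1000.0, 0.0, 40.0, 300.0, 1200.0, 900.0]
sct = [-1000.0, 10.0, 30.0, 250.0, 1000.0, 150.0]
body = [False, True, True, True, True, True]

bone_ct = [v >= BONE_THRESHOLD_HU for v in ct]
bone_sct = [v >= BONE_THRESHOLD_HU for v in sct]

mae_body = masked_mae(ct, sct, body)      # error inside the body contour
mae_bone = masked_mae(ct, sct, bone_ct)   # error inside reference-CT bone
dsc_bone = dice(bone_ct, bone_sct)        # overlap of thresholded bone
```

In this toy case one bone voxel is badly underestimated on the synthetic CT, which raises the bone MAE well above the body MAE and lowers the Dice score, the same pattern the abstract reports when registration errors enter the training set.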
Purpose: To assess the feasibility of deep learning-based high-resolution synthetic CT generation from MRI scans of the lower arm for orthopedic applications. Methods: A conditional Generative Adversarial Network was trained to synthesize CT images from multi-echo MR images. A training set of MRI and CT scans of 9 ex vivo lower arms was acquired, and the CT images were registered to the MR images. Three-fold cross-validation was applied to generate independent results for the entire dataset. The synthetic CT images were quantitatively evaluated using the mean absolute error, and using the Dice similarity and surface-to-surface distance of cortical bone segmentations. Results: The mean absolute error was 63.5 HU on the overall tissue volume and 144.2 HU on the cortical bone. The mean Dice similarity of the cortical bone segmentations was 0.86. The average surface-to-surface distance between bone on real and synthetic CT was 0.48 mm. Qualitatively, the synthetic CT images corresponded well with the real CT scans and partially preserved high-resolution structures in the trabecular bone. The bone segmentations on synthetic CT images showed some false positives on tendons, but the general shape of the bone was accurately reconstructed. Conclusions: This study demonstrates that high-quality synthetic CT can be generated from MRI scans of the lower arm. The good correspondence of the bone segmentations demonstrates that synthetic CT could be competitive with real CT in applications that depend on such segmentations, such as planning of orthopedic surgery and 3D printing.
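The three-fold cross-validation over 9 specimens described above can be sketched as follows. The abstract does not say how arms were assigned to folds, so the round-robin assignment, the subject identifiers, and the function name `k_fold_splits` are assumptions made purely for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(subjects: Sequence[str], k: int) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) subject lists so that every subject is tested exactly once."""
    # Assumption: round-robin fold assignment; the study may have used another scheme.
    folds = [list(subjects[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


# Hypothetical identifiers for the 9 ex vivo lower arms.
arms = [f"arm{n:02d}" for n in range(1, 10)]
splits = list(k_fold_splits(arms, k=3))

for train, test in splits:
    # Each model is trained on 6 arms and evaluated on the 3 arms it never saw.
    print(f"train={train} test={test}")
```

Splitting at the subject level, rather than at the slice or patch level, is what makes the per-subject results independent of the data each model was trained on.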
Predicting pseudo CT images from MRI data has received increasing attention for use in MRI-only radiation therapy planning and PET-MRI attenuation correction, eliminating the need for harmful CT scanning. Current approaches focus on voxelwise mean absolute error (MAE) and peak signal-to-noise ratio (PSNR) for optimization and evaluation. Contextual losses such as structural similarity (SSIM) are known to promote perceptual image quality. We investigate the use of these contextual losses for optimization.
Patch-based 3D fully convolutional neural networks (FCNs) were optimized for prediction of pseudo CT images from 3D gradient echo pelvic MRI data and compared to ground truth CT data of 26 patients. CT data was non-rigidly registered to MRI for training and evaluation. We compared voxelwise L1 and L2 loss functions with contextual multi-scale L1 and L2 (MSL1 and MSL2) and SSIM loss functions. Performance was evaluated using MAE, PSNR, SSIM, and the overlap of segmented cortical bone in the reconstructions, measured by the Dice similarity metric. Evaluation was carried out in cross-validation.
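To make the difference between a voxelwise and a multi-scale loss concrete, here is a minimal sketch of one possible multi-scale L1 formulation. The exact MSL1 definition used in the study is not given in the text, so everything here is an assumption: the loss is taken as the mean of L1 errors computed after average pooling at a few hypothetical factors, and a 1D signal stands in for a 3D patch to keep the example short.

```python
from typing import List, Sequence


def average_pool(values: Sequence[float], factor: int) -> List[float]:
    """Average non-overlapping windows of `factor` samples (incomplete tail is dropped)."""
    n = len(values) // factor
    return [sum(values[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def l1(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Plain voxelwise L1 loss: mean absolute difference."""
    return sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)


def multi_scale_l1(target: Sequence[float], prediction: Sequence[float],
                   factors: Sequence[int] = (1, 2, 4)) -> float:
    """Assumed multi-scale L1: mean of L1 losses on average-pooled signals."""
    losses = [l1(average_pool(target, f), average_pool(prediction, f)) for f in factors]
    return sum(losses) / len(losses)


target = [0.0, 0.0, 0.0, 0.0]
noisy = [10.0, -10.0, 10.0, -10.0]    # zero-mean error, cancels after pooling
biased = [10.0, 10.0, 10.0, 10.0]     # systematic offset, survives pooling

voxelwise_noisy = l1(target, noisy)
voxelwise_biased = l1(target, biased)
msl1_noisy = multi_scale_l1(target, noisy)
msl1_biased = multi_scale_l1(target, biased)
```

Under this assumed formulation, the voxelwise L1 loss cannot tell the two predictions apart, while the multi-scale variant penalizes the systematic offset more heavily than the zero-mean fluctuation, which is one way a loss can take local context into account.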
All optimizations converged well with PSNR between 25 and 30 HU, except for one of the folds of the SSIM optimizations. MSL1 and MSL2 are at least on par with their single-scale counterparts. MSL1 overcomes some of the instabilities of the L1-optimized prediction models. MSL2 optimization is stable and, on average, outperforms all the other losses, although quantitative evaluations based on MAE, PSNR and SSIM show only minor differences. Direct optimization using SSIM visually excelled in terms of subjective perceptual image quality, at the expense of a drop in voxelwise quantitative performance.
Contextual loss functions can improve prediction performance of FCNs without change of the network architecture. The suggested subjective superiority of contextual losses in reconstructing local structures merits further investigations.