KEYWORDS: Education and training, 3D modeling, Magnetic resonance imaging, Acoustics, Tongue, Motion detection, Data modeling, Motion models, Performance modeling, Diseases and disorders
Understanding the relationship between tongue motion patterns during speech and the resulting speech acoustics—i.e., the articulatory-acoustic relation—is of great importance in assessing speech quality and in developing innovative treatment and rehabilitative strategies. This is especially important when evaluating and detecting abnormal articulatory features in patients with speech-related disorders. In this work, we aim to develop a framework for detecting speech motion anomalies in conjunction with their corresponding speech acoustics. This is achieved with a deep cross-modal translator trained on data from healthy individuals only, which bridges the gap between 4D motion fields obtained from tagged MRI and 2D spectrograms derived from speech acoustic data. The trained translator is used as an anomaly detector by measuring spectrogram reconstruction quality on healthy individuals or patients. In particular, the cross-modal translator is likely to generalize poorly to patient data, which contains unseen out-of-distribution patterns, yielding subpar reconstruction performance compared with healthy individuals. A one-class SVM is then used to distinguish the spectrograms of healthy individuals from those of patients. To validate our framework, we collected a total of 39 paired tagged MRI and speech waveform datasets, comprising 36 healthy individuals and 3 tongue cancer patients. We used both 3D convolutional and transformer-based deep translation models, training them on the healthy training set and then applying them to both the healthy and patient testing sets. Our framework demonstrates a capability to detect abnormal patient data, thereby illustrating its potential to enhance understanding of the articulatory-acoustic relation in both healthy individuals and patients.
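As a concrete illustration of the detection stage, the following sketch fits a one-class SVM on reconstruction errors from healthy data only and flags out-of-distribution samples. The error values, the `reconstruction_error` helper, and all numbers are synthetic stand-ins, not the trained translator's actual outputs.

```python
# Hypothetical sketch of the anomaly-detection stage: a one-class SVM is fit on
# spectrogram reconstruction errors from healthy subjects only; test samples
# with out-of-distribution errors are flagged as anomalous (-1).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def reconstruction_error(true_spec, pred_spec):
    """Mean squared error between a ground-truth and a translated spectrogram."""
    return np.mean((true_spec - pred_spec) ** 2)

# Simulated errors: healthy reconstructions are accurate (small error),
# patient reconstructions are out-of-distribution (large error).
healthy_errors = rng.normal(0.05, 0.01, size=(36, 1))
patient_errors = rng.normal(0.30, 0.05, size=(3, 1))

detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
detector.fit(healthy_errors)

# +1 = inlier (healthy-like), -1 = outlier (anomalous)
patient_flags = detector.predict(patient_errors)
```

In this toy setting the one-dimensional error is the only feature; in practice richer reconstruction-quality features could be fed to the SVM.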
Magnetic resonance imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between T1 relaxation and the repeated application of radio frequency (RF) pulses during serial imaging sequences, a factor that has been overlooked in prior research on tMRI post-processing. Further, we have observed an emerging trend of utilizing raw tagged MRI within deep learning-based (DL) registration frameworks for motion estimation. In this work, we evaluate and analyze the impact of commonly used image similarity objectives in training DL registration models on raw tMRI. We then compare these with the Harmonic Phase-based approach, a traditional method that is claimed to be robust to tag fading. Our findings, derived from both simulated images and an actual phantom scan, reveal the limitations of various similarity losses on raw tMRI and emphasize caution in registration tasks where image intensity changes over time.
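The fading mechanism described above can be illustrated with a simplified longitudinal-magnetization model. This is a textbook-style sketch, not the study's exact model: after each imaging RF pulse with flip angle α and repetition time TR, the tagged component of M_z is scaled by cos(α) and relaxes toward equilibrium with T1, so the tag modulation decays roughly as (e^{-TR/T1} cos α)^n after n pulses.

```python
# Illustrative sketch (not the paper's exact model): the tagged (modulated)
# component of longitudinal magnetization shrinks by a factor E1*cos(alpha)
# per TR, while the untagged background recovers toward equilibrium.
import math

def tag_amplitude(n_pulses, TR_ms=30.0, T1_ms=1000.0, flip_deg=10.0):
    """Relative tag modulation remaining after n imaging RF pulses."""
    E1 = math.exp(-TR_ms / T1_ms)
    return (E1 * math.cos(math.radians(flip_deg))) ** n_pulses

# Tag contrast decays monotonically over the image sequence.
amps = [tag_amplitude(n) for n in range(0, 40, 10)]
```

The parameter values (TR, T1, flip angle) are placeholders chosen only to show the qualitative decay.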
Deep learning (DL) has led to significant improvements in medical image synthesis, enabling advanced image-to-image translation to generate synthetic images. However, DL methods face challenges such as domain shift and high demands for training data, limiting their generalizability and applicability. Historically, image synthesis was also carried out using deformable image registration (DIR), a method that warps moving images of a desired modality to match the anatomy of a fixed image. However, concerns about its speed and accuracy led to its decline in popularity. With recent advances in DL-based DIR, we revisit and reinvigorate this line of research. In this paper, we propose a fast and accurate synthesis method based on DIR. We use the task of synthesizing a rare magnetic resonance (MR) sequence, white matter nulled (WMn) T1-weighted (T1-w) images, to demonstrate the potential of our approach. During training, our method learns a DIR model based on the widely available MPRAGE sequence, which is a cerebrospinal fluid nulled (CSFn) T1-w inversion recovery gradient echo pulse sequence. During testing, the trained DIR model is first applied to estimate the deformation between moving and fixed CSFn images. This estimated deformation is then applied to align the paired WMn counterpart of the moving CSFn image, yielding a synthetic WMn image for the fixed CSFn image. Our experiments demonstrate promising results for unsupervised image synthesis using DIR. These findings highlight the potential of our technique in contexts where supervised synthesis methods are constrained by limited training data.
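The test-time procedure (estimate a CSFn-to-CSFn deformation, then apply it to the paired WMn image) can be sketched as follows. The arrays are toy data and the trilinear `warp` helper is a hypothetical stand-in for the trained DIR model's transform step.

```python
# Minimal sketch of synthesis-by-registration: a displacement field estimated
# between CSFn images is reused to warp the paired WMn image of the moving
# subject into the fixed subject's space.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image, displacement):
    """Warp a 3D image by a dense displacement field (voxel units).
    displacement has shape (3, *image.shape)."""
    grid = np.indices(image.shape).astype(float)  # identity sampling grid
    coords = grid + displacement                  # add per-voxel displacements
    return map_coordinates(image, coords, order=1, mode="nearest")

# Toy example: a zero displacement field leaves the image unchanged.
moving_wmn = np.random.default_rng(1).random((8, 8, 8))
identity = np.zeros((3, 8, 8, 8))
synthetic_wmn = warp(moving_wmn, identity)
```

In the actual pipeline the displacement would come from the DIR model applied to the CSFn pair, not from a zero field.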
The thalamus is a subcortical gray matter structure that plays a key role in relaying sensory and motor signals within the brain. Its nuclei can atrophy or otherwise be affected by neurological diseases and injuries, including mild traumatic brain injury. Segmenting both the thalamus and its nuclei is challenging because of the relatively low contrast within and around the thalamus in conventional magnetic resonance (MR) images. This paper explores imaging features to determine key tissue signatures that naturally cluster, from which we can parcellate thalamic nuclei. Tissue contrasts include T1-weighted and T2-weighted images, MR diffusion measurements including fractional anisotropy (FA) and mean diffusivity, Knutsson coefficients that represent fiber orientation, and synthetic multi-TI images derived from FGATIR and T1-weighted images. After registration of these contrasts and isolation of the thalamus, we use the uniform manifold approximation and projection (UMAP) method for dimensionality reduction to produce a low-dimensional representation of the data within the thalamus. Manual labeling of the thalamus provides labels for our UMAP embedding, from which k-nearest neighbors can be used to label new unseen voxels in that same UMAP embedding. N-fold cross-validation of the method reveals performance comparable to state-of-the-art thalamic parcellation methods.
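The labeling step can be sketched as follows, assuming the UMAP embedding has already been computed (synthetic 2-D points stand in for embedded voxels here): manually labeled training voxels in the embedding are used to classify new voxels with k-nearest neighbors.

```python
# Sketch of the kNN labeling step in a precomputed low-dimensional embedding.
# Two well-separated synthetic clusters stand in for two nuclei in UMAP space.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train_embed = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # nucleus 0 voxels
                         rng.normal(3.0, 0.1, (50, 2))])  # nucleus 1 voxels
train_labels = np.array([0] * 50 + [1] * 50)               # manual labels

knn = KNeighborsClassifier(n_neighbors=5).fit(train_embed, train_labels)

# New unseen voxels, already projected into the same embedding.
new_voxels = np.array([[0.0, 0.0], [3.0, 3.0]])
pred = knn.predict(new_voxels)
```

Real embeddings would come from fitting UMAP on the registered multi-contrast features within the thalamus mask.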
KEYWORDS: Image segmentation, Magnetic resonance imaging, Voxels, Deep learning, Thalamus, White matter, Data modeling, Visualization, Realistic image synthesis
T1-weighted (T1w) magnetic resonance (MR) neuroimages are usually acquired with an inversion time that nulls the cerebrospinal fluid—i.e., CSFn MPRAGE images—but are rarely acquired with the white matter nulled—i.e., WMn images. Since WMn images can be useful in highlighting thalamic nuclei, we develop a method to synthesize them from images that are commonly acquired. We propose a two-part model with a deep learning based encoder and a decoder based on an imaging equation that governs the acquisition of our T1w images. The model can be trained on the subset of the dataset where WMn MPRAGE images are available. It takes commonly acquired image contrasts (e.g., CSFn MPRAGE) as input and generates WMn MPRAGE images as output, along with two quantitative parameter maps as intermediate results. After training, our model can generate a synthetic WMn MPRAGE image for any given subject. The synthetic images have a high signal-to-noise ratio and are visually almost identical to the ground truth images. Furthermore, downstream thalamic nuclei segmentation on synthetic WMn MPRAGE images is consistent with that on ground truth WMn MPRAGE images.
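To make the decoder idea concrete, the sketch below evaluates a simplified inversion-recovery imaging equation on hypothetical parameter values; the paper's decoder uses the acquisition's own imaging equation, which this textbook form only approximates.

```python
# Hedged illustration of the decoder idea: given quantitative parameter maps
# (here single synthetic PD and T1 values), a simplified inversion-recovery
# equation S = PD * (1 - 2*exp(-TI/T1)) predicts intensity at inversion time TI.
import math

def ir_signal(pd, t1_ms, ti_ms):
    """Simplified inversion-recovery signal (textbook approximation)."""
    return pd * (1.0 - 2.0 * math.exp(-ti_ms / t1_ms))

# A tissue nulls when TI = T1 * ln(2); with an assumed white matter T1 of
# 800 ms, that TI nulls white matter while long-T1 CSF remains inverted.
wm_null_ti = 800.0 * math.log(2.0)
wm_signal = ir_signal(1.0, 800.0, wm_null_ti)
csf_signal = ir_signal(1.0, 4000.0, wm_null_ti)
```

The T1 values are placeholders; the point is only that choosing TI per tissue T1 determines which tissue is nulled.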
Analysis of tongue motion has been proven useful in gaining a better understanding of speech and swallowing disorders. Tagged magnetic resonance imaging (MRI) has been used to image tongue motion, and the harmonic phase processing (HARP) method has been used to compute 3D motion from these images. However, HARP can fail with large motions due to so-called tag (or phase) jumping, yielding highly inaccurate results. The phase vector incompressible registration algorithm (PVIRA) was developed using the HARP framework to yield smooth, incompressible, and diffeomorphic motion fields, but it can also suffer from tag jumping. In this paper, we propose a new method to avoid tag jumping occurring in the later frames of tagged MR image sequences. The new approach uses PVIRA between successive time frames and then adds their stationary velocity fields to yield a starting point from which to initialize a final PVIRA stage between troublesome frames. We demonstrate on multiple data sets that this method avoids tag jumping and produces superior motion estimates compared with existing methods.
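The initialization idea can be sketched on toy one-dimensional fields: stationary velocity fields estimated between successive frame pairs are summed (a first-order approximation of their composition) to give a starting point for the final registration stage between the troublesome frame pair.

```python
# Schematic of the initialization step (toy 1-D fields, not the PVIRA
# implementation): per-frame-pair stationary velocity fields are summed to
# form a large-deformation starting point for the final registration stage.
import numpy as np

def compose_initial_velocity(frame_velocities):
    """Sum successive stationary velocity fields (first-order approximation
    of their composition)."""
    return np.sum(frame_velocities, axis=0)

v_01 = np.full(10, 0.4)  # velocity field, frame 0 -> 1
v_12 = np.full(10, 0.5)  # velocity field, frame 1 -> 2
v_init = compose_initial_velocity([v_01, v_12])  # initialization for 0 -> 2
```

Starting the final stage from `v_init` rather than zero is what lets the method avoid the large-motion tag jumping described above.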
Medical image segmentation is one of the core tasks of medical image analysis. Automatic segmentation of brain magnetic resonance images (MRIs) can be used to visualize and track changes of the brain's anatomical structures that may occur due to normal aging or disease. Machine learning techniques are widely used in automatic structure segmentation. However, contrast variation between training and testing data makes it difficult for segmentation algorithms to generate consistent results. To address this problem, an image-to-image translation technique called MR image harmonization can be used to match the contrast between different data sets. It is important for the harmonization to transform image intensity while maintaining the underlying anatomy. In this paper, we present a 3D U-Net algorithm to segment the thalamus from multiple MR image modalities and investigate the impact of harmonization on the segmentation algorithm. Manual delineations of thalamic nuclei are available on two data sets; however, we aim to analyze the thalamus in another large data set where ground truth labels are lacking. We trained two segmentation networks, one on unharmonized images and the other on harmonized images, using one data set with manual labels, and compared their performance on the other data set with manual labels. These two labeled data sets were acquired with similar imaging protocols from cohorts diagnosed with two different brain disorders. The harmonization target is the large data set without manual labels, which was acquired with a different imaging protocol. The networks trained on unharmonized and harmonized data showed no significant difference when evaluated on the other labeled data set, demonstrating that image harmonization maintains the anatomy and does not affect the segmentation task. When the two networks were evaluated on the harmonization target data set, the network trained on harmonized data showed significant improvement over the network trained on unharmonized data. Therefore, the network trained on harmonized data has the potential to process large amounts of data from other sites, even in the absence of site-specific training data.
The muscles of the human tongue play an important role in multiple vital human functions. In most tongue regions, two orthogonal groups of muscle fibers are extensively interdigitated. Reconstructing the tongue muscle fiber orientations can help in understanding the deformation of each muscle group and its function. High angular resolution diffusion imaging (HARDI), a diffusion weighted imaging technique, has been used to resolve the crossing muscle fibers in the tongue. Most existing fiber reconstruction methods use HARDI data to estimate the fiber orientation distribution function (fODF), from which distinct fiber orientations can be identified by a peak finding algorithm. However, the assignment of the primary and secondary fiber orientations can be inconsistent across neighboring voxels. In this paper, we propose a fiber matching algorithm to refine the display of the fiber orientations, which can be used as a post-processing step for fiber reconstruction. The algorithm takes the fiber orientations reconstructed by a deep convolutional neural network as input and computes the similarity between neighboring fibers under different assignments. The optimal assignments are obtained by solving a quadratic unconstrained binary optimization (QUBO) model. The proposed method was shown to greatly improve the fiber assignments on synthetic tongue fiber orientations. Application to a post-mortem human tongue indicated that the proposed method can reconstruct the complex muscle fibers of the human tongue and improve the visualization of the fiber orientations.
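The assignment step can be illustrated on a toy problem: one binary variable per voxel decides whether its (primary, secondary) labels are swapped, and neighbor dissimilarities define a quadratic cost over these bits. The brute-force solver and synthetic directions below are illustrative only; real instances would use a dedicated QUBO solver.

```python
# Toy illustration of the fiber-assignment step: each voxel gets a binary
# swap bit, and the total neighbor dissimilarity (quadratic in the bits,
# i.e., a small QUBO) is minimized here by brute force.
import itertools
import numpy as np

def angle_cost(d1, d2):
    """Dissimilarity between two unit directions (sign-invariant)."""
    return 1.0 - abs(float(np.dot(d1, d2)))

def solve_assignments(fibers, neighbors):
    """fibers[i] = (primary, secondary) unit vectors; neighbors = index pairs.
    Returns per-voxel swap bits minimizing total neighbor dissimilarity."""
    n = len(fibers)
    best_bits, best_cost = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        cost = 0.0
        for i, j in neighbors:
            pi = fibers[i][::-1] if bits[i] else fibers[i]
            pj = fibers[j][::-1] if bits[j] else fibers[j]
            cost += angle_cost(pi[0], pj[0]) + angle_cost(pi[1], pj[1])
        if cost < best_cost:
            best_bits, best_cost = bits, cost
    return best_bits

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
# Voxel 1's (primary, secondary) labels are flipped relative to its neighbors.
fibers = [(x, y), (y, x), (x, y)]
bits = solve_assignments(fibers, [(0, 1), (1, 2)])
```

Swapping only voxel 1 restores neighbor consistency at zero cost, which is the behavior the post-processing step is designed to achieve at scale.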
KEYWORDS: Image segmentation, Thalamus, Convolutional neural networks, Medical imaging, Visualization, Traumatic brain injury, Super resolution, Magnetism, Magnetic resonance imaging, Image processing algorithms and systems
Thalamus segmentation plays an important role in studies of neural system diseases. Existing thalamus segmentation algorithms use traditional image processing techniques on magnetic resonance images (MRI), which suffer from limited accuracy and efficiency. In recent years, deep convolutional neural networks (CNNs) have been able to outperform many conventional algorithms in medical imaging tasks. We propose segmenting the thalamus using a 3D CNN that takes an MPRAGE image and a set of feature images derived from a diffusion tensor image (DTI) as input. Experimental results demonstrate that using CNNs to segment the thalamus improves accuracy and efficiency on various datasets.
Speech is generated through complex contacts of the tongue with the palate and teeth. Evaluation of tongue-palate contact can be beneficial in linguistics studies, in the diagnosis and treatment of speech disorders, and in speech synthesis. In this paper, we propose a method for tongue-palate contact assessment based on cine MR images acquired during speech. We use a 2D U-Net to segment the space between the top of the tongue and the palate on the sagittal slices of the cine images. A series of MR palatograms is then generated by computing the vertical thickness of the segmented space on all sagittal slices and projecting it onto the axial plane. Compared to static palatography and electropalatography, the proposed method assesses the tongue-palate contact information as well as the tongue-to-palate distances over time. We generated sequences of MR palatograms for two healthy subjects uttering three phrases. During pronunciation of the selected phrases, the tongue-palate contact points and the relative tongue-to-palate distances were similar between the subjects.
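The palatogram computation can be sketched on a synthetic segmentation volume: for each sagittal slice, the segmented tongue-palate gap is summed along the vertical axis and projected onto the axial plane, so zero thickness indicates contact. The axis ordering and voxel size below are assumptions.

```python
# Sketch of the MR palatogram step with a synthetic segmentation volume:
# summing the gap mask along the vertical axis gives a per-location thickness,
# projected onto the axial plane (zero thickness = tongue-palate contact).
import numpy as np

def mr_palatogram(gap_mask, voxel_height_mm=1.0):
    """gap_mask: boolean volume ordered (sagittal, vertical, anterior-posterior).
    Returns a 2D tongue-to-palate thickness map over the axial plane."""
    return gap_mask.sum(axis=1) * voxel_height_mm

gap = np.zeros((4, 5, 6), dtype=bool)
gap[2, 1:4, 3] = True            # a 3-voxel-thick gap at one axial location
thickness = mr_palatogram(gap)    # (4, 6) map; zeros elsewhere mean contact
```

Repeating this over the cine time frames yields the palatogram sequence described above.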