1. Introduction

Distributed learning has become a promising alternative to centralized learning for training machine learning (ML) models in medical image analysis because it helps comply with patient privacy regulations and overcome administrative barriers.1 It offers a practical way to access large and diverse datasets by enabling ML model training in distributed environments. Despite its success, distributed learning does not address the inherent challenge of image acquisition biases across multiple centers,1 which is especially important when imaging protocols are not harmonized. Image acquisition biases can, for example, be caused by differences in image acquisition protocols and/or scanner types across centers, potentially leading the ML model to learn patterns unrelated to the main task (i.e., spurious correlations).2 Several previous studies have explored the effects of image acquisition biases. For instance, Refs. 3 and 4 demonstrated that combining data acquired with different pulse sequences or scanners in multi-center studies can bias brain volume estimates in cross-sectional and longitudinal brain morphology analyses. In addition, variations in cortical thickness measurements were observed due to differences in imaging acquisition protocols, magnetic field strength, and pulse sequence parameters.5–7 Moreover, Glocker et al.8 found that simple intensity-based harmonization techniques cannot eliminate all scanner-encoded effects. Similar findings were reported in Ref. 9, which demonstrated that ML models can accurately identify the site where brain scans were acquired, even after intensity harmonization. Recently, Souza et al.2 showed that scanner types can be classified from the internal feature representations of a model trained for Parkinson’s disease (PD) classification in a centralized approach.
Acquisition bias has also been reported for other data modalities in multi-center datasets, such as molecular data10 and magnetic resonance imaging (MRI)-derived features.11,12 All of these studies reinforce that ML models can identify and possibly exploit image acquisition biases, such as scanner type, as shortcuts for the disease classification task, which diminishes their clinical utility.13 Shortcuts can emerge in ML models for various reasons, as described in more detail by Geirhos et al.13 First, models may disproportionately focus on making good predictions for the majority group, leading to misclassifications in minority group(s). Second, ML models tend to take the path of least effort: once a model identifies a feature that solves its task, it may rely exclusively on that feature, even if it represents a data artifact. Third, a model might learn features unrelated to the intended task, such as classifying scanner types instead of disease groups. Last, a model may combine multiple features to make a decision. For example, during a disease classification task, the model might incorrectly associate scanner type with disease status, even though the scanner used to acquire the data should not influence the diagnosis. Hence, recent ML research has aimed to develop methods that harmonize data across centers to reduce imaging biases and improve the reliability, generalizability, usability, and validity of ML models.14 Data harmonization is the process of transforming data from different sources into a common format or reference frame to enable meaningful comparison or integration.
Most data harmonization methods in ML that use volumetric MRI data rely on generative adversarial networks (GANs).15–20 However, many of these methods have significant limitations: they require paired data (e.g., scans of the same individual acquired with every scanner or image modality to be harmonized) and large quantities of data from each site. These factors can hinder the implementation of GANs in distributed learning environments, especially in scenarios involving numerous centers, where each center contributes only a few data points for training. Distributed learning is commonly implemented following the federated learning (FL) paradigm. In FL, all centers train, in parallel, a copy of a model initialized by a server. After each training round, the server aggregates the parameters of the models trained at each center, updates the global model, and sends it back to the centers for further training. This process is repeated until the aggregated model meets predefined convergence criteria.21 Centers with limited datasets present significant challenges in an FL setup, including the risk of local model overfitting and the difficulty of defining an aggregation function that does not marginalize centers with less data.22 Although data harmonization methods have been proposed specifically for FL,23–25 they have been limited to scenarios where centers contribute large amounts of data for training. In contrast to FL, the traveling model (TM) visits one center at a time in a sequential process.22,26 Specifically, the model is initialized at a server or the first center and trained with the data available at the first center. Subsequently, the updated model is transferred to the next center, and training continues with locally available data. This process is repeated until the final center is reached, completing one training cycle.
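To make the contrast between the two paradigms concrete, the following is a minimal, hypothetical sketch in Python with NumPy; it is not the implementation used in this work or in Refs. 22 and 26. It assumes a toy linear-regression model trained by gradient descent and, for FL, FedAvg-style aggregation (parameters averaged with weights proportional to each center's sample size), which is one common aggregation choice rather than the only one. The helper names local_train, federated_round, and traveling_cycle are illustrative.

```python
import numpy as np


def local_train(weights, x, y, lr=0.1, steps=5):
    """Run a few full-batch gradient-descent steps of least-squares regression at one center."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_round(global_w, centers):
    """One FL round: every center trains its own copy (conceptually in parallel),
    then the server aggregates. Size-weighted (FedAvg-style) averaging is an assumption."""
    local_ws = [local_train(global_w, x, y) for x, y in centers]
    sizes = np.array([len(y) for _, y in centers], dtype=float)
    return np.average(local_ws, axis=0, weights=sizes)


def traveling_cycle(w, centers, order):
    """One TM cycle: a single model visits the centers sequentially in the given order."""
    for idx in order:
        x, y = centers[idx]
        w = local_train(w, x, y)
    return w


# Hypothetical data: three centers with 2, 3, and 40 samples of a noise-free linear task.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
centers = []
for n in (2, 3, 40):
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    centers.append((x, x @ true_w))

w_fl = np.zeros(2)
w_tm = np.zeros(2)
for cycle in range(50):
    w_fl = federated_round(w_fl, centers)
    # A fresh traveling sequence per cycle, mimicking shuffling across cycles.
    w_tm = traveling_cycle(w_tm, centers, rng.permutation(len(centers)))

print("FL:", w_fl.round(3), "TM:", w_tm.round(3))
```

In this noise-free toy, both schemes recover the same parameters; the sketch only illustrates the order of operations (parallel local training followed by aggregation versus sequential visits by a single model), not the overfitting behavior of small centers discussed in the text.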
Multiple cycles can be conducted to enhance the model’s overall performance.22 Despite being less explored than FL, the TM approach to distributed learning has been found to be especially effective when centers can only provide a few datasets for training.22,26 Healthcare facilities with limited data are common in scenarios involving rare diseases with low prevalence,27 small or rural hospitals admitting few patients, and hospitals in low- or middle-income countries that lack sufficient imaging equipment, skilled clinicians, and radiologists.28 The effectiveness of the TM arises from the iterative training of a single model, which, to some extent, alleviates the local-model overfitting problem of FL. Recently, the TM method has been successfully applied to classify PD using three-dimensional (3D) MRI data acquired across 83 centers worldwide.26 Although the TM demonstrated state-of-the-art performance on this task [achieving an area under the receiver operating characteristic curve (AUROC) of 83%, compared with 80% for the centralized approach], that study did not address or investigate the inherent challenge of image acquisition biases. In this specific example, directly applying existing data harmonization methods designed for FL to the TM approach is impractical, as 51.8% (43 out of 83) of the centers in this PD dataset contribute fewer than 10 data points, and about half of them (21 out of 43) contribute fewer than five data points for training. Unfortunately, restricted data access is a common issue in medical image analysis, and it significantly limits the effectiveness of data harmonization methods when insufficient data are available for training.
Thus, current state-of-the-art techniques in this domain may severely undermine the key advantage of the TM, namely, enabling centers to contribute small amounts of data effectively in a distributed learning scenario. An alternative to GAN-based and FL-specific data harmonization approaches is to remove domain-specific information (e.g., scanner or acquisition protocol) from the model’s learned feature representations while retaining as much disease-related information as possible. Following this idea, Dinsdale et al.29 introduced a data harmonization technique for centralized learning on neuroimaging data. Their work employed an adversarial training setup in which domain-specific information is “unlearned” from the features used by the model for the main task (e.g., brain age prediction). In their setup, the adversarial network comprises an encoder for extracting features from the input data, a classification head for the main task, and a classification head for the domain task (i.e., scanner). Despite the effectiveness of their approach, certain requirements, such as ensuring that batches include data from every scanner and oversampling the smallest dataset, must be adapted for a distributed environment where batches are composed of data from a single center. Adapting this method to distributed learning is challenging because centers do not share information, which limits the unlearning step, particularly in the FL setup, where training occurs in parallel. To address this challenge, Dinsdale et al.23 proposed an adaptation that tracks site information to generate features on the central server. Although sharing these features with each site enables the unlearning procedure to be performed effectively, it contradicts the principle of distributed learning, which aims to avoid sharing data in any form.
The TM training approach not only allows centers with limited local data to participate but also eliminates the need to share information or features during the unlearning step. Thus, the adversarial training setup is theoretically viable for the TM approach, although this has not yet been verified experimentally. Therefore, this work introduces HarmonyTM, which adapts and extends the harmonization framework proposed in Ref. 29 to the TM approach, and tests it for PD classification. Overall, we aim to learn a feature representation that minimizes the effect of image acquisition biases (i.e., spurious correlations) while retaining high performance for the main task and enabling centers to contribute very small sample sizes. Our major contributions are (1) the development of the first data harmonization method for the TM approach and (2) the evaluation of HarmonyTM in reducing the impact of scanner differences while successfully distinguishing between patients with PD and healthy participants.

2. Material and Methods

In this work, we evaluate HarmonyTM using the most extensive multi-center database for PD classification published to date.

2.1. Dataset

All analyses performed in this work use the unique multi-center PD database first presented in Ref. 26, which includes 1817 T1-weighted MRI scans acquired at 83 centers worldwide.30–42 This database stands out for its diversity: it features a wide range of participant demographics, varying numbers of brain scans per center, multiple scanner vendors (e.g., Siemens, GE, and Philips), and 23 different scanner types.
The database was pre-processed as follows: skull-stripping using HD-BET,43 resampling to an isotropic resolution of 1 mm using linear interpolation, bias field correction using the non-parametric non-uniform intensity normalization technique of the advanced normalization tools (ANTs, version 2.3.1), affine registration to the PD25-T1-MPRAGE-1mm brain atlas44 using ANTs, and cropping to to reduce irrelevant background. Table 1 summarizes the centers’ contributions, population demographics, and scanner types.

Table 1. Centers’ contributions and demographics.
2.2. Traveling Model

In this work, we implemented the PD TM originally presented in Ref. 26 as the basis for all experiments. In essence, this approach involves defining an initial traveling sequence that determines the order in which the model is transferred between centers. The first center then initializes and trains the model with its locally available data before passing it on to the next center. This training process continues until the model has visited every center, completing one cycle. Subsequently, a new traveling sequence is defined to introduce cycle-to-cycle variability, effectively simulating the batch shuffling commonly used in centralized approaches. A batch size of five is used when a center has five or more locally available datasets; when fewer than five datasets are available, the batch size is reduced accordingly. This adjustment was needed because 21 of the 83 centers in the distributed learning network had fewer than five datasets available for local training. In addition, computational limitations restrict the maximum batch size to five. The Adam optimizer was used with an initial learning rate of 0.0001 and exponential decay throughout each cycle, as described in Ref. 26. The entire training process is repeated for 30 cycles to improve the model’s performance, with only one epoch of training at each center per cycle. Although the training was conducted on a single computer equipped with an NVIDIA GeForce RTX 3090 GPU, it strictly adheres to the TM concept by retrieving data from only one center per epoch. The entire training process takes to complete. Details of the deep learning architectures used in this work are described in Sec. 2.3.
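The scheduling logic above (a new traveling sequence per cycle, one epoch per center, batches of at most five that shrink for very small centers) can be sketched as a plan generator. This is a minimal illustration under stated assumptions, not the code from Ref. 26: the function name traveling_schedule and the center names are hypothetical, and the per-batch decay factor DECAY_RATE is an assumed value, since the exact exponential decay schedule is only referenced, not specified, in the text.

```python
import numpy as np

MAX_BATCH = 5       # maximum batch size stated in the text
INITIAL_LR = 1e-4   # initial Adam learning rate stated in the text
DECAY_RATE = 0.999  # hypothetical per-batch exponential decay factor (not from the text)


def traveling_schedule(center_sizes, n_cycles, seed=0):
    """Yield (cycle, center, batch_indices, lr) in the order a TM would train.

    center_sizes maps a center name to its number of local scans. Every cycle
    draws a new traveling sequence, and each center is visited for exactly one
    epoch, split into batches of at most MAX_BATCH scans.
    """
    rng = np.random.default_rng(seed)
    names = list(center_sizes)
    step = 0
    for cycle in range(n_cycles):
        for idx in rng.permutation(len(names)):   # new traveling sequence per cycle
            name = names[idx]
            n = center_sizes[name]
            batch_size = min(MAX_BATCH, n)        # smaller batches for centers with < 5 scans
            local_order = rng.permutation(n)      # shuffle scans within the center
            for start in range(0, n, batch_size):
                lr = INITIAL_LR * DECAY_RATE ** step
                yield cycle, name, local_order[start:start + batch_size], lr
                step += 1


# Hypothetical network: one larger center and two very small ones.
sizes = {"center_A": 12, "center_B": 3, "center_C": 1}
for cycle, name, batch, lr in traveling_schedule(sizes, n_cycles=1):
    print(cycle, name, batch.tolist(), f"lr={lr:.6g}")
```

Because only one center's data is touched at a time, every batch in this plan is single-center by construction, which is the property the harmonization steps in Sec. 2.3 have to work with.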
2.3. Harmonization Strategy

The deep learning architecture used in this work is based on the state-of-the-art simple fully convolutional network (SFCN),45 which achieved high performance for PD classification using multi-center T1-weighted MRI scans in centralized and TM approaches.26,46 The model’s encoder comprises seven blocks. The first six blocks mirror the structure of the original SFCN model: five blocks each consist of a 3D convolutional layer with kernel filters, batch normalization, max pooling, and ReLU activation, whereas the remaining block consists of a 3D convolutional layer with kernel filters, batch normalization, and ReLU activation. The last block is tailored to our specific task and includes a 3D average pooling layer, a dropout layer with a rate of 0.2, and a flattening layer that outputs 768 features (neurons). The classification head takes the encoder output and consists of a single dense layer with a sigmoid activation function for the binary output, distinguishing between patients with PD and healthy participants. The domain head is a single layer with a softmax activation function for the multiclass output, categorizing 23 different scanner types. Before removing domain-specific information, the network components must be pre-trained with the TM approach. Therefore, the encoder and disease classification head are first trained until convergence. Next, the encoder is frozen, and the scanner classification head is trained until convergence. Finally, using these pre-trained models, the scanner harmonization procedure is implemented in three steps, as follows. Importantly, these steps are performed for each batch, such that all three training steps occur at each center before the model is transferred to the next center (see Fig. 1).
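The two heads that sit on top of the 768-dimensional encoder output can be sketched in NumPy as an illustrative forward pass. This assumes that both heads are plain affine layers followed by their activations and uses random features as a stand-in for the convolutional encoder; the class and method names (Heads, disease, scanner) are hypothetical, and neither training nor the encoder itself is shown.

```python
import numpy as np

N_FEATURES = 768  # flattened encoder output size stated in the text
N_SCANNERS = 23   # number of scanner types stated in the text


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class Heads:
    """Hypothetical weights for the disease head and the domain (scanner) head.

    Both heads read the same 768-dimensional encoder features.
    """

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w_pd = rng.normal(0.0, 0.01, size=(N_FEATURES,))
        self.b_pd = 0.0
        self.w_scanner = rng.normal(0.0, 0.01, size=(N_FEATURES, N_SCANNERS))
        self.b_scanner = np.zeros(N_SCANNERS)

    def disease(self, features):
        """Probability of PD for each scan in the batch, shape (batch,)."""
        return sigmoid(features @ self.w_pd + self.b_pd)

    def scanner(self, features):
        """Probability distribution over scanner types, shape (batch, 23)."""
        return softmax(features @ self.w_scanner + self.b_scanner)


# A batch of three scans, as at a small center; random features stand in for the encoder.
features = np.random.default_rng(1).normal(size=(3, N_FEATURES))
heads = Heads()
print(heads.disease(features).shape, heads.scanner(features).shape)
```

Because both heads consume the same feature vector, any scanner information the domain head can read is, by construction, also present in the features used for PD classification; this shared representation is what the harmonization steps act on.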
The scanner harmonization process involves four distinct losses. In the first step, a PD classification loss [Eq. (1)], the binary cross-entropy, is minimized. This adjusts the model’s parameters to reduce the difference between its predictions and the ground-truth labels (PD versus HP). In the second step, a scanner classification loss [Eq. (2)] is minimized with a similar objective as in step 1. However, in this step, the categorical cross-entropy is minimized because the prediction involves identifying scanner types, and only the domain head is optimized. In the third step, an adversarial confusion loss [Eq. (3)] is computed, introducing an objective that counteracts step 2: whereas step 2 trains the model to recognize scanner types precisely, step 3 aims to remove information related to scanner types from the feature representations. In Eq. (3), the batch size enters explicitly because it varies depending on the amount of data available at each center. Finally, the total loss [Eq. (4)] is computed as the sum of the three losses described above and is used to optimize the encoder. Compared with the implementation proposed in Ref. 29, two important adaptations were made: (1) introducing an explicit batch-size term to manage varying batch sizes during training, which in our case accounts for the differing amounts of data across centers, and (2) eliminating the oversampling that the original method29 used to ensure that every scanner type was represented in each batch. Their centralized approach allowed control over batch composition, ensuring the representation of every scanner type and oversampling underrepresented types to achieve balance. However, this is not feasible in the TM approach, where each center only has access to its own data. The harmonization process was iterated through 30 cycles, taking to complete.
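The following NumPy sketch shows one plausible form of the losses in Eqs. (1) to (4), not the exact published equations. It assumes binary and categorical cross-entropy averaged over a batch of N scans, a confusion loss in the spirit of the domain confusion loss used by Dinsdale et al. (the cross-entropy between a uniform target over the D scanner types and the domain head's softmax output), and unit weights in the total loss. Averaging over the actual batch size is how this sketch accommodates batches of one to five scans; the function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # clip probabilities to avoid log(0)


def pd_loss(p_pd, y_pd):
    """Binary cross-entropy averaged over the batch (PD = 1, HP = 0)."""
    p = np.clip(p_pd, EPS, 1.0 - EPS)
    return -np.mean(y_pd * np.log(p) + (1 - y_pd) * np.log(1 - p))


def scanner_loss(q, scanner_ids):
    """Categorical cross-entropy averaged over the batch; q has shape (N, D)."""
    n = q.shape[0]
    return -np.mean(np.log(np.clip(q[np.arange(n), scanner_ids], EPS, 1.0)))


def confusion_loss(q):
    """Cross-entropy between a uniform target over D scanner types and q, averaged over N.

    It is smallest (equal to log D) when every prediction is uniform, i.e., when the
    features carry no scanner information the domain head can use.
    """
    d = q.shape[1]
    return -np.mean(np.sum(np.log(np.clip(q, EPS, 1.0)) / d, axis=1))


def total_loss(p_pd, y_pd, q, scanner_ids):
    """Unweighted sum of the three losses (unit weights are an assumption)."""
    return pd_loss(p_pd, y_pd) + scanner_loss(q, scanner_ids) + confusion_loss(q)


# Hypothetical batch of N = 2 scans and D = 23 scanner types.
uniform = np.full((2, 23), 1.0 / 23)
peaked = np.full((2, 23), 0.01 / 22)
peaked[:, 0] = 0.99
print(confusion_loss(uniform), np.log(23))  # equal: no usable scanner information
print(confusion_loss(peaked))               # larger: scanner is still identifiable
```

In an adversarial setup of this kind, each term is applied to different parameters in its own step (encoder and PD head, then domain head only, then encoder), so the scanner loss and the confusion loss pull in opposite directions without simply cancelling.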
The code is available in a GitHub repository at https://github.com/RaissaSouza/scanner-harmonization.

2.4. Baseline

As a baseline for comparison with our TM approach, we trained the same deep learning architecture outlined in Sec. 2.3 in a centralized fashion. Here, both the model and the entire database are available at a single center, where training takes place. We used a batch size of five with shuffling and an initial learning rate of 0.001 to keep the results as comparable as possible. Unlike the TM approach, where each batch only includes data from a single center, batches in the centralized approach may comprise data from various centers. The two adaptations made for the TM approach, incorporating the batch-size term and eliminating oversampling, were maintained in the centralized approach.

2.5. Task Evaluation

To quantitatively evaluate the PD classification performance, we employed the following metrics: AUROC, accuracy, sensitivity, specificity, precision, and F1-score. AUROC measures the model’s ability to distinguish between patients with PD and HP across all possible decision thresholds. In contrast, classification accuracy measures the overall correctness of the model as the ratio of correctly predicted instances to the total number of instances at a fixed threshold of 0.5. Sensitivity measures the proportion of patients with PD correctly identified by the model, whereas specificity measures the proportion of healthy participants correctly identified by the model. Precision measures the proportion of predicted positives that are true positives. Finally, the F1-score, the harmonic mean of precision and sensitivity, provides a balanced measure that accounts for both false positives and false negatives. For all metrics, a higher score indicates better predictive performance.
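The PD metrics defined above can be computed from predicted probabilities with a few lines of NumPy. This is a minimal sketch, not the evaluation code used in this work: the function names are hypothetical, the threshold of 0.5 follows the text, and AUROC is computed via its rank interpretation (the probability that a randomly chosen PD case scores higher than a randomly chosen HP case, with ties counted as one half), which is equivalent to the area under the ROC curve.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC as the probability that a random PD case outranks a random HP case (ties count 0.5)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def binary_metrics(y_true, p_pd, threshold=0.5):
    """Threshold-based metrics plus AUROC for PD (1) versus HP (0)."""
    y_true = np.asarray(y_true).astype(int)
    p_pd = np.asarray(p_pd, dtype=float)
    y_pred = (p_pd >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "auroc": auroc(y_true, p_pd),
    }


# Hypothetical predictions for two PD cases and two healthy participants.
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(metrics)
```

The example illustrates why AUROC and threshold-based metrics can disagree: three of the four PD/HP pairs are ranked correctly (AUROC 0.75), yet at the fixed 0.5 threshold only half of the scans are classified correctly.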
The accuracy, sensitivity, specificity, precision, and F1-score were also used to assess the scanner information within the trained model’s feature representation. Because scanner classification is a multi-class problem, each metric was computed as a weighted average accounting for class size. In addition, confusion matrices were used to visually investigate the degree of information encoded for each scanner type.

2.6. Feature Representation Evaluation

We used an unsupervised ML approach to evaluate how the proposed harmonization procedure affects the encoding of information related to scanner type and disease status (PD versus HP) within the model. Following Ref. 47, principal component analysis (PCA) was applied to the features of the encoder’s last layer to investigate their underlying patterns. Similar to Ref. 47, we then generated two-dimensional scatter plots of the first two dominant PCA modes, colored by scanner type and by disease status. Finally, logistic regression was used to measure the degree to which scanner types and main task classes (PD versus HP) can be linearly separated within each PCA mode.

3. Results

The results of this work show that the PD classification performance of the TM not only remains stable but also improves after removing scanner-specific information from the feature representation (see Table 2). As expected, the scanner classification performance drops across all metrics after harmonization. Moreover, the TM approach is less prone to encoding scanner information before harmonization (53% accuracy) than the centralized approach (65% accuracy). Nevertheless, in both cases, the harmonization method reduces scanner classification performance to 30% accuracy and sensitivity, 96% specificity, 12% precision, and 16% F1-score.
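The PCA-plus-logistic-regression analysis of Sec. 2.6 can be sketched in NumPy as follows. This is an illustration on synthetic features, not the study's data or code: PCA is computed via a singular value decomposition, a binary logistic regression is fitted on a single PCA mode by plain gradient descent, and training-set accuracy is reported as the measure of linear separability. These are assumptions; a library implementation (e.g., scikit-learn's LogisticRegression) could be used instead, and the scanner-type analysis would require a multinomial rather than binary model.

```python
import numpy as np


def pca_modes(features, n_modes=2):
    """Project features onto their first n_modes principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_modes].T


def logistic_accuracy_1d(x, y, lr=0.5, n_iter=2000):
    """Fit a binary logistic regression on one PCA mode by gradient descent.

    Returns the training accuracy, i.e., how linearly separable the two classes
    are along that single mode.
    """
    x = (x - x.mean()) / x.std()  # standardize the mode
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return np.mean(((w * x + b) >= 0) == y)


# Synthetic stand-in for encoder features: 100 HP and 100 PD "scans" with 20 features,
# where PD cases are shifted along the first feature only.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 20))
labels = np.repeat([0, 1], 100)
features[labels == 1, 0] += 4.0

modes = pca_modes(features)
acc_mode1 = logistic_accuracy_1d(modes[:, 0], labels)
acc_mode2 = logistic_accuracy_1d(modes[:, 1], labels)
print(f"mode 1: {acc_mode1:.2f}, mode 2: {acc_mode2:.2f}")
```

Here the class difference dominates the variance, so it is captured by PCA mode 1 and the logistic regression separates the classes well on that mode but not on mode 2; the analysis in the text uses the same logic to ask which modes carry disease or scanner information before and after harmonization.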
Figure 2 supports these scanner classification findings, as illustrated by the confusion matrices of the centralized and TM approaches before and after harmonization. The matrices display the number of predictions for each pair of actual and predicted classes, where each row represents an actual scanner type and each column a predicted scanner type. Fig. 2(b) shows that, after scanner harmonization, most datasets are misclassified as Siemens Skyra or Siemens Trio Tim. Together, these two scanners constitute the largest portion (37.5%) of the test set (Fig. S1 in the Supplementary Material shows the proportion of each scanner type in the test set). Furthermore, our results demonstrate that the TM approach benefits most from scanner harmonization in PD classification, achieving 76% accuracy, 82% AUROC, 83% sensitivity, and 75% F1-score, larger gains than those of the centralized approach. None of the models showed improvements in specificity and precision after harmonization, although their performance remained comparable.

Table 2. Parkinson’s disease and scanner classification performances before and after scanner harmonization.
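The support-weighted scanner metrics can be derived directly from a confusion matrix laid out as in Fig. 2 (rows are actual scanner types, columns are predicted ones). The following NumPy sketch uses a small hypothetical three-scanner matrix, not the study's results, and the function name weighted_metrics is illustrative. It also shows that, when per-class recall is weighted by class size, weighted sensitivity is mathematically identical to overall accuracy, which is consistent with the identical accuracy and sensitivity values reported for scanner classification; weighted specificity likewise tends to be high when there are many classes, because most scans are true negatives for most scanner types.

```python
import numpy as np


def weighted_metrics(cm):
    """Support-weighted precision, sensitivity, specificity and F1 from a confusion matrix.

    cm[i, j] counts scans of actual scanner i predicted as scanner j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    support = cm.sum(axis=1)  # actual scans per scanner type
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = support - tp
    tn = total - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(support > 0, tp / support, 0.0)
        specificity = np.where(tn + fp > 0, tn / (tn + fp), 0.0)
        f1 = np.where(precision + recall > 0, 2 * precision * recall / (precision + recall), 0.0)
    weights = support / total
    return {
        "accuracy": tp.sum() / total,
        "precision": np.sum(weights * precision),
        "sensitivity": np.sum(weights * recall),
        "specificity": np.sum(weights * specificity),
        "f1": np.sum(weights * f1),
    }


# Hypothetical matrix in which predictions collapse onto the two most common scanners.
cm = np.array([[5, 3, 0],
               [4, 1, 0],
               [2, 1, 0]])
m = weighted_metrics(cm)
print({k: round(float(v), 3) for k, v in m.items()})
```

The identity follows from the weights: summing (n_c / N) * (TP_c / n_c) over classes gives the total number of correct predictions divided by N, which is the accuracy.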
Figures 3 and 4 display the distributions of the first two PCA modes of the models’ feature representations, color-coded by scanner type in Fig. 3 and by disease class in Fig. 4. Fig. 3 shows that, before scanner harmonization, the scanner-type distributions within the feature representation are more distinct and less overlapping than after harmonization. In contrast, Fig. 4 shows that the distributions of patients with PD and healthy participants overlap more before harmonization and become more distinguishable after harmonization. Logistic regression analysis reveals that, before scanner harmonization, PCA mode 1 encodes some information for disease classification, achieving accuracies of 58% and 53% for the centralized and TM approaches, respectively. After harmonization, these accuracies increase to 71% and 75%, respectively. Moreover, logistic regression analysis revealed that PCA modes 1 and 2 encode some information for scanner classification, yielding accuracies of 38% and 35% for the centralized and TM approaches, respectively, before harmonization, which decrease to 28% after harmonization. The complete logistic regression results are provided in Tables S1 and S2 in the Supplementary Material.

4. Discussion

In this work, we introduced HarmonyTM, the first data harmonization method specifically designed and evaluated for the TM approach. The results showed that HarmonyTM is effective in creating a feature representation that reduces image acquisition biases (i.e., spurious correlations) and enhances disease-related information, achieving the highest PD classification performance after scanner harmonization.
We focused this first feasibility analysis on scanners because of their well-known bias effects.9 Although our study demonstrated the effectiveness of HarmonyTM in harmonizing T1-weighted MRI data from different scanners, the method can be readily adapted to address other potential spurious correlations and other imaging modalities. However, further experimental validation is needed to confirm its broader applicability. Notably, the TM approach itself was found to be less prone to learning spurious correlations than the centralized approach, even before data harmonization. This resistance may be attributed to the fact that each training batch originates from a single center in the TM setup. Because data from multiple centers are never mixed in a single batch, the model is less likely to learn image acquisition biases associated with cross-center factors (e.g., scanner types). Conversely, the TM approach’s use of single-center batches may explain why it achieved less balanced metrics than the centralized approach after harmonization. The centralized approach, which includes data from multiple centers in each batch, increases the likelihood of training on batches in which both classes are represented. In contrast, in the TM approach, training batches from centers such as OASIS, SALD, and ADNI consist only of healthy participants, and those from centers such as UOA, RUH, and some PPMI centers include only PD cases. Our results indicate that, before scanner harmonization, the difference between the distributions of patients with PD and healthy participants within the feature representation of the encoder’s last layer was barely noticeable in both the centralized and TM approaches.
Conversely, the distributions of scanner types exhibited more noticeable differences, allowing for some degree of classification, with the logistic regression on the two PCA modes achieving higher accuracy for the model trained using the centralized approach than for the model trained using the TM approach. These observations are consistent with previous research,2 which demonstrated that the feature representation of a PD classifier trained with the centralized approach can be used directly to classify scanner types, with even better performance for scanner type classification than for PD classification. After scanner harmonization, this pattern shifted: the distinctions between patients with PD and healthy participants became more pronounced, whereas the differences between scanner-type distributions became less prominent. This pattern aligns with the results reported by Dinsdale et al.,29 who introduced this method for the centralized training approach. Although their research suggests that removing image acquisition biases could lead to a loss of disease-related information when the entire dataset is used for harmonization, we chose to apply harmonization to the full training dataset. As a result, we did not impose restrictions based on data availability, which could substantially reduce center participation in real-life scenarios. Despite these concerns, we found that HarmonyTM benefits from using the full dataset for harmonization: rather than degrading performance, this approach improved disease classification. It is essential to highlight some limitations of this work. First, we used a single established PD classifier model, which was trained using a realistic multi-site database. In addition, only one specific but widely used image modality (T1-weighted MRI) was investigated for harmonization.
Therefore, the performance of HarmonyTM in other scenarios involving different deep learning architectures, other medical imaging modalities, and additional medical imaging tasks has yet to be demonstrated. It is also crucial to recognize that biases must first be identified in the data, as unknown biases cannot be unlearned. Nevertheless, our work used the largest multi-center PD database, which encompasses various scanner types and small datasets from many centers, potentially enhancing the generalizability of the results compared with many other multi-center analyses. Although HarmonyTM can be applied to any two-dimensional (2D) or 3D data, the results may vary depending on the degree of spurious correlations present in a specific image modality. Furthermore, training was performed on a computer equipped with an NVIDIA GeForce RTX 3090 GPU, and we expect that any center with comparable resources should be capable of training HarmonyTM. Last, we used only one training epoch at each center per cycle to minimize the risk of overfitting. However, we did not assess the performance at each center to verify that overfitting did not occur. Additional strategies, such as regularization and the use of an external validation set, could be implemented to further analyze and prevent overfitting. Future work could address several limitations identified in this study. These include exploring HarmonyTM’s effectiveness across different deep learning architectures, medical imaging modalities, and diagnostic tasks to assess its generalizability. In addition, further investigation into strategies for mitigating overfitting and into the minimal computational resources required would enhance its viability for under-resourced centers.
5. Conclusion

This work introduced HarmonyTM, a method specifically designed to harmonize 3D MRI data in the context of the TM approach. HarmonyTM tackles the issue of image acquisition biases across centers, a common challenge in systems that learn from data distributed across multiple locations. To the best of our knowledge, this is the first work implementing a data harmonization method for the TM approach. Our findings demonstrate the effectiveness of HarmonyTM in generating features with reduced influence from image acquisition biases, such as scanner types, while not only maintaining but also improving PD classification performance. In addition, our results emphasize that the TM approach has an inherent resistance to learning image acquisition biases. This property is crucial for developing clinically useful deep learning models with broad applicability.

Disclosure

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Code and Data Availability

Image data used were provided in part by the UK Biobank (application number 77508); by the PPMI, a public–private partnership funded by the Michael J. Fox Foundation; by the OASIS-3 project (principal investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50 AG00561, P30 NS09857781, P01 AG026276, P01 AG003991, R01 AG043434, UL1 TR000448, R01 EB009352); by the OpenfMRI database (accession number ds000245); and by the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a partnership involving multiple centers across North America with the goal of tracking participants through periods of cognitive decline and dementia. Launched in 2003, ADNI continues to evaluate biomarker, neuroimaging, and neuropsychological status in participants.

Author Contributions

R.S., E.A.M.S., M.W., and N.D.F. contributed to the study’s conception. R.C., O.M., and Z.I. contributed to data acquisition and data curation. V.G. and J.M. contributed to the software implementation. R.S., E.A.M.S., C.K., N.D.F., and M.W. analyzed the results. R.S. wrote the first draft of the article. All authors critically revised previous versions of the article and approved the final article.

Acknowledgments

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found in Ref. 48. This work was supported by the Parkinson Association of Alberta, the Hotchkiss Brain Institute, the Canadian Consortium on Neurodegeneration in Aging (CCNA), the Canadian Open Neuroscience Platform (CONP), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program, the River Fund at Calgary Foundation, the Canadian Institutes for Health Research, the Tourmaline Chair in Parkinson disease, and the Institut de valorisation des données (IVADO).

References

A. Tuladhar, D. Rajashekar and N. D. Forkert, “Distributed learning in healthcare,” Integr. Sci., 9 183–212 https://doi.org/10.1007/978-3-031-11199-0_10 (2022).
R. Souza et al., “Identifying biases in a multicenter MRI database for Parkinson’s disease classification: is the disease classifier a secret site classifier?,” IEEE J. Biomed. Health Inf. 28, 2047–2054 (2024). https://doi.org/10.1109/JBHI.2024.3352513
F. H. Alhazmi et al., “The effect of the MR pulse sequence on the regional corpus callosum morphometry,” Insights Imaging 11, 17 (2020). https://doi.org/10.1186/s13244-019-0821-8
J. Jovicich et al., “Brain morphometry reproducibility in multi-center 3T MRI studies: a comparison of cross-sectional and longitudinal segmentations,” NeuroImage 83, 472–484 (2013). https://doi.org/10.1016/j.neuroimage.2013.05.007
C. L. Tardif, D. L. Collins and G. B. Pike, “Regional impact of field strength on voxel-based morphometry results,” Hum. Brain Mapp. 31, 943–957 (2010). https://doi.org/10.1002/hbm.20908
C. L. Tardif et al., “Sensitivity of voxel-based morphometry analysis to choice of imaging protocol at 3 T,” NeuroImage 44, 827–838 (2009). https://doi.org/10.1016/j.neuroimage.2008.09.053
X. Han et al., “Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer,” NeuroImage 32, 180–194 (2006). https://doi.org/10.1016/j.neuroimage.2006.02.051
B. Glocker et al., “Machine learning with multi-site imaging data: an empirical study on the impact of scanner effects,” (2019).
R. Souza et al., “Image-encoded biological and non-biological variables may be used as shortcuts in deep learning models trained on multisite neuroimaging data,” J. Am. Med. Inf. Assoc. 30, 1925–1933 (2023). https://doi.org/10.1093/jamia/ocad171
Y. Xuan et al., “Standardization and harmonization of distributed multi-center proteotype analysis supporting precision medicine studies,” Nat. Commun. 11, 5248 (2020). https://doi.org/10.1038/s41467-020-18904-9
S. Saponaro et al., “Multi-site harmonization of MRI data uncovers machine-learning discrimination capability in barely separable populations: an example from the ABIDE dataset,” NeuroImage Clin. 35, 103082 (2022). https://doi.org/10.1016/j.nicl.2022.103082
C. Marzi et al., “Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets,” Sci. Data 11, 115 (2024). https://doi.org/10.1038/s41597-023-02421-7
R. Geirhos et al., “Shortcut learning in deep neural networks,” Nat. Mach. Intell. 2, 665–673 (2020). https://doi.org/10.1038/s42256-020-00257-z
F. Hu et al., “Image harmonization: a review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization,” NeuroImage 274, 120125 (2023). https://doi.org/10.1016/j.neuroimage.2023.120125
F. Zhao et al., “Harmonization of infant cortical thickness using surface-to-surface cycle-consistent adversarial networks,” Lect. Notes Comput. Sci. 11767, 475–483 (2019). https://doi.org/10.1007/978-3-030-32251-9_52
D. Moyer et al., “Scanner invariant representations for diffusion MRI harmonization,” Magn. Reson. Med. 84, 2174–2189 (2020). https://doi.org/10.1002/mrm.28243
L. Zuo et al., “Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory,” NeuroImage 243, 118569 (2021). https://doi.org/10.1016/j.neuroimage.2021.118569
V. M. Bashyam et al., “Deep generative medical image harmonization for improving cross-site generalization in deep learning predictors,” J. Magn. Reson. Imaging 55, 908–916 (2022). https://doi.org/10.1002/jmri.27908
M. Liu et al., “Style transfer using generative adversarial networks for multi-site MRI harmonization,” Lect. Notes Comput. Sci. 12903, 313–322 (2021). https://doi.org/10.1007/978-3-030-87199-4_30
B. E. Dewey et al., “DeepHarmony: a deep learning approach to contrast harmonization across scanner changes,” Magn. Reson. Imaging 64, 160–170 (2019). https://doi.org/10.1016/j.mri.2019.05.041
H. B. McMahan et al., “Communication-efficient learning of deep networks from decentralized data,” (2017).
R. Souza et al., “An analysis of the effects of limited training data in distributed learning scenarios for brain age prediction,” J. Am. Med. Inf. Assoc. 30, 112–119 (2022). https://doi.org/10.1093/jamia/ocac204
N. K. Dinsdale, M. Jenkinson and A. I. L. Namburete, “FedHarmony: unlearning scanner bias with distributed data,” Lect. Notes Comput. Sci. 13438, 695–704 (2022). https://doi.org/10.1007/978-3-031-16452-1_66
M. Jiang et al., “HarmoFL: harmonizing local and global drifts in federated learning on heterogeneous medical images,” Proc. AAAI Conf. Artif. Intell. 36, 1087–1095 (2022). https://doi.org/10.1609/aaai.v36i1.19993
S. T. Arasteh et al., “Mind the gap: federated learning broadens domain generalization in diagnostic AI models,” Sci. Rep. 13, 22576 (2023). https://doi.org/10.1038/s41598-023-49956-8
R. Souza et al., “A multi-center distributed learning approach for Parkinson’s disease classification using the traveling model paradigm,” Front. Artif. Intell. 7 (2024). https://doi.org/10.3389/frai.2024.1301997
D. Taruscio et al., “The occurrence of 275 rare diseases and 47 rare disease groups in Italy. Results from the National Registry of Rare Diseases,” Int. J. Environ. Res. Public Health 15, 1470 (2018). https://doi.org/10.3390/ijerph15071470
R. Souza, E. A. M. Stanley and N. D. Forkert, “On the relationship between open science in artificial intelligence for medical imaging and global health equity,” Lect. Notes Comput. Sci. 14242, 289–300 (2023). https://doi.org/10.1007/978-3-031-45249-9_28
N. K. Dinsdale et al., “Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal,” NeuroImage 228, 117689 (2021). https://doi.org/10.1016/j.neuroimage.2020.117689
D. Wei et al., “Structural and functional brain scans from the cross-sectional Southwest University adult lifespan dataset,” Sci. Data 5, 1–10 (2018). https://doi.org/10.1038/sdata.2018.134
“To accelerate discovery and deliver cures,” https://www.mcgill.ca/neuro/open-science/c-big-repository
L. Badea et al., “Exploring the reproducibility of functional connectivity alterations in Parkinson’s disease,” PLoS One 12, e0188196 (2017). https://doi.org/10.1371/journal.pone.0188196
P. J. LaMontagne et al., “OASIS-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease,” (2019).
S. Lang et al., “Network basis of the dysexecutive and posterior cortical cognitive profiles in Parkinson’s disease,” Mov. Disord. 34, 893–902 (2019). https://doi.org/10.1002/mds.27674
A. Hanganu et al., “Mild cognitive impairment is linked with faster rate of cortical thinning in patients with Parkinson’s disease longitudinally,” Brain 137, 1120–1129 (2014). https://doi.org/10.1093/brain/awu036
H. J. Acharya et al., “Axial signs and magnetic resonance imaging correlates in Parkinson’s disease,” Can. J. Neurol. Sci. 34, 56–61 (2007). https://doi.org/10.1017/S0317167100005795
S. Duchesne et al., “The Canadian dementia imaging protocol: harmonizing national cohorts,” J. Magn. Reson. Imaging 49, 456–465 (2019). https://doi.org/10.1002/jmri.26197
A. S. Talai et al., “Utility of multi-modal MRI for differentiating of Parkinson’s disease and progressive supranuclear palsy using machine learning,” Front. Neurol. 12, 648548 (2021). https://doi.org/10.3389/fneur.2021.648548
C. R. Jack et al., “The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods,” J. Magn. Reson. Imaging 27, 685–691 (2008). https://doi.org/10.1002/jmri.21049
C. Sudlow et al., “UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age,” PLoS Med. 12, e1001779 (2015). https://doi.org/10.1371/journal.pmed.1001779
F. Isensee et al., “Automated brain extraction of multisequence MRI using artificial neural networks,” Hum. Brain Mapp. 40, 4952–4964 (2019). https://doi.org/10.1002/hbm.24750
Y. Xiao et al., “Multi-contrast unbiased MRI atlas of a Parkinson’s disease population,” Int. J. Comput. Assist. Radiol. Surg. 10, 329–341 (2015). https://doi.org/10.1007/s11548-014-1068-y
H. Peng et al., “Accurate brain age prediction with lightweight deep neural networks,” Med. Image Anal. 68, 101871 (2021). https://doi.org/10.1016/j.media.2020.101871
M. Camacho et al., “Explainable classification of Parkinson’s disease using deep learning trained on a large multi-center database of T1-weighted MRI datasets,” NeuroImage Clin. 38, 103405 (2023). https://doi.org/10.1016/j.nicl.2023.103405
B. Glocker et al., “Algorithmic encoding of protected characteristics in chest X-ray disease detection models,” EBioMedicine 89, 104467 (2023). https://doi.org/10.1016/j.ebiom.2023.104467
Biography
Raissa Souza earned her BSc degree in computer science from São Paulo State University, Brazil, in 2017. During her studies, she spent time at UCSD and UCLA, where she discovered her passion for applying computer science to improve medical care. Now a PhD candidate in biomedical engineering at the University of Calgary, she focuses her research on developing distributed learning methods for training models on small datasets, with the aim of making machine learning more accessible in under-resourced areas.

Emma A. M. Stanley received her Bachelor of Applied Science degree in chemical and biological engineering from the University of British Columbia in 2020. She is currently a PhD candidate in biomedical engineering at the University of Calgary, where her research focuses on developing a deeper understanding of bias and fairness in AI for medical image analysis.

Vedant Gulve was an undergraduate student at the Indian Institute of Technology (2023) when he joined the Image Processing and Machine Learning Laboratory at the University of Calgary for a summer internship under the supervision of Dr. Nils Forkert.

Jasmine Moore completed her BSc degree in physics at the University of British Columbia in 2016 and is now pursuing a PhD in biomedical engineering, with a specialization in medical imaging, at the University of Calgary. Her project revolves around computationally modeling neurological conditions using deep learning architectures.

Chris Kang received his PhD in applied mathematics from Washington State University in 2023, under the supervision of Dr. Nikolaos K. Voulgarakis. He is currently an Alberta Innovates postdoctoral fellow in the Medical Image Processing and Machine Learning Laboratory, mentored by Dr. Nils Forkert. His research interests include Boolean networks, the critical dynamics of complex systems, and Hopfield networks.
Richard Camicioli trained in engineering and medicinal chemistry before earning his MD, CM, from McGill University, where he also completed his neurology residency. He pursued fellowship training in geriatric neurology at Oregon Health & Science University, joining its faculty in 1994. In 2000, he became an associate professor at the University of Alberta, advancing to full professor in 2008. His research focuses on cognitive dysfunction in Parkinson’s disease and on motor disorders, such as gait disturbances, in aging and dementia.

Oury Monchi earned his PhD in computational neuroscience from King’s College London and completed postdoctoral fellowships in neuroimaging at the Montreal Neurological Institute and CRIUGM. He served as a professor at the Université de Montréal and the University of Calgary, where he held the Canada Research Chair in non-motor symptoms of Parkinson’s disease. Currently, he is the scientific director at CRIUGM and a professor at the Université de Montréal, pioneering neuroimaging studies on Parkinson’s disease and dementia prediction.

Zahinoor Ismail is a clinician scientist and a professor of Psychiatry, Neurology, Epidemiology, and Pathology at the University of Calgary. Certified in Behavioral Neurology & Neuropsychiatry and in Geriatric Psychiatry, he has over 25 years of clinical experience. His research focuses on neuropsychiatric disease measurement and diagnosis, dementia markers, and biomarkers. He chairs the Canadian Conference on Dementia and has led key advancements in dementia and neuropsychiatric syndrome classification and nosology.

Matthias Wilms is an assistant professor at the University of Calgary in the Departments of Paediatrics, Community Health Sciences, and Radiology. He earned his BSc degree in applied computer science from the Hamburg University of Applied Sciences, his MSc degree in computer science from the University of Hamburg, and his PhD from the University of Luebeck. His research focuses on the development of innovative machine-learning-based medical image analysis methods.

Nils D. Forkert is a full professor at the University of Calgary, with appointments in Radiology and Clinical Neurosciences. He obtained his diploma in computer science (2009) and his PhD (2013) from the University of Hamburg, as well as his master’s degree in medical physics (2012) from the Technical University of Kaiserslautern. A Canada Research Chair in Medical Image Analysis, he specializes in developing advanced image processing and predictive algorithms for medical data analysis.