medigan: a Python library of pretrained generative models for medical image synthesis

Abstract. Purpose Deep learning has shown great promise as the backbone of clinical decision support systems. Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models. However, there is (1) limited availability of (synthetic) datasets and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To reduce this entry barrier, we explore generative model sharing to allow more researchers to access, generate, and benefit from synthetic data. Approach We propose medigan, a one-stop shop for pretrained generative models implemented as an open-source framework-agnostic Python library. After gathering end-user requirements, design decisions based on usability, technical feasibility, and scalability are formulated. Subsequently, we implement medigan based on modular components for generative model (i) execution, (ii) visualization, (iii) search & ranking, and (iv) contribution. We integrate pretrained models with applications across modalities such as mammography, endoscopy, x-ray, and MRI. Results The scalability and design of the library are demonstrated by its growing number of integrated and readily-usable pretrained generative models, which include 21 models utilizing nine different generative adversarial network architectures trained on 11 different datasets. We further analyze three medigan applications, which include (a) enabling community-wide sharing of restricted data, (b) investigating generative model evaluation metrics, and (c) improving clinical downstream tasks. In (b), we extract Fréchet inception distances (FID) demonstrating FID variability based on image normalization and radiology-specific feature extractors. Conclusion medigan allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code. 
We show medigan's viability as a platform for generative model sharing that is capable of enriching and accelerating the development of clinical machine learning models. Our multimodel synthetic data experiments uncover standards for assessing and reporting metrics, such as FID, in image synthesis studies.


Deep Learning and the Benefits of Synthetic Data
The use of deep learning has increased extensively in the last decade, thanks in part to advances in computing technology (e.g., data storage, graphics processing units) and the digitization of data. In medical imaging, deep learning algorithms have shown promising potential for clinical use due to their capability of extracting and learning meaningful patterns from imaging data and their high performance on clinically-relevant tasks. [9] However, deep learning models need vast amounts of well-annotated data to reliably learn to perform clinical tasks, whereas, at the same time, the availability of public medical imaging datasets remains limited due to legal, ethical, and technical patient data sharing constraints. 9,10 In the common scenario of limited imaging data, synthetic images, such as the ones illustrated in Fig. 1, are a useful tool to improve the learning of the artificial intelligence (AI) algorithm, e.g., by enlarging its training dataset. 7,11,12 Furthermore, synthetic data can be used to minimize problems associated with domain shift, data scarcity, class imbalance, and data privacy. 7 For instance, a dataset can be balanced by populating the less frequent classes with synthetic data during training (class imbalance). Further, as a domain-adaptation technique, a dataset can be translated from one domain to another, e.g., from MRI to CT 13 (domain shift). Regarding data privacy, synthetic data can be shared instead of real patient data to improve privacy preservation. 7,14,15

The Need of Reusable Synthetic Data Generators
Commonly, generative models are used to produce synthetic imaging data, with generative adversarial networks (GANs) 16 being popular models of choice. However, the adversarial training scheme required by GANs and related networks is known to pose challenges in regard to (i) achieving training stability, (ii) avoiding mode collapse, and (iii) reaching convergence. [17][18][19] Hence, the training process of GANs and generative models at large is nontrivial and requires a considerable time investment for each training iteration as well as specific hardware and a fair amount of knowledge and skills in the area of AI and generative modeling. Given these constraints, researchers and engineers often refrain from generating and integrating synthetic data into their AI training pipelines and experiments. This issue is further exacerbated by the prevailing need to train a new generative model for each new data distribution, which, in practice, often means that a new generative model has to be trained for each new application, use-case, and dataset.

Fig. 1 Randomly sampled images generated by five medigan models ranging from (a) synthetic mammograms and (b) brain MRI to (c) endoscopy imaging of polyps, (d) mammogram mass patches, and (e) chest x-ray imaging. The models (a)-(e) correspond to the model IDs in Table 3, where (a) 3, (b) 7, (c) 10, (d) 12, and (e) 19.

Community-Driven Model Sharing and Reuse
We argue that a feasible solution to this problem is the community-wide sharing and reuse of pretrained generative models. Once successfully trained, such a model can be of value to multiple researchers and engineers with similar needs. For example, researchers can reuse the same model if they work on the same problem, conduct similar experiments, or evaluate their methods on the same dataset. We note that such reuse should ideally follow an inspection of the generative model's limitations, with the model's output quality having qualified as suitable for the task at hand. The quality of a model's output data and annotations can commonly be measured via (a) expert assessment, (b) computation of image quality metrics, or (c) downstream task evaluation. In sum, the problem of synthetic data generation calls for a community-driven solution, where a generative model trained by one member of the community can be reused by other members of the community. Motivated by the absence of such a community-driven solution for synthetic medical data generation, we designed and developed medigan to bridge the gap between the need for synthetic data and complex generative model creation and training processes.
2 Background and Related Work

Generative Models
While discriminative models are able to distinguish between data instances of different kinds (label samples), generative models are able to generate new data instances (draw samples). In contrast to modeling decision boundaries in a data space, generative models model how data is distributed within that space. Deep generative models 20 are composed of neural networks with multiple hidden layers that explicitly or implicitly estimate a probability density function (PDF) from a set of real data samples. After approximating the PDF from observed data points (i.e., learning the real data distribution), these models can then sample unobserved new data points from that distribution. In computer vision and medical imaging, synthetic images are generated by sampling such unobserved points from high-dimensional imaging data distributions. Popular deep generative models to create synthetic images in these fields include variational autoencoders, 21 normalizing flows, [22][23][24] diffusion models, [25][26][27] and GANs. 16 Of these, the versatile GAN framework has seen the most widespread adoption in medical imaging to date. 7 We, hence, center our attention on GANs in the remainder of this work but emphasize that contributions of other types of generative models are equally welcome in the medigan library.
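The "estimate a distribution, then draw new samples" idea can be illustrated with a deliberately tiny stand-in for a deep generative model. The sketch below assumes a one-dimensional Gaussian as the model family; the data values and function names are hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """Estimate the parameters of a 1D Gaussian from observed data points."""
    return statistics.fmean(samples), statistics.stdev(samples)


def draw_samples(mean, std, n, seed=0):
    """Sample new, previously unobserved points from the estimated distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


# Hypothetical "real" measurements (toy values, not taken from any dataset).
real = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
mean, std = fit_gaussian(real)
synthetic = draw_samples(mean, std, n=1000)
```

A deep generative model replaces the two Gaussian parameters with a neural network, but the two phases (fitting on real data, then sampling) stay the same.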

Generative Adversarial Networks
The training of GANs comprises two neural networks, the generator network (G) and the discriminator network (D), as illustrated by Fig. 2 for the example of mammography region-of-interest patch generation. G and D compete against each other in a two-player zero-sum game defined by the value function shown in Eq. (1). Subsequent studies extended the adversarial learning scheme by proposing innovations of the loss function, G and D network architectures, and GAN applications by introducing conditions into the image generation process.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{1}$$

GAN loss functions
Goodfellow et al. 16 define the discriminator as a binary classifier classifying whether a sample x is either real or generated. The discriminator is, hence, trained via binary cross-entropy with the objective of minimizing the adversarial loss function shown in Eq. (2), which the generator, on the other hand, tries to maximize. In Wasserstein GAN (WGAN), 28 the adversarial loss function is replaced with a loss function based on the Wasserstein-1 distance between real and fake sample distributions estimated by D (alias "critic"). Gulrajani et al. 29 resolve the need to enforce a 1-Lipschitz constraint in WGAN via gradient penalty (WGAN-GP) instead of WGAN weight clipping. Equation (3) depicts the WGAN-GP discriminator loss with penalty coefficient $\lambda$ and distribution $P_{\hat{x}}$ based on sampled pairs from (a) the real data distribution $P_{data}$ and (b) the generated data distribution $P_g$.

$$L_{D_{GAN}} = -\mathbb{E}_{x \sim P_{data}}[\log D(x)] - \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{2}$$

$$L_{D_{WGAN\text{-}GP}} = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_{data}}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\right] \tag{3}$$

In addition to changes to the adversarial loss, further studies integrate additional loss terms into the GAN framework. For instance, FastGAN 30 uses an additional reconstruction loss in the discriminator, which, for improved regularisation, is trained as a self-supervised feature encoder.
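To make the binary cross-entropy objective of Eq. (2) concrete, the following minimal sketch evaluates it on hand-picked discriminator outputs. It assumes that D returns probabilities in (0, 1); the function names and the example values are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy of D: real samples are labeled 1, generated samples 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Minimax generator objective: G tries to minimize log(1 - D(G(z)))."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# D separates real from generated samples well: low discriminator loss.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# D cannot tell the two apart (outputs 0.5 everywhere): loss equals 2 * log(2).
fooled = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The generator objective decreases as D assigns higher probabilities to generated samples, which is the opposite direction of what D is trained for.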

GAN network architectures and conditions
A plethora of different GAN network architectures has been proposed 7,31 starting with a deep convolutional GAN (DCGAN) 32 neural network architecture of both D and G. Later approaches, e.g., include a ResNet-based architecture as backbone 29 and progressively grow the generator and discriminator networks during training to enable high-resolution image synthesis (PGGAN). 33 Another line of research has been focusing on conditioning the output of GANs based on discrete or continuous labels. For example, in cGAN this is achieved by feeding a label to both D and G, 34 whereas in the auxiliary classifier GAN (AC-GAN), the discriminator additionally predicts the label that is provided to the generator. 35 Other models condition the generation process on input images [36][37][38][39][40] unlocking image-to-image translation and domain-adaptation GAN applications. A key difference in image-to-image translation methodology is the presence (paired translation) or absence (unpaired translation) of corresponding image pairs in the target and source domain. Using an L1 reconstruction loss between target and source domain alongside the adversarial loss from Eq. (2), pix2pix 36 defines a common baseline model for paired image-to-image translation. For unpaired translation, cycleGAN 37 is a popular approach, which also consists of an L1 reconstruction (cycle-consistency) loss between a source (target) image and a source (target) image translated to target (source) and back to source (target) via two consecutive generators.

Fig. 2 The GAN framework. In this visual example, the generator network receives random noise vectors, which it learns to map to region-of-interest patches of full-field digital mammograms. During training, the adversarial loss is not only backpropagated to the discriminator as $L_D$ but also to the generator as $L_G$. This particular architecture and loss function was used to train medigan models listed with IDs 1, 2, and 5 in Table 3.

A further methodological innovation includes SinGAN, 41 which, based on only a single training image, learns to generate multiple synthetic images. This is accomplished via a multi-scale coarse-to-fine pipeline of generators, where a sample is passed sequentially through all generators, each of which also receives a random noise vector as input.
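The cycle-consistency idea used for unpaired translation can be sketched with toy "generators" that operate on lists of pixel values. The two mapping functions below are hypothetical stand-ins for trained networks; only the structure of the loss (translate, translate back, compare with L1) mirrors the description above.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized images (flat lists)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(source, target, source_to_target, target_to_source):
    """L1 error after translating each image to the other domain and back."""
    source_cycle = target_to_source(source_to_target(source))
    target_cycle = source_to_target(target_to_source(target))
    return l1_distance(source, source_cycle) + l1_distance(target, target_cycle)


def to_target(img):           # hypothetical generator: source -> target
    return [2.0 * v for v in img]


def to_source(img):           # hypothetical generator: exact inverse of to_target
    return [0.5 * v for v in img]


def to_source_biased(img):    # hypothetical generator: imperfect inverse
    return [0.5 * v + 0.1 for v in img]


source_img = [0.0, 0.25, 0.5]
target_img = [0.0, 0.5, 1.0]
perfect = cycle_consistency_loss(source_img, target_img, to_target, to_source)
imperfect = cycle_consistency_loss(source_img, target_img, to_target, to_source_biased)
```

The loss is zero only when the two generators invert each other, which is what constrains unpaired translation in the absence of corresponding image pairs.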

Generative Model Evaluation
One approach to evaluating generative models is human expert assessment of their generated synthetic data. In medical imaging, such observer studies often enlist board-certified clinical experts such as radiologists or pathologists to examine the quality and/or realism of the synthetic medical images. 42,43 However, this approach is manual, laborious and costly, and, hence, research attention has been devoted to automating generative model evaluation, 44,45 including:
i. Metrics for automated analysis of the synthetic data and its distribution, such as the inception score (IS) 17 and Fréchet inception distance (FID). 46 Both metrics are popular in computer vision, 31 whereas the latter also has seen widespread adoption in medical imaging. 7 FID is based on a pretrained Inception 47 model (e.g., v1, 48 v3 47 ) to extract features from synthetic and real datasets, which are then fitted to multivariate Gaussians X (e.g., real) and Y (e.g., synthetic) with means $\mu_X$ and $\mu_Y$ and covariance matrices $\Sigma_X$ and $\Sigma_Y$. Next, X and Y are compared via the Wasserstein-2 (Fréchet) distance (FD), as depicted in Eq. (4).
$$\mathrm{FD}(X, Y) = \lVert \mu_X - \mu_Y \rVert_2^2 + \mathrm{Tr}\left(\Sigma_X + \Sigma_Y - 2(\Sigma_X \Sigma_Y)^{1/2}\right) \tag{4}$$
ii. Metrics that compare a synthetic image with a real reference image such as mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). 49 Given the absence of corresponding reference images, such metrics are not readily applicable for unconditional noise-to-image generation models.
iii. Metrics that compare the performance of a model on a surrogate downstream task with and without generative model intervention. 7,14,50,51 For instance, training on additional synthetic data can increase a model's downstream task performance, thus, demonstrating the usefulness of the generative model that generated such data.
For the analysis of generative models in the present study, we discard (ii) due to its limitation of requiring specific reference images. We further deprioritize the IS from (i) due to its limited applicability to medical imagery: it lacks a comparison between real and synthetic data distributions and is strongly biased toward natural images via its ImageNet 52 -pretrained Inception classifier as backbone feature extractor. Therefore, we focus on FID from (i) and downstream task performance (iii) as potential evaluation measures for medical image synthesis models in the remainder of this work.
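The Fréchet distance between two fitted Gaussians can be computed in a few lines once feature vectors are available. The sketch below assumes NumPy and uses random vectors in place of Inception features, so it demonstrates only the distance computation, not the feature extraction step; the function and variable names are hypothetical.

```python
import numpy as np


def frechet_distance(feats_x, feats_y):
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_x, mu_y = feats_x.mean(axis=0), feats_y.mean(axis=0)
    sigma_x = np.cov(feats_x, rowvar=False)
    sigma_y = np.cov(feats_y, rowvar=False)
    # Tr((Sigma_X Sigma_Y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_X Sigma_Y; tiny negative values are numerical noise.
    eigvals = np.linalg.eigvals(sigma_x @ sigma_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(sigma_x) + np.trace(sigma_y) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))   # stand-in for features of real images
fd_same = frechet_distance(real_feats, real_feats)
# Shifting every feature by 3 leaves the covariance unchanged, so only the
# mean term contributes: 4 features * 3^2 = 36.
fd_shifted = frechet_distance(real_feats, real_feats + 3.0)
```

Because the distance depends on the features, any change to image normalization or to the feature extractor changes the resulting value, which is why such settings matter when FID values are reported.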

Image Synthesis Tools and Libraries
Related libraries, such as pygan, 53 torchGAN, 54 vegans, 55 imaginaire, 56 TF-GAN, 57 PyTorch-GAN, 58 keras-GAN, 59 mimicry, 60 and studioGAN, 31 have focused on facilitating the implementation, training, and comparative evaluation of GANs in computer vision (CV). Despite a strong focus on language models, the HuggingFace transformers library and model hub 61 also contain a few pretrained computer vision GAN models. The GAN Lab 62 provides an interactive visual experimentation tool to examine the training process and its data flows in GANs.
Specific to AI in medical imaging, Diaz et al. 63 provided a comprehensive survey of tools, libraries and platforms for privacy preservation, data curation, medical image storage, annotation, and repositories. Compared to CV, fewer GAN and AI libraries and tools exist in medical imaging. [64][65] For instance, pretrained generative models from computer vision cannot be readily adapted to produce medical imaging-specific outputs. The TorchIO library 64 addresses the gap between CV and medical image data processing requirements providing functions for efficient loading, augmentation, preprocessing, and patch-based sampling of medical imagery. The medical open network for AI (MONAI) 66 is a PyTorch-based 67 framework that facilitates the development of diagnostic AI models with tutorials for classification, segmentation, and AI model deployment. Further efforts in this realm include NiftyNet, 68 the deep learning tool kit (DLTK), 69 MedicalZooPytorch, 70 and nnDetection. 71 The recent RadImageNet initiative 72 shares baseline image classification models pretrained on a dataset designed as the radiology medical imaging equivalent to ImageNet. 52
To the best of our knowledge, no open-access software, tool, or library exists that targets reuse and sharing of pretrained generative models in medical imaging. To this end, we expect the contribution of our medigan library to be instrumental in enabling dissemination of generative models and increased adoption of synthetic data into AI training pipelines. As an open-access plug-and-play solution for generation of multipurpose synthetic data, medigan aims to benefit patients and clinicians by enhancing the performance and robustness of AI-based clinical decision support systems.

Method: The medigan Library
We contribute medigan as an open-source open-access MIT-licensed Python3 library distributed via the Python package index (PyPI) for synthetic medical dataset generation, e.g., via pretrained generative models. The metadata of medigan is summarized in Table 1. medigan accelerates research in medical imaging by flexibly providing (a) synthetic data augmentation and (b) preprocessing functionality, both readily integrable in machine learning training pipelines. It also allows contributors to add their generative models in a thought-through process and provides simple functions for end-users to search for, rank, and visualize models. The overview of medigan in Fig. 3 depicts the core functions demonstrating how end-users can (a) contribute a generative model, (b) find a suitable generative model inside the library, and (c) generate synthetic data with that model.

User Requirements and Design Decisions
End-user requirement gathering is recommended for the development of trustworthy AI solutions in medical imaging. 75 Therefore, we organized requirement gathering sessions with potential end-users, model contributors, and stakeholders from the EuCanImage Consortium, a large European H2020 project 76 building a cancer imaging platform for enhanced AI in oncology. Upon exploring the needs and preferences of medical imaging researchers and AI developers, respective requirements for the design of medigan were formulated to ensure usability and usefulness. For instance, the users articulated a clear preference for a user interface in the format of an importable package as opposed to a graphical user interface (GUI), web application, database system, or API. Table 2 summarizes key requirements and the corresponding design decisions.

Software Design and Architecture
medigan is built with a focus on simplicity and usability. The integration of pretrained models is designed as internal Python package import and offers simultaneously (a) high flexibility to and (b) low code dependency on these generative models. The latter allows the reuse of the same orchestration functions in medigan for all model packages. Using object-oriented programming, the same model_executor class is used to implement, instantiate, and run all different types of generative model packages. To keep the library maintainable and lightweight, and to avoid limiting interdependencies between library code and generative model code, medigan's models are hosted outside the library (on Zenodo) as independent Python modules. To avoid long initialization times upon library import, lazy loading is applied. A model is only loaded and its model_executor instance is only initialized if a user specifically requests synthetic data generation for that model. To achieve high cohesion, 79 i.e., keeping the library and its functions specific, manageable, and understandable, the library is structured into several modular components. These include the loosely-coupled model_executor, model_selector, and model_contributor modules.
The generators module is inspired by the facade design pattern 80 and acts as a single point of access to all of medigan's functionalities. As single interface layer between users and library, it reduces interaction complexity and provides users with a clear set of readily extendable library functions. Also, the generators module increases internal code reusability and allows for combination of functions from other modules. For instance, a single function call can run the generation of samples by the best-ranked model (i.e., the one with the lowest FID) of all models found in a keyword search.
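The combination of a facade with lazy loading can be sketched as follows. This is a simplified illustration under stated assumptions, not medigan's implementation: the class names SketchExecutor and SketchFacade are hypothetical, and the expensive steps (download, import) are replaced by a trivial constructor.

```python
class SketchExecutor:
    """Hypothetical stand-in for a per-model executor with costly initialization."""

    def __init__(self, model_id):
        # A real executor would download, unzip, and import the model package here.
        self.model_id = model_id

    def generate(self, num_samples):
        return [f"{self.model_id}_sample_{i}" for i in range(num_samples)]


class SketchFacade:
    """Single entry point that creates executors only when first requested."""

    def __init__(self):
        self._executors = {}  # nothing is loaded at construction time

    def generate(self, model_id, num_samples=1):
        if model_id not in self._executors:
            # Lazy loading: initialize the executor on first use, then cache it.
            self._executors[model_id] = SketchExecutor(model_id)
        return self._executors[model_id].generate(num_samples)


facade = SketchFacade()
loaded_before = len(facade._executors)
samples = facade.generate("toy_model", num_samples=2)
facade.generate("toy_model", num_samples=1)   # reuses the cached executor
loaded_after = len(facade._executors)
```

Creating the facade stays cheap regardless of how many models exist, because the cost of a model is paid only by users who request it.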
Fig. 3 Architectural overview of medigan. Users interact with the library by contributing, searching, and executing generative models, the latter shown here exemplified for mammography image generation with models with IDs 1 to 4 described in Table 3.

Table 2 Overview of the key requirements gathered together with potential end-users alongside the respective design decisions taken toward fulfilling these requirements with medigan.

No. | End-user requirement | Respective design decision
1 | Instead of a GUI tool, medigan should be implemented as a platform-independent library importable into users' code. | Implementation of medigan as publicly accessible Python package distributed via PyPI.
2 | It should support common frameworks for building generative models, e.g., PyTorch, 67 TensorFlow, 77 Keras. 78 | medigan is built framework-agnostic treating each model as separate Python package with freedom of choice of framework and dependencies.
3 | The library should allow different types of generative models and generation processes. |
4 | The library should support different types of synthetic data. | medigan supports any type of synthetic data ranging from 2D and 3D images to image pairs, masks, and tabular data.
7 | Despite using large deep learning models, the library should be as lightweight as possible. | Only the user-requested models are downloaded and locally imported. Thus, model dependencies are not part of medigan's dependencies.
8 | It should be possible to locally review and adjust a generative model of the library. | After download, a model's code and config are available for end-users to explore and adjust. medigan can also load models from local file systems.
9 | The library should support both CPU and GPU usage depending on a user's hardware. | Contributed medigan models are reviewed and, if need be, enhanced to run on both GPU and CPU.
10 | Version and source of the models that the library loads should be transparent to the end-user. |
13 | The risk that the library downloads models that contain malicious code should be minimized. | Zenodo model uploads receive static DOIs. After verification, unsolicited uploads/changes do not affect medigan, which points to specific DOI.
14 | License and authorship of generative model contributors should be clearly stated and acknowledged. | Separation of models and library allows freedom of choice of model license and transparent authorship reported for each model.
15 | Each generative model in the library should be documented. | Each available model is listed and described in medigan's documentation, in the readme, and also separately in its Zenodo entry.
16 | The library should have minimal dependencies on the user side and should run on common end-user systems. | medigan has a minimal set of Python dependencies, is OS-independent, and avoids system and third-party dependencies.
17 | Contributing models should be simple and at least partially automated. | medigan's contribution workflow automates local model configuration, testing, packaging, Zenodo upload, and issue creation on GitHub.
18 | If different models have the same dependency but with different versions, this should not cause a conflict. | Model dependency versions are specified in the config. medigan's generate method can install unsatisfied dependencies, avoiding conflicts.
19 | Any model in the library should be automatically tested and results reported to make sure all models work as designed. | On each commit to main, a CI pipeline automatically builds, formats, and lints medigan before testing all models and core functions.
20 | The library should make the results of the models visible with minimal code required by end-users. | medigan's simple visualization feature allows users to adjust a model's input latent vector for intuitive exploration of output diversity and fidelity.
21 | The library should support large synthetic dataset generation on user machines with limited random-access memory. | For large synthetic dataset generation, medigan iteratively generates samples via small batches to avoid exceeding users' in-memory storage limits.
22 | Users can specify model weights, model inputs, number, and storage location of the synthetic samples. | Diverging from defaults, users can specify (i) weights, (ii) number of samples, (iii) return or store, (iv) store location, (v) optional inputs.

Model Search and Ranking
The number of models in medigan is expected to grow over time. This will foreseeably leave users of medigan with a large number of models to choose from. Users likely will be uncertain which model best fits their needs depending on their data, modality, use-case, and research problem at hand and would have to go through each model's metadata to find the most suitable model in medigan. Hence, to facilitate model selection, the model_selector module implements model search and ranking functionalities. This search workflow is shown in Fig. 4 and triggered by running Code Snippet 1.
The model_selector module contains a search method that takes a search operator (i.e., OR, AND, or XOR) and a list of keyword search values as parameters and recursively searches through the models' metadata. The latter is provided by the config_manager module. The model_selector populates a modelMatchCandidates object with matchedEntry instances, each of which represents a potential model match to the search query. The modelMatchCandidates class evaluates which of its associated model matches should be flagged as a true match given the search values and search operator. The method rank_models_by_performance compares either all or specified models in medigan by a performance indicator such as FID. This indicator commonly is a metric that correlates with diversity, fidelity, or condition adherence to estimate the quality of generative models and/or the data they generate. 7 The model_selector looks up the value for the specified performance indicator in the model metadata and returns a list of models ranked in descending or ascending order to the user.
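The search and ranking logic described above can be approximated in a few functions. The sketch below is not medigan's code: the metadata layout, the toy FID values, the function names, and the interpretation of XOR as "exactly one search value matches" are all assumptions made for illustration.

```python
# Hypothetical metadata for three toy models (values are illustrative only).
TOY_MODELS = {
    "toy_1": {"modality": "mammography", "tags": ["dcgan", "patch"], "performance": {"FID": 60.0}},
    "toy_2": {"modality": "mammography", "tags": ["wgan-gp"], "performance": {"FID": 45.0}},
    "toy_3": {"modality": "mri", "tags": ["dcgan"], "performance": {"FID": 80.0}},
}


def collect_values(node):
    """Recursively gather all leaf values of nested dicts/lists as lowercase strings."""
    if isinstance(node, dict):
        return [v for child in node.values() for v in collect_values(child)]
    if isinstance(node, (list, tuple)):
        return [v for child in node for v in collect_values(child)]
    return [str(node).lower()]


def search(models, values, operator="AND"):
    """Return IDs of models whose metadata matches the search values."""
    matches = []
    for model_id, metadata in models.items():
        leaves = collect_values(metadata)
        hits = [value.lower() in leaves for value in values]
        if operator == "AND":
            matched = all(hits)
        elif operator == "OR":
            matched = any(hits)
        elif operator == "XOR":
            matched = sum(hits) == 1  # assumption: exactly one value matches
        else:
            raise ValueError(f"Unknown operator: {operator}")
        if matched:
            matches.append(model_id)
    return matches


def rank(models, model_ids, metric, order="asc"):
    """Sort models by a performance indicator stored in their metadata."""
    scored = [(m, models[m]["performance"][metric])
              for m in model_ids if metric in models[m]["performance"]]
    return sorted(scored, key=lambda pair: pair[1], reverse=(order == "desc"))


found = search(TOY_MODELS, ["mammography"], operator="AND")
ranked = rank(TOY_MODELS, found, metric="FID", order="asc")
```

For FID, where lower values indicate closer distributions, ascending order places the preferred model first.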

Synthetic Data Generation
Synthetic data generation is medigan's core functionality toward overcoming scarcity of (a) training data and (b) reusable generative models in medical imaging. Posing a low entry barrier for nonexpert users, medigan's generate method is both simple and scalable. While a user can run it with only one line of code, it flexibly supports any type of generative model and synthetic data generation process, as illustrated in Table 3 and Fig. 1.

Generate workflow
An example of the usage of the generate method is shown in Code Snippet 2, which triggers the model execution workflow illustrated in Fig. 5. Further parameters of the generate method allow users to specify the number of samples to be generated (num_samples), whether samples are returned as a list or stored on disk (save_images), where they are stored (output_path), and whether model dependencies are automatically installed (install_dependencies). Optional model-specific inputs can be provided via the **kwargs parameter. These include, for example, (i) a nondefault path to the model weights, (ii) a path to an input image folder for image-to-image translation models, (iii) a conditional input for class-conditional generative models, or (iv) the input_latent_vector as commonly used as model input in GANs.
Running the generate method triggers the generators module to initialize a model_executor instance for the user-specified generative model. The model is identified via its model_id as unique key in the global.json model metadata database, parsed and managed by the config_manager module. Using the latter, the model_executor checks if the required Python package dependencies are installed, retrieves the Zenodo URL and downloads, unzips, and imports the model package. It further retrieves the name of the internal data generation function inside the model's __init__.py script. As final step before calling this function, its parameters and their default values are retrieved from the metadata and combined with user-provided arguments. These user-provided arguments customize the generation process, which enables handling of multiple image generation scenarios. For instance, the aforementioned provision of the input image folder allows users to point to their own images to transform them using medigan models that are, e.g., pretrained for cross-modality translation. In the case of large dataset generation, the number of samples indicated by num_samples is chunked into smaller-sized batches that are generated iteratively to avoid overloading the random-access memory available on the user's machine. Instead of a user manually selecting a model via model_id, a model can also be automatically selected based on the recommendation from the model search and/or ranking methods. For instance, as triggered by Code Snippet 3, the models found in a search for mammography are ranked in ascending order based on FID, with the highest ranking model being selected and executed to generate the synthetic dataset.
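Two steps of this workflow, combining metadata defaults with user-provided arguments and chunking a large request into batches, can be sketched as follows. The function names, the default values, and the batch size are hypothetical; toy_generate merely stands in for a model's internal generation function.

```python
def merge_arguments(defaults, user_kwargs):
    """Combine default parameters from metadata with user-provided overrides."""
    return {**defaults, **user_kwargs}


def generate_in_batches(generate_fn, num_samples, batch_size=4, **kwargs):
    """Yield samples batch by batch so that only one batch is held in memory."""
    remaining = num_samples
    while remaining > 0:
        current = min(batch_size, remaining)
        yield generate_fn(num_samples=current, **kwargs)
        remaining -= current


def toy_generate(num_samples, image_size=128, condition=None):
    """Hypothetical generation function returning blank 1D 'images'."""
    return [[0.0] * image_size for _ in range(num_samples)]


# Defaults as they might be read from a model's metadata (illustrative values).
defaults = {"image_size": 128, "condition": None}
arguments = merge_arguments(defaults, {"image_size": 8})

batch_sizes = [
    len(batch)
    for batch in generate_in_batches(toy_generate, num_samples=10, batch_size=4, **arguments)
]
```

Using a generator function keeps peak memory bounded by the batch size, since each batch can be written to disk before the next one is produced.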

Model Visualization
To allow users to explore the generative models in medigan, a novel model visualization module has been integrated into the library. It allows users to examine how changing inputs like the latent variable z and/or the class conditional label y (e.g., malignant/benign) can affect the generation process. Also, the correlation between multiple model outputs, such as the image and corresponding segmentation mask, can be observed and explored. Figure 6 illustrates an example showing an image-mask sample pair from medigan's polyp generating FastGAN model. 51

Fig. 5 The generate workflow. A user specifies a model_id in a request (1) to the generators class, which checks (2) if the model's ModelExecutor class instance is already initialized. If not, a new one is created (3), which (4) gets the model's config from the global.json dict, (5) loads the model (e.g., from Zenodo), (6) checks its dependencies, and (7) unzips and imports it, before running its internal generate function (8). Finally, the generated samples are returned to the user.

This depiction of the graphical user interface (GUI) of the model visualization tool can be recreated by running Code Snippet 4.
Internally, the model_visualizer module retrieves a model's internal generate method as callable from the model_executor and adjusts the input parameters based on user interaction input from the GUI. This interaction further provides insight into a model's performance and capabilities. On one hand, it allows one to assess the fidelity of the generated samples. On the other hand, it also shows the model's captured sample diversity, i.e., as observed output variation over all possible input latent vectors. We leave the automation of manual visual analysis of this output variation to future work. For instance, such future work can use the model_visualizer to measure the variance of a reconstruction/perceptual error computed between pairs of images sampled from fixed-distance pairs of latent space vectors z. The slider controls on the left of the interface allow one to change the latent variable, which for this specific model affects, for instance, polyp size, position, and background. As the size of the latent vector z commonly is relatively large, each n (e.g., 10) variables are grouped into one indexed slider resulting in z m adjustable latent input variables. The seed button on the right allows one to initialize a new set of latent variables, which results in a new generated image. The latent input vector can be adjusted via the sliders, reset via the Reset button, and sampled randomly via the Seed button.
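The grouping of latent variables into sliders can be illustrated with plain index arithmetic. The sketch below assumes that moving a slider sets all variables of its group to the same value, which is one plausible behavior and not confirmed by the text above; the function names are hypothetical.

```python
def group_latent_indices(latent_size, group_size=10):
    """Split latent indices 0..latent_size-1 into consecutive groups, one per slider."""
    return [
        list(range(start, min(start + group_size, latent_size)))
        for start in range(0, latent_size, group_size)
    ]


def apply_slider(latent, groups, slider_index, value):
    """Return a copy of the latent vector with one slider's group set to value.

    Assumption of this sketch: a slider overwrites every variable in its group.
    """
    updated = list(latent)
    for i in groups[slider_index]:
        updated[i] = value
    return updated


latent = [0.0] * 25
groups = group_latent_indices(len(latent), group_size=10)
moved = apply_slider(latent, groups, slider_index=2, value=1.5)
```

With 25 latent variables and groups of 10, three sliders result, the last of which controls the remaining five variables.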

Model Contribution
A core idea of medigan is to provide a platform where researchers can share and access trained models via a standardized interface. We provide in-depth instructions on how to contribute a model to medigan complemented by implementations automating parts of the model contribution process for users. In general, a pretrained model in medigan consists of a Python __init__.py and, in case the generation process is based on a machine learning model, a respective checkpoint or weights file. The former needs to contain a synthetic data storage method and a data generation method with a set of standardized parameters described in Sec. 3.5.1. Ideally, a model package further contains a license file, a metadata.json and/or a requirements.txt file, and a test.sh script to quickly verify the model's functionalities. To facilitate creation of these files, medigan's GitHub repository provides model contributors with reusable templates for each of these files.
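As a rough illustration of such a model package, the following sketch shows what an __init__.py with a storage method and a generation method could look like. The parameter names (model_file, num_samples, output_path, save_images) are assumptions made for this example, since the standardized parameters are defined in Sec. 3.5.1 and not here; the "model" is a stand-in that returns random noise instead of loading weights.

```python
import os

import numpy as np


def save_samples(samples, output_path):
    """Store each sample as a .npy file and return the written paths."""
    os.makedirs(output_path, exist_ok=True)
    paths = []
    for index, sample in enumerate(samples):
        path = os.path.join(output_path, f"sample_{index}.npy")
        np.save(path, sample)
        paths.append(path)
    return paths


def generate(model_file=None, num_samples=4, output_path="output",
             save_images=True, image_size=128, seed=None):
    """Hypothetical generate method of a contributed model package.

    A real package would load the weights from `model_file` here; this
    stand-in only draws uniform noise images to keep the sketch runnable.
    """
    rng = np.random.default_rng(seed)
    samples = [rng.random((image_size, image_size), dtype=np.float32)
               for _ in range(num_samples)]
    if save_images:
        return save_samples(samples, output_path)
    return samples
```

Returning either the samples or the stored file paths mirrors the two usage modes (in-memory use versus storage on disk) without committing to a specific deep learning framework.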
Keeping the effort of pretrained model inclusion to a minimum, the generators module contains a contribute function that initializes a ModelContributor class instance dedicated to automating the remainder of the model contribution process. This includes automated (i) validation of the user-provided model_id; (ii) validation of the path to the model's __init__.py; (iii) test of importlib import of the model as package; (iv) creation of the model's metadata dictionary; (v) adding the model metadata to medigan's global.json metadata; (vi) end-to-end test of model with sample generation via generators.test_model(); (vii) upload of zipped model package to Zenodo via API; and (viii) creation of a GitHub issue, which contains the Zenodo link and model metadata, in the medigan repository. Being assigned to this GitHub issue, the medigan development team is notified about the new model, which can then be added via pull request. Code Snippet 5 shows how a user can run the contribute method illustrated in Fig. 7.
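Steps (i) to (iii) can be sketched with the Python standard library alone. The helper names and the concrete validation rules below (a non-empty identifier without whitespace, a file named __init__.py) are assumptions for illustration and do not reproduce the checks inside medigan's ModelContributor.

```python
import importlib.util
from pathlib import Path


def validate_model_id(model_id):
    """(i) Hypothetical rule: a non-empty string without whitespace."""
    if (not isinstance(model_id, str) or not model_id
            or any(char.isspace() for char in model_id)):
        raise ValueError(f"invalid model_id: {model_id!r}")
    return model_id


def validate_init_py_path(init_py_path):
    """(ii) The path must point to an existing __init__.py file."""
    path = Path(init_py_path)
    if path.name != "__init__.py" or not path.is_file():
        raise FileNotFoundError(f"no __init__.py found at {path}")
    return path


def import_model_package(model_id, init_py_path, generate_method_name="generate"):
    """(iii) Import the model via importlib and check for a generate callable."""
    path = validate_init_py_path(init_py_path)
    spec = importlib.util.spec_from_file_location(validate_model_id(model_id), path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not callable(getattr(module, generate_method_name, None)):
        raise AttributeError(f"{path} defines no callable {generate_method_name!r}")
    return module
```

Failing early on these inexpensive checks avoids uploading a package that cannot be imported or executed later in the pipeline.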

Model Testing Pipeline
Each new model contribution is systematically tested before becoming part of medigan. For instance, on each submitted pull request to medigan's GitHub repository, a CI pipeline automatically builds, formats, lints, and tests medigan's codebase. This includes the automatic verification of each model's package, dependencies, compatibility with the interface, and correct functioning of its generation workflow. This allows one to ensure that all models and their metadata in the global.json file are available and working in a reproducible and standardized manner.

medigan facilitates sharing and reusing trained generative models with the medical research community. On one hand, this reduces the need for researchers to retrain their own similar generative models, which can reduce the extensive carbon footprint 94 of deep learning in medical imaging. On the other hand, this provides a platform for researchers and data owners to share their dataset distribution without sharing the real data points of the dataset. Put differently, sharing generative models trained on (and instead of) patient datasets is not only beneficial as a data curation step, 14 but also minimizes the need to share images and personal data directly attributable to a patient. In particular, the latter can be quantifiably achieved when the generative model is trained using a differential privacy guarantee 7,95 before being added to medigan. By reducing the barriers posed by data sharing restrictions and necessary patient privacy protection regulation, medigan unlocks a new paradigm of medical data sharing via generative models. This places medigan at the center of efforts to solve the well-known issue of data scarcity 7,9 in medical imaging.
Apart from that, medigan's generative model contributors benefit from an increased exposure, dissemination, and impact of their work, as their generative models become readily usable by other researchers. As Table 3 illustrates, to date, medigan consists of 21 pretrained deep generative models contributed to the community. Among others, these include two conditional DCGAN models, six domain translation CycleGAN models, and one mask-to-image pix2pix model. The training data comes from 10 different medical imaging datasets. Several of the models were trained on breast cancer datasets including INbreast, 81 OPTIMAM, 82 BCDR, 83 CBIS-DDSM, 86 and CSAW. 88 Models allow one to generate samples of different pixel resolutions ranging from region-of-interest patches of size 128 × 128 and 256 × 256 to full images of 1024 × 1024 and 1332 × 800 pixels.

Investigating Synthetic Data Evaluation Methods
A further application of medigan is testing the properties of medical synthetic data. For instance, evaluation metrics for generative models can be readily tested in medigan's multiorgan, multimodality, and multimodel synthetic data setting.
Compared to generative modeling, synthetic data evaluation is a less explored research area. 7 In particular, in medical imaging the existing evaluation frameworks, such as the FID 46 or the IS, 17 are often limited in their applicability, as mentioned in Sec. 2.3. The models in medigan allow one to compare existing and new synthetic data evaluation metrics and their validation in the field of medical imaging. Multimodel synthetic data evaluation allows one to measure the correlation and statistical significance between synthetic data evaluation metrics and downstream task performance metrics. This enables the assessment of clinical usefulness of generative models on one hand and of synthetic data evaluation metrics on the other hand. In that sense, the metric itself can be evaluated including its variations when measured under different settings, datasets, or preprocessing techniques.

FID of medigan Models
We compute the FID to assess the models in medigan and report the results in Table 3. We further note that the FID can be computed not only between a synthetic and a real dataset (rs) but also between two sets of samples of the real dataset (rr). As the FID_rr describes the distance between two randomly sampled sets of the real data distribution, it can be used as an estimate of the real data variation and as an optimal lower bound for the FID_rs, as shown in Table 3. Given the above, it follows that a high FID_rr likely also results in a higher FID_rs, which highlights the importance of accounting for the FID_rr when discussing the FID_rs. To do so, we propose the reporting of an FID ratio r_FID to describe the FID_rs in terms of the FID_rr.
$$r_{\mathrm{FID}} = \frac{\mathrm{FID}_{rr}}{\mathrm{FID}_{rs}} \tag{5}$$

Assuming FID_rs ≥ FID_rr, the r_FID is bounded between 0 and 1, which simplifies the comparison of FIDs computed using different models and datasets. An r_FID close to 1 indicates that much of the FID_rs can be explained by the general variation in the real dataset. The code used to compute the FID scores is available at https://github.com/RichardObi/medigan/blob/main/tests/fid.py. The models in Table 3 yielding the highest ImageNet-based r_FID score are the ones with ID 10 (0.677, endoscopy, 256 × 256, FastGAN), ID 13 (0.650, mammography, 1332 × 800, CycleGAN), ID 14 (0.564, mammography, 1332 × 800, CycleGAN), ID 20 (0.543, chest x-ray, 1024 × 1024, PGGAN), and ID 1 (0.497, mammography, 128 × 128, DCGAN). This suggests that the r_FID depends neither on the modality nor on the pixel resolution of the synthetic images. Further, neither image-to-image translation (e.g., CycleGAN) nor noise-to-image models (e.g., PGGAN, DCGAN, FastGAN) seem to have a particular advantage for achieving higher r_FID results.
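The relation between the Fréchet distance and the FID ratio can be sketched in a few lines of NumPy. This is not the fid.py script linked above: it assumes that feature vectors have already been extracted (e.g., by an Inception model), and the function names frechet_distance and fid_ratio are chosen for this example only.

```python
import numpy as np


def frechet_distance(features_a, features_b):
    """Frechet distance between Gaussians fitted to two (n_samples, n_dims) arrays."""
    mu_a, mu_b = features_a.mean(axis=0), features_b.mean(axis=0)
    cov_a = np.cov(features_a, rowvar=False)
    cov_b = np.cov(features_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigenvalues = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigenvalues.real, 0.0, None)).sum()
    squared_mean_diff = np.sum((mu_a - mu_b) ** 2)
    return float(squared_mean_diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * trace_sqrt)


def fid_ratio(fid_rr, fid_rs):
    """r_FID: share of the real-synthetic FID explained by real-real variation."""
    if fid_rs <= 0:
        raise ValueError("fid_rs must be positive")
    return fid_rr / fid_rs
```

With features of two real subsets and of one synthetic set, r_FID would follow as fid_ratio(frechet_distance(real_1, real_2), frechet_distance(real_1, synthetic)).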
The scatter plot in Fig. 8 provides further insight into the comparison between the lower bound FID_rr and the model FID_rs. The red trend line shows a positive correlation between the FID_rr and FID_rs, which corroborates our previous assumption that a higher model FID_rs is to be expected given a higher lower bound FID_rr. Hence, for increased transparency, we encourage further studies to routinely report the lower bound FID_rr and the FID ratio r_FID alongside the model FID_rs. The three-channel RGB endoscopic images represented by orange dots have an FID_rr comparable with their grayscale radiologic counterparts. However, both chest x-ray datasets, ChestX-ray14 90 and Node21, 89 represented by green dots show a slightly lower FID_rr than other modalities. The model FID_rs shows a high variation across models without readily observable dependence on modality, generative model, or image size.

Analyzing Potential Sources of Bias in FID
The popular FID metric is computed based on the features of an Inception classifier (e.g., v1, 48 v3 47) trained on ImageNet, 52 a database of natural images inherently different from the domain of medical images. This potentially limits the applicability of the FID to medical imaging data. Furthermore, the FID has been observed to vary based on the input image resizing methods and ImageNet backbone feature extraction model types. 31 Based on this, we further hypothesize a susceptibility of the FID to variation due to (a) different backbone feature extractor weights and random seed initializations, (b) different medical and nonmedical backbone model pretraining datasets, (c) different image normalization procedures for real and synthetic datasets, (d) nuances between different frameworks and libraries used for FID calculation, and (e) the dataset sizes used to compute the FID.
Such variations can obstruct a reliable comparison of synthetic images generated by different generative models. Illustrating the potential of medigan to analyze such variations, we report and experiment with the FID. In particular, we subject the FID to variations in (i) the pretraining dataset of its backbone feature extractor and (ii) image normalization, testing the effects across a set of medigan models. We experiment with the Inception v3 model trained on the recent RadImageNet dataset, 72 released as a radiology-specific alternative to the ImageNet database. 52 The RadImageNet-pretrained Inception v3 model weights we used are available at https://github.com/BMEII-AI/RadImageNet. We further compute the FID_rs and FID_rr with and without normalization to analyze the respective impact on results.
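One common form of image normalization is per-image min-max scaling of pixel values to the range between 0 and 1. The sketch below assumes this variant; the exact normalization used in the experiments may differ, and the function name min_max_normalize is chosen for this example.

```python
import numpy as np


def min_max_normalize(image, eps=1e-8):
    """Scale pixel values of one image to [0, 1] (assumed normalization variant)."""
    image = np.asarray(image, dtype=np.float64)
    lowest, highest = image.min(), image.max()
    # eps guards against division by zero for constant images.
    return (image - lowest) / max(highest - lowest, eps)


normalized = min_max_normalize([[0, 128], [255, 64]])
```

Applying the same function to real and synthetic images before feature extraction keeps the two sets on a common intensity scale, which is the setting contrasted with the non-normalized one.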
In Table 4, the FID results are summarized allowing for cross-analysis between variations due to image normalization and/or due to the pretraining dataset of the FID feature extraction model. We observe generally lower FID values (1.15 to 7.32) for RadImageNet compared to ImageNet (52.17 to 225.85) as FID model pretraining datasets. To increase FID comparability, we compute, as before, the FID ratio r_FID. The RadImageNet-based model results in notably lower r_FID values for both normalized and non-normalized images. Exceptions to this are the models with ID 5 (mammography, 128 × 128, DCGAN) and ID 6 (mammography, 128 × 128, WGAN-GP), which achieve RadImageNet-based r_FID scores of 0.593 and 0.550, respectively. In general, the RadImageNet-based model seems more robust at detecting if two sets of data originate from the same distribution resulting in low FID_rr values. Overall, for most models, the FID is explained only to a limited extent by the variation in the real dataset, and r_FID < 0.7 for all ImageNet-based and RadImageNet-based FIDs. The scatter plot in Fig. 9 further compares the RadImageNet-based FID with the ImageNet-based FID for the models from Table 4. Noticeably, the difference between non-normalized and normalized images is surprisingly high for several models for both ImageNet and RadImageNet FIDs (e.g., models with IDs 6 and 8) while negligible for others (e.g., models with IDs 1, 10, 13-16, and 19-21). Another observation is the relatively modest correlation between RadImageNet and ImageNet FID indicated by the slope of the red regression line.

Fig. 8 Scatter plot illustrating the FID_rs of medigan's models (real-synthetic) compared to the lower bound FID_rr between two sets of the model's respective training dataset (real-real). The lower bound can represent an optimally achievable model and, as such, facilitates interpretation. Each model is represented by a dot below its model ID. The dots' color encoding depicts model modality, where blue: mammography, orange: endoscopy, green: chest x-ray, and pink: brain MRI. The red regression line illustrates the trend across all data points/models.

Table 4 Normalized (left) and non-normalized (right) FID scores. This table measures the normalization impact on FID scores based on a promising set of medigan's deep generative models. Synthetic samples were randomly drawn for each model matching the number of available real samples. The lower bound FID_rr is computed between a pair of randomly sampled sets of real data (real-real), whereas the model FID_rs is computed between two randomly sampled sets of real and synthetic data (real-syn). The results for model 7 (Flair, T1, T1c, T2) and 21 (T1, T2) are averaged across modalities.

Given the demonstrated high impact of backbone model training set and image normalization on FID, we recommend that studies specify the exact model used for FID calculation and any applied data preprocessing and normalization steps. Further, where possible, reporting the RadImageNet-based FID allows for reporting a radiology domain-specific FID. The latter is seemingly less susceptible to variation in the real datasets than the ImageNet-based FID while also being capable of capturing other, potentially complementary, patterns in the data.

Improving Clinical Medical Image Analysis
A high-impact clinical application of synthetic data is the improvement of clinical downstream task performance such as classification, detection, or treatment response estimation.This can be achieved by using image synthesis for data augmentation, domain adaptation, and data curation (e.g., artifact removal, noise reduction, super-resolution) 7,63 to enhance the performance of clinical decision support systems such as computer-aided diagnosis (CADx) and detection (CADe) software.
In Table 5, the capability of improving the clinical downstream task performance is demonstrated for various medigan models and modalities. Downstream task models trained on a combination of real and synthetic imaging data achieve promising results surpassing the alternative results achieved from training only on real data. The results are taken from the respective publications 11,14,50,84 and indicate that image synthesis can further improve the promising performance demonstrated by deep learning-based CADx and CADe systems, e.g., in mammography 96 and brain MRI. 85 For downstream task evaluation, we generally note the importance of avoiding data leakage between training, validation, and test sets.

The approaches displayed in Table 6 represent the application where synthetic data is used instead of real data to train downstream task models. Despite an observable performance decrease when training on synthetic data only, the results 51,91,92 demonstrate the usefulness of synthetic data if no or only limited real training data is available or shareable. For example, if labels or annotations in a target domain are scarce but present in a source domain, a generative model can translate annotated data from the source domain to the target domain to enable supervised training of downstream task models. 92,93

Discussion and Future Work
In this work, we introduced medigan, an open-source Python library, which allows one to share pretrained generative models for synthetic medical image generation. The package is easily integrable into other packages and tools, including commercial ones. Synthetic data can enhance the performance, capabilities, and robustness of data-hungry deep learning models as well as mitigate common issues such as domain shift, data scarcity, class imbalance, and data privacy restrictions. Training one's own generative network can be complex and expensive since it requires a considerable amount of time, effort, dedicated hardware, and knowledge and applied skills in generative AI, and it causes carbon emissions. An alternative and complementary solution is the distribution of pretrained generative models to allow their reuse by AI researchers and engineers worldwide.
medigan can help to reduce the time to run synthetic data experiments and can readily be added as a component, e.g., as a dataloader as discussed in Sec. 3.5.2, in AI training pipelines. As such, the generated data can be used to improve supervised learning models as described in Sec. 4.3 via training or fine-tuning but can also serve as plug-and-play data source for self/semisupervised learning, e.g., to pretrain clinical downstream task models.
Furthermore, studies that use additional synthetic training data for training deep learning models often do not report all the specifics about their underlying generative model. 7,75 Within medigan, each generative model is documented, openly accessible, and reusable. This increases the reproducibility of studies that use synthetic data and makes it more transparent where the data or parts thereof originated from. This can help to achieve the traceability objectives outlined in the FUTURE-AI consensus guiding principles toward AI trustworthiness in medical imaging. 75 The 21 generative models currently in medigan are listed in Table 3 and were developed and validated by AI researchers and/or specialized medical doctors. Furthermore, each model contains traceable 75 and version-controlled metadata in medigan's global.json file, as outlined in Sec. 3.3.

To assess model suitability, users are recommended to first (i) ensure the compatibility between their planned downstream task (e.g., mammogram region-of-interest classification) and a candidate medigan model (e.g., mammogram region-of-interest generator). Second, (ii) a user's real (test) data and the model's synthetic data should be compatible, corresponding, for instance, in domain, organ, or disease manifestation. If the awareness of the domain shifts between real and synthetic data remains limited after this qualitative analysis, (iii) a quantitative assessment (e.g., via FID) is recommended. Finally, (iv) it is to be assessed if a downstream task improvement is plausible. This depends, among others, on the tested scenario and the task at hand, but also on the amount, domain, task specificity, and quality of the available real data, and the generative model's capabilities as indicated by its reported evaluation metrics from previous studies. If a positive impact of synthetic data on downstream task performance is plausible, users are recommended to proceed toward empirical verification.
The exploration and multimodel evaluation of the properties of generative models and synthetic data is a further application of medigan. medigan's visualization tool (see Sec. 3.6) intuitively allows the user to explore and adjust the input latent vector of generative models to visually evaluate, e.g., its inherent diversity and condition adherence 7 (i.e., how well a given mask or label fits the generated image). The evaluation of synthetic data by human experts, such as radiologists, is a costly and time-consuming task, which motivates the usage of automated metric-based evaluation such as the FID. Our multimodel analysis reveals sources of bias in FID reporting. We show the susceptibility of FID to vary substantially based on changes in input image normalization or in the choice of the pretraining dataset of the FID feature extractor. This finding highlights the need to report the specific models, preprocessing, and implementations used to compute the FID alongside the FID ratio r_FID proposed in Sec. 4.2.1.

Legal Frameworks for Sharing of Synthetic and Real Patient Data
Many countries have enacted regulations that govern the use and sharing of data related to individuals. The two most recognized legal frameworks are the Health Insurance Portability and Accountability Act (HIPAA) 97 from the United States (U.S.) and the General Data Protection Regulation (GDPR) 98 from the European Union (E.U.). 101,102 Conceptually, synthetic data is not real data about any particular individual and, in contrast to real data, synthetic data can be generated at high volumes and potentially shared without restriction. In this sense, under both GDPR and HIPAA regulation, the rules govern the use of real data for the generation and evaluation of synthetic datasets, as well as the sharing of the original dataset. However, once fully synthetic data is generated, this new dataset falls outside the scope of the current regulations based on the argument that there is no direct correlation between the original subjects and the synthetic subjects. A common interpretation is that as long as the real data remains in a secure environment during the generation of synthetic data, there is little to no risk to the original subjects. 103 As a consequence, the use of synthetic data can help prevent researchers from inadvertently using and possibly exposing patients' identifiable data. Synthetic data can also lessen the controls imposed by Institutional Review Boards (IRBs) and based on international regulations by ensuring data is never mapped to real individuals. 104 There are multiple methods of generating synthetic data, some of which include building models from real data, which can create a set statistically similar to real data. How similar the synthetic data is to real-world data often defines its "utility," which will vary depending on the synthesis methods used and the needs of the study at hand. If the utility of the synthetic data is high enough, then evaluation results are expected to be similar to those that use real data. 103

As generative models are built based on real data, a common concern is patient reidentification and leaking of patient-specific features. 7,15 Despite the arguably permissive aforementioned regulations, deidentification 63 of the training data prior to generative model training is recommended. This can minimize the possibility of generative models leaking sensitive patient data during inference and after sharing. A further recommended and mathematically proven tool for privacy preservation is differential privacy (DP). 95 DP can be included in the training of deep generative models, among other setups, by adding DP noise to the gradients.
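Adding DP noise to gradients is commonly done by clipping each per-sample gradient and adding Gaussian noise to their sum, as in DP-SGD. The following NumPy sketch shows only this clip-and-noise step under that assumption; the function name and default values are illustrative, and the privacy accounting needed to state an actual DP guarantee is not included.

```python
import numpy as np


def privatize_gradients(per_sample_grads, clip_norm=1.0, noise_multiplier=1.0,
                        rng=None):
    """Clip per-sample gradients, add Gaussian noise, and average (DP-SGD style).

    per_sample_grads: array of shape (batch_size, n_params).
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(per_sample_grads, dtype=np.float64)
    # Scale each sample's gradient so that its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    # is added once to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]
```

Clipping bounds the influence of any single training image on the update, which is what allows the added noise to mask individual contributions.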

Expansion of Available Models
In the future, further generative models across medical imaging disciplines, modalities, and organs can be integrated into medigan. The capabilities of additional models can include privatising or translating the user's data from one domain to another, balancing or debiasing imbalanced datasets, reconstructing, denoising, or removing artifacts in medical images, or resizing images, e.g., using image super-resolution techniques. Despite medigan's current focus on models based on GANs, 16 the inclusion of different additional types of generative models is desirable and will enable insightful comparisons. In particular, this is to be further emphasized considering the recent successes of diffusion models, 25-27 variational autoencoders, 21 and normalizing flows 22-24 in the computer vision and medical imaging 105-107 domains. Before integrating and testing a new model via the pipeline described in Sec. 3.8, we assess whether a model is to become a candidate for inclusion into medigan. This threefold assessment is based on the SynTRUST framework 7 and reviews whether (1) the model is well-documented (e.g., in a respective publication), (2) the model or its synthetic data is applicable to a task of clinical relevance, and (3) the model has been methodically validated.

Synthetic DICOM Generation
Since the dominant data format used for medical imaging is Digital Imaging and Communications in Medicine (DICOM), we plan to enhance medigan by integrating the generation of DICOM-compliant files. DICOM consists of two main components, pixel data and the DICOM header. The latter can be described as an embedded dataset rich with information related to the pixel data such as the image sequence, patient, physicians, institutions, treatments, observations, and equipment. 63 Future work will explore combining our GAN-generated images with synthetic DICOM headers. The latter will be created from the same training images on which the medigan models are trained to create synthetic DICOM data with high statistical similarity to real-world data. In this regard, a key research focus will be the creation of an appropriate and DICOM-compliant description of the image acquisition protocol for a synthetic image. The design and development of an open-source software package for generating DICOM files based on synthesized DICOM headers associated with (synthetic) images will extend prior work 108 that demonstrated the generation of synthetic headers for the purpose of evaluating deidentification methods.

Conclusion
We presented the open-source medigan package, which helps research in medical imaging to rapidly create synthetic datasets for a multitude of purposes such as AI model training and benchmarking, data augmentation, domain adaptation, and intercentre data sharing. medigan provides simple functions and interfaces for users, allowing one to automate generative model search, ranking, synthetic data generation, and model contribution. By reuse and dissemination of existing generative models in the medical imaging community, medigan allows researchers to speed up their experiments with synthetic data in a reproducible and transparent manner.
We discuss three key applications of medigan, which include (i) sharing of restricted datasets, (ii) improving clinical downstream task performance, and (iii) analyzing the properties of generative models, synthetic data, and associated evaluation metrics. Ultimately, the aim of medigan is to contribute to benefiting patients and clinicians, e.g., by increasing the performance and robustness of AI models in clinical decision support systems.

Fig. 4 The search workflow. A user sends a search query (1) to the generators class, which triggers a search (2) via the ModelSelector class. The latter retrieves the global.json model metadata/config dict (3), in which it searches for query values finding matching models (4). Next, the matched models are optionally also ranked based on a user-defined performance indicator (5) before being returned as list to the user.
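The search and ranking steps of this workflow can be sketched over a plain dictionary. The functions below are hypothetical stand-ins rather than the ModelSelector implementation, and the toy metadata, its keys below "selection", and its metric values are invented placeholders for illustration only.

```python
def _leaf_values(obj):
    """Yield all leaf values of a nested dict/list structure as lowercase strings."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from _leaf_values(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            yield from _leaf_values(value)
    else:
        yield str(obj).lower()


def search_models(metadata, query_values):
    """Return IDs of models whose metadata contains every query value."""
    queries = [query.lower() for query in query_values]
    matches = []
    for model_id, config in metadata.items():
        leaves = list(_leaf_values(config))
        if all(any(query in leaf for leaf in leaves) for query in queries):
            matches.append(model_id)
    return matches


def rank_models(metadata, model_ids, metric, ascending=True):
    """Sort matched models by a performance indicator stored under 'selection'."""
    def score(model_id):
        return metadata[model_id]["selection"]["performance"][metric]
    return sorted(model_ids, key=score, reverse=not ascending)


# Toy metadata with placeholder values (not taken from medigan's global.json).
TOY_METADATA = {
    "model_a": {"selection": {"modality": "mammography", "performance": {"metric_x": 3.0}}},
    "model_b": {"selection": {"modality": "endoscopy", "performance": {"metric_x": 2.0}}},
    "model_c": {"selection": {"modality": "mammography", "performance": {"metric_x": 1.0}}},
}
ranked = rank_models(TOY_METADATA, search_models(TOY_METADATA, ["mammography"]), "metric_x")
```

Separating the search from the optional ranking mirrors steps (4) and (5) of the workflow, so that a user can skip the ranking when no performance indicator is given.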

Fig. 6 Graphical user interface of medigan's model visualization tool on the example of model 10, a FastGAN that synthesizes endoscopic polyp images with respective masks. 51 The latent input vector can be adjusted via the sliders, reset via the Reset button, and sampled randomly via the Seed button.

Community-Wide Data Access: Sharing the Essence of Restricted Datasets

Fig. 7 Model contribution workflow. After model preparation (1), a user provides the model's id and metadata (2) to the generators class to (3) initialize a ModelContributor instance, which (4) validates and (5) extends the metadata. Next, (6) the model's sample generation capability is tested after (7) integration into medigan's global.json model metadata. If successful, (8) the model package is prepared and (9-13) pushed to Zenodo via API. Lastly, (14 and 15) a GitHub issue containing the model metadata is created, assigned, and pushed to the medigan repository.
The FID ratio r_FID (Sec. 4.2.1) accounts for the variation inherent in the real dataset. With medigan model experiments demonstrably leading to insights in synthetic data evaluation, future research can use medigan as a tool to accelerate, test, analyze, and compare new synthetic data and generative model evaluation and exploration techniques.

Table 1
Overview of medigan library information.
Model contribution is traceable via version control. Adding models to medigan requires a config change via pull request. 75

Model Metadata
The FID score and all other model information such as dependencies, modality, type, Zenodo link, associated publications, and generate function parameters are stored in a single comprehensive model metadata JSON file. Alongside its searchability, readability, and flexibility, the choice of JSON as file format is motivated by its extendability to a nonrelational database. As a single source of model information, the global.json file consists of an array of model IDs, where under each model ID the respective model metadata is stored. Toward ensuring model traceability as recommended by the FUTURE-AI consensus guidelines, 75 each model (on Zenodo) and its global.json metadata (on GitHub) are version-controlled with the latter being structured into the following objects.
i. execution: contains the information needed to download, package, and run the model resources.
ii. selection: contains model evaluation metrics and further information used to search, compare, and rank models.
iii. description: contains general information and main details about the model such as title, training dataset, license, date, and related publications.
This global.json metadata file is retrieved, provided, and handled by the config_manager module once a user imports the generators module. This facilitates rapid access to a model's metadata given its model_id and allows one to add new models or model versions to medigan via pull request without requiring a new release of the library.
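The three-object structure of a metadata entry and its lookup by model ID can be illustrated as follows. Only the top-level keys execution, selection, and description are taken from the text; the model ID, all nested keys, and all values are hypothetical placeholders, and get_config is a stand-in for illustration rather than the config_manager API.

```python
import json

# Hypothetical excerpt in the spirit of global.json (placeholder values only).
GLOBAL_JSON_TEXT = """
{
  "hypothetical_model_id": {
    "execution": {
      "package_link": "https://example.org/model.zip",
      "dependencies": ["numpy"]
    },
    "selection": {
      "modality": "mammography",
      "performance": {"FID": null}
    },
    "description": {
      "title": "Placeholder title",
      "license": "Placeholder license"
    }
  }
}
"""


def get_config(metadata, model_id, section=None):
    """Return the metadata of one model, optionally restricted to one object."""
    if model_id not in metadata:
        raise KeyError(f"unknown model_id: {model_id}")
    config = metadata[model_id]
    return config if section is None else config[section]


metadata = json.loads(GLOBAL_JSON_TEXT)
selection = get_config(metadata, "hypothetical_model_id", section="selection")
```

Keeping all entries in one JSON document keyed by model ID makes the lookup a plain dictionary access and lets new entries be added without code changes.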

Table 3
Models currently available in medigan. Also, computed FID scores for each model in medigan are shown. The number of real samples used for FID calculation is indicated by #imgs. The lower bound FID_rr is computed between a pair of randomly sampled sets of real data (real-real), whereas the model FID_rs is computed between two randomly sampled sets of real and synthetic data (real-syn). The results for model 7 (Flair, T1, T1c, T2) and 21 (T1, T2) are averaged across modalities.

Osuala et al.: medigan: a Python library of pretrained generative models for enriched data access...
Vol. 10(6) Downloaded From: https://www.spiedigitallibrary.org/journals/Journal-of-Medical-Imaging on 08 Oct 2023 Terms of Use: https://www.spiedigitallibrary.org/terms-of-use

3.5.2 Generate workflow extensions
Apart from storing or returning samples, a callable of the model's internal generate function can be returned to the user by setting is_gen_function_returned. This function with prepared but adjustable default arguments enables integration of the generate method into other workflows within medigan (e.g., model visualization) or outside of medigan (e.g., a user's AI model training).

As a further alternative, a torch 67 dataset or dataloader can be returned for any model in medigan running get_as_torch_dataset or get_as_torch_dataloader, respectively. This further increases the versatility with which users can introduce medigan's data synthesis capabilities into their AI model training and data preprocessing pipelines.
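The idea of returning a generate callable with prepared but adjustable default arguments can be sketched with functools.partial. The stand-in function _internal_generate and the helper get_generate_function are hypothetical and only illustrate the pattern, not medigan's internal code.

```python
import random
from functools import partial


def _internal_generate(num_samples=2, image_size=4, seed=None):
    """Stand-in for a model's internal generate function (returns noise 'images')."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(image_size)] for _ in range(image_size)]
            for _ in range(num_samples)]


def get_generate_function(**prepared_defaults):
    """Return a callable with prepared defaults that callers can still override."""
    return partial(_internal_generate, **prepared_defaults)


generate_fn = get_generate_function(num_samples=3, seed=0)
default_batch = generate_fn()               # uses the prepared defaults
larger_batch = generate_fn(num_samples=5)   # overrides one default per call
```

Because the defaults are bound once, a training loop or a visualization tool can call the function repeatedly and change only the arguments it cares about.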
Fig. 9 Scatter plot demonstrating the FID_rs (real-synthetic) of medigan models from Table 4. The FID_rs is based on the features of two different Inception classifiers, 47 one trained on ImageNet 52 (x-axis) and the other trained on RadImageNet 72 (y-axis). Each model is represented by a dot below its model ID. A black dot indicates an FID calculated from normalized (Norm/N) images, e.g., with pixel values scaled between 0 and 1, as opposed to a blue dot indicating an FID calculated from images without previous normalization. The dots that correspond to the same model IDs (normalized and non-normalized) are connected via black lines. The red regression line illustrates the trend across all data points.

Counterexamples to the correlation between the ImageNet-based and RadImageNet-based FID include model 2 (normalized), which has a low ImageNet-based FID (80.51) compared to a high RadImageNet-based FID (6.19), and model 6 (normalized), which, in contrast, has a high ImageNet-based FID (221.30) and a low RadImageNet-based FID (1.80). With a low ImageNet-based FID (63.99), but surprisingly high RadImageNet-based FID (7.32), model 10 (both normalized and non-normalized) is a further counterexample. The example of model 10 is of particular interest, as it indicates limited applicability of the radiology-specific RadImageNet-based FID for out-of-domain data, such as three-channel endoscopic images.

Table 5
Examples of the impact of synthetic data generated by medigan models on downstream task performance. Based on real test data, we compare the performance metrics of a model trained only on real data with a model trained on real data augmented with synthetic data. The metrics are taken from the respective publications describing the models.

To avoid data leakage between training, validation, and test sets, the generative model is trained either only on the dataset partition used to train the respective downstream task model (e.g., IDs 2, 3, 7, 14, 15) or on an entirely different dataset (e.g., IDs 5, 6).

Table 6
Examples of the impact of synthetic data generated by medigan models on downstream task performance. Based on real test data, we compare the performance metrics of a model trained only on real data with a model trained only on synthetic data. The metrics are taken from the respective publications describing the models. n.a. refers to the case where only synthetic data can be used, as no annotated real training data is available.

The searchable metadata (see Secs. 3.3 and 3.4) allows one to choose a suitable model for a user's task at hand and includes, among others, the dataset used during the training process, the training date, publication, modality, input arguments, model types, and comparable evaluation metrics.