This study employs a novel memory-driven, transformer-based method for learning representations from chest X-rays. The model is trained on a low-quality version of the MIMIC-CXR dataset comprising 17,783 chest X-rays with at most three views per study. It uses a relational memory to record key information during the generation process and a memory-driven conditional layer normalization technique to integrate this memory into the transformer's decoder. The dataset is divided into distinct training, validation, and test sets. We aim to establish an intuitive, quantitative evaluation metric by vectorizing the generated radiology reports; this metric leverages the representations learned by our model to classify 14 distinct lung pathologies, with the F1-score measuring classification performance and indicating the model's viability for diagnosing lung disease. We also introduce the use of Large Language Models (LLMs) to evaluate the accuracy of the generated reports. Together, these components point toward more robust radiology report generation.
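The memory-driven conditional layer normalization mentioned above can be pictured as a standard layer normalization whose gain and bias are shifted by projections of the relational-memory state, so the decoder's normalization is conditioned on what the memory has recorded. The following is a minimal, hypothetical PyTorch sketch of that idea, not the exact implementation used in this study; the module name, the flattened memory vector, and the projections `mem_to_gamma` / `mem_to_beta` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MemoryConditionalLayerNorm(nn.Module):
    """Sketch: layer norm whose gain/bias are offset by the memory state."""

    def __init__(self, d_model: int, d_memory: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(d_model))    # base gain
        self.beta = nn.Parameter(torch.zeros(d_model))    # base bias
        self.mem_to_gamma = nn.Linear(d_memory, d_model)  # gain offset predicted from memory (assumed)
        self.mem_to_beta = nn.Linear(d_memory, d_model)   # bias offset predicted from memory (assumed)
        self.eps = eps

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); memory: (batch, d_memory), e.g. the
        # flattened relational-memory matrix at the current decoding step.
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        gamma = self.gamma + self.mem_to_gamma(memory).unsqueeze(1)
        beta = self.beta + self.mem_to_beta(memory).unsqueeze(1)
        return gamma * (x - mean) / (std + self.eps) + beta
```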
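Because the proposed metric scores reports by how well 14 lung pathologies can be recovered from their vectorized representations, the F1 evaluation reduces to multi-label classification scoring. Below is a minimal sketch of that step; the arrays `y_true` (ground-truth pathology labels) and `y_pred` (labels predicted from the learned representations) are random placeholders for illustration only, not data from this study.

```python
import numpy as np
from sklearn.metrics import f1_score

NUM_PATHOLOGIES = 14  # one binary label per pathology class

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, NUM_PATHOLOGIES))  # placeholder ground-truth labels
y_pred = rng.integers(0, 2, size=(100, NUM_PATHOLOGIES))  # placeholder predicted labels

per_class_f1 = f1_score(y_true, y_pred, average=None)   # one F1 per pathology
macro_f1 = f1_score(y_true, y_pred, average="macro")    # unweighted mean over pathologies
micro_f1 = f1_score(y_true, y_pred, average="micro")    # pooled over all label decisions

print("macro F1:", macro_f1, "micro F1:", micro_f1)
```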