15 February 2021 Automated video summarization and label assignment for otoscopy videos using deep learning and natural language processing
Abstract
Tympanic membrane (TM) diseases are among the most frequent pathologies, affecting the majority of the pediatric population. Video otoscopy is an effective tool for diagnosing TM diseases. However, access to Ear, Nose, and Throat (ENT) physicians is limited in many sparsely populated regions worldwide. Moreover, high inter- and intra-reader variability impairs accurate diagnosis. This study proposes a digital otoscopy video summarization and automated diagnostic label assignment model that benefits from the synergy of deep learning and natural language processing (NLP). Our main motivation is to obtain the key visual features of TM diseases from their short descriptive reports. Our video database consisted of 173 otoscopy records representing three different TM diseases. To generate composite images, we utilized our previously developed semantic segmentation-based stitching framework, SelectStitch. An ENT expert reviewed these composite images and wrote short reports describing the TM's visual landmarks and the disease for each ear. Using NLP and a bag-of-words (BoW) model, we determined the five most frequent words characterizing each TM diagnostic category. Neighborhood components analysis was then used to predict the diagnostic label of each test instance. The proposed model provided an overall F1-score of 90.2%. To the best of our knowledge, this is the first study to utilize textual information in computerized ear diagnostics. Our model has the potential to become a telemedicine application that can automatically diagnose the TM by analyzing visual descriptions of it that a healthcare provider submits from a mobile device.
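To illustrate the bag-of-words step mentioned in the abstract, the following is a minimal sketch of how the most frequent words per diagnostic category might be extracted from short reports and turned into feature vectors. It is not the authors' implementation: the function names, the stop-word list, the category labels, and the toy reports are all assumptions made up for this example, and the neighborhood components analysis step is not shown.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not describe the
# preprocessing actually used in the study.
STOP_WORDS = {"the", "a", "is", "and", "of", "with", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_words_per_label(reports, k=5):
    """Return {label: list of the k most frequent words in that label's reports}.

    `reports` is a list of (report_text, label) pairs.
    """
    counts = {}
    for text, label in reports:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {label: [w for w, _ in c.most_common(k)] for label, c in counts.items()}


def build_vocabulary(top_words):
    """Merge the per-label word lists into one sorted vocabulary."""
    return sorted({w for words in top_words.values() for w in words})


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the report text."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


# Invented toy reports with placeholder labels (not data from the study).
TOY_REPORTS = [
    ("The membrane is bulging and red", "category_a"),
    ("Bulging red membrane with fluid", "category_a"),
    ("Clear translucent membrane", "category_b"),
    ("Translucent membrane and clear landmarks", "category_b"),
]

# k=3 keeps the toy example small; the abstract reports using five words.
top = top_words_per_label(TOY_REPORTS, k=3)
vocab = build_vocabulary(top)
vec = vectorize("red bulging membrane", vocab)
```

Feature vectors of this kind could then be passed to a classifier. The abstract names neighborhood components analysis for that step; scikit-learn, for example, offers a `NeighborhoodComponentsAnalysis` class that could be tried, though whether the authors used it is not stated.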
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hamidullah Binol, M. Khalid Khan Niazi, Charles Elmaraghy, Aaron C. Moberly, and Metin N. Gurcan "Automated video summarization and label assignment for otoscopy videos using deep learning and natural language processing", Proc. SPIE 11601, Medical Imaging 2021: Imaging Informatics for Healthcare, Research, and Applications, 116010S (15 February 2021); https://doi.org/10.1117/12.2582009
KEYWORDS
Video

Video processing

Diagnostics

Ear

Visualization

Composites

Image segmentation
