Selecting training samples for ovarian cancer classification via a semi-supervised clustering approach

Jennifer Salguero L.; Prateek Prasanna; Germán Corredor; Angel Cruz-Roa; David Becerra; Eduardo Romero

doi:10.1117/12.2612984

4 April 2022

Proceedings Volume 12039, Medical Imaging 2022: Digital and Computational Pathology; 1203906 (2022) https://doi.org/10.1117/12.2612984
Event: SPIE Medical Imaging, 2022, San Diego, California, United States

Abstract

Machine learning techniques have shown great promise in digital pathology. However, a major bottleneck is the difficulty of annotating the amount of tissue needed to cover several sources of variability, namely chemical fixation, sample slicing, and staining. Models are usually trained on sets of small annotated image patches, but the number of required patches may grow exponentially, and the patches must still represent this variability. This paper presents a method for automatic sample selection to train an ovarian cancer classifier by integrating a novel soft clustering strategy. The method starts by classifying a large set of patches with a previously trained classifier and dividing the patches assigned to the cancer class into highly and moderately confident groups. An unsupervised selection step based on Probabilistic Latent Semantic Analysis (PLSA) then picks moderately confident patches from relevant and meaningful groups with maximum within-group variance. A new model is then trained using the highly confident patches together with the patches selected by PLSA. This strategy outperforms a model trained with a larger set of annotated patches, while requiring much less training time and far fewer samples. The strategy was evaluated on a set of patches from 18 patients with Serous Ovarian Cancer, obtaining a reduction of 54.62% in training time and 73.66% in the number of samples, while recall improved from 0.69 to 0.73.
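The selection pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the confidence thresholds (0.5 and 0.9), the representation of each patch as a vector of visual-word counts, the hard assignment of patches to their most probable PLSA topic, and the function names `split_by_confidence`, `plsa`, and `select_from_plsa` are all assumptions made for illustration. The data is synthetic, and the final retraining step is omitted.

```python
import numpy as np


def split_by_confidence(probs, low=0.5, high=0.9):
    """Split cancer-class patches into highly / moderately confident indices.

    The thresholds are hypothetical; the abstract does not state them.
    """
    probs = np.asarray(probs)
    high_idx = np.flatnonzero(probs >= high)
    moderate_idx = np.flatnonzero((probs >= low) & (probs < high))
    return high_idx, moderate_idx


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a non-negative (patches x words) count matrix.

    Returns P(topic | patch), shape (patches, topics), and
    P(word | topic), shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # (d, z, w)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post                # (d, z, w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z


def select_from_plsa(counts, p_z_d, n_groups_kept=2):
    """Keep patches from the groups with the largest within-group variance.

    Assumption: each patch belongs to its most probable topic, and a group's
    variance is the summed per-feature variance of its members.
    """
    groups = p_z_d.argmax(axis=1)
    variances = {}
    for g in np.unique(groups):
        members = counts[groups == g]
        if len(members) > 1:
            variances[g] = members.var(axis=0).sum()
    kept = sorted(variances, key=variances.get, reverse=True)[:n_groups_kept]
    return np.flatnonzero(np.isin(groups, kept))


# Synthetic stand-ins for classifier confidences and visual-word counts.
rng = np.random.default_rng(0)
probs = rng.random(200)
counts = rng.poisson(3.0, size=(200, 30)).astype(float)

high_idx, moderate_idx = split_by_confidence(probs)
p_z_d, _ = plsa(counts[moderate_idx], n_topics=4)
picked = moderate_idx[select_from_plsa(counts[moderate_idx], p_z_d)]

# Indices of the patches that would be used to train the new model.
training_idx = np.concatenate([high_idx, picked])
```

In this sketch, `training_idx` combines every highly confident patch with only the moderately confident patches that fall in the highest-variance PLSA groups, which mirrors the reduced training set the abstract reports.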

Citation

Jennifer Salguero L., Prateek Prasanna, Germán Corredor, Angel Cruz-Roa, David Becerra, and Eduardo Romero "Selecting training samples for ovarian cancer classification via a semi-supervised clustering approach", Proc. SPIE 12039, Medical Imaging 2022: Digital and Computational Pathology, 1203906 (4 April 2022); https://doi.org/10.1117/12.2612984
