This paper illustrates how computational image processing and machine learning can help address
two challenges of histological image analysis, namely cellular heterogeneity and imprecise labeling.
We propose an unsupervised method for generating representative image signatures based on an autoencoder
architecture, which reduces the dependency on labels that tend to be imprecise and tedious to obtain. We have
modified and enhanced the architecture so that it simultaneously produces representative image features and
performs dictionary learning on these features, enabling robust characterization of the cellular phenotypes. We
integrate the extracted features into a disease grading framework, test it on prostate tissues immunostained to
visualize different proteins, and show a significant improvement in grading accuracy compared to
alternative supervised feature-extraction methods.
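As an illustration only, the idea of learning features and a dictionary at the same time can be sketched as a single joint objective: an autoencoder reconstruction term plus a sparse-coding term on the latent features. The sketch below is a minimal NumPy example under our own assumptions, not the architecture used in the paper: the single-layer ReLU encoder, the linear decoder, the ISTA solver, and the names `joint_loss`, `sparse_codes`, `l1`, and `beta` are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_codes(Z, D, l1=0.1, n_iter=50):
    # ISTA: minimise 0.5 * ||Z - A D||^2 + l1 * ||A||_1 over the codes A.
    # Z: (n, k) latent features; D: (m, k) dictionary with atoms in rows.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    A = np.zeros((Z.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - Z) @ D.T
        A = soft_threshold(A - step * grad, step * l1)
    return A

def joint_loss(X, W_enc, W_dec, D, l1=0.1, beta=1.0):
    # Hypothetical joint objective: reconstruction + beta * sparse dictionary fit.
    Z = np.maximum(X @ W_enc, 0.0)   # assumed encoder: one linear layer + ReLU
    X_hat = Z @ W_dec                # assumed decoder: one linear layer
    A = sparse_codes(Z, D, l1)
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    dict_fit = np.mean(0.5 * np.sum((Z - A @ D) ** 2, axis=1)
                       + l1 * np.sum(np.abs(A), axis=1))
    return recon + beta * dict_fit, Z, A

# Toy usage on random data standing in for flattened image patches.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
W_enc = rng.normal(size=(16, 4)) * 0.1
W_dec = rng.normal(size=(4, 16)) * 0.1
D = rng.normal(size=(6, 4))
loss, Z, A = joint_loss(X, W_enc, W_dec, D)
```

In a trainable version, `W_enc`, `W_dec`, and `D` would be updated together by gradient descent on this loss, so that the latent features are shaped to be both reconstructive and well described by a small number of dictionary atoms.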
Images of tissue specimens enable evidence-based study of disease susceptibility and stratification. Moreover, staining technologies reveal molecular expression patterns through multicolor visualization, thus enabling personalized disease treatment and prevention. However, translating molecular expression imaging into direct health benefits has been slow. Two major factors contribute to this. On the one hand, disease susceptibility and progression are complex, multifactorial molecular processes. Diseases such as cancer exhibit cellular heterogeneity, which impedes the differentiation between diverse grades or types of cell formations. On the other hand, the relative quantification of selected features in stained tissue is ambiguous, tedious, time-consuming, and prone to clerical error, leading to intra- and inter-observer variability and low throughput. Image analysis of digital histopathology images is a fast-developing and exciting area of disease research that aims to address these limitations. We have developed a computational framework that extracts unique signatures from color, morphological, and topological information and allows them to be combined. Integrating this information enables diagnosis of disease with an AUC as high as 0.97. Multiple staining shows a significant improvement with respect to most proteins, with an AUC as high as 0.99.