Quantitative microscopy has been used extensively in biomedical research and has provided significant insights into structure and dynamics at the cell and tissue level. The full procedure comprises specimen preparation, light absorption, reflection, or emission by the specimen, optical processing by the microscope, optical-to-electrical conversion by a camera or detector, and computational processing of the digitized images. Although many recent digital signal processing techniques have been successfully applied to compress, restore, and register digital microscope images, automated approaches for recognizing and understanding complex subcellular patterns in light microscope images have been far less widely used. We describe a systematic approach for interpreting protein subcellular distributions using various sets of subcellular location features (SLF) in combination with supervised classification and unsupervised clustering methods. These methods can handle complex patterns in digital microscope images, and the features can also be applied to other tasks, such as objectively choosing a representative image from a collection and performing statistical comparisons of image sets.
The central goal of proteomics is to clarify the mechanism by which each protein in a given cell type carries out its function. Automated determination of protein subcellular location by fluorescence microscopy can play an important role in fulfilling this goal. The subcellular location of a protein is critical to understanding its function because each subcellular compartment has a unique biochemical environment. We have previously shown that neural network classifiers using sets of numerical features computed from fluorescence microscope images can recognize all major subcellular location patterns with reasonable accuracy. Current classifiers are limited by underdetermined classification boundaries, because the number of available images is small relative to the number of features. In this paper, we compare various feature reduction methods that can address this problem. Specifically, principal component analysis, kernel principal component analysis, nonlinear principal component analysis, independent component analysis, classification trees, fractal dimensionality reduction, stepwise discriminant analysis, and genetic algorithms are used to derive reduced feature sets, which are then evaluated using support vector machine classifiers. The best results were obtained using stepwise discriminant analysis, and we found that as few as eight features can provide good classification accuracy for all major subcellular patterns in HeLa cells.
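As a rough illustration of the kind of feature selection described above, the following sketch performs greedy forward selection driven by Wilks' lambda, the statistic on which stepwise discriminant analysis is based. It is a simplified stand-in and not the procedure used in the paper: it omits the F-to-enter and F-to-remove tests and the backward removal step of full stepwise discriminant analysis, and it does not include the support vector machine evaluation. The function names (`wilks_lambda`, `forward_select`, `make_toy_data`) and the synthetic data are assumptions introduced only for this example.

```python
import numpy as np


def wilks_lambda(X, y, cols):
    """Wilks' lambda, det(W) / det(T), for the feature subset `cols`.

    W is the pooled within-class scatter matrix and T the total scatter
    matrix. Smaller values mean the classes are better separated.
    """
    Xs = X[:, cols]
    centered = Xs - Xs.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for label in np.unique(y):
        group = Xs[y == label]
        dev = group - group.mean(axis=0)
        W += dev.T @ dev
    return np.linalg.det(W) / np.linalg.det(T)


def forward_select(X, y, n_keep):
    """Greedily add the feature that lowers Wilks' lambda the most.

    Simplification (assumption): no significance test and no removal
    step, unlike full stepwise discriminant analysis.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best = min(remaining,
                   key=lambda j: wilks_lambda(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def make_toy_data(seed=0):
    """Hypothetical data: 3 classes, 6 features, only 1 and 4 informative."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), 40)
    X = rng.normal(size=(y.size, 6))
    X[:, 1] += 3.0 * y          # class means 0, 3, 6
    X[:, 4] += 3.0 * (y == 1)   # separates class 1 from the others
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    chosen = forward_select(X, y, n_keep=2)
    print("selected feature indices:", chosen)
    print("Wilks' lambda of subset:", wilks_lambda(X, y, chosen))
```

On this toy data the two planted features should be chosen ahead of the four noise features. With real image features, the selected subset would then be passed to a classifier and scored by cross-validation, which is where the limited number of images relative to the number of features becomes the practical constraint.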