Quantitative histomorphometry (QH) is the process of computerized extraction of features from digitized tissue slide images. Typically these features are used in machine learning classifiers to predict disease presence, behavior and outcome. Successful robust classifiers require features that both discriminate between classes of interest and are stable across data from multiple sites. Feature stability may be compromised by variation in slide staining and scanning procedures. These laboratory specific variables include dye batch, slice thickness and the whole slide scanner used to digitize the slide. The key therefore is to be able to identify features that are not only discriminating between the classes of interest (e.g. cancer and non-cancer or biochemical recurrence and non- recurrence) but also features that will not wildly fluctuate on slides representing the same tissue class but from across multiple different labs and sites. While there has been some recent efforts at understanding feature stability in the context of radiomics applications (i.e. feature analysis of radiographic images), relatively few attempts have been made at studying the trade-off between feature stability and discriminability for histomorphometric and digital pathology applications. In this paper we present two new measures, preparation-induced instability score (PI) and latent instability score (LI), to quantify feature instability across and within datasets. Dividing PI by LI yields a ratio for how often a feature for a specific tissue class (e.g. low grade prostate cancer) is different between datasets from different sites versus what would be expected from random chance alone. Using this ratio we seek to quantify feature vulnerability to variations in slide preparation and digitization. Since our goal is to identify stable QH features we evaluate these features for their stability and thus inclusion in machine learning based classifiers in a use case involving prostate cancer. Specifically we examine QH features which may predict 5 year biochemical recurrence for prostate cancer patients who have undergone radical prostatectomy from digital slide images of surgically excised tissue specimens, 5 year biochemical recurrence being a strong predictor of disease recurrence. In this study we evaluated the ability of our feature robustness indices to identify the most stable and predictive features of 5 year biochemical recurrence using digitized slide images of surgically excised prostate cancer specimens from 80 different patients across 4 different sites. A total of 242 features from 5 different feature families were investigated to identify the most stable QH features from our set. Our feature robustness indices (PI and LI) suggested that five feature families (graph, shape, co-occurring gland tensors, gland sub-graphs, texture) were susceptible to variations in slide preparation and digitization across various sites. The family least affected was shape features in which 19.3% of features varied across laboratories while the most vulnerable family, at 55.6%, was the gland disorder features. However the disorder features were the most stable within datasets being different between random halves of a dataset in an average of just 4.1% of comparisons while texture features were the most unstable being different at a rate of 4.7%. We also compared feature stability across two datasets before and after color normalization. Color normalization decreased feature stability with 8% and 34% of features different between the two datasets in two outcome groups prior to normalization and 49% and 51% different afterwards. Our results appear to suggest that evaluation of QH features across multiple sites needs to be undertaken to assess robustness and class discriminability alone should not represent the benchmark for selection of QH features to build diagnostic and prognostic digital pathology classifiers.