Computational histomorphometric approaches typically use low-level image features for building machine learning classifiers. However, these approaches usually ignore high-level expert knowledge. A computational model (M_im) combines low-, mid-, and high-level image information to predict the likelihood of cancer in whole slide images. Handcrafted low- and mid-level features are computed from area, color, and spatial nuclei distributions. High-level information is implicitly captured from the recorded navigations of pathologists while exploring whole slide images during diagnostic tasks. This model was validated by predicting the presence of cancer in a set of unseen fields of view. The available database was composed of 24 cases of basal-cell carcinoma, from which 17 served to estimate the model parameters and the remaining 7 comprised the evaluation set. A total of 274 fields of view of size
were extracted from the evaluation set. Then 176 patches from this set were used to train a support vector machine classifier to predict the presence of cancer on a patch-by-patch basis while the remaining 98 image patches were used for independent testing, ensuring that the training and test sets do not comprise patches from the same patient. A baseline model (M_ex) estimated the cancer likelihood for each of the image patches. M_ex uses the same visual features as M_im, but its weights are estimated from nuclei manually labeled as cancerous or noncancerous by a pathologist. M_im achieved an accuracy of 74.49% and an
-measure of 80.31%, while M_ex yielded corresponding accuracy and F-measures of 73.47% and 77.97%, respectively.