Deep learning has recently shown promising advances in the analysis of digital pathology images. However, these methods are often considered “black boxes”, in which tracing inputs to outputs and diagnosing errors is difficult. This matters because neural networks are fragile, and dataset variation, which in digital pathology is attributed to biological variance, can cause low accuracy. In deep learning, this is typically addressed by adding data to the training set. However, training data is costly and time-consuming to create, and may not cover all of the variation seen in these images. Digitized histology varies along many dimensions (color and stain variation, lighting intensity, presentation of a disease, etc.), and some of these “low-level” image variations may cause a deep network to fail because of its fragility. In this work, we use a unique dataset, cases of serially registered hematoxylin and eosin (H&E) tissue samples from oral cavity cancer (OCC) patients, to explore the errors of a classifier trained to identify and segment different tissue types. Registered serial sections allow us to eliminate variability due to biological structure, focus on image variability including staining and lighting, and try to identify sources of error that may cause deep learning to fail. We find that perceptually insignificant changes in an image (minor lighting and color shifts) can result in extremely poor classification performance, even when the training process tries to prevent overfitting. This suggests that great care must be taken to augment and normalize datasets to prevent errors.