The growing popularity of black-box machine learning methods for medical image analysis makes their interpretability a crucial task. For a system such as a trained neural network to be trustworthy for a clinician, it needs to be able to explain its decisions and predictions. In this work, we tackle the problem of generating plausible explanations for the predictions of medical image classifiers that are trained to differentiate between different types of pathologies and healthy tissue. An intuitive way to determine which image regions influence the trained classifier is to check whether the classification result changes when those regions are deleted. This idea can be formulated as a minimization problem and thus implemented efficiently. However, the meaning of “deletion” of image regions, in our case pathologies in medical images, is not defined. We contribute by defining the deletion of a pathology as its replacement by a healthy-looking equivalent generated using variational autoencoders. Experiments with a classification neural network on OCT (Optical Coherence Tomography) images and brain lesion MRIs show that a meaningful replacement of “deleted” image regions has a significant impact on the reliability of the generated explanations. In these experiments, the proposed deletion method delivers the best results compared to four other established methods.
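The deletion idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this work: the classifier is a toy logistic model with hand-set weights, the "healthy" image is given directly instead of being generated by a variational autoencoder, and the objective (classifier score of the perturbed image plus an L1 penalty on the deleted area) is only one common way to phrase deletion as a minimization problem. The names `classifier_score`, `explain_by_deletion`, `lam`, `lr`, and `steps` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def classifier_score(image, weights, bias):
    """Toy stand-in for a trained classifier: probability of 'pathology'."""
    return sigmoid(np.sum(weights * image) + bias)


def explain_by_deletion(image, healthy, weights, bias, lam=0.01, lr=0.5, steps=200):
    """Find a mask m in [0, 1] that minimizes
        score(m * image + (1 - m) * healthy) + lam * sum(1 - m)
    by projected gradient descent. m = 1 keeps a pixel, m = 0 replaces it
    with its healthy counterpart (assumed objective, see lead-in)."""
    mask = np.ones_like(image)
    for _ in range(steps):
        perturbed = mask * image + (1.0 - mask) * healthy
        s = classifier_score(perturbed, weights, bias)
        # Analytic gradient of the objective for the toy logistic classifier.
        grad = s * (1.0 - s) * weights * (image - healthy) - lam
        # Gradient step, then projection back onto the valid range [0, 1].
        mask = np.clip(mask - lr * grad, 0.0, 1.0)
    return mask


# Toy data: uniform "healthy tissue" with a brighter 3x3 "lesion".
healthy = np.full((8, 8), 0.2)
image = healthy.copy()
image[2:5, 2:5] += 1.0

# Hand-set classifier that responds only to intensity in the lesion area.
weights = np.zeros((8, 8))
weights[2:5, 2:5] = 0.5
bias = -3.15

mask = explain_by_deletion(image, healthy, weights, bias)
saliency = 1.0 - mask  # high where deletion was needed to change the prediction

score_before = classifier_score(image, weights, bias)
score_after = classifier_score(mask * image + (1.0 - mask) * healthy, weights, bias)
print(f"score before: {score_before:.3f}, after deletion: {score_after:.3f}")
```

In this toy setting the optimized mask replaces only the lesion pixels, because the penalty term discourages deleting pixels that do not affect the score. With a real network, the gradient would come from automatic differentiation, and the quality of the healthy replacement would determine how plausible the perturbed image is.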