Digitization of full biopsy slides using the whole slide imaging technology has provided new opportunities for understanding the diagnostic process of pathologists and developing more accurate computer aided diagnosis systems. However, the whole slide images also provide two new challenges to image analysis algorithms. The first one is the need for simultaneous localization and classification of malignant areas in these large images, as different parts of the image may have different levels of diagnostic relevance. The second challenge is the uncertainty regarding the correspondence between the particular image areas and the diagnostic labels typically provided by the pathologists at the slide level. In this paper, we exploit a data set that consists of recorded actions of pathologists while they were interpreting whole slide images of breast biopsies to find solutions to these challenges. First, we extract candidate regions of interest (ROI) from the logs of pathologists' image screenings based on different actions corresponding to zoom events, panning motions, and fixations. Then, we model these ROIs using color and texture features. Next, we represent each slide as a bag of instances corresponding to the collection of candidate ROIs and a set of slide-level labels extracted from the forms that the pathologists filled out according to what they saw during their screenings. Finally, we build classifiers using five different multi-instance multi-label learning algorithms, and evaluate their performances under different learning and validation scenarios involving various combinations of data from three expert pathologists. Experiments that compared the slide-level predictions of the classifiers with the reference data showed average precision values up to 62% when the training and validation data came from the same individual pathologist's viewing logs, and an average precision of 64% was obtained when the candidate ROIs and the labels from all pathologists were combined for each slide.