The number of health-care-associated infections is increasing worldwide. Hand hygiene has been identified as one of the most crucial measures to prevent bacteria from spreading. However, compliance with recommended hand hygiene procedures is generally poor, even in modern, industrialized regions. We present a machine-learning-based optical assistance system for monitoring the hygienic hand disinfection procedure. First, the hands and forearms of a person are detected in a depth video stream down-sampled to 96 px × 96 px, using a fully convolutional network that classifies each pixel. To gather the required amount of training data, we present a novel approach for automatically labeling recorded data using colored gloves and a color video stream registered to the depth stream. The colored gloves are used to segment the depth data only in the training phase; during inference, they are not required. The system detects and separates detailed hand parts of interacting, self-occluded hands within the sensor's observation zone. Based on the location of the segmented hands, a full-resolution region of interest (ROI) is cropped. A second deep neural network classifies the ROI into ten classes: nine process steps (gestures) based on the hand disinfection procedure recommended by the World Health Organization, plus an additional error class. The combined system, cross-validated with 21 subjects, identifies the currently executed gesture with an accuracy of 93.37% (± 2.67%). Feedback is provided at 30 frames per second.
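The automatic labeling idea, turning glove colors in a registered color frame into per-pixel labels for the depth frame, can be sketched as a simple per-pixel color rule. This is a minimal sketch under stated assumptions, not the authors' implementation. The label ids, the glove hue ranges, the saturation threshold, the depth limits of the observation zone, and the rule that any in-range pixel without a glove color counts as forearm are all hypothetical. Only Python's standard `colorsys.rgb_to_hsv` is used.

```python
import colorsys

# Hypothetical label ids; the abstract does not list the actual class set.
BACKGROUND, LEFT_HAND, RIGHT_HAND, ARM = 0, 1, 2, 3

# Hypothetical glove hue ranges (hue in [0, 1)); real values depend on dyes and lighting.
GLOVE_HUES = {
    LEFT_HAND: (0.25, 0.45),   # green-ish glove
    RIGHT_HAND: (0.55, 0.70),  # blue-ish glove
}
MIN_SATURATION = 0.4  # weakly colored pixels (skin, grey) are not treated as glove


def label_pixel(rgb, depth_mm, near_mm=300, far_mm=900):
    """Return a training label for one registered (color, depth) pixel."""
    if not (near_mm <= depth_mm <= far_mm):
        return BACKGROUND  # outside the assumed observation zone
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    if s >= MIN_SATURATION:
        for label, (lo, hi) in GLOVE_HUES.items():
            if lo <= h < hi:
                return label
    # Simplifying assumption: in-range pixels without glove color are bare forearm.
    return ARM


def label_frame(color, depth):
    """color: H x W list of (r, g, b); depth: H x W list of mm, already registered."""
    return [
        [label_pixel(c, d) for c, d in zip(color_row, depth_row)]
        for color_row, depth_row in zip(color, depth)
    ]
```

Matching on hue rather than raw RGB makes the rule less sensitive to brightness changes across the glove surface. The resulting label map is the kind of target a pixelwise classifier on the depth stream could then be trained against, while the gloves themselves never appear in the network's depth input.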
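Going from the low-resolution segmentation to a full-resolution ROI can be sketched as a bounding box computed on the 96 px × 96 px label map and rescaled to the full frame. The abstract does not say how the ROI is sized or padded, so this sketch assumes hypothetical hand label ids (1 and 2), a padding margin proportional to the box size, and a full resolution supplied by the caller.

```python
import math

# Hypothetical label ids for the two hands in the 96 x 96 segmentation output.
HAND_LABELS = frozenset({1, 2})


def mask_bbox(labels, keep=HAND_LABELS):
    """Return (row0, col0, row1, col1) with exclusive ends, or None if no pixel is kept."""
    rows = [r for r, row in enumerate(labels) if any(v in keep for v in row)]
    if not rows:
        return None
    cols = [c for c in range(len(labels[0])) if any(row[c] in keep for row in labels)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def scale_roi(bbox, low_res, full_res, margin=0.1):
    """Map a low-res bbox to full-res pixel coordinates, padded by margin x box size."""
    r0, c0, r1, c1 = bbox
    (lh, lw), (fh, fw) = low_res, full_res
    sy, sx = fh / lh, fw / lw
    pad_y = margin * (r1 - r0) * sy
    pad_x = margin * (c1 - c0) * sx
    # Round outward and clamp so the ROI never shrinks below the box or leaves the frame.
    top = max(0, math.floor(r0 * sy - pad_y))
    left = max(0, math.floor(c0 * sx - pad_x))
    bottom = min(fh, math.ceil(r1 * sy + pad_y))
    right = min(fw, math.ceil(c1 * sx + pad_x))
    return top, left, bottom, right


def crop(image, roi):
    """Cut the ROI out of a full-resolution frame given as a list of rows."""
    top, left, bottom, right = roi
    return [row[left:right] for row in image[top:bottom]]
```

Using exclusive end indices keeps the box arithmetic consistent with Python slicing in `crop`. The cropped patch is what a second classifier would receive, so the segmentation network only ever needs to run at the cheap low resolution.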