In the field of pedagogy or educational psychology, emotions are treated as very important factors, which are closely associated with cognitive processes. Hence, it is meaningful for teachers to analyze students’ emotions in classrooms, thus adjusting their teaching activities and improving students ’ individual development. To provide a benchmark for different expression recognition algorithms, a large collection of training and test data in classroom environment has become an acute problem that needs to be resolved. In this paper, we present a multimodal spontaneous database in real learning environment. To collect the data, students watched seven kinds of teaching videos and were simultaneously filmed by a camera. Trained coders made one of the five learning expression labels for each image sequence extracted from the captured videos. This subset consists of 554 multimodal spontaneous expression image sequences (22,160 frames) recorded in real classrooms. There are four main advantages in this database. 1) Due to recorded in the real classroom environment, viewer’s distance from the camera and the lighting of the database varies considerably between image sequences. 2) All the data presented are natural spontaneous responses to teaching videos. 3) The multimodal database also contains nonverbal behavior including eye movement, head posture and gestures to infer a student ’ s affective state during the courses. 4) In the video sequences, there are different kinds of temporal activation patterns. In addition, we have demonstrated the labels for the image sequences are in high reliability through Cronbach's alpha method.