In this paper, we propose a video highlight generation method based on contextual information and perception. The proposed method consists of three steps. In the first step, a long video is segmented into shots, each of which is produced by a single uninterrupted camera operation. In the second step, contextual information is computed from the video shots. We divide the contextual information into local and global components. The local contextual information of a shot is represented by its foreground information, shot activity, and background information. The global contextual information of a shot is represented by its interaction and coherence with other shots. Based on the contextual information, story unit boundaries are detected. For each story unit, we determine meaningful shot candidates by computing shot length, shot activity, contrast value, and foreground object size. Finally, the meaningful shots are selected from the candidates by applying the perceptual grouping rule inversely. The video highlights are generated by concatenating the selected shots.
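
To make the candidate-determination step concrete, the following is a minimal sketch of one way the four per-shot measurements (shot length, shot activity, contrast value, and foreground object size) could be combined within a story unit. It is not the method of this paper: the `Shot` fields, the min-max normalization, the equal weighting, and the mean-score threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical stand-ins for the four features named in the text.
    length: float           # shot duration, e.g. in seconds
    activity: float         # e.g. mean motion magnitude in the shot
    contrast: float         # e.g. an intensity contrast value
    foreground_size: float  # e.g. fraction of the frame that is foreground


FEATURES = ("length", "activity", "contrast", "foreground_size")


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_shots(story_unit):
    """Return one score per shot: the mean of its normalized features.

    Equal weighting is an assumption, not taken from the paper.
    """
    if not story_unit:
        return []
    columns = [min_max([getattr(s, f) for s in story_unit]) for f in FEATURES]
    n = len(story_unit)
    return [sum(col[i] for col in columns) / len(FEATURES) for i in range(n)]


def candidate_indices(story_unit):
    """Indices of shots scoring at least the story unit's mean score.

    The mean-score threshold is an assumption, not taken from the paper.
    """
    scores = score_shots(story_unit)
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s >= mean]
```

Under these assumptions, a story unit whose second shot is the longest, most active, highest in contrast, and largest in foreground would yield that shot as the only candidate. The perceptual grouping step that selects the final shots from the candidates is not modeled here.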