Various approaches have been proposed for video anomaly detection, yet they typically suffer from one or more limitations. They often characterize a pattern using only its internal information and ignore its external relationships, which are important for local anomaly detection. Moreover, high-dimensional and non-robust pattern representations can lead to overfitting, increased computational cost and memory requirements, and a high false alarm rate. We propose a video anomaly detection framework that relies on a heterogeneous representation to account for both a pattern’s internal information and its external relationships. The internal information is characterized by slow features learned by slow feature analysis from low-level representations, and the external relationships are characterized by spatial contextual distances. The resulting heterogeneous representation is compact, robust, efficient, and discriminative for anomaly detection. Extensive experiments on widely used benchmark datasets demonstrate the robustness and efficiency of our approach in comparison with state-of-the-art approaches.
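
To make the notion of slow features concrete, the following is a minimal sketch of plain linear slow feature analysis on a toy signal. It is an illustration under stated assumptions, not the implementation used in this work: the function name `linear_sfa`, the toy sinusoidal inputs, and the mixing coefficients are all hypothetical, and the low-level video representations and any nonlinear expansion used in the framework are not modeled here. The sketch whitens the input and then keeps the directions in which the temporal difference of the whitened signal has the smallest variance.

```python
import numpy as np


def linear_sfa(X, n_components):
    """Plain linear SFA (illustrative sketch).

    X: array of shape (T, D), one observation per time step.
    Returns (W, mean) so that (X - mean) @ W gives the slowest
    n_components features, each with unit variance.
    """
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 1: whiten the centered data (unit covariance).
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10  # drop numerically degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ whiten

    # Step 2: covariance of the temporal differences of the whitened data.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / (len(dZ) - 1)

    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # eigenvectors are the directions that vary most slowly over time.
    _, d_evecs = np.linalg.eigh(dcov)
    W = whiten @ d_evecs[:, :n_components]
    return W, mean


# Toy example (hypothetical data): a slow and a fast sinusoid, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 500)
slow = np.sin(t)
fast = np.sin(37.0 * t)
X = np.column_stack([slow + 0.5 * fast, fast - 0.3 * slow])

W, mean = linear_sfa(X, n_components=1)
y = (X - mean) @ W  # extracted slowest feature, shape (500, 1)
```

In this toy setting the extracted feature `y` should closely track the slow sinusoid (up to sign), even though neither input channel equals it. Real video descriptors would typically need a nonlinear expansion or a learned mapping before this step, which the sketch leaves out.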