In this paper, we propose an effective and robust framework for recognizing human actions from depth map sequences. First, a 3D motion trail model (3DMTM) is extracted to represent temporal motion information. Then, two heterogeneous features based on 3DMTM are proposed to describe actions more comprehensively. Computing Multilayer Histograms of Oriented Gradient (MHOG) on 3DMTM yields 3DMTM-MHOG, which describes the local detail of different actions. Combining the Gist descriptor with 3DMTM yields 3DMTM-Gist, which models the holistic structure of actions. Feature-level fusion merges the two descriptors into the final feature. Finally, a support vector machine (SVM) performs multi-class action recognition. Experimental results on a public depth action dataset (MSR Action3D) show that our method outperforms state-of-the-art methods.
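
As an illustration of the feature-level fusion step, the following minimal sketch concatenates two descriptors into a single vector. It is not the implementation used in this work: the helper names `l2_normalize` and `fuse_features`, the per-descriptor L2 normalization, and the toy descriptor values are all assumptions introduced here for clarity.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(mhog_desc, gist_desc):
    """Feature-level fusion sketch: normalize each descriptor, then concatenate.

    The normalization is an assumption; it keeps one descriptor from
    dominating the other purely because of its scale.
    """
    return l2_normalize(mhog_desc) + l2_normalize(gist_desc)


# Toy stand-ins for 3DMTM-MHOG and 3DMTM-Gist descriptors (values are made up).
mhog_desc = [3.0, 4.0]
gist_desc = [0.0, 5.0, 12.0]

fused = fuse_features(mhog_desc, gist_desc)
print(len(fused))  # length of the fused vector = sum of the two descriptor lengths
```

The fused vector would then serve as the input to a multi-class classifier such as an SVM.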