Feature extraction is central to pattern recognition. In many image recognition tasks it is difficult to extract explicit geometric features directly from the images, so global feature extraction is often used instead. Principal Component Analysis (PCA) is a typical global feature extraction method. However, PCA assumes that the image population follows a Gaussian distribution, and it produces a set of compact features: the coefficients of the basis functions with the largest eigenvalues. Compared with the compact features of PCA, sparse features appear more attractive for recognition tasks. In this paper, three algorithms that produce sparse features are studied. Independent Component Analysis (ICA) and sparse coding (SP) can describe non-Gaussian distributions. Discriminatory sparse coding (DSP) is a variation of SP that incorporates the class label information of the training samples. Experimental results on face recognition show that sparse features hold an advantage over compact features. DSP achieves the best results, owing to the clustering property of its features.
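
As an illustration of what "compact features" means for PCA, the following is a minimal sketch and not the implementation used in this paper. It assumes each image is flattened into a row of a matrix `X`; the function name `pca_features`, the choice of `k`, and the synthetic random data are hypothetical stand-ins for a real image set.

```python
import numpy as np

def pca_features(X, k):
    """Project the rows of X onto the k principal axes with the largest eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    basis = eigvecs[:, order]                    # columns are the basis functions
    features = Xc @ basis                        # compact features: coefficients on the basis
    return features, basis, eigvals[order]

# Synthetic stand-in for 200 flattened images of 16 pixels each (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
features, basis, top_eigvals = pca_features(X, k=4)
```

Each row of `features` is the compact representation of one sample. The variance of each feature column equals the corresponding eigenvalue, which is why keeping the largest eigenvalues retains most of the data variance in few coefficients.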