Currently, considerable efforts have been devoted to devise image representation. However, handcrafted methods need strong domain knowledge and show low generalization ability, and conventional feature learning methods require enormous training data and rich parameters tuning experience. A lightened feature learner is presented to solve these problems with application to face recognition, which shares similar topology architecture as a convolutional neural network. Our model is divided into three components: cascaded convolution filters bank learning layer, nonlinear processing layer, and feature pooling layer. Specifically, in the filters learning layer, we use K-means to learn convolution filters. Features are extracted via convoluting images with the learned filters. Afterward, in the nonlinear processing layer, hyperbolic tangent is employed to capture the nonlinear feature. In the feature pooling layer, to remove the redundancy information and incorporate the spatial layout, we exploit multilevel spatial pyramid second-order pooling technique to pool the features in subregions and concatenate them together as the final representation. Extensive experiments on four representative datasets demonstrate the effectiveness and robustness of our model to various variations, yielding competitive recognition results on extended Yale B and FERET. In addition, our method achieves the best identification performance on AR and labeled faces in the wild datasets among the comparative methods.