Smile detection plays an important role in human emotion analysis and has wide applications. However, a gap remains between the performance of current smile detection algorithms and the demands of real-world applications, owing to head pose variations and environmental noise. We propose a robust smile detection framework based on convolutional neural networks (CNNs). To alleviate the influence of head pose variations and improve performance, the proposed framework customizes two feature learning layers: (1) a smile feature extraction layer, constructed via hidden factor analysis, learns head-pose-insensitive smile features; (2) a smile feature discrimination layer, constructed via marginal Fisher analysis, learns discriminative features that further enhance the separation between smile and non-smile. Both layers operate as fully connected layers and are stacked, one after the other, on a backbone CNN. Experiments on two publicly available datasets show that the proposed framework delivers promising performance (95.45% on GENKI4K and 93.62% on Labeled Faces in the Wild Attribute) and outperforms the state of the art.
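The layer arrangement described above (backbone CNN features, then the two customized fully connected layers, then a binary classifier) can be sketched in NumPy as a plain forward pass. All layer sizes, names, and weights below are illustrative assumptions; in the actual framework the two layers' weights are learned with hidden factor analysis and marginal Fisher analysis, which this sketch does not implement:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """Fully connected layer with ReLU activation."""
    return np.maximum(0.0, x @ w + b)

# Stand-in for backbone CNN output: a batch of 4 face images,
# each represented by an assumed 512-D feature vector.
backbone_feats = rng.standard_normal((4, 512))

# (1) Smile feature extraction layer: in the paper its weights are learned
#     via hidden factor analysis to suppress head-pose variation; here the
#     weights are random placeholders.
w1, b1 = rng.standard_normal((512, 256)) * 0.01, np.zeros(256)
pose_insensitive = fc(backbone_feats, w1, b1)

# (2) Smile feature discrimination layer: weights learned via marginal
#     Fisher analysis in the paper; random placeholders here.
w2, b2 = rng.standard_normal((256, 128)) * 0.01, np.zeros(128)
discriminative = fc(pose_insensitive, w2, b2)

# Final binary smile / non-smile output.
w3, b3 = rng.standard_normal((128, 1)) * 0.01, np.zeros(1)
logits = discriminative @ w3 + b3
probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: one smile probability per image

print(probs.shape)  # → (4, 1)
```

The point of the sketch is only the data flow: each customized layer is an ordinary fully connected layer at inference time, and the pose-handling behavior comes entirely from how its weights are trained.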