Face recognition in surveillance is an active topic in computer vision because of the strong demand for public security, and it remains challenging owing to large variations in camera viewpoint and illumination. In surveillance, tracking makes image sets the most natural form of input. Recent advances in set-based matching also show great potential for exploring the feature space in face recognition by making use of multiple samples of each subject. In this paper, we propose a novel method that exploits salient facial features (eyes, nose, and mouth) in set-based matching. To represent image sets, we adopt the affine hull model, which can account for unseen appearances as affine combinations of the sample images. In our proposal, a robust part detector is first used to find four salient parts in each face image: the two eyes, the nose, and the mouth. For each part, we construct an affine hull model from the local binary pattern histograms of multiple samples of that part. We also construct an affine hull model for the whole face region. We then compute the closest distance between corresponding affine hull models to measure the similarity between parts or face regions, and we introduce a weighting scheme that combines the five distances (four parts and the whole face region) into the final distance between two subjects. In the recognition phase, a nearest neighbor classifier is used. Experiments on the public ChokePoint dataset and our own dataset demonstrate the superior performance of our method.
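The matching step described above can be illustrated with a minimal sketch. It assumes that each part (or the whole face) of a subject is given as a matrix of feature vectors, one row per sample, standing in for the local binary pattern histograms; feature extraction and part detection are not shown. The function names, the region keys, and the weights are hypothetical, and the plain least-squares distance used here is only one common way to compute the closest distance between two affine hulls, not necessarily the exact formulation or weighting scheme of the proposed method.

```python
import numpy as np

def affine_hull(samples, rel_tol=1e-10):
    """Fit an affine hull to a set of feature vectors (rows of `samples`).

    Returns (mean, basis), where the columns of `basis` are orthonormal
    directions spanning the centered samples.
    """
    X = np.asarray(samples, dtype=float)
    mu = X.mean(axis=0)
    # SVD of the centered samples; singular values are sorted descending.
    _, s, vt = np.linalg.svd(X - mu, full_matrices=False)
    keep = s > rel_tol * s[0]  # drop numerically null directions
    return mu, vt[keep].T

def hull_distance(hull_a, hull_b):
    """Closest Euclidean distance between two affine hulls."""
    mu_a, U_a = hull_a
    mu_b, U_b = hull_b
    # Minimise ||(mu_a + U_a v_a) - (mu_b + U_b v_b)|| over v_a, v_b.
    A = np.hstack([U_a, -U_b])
    b = mu_b - mu_a
    if A.shape[1] == 0:  # both hulls are single points
        return float(np.linalg.norm(b))
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.linalg.norm(A @ coeffs - b))

def combined_distance(probe, gallery_subject, weights):
    """Weighted sum of per-region hull distances (weights are assumed)."""
    return sum(w * hull_distance(probe[region], gallery_subject[region])
               for region, w in weights.items())

def nearest_subject(probe, gallery, weights):
    """Nearest neighbor rule: return the gallery label with the smallest distance."""
    return min(gallery,
               key=lambda label: combined_distance(probe, gallery[label], weights))
```

Here `probe` and each entry of `gallery` would be dictionaries mapping a region name (for example `"left_eye"`, `"right_eye"`, `"nose"`, `"mouth"`, `"face"`; hypothetical keys) to the hull returned by `affine_hull` for that region, and `weights` maps the same keys to non-negative weights.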