We propose a coarse-to-fine method for automatically detecting facial landmarks under both rigid and nonrigid facial deformations. Given an input 3-D face, we first roughly locate the nose using 3-D local shape descriptors and then detect the nose tip using tip-specific features, e.g., the symmetry of the face. Second, we localize the eye and mouth regions according to the distribution of human facial features and use a convolutional neural network, trained by minimizing a combined loss, to provide candidates for the corners of the eyes and mouth. Finally, to detect the eye and mouth landmarks accurately, we iteratively update the candidates by maximizing the similarity between each candidate and the corresponding landmark, based on the features of the candidate and its neighbors. We evaluate the proposed method on the Bosphorus and CASIA datasets. Experiments show that, compared with state-of-the-art methods, our method detects the corners of the eyes and mouth more accurately and robustly.
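
The final refinement step can be pictured as a local search over the face mesh. The following is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the cosine similarity measure, the per-vertex descriptor array `features`, the reference descriptor `template`, the adjacency map `neighbors`, and the greedy hill-climbing rule are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def refine_candidate(start, features, neighbors, template, max_iters=100):
    """Greedily move a landmark candidate toward higher similarity.

    start:     index of the initial candidate vertex (e.g., a CNN proposal)
    features:  (n_vertices, d) array of per-vertex descriptors (hypothetical)
    neighbors: dict mapping a vertex index to its adjacent vertex indices
    template:  (d,) reference descriptor for the target landmark (hypothetical)
    """
    current = start
    best = cosine_similarity(features[current], template)
    for _ in range(max_iters):
        # Score every neighbor of the current candidate against the template.
        scored = [
            (cosine_similarity(features[j], template), j)
            for j in neighbors[current]
        ]
        if not scored:
            break
        score, j = max(scored)
        if score <= best:
            break  # no neighbor improves the similarity: local maximum reached
        current, best = j, score
    return current, best
```

In this sketch the candidate stops at the first local maximum of the similarity, so the quality of the initial candidate matters; this is one reason a coarse stage that supplies good starting points is useful.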