Estimating the number of vehicles in traffic videos is an important and challenging task in traffic surveillance, especially with a high level of occlusions between vehicles, e.g.,in crowded urban area with people and/or motorbikes. In such the condition, the problem of separating individual vehicles from foreground silhouettes often requires complicated computation . Thus, the counting problem is gradually shifted into drawing statistical inferences of target objects density from their shape , local features , etc. Those researches indicate a correlation between local features and the number of target objects. However, they are inadequate to construct an accurate model for vehicles density estimation. In this paper, we present a reliable method that is robust to illumination changes and partial affine transformations. It can achieve high accuracy in case of occlusions. Firstly, local features are extracted from images of the scene using Speed-Up Robust Features (SURF) method. For each image, a global feature vector is computed using a Bag-of-Words model which is constructed from the local features above. Finally, a mapping between the extracted global feature vectors and their labels (the number of motorbikes) is learned. That mapping provides us a strong prediction model for estimating the number of motorbikes in new images. The experimental results show that our proposed method can achieve a better accuracy in comparison to others.