Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning and route planning. These applications require fast and accurate image understanding which is time consuming for humans, due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI imagery analysis, namely vehicle classiﬁcation. That is, deciding whether a certain image patch contains a vehicle or not. We collect a set of positive and negative sample image patches, for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images. The ﬁrst set is generated from positive images with some location shift. The second set of negative patches is generated from randomly sampled patches. We also discard those patches if a vehicle accidentally locates at the center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolution network for classifying these patches. The classiﬁer is based on a pre-trained AlexNet Model in the Caﬀe library, with an adapted loss function for vehicle classiﬁcation. The performance of our classiﬁer is compared to several traditional image classiﬁer methods using Support Vector Machine (SVM) and Histogram of Oriented Gradient (HOG) features. While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classiﬁer reaches 97.9%.