Traditional motion-based trackers often fail in maritime environments due to a lack of stable image features for video stabilization. In this paper, we describe a computationally efficient approach that automatically detects, tracks, and classifies objects within aerial full-motion video (FMV) sequences in the maritime domain. A multi-layered saliency detector first removes image regions likely to belong to background categories (e.g., calm water), then progressively prunes out distractor categories such as wake, debris, and reflection. This pruning stage combines features computed at each individual pixel with 2D descriptors formulated around the outputs of prior stages grouped into connected components. Additional false-positive reduction is performed by aggregating detector outputs across multiple frames, by forming object tracks from these detections, and, lastly, by classifying the resultant tracks using machine learning techniques. As a by-product, our system also produces image descriptors specific to each individual object, which are useful in later pipeline elements for appearance-based indexing and matching.
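The staged structure described above — per-pixel background suppression, grouping into connected components with distractor pruning, then cross-frame aggregation — can be illustrated with a minimal sketch. This is not the paper's implementation: the median-deviation saliency score, the area-based pruning, the nearest-center persistence check, and all function names and parameters here are simplified stand-ins of our own.

```python
import numpy as np

def salient_mask(frame, bg_thresh=0.5):
    # Stage 1 (stand-in): suppress background pixels such as calm water.
    # Deviation from the frame's median intensity substitutes for the
    # paper's multi-layered saliency features.
    return np.abs(frame - np.median(frame)) > bg_thresh

def components(mask, min_area=4):
    # Stage 2 (stand-in): group surviving pixels into 4-connected
    # components and drop small ones, mimicking pruning of distractors
    # such as wake fragments or debris. Returns (x0, y0, x1, y1) boxes.
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                stack, pix = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:                       # flood fill
                    y, x = stack.pop()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(pix) >= min_area:
                    ys = [p[0] for p in pix]
                    xs = [p[1] for p in pix]
                    boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def centers(boxes):
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]

def persistent_detections(per_frame_boxes, max_dist=3.0, min_hits=2):
    # Stage 3 (stand-in): keep a detection only if a nearby detection
    # (within max_dist pixels) appears in at least min_hits frames --
    # a crude form of track-based false-positive reduction.
    kept = []
    for c in centers(per_frame_boxes[0]):
        hits = sum(
            any(np.hypot(c[0] - o[0], c[1] - o[1]) <= max_dist
                for o in centers(boxes))
            for boxes in per_frame_boxes)
        if hits >= min_hits:
            kept.append(c)
    return kept
```

As a usage example, a bright 4x4 blob present in two consecutive frames survives all three stages, while a 2x2 clutter patch visible in only one frame is rejected by the persistence check.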