In this work, we address the need of human analysts to automatically summarize the content of large swaths of overhead imagery. We present an approach based on deep neural networks that provides detection and segmentation information to enable fine-grained description of scene content for human consumption. Four perception systems are run on blocks of large-scale satellite imagery: (1) semantic segmentation of roads, buildings, and vegetation; (2) zone segmentation to identify commercial, industrial, residential, and airport zones; (3) classification of objects such as helipads, silos, and water towers; and (4) object detection to find vehicles. Results are filtered according to the user's zoom level within the swath and then summarized as textual bullets and statistics. Our framework tiles the image swaths into blocks at a resolution of approximately 30 cm for each perception system. For semantic segmentation, overlapping imagery is processed to avoid edge artifacts and improve segmentation quality: each pixel visible from multiple chips receives a vote for its category label from every chip that contains it. Our approach to zone segmentation is based on classification models that vote for each chip belonging to a particular zone type; regions surrounded by chips classified as a given category are assigned a higher score. We also provide an overview of our experience using OpenStreetMap (OSM) for pixel-wise annotation (for semantic segmentation), for image-level labels (for classification), and for end-to-end captioning methods (image to text). These capabilities are envisioned to aid the human analyst through an interactive user interface in which scene content is automatically summarized and updated as the user pans and zooms within the imagery.
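The per-pixel voting step for fusing overlapping chip predictions can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, the `(row, col, labels)` chip layout, and the one-hot vote accumulation are our assumptions.

```python
import numpy as np

def fuse_chip_predictions(chip_preds, scene_shape, num_classes):
    """Fuse per-chip class-label maps into one scene-level label map.

    chip_preds: list of (row, col, labels) tuples, where `labels` is a 2-D
    integer array of predicted class indices for a chip whose top-left
    corner sits at (row, col) in the scene. Overlapping chips each cast
    one vote per pixel; the majority label wins at each pixel.
    """
    h, w = scene_shape
    votes = np.zeros((h, w, num_classes), dtype=np.int32)
    for row, col, labels in chip_preds:
        ch, cw = labels.shape
        # One-hot the chip's labels and add them as votes for its region.
        votes[row:row + ch, col:col + cw] += np.eye(num_classes, dtype=np.int32)[labels]
    # Majority vote per pixel across all chips that covered it.
    return votes.argmax(axis=-1)

# Illustrative usage: a 4x6 scene covered by three 4x4 chips, two of which
# overlap and agree on class 1 in the shared columns.
chips = [
    (0, 0, np.zeros((4, 4), dtype=int)),  # votes class 0 over cols 0-3
    (0, 2, np.ones((4, 4), dtype=int)),   # votes class 1 over cols 2-5
    (0, 2, np.ones((4, 4), dtype=int)),   # second class-1 vote, cols 2-5
]
fused = fuse_chip_predictions(chips, (4, 6), num_classes=2)
# In the overlap (cols 2-3), class 1 wins 2 votes to 1.
```

A more faithful variant would accumulate softmax probabilities rather than hard labels, but the majority-vote form above captures the idea of resolving edge artifacts by letting every chip that sees a pixel weigh in.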