We describe the development, experiments, collected data, and results of research designed to establish temporal and spatial image collection guidelines for tracking humans. More specifically, we seek a quantitative understanding of the relationship between human observer performance and the spatial and temporal resolution of the imagery. We measure performance, defined as the observer's ability to accurately determine the destination of a moving human target, as a function of the video frame rate (frames per second) and the imager spatial resolution. Our research is restricted to data and imagery representative of typical modern, low- to mid-altitude, wide field-of-view persistent surveillance platforms. The ability of an unaided human observer to track a human target was determined through carefully designed perception experiments. In these experiments, observers were presented with simulated imagery from the U.S. Army Night Vision and Electronic Sensors Directorate's EOSim urban terrain simulator. The details of the simulated targets and backgrounds, the design of the experiments, and the associated results are included in this treatment.