The goal of the DARPA Video Verification of Identity (VIVID) program is to develop an automated video-based ground targeting system for unmanned aerial vehicles that significantly improves operator combat efficiency and effectiveness while minimizing collateral damage. A key component of VIVID is the Multiple Target Tracker (MTT), whose main function is to track many ground targets simultaneously by slewing the video sensor from target to target and zooming in and out as necessary. The MTT comprises three modules: (i) a video processor that performs moving object detection, feature extraction, and site modeling; (ii) a multiple hypothesis tracker that processes the extracted video reports (e.g., positions, velocities, features) to generate tracks of currently and previously moving targets and confusers; and (iii) a sensor resource manager that schedules camera pan, tilt, and zoom to support kinematic tracking, multiple target track association, scene context modeling, confirmatory identification, and collateral damage avoidance. When complete, the VIVID MTT will enable precision tracking of the maximum number of targets permitted by sensor capabilities and target behavior. This paper describes many of the challenges faced by the developers of the VIVID MTT component and the solutions currently being implemented.