A video-stream associated with an Unmanned System or Full Motion Video can support the extraction of ground
coordinates of a target of interest. The sensor metadata associated with the video-stream includes a time series of
estimates of sensor position and attitude, required for down-stream single-frame or multi-frame ground point extraction,
such as stereo extraction using two frames in the video-stream that are separated in both time and imaging geometry.
The sensor metadata may also include a corresponding time history of sensor position and attitude estimate accuracy
(error covariance). This is required for optimal down-stream target extraction as well as for reliable
predictions of the extraction accuracy. However, for multi-frame extraction, this is necessary but not sufficient. The
temporal correlation of estimate errors (error cross-covariance) between an arbitrary pair of video frames is also
required. When the estimates of sensor position and attitude are from a Kalman filter, as is typically the case, the
corresponding error covariances are automatically computed and available. However, the cross-covariances are not.
This paper presents an efficient method for their exact representation in the metadata using additional, easily computed,
data from the Kalman filter. The paper also presents an optimal weighted least squares extraction algorithm that
correctly accounts for the temporal correlation, given the additional metadata. Simulation-based examples are presented
that show the importance of correctly accounting for temporal correlation in multi-frame extraction algorithms.
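
As a rough illustration of why the cross-covariance matters, the sketch below uses a scalar Kalman filter and a toy two-frame extraction. It is not the representation or the algorithm developed in the paper. It relies on a standard Kalman filter property: the a posteriori error obeys e_k = (1 - K_k) f e_(k-1) + noise, so the cross-covariance between two frames is a product of per-step factors times the earlier covariance. The scalar model, the observation model y = t + e (target plus sensor error), and the names kalman_gains, cross_cov, and wls_two_frames are all assumptions made for illustration.

```python
# Sketch under stated assumptions (not the paper's method):
#   state model   x_k = f * x_(k-1) + w,  Var(w) = q
#   measurement   z_k = x_k + v,          Var(v) = r
#   each frame observes the target as y = t + (sensor error at that frame)

def kalman_gains(f, q, r, p0, n):
    """Return posterior variances P[k] and per-step factors A[k] = (1 - K_k) * f."""
    P, A = [p0], [None]                 # index 0 is the initial estimate
    for _ in range(n):
        p_pred = f * P[-1] * f + q      # time update
        k = p_pred / (p_pred + r)       # Kalman gain
        P.append((1.0 - k) * p_pred)    # measurement update
        A.append((1.0 - k) * f)         # extra quantity kept at each step
    return P, A

def cross_cov(P, A, i, j):
    """Cov(e_j, e_i) for j >= i: product of the stored factors times P[i]."""
    c = P[i]
    for k in range(i + 1, j + 1):
        c *= A[k]
    return c

def wls_two_frames(y_i, y_j, p_i, p_j, p_ij):
    """Weighted least squares estimate of t and its predicted variance,
    given Cov(e) = [[p_i, p_ij], [p_ij, p_j]]."""
    det = p_i * p_j - p_ij * p_ij
    w_i = (p_j - p_ij) / det            # row sums of the inverse covariance
    w_j = (p_i - p_ij) / det
    info = w_i + w_j                    # 1^T C^-1 1
    return (w_i * y_i + w_j * y_j) / info, 1.0 / info

P, A = kalman_gains(f=1.0, q=0.01, r=1.0, p0=4.0, n=20)
i, j = 10, 15
p_ij = cross_cov(P, A, i, j)

# Predicted extraction variance with and without the cross-covariance term.
_, var_correct = wls_two_frames(0.0, 0.0, P[i], P[j], p_ij)
_, var_naive = wls_two_frames(0.0, 0.0, P[i], P[j], 0.0)
print(p_ij, var_correct, var_naive)
```

In this toy setting the errors at the two frames are positively correlated, so setting the cross-covariance to zero yields a predicted variance smaller than the one obtained with the full joint covariance. This is the kind of optimistic accuracy prediction that accounting for temporal correlation is meant to avoid.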