In this paper, we concentrate on video watermarking for forensics applications and consider the temporal synchronization problem, which has been overlooked in the literature so far. To address it,
we propose a system that provides temporal synchronization in video
watermarking by using side information at the receiver. Short, perceptually robust representations (also known as robust hash values) of randomly selected frames from the watermarked video regions are derived at the encoder and transmitted to the decoder. Synchronization is then achieved by computing perceptually representative information for all frames of the received video
at the receiver and finding the "best matching region" by solving
a combinatorial optimization problem efficiently with dynamic programming techniques. A suitably chosen "robust image hash" function is used to derive the necessary representative information of the video frames; the resulting hash values are short in length, computable in real time, and, with high probability, similar (resp. different) for perceptually similar (resp. different) video frames. We experimentally illustrate the effectiveness of our method against several attacks, including frame-wise geometric attacks and temporal de-synchronization attacks such as random temporal interpolation, scene editing, cutting, and swapping.
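
The matching step described above can be illustrated with a minimal sketch. It is not the formulation used in this paper: it assumes that each robust hash is a fixed-length bit string stored as a Python integer, that dissimilarity is measured by Hamming distance, that the side-information frames keep their temporal order in the received video, and that the objective is the sum of per-frame distances. The names `hamming` and `align_side_hashes` are hypothetical.

```python
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes (assumed distance measure)."""
    return bin(a ^ b).count("1")


def align_side_hashes(
    side_hashes: Sequence[int], received_hashes: Sequence[int]
) -> Tuple[int, List[int]]:
    """Match each side-information hash to one received frame.

    Returns (total_distance, indices), where indices is strictly increasing
    and minimizes the summed Hamming distance (an assumed objective).
    """
    k, n = len(side_hashes), len(received_hashes)
    if k == 0 or k > n:
        raise ValueError("need 1 <= len(side_hashes) <= len(received_hashes)")

    inf = float("inf")
    # best[i][j]: minimal cost of matching side hashes 0..i with hash i at frame j.
    best = [[inf] * n for _ in range(k)]
    # back[i][j]: frame index chosen for side hash i-1 on that optimal path.
    back = [[-1] * n for _ in range(k)]

    for j in range(n):
        best[0][j] = hamming(side_hashes[0], received_hashes[j])

    for i in range(1, k):
        run_min, run_arg = inf, -1
        for j in range(i, n):
            # Running minimum over best[i-1][0..j-1] keeps the DP at O(k * n).
            if best[i - 1][j - 1] < run_min:
                run_min, run_arg = best[i - 1][j - 1], j - 1
            best[i][j] = run_min + hamming(side_hashes[i], received_hashes[j])
            back[i][j] = run_arg

    # Pick the best end position, then walk the back-pointers.
    end = min(range(k - 1, n), key=lambda j: best[k - 1][j])
    indices = [end]
    for i in range(k - 1, 0, -1):
        indices.append(back[i][indices[-1]])
    indices.reverse()
    return int(best[k - 1][end]), indices
```

Under these assumptions, the returned indices estimate where the selected frames lie in the received video. Inserted or dropped frames between them change only the gaps between matched indices, not the matching itself.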