## 1.

## Introduction

Temporal alignment of video sequences is important in applications such as superresolution imaging,^{1} robust multiview surveillance,^{2} and mosaicking. Some applications require aligning video sequences of two similar scenes, in which analogous motions follow different trajectories through the video sequence. Figure 1 illustrates two similar motions occurring in related 3-D planar scenes with respect to time. Camera 1 views 3-D scene $X(X_1, Y_1, Z_1, t_1)$ in view 1 ($\nu_1$) and acquires video $I_1(x_1, y_1, t_1)$. Camera 2 views another 3-D scene $X(X_2, Y_2, Z_2, t_1)$ in view 2 ($\nu_2$) and acquires video $I_2(x_2, y_2, t_2)$. Note that the motions in these two scenes are similar but have a dynamic time shift. The homography matrix **H** is typically used to represent the spatial relationship between these two views.

A typical schematic for temporal alignment is shown in Fig. 2. To correlate the two videos and represent the motions, features are extracted and tracked separately in each video. Robust view-invariant trackers are used to generate the feature trajectories $\mathcal{F}_1(x_1, y_1, t_1)$ and $\mathcal{F}_2(x_2, y_2, t_2)$ from videos $I_1$ and $I_2$, respectively.

Existing techniques differ in how they compute the temporal alignment. Giese and Poggio^{3} computed the temporal alignment of activities of different people using dynamic time warping (DTW) between the feature trajectories, but limited their technique to a fixed viewpoint. Rao et al.^{4} used a rank-constraint-based (RCB) technique in DTW to calculate the synchronization. Such techniques consider only unidirectional alignment,^{3, 4} i.e., they project the trajectory from one scene to the other, which designates one view as the reference for computing the temporal alignment. This introduces a bias toward the reference trajectory: noise and imperfections in the reference trajectory lead to erroneous alignment. To minimize the bias, the alignment should therefore be computed in a symmetric way. Singh et al.^{5} formulated a symmetric transfer error (STE) as a functional of a regularized temporal warp. The technique determines the time warp that has the smallest STE and then chooses one of the symmetric warps as the final temporal alignment. The STE technique provides better results than unidirectional alignment schemes. However, it does not fully eliminate the reference-view bias between the two sequences, so the accuracy of the temporal alignment can be improved further.

In this work, we propose an unbiased bidirectional dynamic time warping (UBDTW) technique that can remove biasing and provide more accurate results.

## 2.

## Proposed Technique

The schematic of the proposed temporal alignment technique is shown in Fig. 3. The technique consists of three steps which are explained in the following sections.

## 2.1.

### Bidirectional Projections

Since feature trajectories represent the activities in the video sequences, we compute the projections of the feature trajectories $\mathcal{F}_1$ from scene 1 to 2 and $\mathcal{F}_2$ from scene 2 to 1 using Eq. 1 as follows:

## Eq. 1

$$\mathcal{F}_1^p(x_2', y_2', t_2) = H_{2 \to 1} \cdot \mathcal{F}_2(x_2, y_2, t_2),$$

where $H_{2 \to 1}$ is the homography from view 2 to view 1.^{6}
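As an illustration of Eq. 1, the sketch below maps each trajectory point through a 3×3 homography in homogeneous coordinates and leaves the time index untouched. The function name `project_trajectory`, the `(x, y, t)` tuple layout, and the example matrix are assumptions made for this sketch and are not taken from the original work; in practice the homography would be estimated from the two views.

```python
from typing import List, Tuple

Point = Tuple[float, float, int]   # (x, y, t): image position and frame index
Homography = List[List[float]]     # 3x3 matrix stored row by row


def project_trajectory(H: Homography, traj: List[Point]) -> List[Point]:
    """Apply H to every (x, y) in homogeneous coordinates; t is unchanged."""
    projected = []
    for x, y, t in traj:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        s = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(s) < 1e-12:
            raise ValueError("point maps to infinity under H")
        projected.append((u / s, v / s, t))
    return projected


# Hypothetical homography (assumption): scale by 2, then shift by (1, -1).
H_2_to_1 = [[2.0, 0.0, 1.0],
            [0.0, 2.0, -1.0],
            [0.0, 0.0, 1.0]]
F2 = [(0.0, 0.0, 0), (1.0, 0.5, 1), (2.0, 1.5, 2)]
F2_in_view1 = project_trajectory(H_2_to_1, F2)
print(F2_in_view1)
```

With this made-up matrix, the three points map to (1, -1), (3, 0), and (5, 2), and the frame indices 0, 1, 2 are preserved.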

## 2.2.

### Computation of Symmetric Warps

Once we obtain two pairs of feature trajectories, $(\mathcal{F}_1, \mathcal{F}_2^p)$ and $(\mathcal{F}_1^p, \mathcal{F}_2)$, we compute the symmetric warps $\mathcal{W}_{1,2p}$ and $\mathcal{W}_{1p,2}$ using regularized DTW. We construct the warp $\mathcal{W}$ as follows:

## Eq. 2

$$\mathcal{W} = w_1, w_2, \ldots, w_L, \qquad \max(\mathcal{L}_1, \mathcal{L}_2) \le L < \mathcal{L}_1 + \mathcal{L}_2,$$

where $\mathcal{L}_1$ and $\mathcal{L}_2$ are the lengths of the two trajectories. The $L$'th element of the warp $\mathcal{W}$ is $w_L = (i, j)$, where $i$ and $j$ are the time indices of $\mathcal{F}_1$ and $\mathcal{F}_2$, respectively. The optimal warp is the minimum distance warp, where the distance of a warp is defined as follows:

## Eq. 3

$$\mathrm{dist}(\mathcal{W}) = \sum_{k=1}^{L} \mathrm{dist}[\mathcal{F}(i_k), \mathcal{F}^p(j_k)],$$

where $\mathrm{dist}[\mathcal{F}(i_k), \mathcal{F}^p(j_k)]$ is the distance between the two trajectory points indexed by ($i$, $j$) in the $k$'th element of the warp. We propose a regularized distance metric function as follows:

## Eq. 4

$$\mathrm{dist}[\mathcal{F}(i), \mathcal{F}^p(j)] = \|\mathcal{F}(i) - \mathcal{F}^p(j)\|^2 + w\,\mathrm{reg},$$

## Eq. 5

$$\mathrm{reg} = \|\partial \mathcal{F}(i) - \partial \mathcal{F}^p(j)\|^2 + \|\partial^2 \mathcal{F}(i) - \partial^2 \mathcal{F}^p(j)\|^2,$$

where $w$ is the weight (normally, $w$ = 25).
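Eqs. 4 and 5 can be sketched as follows. The text does not state how the derivatives $\partial\mathcal{F}$ and $\partial^2\mathcal{F}$ are computed, so this example assumes central finite differences (one-sided at the trajectory ends) on 2-D points. The helper names are hypothetical, and the two short trajectories are made up for illustration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def derivative(traj: Sequence[Vec]) -> List[Vec]:
    """Finite-difference derivative (assumption: central, one-sided at ends)."""
    n = len(traj)
    out = []
    for k in range(n):
        a, b = max(k - 1, 0), min(k + 1, n - 1)
        span = max(b - a, 1)  # avoids division by zero for a single point
        out.append(((traj[b][0] - traj[a][0]) / span,
                    (traj[b][1] - traj[a][1]) / span))
    return out


def sq_norm(p: Vec, q: Vec) -> float:
    """Squared Euclidean distance between two 2-D vectors."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def regularized_dist_matrix(F: Sequence[Vec], Fp: Sequence[Vec],
                            w: float = 25.0) -> List[List[float]]:
    """dist[i][j] = ||F(i) - Fp(j)||^2 + w * reg, following Eqs. 4 and 5."""
    dF, dFp = derivative(F), derivative(Fp)
    d2F, d2Fp = derivative(dF), derivative(dFp)
    return [[sq_norm(F[i], Fp[j])
             + w * (sq_norm(dF[i], dFp[j]) + sq_norm(d2F[i], d2Fp[j]))
             for j in range(len(Fp))]
            for i in range(len(F))]


# Toy data (assumption): the second trajectory moves twice as fast.
F = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Fp = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
dist = regularized_dist_matrix(F, Fp)            # w = 25
dist_plain = regularized_dist_matrix(F, Fp, 0.0)  # positions only
print(dist[0][0], dist_plain[0][0])
```

In this toy case the two start points coincide, so the unregularized distance is 0, while the regularized distance is 25 because the velocities differ by one unit per frame.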

To find the optimal warp, an accumulated distance matrix is created. The value of the element in the accumulated distance matrix is:

## Eq. 6

$$\boldsymbol{\mathcal{D}}(i, j) = \mathrm{dist}[\mathcal{F}(i), \mathcal{F}^p(j)] + \min(\phi),$$

## Eq. 7

$$\phi = [\boldsymbol{\mathcal{D}}(i-1, j),\ \boldsymbol{\mathcal{D}}(i-1, j-1),\ \boldsymbol{\mathcal{D}}(i, j-1)].$$

## 2.3.

### Optimal Warp Calculation
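As background for this step, the accumulated distance matrix of Eqs. 6 and 7 and the warp recovered from it can be sketched as below. The boundary handling (cells in the first row or column have fewer predecessors), the tie-breaking during backtracking, and the function names are assumptions of this sketch; the toy cost matrix is made up for illustration.

```python
from typing import List, Tuple


def accumulate(cost: List[List[float]]) -> List[List[float]]:
    """D(i, j) = cost(i, j) + min over the available predecessors (Eqs. 6-7)."""
    n, m = len(cost), len(cost[0])
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            candidates = []
            if i > 0:
                candidates.append(D[i - 1][j])
            if i > 0 and j > 0:
                candidates.append(D[i - 1][j - 1])
            if j > 0:
                candidates.append(D[i][j - 1])
            # Assumption: the first cell has no predecessor and adds nothing.
            D[i][j] = cost[i][j] + (min(candidates) if candidates else 0.0)
    return D


def backtrack(D: List[List[float]]) -> List[Tuple[int, int]]:
    """Walk from the last cell to (0, 0) along the cheapest predecessors."""
    i, j = len(D) - 1, len(D[0]) - 1
    warp = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((D[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((D[i][j - 1], i, j - 1))
        # Tuple comparison breaks ties toward the diagonal step (assumption).
        _, i, j = min(steps)
        warp.append((i, j))
    warp.reverse()
    return warp


# Toy 1-D example (assumption): b repeats the first sample of a.
a = [0.0, 1.0, 2.0]
b = [0.0, 0.0, 1.0, 2.0]
cost = [[(x - y) ** 2 for y in b] for x in a]
D = accumulate(cost)
warp = backtrack(D)
print(D[-1][-1], warp)
```

For this toy input the warp has four elements, which satisfies the length bound of Eq. 2 for sequences of length 3 and 4.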

Note that we calculated the symmetric warps $\mathcal{W}_{1,2p}$ and $\mathcal{W}_{1p,2}$ and the corresponding distance matrices $\boldsymbol{\mathcal{D}}_{1,2p}$ and $\boldsymbol{\mathcal{D}}_{1p,2}$ in the last step. However, the warps are still biased ($\mathcal{W}_{1,2p}$ is biased toward $\mathcal{F}_1$, while $\mathcal{W}_{1p,2}$ is biased toward $\mathcal{F}_2$). To minimize the effect of this bias on the alignment, we first combine $\boldsymbol{\mathcal{D}}_{1,2p}$ and $\boldsymbol{\mathcal{D}}_{1p,2}$ to form a new distance matrix $\boldsymbol{\mathcal{D}}_c$ as follows:

## Eq. 8

$$\boldsymbol{\mathcal{D}}_c = \boldsymbol{\mathcal{D}}_{1,2p} + \boldsymbol{\mathcal{D}}_{1p,2}.$$

The combined warp $\mathcal{W}_c$ is then constrained as follows:

## Eq. 9

$$\min(\mathcal{W}_{1,2}, \mathcal{W}_{2,1}) \le \mathcal{W}_c \le \max(\mathcal{W}_{1,2}, \mathcal{W}_{2,1}).$$

The final warp is the constrained warp with the minimum distance:

## Eq. 10

$$\mathcal{W}_{\mathrm{copt}} = \arg\min_{\mathcal{W}_c} [\mathrm{dist}(\mathcal{W}_c)].$$

## 3.

## Experiments and Comparative Analysis

We evaluated our technique using both synthetic and real videos and compared it with RCB^{4} and STE techniques.^{5}

## 3.1.

### Synthetic Data Evaluation

In the synthetic data evaluation, we generate planar trajectories 100 frames long using a pseudorandom number generator. These trajectories are then projected onto two image planes using user-defined camera projection matrices. A 60-frame-long time warp is then applied to a section of one of the trajectory projections. The temporal alignment techniques are then applied to the synthetic trajectories. The test was repeated on 100 different synthetic trajectories and on 100 similar trajectories with added noise. The noise was normally distributed with zero mean and variance $\sigma^2 = 0.1$. The mean absolute error between the warp obtained by each technique and the ground truth is used as the evaluation metric. The results are shown in Table 1. The percentage in parentheses is the improvement obtained by an alignment technique with respect to the original error. Figure 5 shows a synchronization result for a synthetic trajectory, comparing the RCB, STE, and proposed techniques. The proposed technique outperforms the other techniques, with the lowest error both without noise (1.98) and with noise (2.97).
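The noise model and the error metric described above can be sketched as follows. The text does not specify how two warps are compared, so this example assumes that each warp is first reduced to one matched frame per frame of the first sequence (averaging when a frame is matched more than once). The function names and the toy warps are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def add_noise(traj: Sequence[Tuple[float, float]], variance: float = 0.1,
              seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Add zero-mean Gaussian noise with the given variance to each coordinate."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in traj]


def warp_to_map(warp: Sequence[Tuple[int, int]], n: int) -> List[float]:
    """Average matched index j for every frame i (assumes each i appears)."""
    sums, counts = [0.0] * n, [0] * n
    for i, j in warp:
        sums[i] += j
        counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def mean_abs_error(warp: Sequence[Tuple[int, int]],
                   truth: Sequence[Tuple[int, int]], n: int) -> float:
    """Mean absolute difference between two warps, frame by frame."""
    est, ref = warp_to_map(warp, n), warp_to_map(truth, n)
    return sum(abs(p - q) for p, q in zip(est, ref)) / n


# Toy warps (assumption): the estimate lags the ground truth in the middle.
truth = [(0, 0), (1, 1), (2, 2), (3, 3)]
estimate = [(0, 0), (1, 0), (2, 1), (3, 2), (3, 3)]
mae = mean_abs_error(estimate, truth, n=4)
print(mae)

clean = [(float(k), 0.0) for k in range(100)]  # 100-frame placeholder path
noisy = add_noise(clean, variance=0.1, seed=0)
```

Here the per-frame errors are 0, 1, 1, and 0.5, so the mean absolute error of the toy estimate is 0.625 frames.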

Table 1. Performance improvement of the proposed technique over existing techniques

| Data | Original | RCB | STE | Proposed |
|---|---|---|---|---|
| SynData | 25.50 | 7.33 (71.25%) | 3.11 (87.80%) | 1.98 (92.24%) |
| SynData with noise | 25.50 | 17.18 (32.63%) | 3.45 (86.47%) | 2.97 (88.35%) |
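The percentages in parentheses follow from the errors in the same row. A minimal sketch of that arithmetic, using the noise-free values from the table (the function name is hypothetical):

```python
def improvement(original: float, aligned: float) -> float:
    """Relative error reduction, in percent, with respect to the original error."""
    return 100.0 * (original - aligned) / original


# Errors copied from the noise-free row of the table.
for name, err in [("RCB", 7.33), ("STE", 3.11), ("Proposed", 1.98)]:
    print(f"{name}: {improvement(25.50, err):.2f}%")
```

This reproduces the reported 71.25%, 87.80%, and 92.24%; the same formula applied to the noisy row gives 32.63%, 86.47%, and 88.35%.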

## 3.2.

### Real Data Evaluation

For the real video test, we use two videos (54 and 81 frames long, respectively) capturing the activity of lifting a coffee cup by different people. We tracked the coffee cup, whose motion represents the activity in each video, to generate the feature trajectories. Since ground-truth information is not available, we used visual judgment to assess whether the alignment was correct.

Figure 6 shows representative aligned frames at the 4th, 8th, and 12th elements of the alignment warp computed using the STE and the proposed technique. If the coffee cup is at the same position in the two frames, we mark the pair as “matched”; otherwise, as “mismatched.” In the results obtained using the STE technique, only one pair of frames is matched, indicating that this technique can produce erroneous alignments. The results of the proposed technique are shown in the last two rows, and all of its alignments are correct.

## 4.

## Conclusions

An efficient technique is proposed to synchronize video sequences captured from planar scenes and related by varying temporal offsets. The proposed UBDTW technique is able to remove the bias and leads to accurate temporal alignment. In the future, we would like to extend this work to more general scenes.

## References

1. “Spatio-temporal alignment of sequences,” IEEE Trans. Pattern Anal. Mach. Intell. 24(11), 1409–1424 (2002). https://doi.org/10.1109/TPAMI.2002.1046148

2. “Monitoring activities from multiple video streams: establishing a common coordinate frame,” IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 758–767 (2000). https://doi.org/10.1109/34.868678

3. “Morphable models for the analysis and synthesis of complex motion patterns,” Int. J. Comput. Vis. 38(1), 59–73 (2000). https://doi.org/10.1023/A:1008118801668

4. “View-invariant alignment and matching of video sequences,” Proc. ICCV03, 939–945 (2003).

5. “Optimization of symmetric transfer error for sub-frame video synchronization,” Proc. ECCV03, 554–567 (2008).