## 1. Introduction

For multicamera systems, synchronization is essential: it provides the accurate temporal correspondence needed to combine image information from multiple viewpoints.

Synchronicity can be achieved through real-time hardware synchronization^{1} or by establishing a time relationship between sequences recorded by unsynchronized video cameras.^{2} While hardware solutions ensure high-precision synchronization, they are costly and complex.

In a scenario where hardware-synchronized video sequences are not feasible, it is still possible to obtain synchronicity using image features.^{3, 4, 5} These feature-based methods depend on the existence of salient and robust features in the scene. If such features are absent from the scene, or if they are detected, tracked, or matched erroneously, the synchronization will be incorrect.

In this paper, we present a simple yet effective method, termed random on-off light source (ROOLS), to recover the temporal offset with subframe accuracy. It utilizes an auxiliary light source, such as an LED, to provide temporal cues. Compared to special-purpose hardware approaches, our method is far less complex and is inexpensive. Compared to feature-based approaches, ROOLS is more robust, since it is completely independent of scene properties.

## 2. Problem Statement

Without loss of generality, we consider the case of two video cameras. Let the time instances of the video frames taken by the $\alpha $ ’th camera be denoted by

## Eq. 1

$${\mathbf{T}}^{\alpha}=({t}_{1}^{\alpha},{t}_{2}^{\alpha},\dots ,{t}_{{N}_{\alpha}}^{\alpha}),\phantom{\rule{1em}{0ex}}\alpha \in \{1,2\},\phantom{\rule{1em}{0ex}}{N}_{\alpha}\in \mathbb{N}.$$

In a typical situation, identical video cameras with a constant frame interval $\Delta T$ are used, when

## Eq. 2

$${t}_{n}^{\alpha}={t}_{1}^{\alpha}+(n-1)\Delta T,\phantom{\rule{1em}{0ex}}n=1,2,\dots ,{N}_{\alpha}.$$
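As a quick illustration of this uniform-sampling model, the sketch below builds the frame timestamps of two identical cameras and reads off the offset between their initial frames. The frame interval and start times are invented for the example, not taken from the experiments:

```python
# Frame timestamps of two identical cameras with constant frame
# interval dT (Eq. 1); dT and the start times are illustrative values.
dT = 0.04                        # frame interval in seconds (assumed)
t1_start, t2_start = 0.013, 0.0  # first-frame times of cameras 1 and 2

t1 = [t1_start + n * dT for n in range(5)]  # camera 1 timestamps
t2 = [t2_start + n * dT for n in range(5)]  # camera 2 timestamps

offset = t1[0] - t2[0]   # temporal offset between initial frames
print(round(offset, 3))  # → 0.013
```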

Synchronizing two video sequences in such a situation is equivalent to measuring the temporal offset between their initial frames, $\mu ={t}_{1}^{1}-{t}_{1}^{2}$.

## 3. Proposed Method

### 3.1. Formulation

We propose to use a single temporally coded light source, such as an LED, as the signal to be captured by the cameras for synchronization. The light signal is essentially a time-continuous binary-valued function, denoted $f\left(t\right)\in \{0,1\}$.

It is sampled at ${\mathbf{T}}^{\alpha}$ by the $\alpha $ ’th camera, producing the time-discrete binary-valued sequence

## Eq. 5

$${\left[{f}_{\mathrm{cam}}^{\alpha}\left(n\right)\right]}_{n=1}^{{N}_{\alpha}}={\left[f\left({t}_{n}^{\alpha}\right)\right]}_{n=1}^{{N}_{\alpha}}={\left\{f[(n-1)\Delta T+{t}_{1}^{\alpha}]\right\}}_{n=1}^{{N}_{\alpha}}.$$

Let ${\mathbf{I}}^{\alpha}\left(k\right)$, $k=1,2,\dots ,N$, denote the index of the frame at which the $\alpha $ ’th camera first observes the $k$ ’th transition of the light signal. The corresponding transition times are collected as

## Eq. 8

$${\mathbf{\Phi}}^{\alpha}=[{t}_{{\mathbf{I}}^{\alpha}\left(1\right)}^{\alpha},{t}_{{\mathbf{I}}^{\alpha}\left(2\right)}^{\alpha},\dots ,{t}_{{\mathbf{I}}^{\alpha}\left(N\right)}^{\alpha}],$$

where each index ${\mathbf{I}}^{\alpha}\left(k\right)$ satisfies

## Eq. 9

$${f}_{\mathrm{cam}}^{\alpha}\left[{\mathbf{I}}^{\alpha}\left(k\right)\right]\oplus {f}_{\mathrm{cam}}^{\alpha}[{\mathbf{I}}^{\alpha}\left(k\right)-1]=1.$$

### 3.2. Achieving Sub-Frame-Interval Precision

Given the true transition times $\mathbf{\Phi}=({\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{N})$ and the observed transition times ${\mathbf{\Phi}}^{\alpha}$, consider the difference between a pair of corresponding transition events:

## Eq. 10

$${\delta}_{k}^{\alpha}={\mathbf{\Phi}}_{k}^{\alpha}-{\mathbf{\Phi}}_{k}={t}_{{\mathbf{I}}^{\alpha}\left(k\right)}^{\alpha}-{\varphi}_{k}.$$

Averaging over all $N$ corresponding pairs gives

## Eq. 11

$$\overline{{\delta}_{\alpha}}=\frac{1}{N}\sum _{k=1}^{N}[{t}_{{\mathbf{I}}^{\alpha}\left(k\right)}^{\alpha}-{\varphi}_{k}]=\overline{{t}_{{\mathbf{I}}^{\alpha}}}-\overline{\varphi},$$

By the central limit theorem,^{6} the average of a sufficiently large number of i.i.d. random variables, each with finite mean and variance, is approximately normally distributed. Hence, $\overline{{\delta}_{\alpha}}$ follows an approximately normal distribution with mean $\mu $ and variance $\Delta {T}^{2}/(12N)$. From Eq. 11 we obtain

## Eq. 12

$$\overline{{t}_{{\mathbf{I}}^{1}}^{1}}-\overline{{t}_{{\mathbf{I}}^{2}}^{2}}=\overline{{\delta}_{1}}-\overline{{\delta}_{2}}.$$

### 3.3. Transition Detection Accuracy

In real-world applications, the binary sequence is obtained by quantizing the image intensity of the light source with a threshold $\tau $ . For samples whose exposure crosses a transition event, the quantized binary value may flip, causing the detected transition to shift one frame backward. For instance, a sample taken right before a rising edge of the signal may integrate enough of the lit interval that its intensity is close to 1, so it is incorrectly quantized to 1. Equation 13 tells us that such shifts introduce additional error into the estimation.

Suppose the light source intensity is normalized and $\tau =0.5$ is chosen as the threshold, so that the probabilities of a transition flipping from 0 to 1 and from 1 to 0 are identical. Let ${x}_{i}^{\alpha}$ denote a single shift event in video sequence $\alpha $ . Its probability mass function is $p({x}_{i}^{\alpha}=-1)=p({x}_{i}^{\alpha}=0)=0.5$ , with expectation ${\mu}_{{x}_{i}^{\alpha}}=-0.5$ and variance ${\sigma}_{{x}_{i}^{\alpha}}^{2}=0.25$ . Let $\overline{{x}^{\alpha}}=(1/N){\sum}_{i=1}^{N}{x}_{i}^{\alpha}$ denote the averaged transition shift. Because the ${x}_{i}^{\alpha}$ are i.i.d. random variables, the central limit theorem again gives $\overline{{x}^{\alpha}}\sim \mathcal{N}[-0.5,1/(4N)]$ . According to Eq. 13, the extra error ${\delta}_{s}$ introduced by transition shifts in the two video sequences is $\overline{{x}^{1}}-\overline{{x}^{2}}$ . It can be shown that ${\delta}_{s}$ follows a normal distribution with mean 0 and variance $1/(2N)$ . Taking ${\delta}_{s}$ into account, the total error of the temporal offset estimation is given by Eq. 14.
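The shift statistics above are easy to check numerically. The sketch below is a Monte Carlo experiment with illustrative sample counts (not part of the paper's experiments): it draws shift events $x_i$ and verifies that the averaged shift behaves like $\mathcal{N}[-0.5,1/(4N)]$.

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 200, 20000  # transitions per sequence, Monte Carlo runs (assumed)

# Each shift event is -1 or 0 with equal probability (tau = 0.5 case).
x_bar = rng.choice([-1, 0], size=(trials, N)).mean(axis=1)

print(round(x_bar.mean(), 2))         # close to the predicted mean -0.5
print(round(x_bar.var() * 4 * N, 1))  # close to 1, i.e., variance ~ 1/(4N)
```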

### 3.4. Random Binary Sequence Design

The proposed method requires ${\delta}_{k}^{\alpha}={t}_{{\mathbf{I}}^{\alpha}\left(k\right)}^{\alpha}-{\varphi}_{k}$ to be i.i.d. random variables uniformly distributed in $[0,\Delta T]$ . To achieve this, we set the transition times to

## Eq. 15

$${\varphi}_{k}={\varphi}_{k-1}+\chi ,$$

where ${\varphi}_{k-1}$ is the time of the previous transition and $\chi $ is uniformly distributed in $[\iota \Delta T,(\iota +\kappa )\Delta T]$ , with $\iota ,\kappa \in \mathbb{N}$ . Transition times generated in this way can be proved to make ${\delta}_{k}^{\alpha}$ meet the requirement.
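To see the whole pipeline in one place, the following end-to-end simulation is an idealized sketch: point sampling with no exposure or sensor noise, and the offset, random seed, and the choices $\iota =2$ , $\kappa =6$ are picked for illustration. It generates transition times by the rule above, samples the light signal with two offset cameras, detects transitions, and recovers the offset from the averaged difference of transition frame indices.

```python
import numpy as np

rng = np.random.default_rng(0)
dT = 1.0   # frame interval; times measured in frame units
mu = 0.37  # true temporal offset between initial frames (assumed)
N = 200    # number of light transitions

# Eq. 15: phi_k = phi_{k-1} + chi, chi ~ U[2*dT, 8*dT] (iota=2, kappa=6).
phi = 5.0 + np.cumsum(rng.uniform(2 * dT, 8 * dT, size=N))

def light(t):
    """Binary light level at time t: the signal toggles at every phi_k."""
    return np.searchsorted(phi, t, side='right') % 2

# Frame timestamps of the two cameras; camera 1 starts mu later.
n_frames = int(phi[-1]) + 20
t1 = mu + np.arange(n_frames) * dT
t2 = np.arange(n_frames) * dT

def transitions(ts):
    """Eq. 9: indices of frames whose sample differs from the previous one."""
    f = light(ts)
    return np.flatnonzero(np.diff(f) != 0) + 1

I1, I2 = transitions(t1), transitions(t2)
K = min(len(I1), len(I2))

# Averaged index difference estimates the offset with subframe accuracy.
mu_hat = (I2[:K].mean() - I1[:K].mean()) * dT
print(abs(mu_hat - mu))  # typically a few hundredths of a frame interval
```

With the gap between transitions at least two frames, each camera detects every transition exactly once, so the $k$ ’th detected transitions can be paired directly in this simulation.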

### 3.5. Transition Matching

The estimation of the temporal offset requires the transition events of the two cameras to be matched; we refer to this process as transition matching. Let the segment between two consecutive transition events be denoted ${\mathbf{D}}^{\alpha}\left(k\right)$ :

## Eq. 16

$${\mathbf{D}}^{\alpha}\left(k\right)=\{{f}_{\mathrm{cam}}^{\alpha}\left[{\mathbf{I}}^{\alpha}(k+1)\right]-{f}_{\mathrm{cam}}^{\alpha}\left[{\mathbf{I}}^{\alpha}\left(k\right)\right]\}\times [{\mathbf{I}}^{\alpha}(k+1)-{\mathbf{I}}^{\alpha}\left(k\right)].$$

Corresponding runs of segments in the two sequences are then matched by solving

## Eq. 17

$$\underset{l,i,j}{\mathrm{arg}\phantom{\rule{0.2em}{0ex}}\mathrm{min}}(\frac{\sum _{\gamma =0}^{l-1}\lambda (i+\gamma ,j+\gamma )}{\mathrm{max}[\sum _{\gamma =0}^{l-1}\left|{\mathbf{D}}^{1}(i+\gamma )\right|,\sum _{\gamma =0}^{l-1}\left|{\mathbf{D}}^{2}(j+\gamma )\right|]}+\mathrm{exp}\{-\frac{\sum _{\gamma =0}^{l-1}\left|{\mathbf{D}}^{1}(i+\gamma )\right|+\sum _{\gamma =0}^{l-1}\left|{\mathbf{D}}^{2}(j+\gamma )\right|}{2\phantom{\rule{0.2em}{0ex}}\mathrm{max}[\sum _{k=1}^{{M}_{1}-1}\left|{\mathbf{D}}^{1}\left(k\right)\right|,\sum _{k=1}^{{M}_{2}-1}\left|{\mathbf{D}}^{2}\left(k\right)\right|]}\}),$$
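A minimal sketch of the segment descriptor in Eq. 16 follows; the helper names and the toy sequence are invented for illustration. Given a recorded binary sequence, it lists the transition frames via Eq. 9 and the signed run lengths between consecutive transitions:

```python
import numpy as np

def transition_indices(f):
    """Eq. 9: frames whose binary value differs from the previous frame."""
    f = np.asarray(f)
    return np.flatnonzero(f[1:] != f[:-1]) + 1

def segments(f):
    """Eq. 16: D(k) = (f[I(k+1)] - f[I(k)]) * (I(k+1) - I(k))."""
    f = np.asarray(f)
    I = transition_indices(f)
    return [(int(f[I[k + 1]]) - int(f[I[k]])) * int(I[k + 1] - I[k])
            for k in range(len(I) - 1)]

# A toy camera sequence with three transitions, at frames 2, 5, and 7.
print(segments([0, 0, 1, 1, 1, 0, 0, 1]))  # → [-3, 2]
```

The sign encodes the direction of the transition that closes each segment, so matching can compare both run lengths and polarity.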

## 4. Experiments

### 4.1. Hardware and Configuration

The experimental system consists of two Sony HVR-V1 high-definition (HD) video cameras, an LED array clock providing the ground truth, and a single temporally encoded LED light source. The video cameras operate at 200 frames per second. The values of $\iota $ and $\kappa $ in Eq. 15 are set to 2 and 6, respectively, which ensures that there are at least two frames between adjacent transition events and avoids ambiguity in transition matching. We selected $\tau =0.5$ to quantize the image intensity of the LED into a binary value.

### 4.2. Experiment Results

We conducted three groups of experiments under different illumination conditions: daylight, fluorescent lighting, and darkness. The results are shown in Table 1. In all the tests, only 200 transition events were used. The average estimation error was about 0.08 frame intervals, and all estimation errors are less than 0.2 frame intervals; an explanation of this bound is given in Sec. 4.4.

## Table 1

Results of 10 experiments. Offsets and errors are in frame intervals.

| Illum. | Estimated | Ground truth | Error |
|---|---|---|---|
| Daylight | $-3.410$ | $-3.2653$ | 0.1447 |
| Daylight | 0.6150 | 0.6000 | 0.0150 |
| Daylight | 0.5600 | 0.6122 | 0.0522 |
| Fluorescent | $-0.4900$ | $-0.6122$ | 0.1222 |
| Fluorescent | 0.1100 | 0 | 0.1100 |
| Darkness | $-2.9200$ | $-2.9605$ | 0.0405 |
| Darkness | 1.1150 | 1.1475 | 0.0325 |
| Darkness | $-2.8850$ | $-2.7961$ | 0.0889 |
| Darkness | 1.1900 | 1.3158 | 0.1258 |
| Darkness | $-2.7300$ | $-2.6273$ | 0.1027 |
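The summary statistics quoted in the text can be recomputed directly from the error column of Table 1:

```python
# Error column of Table 1, in frame intervals.
errors = [0.1447, 0.0150, 0.0522, 0.1222, 0.1100,
          0.0405, 0.0325, 0.0889, 0.1258, 0.1027]

print(round(sum(errors) / len(errors), 2))  # → 0.08 (average error)
print(max(errors) < 0.2)                    # → True (all below 0.2)
```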

### 4.3. Comparison with Other Methods

The proposed method was compared with the feature-based approaches,^{3, 4, 5} and the results are summarized in Table 2. The comparison indicates that ROOLS achieves higher estimation accuracy than the existing approaches.

## Table 2

Average temporal offset error of various approaches, in frame intervals.

| | Method^{3} | Method^{4} | Method^{5} | ROOLS |
|---|---|---|---|---|
| Average error | 0.1 | 1 | 0.2 | 0.08 |

### 4.4. Analysis and Discussion

A property of the normal distribution is that values within 3 standard deviations of the mean account for about 99.7% of the distribution. When $N=200$ transitions are used, the standard deviation given by Eq. 14 is about 0.05, so the estimation error is bounded by $3\times 0.05=0.15$ frame intervals with high probability. This explains why the estimation errors in Table 1 are all below 0.2 frame intervals. The performance of the proposed method can be improved further by increasing $N$ .

## 5. Conclusion

We presented a novel approach to synchronizing commercial video cameras. It achieves high-precision synchronization at the low cost of adding only a simple temporally coded light source. The method requires the video cameras to have identical frame rates, but this is not a serious limitation, since using identical video cameras for a single task is convenient and typical.

## Acknowledgments

The research work presented in this paper is supported by the National Natural Science Foundation of China under Grant No. 60875024.

## References

“Temporal synchronization of video sequences in theory and in practice,” 132–137 (2005). Google Scholar

“Spatio-temporal alignment of sequences,” IEEE Trans. Pattern Anal. Mach. Intell., 24, 1409–1424 (2002). https://doi.org/10.1109/TPAMI.2002.1046148 Google Scholar

“View-invariant alignment and matching of video sequences,” 939–945 (2003). Google Scholar

“Synchronization and calibration of camera networks from silhouettes,” 116–119 (2004). Google Scholar