Intelligent multi-camera systems that integrate computer vision algorithms are not error free, and thus both false positive and negative detections need to be revised by a specialized human operator. Traditional multi-camera systems usually include a control center with a wall of monitors displaying videos from each camera of the network. Nevertheless, as the number of cameras increases, switching from a camera to another becomes hard for a human operator.
In this work we propose a new method that dynamically selects and displays the content of a video camera from all the available contents in the multi-camera system. The proposed method is based on a computational model of human visual attention that integrates top-down and bottom-up cues. We believe that this is the first work that tries to use a model of human visual attention for the dynamic selection of the camera view of a multi-camera system.
The proposed method has been experimented in a given scenario and has demonstrated its effectiveness with respect to the other methods and manually generated ground-truth. The effectiveness has been evaluated in terms of number of correct best-views generated by the method with respect to the camera views manually generated by a human operator.