A vision-based teleoperation system was previously designed and simulated in our lab, combining model-based supervisory control with model-based, top-down image processing (IP) for vehicle pose recovery. Top-down IP with near-real-time feedback was achieved by a pose extraction algorithm based on the scanpath theory of human eye movement. In this paper, we extended the system to a multi-camera configuration to improve the accuracy and robustness of the visual system. An alternative approach was to use mobile camera platforms; we contrasted the two approaches in the context of our system's requirements and limitations. An OpenGL-based simulation was used to show quantitatively how additional cameras increase the system's robustness against plant and image noise.