A vision-based tele-operation system has been designed and simulated in our lab. It combines model-based supervisory control with model-based, top-down image processing (IP) for robot pose recovery. A secondary global positioning system (GPS) serves as a backup in situations where IP is not expected to work. These modules have been integrated to achieve robust system performance with reduced human attendance. Robust top-down IP with near-real-time feedback is achieved by a pose extraction algorithm based on the Scanpath Theory of human eye movement. Extensive model-directed pre-filtering, low-level image processing, and post-filtering of visual images, together with model-directed data fusion, are used to ensure consistency between the internal model and the external environment. The system was simulated under a wide range of image and plant noise to verify its stability and to reveal the influence of such noise injection on the system and its modes of failure.