In traditional surveillance systems, tracking of objects is achieved by means of image and video processing. The disadvantage of such systems is that an object can be tracked only while it is observed by a video camera. However, the geometry of indoor spaces typically requires a large number of video cameras to provide the coverage necessary for robust operation of video-based tracking algorithms, and each additional video stream increases the computational burden on the surveillance system. In this paper we present an approach to tracking in mixed-modality systems that combine a variety of sensors.
The system described here includes over 200 motion sensors as well as 6 moving cameras. We track individuals throughout the entire space and across cameras using contextual information available from the motion sensors. The motion sensors allow us to almost instantaneously find plausible tracks in a very large volume of data spanning months, a task that would be virtually impossible with traditional video search approaches. We also describe a method for determining when the tracking system is unreliable, so that the data can be presented to a human operator for disambiguation.