Many current video analysis systems fail to fully acknowledge the
process that produced the video data; that is, they do not consider the complete multimedia system, which encompasses the several physical processes that lead to the captured video data. This multimedia system includes the physical process that created the appearance of the captured objects, the capture of the data by the sensor (camera), and a model of the domain to which the video data belongs. By modelling this complete multimedia system, a more robust and theoretically sound approach to video analysis can be taken. In this paper we describe such a system for the detection, recognition and tracking of objects in videos. We introduce an extension of the mean shift tracking process, based on a detailed model of the video capturing process. This system is used for two applications in the soccer video domain: billboard recognition and tracking, and player tracking.