The proposed system, STABIL, combines four levels of abstraction. The first level extracts the foreground using a Kalman filter technique. The second level searches the foreground regions for areas of human skin; to this end, the three-channel color signal is transformed into a 2D color space that best represents skin color. Kalman filtering speeds up the classification when the camera is stationary; when the camera is moving, the classification is applied directly to the image sequence. The skin regions serve as input to the third level, which estimates the position of the person in 3D space relative to the camera. The fourth level maintains a model of a human based on statistical data on human body size. The model is adapted to a person in an iterative process that takes into account the limits of the person's movements and the restrictions on the model, so that the skin regions in the image are matched to the correct person. The result of processing the n-th image of an image sequence is a scaled model, projected and superimposed on the (n-1)-th image, that shows the correct estimate of the person's position.
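To make the first two levels more concrete, the following is a minimal sketch and not the STABIL implementation. It assumes a scalar, constant-intensity Kalman filter per pixel for the background, and it assumes normalized rg chromaticity as the 2D color space; the text above does not name the color space, the filter model, or any thresholds. All names (`PixelKalman`, `to_rg_chromaticity`, `is_skin`, `skin_candidates`) and all numeric values are hypothetical and uncalibrated.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class PixelKalman:
    """Scalar Kalman filter tracking the background intensity of one pixel."""
    estimate: float                  # current background intensity estimate
    variance: float = 25.0           # estimate uncertainty (assumed start value)
    process_noise: float = 1.0       # assumed slow background drift
    measurement_noise: float = 16.0  # assumed sensor noise variance

    def step(self, measurement: float, threshold: float = 30.0) -> bool:
        """Return True if the measurement looks like foreground."""
        # Predict: constant-intensity model, so only the uncertainty grows.
        self.variance += self.process_noise
        innovation = measurement - self.estimate
        if abs(innovation) > threshold:
            # Foreground: do not absorb the measurement into the background.
            return True
        # Correct: blend the measurement in using the Kalman gain.
        gain = self.variance / (self.variance + self.measurement_noise)
        self.estimate += gain * innovation
        self.variance *= 1.0 - gain
        return False


def to_rg_chromaticity(r: int, g: int, b: int) -> Tuple[float, float]:
    """Map a three-channel color to 2D normalized rg chromaticity."""
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # treat black as neutral
    return (r / total, g / total)


def is_skin(r: int, g: int, b: int,
            r_range: Tuple[float, float] = (0.36, 0.47),
            g_range: Tuple[float, float] = (0.28, 0.36)) -> bool:
    """Box classifier in rg space; the ranges are illustrative only."""
    rn, gn = to_rg_chromaticity(r, g, b)
    return r_range[0] <= rn <= r_range[1] and g_range[0] <= gn <= g_range[1]


def skin_candidates(background: List[PixelKalman],
                    frame: List[RGB],
                    threshold: float = 30.0) -> List[int]:
    """Indices of pixels that are both foreground and skin-colored."""
    hits = []
    for index, (pixel_filter, (r, g, b)) in enumerate(zip(background, frame)):
        intensity = (r + g + b) / 3.0
        # The skin test runs only on foreground pixels, which is where the
        # speed-up for a stationary camera would come from.
        if pixel_filter.step(intensity, threshold) and is_skin(r, g, b):
            hits.append(index)
    return hits
```

For a moving camera, the same `is_skin` test would be applied to every pixel of the frame, skipping the background filter. A possible usage, with a flat list of pixels standing in for an image:

```python
background = [PixelKalman(estimate=100.0) for _ in range(3)]
frame = [(100, 100, 100), (200, 150, 120), (250, 250, 250)]
candidates = skin_candidates(background, frame)
```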