Scene understanding requires not only detecting the objects in a scene but also modeling the relationship between the objects
and the scene, for example the plausible size and the occurrence probability of an object at a given position in the scene. With
this relationship, traditional object detection approaches, which may misclassify objects whose size or position is implausible
for the scene, can be greatly improved. In this paper, a novel scale model is proposed to describe this understanding of the
scene. The scale model consists of the occurrence probability and the plausible size of pedestrians at each position in the
scene. The scale model is learned by counting pedestrian examples of different sizes at different positions in the
scene over a period of time, instead of computing geometry and viewpoint information from a single image. The
examples are detected automatically by a detector trained with a tri-training-based semi-supervised approach.
Experimental results indicate that the scale model of the scene can be learned with semi-supervised detection, without
3D geometry information and without assuming a planar ground.
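
As an illustration of the counting idea described above, the following is a minimal sketch, not the implementation used in this paper. It assumes that each detection is reduced to a position and a bounding-box height in pixels, that the image is divided into a regular grid of cells, and that the plausible size of a cell is summarized by the mean detected height. The function name `learn_scale_model`, the `cell` parameter, and the grid discretization are all hypothetical choices made for the example.

```python
def learn_scale_model(detections, frame_height, frame_width, cell=16):
    """Accumulate pedestrian detections into a per-cell scale model.

    detections: iterable of (x, y, h) tuples, where (x, y) is the detection
    position in pixels and h is the bounding-box height in pixels.
    Returns (occurrence, mean_height), two row-major grids (lists of lists).
    """
    # Number of grid cells needed to cover the frame (ceiling division).
    rows = (frame_height + cell - 1) // cell
    cols = (frame_width + cell - 1) // cell

    counts = [[0] * cols for _ in range(rows)]
    height_sums = [[0.0] * cols for _ in range(rows)]

    for x, y, h in detections:
        r, c = int(y) // cell, int(x) // cell
        # Ignore detections that fall outside the frame.
        if 0 <= r < rows and 0 <= c < cols:
            counts[r][c] += 1
            height_sums[r][c] += h

    total = sum(map(sum, counts))
    # Occurrence: fraction of all counted detections that fall in each cell.
    occurrence = [[n / total if total else 0.0 for n in row] for row in counts]
    # Plausible size: mean detected height per cell, None where nothing was seen.
    mean_height = [
        [s / n if n else None for s, n in zip(sum_row, count_row)]
        for sum_row, count_row in zip(height_sums, counts)
    ]
    return occurrence, mean_height


# Hypothetical detections collected over several frames of a 128x64 scene.
detections = [(10, 10, 40), (12, 14, 44), (100, 50, 80)]
occurrence, mean_height = learn_scale_model(detections, 64, 128)
print(occurrence[0][0], mean_height[0][0], mean_height[3][6])
```

In this sketch, a cell that accumulates many detections receives a high occurrence value, and a later detection whose height differs strongly from the cell's mean height could be treated as implausible. How such a check is thresholded is left open here.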