This paper presents an approach for accurately tracking one or several semantic objects in a video sequence. The object boundaries of the partition are tracked accurately by means of label prediction, which relies on motion vectors obtained from two different uses of block matching. In the predicted partition, local segmentation is needed only where matching failed and near the predicted boundaries, so as to obtain the most accurate boundaries. This local segmentation is followed by a classification step, in which a backward projection is used to decide whether or not a region is assigned to a given object.
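
The label-prediction idea can be illustrated with a minimal sketch. It is not the method of this paper: it assumes a single exhaustive block-matching pass with a sum-of-absolute-differences cost, whereas the paper combines two different uses of block matching that are not detailed here. The names `sad`, `match_block`, `predict_labels`, `UNKNOWN`, and the `max_cost` threshold are hypothetical and introduced only for illustration. Frames and label maps are assumed to be lists of rows of integers.

```python
UNKNOWN = -1  # hypothetical marker for pixels whose label could not be predicted


def sad(prev, curr, py, px, cy, cx, size):
    """Sum of absolute differences between the block of `prev` at (py, px)
    and the block of `curr` at (cy, cx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(prev[py + y][px + x] - curr[cy + y][cx + x])
    return total


def match_block(prev, curr, cy, cx, size, radius):
    """Exhaustive search: return ((dy, dx), cost) such that the block of `prev`
    at (cy + dy, cx + dx) best matches the block of `curr` at (cy, cx).
    Zero displacement is kept when no candidate is strictly better."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(prev, curr, cy, cx, cy, cx, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = cy + dy, cx + dx
            # Skip candidates that fall outside the previous frame.
            if py < 0 or px < 0 or py + size > height or px + size > width:
                continue
            cost = sad(prev, curr, py, px, cy, cx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


def predict_labels(prev, curr, prev_labels, size, radius, max_cost):
    """Predict the current partition by copying labels of the previous
    partition along the estimated motion vectors. Blocks whose best match
    costs more than `max_cost` stay UNKNOWN."""
    height, width = len(curr), len(curr[0])
    predicted = [[UNKNOWN] * width for _ in range(height)]
    for cy in range(0, height - size + 1, size):
        for cx in range(0, width - size + 1, size):
            (dy, dx), cost = match_block(prev, curr, cy, cx, size, radius)
            if cost > max_cost:
                continue  # matching failed: leave the block unlabeled
            for y in range(size):
                for x in range(size):
                    predicted[cy + y][cx + x] = prev_labels[cy + dy + y][cx + dx + x]
    return predicted
```

In this sketch the pixels left as `UNKNOWN` play the role of the areas where matching failed. In the approach described above, those areas, together with the neighborhood of the predicted boundaries, are the ones handed to local segmentation and classification.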