In this paper we propose a simply yet effective and efficient method for long-term object tracking. Different from traditional visual tracking method which mainly depends on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our framework is formulated in a confidence selection framework, which allows our system to recover from drift and partly deal with occlusion problem. To summarize, our algorithm can be roughly decomposed in a initialization stage and a tracking stage. In the initialization stage, an offline classifier is trained to get the object appearance information in category level. When the video stream is coming, the pre-trained offline classifier is used for detecting the potential target and initializing the tracking stage. In the tracking stage, it consists of three parts which are online tracking part, offline tracking part and confidence judgment part. Online tracking part captures the specific target appearance information while detection part localizes the object based on the pre-trained offline classifier. Since there is no data dependence between online tracking and offline detection, these two parts are running in parallel to significantly improve the processing speed. A confidence selection mechanism is proposed to optimize the object location. Besides, we also propose a simple mechanism to judge the absence of the object. If the target is lost, the pre-trained offline classifier is utilized to re-initialize the whole algorithm as long as the target is re-located. During experiment, we evaluate our method on several challenging video sequences and demonstrate competitive results.