We propose a simple yet effective method for long-term object tracking. Different from the traditional visual tracking method, which mainly depends on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our framework is formulated in a confidence selection framework, which allows our system to recover from drift and partly deal with occlusion. To summarize, our algorithm can be roughly decomposed into an initialization stage and a tracking stage. In the initialization stage, an offline detector is trained to get the object appearance information at the category level, which is used for detecting the potential target and initializing the tracking stage. The tracking stage consists of three modules: the online tracking module, detection module, and decision module. A pretrained detector is used for maintaining drift of the online tracker, while the online tracker is used for filtering out false positive detections. A confidence selection mechanism is proposed to optimize the object location based on the online tracker and detection. If the target is lost, the pretrained detector is utilized to reinitialize the whole algorithm when the target is relocated. During experiments, we evaluate our method on several challenging video sequences, and it demonstrates huge improvement compared with detection and online tracking only.