Nowadays, video has gradually become the mainstream of dissemination media for its rich information capacity and intelligibility, and texts in videos often carry significant semantic information, thus making great contribution to video content understanding and construction of content-based video retrieval system. Text-based video analyses usually consist of text detection, localization, tracking, segmentation and recognition. There has been a large amount of research done on video text detection and tracking, but most solutions focus on text content processing in static frames, few making full use of redundancy between video frames. In this paper, a unified framework for text detection, localization and tracking in video frames is proposed. We select edge and corner distribution of text blocks as text features, localizing and tracking are performed. By making good use of redundancy between frames, location relations and motion characteristics are determined, thus effectively reduce false-alarm and raise correct rate in localizing. Tracking schemes are proposed for static and rolling texts respectively. Through multi-frame integration, text quality is promoted, so is correct rate of OCR. Experiments demonstrate the reduction of false-alarm and the increase of correct rate of localization and recognition.