In this paper, we propose a novel statistical method for video analysis and object-oriented structured video representation. Although existing shot-based approaches make video easier to access than raw video data does, they are still not effective for semantic-level video browsing and retrieval. To represent video structure at the semantic level, the proposed method employs an extended hidden Markov model, trained with the EM algorithm, to discover meaningful objects and important video structures. By design, the method captures important objects and video structure by accounting for both short-term statistics and long-term recurrence, and it also supports several interesting video analysis tasks. Finally, experiments on several different videos validate the effectiveness of the proposed approach.