In this paper, we present a unified framework for semantic shot classification in sports videos. Unlike previous approaches, which focus on clustering by aggregating shots with similar low-level features, the proposed scheme makes use of domain knowledge of a specific sport to perform a top-down video shot classification, including identification of video shot classes for each sport, and supervised learning and classification of the given sports video with low-level and middle-level features extracted from the sports video. It is observed that for each sport we can predefine a small number of semantic shot classes, about 5~10, which covers 90~95% of sports broadcasting video. With the supervised learning method, we can map the low-level features to middle-level semantic video shot attributes such as dominant object motion (a player), camera motion patterns, and court shape, etc. On the basis of the appropriate fusion of those middle-level shot classes, we classify video shots into the predefined video shot classes, each of which has a clear semantic meaning. The proposed method has been tested over 4 types of sports videos: tennis, basketball, volleyball and soccer. Good classification accuracy of 85~95% has been achieved. With correctly classified sports video shots, further structural and temporal analysis, such as event detection, video skimming, table of content, etc, will be greatly facilitated.