In this paper, we propose a subjective evaluation criterion that guides the selection of affective features and the modeling of highlights. Firstly, a ground-truth database of highlights is established, taking into account both the randomness of the data set and the preparation of the subjects. Secondly, commonly used affective features, including visual, audio, and editing features, are extracted to represent the highlights. Thirdly, the subjective evaluation criterion is proposed based on an analysis of the average-error method and the pairwise-comparisons method; in particular, the rationality of this criterion in our specific application is explained with respect to three detailed issues. Finally, evaluation experiments are designed on tennis and table tennis as examples. The experiments confirm that previous work on affective features and linear highlight models is effective. Furthermore, an affective accuracy of 82.0% (79.3%) is obtained fully automatically, which is a remarkable highlights-ranking result. This result shows that the subjective evaluation criterion is well designed for selecting affective features and modeling highlights.
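As one way to make the pairwise-comparisons idea concrete, the sketch below measures how often an automatic excitement ranking agrees with human pairwise judgments. The clip identifiers, scores, and preference pairs are hypothetical illustrations, not data from the paper, and the exact accuracy definition used by the authors may differ.

```python
def pairwise_accuracy(auto_scores, human_prefs):
    """Fraction of human pairwise preferences reproduced by the automatic
    excitement scores. `auto_scores` maps clip id -> score; `human_prefs`
    is a list of (a, b) pairs meaning subjects judged clip a more
    exciting than clip b."""
    agree = sum(1 for a, b in human_prefs
                if auto_scores[a] > auto_scores[b])
    return agree / len(human_prefs)

# Hypothetical example: four clips scored by a linear affective model.
scores = {"c1": 0.9, "c2": 0.4, "c3": 0.7, "c4": 0.1}
prefs = [("c1", "c2"), ("c3", "c4"), ("c1", "c4"), ("c2", "c3")]
print(pairwise_accuracy(scores, prefs))  # 3 of 4 preferences agree -> 0.75
```

A criterion of this form rewards a ranking for ordering clips the same way the subjects did, which is exactly the property needed when comparing candidate affective features.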
In this paper, we present a new and efficient clustering approach for scene analysis in sports video. The method is generic, requires no prior domain knowledge, and operates in an unsupervised manner based on the similarity of the shots in the video. In each iteration, the two most similar shots are merged into the same scene, and this procedure is repeated until a stopping criterion is satisfied. The stopping criterion is based on a J value defined according to the Fisher Discriminant Function; we therefore call the method J-based Scene Clustering. Using this method, the low-level video content representation (shots) can be clustered into a mid-level representation (scenes), which is useful for high-level sports video content analysis such as play-break parsing, story-unit detection, highlight extraction, and summarization. Experimental results on various types of broadcast sports video demonstrate the efficacy of the proposed approach. Moreover, we also present a simple application of our scene clustering method to story-unit detection in periodic sports videos such as archery and diving. The experimental results are encouraging.
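The iterative merge-until-J-drops procedure can be sketched as follows. This is a minimal illustration, assuming shot features are fixed-length vectors, using a simplified trace-form Fisher ratio for J and a hypothetical threshold `min_j` in place of the paper's exact stopping criterion.

```python
import numpy as np

def fisher_j(features, labels):
    """Simplified J value: ratio of between-class to within-class scatter
    (trace form of the Fisher Discriminant Function). A higher J means
    the current scene partition separates the shot features better."""
    mu = features.mean(axis=0)
    sb = sw = 0.0
    for c in np.unique(labels):
        x = features[labels == c]
        mc = x.mean(axis=0)
        sb += len(x) * np.sum((mc - mu) ** 2)
        sw += np.sum((x - mc) ** 2)
    return sb / sw if sw > 0 else np.inf

def j_scene_clustering(features, min_j=1.0):
    """Agglomerative sketch: start with one scene per shot, repeatedly
    merge the two scenes whose mean feature vectors are closest, and stop
    when the merged partition's J value falls below `min_j`."""
    labels = np.arange(len(features))
    while len(np.unique(labels)) > 1:
        cs = np.unique(labels)
        means = {c: features[labels == c].mean(axis=0) for c in cs}
        best, pair = np.inf, None
        for i, a in enumerate(cs):          # find the closest scene pair
            for b in cs[i + 1:]:
                d = np.sum((means[a] - means[b]) ** 2)
                if d < best:
                    best, pair = d, (a, b)
        trial = labels.copy()
        trial[trial == pair[1]] = pair[0]   # tentatively merge the pair
        if fisher_j(features, trial) < min_j:
            break                           # merging would blur the scenes
        labels = trial
    return labels

# Two well-separated groups of shots end up in two scenes.
shot_features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
scenes = j_scene_clustering(shot_features, min_j=1.0)
print(scenes)
```

The design choice mirrors the abstract: merging continues while the partition still discriminates well (high J) and halts once a further merge would collapse distinct scenes.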
Racquet sports video, e.g. table tennis, tennis, and badminton, is an important category of sports video that has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. Firstly, a supervised classification method is employed to detect important audio symbols, including impacts (ball hits), audience cheers, and commentator speech; meanwhile, an unsupervised algorithm groups video shots into clusters. Secondly, by exploiting the temporal relationship between the audio and visual signals, the scene clusters are assigned semantic labels, namely rally scenes and break scenes. Thirdly, a refinement procedure reduces false rally scenes through further audio analysis. Finally, an exciting model is proposed to rank the detected rally scenes, from which many exciting video clips, such as game (match) points, can be correctly retrieved. Experiments on two representative types of racquet sports video, table tennis and tennis, demonstrate encouraging results.
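A ranking model of the kind described can be sketched as a weighted sum of per-rally audio-visual cues. The cue names (`cheer_ratio`, `hit_density`, `length_norm`) and the weights below are illustrative assumptions, not the paper's actual model or values.

```python
def excitement_score(rally, weights=(0.5, 0.3, 0.2)):
    """Hypothetical linear exciting model: score a rally scene by a
    weighted sum of normalized audio-visual cues."""
    w_cheer, w_hits, w_len = weights
    return (w_cheer * rally["cheer_ratio"]    # audience-cheer time / scene length
            + w_hits * rally["hit_density"]   # normalized ball-impact rate
            + w_len * rally["length_norm"])   # normalized rally duration

# Illustrative rally scenes; ranking puts the livelier rally first.
rallies = [
    {"id": "r1", "cheer_ratio": 0.8, "hit_density": 0.6, "length_norm": 0.9},
    {"id": "r2", "cheer_ratio": 0.2, "hit_density": 0.3, "length_norm": 0.4},
]
ranked = sorted(rallies, key=excitement_score, reverse=True)
print([r["id"] for r in ranked])  # ['r1', 'r2']
```

Because cues such as cheer duration come directly from the detected audio symbols, a linear model of this shape lets the audio refinement stage feed straight into highlight ranking.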