This article presents a Description Scheme (DS) for describing audio-visual documents from the point of view of video editing. The DS is based on the editing techniques used in the video editing domain. Its main objective is to provide a complete, modular and extensible description of the structure of video documents, based on the editing process. This VideoEditing DS is generic in the sense that it may be used in a large number of applications, such as video document indexing and analysis, the description of Edit Decision Lists, and the design of editing patterns. It relies on accurate and complete definitions of shots and transition effects, as required by video document analysis applications. The VideoEditing DS allows three levels of description: analytic, synthetic and semantic. Within the DS, the higher (respectively lower) an element of description lies, the more analytic (respectively synthetic) the information it carries. The DS can describe the editing work performed on editing boards, using more detailed descriptors in the Shot and Transition DSs. These elements are provided to define editing patterns that allow several possible reconstructions of a movie depending on, for example, the target audience. Part of the video description made with this DS may be produced automatically, either by video-to-shot segmentation algorithms (analytic DSs) or by editing software while the editing work is being done.
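To make the analytic level more concrete, the following Python sketch models an edited sequence as an ordered timeline of shots and transition effects, with a consistency check and a count of transitions by type. It is only an illustration under stated assumptions: the class names (`TransitionKind`, `Shot`, `Transition`, `EditedSequence`), the frame-index fields and the half-open interval convention are hypothetical and are not the actual VideoEditing DS schema. For simplicity, transition frames are placed between shots rather than overlapping them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union

# Hypothetical model for illustration only; not the VideoEditing DS schema.


class TransitionKind(Enum):
    CUT = "cut"            # instantaneous change of shot (zero duration)
    FADE = "fade"          # gradual change to or from a uniform frame
    DISSOLVE = "dissolve"  # gradual superimposition of two shots


@dataclass
class Shot:
    start: int  # index of the first frame (inclusive)
    end: int    # index one past the last frame (exclusive)


@dataclass
class Transition:
    kind: TransitionKind
    start: int  # first frame of the effect (inclusive)
    end: int    # end of the effect (exclusive); equals start for a cut


@dataclass
class EditedSequence:
    # Analytic level: an ordered timeline of shots and transitions.
    items: List[Union[Shot, Transition]] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """True if every item has non-negative duration, cuts have zero
        duration, shots and transitions alternate, and each item starts
        exactly where the previous one ends."""
        for item in self.items:
            if item.end < item.start:
                return False
            if (isinstance(item, Transition)
                    and item.kind is TransitionKind.CUT
                    and item.end != item.start):
                return False
        for prev, cur in zip(self.items, self.items[1:]):
            if type(prev) is type(cur) or cur.start != prev.end:
                return False
        return True

    def count(self, kind: TransitionKind) -> int:
        """Number of transitions of the given kind in the timeline."""
        return sum(1 for it in self.items
                   if isinstance(it, Transition) and it.kind is kind)
```

For example, `EditedSequence([Shot(0, 100), Transition(TransitionKind.CUT, 100, 100), Shot(100, 250)])` is consistent and contains one cut, whereas two adjacent shots with no transition between them are rejected.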
This DS addresses the need to exchange descriptions of editing work between editing software applications. At the same time, it provides an analytic description of the editing work that is complementary to existing Edit Decision List standards such as SMPTE or AAF.
This article presents the results of a study of spatio-temporal images, evaluating their performance for video-to-shot segmentation. Some shot segmentation methods use spatio-temporal images computed by projecting successive video frames onto the X or Y axis. In these projections, transition effects and motion are expected to have different characteristics. Whereas cuts are easily recognized, the main problem is finding a measure that discriminates motion from gradual transition effects. In this article, we study the quality of transition detection based on the similarity of lines in spatio-temporal images. The probability density functions of several measures are estimated to determine which one yields the lowest detection error rate. These distributions are computed over four classes of events: intra-shot sequences without motion, sequences with cuts, sequences with fades, and sequences with motion. Line matching is performed using correlation estimates between projection lines. To separate these classes, we first estimate, for each class, the probability density function of the correlation between consecutive lines. For the different line segment sizes tested, the experimental results show that the classes cannot be clearly separated. To take into account the evolution of the correlation over time, and because we aim to detect particular types of boundaries, we then consider ratios between statistical moments computed over a subset of correlation values. The results show that the measures used, which are based on the matching of projection lines, cannot discriminate between motion and fades: only a subset of motions can be differentiated from gradual transitions. These measures should therefore be combined with methods that produce complementary results, such as a similar measure based on the correlation between spatially shifted segments.
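The basic measurements described above can be sketched in Python with NumPy, as a minimal illustration rather than the authors' implementation. The sketch assumes grayscale frames stored as a `(T, H, W)` array. The function names, the use of averaging as the projection, the non-overlapping line segments and, in particular, the choice of moment ratio (standard deviation over mean of `1 - correlation` in a sliding window) are assumptions, since the text does not specify which moments are used.

```python
from typing import Optional

import numpy as np


def spatio_temporal_image(frames: np.ndarray, axis: str = "x") -> np.ndarray:
    """Project each grayscale frame of a (T, H, W) array onto one axis.

    axis="x": average over rows, giving one line of length W per frame.
    axis="y": average over columns, giving one line of length H per frame.
    Row t of the result is the projection line of frame t.
    """
    if frames.ndim != 3:
        raise ValueError("frames must have shape (T, H, W)")
    if axis == "x":
        return frames.mean(axis=1)
    if axis == "y":
        return frames.mean(axis=2)
    raise ValueError("axis must be 'x' or 'y'")


def _pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation of two equal-length segments."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # Flat segment(s): two flat segments match, otherwise treat as unrelated.
        return 1.0 if np.allclose(a, b) else 0.0
    return float((a * b).sum() / denom)


def consecutive_line_correlation(st_image: np.ndarray,
                                 segment: Optional[int] = None) -> np.ndarray:
    """Correlation between projection lines t and t+1, for t = 0..T-2.

    If `segment` is given, each line is split into non-overlapping segments
    of that length (a shorter remainder is ignored) and the per-segment
    correlations are averaged.
    """
    T, n = st_image.shape
    seg = n if segment is None else segment
    if not 1 <= seg <= n:
        raise ValueError("segment must be between 1 and the line length")
    corrs = np.empty(T - 1)
    for t in range(T - 1):
        vals = [_pearson(st_image[t, s:s + seg], st_image[t + 1, s:s + seg])
                for s in range(0, n - seg + 1, seg)]
        corrs[t] = np.mean(vals)
    return corrs


def moment_ratio(corrs: np.ndarray, window: int = 5,
                 eps: float = 1e-9) -> np.ndarray:
    """Hypothetical boundary measure: standard deviation divided by mean of
    the dissimilarity d = 1 - correlation over each sliding window."""
    d = 1.0 - np.asarray(corrs, dtype=float)
    if not 1 <= window <= len(d):
        raise ValueError("window must be between 1 and len(corrs)")
    out = np.empty(len(d) - window + 1)
    for i in range(len(out)):
        w = d[i:i + window]
        out[i] = w.std() / (abs(w.mean()) + eps)
    return out
```

On synthetic data made of two static shots joined by a cut, the correlation stays close to 1 inside each shot and drops only between the last frame of the first shot and the first frame of the second; the windowed ratio then peaks for the windows that contain that drop.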