Logical units are semantic video segments above the shot level. Depending on the common semantics within the unit
and data domain, different types of logical unit extraction algorithms have been presented in the literature. Topic units are
typically extracted for documentaries or news broadcasts while scenes are extracted for narrative-driven video such as
feature films, sitcoms, or cartoons. Other types of logical units are extracted from home video and sports. This paper
reviews algorithms from the literature for the extraction of logical units, organized by the categories unit type,
data domain, features used, segmentation method, and thresholds applied. A detailed comparative study is
presented for the case of extracting scenes from narrative-driven video. While earlier comparative studies focused
either on scene segmentation methods alone or on complete news-story segmentation algorithms, this paper investigates
various visual features, segmentation methods with their thresholding mechanisms, and their combination into complete
scene detection algorithms. The performance of the resulting large set of algorithms is then evaluated on a set
of video files including feature films, sitcoms, children's shows, a detective story, and cartoons.

The types of shot transitions that film editors use in video are not chosen at random. Cuts, dissolves, fades, and wipes are
devices in film grammar used to structure video. In this work, knowledge of film grammar is used to improve scene
detection algorithms. Three improvements to known scene detection algorithms are proposed: (1) The selection of key-frames
for shot similarity measurement should take the position of gradual shot transitions into account. (2) Gradual
shot transitions have a separating effect. It is shown how this local cue can be used to improve the global structuring into
logical units. (3) Gradual shot transitions also have a merging effect upon shots in their temporal proximity. It is shown
how coherence values and shot similarity values used during scene detection have to be modified to exploit this fact.
The proposed improvements can be used together with a variety of scene detection approaches. Experimental results
with time adaptive grouping indicate that considerable improvements in terms of precision and recall are achieved.
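
The precision and recall figures mentioned above can be illustrated with a minimal sketch. It assumes that scene boundaries are given as shot indices and that a detected boundary counts as correct when it lies within a small tolerance of a reference boundary that has not yet been matched. The function name, the tolerance parameter, and the greedy one-to-one matching are assumptions made for illustration, not the evaluation protocol of the paper.

```python
def precision_recall(detected, reference, tolerance=0):
    """Score detected scene boundaries against reference boundaries.

    Both inputs are lists of shot indices (an assumed representation).
    Each reference boundary can be matched by at most one detection.
    """
    unmatched = sorted(reference)
    hits = 0
    for boundary in sorted(detected):
        # Reference boundaries close enough to this detection.
        candidates = [r for r in unmatched if abs(r - boundary) <= tolerance]
        if candidates:
            # Greedily take the nearest one and mark it as used.
            best = min(candidates, key=lambda r: abs(r - boundary))
            unmatched.remove(best)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical boundaries, not data from the paper.
    p, r = precision_recall([10, 25, 40, 70], [10, 24, 55, 70], tolerance=1)
    print(p, r)
```

Greedy matching keeps the sketch short; an evaluation that needs an optimal assignment between detected and reference boundaries would use a proper bipartite matching instead.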

Huge amounts of video data are produced around the world each day, and managing them is increasingly difficult. Tasks such as archiving, browsing, analysis, and search and retrieval are aided by prior automatic temporal segmentation of video into shots, which are the basic units of video. Among the different shot transition types (cut, dissolve, fade, and wipe), wipes are regarded as difficult to detect because of their variety. This paper presents a new efficient and fast algorithm for wipe detection and position determination. It can operate on the luminance DC coefficients extracted from MPEG sequences in the compressed domain or on spatially sub-sampled sequences. The newly proposed evenness factor exploits the observation that, during a wipe, spatial zones of change move through the image. Wipe candidates are checked using frame differences between frame pairs at certain temporal distances. For the remaining candidates, a new approach is proposed that detects the uniform movement of linear zones of change using a double Hough transform. Motion compensation is used to handle local and global motion. The algorithm has low computational complexity owing to its small input data rate and its step-wise reduction of wipe candidates.
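
The check of frame pairs at several temporal distances can be illustrated with a small sketch on spatially sub-sampled luminance frames. This is not the algorithm of the paper: the evenness factor, the double Hough transform, and motion compensation are not reproduced here, and all function names, the synthetic left-to-right wipe, and the use of a mean absolute difference are assumptions made for illustration. The sketch only shows that, while a zone of change sweeps across the image, the difference between two frames grows with their temporal distance.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute luminance difference between two equally sized frames.

    A frame is a list of rows, each row a list of luminance values.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def pair_differences(frames, start, distances):
    """Differences between frame `start` and the frames `d` steps later."""
    result = {}
    for d in distances:
        # Skip distances that would reach past the end of the sequence.
        if start + d < len(frames):
            result[d] = mean_abs_difference(frames[start], frames[start + d])
    return result


def synthetic_wipe(width, height, steps, old=0, new=255):
    """Hypothetical left-to-right wipe: at step k the first k columns
    already show the new shot, the remaining columns the old one."""
    return [
        [[new if x < k else old for x in range(width)] for _ in range(height)]
        for k in range(steps + 1)
    ]


if __name__ == "__main__":
    frames = synthetic_wipe(width=4, height=2, steps=4)
    print(pair_differences(frames, start=0, distances=[1, 2, 4, 8]))
```

In this synthetic case the difference increases steadily with the temporal distance, whereas a cut would produce the full difference already at distance 1. A real detector would additionally have to separate such patterns from object and camera motion.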