The proliferation of captured personal and broadcast content in consumers' personal archives calls for convenient
access to stored audiovisual content. Intuitive retrieval and navigation solutions, however, require a semantic level
that cannot be reached by generic multimedia content analysis alone. Fusing content analysis with film grammar rules can
significantly improve the reliability of the analysis. This paper describes the fusion of low-level content analysis
cues, including face parameters and inter-shot similarities, to segment commercial content into film grammar rule-based
entities and subsequently to classify those sequences into so-called shot reverse shots, i.e. dialog sequences. Moreover,
shot reverse shot specific mid-level cues are analyzed to augment the shot reverse shot information with dialog-specific
descriptions.
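
To make the idea of fusing inter-shot similarity with a face cue concrete, the following is a minimal sketch and not the method of this paper. It assumes that a shot reverse shot shows up as an A-B-A-B alternation, i.e. each shot resembles the shot two positions earlier but not its direct predecessor, and that a dialog candidate should mostly contain shots with a detected face. The function names, the precomputed similarity matrix, the per-shot face flags and all thresholds are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def find_alternating_runs(
    similarity: Sequence[Sequence[float]],
    threshold: float = 0.7,   # assumed similarity threshold, not from the paper
    min_shots: int = 4,       # assumed minimum run length (A-B-A-B)
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) shot index ranges in which every shot
    resembles the shot two positions earlier but not its direct predecessor."""
    n = len(similarity)
    runs: List[Tuple[int, int]] = []
    start = None
    for i in range(2, n):
        alternates = (
            similarity[i][i - 2] >= threshold
            and similarity[i][i - 1] < threshold
        )
        if alternates:
            if start is None:
                start = i - 2  # the run includes the two shots that opened it
        else:
            if start is not None and (i - 1) - start + 1 >= min_shots:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last shot.
    if start is not None and (n - 1) - start + 1 >= min_shots:
        runs.append((start, n - 1))
    return runs


def keep_runs_with_faces(
    runs: Sequence[Tuple[int, int]],
    face_present: Sequence[bool],
    min_face_ratio: float = 0.8,  # assumed fusion rule, not from the paper
) -> List[Tuple[int, int]]:
    """Keep only the runs in which enough shots contain a detected face."""
    kept: List[Tuple[int, int]] = []
    for start, end in runs:
        flags = face_present[start:end + 1]
        if sum(flags) / len(flags) >= min_face_ratio:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Toy shot sequence: two alternating camera set-ups followed by two others.
    labels = ["A", "B", "A", "B", "A", "C", "D"]
    sim = [[1.0 if a == b else 0.1 for b in labels] for a in labels]
    faces = [True, True, True, True, True, False, False]
    print(keep_runs_with_faces(find_alternating_runs(sim), faces))
```

In this toy example the similarity matrix is derived from shot labels; in practice it would come from whatever inter-shot similarity measure is available, and the face flags from a face detector. Applying the face check after the alternation check keeps the two cues separate, so either can be replaced without touching the other.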