The amount of digitized video in archives is growing so rapidly that better tools for access and content browsing are urgently needed. Moreover, video is no longer one monolithic piece of data, but a collection of useful smaller building blocks that can be accessed and reused independently of their original presentation context. In this paper, we present a content model for audiovisual sequences, with the purpose of enabling the automatic generation of video summaries. The model is based on descriptors, which indicate various properties and relations of audio and video segments. In practice, these descriptors could either be generated automatically by analysis methods, or produced manually (possibly with computer assistance) by the content provider. We analyze the requirements and characteristics of the different data segments with respect to the summarization problem, and we define our model as a set of constraints that allow good-quality summaries to be produced.
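To make the descriptor-and-constraint idea concrete, the following is a minimal sketch of how segments annotated with descriptors might be filtered against a set of summary constraints. All names (`Segment`, `max_duration`, `covers`, and the example descriptors) are hypothetical illustrations, not part of the model defined in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """A video/audio segment with attached content descriptors."""
    start: float                      # start time in seconds
    end: float                        # end time in seconds
    descriptors: dict = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start

def max_duration(limit: float):
    """Constraint: total summary length must not exceed `limit` seconds."""
    return lambda summary: sum(s.duration for s in summary) <= limit

def covers(key: str):
    """Constraint: at least one selected segment carries descriptor `key`."""
    return lambda summary: any(key in s.descriptors for s in summary)

def valid_summary(summary, constraints) -> bool:
    """A summary is valid when every constraint holds."""
    return all(c(summary) for c in constraints)

# Example source material, annotated with (hypothetical) descriptors.
segments = [
    Segment(0, 10, {"speech": "intro"}),
    Segment(10, 40, {"action": "chase"}),
    Segment(40, 55, {"speech": "dialogue", "face": "protagonist"}),
]

constraints = [max_duration(30.0), covers("speech")]

# Greedy selection: keep a segment only if all constraints still hold.
summary = []
for seg in segments:
    if valid_summary(summary + [seg], constraints):
        summary.append(seg)

print([s.descriptors for s in summary])
```

In this toy run the 30-second action segment is skipped because adding it would exceed the duration constraint, while both speech-bearing segments fit. A real system would of course use richer descriptors and a more sophisticated selection strategy than greedy inclusion.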