To the end user of a video database, content consists of the objects and events that occur in the video. A video database system must be designed to extract, represent and organize this information in a fashion that supports querying, manipulation and data visualization by a user. As a data modeling exercise, objects and events are defined in terms of semantic attributes, so that an end user's queries are expressible in the modeling language. As a feature extraction exercise, on the other hand, objects are defined as solutions to equations, often in terms of low-level visual primitives such as voxels or contours. These two formalisms constitute entirely different languages. Integrating them, however, can provide a powerful mechanism for describing and manipulating complex visual data. This paper explores the issues involved in this integration. We introduce the notion of a visual data modeling language (VDML), which supports data definition and data manipulation operations over the complex visual data characteristic of video database systems. We discuss this data-modeling effort in the context of our multiple perspective interactive video system, which generates three-dimensional data sets using input from multiple video cameras.