Many applications would benefit if media objects such as images could be selected and classified (or clustered) so that 'conceptually similar' images are grouped together by content. This requires that image content be described by a coherent semantic domain model rather than by keywords, as in most commercial image database systems. However, a description of image contents cannot be predefined by prescribing what should be in the images; it must evolve incrementally to link image instances with descriptions of what is actually there. Flexibility is required because the same image may be reused from many different application perspectives, and classified and reclassified under many different, unpredictable, and possibly contradictory interpretations of the same contents. We present preliminary work on the incremental and flexible description of image and video semantic content using a description logic (DL), GRAIL, developed at the University of Manchester. GRAIL progressively bridges the gap between the uninterpreted raw image and the application's semantic domain of 'world' objects by supporting the incremental specification of a schema, the automatic classification of descriptions (and hence images), the notion of 'conceptual similarity' for imprecise queries, views at multiple granularities, and reuse. We then present a model for a video database system based on this approach. A primary aim is to determine whether GRAIL in particular, and DLs in general, are suitable for such an application.