Recently, there has been much research on video manipulation in the areas of video-on-demand and video databases. Most of the recent work has focused on video classification, feature extraction, spatial reasoning, and image retrieval; little work has been done on supporting adaptive video editing and production activities, or on providing facilities for building a versatile video manipulation server. In this paper, we describe the development of an experimental video manipulation server called 'VIMS', which has been implemented at the Hong Kong University of Science and Technology. VIMS consists of two fundamental components: i) a video classification component (VCC) for generating the effective indices necessary for structuring video data, and ii) a conceptual clustering mechanism (CCM) with advanced object-oriented features and techniques. The former supports video structuring through camera break detection, shot classification using domain knowledge, and content-based retrieval through interactive learning, whereas the latter enables users to form, among other things, video programs from existing objects dynamically and adaptively, based on semantic features/index terms. By tightly coupling the CCM's techniques with the VCC's, VIMS further allows the user to perform annotation-based and content-based retrieval in a well-integrated and interleaved manner, which we regard as essential for a versatile video manipulation server. A prototype of VIMS embodying the VCC and CCM has recently been constructed, running on the PC Pentium platform.
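As an illustration of the kind of video structuring the VCC performs, camera break detection is commonly implemented by thresholding the difference between grey-level histograms of consecutive frames. The following Python sketch shows this standard technique only; the bin count, threshold, and frame representation are hypothetical, and the VCC's actual algorithm is not reproduced here.

```python
# Sketch of camera break (hard cut) detection via grey-level
# histogram differencing. Frames are lists of pixel intensities
# in [0, 255]; bins and threshold are illustrative assumptions.

def histogram(frame, bins=8, max_val=256):
    """Grey-level histogram of a frame."""
    h = [0] * bins
    width = max_val // bins
    for p in frame:
        h[min(p // width, bins - 1)] += 1
    return h

def detect_breaks(frames, threshold=0.5):
    """Return indices where consecutive histograms differ sharply."""
    breaks = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        n = sum(h1)  # pixels per frame
        # Normalized L1 distance between histograms, in [0, 1].
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * n)
        if diff > threshold:  # large change => likely camera break
            breaks.append(i)
    return breaks

# Two synthetic "shots": dark frames, then a hard cut to bright frames.
dark = [[10] * 64 for _ in range(3)]
bright = [[200] * 64 for _ in range(3)]
print(detect_breaks(dark + bright))  # [3]
```

A threshold-based detector of this kind flags only abrupt cuts; gradual transitions (fades, dissolves) typically need additional heuristics, which is one reason shot classification benefits from the domain knowledge the paper mentions.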