In this work, we present a system for the automatic segmentation, indexing and retrieval of audiovisual data based on the combination of audio, visual and textural content analysis. The video stream is demultiplexed into audio, image and caption components. Then, a semantic segmentation of the audio signal based on audio content analysis is conducted, and each segment is indexed as one of the basic audio types. The image sequence is segmented into shots based on visual information analysis, and keyframes are extracted from each shot. Meanwhile, keywords are detected from the closed caption. Index tables are designed for both linear and non-linear access to the video. It is shown by experiments that the proposed methods for multimodal media content analysis are effective. And that the integrated framework achieves satisfactory results for video information filtering and retrieval.