As a step toward realistic image representation, we propose an efficient segmentation algorithm. The method represents visually homogeneous objects with as few regions as possible while preserving the semantic content of the image. This strategy is particularly useful for typical "head and shoulder" video sequences, in which visually homogeneous objects occupy large portions of each frame. To this end, we adopt a bottom-up approach that uses spatial-domain information only. First, a precise initial segmentation is obtained using an efficient marker extraction algorithm. Next, the initial regions are classified by their gradients, and an ordered region-merging algorithm is applied to each class to reduce the number of regions. Finally, redundant small regions are eliminated by considering their neighborhoods. Experimental results show that the resulting segmentation is a largely self-contained representation of the image, since it preserves most of the perceptually important image components.
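The three stages after the initial segmentation can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated here: the marker-based initial segmentation is replaced by a fixed grid of blocks, region similarity is assumed to be the difference of mean intensities, and all function names and thresholds (`grad_thresh`, `max_diff`, `min_size`) are hypothetical choices made for illustration.

```python
import numpy as np

def region_stats(img, labels):
    """Map each label to (mean intensity, pixel count)."""
    return {int(l): (float(img[labels == l].mean()), int((labels == l).sum()))
            for l in np.unique(labels)}

def adjacent_pairs(labels):
    """Label pairs (a, b), a < b, that touch under 4-connectivity."""
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        lo, hi = np.minimum(a[diff], b[diff]), np.maximum(a[diff], b[diff])
        pairs.update(zip(lo.tolist(), hi.tolist()))
    return pairs

def ordered_merge(img, labels, allowed, max_diff):
    """Merge the most similar adjacent pair first, restricted to one class.

    Assumption: similarity is the absolute difference of region means.
    """
    labels = labels.copy()
    while True:
        stats = region_stats(img, labels)
        cands = [(abs(stats[a][0] - stats[b][0]), a, b)
                 for a, b in adjacent_pairs(labels)
                 if a in allowed and b in allowed]
        if not cands:
            return labels
        d, a, b = min(cands)          # best (smallest) difference goes first
        if d > max_diff:
            return labels
        labels[labels == b] = a

def remove_small(img, labels, min_size):
    """Absorb each region smaller than min_size into its closest neighbor."""
    labels = labels.copy()
    while True:
        stats = region_stats(img, labels)
        pairs = adjacent_pairs(labels)
        merged = False
        for s in (l for l, (_, n) in stats.items() if n < min_size):
            nbrs = [b if a == s else a for a, b in pairs if s in (a, b)]
            if nbrs:
                best = min(nbrs, key=lambda n: abs(stats[n][0] - stats[s][0]))
                labels[labels == s] = best
                merged = True
                break                 # statistics are stale after a merge
        if not merged:
            return labels

def segment(img, init_labels, grad_thresh=20.0, max_diff=15.0, min_size=4):
    """Classify regions by mean gradient, merge per class, drop small regions."""
    gy, gx = np.gradient(img.astype(float))
    grad = np.hypot(gx, gy)
    mean_grad = {int(l): float(grad[init_labels == l].mean())
                 for l in np.unique(init_labels)}
    low = {l for l, g in mean_grad.items() if g < grad_thresh}
    high = set(mean_grad) - low
    labels = init_labels
    for cls in (low, high):           # merging never crosses class boundaries
        labels = ordered_merge(img, labels, cls, max_diff)
    return remove_small(img, labels, min_size)

# Toy input: two flat halves, over-segmented into sixteen 2x2 blocks
# (a stand-in for the marker-based initial segmentation).
img = np.zeros((8, 8))
img[:, :4], img[:, 4:] = 10.0, 200.0
rows, cols = np.indices(img.shape)
init = (rows // 2) * 4 + (cols // 2)
out = segment(img, init)
print(len(np.unique(init)), "->", len(np.unique(out)), "regions")
```

On this toy input the sixteen blocks collapse to four regions rather than two, because merging is confined to each gradient class: the flat interior and the band next to the edge are merged separately on each side of the boundary.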