An image can be considered as a collection of small regions. Most research on image understanding extracts features of
these regions and investigates the relationships between the regions and manually annotated image keywords.
Some work also explores the ontology of words. However, little attention has been paid to the
relationships among regions in an image. In this paper, we study these relationships closely for visual content
understanding, without assuming that the regions are independent. We first analyze the co-occurrence of regions
using a statistical relevance probability (SRP) model. Because human attention, when perceiving an image, first
focuses on one region and then moves on to other relevant regions, we propose a novel region sequence
prediction (RSP) model to describe this process. In RSP, annotation keywords for region sequences of the image and their
probabilities are generated by a hidden Markov model. Experimental results on both image annotation and retrieval on
the Corel dataset (an open image dataset) show that mining the relationships among image regions achieves comparable
or better performance in visual content understanding.
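
To make the idea of generating keywords for a region sequence with a hidden Markov model more concrete, the following is a minimal sketch and not the RSP model itself. It assumes that the hidden states are annotation keywords, that each region has already been quantized into a discrete visual token, and that Viterbi decoding is used to find the most probable keyword sequence. The function name `viterbi`, the keywords, the tokens, and all probabilities are hypothetical toy values chosen for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its joint probability) for obs."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][o], r)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical toy parameters (assumptions, not values from the paper).
states = ["sky", "water"]  # annotation keywords as hidden states
start_p = {"sky": 0.6, "water": 0.4}
trans_p = {  # how attention moves from one keyword's region to the next
    "sky": {"sky": 0.7, "water": 0.3},
    "water": {"sky": 0.4, "water": 0.6},
}
emit_p = {  # probability of a visual token given the keyword
    "sky": {"blue_smooth": 0.8, "blue_textured": 0.2},
    "water": {"blue_smooth": 0.3, "blue_textured": 0.7},
}

# A region sequence, ordered as attention is assumed to visit the regions.
regions = ["blue_smooth", "blue_textured", "blue_textured"]
path, prob = viterbi(regions, states, start_p, trans_p, emit_p)
print(path, prob)
```

In this sketch the transition table is what encodes dependence between neighboring regions in the sequence; treating regions as independent would amount to ignoring it and labeling each region from its emission probabilities alone.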