Image description and annotation is an active research topic in content-based image retrieval. Exploiting human visual perception is a key approach to intelligent image feature extraction and representation. This paper proposes an image feature descriptor called the local structure co-occurrence pattern (LSCP). LSCP captures the overall visual perception of an image by building a local binary structure, and it is represented by a color-shape co-occurrence matrix that explores the relationship among multiple visual feature spaces according to the visual attention mechanism. As a result, LSCP not only describes low-level visual features, integrating texture, color, and shape, but also serves as a bridge to high-level semantic comprehension. Extensive image retrieval experiments on the benchmark datasets Corel-10,000, MIT VisTex, and INRIA Holidays demonstrate the usefulness, effectiveness, and robustness of the proposed LSCP.