The encoder-decoder framework has attracted great interest in image captioning. It focuses on the extraction of low-level image features and achieves good results, but performance can be improved further when high-level semantics are also considered. In this work, we propose a new image captioning model that incorporates high-level semantic features extracted by a revised Convolutional Neural Network (CNN). Both the low-level image features and the high-level semantic features are fed into Long Short-Term Memory networks (LSTMs) to generate natural sentence descriptions. Experiments on the Flickr8K and Flickr30K datasets show that our method outperforms most standard baseline networks for image captioning.
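The core idea above can be sketched as follows: concatenate a low-level visual feature vector with a high-level semantic (attribute-score) vector and feed the fused vector into an LSTM decoder. This is a minimal NumPy illustration, not the paper's implementation; all dimensions, weights, and the re-use of the fused features at every step are simplifying assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
D_IMG, D_SEM, D_HID, V = 8, 5, 16, 10

def lstm_step(x, h, c, W, U, b):
    """One LSTM step with gates stacked as [input, forget, output, cell]."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = (1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)),
               1 / (1 + np.exp(-o)))
    g = np.tanh(g)
    c = f * c + i * g        # update cell state
    h = o * np.tanh(c)       # emit hidden state
    return h, c

# Random vectors standing in for real CNN outputs.
img_feat = rng.standard_normal(D_IMG)      # low-level visual features
sem_feat = rng.random(D_SEM)               # high-level attribute scores in [0, 1]
x = np.concatenate([img_feat, sem_feat])   # fused decoder input

D_IN = D_IMG + D_SEM
W = rng.standard_normal((4 * D_HID, D_IN)) * 0.1
U = rng.standard_normal((4 * D_HID, D_HID)) * 0.1
b = np.zeros(4 * D_HID)
W_out = rng.standard_normal((V, D_HID)) * 0.1  # hidden state -> vocab logits

h, c = np.zeros(D_HID), np.zeros(D_HID)
tokens = []
for _ in range(4):                          # greedy decode a few steps
    h, c = lstm_step(x, h, c, W, U, b)
    tokens.append(int(np.argmax(W_out @ h)))
    # A real decoder would feed the embedding of the emitted word back in;
    # here the fused features are re-used purely for illustration.
print(tokens)
```

In a full model the greedy argmax would typically be replaced by beam search, and the decoder input at each step would be the embedding of the previously generated word.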