An image caption generation model with an adaptive attention mechanism is proposed to address the weakness of image description models that rely solely on local image features. Within an encoder-decoder framework, the encoder extracts local and global image features using the Inception V3 and VGG19 networks, respectively. Because the proposed adaptive attention mechanism automatically weighs the relative importance of local and global image information, the decoder can generate sentences that describe the image more intuitively and accurately. The model is trained and tested on the Microsoft COCO dataset. Experimental results show that, compared with a caption model based only on local features, the proposed method extracts richer and more complete information from the image and generates more accurate sentences.
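To make the decoding step concrete, the following is a minimal NumPy sketch of one way an adaptive attention mechanism can weigh local region features against a global image descriptor at each decoding step. The weight matrices (`W_l`, `W_g`, `W_h`, `w`) and all dimensions are hypothetical illustrations, not the paper's actual parameters or architecture.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(local_feats, global_feat, hidden, W_l, W_g, W_h, w):
    """Sketch of adaptive attention over k local regions plus one global feature.

    local_feats : (k, d) local region features (e.g. from Inception V3)
    global_feat : (d,)   global image feature (e.g. from VGG19, projected to d)
    hidden      : (h,)   current decoder hidden state
    W_l, W_g    : (a, d) projections for local / global features (hypothetical)
    W_h         : (a, h) projection for the hidden state (hypothetical)
    w           : (a,)   scoring vector (hypothetical)
    """
    k = local_feats.shape[0]
    scores = np.empty(k + 1)
    # score each local region against the decoder state
    for i in range(k):
        scores[i] = w @ np.tanh(W_l @ local_feats[i] + W_h @ hidden)
    # score the global feature the same way; its weight lets the model
    # "adapt" between local detail and global context
    scores[k] = w @ np.tanh(W_g @ global_feat + W_h @ hidden)
    alpha = softmax(scores)  # attention weights, sum to 1
    # context vector: weighted mix of local regions and the global descriptor
    context = alpha[:k] @ local_feats + alpha[k] * global_feat
    return context, alpha

# usage with random toy data
rng = np.random.default_rng(0)
k, d, h, a = 5, 8, 6, 4
ctx, alpha = adaptive_attention(
    rng.normal(size=(k, d)), rng.normal(size=d), rng.normal(size=h),
    rng.normal(size=(a, d)), rng.normal(size=(a, d)),
    rng.normal(size=(a, h)), rng.normal(size=a),
)
```

In this sketch the decoder would consume `context` when predicting the next word; a larger weight `alpha[k]` indicates the step leans on global information rather than any single local region.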