In long short-term memory (LSTM) neural networks, the input and output gates control the information flowing into and out of the memory cells. In sequence-to-sequence learning, each element of the input sequence is presented to the network only once: if the input gate is closed at a given step, the corresponding information is lost and never presented again, and the same problem applies to the output gate. The input and output gates therefore cannot fully fulfill their gating roles. An LSTM network with external memories is proposed, in which separate memories are attached to the input and output gates. Information blocked by the input gate is preserved in the input memory, so that the cell can read it back when necessary; likewise, information blocked by the output gate is preserved in the output memory and flows out to the hidden units of the network at an appropriate later time. In addition, a dynamic attention model is proposed that takes the attention history into account, providing guidance when predicting the attention weights at each step. The proposed model adopts an attention-based encoder–decoder architecture to generate image captions. Experiments were conducted on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO, to demonstrate the effectiveness of the proposed approach: captions generated by the proposed method are longer and more informative than those produced by the original LSTM network.
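The external-memory idea described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the update rules for the external memories (`m_in`, `m_out`), the weight shapes, and all parameter names are assumptions chosen only to show how information blocked by a gate could be preserved and read back later.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H, X = 4, 3  # hidden and input sizes (illustrative)

# Randomly initialized parameters; a real model would learn these.
Wi, Wf, Wo, Wc = (rng.normal(0, 0.1, size=(H, X + H)) for _ in range(4))
bi = bf = bo = bc = np.zeros(H)

def lstm_em_step(x, h, c, m_in, m_out):
    """One step of an LSTM with external memories (sketch).

    m_in holds input information blocked by the input gate;
    m_out holds cell output blocked by the output gate.
    """
    z = np.concatenate([x, h])
    i = sigmoid(Wi @ z + bi)          # input gate
    f = sigmoid(Wf @ z + bf)          # forget gate
    o = sigmoid(Wo @ z + bo)          # output gate
    g = np.tanh(Wc @ z + bc)          # candidate cell input

    # Input blocked by (1 - i) is preserved in the external input
    # memory; the cell reads the memory back through the input gate.
    c = f * c + i * g + i * m_in
    m_in = (1.0 - i) * (g + m_in)

    # Output blocked by (1 - o) is preserved in the external output
    # memory and can flow to the hidden state at a later step.
    out = np.tanh(c)
    h = o * (out + m_out)
    m_out = (1.0 - o) * (out + m_out)
    return h, c, m_in, m_out
```

Note the symmetry: each gate splits its signal into a passed part and a blocked part, and the blocked part accumulates in the corresponding external memory instead of being discarded, so nothing is irrecoverably lost when a gate closes.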