Recognizing text in images captured in the wild is a fundamental preprocessing task for many computer vision and machine learning applications, and it has attracted significant attention in recent years. This paper proposes an end-to-end trainable deep review neural network for scene text recognition that combines four stages: feature extraction, feature reviewing, feature attention, and sequence recognition. Our model generates the predicted text directly, without any segmentation or grouping algorithm. Because the attention model used in the feature attention stage lacks the ability to model global context, the feature reviewing stage applies a review network to extract the global context of the sequence data. We conduct rigorous experiments on several standard benchmarks, including the IIIT5K, SVT, ICDAR03, and ICDAR13 datasets. Experimental results show that our model performs comparably to or better than state-of-the-art techniques.
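The interplay between the feature reviewing and feature attention stages can be illustrated with a minimal sketch. This is not the authors' implementation. The helper names (`softmax`, `attend`, `review`), the mean-pooled initial state, and the parameter-free `tanh` update are hypothetical; the `tanh` update stands in for the learned recurrent cell of a real review network. Each review step attends over the entire feature sequence, so every resulting "thought vector" summarizes global context. A decoder can then attend over these thought vectors instead of the raw per-column features.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    # Dot-product attention: weight each vector by its similarity to the query,
    # then return the weighted sum (the context vector).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def review(features: List[Vector], num_steps: int) -> List[Vector]:
    # Hypothetical parameter-free review steps. Each step attends over the
    # whole feature sequence and updates a hidden state with tanh, standing in
    # for the learned recurrent cell of a trained review network.
    dim = len(features[0])
    # Assumption: initialize the hidden state as the mean of all features.
    hidden = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    thoughts: List[Vector] = []
    for _ in range(num_steps):
        context = attend(hidden, features)
        hidden = [math.tanh(c + h) for c, h in zip(context, hidden)]
        thoughts.append(hidden)
    return thoughts


# Toy feature sequence: T = 3 columns, each a d = 2 vector (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thoughts = review(features, num_steps=4)

# A decoder state (also made up) attends over the thought vectors.
glimpse = attend([0.5, -0.5], thoughts)
print(len(thoughts), glimpse)
```

Separating review from decoding in this way lets the number of thought vectors (`num_steps`) be chosen independently of the length of the feature sequence.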