In this paper we study real-time augmentation: a method of increasing the variability of a training dataset during the learning process. We consider the most common label-preserving deformations, which are useful in many practical tasks. Because existing augmentation tools have limitations, such as increased learning time or dependence on a specific platform, we developed our own real-time augmentation system. Experiments on the MNIST and SVHN datasets demonstrate the effectiveness of the proposed approach: the quality of the trained models improves, while learning time remains the same as without augmentation.
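To make the idea of real-time augmentation concrete, the following is a minimal sketch, not the authors' system. It assumes a random integer translation (padded with zeros) as the example label-preserving deformation, and the names `random_shift` and `augmented_batches` are hypothetical. Each sample is deformed at the moment its batch is requested, so every epoch sees a freshly drawn variant rather than a fixed, precomputed augmented copy. The sketch runs sequentially and does not model how the authors keep learning time unchanged.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def random_shift(img: Image, max_shift: int, rng: random.Random) -> Image:
    """Translate the image by a random offset in [-max_shift, max_shift] on each
    axis, filling uncovered pixels with zeros. Small shifts are assumed here to be
    label-preserving (e.g. for digit images)."""
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands at (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def augmented_batches(
    images: Sequence[Image],
    labels: Sequence[int],
    batch_size: int,
    max_shift: int,
    seed: int = 0,
) -> Iterator[Tuple[List[Image], List[int]]]:
    """Yield shuffled batches for one epoch, deforming each sample on the fly.
    Labels pass through unchanged because the deformation preserves them."""
    rng = random.Random(seed)
    order = list(range(len(images)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [random_shift(images[i], max_shift, rng) for i in idx]
        yield batch, [labels[i] for i in idx]
```

Generating deformations lazily inside the batch loop, instead of enlarging the stored dataset, is what distinguishes this from offline augmentation; with `max_shift=0` the generator simply returns the original samples in shuffled order.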
This paper addresses one of the fundamental problems of machine learning: acquiring training data. Obtaining enough natural training data is difficult and expensive. In recent years, the use of synthetic images has become increasingly advantageous, as it saves human time and provides a huge number of images that would otherwise be difficult to obtain. However, for successful learning on an artificial dataset, one should try to reduce the gap between the natural and synthetic data distributions. In this paper we describe an algorithm for creating artificial training datasets for OCR systems, using the Russian passport as a case study.