The detection of the characters in the natural scene is susceptible to factors such as complex background, variable viewing angle and diverse forms of language, which leads to poor detection results. Aiming at these problems, a new text detection method was proposed, which consisted of two main stages, candidate region extraction and text region detection. At first stage, the method used multiple scale transformations of original image and multiple thresholds of maximally stable extremal regions (MSER) to detect the text regions which could detect character regions comprehensively. At second stage, obtained SWT maps by using the stroke width transform (SWT) algorithm to compute the candidate regions, then using cascaded classifiers to propose non-text regions. The proposed method was evaluated on the standard benchmark datasets of ICDAR2011 and the datasets that we made our own data sets. The experiment results showed that the proposed method have greatly improved that compared to other text detection methods.