Textual information contained in images is a valuable source of high-level semantics for image indexing and retrieval.
This paper proposes a new method to detect and segment text in complex images. First, a density-based clustering
method, borrowed from the data mining field, is employed to discover candidate text regions. It computes
the density distribution over the whole image and groups spatially connected pixels of similar color or grayscale into one region.
The clustered regions are treated as candidate text regions. Simple heuristics are then applied to discard obvious
non-text regions from the candidates. Because a few non-text regions still remain, a texture-based
method is used to select the text regions from the filtered candidates. Considering the time complexity of
density computation in the clustering step, an approximate algorithm is designed to improve efficiency. Experimental
results show that the method is robust to variations in text font, orientation, language, and size.
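
To make the clustering step concrete, the following is a minimal sketch of density-based region growing on a grayscale image. It is not the authors' algorithm: the abstract gives neither the density definition nor the approximate variant, so the sketch assumes a DBSCAN-style formulation in which a pixel's density is the number of pixels in a square window whose gray level lies within a tolerance of its own. The names density and cluster_regions and the parameters radius, eps, and min_pts are hypothetical.

```python
from collections import deque


def density(img, y, x, radius, eps):
    """Count window pixels (including the pixel itself) whose gray level is within eps of img[y][x]."""
    h, w = len(img), len(img[0])
    count = 0
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            if abs(img[ny][nx] - img[y][x]) <= eps:
                count += 1
    return count


def cluster_regions(img, radius=1, eps=10, min_pts=4):
    """Group spatially connected pixels of similar gray level.

    Returns (labels, n_regions); label 0 marks pixels left unassigned.
    All parameter values are assumed, not taken from the paper.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            # Skip pixels already assigned or too sparse to seed a region.
            if labels[sy][sx] or density(img, sy, sx, radius, eps) < min_pts:
                continue
            n_regions += 1
            labels[sy][sx] = n_regions
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                if density(img, y, x, radius, eps) < min_pts:
                    continue  # border pixel: belongs to the region but does not expand it
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        if labels[ny][nx] == 0 and abs(img[ny][nx] - img[y][x]) <= eps:
                            labels[ny][nx] = n_regions
                            queue.append((ny, nx))
    return labels, n_regions


if __name__ == "__main__":
    image = [
        [0, 0, 0, 200, 200],
        [0, 0, 0, 200, 200],
        [0, 0, 0, 200, 200],
    ]
    labels, n = cluster_regions(image)
    print(n)
    for row in labels:
        print(row)
```

In this sketch every pixel's density is recomputed from its full window, which costs on the order of (2 * radius + 1)^2 operations per pixel. That cost is the kind of overhead an approximate density computation would aim to reduce. The heuristic and texture-based filtering stages would then operate on the labeled regions.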