This paper addresses text line extraction in free-style documents such as business cards, envelopes, and posters. In a
free-style document, global properties such as character size and line direction can hardly be determined, which exposes a
serious limitation of traditional layout analysis.
The 'line' is the most prominent and the highest-level structure in our bottom-up method. First, we apply a novel intensity
function based on gradient information to locate text areas, where the gradients within a window have large magnitudes and
varied directions, and split such areas into text pieces. We then build a probability model of lines consisting of text pieces
from statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm
whose cost function is based on the probability model.
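
As an illustration of the idea behind the gradient-based intensity function, the following minimal sketch scores a window by combining mean gradient magnitude with the spread of gradient directions. It is not the intensity function defined in this paper: the names gradients, text_intensity and make_image, the central-difference gradient, the doubled-angle measure of direction spread, and the window size are all assumptions chosen for illustration.

```python
import math


def gradients(img):
    """Central-difference gradients (gx, gy) of a 2D list of grey levels.

    Border pixels are left at zero (an assumption made for brevity).
    """
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def text_intensity(img, cy, cx, half=2):
    """Hypothetical score for the square window centred at (cy, cx).

    score = (mean gradient magnitude) * (direction spread), where the spread
    lies in [0, 1] and is computed on doubled angles so that the two opposite
    edges of a stroke count as the same orientation.
    """
    gx, gy = gradients(img)
    h, w = len(img), len(img[0])
    total_mag = 0.0
    sum_cos = 0.0
    sum_sin = 0.0
    count = 0
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            count += 1
            mag = math.hypot(gx[y][x], gy[y][x])
            if mag == 0.0:
                continue
            theta = math.atan2(gy[y][x], gx[y][x])
            total_mag += mag
            sum_cos += mag * math.cos(2.0 * theta)
            sum_sin += mag * math.sin(2.0 * theta)
    if total_mag == 0.0:
        return 0.0
    # Resultant length near 1 means one dominant orientation (low spread).
    spread = 1.0 - math.hypot(sum_cos, sum_sin) / total_mag
    return (total_mag / count) * spread


def make_image(size, fill):
    """Build a size x size test image from a fill(y, x) function."""
    return [[fill(y, x) for x in range(size)] for y in range(size)]


# Three toy inputs: a flat region, a single straight edge, and a small
# bright square whose edges run in both directions.
flat = make_image(9, lambda y, x: 100)
edge = make_image(9, lambda y, x: 255 if x >= 4 else 0)
blob = make_image(9, lambda y, x: 255 if 3 <= x <= 5 and 3 <= y <= 5 else 0)

for name, img in (("flat", flat), ("edge", edge), ("blob", blob)):
    print(name, round(text_intensity(img, 4, 4, half=3), 3))
```

In this sketch the flat region and the single straight edge both score essentially zero, the first because there is no gradient and the second because all gradients share one orientation, while the square scores high because strong gradients occur in several directions. Thresholding such a score map would give candidate text areas; splitting them into text pieces and grouping the pieces into lines by simulated annealing are separate steps not covered here.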