Textual structures on a document image, such as characters, words, text lines, and paragraphs, are usually laid out in a highly structured manner, with preferred spatial relations. These spatial relations are rarely deterministic; instead, they describe correlations and likelihoods. Any realistic document layout analysis algorithm should therefore exploit this kind of knowledge to optimize its performance. In this paper, we first describe a method for automatically generating a large amount of almost 100% correct ground-truth data for document layout analysis. The bounding boxes of the characters, words, text lines, and paragraphs are expressed in a hierarchy. Then, based on this layout ground truth, we build statistical models of the layout structures of words, text lines, and paragraphs on document images. Finally, we describe an algorithm that uses these statistical models to extract the text words on document images. The performance of the algorithm is evaluated and reported.
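
To make the hierarchical ground-truth representation concrete, the following Python fragment is a minimal sketch of one way such a hierarchy could be stored. It is an illustration only: the names `Box`, `Node`, and `make_parent`, the pixel-coordinate convention, and the rule that a parent box is the union of its children's boxes are all assumptions made here, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box; pixel coordinates are an assumed convention."""
    left: int
    top: int
    right: int
    bottom: int

    def union(self, other: "Box") -> "Box":
        # Smallest box that encloses both boxes.
        return Box(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


@dataclass
class Node:
    """One level of the hierarchy: character, word, line, or paragraph."""
    kind: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def make_parent(kind: str, children: List[Node]) -> Node:
    """Build a parent node whose box encloses all child boxes (hypothetical helper)."""
    if not children:
        raise ValueError("a parent node needs at least one child")
    box = children[0].box
    for child in children[1:]:
        box = box.union(child.box)
    return Node(kind, box, list(children))


# Two character boxes grouped into a word, then the word into a text line.
chars = [
    Node("character", Box(10, 20, 18, 32)),
    Node("character", Box(20, 18, 27, 32)),
]
word = make_parent("word", chars)
line = make_parent("line", [word])
```

With a structure of this kind, spatial measurements such as gaps between neighbouring word boxes can be read off the sibling nodes at each level, which is the sort of quantity a statistical layout model could be fitted to.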