We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.