Conventional block-based classification labels each block of an image individually, disregarding any adjacency information. When analyzing a small region of an image, it is sometimes difficult even for a person to tell what the image is about. Hence, the drawback of using visual features without context is recognized from the outset. This paper studies a context-dependent classifier based on a two-dimensional Hidden Markov Model. In particular, we explore how the balance between structural information and content description affects precision in a semantic feature extraction scenario. We train a set of semantic classes using the development video archive annotated by the TRECVid 2005 participants. To extract semantic features, the classes with maximum a posteriori probability are searched jointly for all blocks. Preliminary results indicate that the performance of the system can be increased by varying the block size.