Vision is a part of a larger information system that converts visual information into knowledge structures. These structures drive vision process, resolve ambiguity and uncertainty via feedback projections, and provide image understanding that is an interpretation of visual information in terms of such knowledge models. The ability of human brain to emulate knowledge structures in the form of networks-symbolic models is found. And that means an important shift of paradigm in our knowledge about brain from neural networks to "cortical software". Symbols, predicates and grammars naturally emerge in such active multilevel hierarchical networks, and logic is simply a way of restructuring such models. Brain analyzes an image as a graph-type decision structure created via multilevel hierarchical compression of visual information. Mid-level vision processes like clustering, perceptual grouping, separation of figure from ground, are special kinds of graph/network transformations. They convert low-level image structure into the set of more abstract ones, which represent objects and visual scene, making them easy for analysis by higher-level knowledge structures. Higher-level vision phenomena are results of such analysis. Composition of network-symbolic models works similar to frames and agents, combines learning, classification, analogy together with higher-level model-based reasoning into a single framework. Such models do not require supercomputers. Based on such principles, and using methods of Computational intelligence, an Image Understanding system can convert images into the network-symbolic knowledge models, and effectively resolve uncertainty and ambiguity, providing unifying representation for perception and cognition. That allows creating new intelligent computer vision systems for robotic and defense industries.