Vision is only one part of a larger system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, that is, an interpretation of visual information in terms of these knowledge models. This mechanism provides reliable recognition even if the target is occluded or cannot be recognized on its own. It is hard to split the entire system apart, and reliable solutions to target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexity by using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. A biologically inspired Network-Symbolic representation, in which both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. The logic of visual scenes can be captured in Network-Symbolic models and used for disambiguation of visual information. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps build consistent, unambiguous models. Such Image/Video Understanding Systems will be able to reliably recognize targets in real-world conditions.