Modern target recognition systems lack human-like abilities to understand a visual scene and to detect, unambiguously identify and recognize objects. As a result, they fail when a target does not display distinctive, high-contrast features that allow it to be unambiguously separated from the background and identified by those features. In this respect they resemble the visual systems of primitive animals such as frogs, which can separate and recognize only moving objects. Human vision, by contrast, unambiguously separates any object from its background. It combines a wide but rough peripheral system and a narrow but precise foveal system with visual intelligence that uses both scene and object context to resolve ambiguity and uncertainty in the visual information. Perceptual grouping, one of the most important processes in human vision, binds visual information into meaningful patterns and structures. Unlike traditional computer vision models, biologically inspired Network-Symbolic models convert image information into an "understandable" Network-Symbolic format similar to relational knowledge models. In the Network-Symbolic system, the interaction between peripheral and foveal vision is mirrored by the interaction between the Visual and Object Buffers and the top-level Visual Intelligence system. This interaction recursively identifies regions of interest in the visual scene from rough context and analyzes them in the Object Buffer, so that the object is precisely and unambiguously separated from background and clutter and the target can then be recognized.
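The abstract does not say how the Visual Buffer, Object Buffer and Visual Intelligence levels are implemented. Purely as a loose illustration of the coarse-to-fine idea (a rough peripheral pass that picks regions of interest, followed by a precise foveal pass that separates an object from its background), here is a minimal Python sketch. Every name in it (`peripheral_map`, `next_roi`, `foveal_mask`, `group`, `separate_objects`), the block size `k`, the tolerance `tol`, the median-background rule and the use of 4-connected components as a stand-in for perceptual grouping are hypothetical choices made for this example. None of it is the Network-Symbolic method itself.

```python
# Hypothetical coarse-to-fine sketch; NOT the Network-Symbolic implementation.
from collections import deque
from statistics import median
from typing import List, Optional, Set, Tuple

Image = List[List[float]]
Pixel = Tuple[int, int]


def peripheral_map(img: Image, k: int) -> List[List[float]]:
    """Rough 'peripheral' view: mean intensity of each k x k block."""
    h, w = len(img), len(img[0])
    coarse = []
    for by in range(0, h, k):
        row = []
        for bx in range(0, w, k):
            vals = [img[y][x] for y in range(by, min(by + k, h))
                    for x in range(bx, min(bx + k, w))]
            row.append(sum(vals) / len(vals))
        coarse.append(row)
    return coarse


def next_roi(coarse: List[List[float]], visited: Set[Pixel]) -> Optional[Pixel]:
    """Pick the unvisited coarse cell that deviates most from the scene mean."""
    cells = [(r, c) for r in range(len(coarse)) for c in range(len(coarse[0]))]
    scene_mean = sum(coarse[r][c] for r, c in cells) / len(cells)
    candidates = [rc for rc in cells if rc not in visited]
    if not candidates:
        return None
    return max(candidates, key=lambda rc: abs(coarse[rc[0]][rc[1]] - scene_mean))


def foveal_mask(img: Image, roi: Pixel, k: int, tol: float,
                margin: int = 1) -> Set[Pixel]:
    """Precise 'foveal' view: full-resolution pixels in the ROI plus a context
    margin that differ from the local background (assumed: window median)."""
    h, w = len(img), len(img[0])
    r, c = roi
    ys = range(max(0, (r - margin) * k), min(h, (r + 1 + margin) * k))
    xs = range(max(0, (c - margin) * k), min(w, (c + 1 + margin) * k))
    background = median(img[y][x] for y in ys for x in xs)
    return {(y, x) for y in ys for x in xs if abs(img[y][x] - background) > tol}


def group(mask: Set[Pixel]) -> List[Set[Pixel]]:
    """Stand-in for perceptual grouping: 4-connected components of the mask."""
    remaining, groups = set(mask), []
    while remaining:
        seed = remaining.pop()
        comp, frontier = {seed}, deque([seed])
        while frontier:
            y, x = frontier.popleft()
            for n in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if n in remaining:
                    remaining.remove(n)
                    comp.add(n)
                    frontier.append(n)
        groups.append(comp)
    return groups


def separate_objects(img: Image, k: int = 4, tol: float = 0.5,
                     max_fixations: int = 4) -> List[Tuple[Pixel, Set[Pixel]]]:
    """Alternate rough peripheral ROI selection with precise foveal separation."""
    coarse = peripheral_map(img, k)
    visited: Set[Pixel] = set()
    claimed: Set[Pixel] = set()  # pixels already bound to a separated object
    objects = []
    for _ in range(max_fixations):
        roi = next_roi(coarse, visited)
        if roi is None:
            break
        visited.add(roi)
        mask = foveal_mask(img, roi, k, tol) - claimed
        groups = group(mask)
        if groups:
            obj = max(groups, key=len)  # keep the largest coherent group
            claimed |= obj
            objects.append((roi, obj))
    return objects
```

For example, on an 8 x 8 scene of zeros containing a 2 x 2 patch of ones at rows 1-2, columns 1-2, the first fixation lands on coarse cell `(0, 0)` and the foveal pass returns exactly those four pixels; later fixations find nothing new because already-claimed pixels are excluded. Excluding claimed pixels is a crude analogue of moving attention on to the next region rather than re-analysing the same object.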