The state of the art in avoiding obstacles using only vision, without sonar or laser rangefinders, is roughly half an hour between collisions. After reviewing the design and failure modes of several current systems, we compare psychology's understanding of human perception to current computer and robot perception. There are fundamental differences, and they lead to fundamental limitations in current computer perception. The key difference is that robot software is built out of 'black boxes' that have very restricted interactions with each other; in contrast, the human perceptual system is much more integrated. We claim that a robot that performs any significant task, and does it as well as a person, cannot be built out of black boxes. In fact, such a system would probably be too interconnected to be designed by hand; instead, tools will be needed to create such designs. To illustrate this idea, we propose a system that computes visual depth cues at each pixel, together with depth cues from neighboring pixels and previous depth estimates. Genetic Programming is used to combine these cues into a new depth estimate. The system learns by predicting both sonar readings and the next image. The design of the system is described, and its design decisions are justified.
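As a rough illustration of the proposed approach, the sketch below evolves small expression trees that combine per-pixel depth cues into a depth estimate, scored against a ground-truth depth standing in for the sonar readings. The cue names (`texture`, `neighbor_avg`, `prev_estimate`), the toy data, and the simplified evolutionary loop (mutation only, no crossover) are illustrative assumptions, not the paper's actual implementation.

```python
import operator
import random

random.seed(0)

# Hypothetical per-pixel depth cues; names are assumptions for illustration.
TERMINALS = ["texture", "neighbor_avg", "prev_estimate"]
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def random_tree(depth=3):
    """Grow a random expression tree over the depth cues."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    fn = random.choice(list(FUNCTIONS))
    return (fn, random_tree(depth - 1), random_tree(depth - 1))


def evaluate(tree, cues):
    """Evaluate a tree on one pixel's cue values."""
    if isinstance(tree, str):
        return cues[tree]
    fn, left, right = tree
    return FUNCTIONS[fn](evaluate(left, cues), evaluate(right, cues))


def fitness(tree, data):
    """Mean squared error against the 'sonar' depth (lower is better)."""
    return sum((evaluate(tree, cues) - depth) ** 2
               for cues, depth in data) / len(data)


def mutate(tree):
    """Replace a random subtree with a freshly grown one."""
    if isinstance(tree, str) or random.random() < 0.3:
        return random_tree(2)
    fn, left, right = tree
    return (fn, mutate(left), mutate(right))


# Toy training set: the target depth here is simply texture + prev_estimate,
# so a correct combiner is representable as a small tree.
data = [({"texture": t, "neighbor_avg": t * 0.5, "prev_estimate": p}, t + p)
        for t in (0.2, 0.5, 0.9) for p in (0.1, 0.4)]

# Minimal evolutionary loop: keep the fittest trees, refill with mutants.
pop = [random_tree() for _ in range(60)]
for gen in range(20):
    pop.sort(key=lambda tr: fitness(tr, data))
    survivors = pop[:20]
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

best = min(pop, key=lambda tr: fitness(tr, data))
```

In the proposed system the fitness signal would come from predicting actual sonar readings and the next camera image rather than a fixed target, but the cue-combination step has this general shape.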