From Event: SPIE Commercial + Scientific Sensing and Imaging, 2018
In the human visual system, visible objects are recognized by their features, which can be classified into local features based on simple components (e.g., line segments, angles, color) and global features based on the whole object (e.g., connectivity, number of holes). Over the past half century, anatomical, physiological, behavioral and computational studies of visual systems have led to a generally accepted model of vision, which begins by processing local features in the early stages of the visual pathways and then integrates them into global features in the later stages. However, this popular local-to-global model has been challenged by a set of experiments showing that the visual systems of humans, non-human primates and honey bees are more sensitive to global features than to local features. These “global-first” studies have further motivated the development of new paradigms and approaches for understanding human vision and building new vision models. In this study, we began a new series of experiments examining how two representative pre-trained convolutional neural networks (CNNs), AlexNet and VGG-19, process local and global features. The CNNs were trained to classify geometric shapes into two categories based on either local features (e.g., triangle, square and circle) or a global feature (e.g., having a hole). In contrast to biological visual systems, the CNNs were more effective at classifying images based on local features than on the global feature. We further showed that adding distractors greatly lowered the performance of the CNNs, which again differs from biological visual systems. Ongoing studies will extend these analyses to other geometrical invariants and to the internal representations of the CNNs. The overarching goal is to use powerful CNNs as a tool to gain insight into biological visual systems, including those of humans and non-human primates.
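To make the local/global distinction concrete, the sketch below computes one global feature named in the abstract, the number of holes, for a small binary image. It is an illustration only and not the authors' method or stimulus set: the function name `count_holes` and the toy shapes `solid_square` and `ring` are hypothetical, and a hole is assumed to be a 4-connected background region that never touches the image border.

```python
from collections import deque


def count_holes(grid):
    """Count enclosed background regions (holes) in a binary image.

    Assumption: 1 = shape pixel, 0 = background, and a hole is a
    4-connected background region that does not reach the border.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def region_touches_border(r0, c0):
        # Breadth-first flood fill over one background region.
        touches = False
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and grid[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return touches

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                if not region_touches_border(r, c):
                    holes += 1
    return holes


# Two toy shapes with the same local structure (straight edges, right
# angles) that differ only in the global "has a hole" property.
solid_square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(count_holes(solid_square), count_holes(ring))
```

The two shapes share the same edges and corners, so no single local patch along the outline distinguishes them; the hole count depends on the connectivity of the whole image, which is what makes it a global feature in the sense used above.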
Yufeng Zheng, Jun Huang, Tianwen Chen, Yang Ou, and Wu Zhou, "Processing global and local features in convolutional neural network (CNN) and primate visual systems," Proc. SPIE 10668, Mobile Multimedia/Image Processing, Security, and Applications 2018, 1066809 (Presented at SPIE Commercial + Scientific Sensing and Imaging: April 16, 2018; Published: 14 May 2018); https://doi.org/10.1117/12.2305421.