In this study, we investigated how four representative convolutional neural networks (CNNs), namely AlexNet, VGG, ResNet, and Inception, classify shapes based on local and global features. Local features are based on simple components, such as the orientation of a line segment, whereas global features are based on the whole object, such as whether an object has a hole. For example, solid triangles and solid squares are differentiated by local features, whereas solid circles and rings are differentiated by a global feature. We performed two sets of experiments. In the first experiment, we examined how the four CNNs pre-trained on ImageNet (with transfer learning) learned to differentiate regular shapes (equilateral triangles, squares, circles, and rings). The pre-trained CNNs learned faster in the tasks discriminating the local features than in the tasks discriminating the global feature. However, discrimination of the global feature learned on regular shapes generalized better to irregular shapes than discrimination of the local features did. In the second experiment, the CNNs were trained from scratch (with random weight initialization) to discriminate local and global features in regular and irregular shapes. Unlike in the transfer-learning setting, the CNNs learned to discriminate the global feature faster than the local features. As in the transfer-learning setting, the CNNs generalized well to discriminating the global feature in irregular shapes, but generalized poorly to discriminating the local features in irregular shapes. The overarching goal of this research is to create a paradigm and benchmark for directly comparing how CNNs and primate visual systems process geometrical invariants.
In contrast to the ImageNet approach, which uses natural images to train CNNs, we employed the “ShapeNet” approach, which uses geometrical shapes with well-defined properties. The ShapeNet approach will not only help elucidate the strengths and limitations of CNN computation, but also provide insights into visual information processing in primates.
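To make the local/global distinction concrete, the following is a minimal sketch, not the stimulus code used in this study. It assumes small binary images and uses hypothetical helper names (`disk`, `ring`, `square`, `count_holes`). It illustrates that "has a hole" is a property of the whole figure: it can be decided by checking whether any background region is cut off from the image border, regardless of the local contour.

```python
from collections import deque

def disk(size, radius):
    """Binary image (list of rows) of a solid circle centred in a size x size grid."""
    c = (size - 1) / 2
    return [[1 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0
             for x in range(size)] for y in range(size)]

def ring(size, outer, inner):
    """Binary image of a ring: a disk of radius `outer` with a hole of radius `inner`."""
    c = (size - 1) / 2
    return [[1 if inner ** 2 < (x - c) ** 2 + (y - c) ** 2 <= outer ** 2 else 0
             for x in range(size)] for y in range(size)]

def square(size, half):
    """Binary image of a solid, axis-aligned square with half-width `half`."""
    c = (size - 1) / 2
    return [[1 if abs(x - c) <= half and abs(y - c) <= half else 0
             for x in range(size)] for y in range(size)]

def count_holes(img):
    """Count background regions (4-connected) that do not touch the image border."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 0 or seen[sy][sx]:
                continue
            # Flood-fill one background region and record whether it reaches the border.
            touches_border = False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] == 0):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes

# Solid circle vs. ring differ in the global feature (hole count);
# solid circle vs. solid square do not, and differ only in local contour.
shapes = {"circle": disk(33, 12), "ring": ring(33, 12, 6), "square": square(33, 10)}
hole_counts = {name: count_holes(img) for name, img in shapes.items()}
```

In this sketch, the solid circle and the solid square share the same hole count even though their contours differ locally, while the ring differs from both in hole count. Sizes and radii here are arbitrary illustrative values.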