From Event: SPIE Optical Engineering + Applications, 2017
We investigate low-complexity convolutional neural networks (CNNs) for object detection for embedded vision applications. It is well-known that consolidation of an embedded system for CNN-based object detection is more challenging due to computation and memory requirement comparing with problems like image classification. To achieve these requirements, we design and develop an end-to-end TensorFlow (TF)-based fully-convolutional deep neural network for generic object detection task inspired by one of the fastest framework, YOLO.1 The proposed network predicts the localization of every object by regressing the coordinates of the corresponding bounding box as in YOLO. Hence, the network is able to detect any objects without any limitations in the size of the objects. However, unlike YOLO, all the layers in the proposed network is fully-convolutional. Thus, it is able to take input images of any size. We pick face detection as an use case. We evaluate the proposed model for face detection on FDDB dataset and Widerface dataset. As another use case of generic object detection, we evaluate its performance on PASCAL VOC dataset. The experimental results demonstrate that the proposed network can predict object instances of different sizes and poses in a single frame. Moreover, the results show that the proposed method achieves comparative accuracy comparing with the state-of-the-art CNN-based object detection methods while reducing the model size by 3× and memory-BW by 3 − 4× comparing with one of the best real-time CNN-based object detectors, YOLO. Our 8-bit fixed-point TF-model provides additional 4× memory reduction while keeping the accuracy nearly as good as the floating-point model. Moreover, the fixed- point model is capable of achieving 20× faster inference speed comparing with the floating-point model. Thus, the proposed method is promising for embedded implementations.
Subarna Tripathi, Byeongkeun Kang, Gokce Dane, and Truong Nguyen, "Low-complexity object detection with deep convolutional neural network for embedded systems," Proc. SPIE 10396, Applications of Digital Image Processing XL, 103961M (Presented at SPIE Optical Engineering + Applications: August 10, 2017; Published: 19 September 2017); https://doi.org/10.1117/12.2275512.
Conference Presentations are recordings of oral presentations given at SPIE conferences and published as part of the conference proceedings. They include the speaker's narration along with a video recording of the presentation slides and animations. Many conference presentations also include full-text papers. Search and browse our growing collection of more than 14,000 conference presentations, including many plenary and keynote presentations.