Implementations of the convolution operation in neural networks are usually based on the convolution-to-GeMM (General Matrix Multiplication) transformation. However, this transformation requires a large intermediate buffer (called im2col or im2row), and its initialization is both memory- and time-consuming. To overcome this problem, one may use the Indirect Convolution Algorithm, which replaces the im2row buffer with a much smaller buffer of pointers, called the indirection buffer. However, it limits the choice of multiplication micro-kernel, making matrix multiplication slightly less efficient than in the classical GeMM algorithm. To overcome this limitation, we propose the Almost Indirect Convolution Algorithm, which initializes a small, specifically ordered block of values used in matrix multiplication via the indirection buffer, in the same way the GeMM algorithm initializes such a block from the im2row buffer. Our approach combines the computational efficiency and flexibility in the shape of GeMM micro-kernels with the small memory footprint of the Indirect Convolution Algorithm. Experiments with convolutions of 8-bit matrices on ARM processors show that our convolution runs 14-24% faster than the Indirect Convolution Algorithm for a small number of channels and 10-20% faster than the classical GeMM-based one, making it well suited for computing inference of 8-bit quantized networks on mobile devices.
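The contrast between the two buffer strategies can be illustrated with a minimal sketch. This is not the paper's 8-bit ARM implementation with tuned micro-kernels; it is a simplified 1-D NumPy analogue (all function names here are hypothetical) showing how the classical approach materializes the full im2row buffer while the indirect approach stores only offsets and gathers each block on the fly:

```python
import numpy as np

def im2row(x, k):
    """Classical approach: materialize the full im2row buffer,
    one k-element row per output position."""
    n = len(x) - k + 1
    return np.stack([x[i:i + k] for i in range(n)])  # shape (n, k)

def conv_via_im2row(x, w):
    # GeMM-based convolution: big intermediate buffer, then one matrix product.
    return im2row(x, len(w)) @ w

def conv_via_indirection(x, w):
    # Indirect-style convolution: keep only start offsets (the analogue of
    # the indirection buffer of pointers) and gather each input block
    # during the multiplication itself, never storing the full im2row buffer.
    k = len(w)
    offsets = np.arange(len(x) - k + 1)  # one offset per output position
    out = np.empty(len(offsets), dtype=x.dtype)
    for j, off in enumerate(offsets):
        out[j] = x[off:off + k] @ w      # gather + dot product per row
    return out
```

Both routines compute the same correlation; the difference is that the first allocates an `(n, k)` buffer while the second stores only `n` offsets, which is the memory saving the Indirect Convolution Algorithm exploits.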
In this work we apply well-known methods of non-adaptive interpolation (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and sampling (point sampling, supersampling, mip-map pre-filtering, rip-map pre-filtering, and FAST) to the problem of projective image transformation. We compare their computational complexity, describe their artifacts, and then experimentally measure their quality and running time on a mobile processor with ARM architecture. These methods were widely developed in the 1990s and early 2000s but have not been an area of active research in recent years due to reduced demand for computationally efficient algorithms. However, real-time mobile recognition systems, which attract more and more attention, not only require fast projective transform methods but also demand high-quality images without artifacts. Accordingly, in this work we identify methods appropriate for such systems that avoid artifacts while preserving low computational complexity. Based on our experimental results, these are bilinear interpolation combined with either mip-map pre-filtering or FAST sampling, though the choice could be modified for specific use cases.
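Of the interpolation methods compared, bilinear interpolation is the one the abstract singles out. As a reference point, here is a minimal sketch of bilinear sampling at a fractional coordinate (the function name and the border-clamping convention are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2-D image at fractional coordinates (x, y) by
    bilinear interpolation; img is indexed as img[row, col] = img[y, x].
    Coordinates are clamped to the image border (one common convention)."""
    h, w = img.shape
    # Integer corner of the 2x2 neighborhood, clamped so x0+1, y0+1 are valid.
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    fx, fy = x - x0, y - y0                      # fractional offsets in [0, 1]
    # Blend horizontally along the top and bottom rows, then vertically.
    top = (1 - fx) * img[y0, x0]     + fx * img[y0, x0 + 1]
    bot = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot
```

In a projective transform, each destination pixel is mapped through the inverse homography to a fractional source coordinate and sampled this way; combining it with mip-map pre-filtering (as the abstract recommends) additionally selects a pre-filtered image level to suppress aliasing under minification.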