Paper
21 December 2021 A comparative study of recently deep learning optimizers
Yan Liu, Maojun Zhang, Zhiwei Zhong, Xiangrong Zeng, Xin Long
Proceedings Volume 12156, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2021); 121560F (2021) https://doi.org/10.1117/12.2626430
Event: International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2021), 2021, Sanya, China
Abstract
Deep learning has achieved great success in computer vision, natural language processing, recommendation systems, and other fields. However, deep neural network (DNN) models are highly complex, often containing millions of parameters and tens or even hundreds of layers. Optimizing DNN weights easily falls into local optima, making good performance hard to achieve. Choosing an effective optimizer that yields networks with higher precision and stronger generalization ability is therefore of great significance. In this article, we review popular historical and state-of-the-art optimizers and classify them into three main streams: first-order optimizers, which accelerate the convergence of stochastic gradient descent and/or adaptively adjust learning rates; second-order optimizers, which exploit second-order information about the loss landscape to help escape local optima; and proxy optimizers, which handle non-differentiable loss functions by combining with a proxy algorithm. We also summarize the first- and second-order moments used by different optimizers. Moreover, we provide an insightful comparison of several optimizers on image classification. The results show that first-order optimizers such as AdaMod and Ranger not only have low computational cost but also converge quickly, while optimizers that introduce curvature information, such as AdaBelief and Apollo, generalize better, especially when optimizing complex networks.
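The first- and second-order moments mentioned in the abstract can be illustrated with a minimal sketch of an Adam-style update (a representative first-order adaptive optimizer, not the paper's own code): the first moment is an exponential moving average of gradients, and the second moment tracks their uncentered variance, which rescales the learning rate per parameter.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m is the first-order moment (momentum),
    v the second-order moment (adaptive learning rate scaling);
    both are bias-corrected before the parameter step."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)              # bias correction for zero-initialized v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(x) = x^2 starting from x = 1.0
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta              # analytic gradient of x^2
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Variants surveyed in the paper differ chiefly in how these moments are computed or used: AdaMod clips the per-parameter learning rate, while AdaBelief replaces `v` with an EMA of the squared deviation of the gradient from `m`, injecting a notion of curvature.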
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yan Liu, Maojun Zhang, Zhiwei Zhong, Xiangrong Zeng, and Xin Long "A comparative study of recently deep learning optimizers", Proc. SPIE 12156, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2021), 121560F (21 December 2021); https://doi.org/10.1117/12.2626430
KEYWORDS
Stochastic processes, Lithium, Image classification, Neural networks, Optimization (mathematics), Network architectures, Quantization