Deep learning has achieved great success in computer vision, natural language processing, recommendation systems and other fields. However, deep neural network (DNN) models are highly complex, often containing millions of parameters and tens or even hundreds of layers. Optimizing the weights of a DNN can easily become trapped in local optima, making good performance hard to achieve. Choosing an effective optimizer, one that yields a network with higher accuracy and stronger generalization ability, is therefore of great significance. In this article, we review a number of popular historical and state-of-the-art optimizers and group them into three main streams: first-order optimizers, which accelerate the convergence of stochastic gradient descent and/or adaptively adjust learning rates; second-order optimizers, which exploit second-order information about the loss landscape to help escape local optima; and proxy optimizers, which handle non-differentiable loss functions by combining with a proxy algorithm. We also summarize the first- and second-order moments used in different optimizers. Moreover, we provide an insightful comparison of several optimizers on image classification. The results show that first-order optimizers such as AdaMod and Ranger not only have low computational cost but also converge quickly. Meanwhile, optimizers that introduce curvature information, such as AdaBelief and Apollo, generalize better, especially when optimizing complex networks.
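To illustrate what the first- and second-order moments of adaptive first-order optimizers look like, the sketch below contrasts Adam, whose second moment is an exponential moving average of g², with AdaBelief, whose second moment is an exponential moving average of (g − m)², the squared deviation of the gradient from its first moment. It is a minimal scalar sketch, assuming bias-corrected updates and common default hyperparameters; the function name `adaptive_step_run` and the toy objective (x − 3)² are hypothetical and not taken from the article, and library implementations operate on tensors and differ in details such as where ε is added.

```python
import math
from typing import Callable


def adaptive_step_run(
    grad_fn: Callable[[float], float],
    x0: float,
    steps: int,
    lr: float = 0.05,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    belief: bool = False,
) -> float:
    """Minimize a scalar function with Adam (belief=False) or AdaBelief-style (belief=True) updates."""
    x, m, s = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # First moment: exponential moving average of the gradient (shared by both).
        m = beta1 * m + (1 - beta1) * g
        if belief:
            # AdaBelief: EMA of the squared deviation of g from its EMA m.
            s = beta2 * s + (1 - beta2) * (g - m) ** 2 + eps
        else:
            # Adam: EMA of the squared gradient.
            s = beta2 * s + (1 - beta2) * g * g
        # Bias correction for the zero initialization of m and s.
        m_hat = m / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return x


if __name__ == "__main__":
    grad = lambda x: 2.0 * (x - 3.0)  # gradient of the toy objective (x - 3)^2
    for belief in (False, True):
        name = "AdaBelief" if belief else "Adam"
        print(name, adaptive_step_run(grad, x0=0.0, steps=500, belief=belief))
```

On the first step, bias correction makes Adam move the parameter by almost exactly `lr`, while this AdaBelief variant moves it by about `lr / 0.9`, because its second moment starts from (g − 0.1g)² rather than g².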
With the rapid development of oblique photography (OP) in recent years, the accuracy of reality modeling has improved, but this has brought a surge in computational complexity. To address this, many reality modeling software packages adopt cluster parallel computing for modeling. In this paper, regression analysis is used to study how the configuration of the compute nodes affects the cluster, with the aim of improving the cluster's computational efficiency on 3D reconstruction tasks. Furthermore, the M/M/S queuing model from queuing theory is used to model multi-task assignment in the cluster, and a mathematical model relating the compute nodes to cluster performance is established, enabling an effective quantitative evaluation of cluster computing efficiency. Experiments show that the CPU performance of the compute nodes is the most critical hardware factor affecting cluster efficiency.
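The M/M/S model mentioned above has standard closed-form steady-state results (the Erlang C formula). The sketch below computes them for s identical servers, under the assumption, made here only for illustration, that each compute node acts as one server with exponential service rate μ and that reconstruction tasks arrive as a Poisson process with rate λ. The abstract does not specify how the paper maps node configuration onto these parameters, so `mms_metrics` and the example rates are hypothetical.

```python
import math


def mms_metrics(lam: float, mu: float, s: int) -> dict:
    """Steady-state metrics of an M/M/s queue.

    lam: task arrival rate, mu: service rate of one server (compute node),
    s: number of identical servers. Requires lam < s * mu.
    """
    a = lam / mu  # offered load in Erlangs
    rho = a / s  # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: need lam < s * mu")
    head = sum(a**k / math.factorial(k) for k in range(s))
    tail = a**s / (math.factorial(s) * (1 - rho))
    p0 = 1.0 / (head + tail)  # probability the system is empty
    p_wait = tail * p0  # Erlang C: probability an arriving task must queue
    lq = p_wait * rho / (1 - rho)  # mean number of tasks waiting
    wq = lq / lam  # mean waiting time, via Little's law
    w = wq + 1.0 / mu  # mean time in system (wait + service)
    return {"rho": rho, "p0": p0, "p_wait": p_wait, "Lq": lq, "Wq": wq, "W": w}


if __name__ == "__main__":
    # Hypothetical workload: 3 tasks/hour arriving at 4 nodes.
    for mu in (1.0, 1.5):  # tasks/hour per node, e.g. slower vs faster CPUs
        print(mu, mms_metrics(lam=3.0, mu=mu, s=4))
```

Raising μ (faster nodes) or s (more nodes) both lower the mean response time W; a model of this form lets the two options be compared quantitatively for a given arrival rate.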