Stochastic Gradient Descent (SGD) is a widely used optimization algorithm for training machine learning models. The learning rate (step size), a crucial hyperparameter in SGD, directly controls the magnitude of parameter updates. To improve the convergence speed and performance of models, researchers have proposed a variety of advanced learning rate scheduling methods. The objective of this talk is to review advanced learning rate schedules for SGD, including step decay,
cyclical step sizes, and adaptive learning rates. By comprehensively comparing these scheduling methods, we can observe their respective advantages and their applicability to different problems and datasets. Choosing an appropriate learning rate schedule is a crucial step in optimizing the model training process, leading to improved performance and faster convergence.
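As a concrete illustration of the first two schedule families mentioned above, here is a minimal sketch of a step-decay schedule and a triangular cyclical schedule. The hyperparameter values (`lr0`, `gamma`, `step_size`, `base_lr`, `max_lr`, `half_cycle`) are illustrative choices, not values from the talk:

```python
def step_decay(epoch, lr0=0.1, gamma=0.5, step_size=10):
    """Step decay: multiply the initial rate lr0 by gamma
    once every step_size epochs (illustrative hyperparameters)."""
    return lr0 * gamma ** (epoch // step_size)

def cyclical(epoch, base_lr=0.001, max_lr=0.1, half_cycle=5):
    """Triangular cyclical schedule: the rate rises linearly from
    base_lr to max_lr over half_cycle epochs, then falls back,
    repeating with period 2 * half_cycle."""
    cycle_pos = epoch % (2 * half_cycle)
    if cycle_pos < half_cycle:
        frac = cycle_pos / half_cycle
    else:
        frac = 2 - cycle_pos / half_cycle
    return base_lr + (max_lr - base_lr) * frac
```

Adaptive methods such as AdaGrad or Adam instead set a per-parameter rate from accumulated gradient statistics, so they are typically used as drop-in optimizers rather than as a schedule function like the two above.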