Advanced Optimization (2025 Fall)

Course Information

This course provides an overview of modern optimization theory designed for machine learning, focusing in particular on gradient-based first-order optimization methods (in both offline and online settings), which serve as the foundational optimization tools for modern large-scale learning tasks.
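To make the object of study concrete, below is a minimal sketch (not course material) of the basic first-order update the course builds on: gradient descent, x_{t+1} = x_t - eta * grad f(x_t). The quadratic objective, step size, and iteration count are illustrative assumptions only.

```python
import numpy as np

# Minimal gradient-descent sketch: x_{t+1} = x_t - eta * grad f(x_t).
# The objective, step size, and iteration count are illustrative choices only.

def gradient_descent(grad, x0, eta=0.1, num_iters=200):
    x = x0
    for _ in range(num_iters):
        x = x - eta * grad(x)
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x_final = gradient_descent(lambda x: x, x0=np.ones(5))
print(x_final)  # approaches the minimizer 0
```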

Announcement

[2025.02.10] The lecture notes will be updated on this website: https://www.pengzhao-ml.com/course/AOptLectureNote

Homework

There will be two homework assignments (HWs). You have a late-submission budget of 24 hours in total across the semester, within which no penalty applies; submissions beyond this budget will not be accepted.

[New!] HW1 (2025.10.16 - 11.18)
HW2 (2025.11.20 - 12.23)

 

Course Agenda

| Week | Date | Topic | Slides | Lecture Notes / Readings |
|------|------|-------|--------|---------------------------|
| 1 | 09.19 | Course Introduction; Preliminaries | Lecture 1 (v0925) | on matrix norm; Chapter 3.2 of Boyd and Vandenberghe's book (on basics of convexity); Chapter 2.1 of Nesterov's book (on "invention" of convexity) |
| 2 | 09.26 | Convex Problems | Lecture 2 (v1007) | Chapter 3 of Amir Beck's book (on subgradients); Chapter 5 of Amir Beck's book (on smoothness and strong convexity) |
| -- | 10.03 | no lecture (due to National Holiday) | | |
| 3 | 10.10 | GD Methods I: GD method, Lipschitz optimization, gradient descent lemma, Polyak step size | Lecture 3 (v1017) | Chapter 8.2 of Amir Beck's book (on GD methods for convex and strongly convex functions) |
| 4 | 10.17 | GD Methods II: GD method, smooth optimization, one-step improvement, Polyak's momentum, Nesterov's AGD, composite optimization | Lecture 4 (v1017) | Chapter 3.2 of Bubeck's book (on GD methods for smooth functions); Chapters 14 & 15 of Ashok Cutkosky's lecture notes (on momentum and acceleration); Chapter 10 of Amir Beck's book (on composite optimization and proximal gradient) |
| 5 | 10.24 | Online Optimization I: interactive optimization, online gradient descent, convex, strongly convex; online-to-batch conversion, weighted O2B, SGD (see the sketch after this table) | Lecture 5 (v1031) | Chapter 3 of Hazan's book (on OGD for convex and strongly convex functions); Chapter 3 of Orabona's book (on online-to-batch conversion) |
| 6 | 10.31 | Online Optimization II: exp-concave, online Newton step, expert problem, Hedge | Lecture 6 (v1103) | Chapter 4 of Hazan's book (on ONS for exp-concave functions); Lecture Note 2 of Luo's course (on PEA problem) |
| -- | 11.07 | cancelled | | |
| 7 | 11.14 | Online Mirror Descent: geometry, mirror descent, Bregman divergence, stability lemma, Bregman proximal inequality, mirror map, FTRL, dual averaging | Lecture 7 (v1121) | Chapter 6 of Orabona's note (on OMD); Chapter 7 of Orabona's note (on FTRL); Chapter 4 of Bubeck's book (on MD and Dual Averaging) |
| 8 | 11.21 | Adaptive Online Optimization: small-loss bound, self-confident tuning, optimistic OMD, conceptual OMD, predictable sequence | Lecture 8 (v1129) | Lecture Note 4 of Luo's course (on small-loss PEA); Chapter 4.2 of Orabona's note (on small-loss OCO) |
| 9 | 11.28 | Optimistic OMD: small-loss bound, gradient-variance bound, gradient-variation bound, implications for offline optimization, accelerated methods, stabilized online-to-batch conversion | Lecture 9 (v1206) | Chapter 7.12 of Orabona's note (on variance/variation bounds in OCO); Lecture Note 9: Optimism for Acceleration |
| 10 | 12.05 | Adversarial Bandits: MAB, IW loss estimator, bandit convex optimization, gradient estimator, self-concordant barrier | Lecture 10 (v1206) | Lecture 6 of Luo's course (on adversarial MAB); Lecture 9 of Luo's course (on self-concordant barrier for adversarial bandits) |
| 11 | 12.12 | Stochastic Bandits: MAB, exploration-exploitation dilemma, ETE, upper confidence bound, Thompson sampling | Lecture 11 | Lecture Note 14 of Luo's course (on stochastic MAB) |
| 12 | 12.19 | Contextual Bandits: linear bandits, self-normalized concentration, generalized linear bandits, online reinforcement learning | Lecture 12 | Lecture Note 15 of Luo's course (on stochastic linear bandits); Chapters 6 & 7 of Lattimore and Szepesvári's book (on ETC and UCB) |
| 13 | 12.26 | Advanced Topic | Lecture 13 | see the references in the slides |
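As a concrete companion to the Week 5 topics above, here is a minimal sketch of online gradient descent together with online-to-batch averaging. The least-squares losses, data generation, and step-size schedule below are illustrative assumptions, not material taken from the lecture notes.

```python
import numpy as np

# Online gradient descent (Week 5): at round t, play x_t, observe a convex
# loss f_t, and update x_{t+1} = x_t - eta_t * grad f_t(x_t) (projection omitted).
# Online-to-batch conversion: for a stochastic problem, averaging the OGD
# iterates yields a good solution to the expected loss.
# All data, losses, and step sizes below are illustrative assumptions.

rng = np.random.default_rng(0)
d, T = 5, 1000
w_true = rng.normal(size=d)

x = np.zeros(d)
iterates = []
for t in range(1, T + 1):
    # Draw one (a, b) pair; f_t(x) = 0.5 * (a @ x - b)^2 is a convex loss.
    a = rng.normal(size=d)
    b = a @ w_true + 0.1 * rng.normal()
    grad = (a @ x - b) * a        # gradient of f_t at the current iterate
    eta = 1.0 / np.sqrt(t)        # standard O(1/sqrt(t)) step-size schedule
    x = x - eta * grad            # OGD update
    iterates.append(x)

x_bar = np.mean(iterates, axis=0)      # online-to-batch: average the iterates
print(np.linalg.norm(x_bar - w_true))  # distance to the target, small on average
```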

Past Courses

You may use the following links to access lecture slides and related materials from my past courses. While the overall structure remains consistent from year to year, I continually refine the content by adding new topics and improving the logical flow based on feedback and my latest research understanding.

Prerequisites

Familiarity with calculus, probability, and linear algebra. Basic knowledge of convex optimization and machine learning.

Reading

Unfortunately, there is no designated textbook for this course. In addition to the course slides and lecture notes (to be written as time permits), the following books are very good material for further reading.

Some related courses:



Last modified: 2025-12-06 by Peng Zhao