This course provides an overview of modern optimization theory for machine learning, focusing in particular on gradient-based first-order optimization methods (in both offline and online settings), which serve as the foundational optimization tools for modern large-scale learning tasks.
Instructor: Peng Zhao ([email protected])
TA: Yuheng Zhao ([email protected])
Location: Room 304, Teaching Building 2, Xianlin Campus (仙II-304)
Office hours: appointment by email
[New! 2025.11.29] The tenth lecture will take place on 12.05 (14:00-16:45) onsite.
[2025.11.27] The ninth lecture will take place on 11.28 (14:00-16:45) onsite.
[New! 2025.11.20] The second HW is posted on the website (DDL: 12.23); please check it and complete it as soon as possible!
[2025.11.18] The eighth lecture will take place on 11.21 (14:00-16:45) onsite.
[2025.11.11] The seventh lecture will take place on 11.14 (14:00-16:45) onsite.
[2025.11.06] The lecture originally scheduled for 11.07 has been canceled due to Sports Day at Nanjing U.
[2025.10.31] The sixth lecture will take place on 10.31 (14:00-16:45) onsite.
[2025.10.24] The fifth lecture will take place on 10.24 (14:00-16:45) onsite.
[New! 2025.10.16] The first HW is posted on the website (DDL: 11.18); please check it and complete it as soon as possible!
[2025.10.16] The fourth lecture will take place on 10.17 (14:00-16:45) onsite.
[2025.10.07] The third lecture will take place on 10.10 (14:00-16:45) onsite.
[2025.09.25] The second lecture will take place on 09.26 (14:00-16:45) onsite.
[2025.09.18] The first lecture will take place on 09.19 (14:00-16:45) onsite.
[2025.08.21] We now have a course website!
[2025.02.10] The lecture notes will be updated on this website: https://www.pengzhao-ml.com/course/AOptLectureNote
There will be two HWs. You have a late-submission budget of 24 hours in total over the semester, for which there is no penalty; homework delayed beyond this budget will not be accepted.
Homework 1 PDF file: AOpt25-HW-1.pdf
Homework 1 TeX file download: homework1_file.zip [update history]
Instructions: submission rules.html
ID list of students who have already submitted: list.txt
DDL: 11.18 (Tuesday) 20:59:59 Beijing time
Homework 2 PDF file: AOpt25-HW-2.pdf
Homework 2 TeX file download: homework2_file.zip [update history]
Instructions: submission rules.html
ID list of students who have already submitted: list.txt
DDL: 12.23 (Tuesday) 20:59:59 Beijing time
| Week | Date | Topic | Slides | Lecture Notes/ Readings |
|---|---|---|---|---|
| 1 | 09.19 | Course Introduction; Preliminaries | Lecture 1 (v0925) | Note on matrix norm; Chapter 3.2 of Boyd and Vandenberghe's book (on basics of convexity); Chapter 2.1 of Nesterov's book (on "invention" of convexity) |
| 2 | 09.26 | Convex Problems | Lecture 2 (v1007) | Chapter 3 of Amir Beck's book (on subgradients); Chapter 5 of Amir Beck's book (on smoothness and strong convexity) |
| -- | 10.03 | no lecture (due to National Day holiday) | | |
| 3 | 10.10 | GD Methods I: GD method, Lipschitz optimization, gradient descent lemma, Polyak step size | Lecture 3 (v1017) | Chapter 8.2 of Amir Beck's book (on GD methods for convex and strongly convex functions) |
| 4 | 10.17 | GD Methods II: GD method, smooth optimization, one-step improvement, Polyak's momentum, Nesterov's AGD, composite optimization | Lecture 4 (v1017) | Chapter 3.2 of Bubeck's book (on GD methods for smooth functions); Chapters 14 & 15 of Ashok Cutkosky's lecture notes (on momentum and acceleration); Chapter 10 of Amir Beck's book (on composite optimization and proximal gradient) |
| 5 | 10.24 | Online Optimization I: interactive optimization, online gradient descent, convex, strongly convex; online-to-batch conversion, weighted O2B, SGD | Lecture 5 (v1031) | Chapter 3 of Hazan's book (on OGD for convex and strongly convex functions); Chapter 3 of Orabona's book (on online-to-batch conversion) |
| 6 | 10.31 | Online Optimization II: exp-concave, online Newton step, expert problem, Hedge | Lecture 6 (v1103) | Chapter 4 of Hazan's book (on ONS for exp-concave functions); Lecture Note 2 of Luo's course (on the PEA problem) |
| -- | 11.07 | cancelled (Sports Day) | | |
| 7 | 11.14 | Online Mirror Descent: geometry, mirror descent, Bregman divergence, stability lemma, Bregman proximal inequality, mirror map, FTRL, dual averaging | Lecture 7 (v1121) | Chapter 6 of Orabona's note (on OMD); Chapter 7 of Orabona's note (on FTRL); Chapter 4 of Bubeck's book (on MD and dual averaging) |
| 8 | 11.21 | Adaptive Online Optimization: small-loss bound, self-confident tuning, optimistic OMD, conceptual OMD, predictable sequence | Lecture 8 (v1129) | Lecture Note 4 of Luo's course (on small-loss PEA); Chapter 4.2 of Orabona's note (on small-loss OCO) |
| 9 | 11.28 | Optimistic OMD: small-loss bound, gradient-variance bound, gradient-variation bound, implications for offline optimization, accelerated methods, stabilized online-to-batch conversion | Lecture 9 (v1206) | Chapter 7.12 of Orabona's note (on variance/variation bounds for OCO); Lecture Note 9: Optimism for Acceleration |
| 10 | 12.05 | Adversarial Bandits: MAB, IW loss estimator, bandit convex optimization, gradient estimator, self-concordant barrier | Lecture 10 (v1206) | Lecture 6 of Luo's course (on adversarial MAB); Lecture 9 of Luo's course (on self-concordant barriers for adversarial bandits) |
| 11 | 12.12 | Stochastic Bandits: MAB, exploration-exploitation dilemma, ETC, upper confidence bound, Thompson sampling | Lecture 11 | Lecture Note 14 of Luo's course (on stochastic MAB) |
| 12 | 12.19 | Contextual Bandits: linear bandits, self-normalized concentration, generalized linear bandits, online reinforcement learning | Lecture 12 | Lecture Note 15 of Luo's course (on stochastic linear bandits); Chapters 6 & 7 of Lattimore and Szepesvári's book (on ETC and UCB) |
| 13 | 12.26 | Advanced Topics | Lecture 13 | see the references in the slides |
You may use the following links to access lecture slides and related materials from my past courses. While the overall structure remains consistent each year, I continually refine the content by adding new topics and improving the logical flow based on feedback and my latest research understanding.
Advanced Optimization (For Undergraduate and Graduate Students, 2024 Fall)
Advanced Optimization (For Undergraduate and Graduate Students, 2023 Fall)
Familiarity with calculus, probability, and linear algebra; basic knowledge of convex optimization and machine learning.
Unfortunately, there is no designated textbook for this course. In addition to the course slides and lecture notes (to be written if time permits), the following books are very good materials for further reading.
Amir Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization, 2017.
Yurii Nesterov. Lectures on Convex Optimization. Second Edition, 2018.
Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
Elad Hazan. Introduction to Online Convex Optimization (second edition). MIT Press, 2022.
Francesco Orabona. A Modern Introduction to Online Learning. Lecture Notes, 2022.
Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2021.
Some related courses:
CSCI 659: Introduction to Online Optimization/Learning, Fall 2022. University of Southern California, Haipeng Luo.
EC525: Optimization for Machine Learning, Fall 2022. Boston University, Ashok Cutkosky.
EECS 272, Fall 2025: Foundations of Learning, Decisions, and Games. UC Berkeley, Nika Haghtalab.
Last modified: 2025-12-06 by Peng Zhao