Tue 13:00-14:35 at Wenshi Building 211
Your grade will be determined by assignments (40%) and a project (60%).
Instructor: Cheng Chen, 201E Math Building, chchen@sei.ecnu.edu.cn
Teaching Assistants: Zicheng Hu (51275902019@stu.ecnu.edu.cn); Tingkai Jia (51275902086@stu.ecnu.edu.cn)
Amir Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization, 2017.
Date | Topic | Slides | Homework | Material |
9.24. | Introduction; Basic mathematics | Lecture 1 | - | Notes 1 |
10.8. | Matrix calculus; Convex set | Lecture 2 | Homework 1, sol | Notes 2 |
10.15. | Convex function; Subgradient | Lecture 3 | Homework 2, sol | Notes 3 |
10.29. | Gradient descent; smoothness and strong convexity | Lecture 4 | Homework 3, sol | Notes 4 |
11.5. | More on gradient descent | Lecture 5 | Homework 4, sol | Notes 5 |
11.12. | Projected gradient descent | Lecture 6 | Homework 5, sol | Notes 6 |
11.19. | Subgradient descent | Lecture 7 | Homework 6, sol | Notes 7 |
11.26. | Proximal gradient descent | Lecture 8 | Homework 7, sol | Notes 8 |
12.3. | Accelerated gradient descent | Lecture 9 | - | Notes 9 |
12.10. | Newton and quasi-Newton methods | Lecture 10 | - | Notes 10 |
12.17. | Stochastic gradient descent | Lecture 11 | Homework 8 | Notes 11 |
12.24. | Variance reduced methods | Lecture 12 | - | Notes 12 |
12.31. | Nonconvex optimization; minimax optimization | Lecture 13 | - | Notes 13 |
Projects will be evaluated based on a combination of:
presentation (25%), on Tuesday of the 18th week
final report (75%)
Projects can be either individual or done in teams of up to 3 students. For team projects, you must clearly describe who contributed which parts of the project, and the contributions should be roughly equal. Naturally, a team project is expected to make more progress than an individual project.
Various types of projects are possible. Some examples include:
optimization in applications: you have a particular application that involves solving some type of optimization problem, and you explore the performance of different algorithms in this setting.
methodology projects: you are interested in understanding the properties of some class of optimization algorithms.
survey projects: you are interested in learning more deeply about a particular area of the course that we touched on, so you read a number of papers in the area and summarize your findings.
a new algorithm: you have an idea for a new kind of optimization algorithm, and would like to study/explore it.
Requirements
You need to submit an English report and your code. For an individual project, the report should be at most 6 pages (with unlimited pages for references and the appendix). For a team project, the report should be at most 8 pages (with unlimited pages for references and the appendix). The submission deadline is January 14th, 2025.
You are required to use LaTeX to write the report. We suggest you use this template; a minimal illustrative skeleton is also sketched at the end of this section.
Plagiarism is strictly prohibited, and it is unacceptable to duplicate entire sentences or figures from other sources.
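For illustration only, here is a minimal sketch of a report skeleton, assuming you start from the standard article class; it is not the suggested template, and the section names, packages, and bibliography entry below are placeholders you should adapt.

\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,graphicx,hyperref}

\title{Project Title}        % placeholder title
\author{Name (student ID)}   % list all team members and their IDs
\date{\today}

\begin{document}
\maketitle

\section{Introduction}   % problem setting and motivation
\section{Method}         % algorithms studied and theoretical analysis
\section{Experiments}    % numerical results and discussion
\section{Conclusion}

% References do not count toward the page limit.
\begin{thebibliography}{9}
\bibitem{beck2017} Amir Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization, 2017.
\end{thebibliography}

% The appendix does not count toward the page limit.
\appendix
\section{Additional proofs and experimental details}

\end{document}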
Topic examples
These areas are mentioned only for illustration. You are encouraged to choose very different topics, as long as they concern optimization for machine learning.
Escape from saddle points in nonconvex optimization, e.g., “Sharp analysis for nonconvex SGD escaping from saddle points”, C. Fang, Z. Lin, and T. Zhang, 2019.
Minimax optimization, e.g., “On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems”, T. Lin, C. Jin, and M. Jordan, ICML 2020.
Acceleration for variance reduction, e.g., “Katyusha: The First Direct Acceleration of Stochastic Gradient Methods”, Z. Allen-Zhu, STOC 2017.
Variance reduction for nonconvex optimization, e.g., “Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator”, C. Fang, C. Li, Z. Lin, T. Zhang, NeurIPS 2018.
Adam-like algorithms, e.g., “On the convergence of Adam and beyond”, S. Reddi, S. Kale, S. Kumar, ICLR 2018.
Federated learning, e.g., “On the convergence of fedavg on non-iid data”, X. Li, K. Huang, W. Yang, S. Wang, Z. Zhang, ICLR 2020.
Bilevel optimization, e.g., “A Fully First-Order Method for Stochastic Bilevel Optimization”, J. Kwon, D. Kwon, S. Wright, R. Nowak, ICML 2023.
Explicit superlinear convergence of quasi-Newton methods, e.g., “Greedy quasi-Newton methods with explicit superlinear convergence”, A. Rodomanov, Y. Nesterov, SIAM Journal on Optimization, 31(1):785–811, 2021.
Zeroth-order optimization, e.g., “Random gradient-free minimization of convex functions”, Y. Nesterov, V. Spokoiny, Foundations of Computational Mathematics, 17(2):527–566, 2017.
Nonsmooth nonconvex optimization, e.g., “Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions”, J. Zhang, H. Lin, S. Jegelka, S. Sra, A. Jadbabaie, ICML 2020.
Parameter-free optimization, e.g., “On the convergence of stochastic gradient descent with adaptive stepsizes”, X. Li, F. Orabona, AISTATS 2019.
...