This course provides an in-depth theoretical treatment of classical and modern optimization methods that are relevant in data science. After a general discussion of the role that optimization plays in learning from data, we give an introduction to the theory of (convex) optimization. Based on this, we present and analyze algorithms in the following four categories: first-order methods (gradient and coordinate descent, Frank-Wolfe, subgradient and mirror descent, stochastic and incremental gradient methods); second-order methods (Newton and quasi-Newton methods); non-convexity (local convergence, provable global convergence, cone programming, convex relaxations); min-max optimization (extragradient methods). The emphasis is on the motivations and design principles behind the algorithms, on provable performance bounds, and on the mathematical tools and techniques needed to prove them. The goal is to equip students with a fundamental understanding of why optimization algorithms work and what their limits are. This understanding will help in selecting suitable algorithms for a given application, but providing concrete practical guidance is not our focus.
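To give a small taste of the first-order methods covered, here is a minimal gradient descent sketch on an illustrative least-squares objective; the problem instance, step size, and iteration count are arbitrary choices for illustration and not part of the course material.

```python
import numpy as np

# Minimal gradient descent sketch on an illustrative objective f(x) = 0.5 * ||A x - b||^2.
# The problem data, step size, and iteration count are arbitrary illustrative choices.

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def gradient(x):
    # Gradient of f(x) = 0.5 * ||A x - b||^2 is A^T (A x - b).
    return A.T @ (A @ x - b)

x = np.zeros(5)
step_size = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, with smoothness constant L = ||A||_2^2
for _ in range(200):
    x = x - step_size * gradient(x)

print("final objective value:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```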
Lecturers: Prof. Niao He (OAT Y21.1), Prof. Bernd Gärtner (OAT Z15).
Classes: Mon 13-14 (NO C60), Tue 10-12 (ETF C1).
Useful Links: Course Catalogue (261-5110-00L). All materials, including announcements and Q&A, are available on Moodle.
Prerequisites: A solid background in analysis and linear algebra; some background in theoretical computer science (computational complexity, analysis of algorithms); the ability to understand and write mathematical proofs.
Week | Date | Topic |
---|---|---|
8 | Mon 19.02 (cancelled), Tue 20.02 | Introduction and Optimization |
9 | Mon 26.02, Tue 27.02 | Theory of Convex Functions |
10 | Mon 04.03, Tue 05.03 | Gradient Descent |
11 | Mon 11.03, Tue 12.03 | Projected Gradient Descent and Coordinate Descent |
12 | Mon 18.03, Tue 19.03 | Nonconvex Functions |
13 | Mon 25.03, Tue 26.03 | The Frank-Wolfe Algorithm |
14 | Easter Break | |
15 | Mon 08.04, Tue 09.04 | Newton’s Method and Quasi-Newton Methods |
16 | Mon 15.04, Tue 16.04 | Subgradient Method |
17 | Mon 22.04, Tue 23.04 | Mirror Descent, Smoothing, Proximal Algorithms |
18 | Mon 29.04, Tue 30.04 | Stochastic Optimization: SGD |
19 | Mon 06.05, Tue 07.05 | Finite Sum Optimization |
20 | Mon 13.05, Tue 14.05 | Min-Max Optimization, Part I |
21 | Mon 20.05, Tue 21.05 | Min-Max Optimization, Part II |
22 | Mon 27.05, Tue 28.05 | Data Science Applications |
There will be a written exam in the examination session. Furthermore, there will be two mandatory written graded assignments during the semester. The final grade for the whole course will be calculated as a weighted average of the grades for the exam (70%) and the graded assignments (30%). Concretely, let PE be the performance in the final exam, and P1, P2 the performances in the two graded assignments, measured as the percentage of points attained (between 0% and 100%). A graded assignment that is not handed in counts as a performance of 0%. The overall course performance is then computed as P = 0.15*P1 + 0.15*P2 + 0.7*PE. A course performance of P >= 50% is guaranteed to lead to a passing grade; depending on the overall performance of the cohort, we may lower the threshold for a passing grade.
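As a purely illustrative check of the formula (the performance values below are hypothetical examples, not real data), the computation could be sketched as follows:

```python
# Illustrative sketch of the course performance formula P = 0.15*P1 + 0.15*P2 + 0.7*PE.

def course_performance(p1: float, p2: float, pe: float) -> float:
    """Combine assignment and exam performances (each in [0, 100]) into the overall P."""
    return 0.15 * p1 + 0.15 * p2 + 0.7 * pe

# Hypothetical example: 80% and 90% on the two assignments, 60% in the exam.
p = course_performance(80.0, 90.0, 60.0)
print(f"Overall course performance P = {p:.1f}%")  # 0.15*80 + 0.15*90 + 0.7*60 = 67.5
```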
Graded Assignments:
Exam:
Assistants: Xiang Li (OAT Y23), Zebang Shen (OAT Y21.2), Anas Barakat (OAT Y21.2), Ilyas Fatkhullin (OAT X14), Liang Zhang (OAT Y23), Linfei Pan (CNB G100.9), Fangyuan Sun, Weixuan Yuan.
Regular Exercises:
Group | Surnames | Time | Room |
---|---|---|---|
A | A - L | Tue 14-16 | CAB G51 |
B | M - Z | Fri 14-16 | ML H44 |