IEOR 8100-001: Learning and optimization for sequential decision making

Instructor: Shipra Agrawal
Schedule: MW, 1:10 pm-2:25 pm, Location: 825 Mudd
Office Hours: Thursday 3-4pm, 423 Mudd or by appointment.

TA: Michael Levi Hamilton
Office Hours: Monday 12-1 pm, 301 Mudd

Class mailing list:
(Please use the class mailing list for queries about the class and for submitting homework, if submitting by email. Emails sent to this list reach both the instructor and the TA.)

Course description:
In many sequential decision making problems, ranging from electronic commerce (recommendation systems, internet advertising, content optimization) and revenue and inventory management to playing Atari games, the decision maker faces a fundamental tradeoff between taking the option that has performed best in the past and exploring new options to gather more information, i.e., the exploration vs. exploitation tradeoff. This course will discuss recent advances in combining machine learning and optimization techniques to achieve a near-optimal tradeoff between exploration and exploitation when making sequential decisions under uncertainty.
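As a concrete illustration of the tradeoff, here is a minimal sketch (not part of the course material) of the epsilon-greedy strategy for a multi-armed bandit: with probability epsilon the learner explores a random arm, and otherwise it exploits the arm with the best empirical mean. The arm reward probabilities below are hypothetical, chosen only for illustration.

```python
import random

def epsilon_greedy(n_arms, reward_fn, n_rounds=10000, epsilon=0.1, seed=0):
    """Pull the empirically best arm with prob. 1 - epsilon, a random arm otherwise."""
    rng = random.Random(seed)
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running average reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]          # incremental mean update
        total += r
    return means, total

# Bernoulli arms with hypothetical success probabilities
probs = [0.2, 0.5, 0.8]
means, total = epsilon_greedy(
    len(probs), lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0
)
```

After enough rounds the empirical means concentrate and the exploit step settles on the best arm; algorithms covered later in the course (UCB, Thompson Sampling) achieve better regret by exploring more adaptively.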

Tentative Schedule
Jan 20
Jan 25, 27
epsilon-greedy, UCB, Lower bounds
Feb 1,3
Thompson Sampling
Feb 8,10
Adversarial bandits (EXP3 algorithm)
Feb 15,17
Linear bandits, Linear contextual bandits
Feb 22, 24
General contextual bandits + applications (Guest lectures by Alekh Agarwal, MSR New York)
Feb 29, Mar 2
Dynamic pricing (Guest lectures by Alex Slivkins, MSR New York)
Mar 7,9
Bandits with Global Constraints

-------Spring break----------
Mar 21,23
MDP/Reinforcement Learning
Mar 28,30
MDP/Reinforcement Learning
April 4,6
MDP/Reinforcement Learning
April 11,13
Monte Carlo Tree search
April 18
Active Learning
April 20, 25, 27
[Research project presentations]
May 2
[Research project presentations]

Prerequisites:
Probability theory or statistics (at least at the level of SIEO W3600, preferably SIEO W4150) and competence with reading and writing mathematical proofs. Some parts of the course may be difficult to follow without previous exposure to analysis of algorithms and linear/convex optimization; you should seek the instructor's permission if that is the case. Familiarity with programming languages is NOT required for this course.

Resources and study material:
There are no required textbooks for the course. A reading list and lecture notes will be provided as the course progresses. The following survey is recommended:

Assessment:
The assessment will be based on homework (50%), scribing one lecture (10%), and a research project (40%).