Lecture Notes and references

Lecture 23-24 (Wed April 13, Mon April 18): X-armed bandits, Monte Carlo Tree Search
Lecture Slides

Figures in these slides were copied from Munos, 2014 (reference below).
Main Reference
Rémi Munos. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. 2014. <hal-00747575v5> [pdf]

Other related references
Bandit based Monte-Carlo Planning, by Levente Kocsis, Csaba Szepesvári, ECML 2006 [pdf]
  • UCT algorithm: the first optimistic (UCB-like) approach to Monte Carlo Tree Search
Modification of UCT with Patterns in Monte-Carlo Go, by Sylvain Gelly, Yizao Wang, Rémi Munos, Olivier Teytaud. [Research Report] 2006. <inria-00117266v1> [pdf]
  • Application of the optimistic tree-search algorithm UCT to computer Go
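At each tree node, UCT selects the child with the highest UCB-style index: empirical mean reward plus an exploration bonus. A minimal sketch of that selection rule (function names and the choice c = √2 are illustrative, not taken from the papers above):

```python
import math

def ucb_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style index used by UCT: empirical mean plus an
    exploration bonus that shrinks as the child is visited more."""
    if visits == 0:
        return float("inf")  # force each child to be tried at least once
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """Pick the child action maximizing the UCB score.
    `children` maps an action to a (total_reward, visits) pair."""
    parent_visits = sum(v for _, v in children.values())
    return max(children, key=lambda a: ucb_score(*children[a], parent_visits))
```

In full UCT this rule is applied recursively from the root to a leaf, followed by a random rollout and backup of the observed reward along the visited path.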


Lecture 21-22 (Wed April 6, Mon April 11): UCRL2 algorithm.
Lecture Slides (Complete slide deck for two lectures: April 6 and 11, includes algorithm description and regret analysis)
Lecture notes part I (April 6) Algorithm description
Lecture notes part II (April 11) Regret analysis

Main Reference
Near-optimal Regret Bounds for Reinforcement Learning, by Thomas Jaksch, Ronald Ortner, Peter Auer [pdf]
Other closely related references
Optimal Adaptive Policies for Markov Decision Processes, by Apostolos N. Burnetas, Michael N. Katehakis
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs, by Ambuj Tewari, Peter L. Bartlett
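UCRL2's optimism comes from confidence regions around the empirical rewards and transition probabilities of each state-action pair. A sketch of the confidence radii, following my reading of the Jaksch-Ortner-Auer analysis (function and parameter names are mine; treat the constants as illustrative, not authoritative):

```python
import math

def ucrl2_confidence_radii(n_visits, t, delta, S, A):
    """Confidence radii for one (s, a) pair after n_visits observations,
    at time t, with failure probability delta, over S states and A actions.
    Rewards: radius ~ sqrt(7 log(2 S A t / delta) / (2 n)).
    Transitions (L1 norm): radius ~ sqrt(14 S log(2 A t / delta) / n)."""
    n = max(1, n_visits)  # the analysis uses max(1, N(s, a))
    r_rad = math.sqrt(7 * math.log(2 * S * A * t / delta) / (2 * n))
    p_rad = math.sqrt(14 * S * math.log(2 * A * t / delta) / n)
    return r_rad, p_rad
```

The algorithm then runs extended value iteration over all MDPs whose rewards and transitions lie within these radii of the empirical estimates, and follows the optimal policy of the most optimistic such MDP.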

Lecture 20 (Mon April 4): Exploration-exploitation in MDP/reinforcement learning.
Lecture Notes

Lecture 19 (Wed Mar 30): Bayesian bandit problem and introduction to MDPs.
Lecture notes
References
J. C. Gittins, Bandit Processes and Dynamic Allocation Indices, 1979

Useful textbooks on MDPs
Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin L. Puterman
  • Chapter 3 has many interesting examples of MDPs. The inventory control example discussed in class is from Section 3.2.
Neuro-Dynamic Programming by Dimitri P. Bertsekas and John Tsitsiklis
  • Section 2.4 has many interesting examples. The Tetris game example discussed in class is Example 2.3, page 50.

Lecture 18 (Mon Mar 28): Proof of Gittins Index theorem Part 2
Lecture Notes
Hand-written notes
References
J. N. Tsitsiklis, A Short Proof of the Gittins Index Theorem, 1994

Lecture 17 (Wed Mar 23): Proof of Gittins Index theorem Part 1
Lecture notes
Proof of Markovian stationary optimal policy (Lecture slides)

Lecture 16 (Mon Mar 21): Bandits with Markovian inputs, Gittins Index theorem
Lecture notes

Lecture 14, 15 (Mon Mar 7, Wed Mar 9): Bandits with constraints and concave objective
Lecture notes Part I (Bandits with Knapsacks)
Lecture notes Part II (Bandits with global concave objective)
Preliminary Lecture Notes (slides)
References
S. Agrawal, N. R. Devanur, "Bandits with concave rewards and convex knapsacks". EC 2014
  • bandits with knapsacks, bandits with convex constraints, bandits with global concave objective
S. Agrawal, N. R. Devanur, "Fast algorithms for online stochastic convex programming". SODA 2015
  • full-information setting (outcomes are observed before the decision is made)

Lecture 12, 13 (Mon Feb 29, Wed Mar 2): Dynamic Pricing
Guest Lecture by Alex Slivkins

Lecture Slides Part I
Lecture Slides Part II

Lecture 10, 11 (Mon Feb 22, Wed Feb 24): Contextual bandits
Guest Lecture by Alekh Agarwal

Lecture Notes part I
Lecture Notes part II


Lecture 8, 9 (Mon Feb 15, Wed Feb 17): Linear bandits

Lecture notes part I

Lecture Notes Part II (Last edited on Mar 24, 2016)


Lecture 6, 7 (Mon Feb 8, Wed Feb 10, 2016): Adversarial bandits
Lecture Notes
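The classic algorithm for the adversarial bandit setting is EXP3, which maintains exponential weights over the arms and updates them with importance-weighted reward estimates. A minimal sketch of one round's update, assuming rewards in [0, 1] (names and the uniform-mixing scheme follow the standard Auer et al. formulation, not necessarily these lecture notes):

```python
import math

def exp3_update(weights, arm, reward, gamma):
    """One EXP3 round, given that `arm` was played and `reward` observed.
    Mutates `weights` in place and returns the sampling distribution
    that was used: (1 - gamma) * normalized weights + gamma * uniform."""
    k = len(weights)
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    est = reward / probs[arm]            # importance-weighted (unbiased) estimate
    weights[arm] *= math.exp(gamma * est / k)
    return probs
```

The uniform-mixing term gamma/k keeps every arm's sampling probability bounded away from zero, which controls the variance of the importance-weighted estimates.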


Lecture 5 (Wednesday Feb 3, 2016): Thompson Sampling
Lecture notes
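For Bernoulli arms with Beta priors, Thompson Sampling has a one-line posterior update and a one-line decision rule: sample a mean from each arm's posterior and play the argmax. A minimal sketch, assuming a uniform Beta(1, 1) prior (names are mine, not from the notes):

```python
import random

def thompson_step(successes, failures, rng=random):
    """One round of Thompson Sampling for Bernoulli arms: sample a mean
    from each arm's Beta posterior and return the index of the argmax.
    Beta(s + 1, f + 1) is the posterior under a uniform Beta(1, 1) prior."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

After observing the chosen arm's reward, the caller increments that arm's success or failure count, which is the entire posterior update.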

Lecture 4 (Monday Feb 1, 2016): Thompson Sampling
Lecture Notes


Lecture 3 (Wednesday Jan 27, 2016): UCB algorithm, lower bounds
Lecture notes (Last edited on 1/31/2016)
  • includes UCB algorithm and regret analysis
These lecture notes from Robert Kleinberg's class at Cornell contain a proof of the problem-independent lower bound discussed in class.
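The UCB1 decision rule covered here is short enough to state in code: play each arm once, then play the arm maximizing empirical mean plus a sqrt(2 ln t / n) exploration bonus. A minimal sketch (function names are mine; the constant 2 follows the classic UCB1 analysis):

```python
import math

def ucb1_index(mean, n_pulls, t):
    """UCB1 index: empirical mean plus sqrt(2 ln t / n) exploration bonus."""
    return mean + math.sqrt(2 * math.log(t) / n_pulls)

def choose_arm(means, pulls):
    """Pick the next arm given empirical means and pull counts per arm."""
    for i, n in enumerate(pulls):
        if n == 0:
            return i  # initialize: play each arm once
    t = sum(pulls)
    return max(range(len(means)), key=lambda i: ucb1_index(means[i], pulls[i], t))
```

The bonus is exactly the width of a Hoeffding confidence interval at level t^{-4}, which is the quantity the regret analysis union-bounds over.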



Lecture 2 (Monday Jan 25, 2016): UCB algorithm, lower bounds

Lecture Notes (Last edited on 1/31/2016)


Lecture 1 (Wednesday Jan 20, 2016): Introduction

Slides