
= Reinforcement Learning =

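== Definition ==
Reinforcement learning (RL) is a general-purpose framework for decision-making. An agent has the capacity to take actions; each action influences the agent's future state, and the environment returns a scalar reward signal that quantifies success. The goal of a reinforcement learning algorithm is to select actions that maximize future reward.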
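
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <code>Corridor</code> environment, the <code>run_episode</code> helper, and the discount factor <code>gamma</code> are hypothetical names and choices introduced here, not part of the original text. "Future reward" is taken to mean the discounted sum of rewards, which is one common convention.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0   # scalar reward signal
        return self.state, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted sum of rewards (assumed objective)."""
    state = env.reset()
    discounted_return, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent acts
        state, reward, done = env.step(action)  # the action changes the state and yields a reward
        discounted_return += discount * reward
        discount *= gamma
        if done:
            break
    return discounted_return


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return +1


print(run_episode(Corridor(), always_right))
```

A learning algorithm would replace the fixed <code>always_right</code> policy with one that is adjusted from experience so that this return grows.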

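== General AI ==
Deep reinforcement learning (deep RL) combines reinforcement learning (RL) with deep learning (DL). Reinforcement learning defines the objective, and deep learning supplies the mechanism, for example by representing the value function used in techniques such as Q-learning. The combination is pursued as a route toward general artificial intelligence.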
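
As a rough illustration of the Q-learning technique mentioned above, here is a minimal tabular sketch. It is not a deep RL implementation: a deep variant would replace the table <code>q</code> with a neural network. The corridor task, the function names, and the hyperparameters (<code>alpha</code>, <code>gamma</code>, <code>epsilon</code>) are assumptions chosen for the example.

```python
import random

N_STATES = 5          # positions 0..4; position 4 is the terminal goal
ACTIONS = (-1, +1)    # move left or right


def step(state, action):
    """Hypothetical deterministic corridor dynamics."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[state])                     # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted value of the best next action
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action in every non-terminal state
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

The update is off-policy: the agent explores with the epsilon-greedy rule, yet the target always uses the maximum over next actions.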

= Research =

== Computer Go and AlphaGo ==

=== Multi-armed bandits ===
* Multi-armed bandit problem

# Multi-armed bandits with episode context, AMAI 2011.
# Algorithms for Infinitely Many-Armed Bandits, NIPS 2009.
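
To make the bandit setting concrete, the following sketch runs the classical UCB1 rule on a handful of Bernoulli arms. It is an illustrative example only, not the method of either paper listed above; the function name <code>ucb1</code>, the arm probabilities, and the number of rounds are assumptions.

```python
import math
import random


def ucb1(arm_probs, n_rounds=5000, seed=0):
    """Play Bernoulli arms with UCB1; return pull counts and mean rewards per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Pick the arm with the highest upper confidence bound
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


counts, means = ucb1([0.2, 0.5, 0.8])
print(counts, means)
```

The square-root bonus shrinks as an arm is pulled more often, so rarely tried arms are revisited occasionally while most pulls go to the arm that looks best.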

=== Monte-Carlo tree search ===
* Monte-Carlo tree search (MCTS)

# Bandit based Monte-Carlo planning, ECML 2006.
# Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, CG 2006.
# Combining Online and Offline Knowledge in UCT, ICML 2007.
# '''Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, Elsevier 2011.'''
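
The four phases of Monte-Carlo tree search (selection, expansion, simulation, backpropagation) can be sketched with UCT-style selection on a very small game. This is a hedged illustration, not the algorithm of any specific paper above: the game (single-pile Nim, take 1 to 3 stones, taking the last stone wins) stands in for Go, and the names <code>Node</code>, <code>uct_child</code>, <code>rollout</code>, <code>mcts</code> and the exploration constant <code>c=1.4</code> are assumptions.

```python
import math
import random


class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile        # stones left for the player to move
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0         # wins counted for the player who made `move`


def uct_child(node, c=1.4):
    """UCB1 applied to tree nodes: mean value plus an exploration bonus."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(pile, rng):
    """Random playout from `pile`; True if the player to move there ends up winning."""
    turn = 0      # 0 = player to move at the start of the rollout
    winner = 1    # empty pile: the previous player already took the last stone
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            winner = turn
        turn ^= 1
    return winner == 0


def mcts(root_pile, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one unexplored child
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout, scored for the player who moved into `node`
        mover_wins = not rollout(node.pile, rng)
        # 4. Backpropagation: flip the perspective at each level on the way up
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    # Recommend the most visited move at the root
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts(3, iterations=500))
```

Returning the most visited root move, instead of the one with the highest mean, is a common and fairly robust choice, since visit counts are less noisy than value estimates for rarely tried moves.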

=== Playing Go with convolutional networks ===
* Convolutional networks

# Mimicking Go Experts with Convolutional Neural Networks, ICANN 2008.
# '''Training Deep Convolutional Neural Networks to Play Go, ICML 2015.'''

=== Historical milestones ===

# Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
# The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
# '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''

== Computer games ==
# Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

== Neuroscience ==

# Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.

= References =

== Textbooks ==

# Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. [http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html Intro_RL]
# Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. [http://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf RLAlgsInMDPs]

== Courses ==

# UC Berkeley CS 294: Deep Reinforcement Learning. [http://rll.berkeley.edu/deeprlcourse/ Deep RL]