Revision as of 02:19, 18 March 2017 (Sat)

Reinforcement Learning

Definition

Reinforcement learning (RL) is a general decision-making framework. An agent has the capacity to take actions; each action influences the agent's future state and yields a scalar reward signal that quantifies how successful it was. The goal of an RL algorithm is to choose actions that maximize future reward.
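The interaction loop in this definition can be sketched as follows. This is a minimal illustration, not part of the original page: the corridor environment `ToyEnv`, the helper names `run_episode` and `discounted_return`, and the discount factor `gamma` are all assumptions chosen for the example. Discounting is one common way to formalize "future reward".

```python
class ToyEnv:
    """Hypothetical 1-D corridor with states 0..length-1; reward 1.0 on reaching the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clipped to the corridor
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward signal
        return self.state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over the episode, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the episode ends; return the list of rewards it received."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                   # the agent takes an action
        state, reward, done = env.step(action)   # the action changes the state and yields a reward
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return +1


rewards = run_episode(ToyEnv(), always_right)
print(rewards, discounted_return(rewards))
```

An RL algorithm would replace the fixed `always_right` policy with one that is improved from experience so that the return grows over time.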

Relationship to general AI

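Deep reinforcement learning (Deep RL) combines reinforcement learning (RL) with deep learning (DL). RL defines the objective, and DL supplies the mechanism, for example a deep network that represents the value function in techniques such as Q-learning, with the aim of achieving general artificial intelligence.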
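For reference, the Q-learning update mentioned here can be sketched in its tabular form. This is only an illustration under stated assumptions: the function name `q_update`, the learning rate `alpha`, the discount `gamma`, and the example transitions are hypothetical. In Deep RL the table `Q` would be replaced by a neural network trained toward the same target.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, done=False):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    # No bootstrapping from the next state when the episode has ended
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)   # unseen (state, action) pairs start at 0.0
actions = (-1, +1)

# Terminal transition: Q(3, +1) moves part of the way toward the target 1.0
q_update(Q, state=3, action=+1, reward=1.0, next_state=4, actions=actions, done=True)

# Non-terminal transition: the target bootstraps from the best value in state 3
q_update(Q, state=2, action=+1, reward=0.0, next_state=3, actions=actions)
```

The value learned for the step next to the goal propagates backwards to earlier states through the `max` term, which is how the reward signal shapes behavior far from where it was received.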

Research

Computer Go and AlphaGo

Multi-armed bandit problem

Multi-armed bandits with episode context, AMAI 2011.
Algorithms for Infinitely Many-Armed Bandits, NIPS 2009.
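As background for this reading list, the basic bandit setting can be sketched with the standard UCB1 index rule. The sketch is not taken from either paper above; the Bernoulli arm means, the round count, and the function name `ucb1` are assumptions made for illustration.

```python
import math
import random


def ucb1(arm_means, rounds=5000, seed=0):
    """Play Bernoulli arms with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k     # number of pulls per arm
    sums = [0.0] * k     # total reward per arm
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # empirical mean plus an exploration bonus that shrinks with more pulls
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8])
print(counts)
```

The same index rule, applied at each node of a search tree, is the idea behind the UCT algorithm used in Monte-Carlo tree search.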

Monte-Carlo Tree Search

Bandit based monte-carlo planning, ECML 2006.
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, CG 2006.
Combining Online and Offline Knowledge in UCT, ICML 2007.
Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, Elsevier 2011.

Neural networks

Mimicking Go Experts with Convolutional Neural Networks, ICANN 2008.
Training Deep Convolutional Neural Networks to Play Go, ICML 2015.

Progress

Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
Mastering the game of Go with deep neural networks and tree search, Nature 2016.

Computer games

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

Neuroscience

Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.

References

Textbooks

Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. Intro_RL
Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. RLAlgsInMDPs

Courses

UC Berkeley CS 294: Deep Reinforcement Learning, Deep RL

@@ Line 1 / Line 1 @@
-== Textbooks ==
+= Reinforcement Learning =
-# Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. [http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html Intro_RL]
+== Definition ==
-# Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. [http://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf RLAlgsInMDPs]
+Reinforcement learning (RL) is a general decision-making framework. An agent has the capacity to take actions; each action influences the agent's future state and yields a scalar reward signal that quantifies how successful it was. The goal of an RL algorithm is to choose actions that maximize future reward.
-== Research ==
+== Relationship to general AI ==
+Deep reinforcement learning (Deep RL) combines reinforcement learning (RL) with deep learning (DL). RL defines the objective, and DL supplies the mechanism, for example a deep network that represents the value function in techniques such as Q-learning, with the aim of achieving general artificial intelligence.
-=== Computer Go and AlphaGo ===
+= Research =
+== Computer Go and AlphaGo ==
 * Multi-armed bandit problem
@@ Line 31 / Line 34 @@
 # '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''
-===Computer games===
+==Computer games==
 #Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
@@ Line 38 / Line 41 @@
 # Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.
-==Courses==
+= References =
+== Textbooks ==
+# Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. [http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html Intro_RL]
+# Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. [http://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf RLAlgsInMDPs]
+== Courses ==
 UC Berkeley CS 294: Deep Reinforcement Learning,  [http://rll.berkeley.edu/deeprlcourse/ Deep RL]

Difference between revisions of "Reinforcement Learning: Introductory Guide" (增强学习-入门导读)