Difference between revisions of "Reinforcement Learning: Introductory Guide"

Revision as of 09:57, 16 March 2017 (Thursday)

Textbooks

Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. Intro_RL
Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. RLAlgsInMDPs

Research

Computer Go: AlphaGo

Multi-armed bandit problem

Multi-armed bandits with episode context, AMAI 2011.

Monte-Carlo Tree Search

Bandit based Monte-Carlo planning, ECML 2006.
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, CG 2006.
Combining Online and Offline Knowledge in UCT, ICML 2007.
Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, Elsevier 2011.

Neural networks

Mimicking Go Experts with Convolutional Neural Networks, ICANN 2008.
Training Deep Convolutional Neural Networks to Play Go, ICML 2015.

Progress

Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
Mastering the game of Go with deep neural networks and tree search, Nature 2016.

Computer games

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

Neuroscience

[1] Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.

@@ Line 7: / Line 7: @@
 === Computer Go: AlphaGo ===
+* Multi-armed bandit problem
+#Multi-armed bandits with episode context, AMAI 2011.
 * Monte-Carlo Tree Search
