Reinforcement Learning: An Introductory Reading Guide
From iCenter Wiki
Revision as of 09:57, 16 March 2017 (Thursday)
Textbooks
- Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
- Csaba Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010.
Research
Computer Go (AlphaGo)
- Multi-armed bandit problem
  - Multi-armed bandits with episode context, AMAI 2011.
- Monte-Carlo Tree Search (MCTS)
  - Bandit based Monte-Carlo planning, ECML 2006.
  - Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, CG 2006.
  - Combining Online and Offline Knowledge in UCT, ICML 2007.
  - Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, Elsevier 2011.
- Neural networks
  - Mimicking Go Experts with Convolutional Neural Networks, ICANN 2008.
  - Training Deep Convolutional Neural Networks to Play Go, ICML 2015.
- Progress
  - Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
  - The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
  - Mastering the game of Go with deep neural networks and tree search, Nature 2016.
Computer games
- Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
Neuroscience
- Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.