Differences between revisions of "Reinforcement Learning: An Introductory Guide"
From iCenter Wiki
Revision as of 02:19, 18 March 2017 (Saturday)
Reinforcement Learning
Definition
Reinforcement learning is a general decision-making framework. An agent has the capacity to take actions; each action influences the agent's future state and returns a scalar reward signal that quantifies success. The goal of a reinforcement learning algorithm is to choose actions that maximize future reward.
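The interaction loop described above (state, action, scalar reward, future reward) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Corridor` environment, the `always_right` policy, and the discount factor `gamma = 0.9` are assumptions made up for this example and do not come from the page itself. "Future reward" is interpreted here as the discounted sum of rewards, which is one common convention.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at position 0 and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward signal: 1 only at the goal
        return self.state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over the episode (assumed notion of future reward)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def run_episode(env, policy, max_steps=100):
    """Let the policy act until the episode ends; return the rewards received."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the action changes the state
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    """A fixed, hand-written policy used only for illustration."""
    return +1


rewards = run_episode(Corridor(5), always_right)
print(rewards, discounted_return(rewards))
```

A learning algorithm would replace the fixed `always_right` policy with one that is improved from experience so that the return grows over time.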
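Relationship to General AI
Deep reinforcement learning (Deep RL) combines reinforcement learning (RL) with deep learning (DL). Reinforcement learning defines the objective, and deep learning supplies the mechanism, for example a neural network that approximates the value function in techniques such as Q-learning, with the aim of achieving general artificial intelligence.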
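Since Q-learning is mentioned above, a minimal tabular sketch may help make the idea concrete. Everything specific here is an assumption for illustration: the five-state corridor task, the function name `q_learning`, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes). In deep RL, the dictionary `q` would be replaced by a neural network that estimates the same action values; that part is not shown.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor with the goal at the right end."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left or right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                best = max(q[(state, a)] for a in actions)
                action = rng.choice([a for a in actions if q[(state, a)] == best])

            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            # The goal state is terminal, so it contributes no future value.
            if next_state == goal:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
greedy_policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
print(greedy_policy)
```

The table is feasible only because the toy task has a handful of states; replacing it with a learned function approximator is what allows the same update rule to be applied to large inputs such as game screens.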
Research
Computer Go and AlphaGo
- Multi-armed bandit problem
  - Multi-armed bandits with episode context, AMAI 2011.
  - Algorithms for Infinitely Many-Armed Bandits, NIPS 2009.
- Monte-Carlo tree search
  - Bandit based Monte-Carlo planning, ECML 2006.
  - Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, CG 2006.
  - Combining Online and Offline Knowledge in UCT, ICML 2007.
  - Monte-Carlo tree search and rapid action value estimation in computer Go, Artificial Intelligence, Elsevier 2011.
- Neural networks
  - Mimicking Go Experts with Convolutional Neural Networks, ICANN 2008.
  - Training Deep Convolutional Neural Networks to Play Go, ICML 2015.
- Progress
  - Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
  - The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
  - Mastering the game of Go with deep neural networks and tree search, Nature 2016.
Computer Games
- Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
Neuroscience
- Gadagkar, V., Puzerey, P., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.
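References
Textbooks
- Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. Intro_RL: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
- Csaba Szepesvari, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 4, no. 1, pp. 1-103, 2010. RLAlgsInMDPs: http://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
Courses
UC Berkeley CS 294: Deep Reinforcement Learning, Deep RL: http://rll.berkeley.edu/deeprlcourse/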