Reinforcement Learning: An Introductory Guide

366 bytes added, 07:56, 23 May 2019 (Thursday)
== Definition of Reinforcement Learning ==
Reinforcement learning (RL) is a general decision-making framework. An agent has the capacity to take actions. Each action influences the agent's future state and returns a scalar reward signal that quantifies success. The goal of a reinforcement learning algorithm is to choose actions that maximize future reward.
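The interaction described above can be sketched as a simple loop. This is only a minimal illustration under stated assumptions: the <code>Corridor</code> environment, the two policies, and the discount factor <code>gamma</code> are hypothetical and do not come from the original text. Discounting is one common way to turn "future reward" into a single number.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward signal
        return self.state, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted sum of rewards."""
    state = env.reset()
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent acts
        state, reward, done = env.step(action)  # the action changes the state
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def always_right(state):
    """Policy that always moves toward the goal."""
    return +1


def make_random_policy(seed=0):
    """Policy that ignores the state and picks a direction at random."""
    rng = random.Random(seed)
    return lambda state: rng.choice([-1, +1])


if __name__ == "__main__":
    print(run_episode(Corridor(), always_right))
    print(run_episode(Corridor(), make_random_policy()))
```

In this toy setting, a policy that heads straight for the goal collects its reward sooner, and therefore with less discounting, than one that wanders at random.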
== Elements of Reinforcement Learning ==
(3) Model: the agent's representation of the environment.
== General AI ==
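Deep reinforcement learning (Deep RL) combines reinforcement learning (RL) with deep learning (DL).
Reinforcement learning defines the objective and deep learning supplies the corresponding mechanism, using techniques such as Q-learning, with the aim of achieving Artificial General Intelligence (AGI).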
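As a rough illustration of the Q-learning technique mentioned above, here is a minimal tabular sketch. Everything in it is an assumption made for illustration: the five-state chain, the reward of 1.0 at the goal, the uniformly random behaviour policy, and the values of <code>alpha</code> and <code>gamma</code>. In Deep RL a neural network would take the place of the table <code>q</code>.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics; reward 1.0 only on reaching the goal."""
    delta = 1 if action == 1 else -1
    next_state = min(max(state + delta, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=1000, seed=0):
    """Learn a table of action values from randomly generated experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so a random behaviour policy suffices here
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value after a terminal transition
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The update moves each table entry toward the observed reward plus the discounted value of the best next action, so the learned greedy policy ends up preferring the action that leads toward the goal.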
= Applications of Reinforcement Learning =
== Computer Go ==
# Mastering the game of Go with deep neural networks and tree search, Nature 2016.
# Better Computer Go Player with Neural Network and Long-term Prediction, ICLR 2016.
# Pachi: State of the art open source Go program, Advances in Computer Games, Springer Berlin Heidelberg, 2011.
=== Multi-armed Bandits ===
# Christopher D. Rosin, Multi-armed bandits with episode context, Annals of Mathematics and Artificial Intelligence, Volume 61, Issue 3, pp. 203–230, March 2011.
'''Other papers'''
#Wang, Yizao, Jean-Yves Audibert, and Rémi Munos. "Algorithms for infinitely many-armed bandits." Advances in Neural Information Processing Systems. 2009.
# '''Training Deep Convolutional Neural Networks to Play Go, ICML 2015.'''
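When a convolutional network is trained on 30 million games played by players ranked 5 dan or above, the machine also learns the blunders and poor moves those human players made. Training on game data generated by self-play, however, can dilute these bad moves.
=== Historical Progress ===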
# Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.
# The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.
# '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''
= AlphaGo =
# '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''
# AlphaGo Zero
# AlphaZero
= Computer Games =
# Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
== Neuroscience ==
# Gadagkar, V., Puzerey, P. A., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.
= References =
== Reference Courses ==
#UC Berkeley CS 294: Deep Reinforcement Learning, [http://rll.berkeley.edu/deeprlcourse/ Deep RL]