Changes - iCenter Wiki

Reinforcement Learning: An Introductory Guide

366 bytes added, 07:56, 23 May 2019 (Thu)

== Definition of Reinforcement Learning ==

Reinforcement learning is a general-purpose decision-making framework. An agent has the capacity to take actions; each action affects the agent's future state and returns a scalar reward signal that quantifies success. The goal of a reinforcement learning algorithm is to choose actions that maximize future reward.
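The loop described above (the agent acts, the state changes, a scalar reward comes back) can be sketched in a few lines. This is only an illustration under stated assumptions: `CorridorEnv`, `run_episode`, and `discounted_return` are hypothetical names that do not come from this page, and treating "future reward" as a discounted sum with factor `gamma` is a common convention rather than something the text specifies.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, reward 1.0 at the last one."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward signal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one agent-environment interaction loop and collect the rewards."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                   # the agent chooses an action
        state, reward, done = env.step(action)   # the action changes the state
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per time step (assumed convention)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `run_episode(CorridorEnv(), lambda s: 1)` always moves right and collects a single reward at the end of the corridor; `discounted_return` then turns that reward sequence into the quantity the agent would try to maximize.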

== Elements of Reinforcement Learning ==

(3) Model: the agent's representation of its environment.

== Artificial General Intelligence (AGI) ==

~~== General AI ==~~

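Deep reinforcement learning (Deep RL) combines reinforcement learning (RL) with deep learning (DL).

Reinforcement learning defines the objective, and deep learning supplies the mechanism, through techniques such as Q-learning, with the aim of achieving artificial general intelligence (AGI).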
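As a rough illustration of the Q-learning technique mentioned above, here is a minimal tabular sketch. The function name `q_learning_update`, the dictionary-based table, and the default values of `alpha` and `gamma` are assumptions made for this example, not taken from this page. Deep RL methods typically replace the table with a neural network, which this sketch does not attempt.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update to q and return the new Q(state, action).

    q maps (state, action) pairs to estimated values; missing entries default to 0.
    """
    # Value of the best action in the next state (zero if the episode has ended)
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: a table that returns 0.0 for unseen (state, action) pairs
q_table = defaultdict(float)
q_learning_update(q_table, state=0, action=1, reward=1.0,
                  next_state=1, actions=[0, 1], alpha=0.5, done=True)
```

The `done` flag is a design choice of this sketch: it stops the update from bootstrapping on a state that comes after the end of an episode.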

= Applications of Reinforcement Learning =

== ~~Computer Go and AlphaGo~~ Computer Go ==

# Mastering the game of Go with deep neural networks and tree search, Nature 2016.
# Better Computer Go Player with Neural Network and Long-term Prediction, ICLR 2016.
# Pachi: State of the art open source Go program, Advances in Computer Games, Springer Berlin Heidelberg, 2011.

=== Multi-armed bandits ===

#Christopher D. Rosin, Multi-armed bandits with episode context, Annals of Mathematics and Artificial Intelligence, March 2011, Volume 61, Issue 3, pp 203–230.

'''Other papers'''

#Wang, Yizao, Jean-Yves Audibert, and Rémi Munos. "Algorithms for infinitely many-armed bandits." Advances in Neural Information Processing Systems. 2009.

# '''Training Deep Convolutional Neural Networks to Play Go, ICML 2015.'''

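When a convolutional network is trained on 30 million games played by players ranked 5 dan and above, the machine also learns the blunders and weak moves those human players made. Training on game data generated through self-play can dilute these bad moves.

=== Historical progress ===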

# Achieving Master Level Play in 9 × 9 Computer Go, AAAI 2008.

# The grand challenge of computer Go: Monte Carlo tree search and extensions, CACM 2012.

~~# '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''~~

= AlphaGo =

~~== Computer games ==~~

#Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

~~== Neuroscience ==~~

# '''Mastering the game of Go with deep neural networks and tree search, Nature 2016.'''
#AlphaGo Zero
#AlphaZero

= Computer games =

~~# Gadagkar, V., Puzerey, P. A., Chen, R., Baird-Daniel, E., Farhang, A., & Goldberg, J. (2016). Dopamine Neurons Encode Performance Error in Singing Birds. Science, 354(6317), 1278–1282.~~

# Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

= References =

== Reference courses ==

#UC Berkeley CS 294: Deep Reinforcement Learning, [http://rll.berkeley.edu/deeprlcourse/ Deep RL]

Zhenchen (bureaucrat, administrator; 6,105 edits)