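The first and corresponding author of the paper is Dr. David Silver of DeepMind, who leads the AlphaGo project. He explained that AlphaGo Zero is far more powerful than AlphaGo because it is no longer limited by human knowledge, and can instead discover new knowledge and develop new strategies: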
This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself. AlphaGo Zero also discovered new knowledge, developing unconventional strategies and creative new moves that echoed and surpassed the novel techniques it played in the games against Lee Sedol and Ke Jie.
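DeepMind co-founder and CEO Demis Hassabis said the new technique could help solve important problems such as protein folding and the design of new materials: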
AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data. Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials.
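In Nature, two American Go players commented on AlphaGo Zero's games. Its openings and endgames are no different from those of professional players, which suggests that the wisdom humans have accumulated over thousands of years is not entirely wrong. Its middle game, by contrast, looks truly strange: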
the AI’s opening choices and end-game methods have converged on ours — seeing it arrive at our sequences from first principles suggests that we haven’t been on entirely the wrong track. By contrast, some of its middle-game judgements are truly mysterious.
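To learn more about AlphaGo Zero's technical details, we interviewed Professor Yiran Chen, an artificial intelligence expert at Duke University. He explained: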
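DeepMind's newly released AlphaGo Zero reduces training complexity and removes the dependence on human-labeled samples (historical human games), making deep learning more practical for complex decision-making. Personally, I think the most interesting result is the evidence that human experience, constrained by the size of the sample space it can cover, often converges to a local optimum without knowing it (or without being able to find out), whereas machine learning can break through this limitation. People had long vaguely suspected this; now the hard quantitative evidence is right in front of us!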
He went on to explain:
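The paper's data show that learning from human players' moves gives stronger play early in training, but the strength reached late in training is only comparable to the original AlphaGo, whereas AlphaGo Zero, which does not learn from human moves, ultimately performs better. This may mean that human game data steer the algorithm toward a local optimum, while better or optimal play differs from human play in some fundamental ways; in effect, humans "misled" AlphaGo. Interestingly, when AlphaGo Zero forgoes learning from humans and starts from completely random play, training still proceeds steadily toward convergence, without running into convergence problems.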
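To make the idea of a "local optimum" concrete, here is a minimal toy sketch, assuming a made-up one-dimensional "playing strength" function with a lower peak and a higher peak. The function `strength`, the helper `hill_climb`, and both starting points are hypothetical. This is not AlphaGo Zero's training procedure; it only shows how greedy improvement stops at whichever peak lies nearest its starting point. Starting next to the lower peak plays the role of initializing from human games.

```python
# Toy illustration only: a hypothetical 1-D "playing strength" landscape,
# NOT a model of Go or of how AlphaGo Zero is actually trained.
import math


def strength(x: float) -> float:
    """Hypothetical objective: a lower peak near x = -2 and a higher one near x = 3."""
    return 1.0 * math.exp(-(x + 2.0) ** 2) + 2.0 * math.exp(-((x - 3.0) ** 2) / 2.0)


def hill_climb(x: float, step: float = 0.01, max_iters: int = 100_000) -> float:
    """Greedy local search: move to a neighbor only if it improves the objective."""
    for _ in range(max_iters):
        left, right = strength(x - step), strength(x + step)
        if max(left, right) <= strength(x):
            return x  # no better neighbor: a (possibly only local) optimum
        x = x - step if left > right else x + step
    return x


# Start beside the lower peak (stands in for "initialize from human games").
local_x = hill_climb(-2.5)
# Start on the slope of the higher peak.
global_x = hill_climb(1.0)

print(f"start -2.5 -> x = {local_x:.2f}, strength = {strength(local_x):.3f}")
print(f"start  1.0 -> x = {global_x:.2f}, strength = {strength(global_x):.3f}")
```

The first run stops near x = -2 with a strength of about 1, even though a peak twice as high exists elsewhere. In general a random start carries no guarantee of reaching the global optimum either; the toy only shows why a good-looking starting point can cap the final result.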
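So how does AlphaGo Zero teach itself without a teacher? Chunpeng Wu, a PhD student at Duke University, described the technical details: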
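The AlphaGo that defeated Lee Sedol was built mainly from conventional reinforcement learning techniques combined with deep neural networks (DNNs), whereas AlphaGo Zero incorporates the latest research advances and makes major improvements.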