Research on Chinese Chess Based on Database Self-Learning
Abstract
Computer Chinese chess research started later than computer chess, but has developed quickly. Many strong Chinese chess programs now exist, such as "ELP" by Shun-Chin Hsu and his team and "ElephantEye" by Chen Huang of the Shanghai computer game research community. Most of these programs, however, gain their playing strength by optimizing data structures and refining search strategies; programs with self-learning ability remain rare.
     This thesis introduces a database that stores opponent moves the engine has misjudged, together with their position values, so that the program can change its strategy, avoid repeating the same misjudgment, and thereby achieve self-learning.
     Game-tree search is one of the key techniques in computer Chinese chess, but a search with no memory cannot learn, and that memory must be provided by a database. In his thesis, Fu Qiang used a database to record the engine's best moves and their position values, then adjusted those values with reinforcement learning to achieve learning.
     A loss by a Chinese chess program, however, is mainly caused by misjudging the opponent's moves. This thesis therefore records opponent moves that differ from the engine's prediction, together with the values returned after expanding the game tree, and selects from them the key moves and values that caused the loss. When the same move is encountered in a later search, the stored value is read from the database and the search continues from it, so the program changes its strategy and achieves self-learning.
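The learning loop described above can be sketched in a few lines of Python. This is a minimal illustration of the idea only, under assumed names (`LearningStore`, `record_misjudged`, `position_key`); the thesis's actual data structures and search code are not reproduced here.

```python
# Sketch of database-backed self-learning as described in the abstract:
# after a loss, store the opponent moves the engine mispredicted along with
# the value returned by the game tree; on later searches, reuse that stored
# value so the engine changes its strategy. All names are illustrative.

class LearningStore:
    """Maps (position_key, opponent_move) -> corrected position value."""

    def __init__(self):
        self._table = {}

    def record_misjudged(self, position_key, opponent_move, tree_value):
        # Called after a lost game, once for each opponent move that
        # differed from the engine's prediction and contributed to the loss.
        self._table[(position_key, opponent_move)] = tree_value

    def lookup(self, position_key, opponent_move):
        # Returns the stored value, or None if this move was never misjudged.
        return self._table.get((position_key, opponent_move))


def evaluate(position_key, opponent_move, static_value, store):
    # During search: prefer the corrected value learned from earlier
    # losses, falling back to the normal static evaluation otherwise.
    learned = store.lookup(position_key, opponent_move)
    return learned if learned is not None else static_value
```

For example, if the engine lost after underestimating an opponent move in some position, a later search that reaches the same position and move reads the corrected (here, much lower) value instead of the optimistic static one, steering the search toward a different reply.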
References
[1] Wang Xiaochun. PC Game Programming. Chongqing: Chongqing University Press, 2002.
    [2] Xu Xinhe, Wang Jiao. Analysis of key technologies in computer Chinese chess. Journal of Chinese Computer Systems, 2006, 27(6): 961-965.
    [3] Hsu Shun-Chin. Retrospect and prospect of computer chess and computer Chinese chess. Journal of Computers, Vol.2, No.2, 1990.
    [4] Fu Qiang, Chen Huanwen. Research on self-learning methods for human-computer Chinese chess play. Computer Technology and Development, 2007, 17(12).
    [5] Fu Qiang, Chen Huanwen. Design and implementation of a self-learning game program based on the RL algorithm. Journal of Changsha University of Science and Technology, 2007(4): 73-78.
    [6] Fu Qiang. Research on Chinese chess based on reinforcement learning. Master's thesis, Changsha University of Science and Technology, 2006.
    [7] Zhang Ze. Research on secondary evaluation methods and their optimization in computer Chinese chess. Master's thesis, Northeastern University, Shenyang, 2006.
    [8] Jiang Jiafu, Chen Aixiang, Tang Xianying. A game-tree search algorithm based on knowledge reasoning. Computer Engineering and Applications, 2004(1).
    [9] Wang Jiao, Wang Tao, Luo Yanhong, et al. Adaptive genetic algorithm implementation of the evaluation function in a computer Chinese chess system. Journal of Northeastern University (Natural Science), 2005, 26(10): 949-951.
    [10] Wang Hongyan, Zhu Feng. An evaluation implementation for Chinese chess computer games based on rough neural networks. Proceedings of the 2007 Chinese Control and Decision Conference, 2007.
    [11] Wang Yifei. Research and implementation of a computer Chinese chess system with self-learning ability. Master's thesis, Harbin Engineering University, 2007.
    [12] Mo Jianwen, Lin Shimin, Zhang Shunlan. Design and implementation of an intelligent game program based on TD reinforcement learning. Journal of Computer Applications, 2004, 24(6): 287-288.
    [13] Gao Yang, Chen Shifu, Lu Xin. A survey of reinforcement learning research. Acta Automatica Sinica, 2004, 30(1): 86-91.
    [14] Shi-Jim Yen, Jr-Chang Chen, Tai-Ning Yang, Shun-Chin Hsu. Computer Chinese Chess. ICGA Journal, March 2004.
    [15] Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, Vol.6, No.2, 1994, pp.215-219.
    [16] Gerald Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol.38, No.3, 1995, pp.58-68.
    [17] Chun-bo Wei. Study and Implementation of Chinese Chess Computer Game System. Master's thesis, Kunming University of Science and Technology, 2008.
    [18] D. E. Knuth, R. W. Moore. An Analysis of Alpha-Beta Pruning. Artificial Intelligence, Vol.6, No.4, 1975, pp.293-326.
    [19] Henk Mannen. Learning To Play Chess Using Reinforcement Learning With Database Games. Master's thesis, Cognitive Artificial Intelligence, Utrecht University, 2003.
    [20] Sebastian Thrun. Learning To Play the Game of Chess. In: Advances in Neural Information Processing Systems 7, G. Tesauro, D. Touretzky, and T. Leen, eds., 1995.
    [21] Zhi-jian Tu. Design and Implementation of Computer Chess. Master's thesis, Zhongshan University, 2004.
    [22] C. E. Shannon. Programming a computer for playing chess. Philosophical Magazine, Vol.41, 1950, pp.256-275.
    [23] T. A. Marsland. A Review of Game-Tree Pruning. ICCA Journal, Vol.9, No.1, 1986, pp.3-19.
    [24] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 1959, pp.210-229.
    [25] Jonathan Baxter, Andrew Tridgell, Lex Weaver. TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search.
    [26] Jonathan Baxter, Andrew Tridgell, Lex Weaver. Learning To Play Chess Using Temporal Differences. Machine Learning, Vol.40, No.3, 2001, pp.243-263.
    [27] Johannes Fürnkranz. Recent Advances in Machine Learning and Game Playing. 2006.
    [28] Lorenz D, Markovitch S. Derivative evaluation function learning using genetic operators. In: Proceedings of the AAAI Fall Symposium on Games: Planning and Learning, North Carolina, 1993, pp.106-114.
    [29] Szeto C, McDermott D. Chinese Chess and Temporal Difference Learning [EB/OL]. 1999. http://zoo.cs.yale.edu/classes/cs490/98-99b/szeto.christopher.szeto/report.html
    [30] Christopher John Cornish Hellaby Watkins. Learning from Delayed Rewards. PhD thesis, King's College, 1989.
    [31] Gillogly. The Technology Chess Program. Artificial Intelligence, 3, 1972, pp.145-163.
    [32] Thong B. Trinh, Anwer S. Bashi, Nikhil Deshpande. Temporal Difference Learning in Chinese Chess [EB/OL]. 1998.
