Abstract
The belief-desire-intention (BDI) model solves the reasoning and decision-making problems of agents in a particular environment well, but lacks the ability to make decisions and learn in dynamic and uncertain environments. Reinforcement learning solves the decision-making problem of agents in unknown environments, but lacks the rule description and logical reasoning of the BDI model. Aiming at the strategy-planning problem of BDI agents in unknown and dynamic environments, we propose a method based on the Q-learning algorithm of reinforcement learning to realize learning and planning for BDI agents, and we improve the decision-making mechanism of AgentSpeak (ASL), an implementation model of BDI. Finally, a maze simulation built on Jason, the ASL simulation platform, shows that with the Q-learning mechanism added to the new ASL system, an agent can still complete its task in an uncertain environment.
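The abstract's core mechanism, tabular Q-learning applied to a maze task, can be sketched as follows. This is only an illustrative Python sketch, not the paper's actual ASL/Jason implementation; the grid encoding, the reward values (10 at the goal, -1 for bumping a wall, -0.1 per step), and all function names are assumptions made for the example:

```python
import random

# Moves: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(grid, state, action, goal):
    """Apply one move on the maze; '#' cells are walls. Returns (next_state, reward)."""
    r, c = state[0] + action[0], state[1] + action[1]
    rows, cols = len(grid), len(grid[0])
    if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == '#':
        return state, -1.0        # blocked: stay put, small penalty
    if (r, c) == goal:
        return (r, c), 10.0       # goal reached
    return (r, c), -0.1           # an ordinary move costs a little

def q_learning_maze(grid, start, goal, episodes=500,
                    alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning; Q maps (state, action) -> value, defaulting to 0."""
    rng = random.Random(seed)
    Q = {}

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s = start
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q(s, x))
            s2, reward = step(grid, s, a, goal)
            # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            best_next = max(q(s2, x) for x in ACTIONS)
            Q[(s, a)] = q(s, a) + alpha * (reward + gamma * best_next - q(s, a))
            s = s2
    return Q

def greedy_path(Q, grid, start, goal, limit=100):
    """Follow the learned greedy policy from start, for at most `limit` steps."""
    s, path = start, [start]
    for _ in range(limit):
        if s == goal:
            break
        a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
        s, _ = step(grid, s, a, goal)
        path.append(s)
    return path
```

In the paper's setting, the learned Q-values would steer plan selection inside the ASL interpreter rather than drive the agent directly; here the greedy policy is simply read back out of the Q-table to check that the agent reaches the goal.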
References
[1] Petrie C. Agent-based software engineering [M]∥Agent-Oriented Software Engineering. Berlin: Springer-Verlag, 2001: 59-75.
[2] Yan Yue-jin, Li Zhou-jun, Chen Yue-xin. Multi-agent system architecture [J]. Computer Science, 2001, 28(5): 77-80. (in Chinese)
[3] Chen Mei, Hu Xiao-hui. BDI agent action planning mechanism based on reinforcement learning [J]. Computer Engineering and Design, 2011, 32(3): 1043-1046. (in Chinese)
[4] Yang Fang-qiong. Multi-sensor information fusion for positioning and navigation for mobile robot [D]. Changsha: Central South University, 2010. (in Chinese)
[5] Ancona D, Mascardi V. Coo-BDI: Extending the BDI model with cooperativity [C]∥Proc of the International Workshop on Declarative Agent Languages and Technologies, 2003: 109-134.
[6] McGeary F, Decker K. Modeling a virtual food court using DECAF [C]∥Proc of the 2nd International Workshop on Multi-Agent-Based Simulation, 2000: 68-81.
[7] Burgemeestre B C, Hulstijn J, Tan Y H. Towards an architecture for self-regulating agents: A case study in international trade [C]∥Proc of the 5th International Conference on Coordination, Organizations, Institutions and Norms in Agent Systems V, 2010: 320-333.
[8] Pokahr A, Braubach L, Lamersdorf W. Jadex: A BDI reasoning engine [M]∥Multi-Agent Programming. New York: Springer US, 2005: 149-174.
[9] Bordini R H, Hübner J F, Wooldridge M. Programming multi-agent systems in AgentSpeak using Jason [M]. Chichester: Wiley Publishing, 2008.
[10] Sutton R S, Barto A G. Reinforcement learning: An introduction [M]. Cambridge, MA: MIT Press, 1998.
[11] Schwartz H M. Multi-agent machine learning: A reinforcement approach [M]. Hoboken, NJ: John Wiley & Sons, 2014.
[12] Watkins C J C H, Dayan P. Technical note: Q-learning [J]. Machine Learning, 1992, 8(3-4): 279-292.
[13] Xu Shuang, Jia Yun-de. Intention tracking based reinforcement learning agent model [J]. Journal of Beijing Institute of Technology, 2004, 24(8): 679-682. (in Chinese)
[14] Liu Xin-yu, Hong Bing-rong. A multi-agent dynamic cooperation model based on BDI framework and its application [J]. Journal of Computer Research and Development, 2002, 39(7): 797-801. (in Chinese)
[15] Rabinowitz N C, Perbet F, Song H F, et al. Machine theory of mind [EB/OL]. [2018-05-17]. https://arxiv.org/abs/1802.07740.
[16] Feliu J L. Use of reinforcement learning (RL) for plan generation in belief-desire-intention (BDI) agent systems [D]. US: University of Rhode Island, 2013.
[17] Broekens J, Hindriks K, Wiggers P. Reinforcement learning as heuristic for action-rule preferences [C]∥Proc of the 8th International Conference on Programming Multi-Agent Systems, 2010: 25-40.
[18] Badica A, Badica C, Ivanovic M, et al. An approach of temporal difference learning using agent-oriented programming [C]∥Proc of the IEEE International Conference on Control Systems and Computer Science, 2015: 735-742.
[19] Li G, Whiteson S, Knox W B, et al. Social interaction for efficient agent learning from human reward [J]. Autonomous Agents and Multi-Agent Systems, 2018, 32(1): 1-25.
[20] Guo Yan. The research and development of agent-based modeling approach [EB/OL]. [2010-03-29]. http://www.paper.edu.cn/releasepaper/content/201003-982. (in Chinese)
[21] Morreale V, Bonura S, Francaviglia G. Goal-oriented development of BDI [C]∥Proc of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2006: 71-72.
[22] Jason: A Java-based interpreter for an extended version of AgentSpeak [EB/OL]. [2018-05-17]. http://jason.sourceforge.net/.
[23] Habib A, Khan M I, Jia U. Optimal route selection in complex multi-stage supply chain networks using SARSA(λ) [C]∥Proc of the IEEE International Conference on Computer and Information Technology, 2017: 170-175.
[24] Píbil R, Novák P, Brom C, et al. Notes on pragmatic agent programming with Jason [C]∥Proc of the 9th International Workshop on Programming Multi-Agent Systems, 2011: 58-73.
[25] Babaeizadeh M, Frosio I, Tyree S, et al. Reinforcement learning through asynchronous advantage actor-critic on a GPU [EB/OL]. [2018-05-17]. http://cn.arxiv.org/abs/1611.06256.
[26] Hong Chang-hao. Research on multi-agent rescue simulation system [D]. Harbin: Harbin Engineering University, 2011. (in Chinese)