Abstract
The belief-desire-intention (BDI) model solves the reasoning and decision-making problems of agents in a particular environment well, but lacks the ability to make decisions and learn in dynamic and uncertain environments. Reinforcement learning solves the decision-making problem of agents in unknown environments, but lacks the rule description and logical reasoning of the BDI model. Aiming at the strategy-planning problem of BDI agents in unknown and dynamic environments, we propose a method based on the Q-learning algorithm of reinforcement learning to realize learning and planning for BDI agents, and we improve the decision-making mechanism of AgentSpeak (ASL), an implementation model of BDI. Finally, a maze simulation built on Jason, the ASL simulation platform, shows that with the Q-learning mechanism added to the new ASL system, an agent can still complete its task in an uncertain environment.
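The abstract's core mechanism, tabular Q-learning applied to a maze task, can be sketched as follows. This is only an illustrative Python sketch, not the paper's actual ASL/Jason implementation; the grid encoding, the reward values (10 at the goal, -1 for bumping a wall, -0.1 per step), and all function names are assumptions made for the example:

```python
import random

# Moves: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(grid, state, action, goal):
    """Apply one move on the maze; '#' cells are walls. Returns (next_state, reward)."""
    r, c = state[0] + action[0], state[1] + action[1]
    rows, cols = len(grid), len(grid[0])
    if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == '#':
        return state, -1.0        # blocked: stay put, small penalty
    if (r, c) == goal:
        return (r, c), 10.0       # goal reached
    return (r, c), -0.1           # an ordinary move costs a little

def q_learning_maze(grid, start, goal, episodes=500,
                    alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning; Q maps (state, action) -> value, defaulting to 0."""
    rng = random.Random(seed)
    Q = {}

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s = start
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q(s, x))
            s2, reward = step(grid, s, a, goal)
            # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            best_next = max(q(s2, x) for x in ACTIONS)
            Q[(s, a)] = q(s, a) + alpha * (reward + gamma * best_next - q(s, a))
            s = s2
    return Q

def greedy_path(Q, grid, start, goal, limit=100):
    """Follow the learned greedy policy from start, for at most `limit` steps."""
    s, path = start, [start]
    for _ in range(limit):
        if s == goal:
            break
        a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
        s, _ = step(grid, s, a, goal)
        path.append(s)
    return path
```

In the paper's setting, the learned Q-values would steer plan selection inside the ASL interpreter rather than drive the agent directly; here the greedy policy is simply read back out of the Q-table to check that the agent reaches the goal.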
References
[1] Petrie C. Agent-based software engineering [M]∥Agent-Oriented Software Engineering. Berlin: Springer-Verlag, 2001: 59-75.
[2] Yan Yue-jin, Li Zhou-jun, Chen Yue-xin. Multi-agent system architecture [J]. Computer Science, 2001, 28(5): 77-80. (in Chinese)
[3] Chen Mei, Hu Xiao-hui. BDI agent action planning mechanism based on reinforcement learning [J]. Computer Engineering and Design, 2011, 32(3): 1043-1046. (in Chinese)
[4] Yang Fang-qiong. Multi-sensor information fusion for positioning and navigation for mobile robot [D]. Changsha: Central South University, 2010. (in Chinese)
[5] Ancona D, Mascardi V. Coo-BDI: Extending the BDI model with cooperativity [C]∥Proc of the International Workshop on Declarative Agent Languages and Technologies, 2003: 109-134.
[6] McGeary F, Decker K. Modeling a virtual food court using DECAF [C]∥Proc of the 2nd International Workshop on Multi-Agent-Based Simulation, 2000: 68-81.
[7] Burgemeestre B C, Hulstijn J, Tan Y H. Towards an architecture for self-regulating agents: A case study in international trade [C]∥Proc of the 5th International Conference on Coordination, Organizations, Institutions and Norms in Agent Systems V, 2010: 320-333.
[8] Pokahr A, Braubach L, Lamersdorf W. Jadex: A BDI reasoning engine [M]∥Multi-Agent Programming. New York: Springer US, 2005: 149-174.
[9] Bordini R H, Hübner J F, Wooldridge M. Programming multi-agent systems in AgentSpeak using Jason [M]. Chichester: Wiley Publishing, 2008.
[10] Sutton R S, Barto A G. Reinforcement learning: An introduction [M]. Cambridge, MA: MIT Press, 1998.
[11] Schwartz H M. Multi-agent machine learning: A reinforcement approach [M]. Hoboken, NJ: John Wiley & Sons, 2014.
[12] Watkins C J C H, Dayan P. Technical note: Q-learning [J]. Machine Learning, 1992, 8(3-4): 279-292.
[13] Xu Shuang, Jia Yun-de. Intention tracking based reinforcement learning agent model [J]. Journal of Beijing Institute of Technology, 2004, 24(8): 679-682. (in Chinese)
[14] Liu Xin-yu, Hong Bing-rong. A multi-agent dynamic cooperation model based on BDI framework and its application [J]. Journal of Computer Research and Development, 2002, 39(7): 797-801. (in Chinese)
[15] Rabinowitz N C, Perbet F, Song H F, et al. Machine theory of mind [EB/OL]. [2018-05-17]. https://arxiv.org/abs/1802.07740.
[16] Feliu J L. Use of reinforcement learning (RL) for plan generation in belief-desire-intention (BDI) agent systems [D]. US: University of Rhode Island, 2013.
[17] Broekens J, Hindriks K, Wiggers P. Reinforcement learning as heuristic for action-rule preferences [C]∥Proc of the 8th International Conference on Programming Multi-Agent Systems, 2010: 25-40.
[18] Badica A, Badica C, Ivanovic M, et al. An approach of temporal difference learning using agent-oriented programming [C]∥Proc of the IEEE International Conference on Control Systems and Computer Science, 2015: 735-742.
[19] Li G, Whiteson S, Knox W B, et al. Social interaction for efficient agent learning from human reward [J]. Autonomous Agents and Multi-Agent Systems, 2018, 32(1): 1-25.
[20] Guo Yan. The research and development of agent-based modeling approach [EB/OL]. [2010-03-29]. http://www.paper.edu.cn/releasepaper/content/201003-982. (in Chinese)
[21] Morreale V, Bonura S, Francaviglia G. Goal-oriented development of BDI [C]∥Proc of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2006: 71-72.
[22] Jason: A Java-based interpreter for an extended version of AgentSpeak [EB/OL]. [2018-05-17]. http://jason.sourceforge.net/.
[23] Habib A, Khan M I, Jia U. Optimal route selection in complex multi-stage supply chain networks using SARSA(λ) [C]∥Proc of the IEEE International Conference on Computer and Information Technology, 2017: 170-175.
[24] Píbil R, Novák P, Brom C, et al. Notes on pragmatic agent programming with Jason [C]∥Proc of the 9th International Workshop on Programming Multi-Agent Systems, 2011: 58-73.
[25] Babaeizadeh M, Frosio I, Tyree S, et al. Reinforcement learning through asynchronous advantage actor-critic on a GPU [EB/OL]. [2018-05-17]. http://cn.arxiv.org/abs/1611.06256.
[26] Hong Chang-hao. Research on multi-agent rescue simulation system [D]. Harbin: Harbin Engineering University, 2011. (in Chinese)