Learning and Optimization of Control Systems: Markov Performance Potential Theory and Methods
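Abstract
This thesis uses performance potential theory and methods to study the learning and optimization of dynamic control systems. Performance potential theory is an important body of theory and methods in the field of learning and optimization. Built on the core concept of the performance potential, it places the various research topics and results of the field within a unified framework, and it also gives rise to many further theories and algorithms. Traditional optimal control methods can handle only relatively simple or special cases and have difficulty solving general problems. Applying the theories and methods of learning and optimization to these problems yields many results that traditional methods cannot obtain.
     The thesis first extends Markov performance potential theory to continuous state spaces, establishing a connection between dynamic systems and Markov systems. On this basis, it derives an expression for the performance potential of dynamic control systems. With this core concept in hand, methods from learning and optimization, such as policy iteration and reinforcement learning, can be applied to dynamic control systems to find the optimal feedback control policy. The strengths of performance potential theory and methods are that they re-expose structural information about the system and that they make it easy to design on-line learning optimization algorithms.
     The thesis focuses on applications of performance potential theory and methods to three classes of systems: the hierarchical control problem for jump linear quadratic (JLQ) systems, the event-based control problem, and the constrained optimal control problem. Each class is modeled with a Markov model, and the original problem is transformed into the optimization of an equivalent Markov decision process. The concept of the performance potential then reveals useful information. For the upper-level optimization problem of the JLQ model, we propose an expression for the performance potential of the modes, which makes it possible to solve the hierarchical control problem of JLQ systems that traditional methods cannot handle. Using the idea of time aggregation, we give, for the first time, an optimal control model for Lebesgue sampling systems and propose both analytical and sample-path-based algorithms; the time-aggregation idea can also be applied to an engineering system, the furnace heating process. Using the performance gradient method, we study the constrained optimal control problem and propose an on-line learning optimization algorithm.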
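To make the role of the performance potential in policy iteration concrete, the following is a minimal sketch on a finite, ergodic, average-reward Markov decision process. It is not the thesis's continuous-state algorithm: the finite state space, the reward-maximization convention, the function names (`stationary_distribution`, `potentials`, `policy_iteration`) and the toy numbers are all assumptions made for illustration. The potential `g` is taken to solve the Poisson equation `(I - P) g + eta * e = f` with the normalization `pi @ g == eta`.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Performance potential g and average reward eta for chain P with reward f."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    eta = float(pi @ f)
    # g = (I - P + e pi)^{-1} f satisfies (I - P) g + eta e = f.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return g, eta

def policy_iteration(P_a, f_a, max_iter=100):
    """P_a has shape (actions, states, states); f_a has shape (actions, states)."""
    n_actions, n_states, _ = P_a.shape
    idx = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        P = P_a[policy, idx]          # row i is the transition row of action policy[i]
        f = f_a[policy, idx]
        g, eta = potentials(P, f)
        q = f_a + P_a @ g             # improvement criterion f(i, a) + sum_j p(j | i, a) g(j)
        best = q.argmax(axis=0)
        # Switch actions only where the improvement is strict, to avoid cycling.
        improve = q[best, idx] > q[policy, idx] + 1e-10
        if not improve.any():
            return policy, eta, g
        policy = np.where(improve, best, policy)
    return policy, eta, g

# Hypothetical toy problem: 2 actions, 2 states, all transition rows strictly positive.
P_a = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.2, 0.8], [0.1, 0.9]]])
f_a = np.array([[1.0, 0.0],
                [0.0, 2.0]])
policy, eta, g = policy_iteration(P_a, f_a)
print("policy:", policy, "average reward:", eta)
```

Each iteration needs only the potentials of the current policy, which is why estimating `g` from a sample path is enough to drive an on-line variant of the same loop.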
The thesis considers the learning and optimization problem of dynamic control systems, using performance potential theory and approaches. Based on the core concept of the potential, many research directions and results on learning and optimization can be unified. New theory and algorithms can be developed from the viewpoint of the potential. Traditional approaches to optimal control problems can handle only special systems; for general cases, there is no simple way to address them. We apply the theory of learning and optimization to optimal control problems and obtain some important results that cannot be obtained by traditional approaches.
     First, we extend the potential theory to continuous state spaces, building a connection between dynamic systems and Markov systems. Second, we derive the performance potential of dynamic control systems. With the core concept of the potential in hand, approaches from learning and optimization, e.g. policy iteration and reinforcement learning, can be applied to control problems to find the optimal control policy. The potential theory has two advantages: it retrieves structural information for optimization, and it makes on-line learning algorithms easy to design.
     The thesis considers three classes of problems: the two-level control problem of the jump linear quadratic (JLQ) model, the event-based control problem, and the constrained control problem. We formulate each class with a Markov model and construct the equivalent Markov decision process to optimize system performance. We propose the potential of high-level modes for the JLQ system and solve the two-level control problem. With time aggregation, we formulate the optimal control problem of the Lebesgue sampling system for the first time and solve it with both analytical and sample-path-based algorithms. With the performance gradient, we provide an on-line learning algorithm for the constrained control problem.
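The performance gradient mentioned above can be illustrated with the standard potential-based derivative formula for a finite ergodic chain. This is a sketch under assumptions rather than the thesis's constrained-control algorithm: moving from `(P, f)` toward `(P_new, f_new)` along `P + delta * (P_new - P)` and `f + delta * (f_new - f)`, the derivative of the average reward at `delta = 0` is `pi @ ((P_new - P) @ g + (f_new - f))`. The function names and the toy matrices are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return float(stationary_distribution(P) @ f)

def average_reward_gradient(P, f, P_new, f_new):
    """Directional derivative of eta from (P, f) toward (P_new, f_new) at delta = 0."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    # Potential of the current chain: g = (I - P + e pi)^{-1} f.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return float(pi @ ((P_new - P) @ g + (f_new - f)))

# Hypothetical toy data: current and perturbed 3-state chains and rewards.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 0.0, 2.0])
P_new = np.array([[0.4, 0.4, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.3, 0.2, 0.5]])
f_new = np.array([1.5, 0.0, 1.8])

grad = average_reward_gradient(P, f, P_new, f_new)

# Finite-difference check along the same direction.
delta = 1e-6
fd = (average_reward(P + delta * (P_new - P), f + delta * (f_new - f))
      - average_reward(P, f)) / delta
print("formula:", grad, "finite difference:", fd)
```

Because the formula uses only quantities of the current chain (`pi` and `g`), both of which can be estimated from one sample path, it is a natural starting point for on-line gradient-based learning.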
