Research and Implementation of a Job Management System Based on JobFlow
Abstract
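Job management is an important concept. Its purpose is to strengthen the batch-processing capability of the operating system by providing mechanisms for job submission, scheduling, execution, and control, so that system resources are used more effectively, network load is balanced, and overall system performance improves. In November 2000 the author joined the Software Engineering Center of Northwestern Polytechnical University and took part in an international cooperative project: the design and development of the server-side system software of a job management system. More than two years of this practice gave the author a fairly deep understanding of the architecture and implementation methods of job management systems. Traditional job management systems take the job as the basic unit of management and scheduling, yet in practical applications the management and scheduling of JobFlows matters more. To address this, the author extends the traditional job management system from the JobFlow point of view, proposes a job management system based on JobFlow, and studies and analyzes its architecture and implementation methods in detail. This thesis reports the author's main research results since 2000.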
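     The JobFlow-based job management system follows a typical client/server model and is implemented as a three-layer architecture of client, communication agent, and server. The client provides a full GUI through which users manage JobFlows and administer the system. The communication agent uses TCP/IP together with a platform-independent data communication protocol to carry all communication between client and server. The server provides full support for JobFlows: in the upper layer, the JobFlow definition subsystem and the JobFlow engine define and execute JobFlows; in the lower layer, the Network Queueing System (NQS) schedules and executes jobs, so that users can make the fullest use of the resources in the computer network, raising execution efficiency and lowering job cost.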
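     A JobFlow extends the concept of a job: it is a job execution sequence formed by organizing interrelated jobs according to their dependencies, and it is a good way to automate the job execution process. The JobFlow description language describes the various types of jobs and the dependencies among them; the described JobFlow forms a job network that is submitted to the system, where the server interprets, executes, and controls it.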
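     This thesis mainly presents the key technologies in the design and implementation of the JobFlow-based job management system, with in-depth study of how JobFlows are defined, controlled, and executed. The main work and results are: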
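     ● The architecture of the job management system is analyzed comprehensively, and the function of each component of the framework is described in detail. From the JobFlow point of view, a three-layer architecture model for the job management system server is proposed.
     ● Within the synchronous execution model of JobFlows, two new job types, the "Send-Event Job" and the "Receive-Event Job", are proposed and implemented; they synchronize the execution of JobFlows across different machines and heterogeneous platforms.
     ● To address problems in the current JobFlow description process, a new language, the JobFlow Description Language, is proposed. It is platform independent and can describe complex JobFlows.
     ● Based on the DAG model of JobFlows, a static scheduling algorithm based on the job network is proposed; it extends the dynamic load-balancing algorithm and is used for static load balancing of job networks.
     ● The state transitions of requests in queues are analyzed in depth, and the state-transition patterns of requests in pipe queues and batch queues are outlined.

    Northwestern Polytechnical University Master's Thesis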
Job management is an important concept: it strengthens the batch processing of the operating system by providing mechanisms for submitting, scheduling, executing, and controlling jobs. System resources can thereby be used effectively, network load balanced, and system performance improved. The author joined the Software Engineering Center of NPU in November 2000 and took part in the server design and development of an international cooperative project, the Job Management System. After two years of this practice, the author has gained a deep understanding of the architecture and implementation methods of the Job Management System. The traditional Job Management System treats the job as the basic unit of management and scheduling, but in real applications the management and scheduling of JobFlows are more important. The author therefore extends the traditional Job Management System from the viewpoint of JobFlow, giving the Job Management System based on JobFlow. The architecture and implementation methods of this new system are studied and analyzed, and this thesis reports the author's main research results since 2000.
    The Job Management System based on JobFlow has a typical client/server (C/S) structure, realized as a three-layer architecture of Client, Communication Agent, and Server. The Client supplies a complete graphical user interface for managing JobFlows and the system. The Communication Agent handles all communication between client and server using the TCP/IP network protocol and a platform-independent data communication protocol. The Server provides full support for JobFlow: the upper layer consists of the JobFlow Definition Subsystem and JnwEngine, which are responsible for defining and executing JobFlows; the lower layer is the Network Queue System, which schedules and executes jobs, so that all kinds of resources in the computer network can be used to improve execution efficiency and reduce job cost.
    A JobFlow is a conceptual extension of the job. It is a job execution sequence in which related jobs are organized according to their dependencies, and it is a good solution for automating the job execution flow. The JobFlow description language describes all kinds of jobs and the dependencies among them; the described JobFlow is formed into a JobNetwork, which is submitted to the system to be interpreted, executed, and controlled by the server.
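As a rough illustration of a JobFlow as a set of jobs ordered by their dependencies, the sketch below groups jobs into waves that could run once their predecessors have finished. It is not the thesis's JobFlow description language or engine: the job names and the `execution_waves` helper are hypothetical, and Python's standard `graphlib` module stands in for the server's dependency handling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical JobFlow: each job maps to the set of jobs it depends on.
job_flow = {
    "extract": set(),
    "transform": {"extract"},
    "report": {"transform"},
    "archive": {"extract"},
}

def execution_waves(flow):
    """Group jobs into waves; every job in a wave has all its dependencies done."""
    sorter = TopologicalSorter(flow)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose predecessors have finished
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as executed
    return waves

waves = execution_waves(job_flow)
```

Jobs in the same wave have no dependency on one another, so a scheduler would be free to dispatch them to different queues or machines.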
    This thesis mainly introduces the key technologies in the design and implementation of the Job Management System based on JobFlow, with emphasis on the definition method, control, and execution of JobFlows. The main research results include:
    · The architecture of the Job Management System is analyzed in full, and every component of the framework is described in detail. From the viewpoint of JobFlow, a three-layer architecture model of the Job Management System Server is proposed.
    · In the synchronous execution model of JobFlow, two new job types, the "Send-Event Job" and the "Receive-Event Job", are proposed and implemented, so that JobFlows on different computers and different platforms can be executed in synchronization.
    · To address existing problems in the JobFlow description process, a new language, the JobFlow Description Language, is proposed. The language is platform independent and can describe complex JobFlows.
    · Based on the DAG model of JobFlow, a new static scheduling algorithm is presented that balances the network workload statically, as an extension of the dynamic load balancing mechanism.
    · The state transitions of requests in queues are analyzed in depth, and the state-transition patterns of requests in pipe queues and batch queues are sketched.
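The abstract does not give the details of the static scheduling algorithm, so the following is only a generic greedy list-scheduling sketch over a job DAG, not the author's method. It assumes known cost estimates per job and identical machines; `static_schedule`, the job names, the costs, and the machine names are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def static_schedule(costs, deps, machines):
    """Greedy list scheduling: put each job on the machine where it can start earliest."""
    free_at = {m: 0 for m in machines}  # time at which each machine becomes idle
    finish = {}                         # job -> finish time
    placement = {}                      # job -> chosen machine
    for job in TopologicalSorter(deps).static_order():
        # A job may start only after all of its predecessors have finished.
        ready = max((finish[d] for d in deps.get(job, ())), default=0)
        machine = min(machines, key=lambda m: max(free_at[m], ready))
        start = max(free_at[machine], ready)
        finish[job] = start + costs[job]
        free_at[machine] = finish[job]
        placement[job] = machine
    return placement, finish

# Hypothetical job network: b and c both depend on a, and d depends on b and c.
costs = {"a": 2, "b": 3, "c": 1, "d": 2}
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
placement, finish = static_schedule(costs, deps, ["m1", "m2"])
```

Because the whole assignment is computed before any job runs, this is static balancing; a dynamic scheme would instead pick machines at dispatch time from the observed load.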
