Research on the Resource Retrieval for Peer to Peer Networks

English title: Research on the Resource Retrieval for Peer to Peer Networks
Author: Jie Xu (徐婕)
Degree level: Doctoral
Discipline: Computer Architecture
Keywords: Peer-to-peer network; Topology; Keyword-based retrieval; Full-text retrieval; Distributed hash table; Small-world model; Vector space; Dimensionality reduction
Degree year: 2007
Advisor: Hai Jin (金海)
Discipline code: 081201
Degree-granting institution: Huazhong University of Science and Technology
Thesis submission date: 2007-06-03

Abstract

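In recent years, with the rapid growth of the Internet, the multiplication of network bandwidth, and large gains in computing power, peer-to-peer (P2P) networks have attracted increasing attention from both industry and academia. A P2P network makes idle CPU capacity, disk space, and network bandwidth available among different nodes in a peer-based, distributed manner. Apart from centralized P2P networks that rely on a central index server, P2P networks can be classified by topology into unstructured P2P networks and structured P2P networks based on distributed hash tables (DHTs). As with any large-scale distributed system, the success of a P2P system depends not only on a sound and effective network structure but also, to a large extent, on the flexibility and scalability of its resource search strategy.
     Resource search strategies for P2P networks fall into two types: keyword-based search and full-text search. Unstructured P2P networks use a flooding-like blind search; although this supports both types of query, its search efficiency and scalability are low. Structured P2P networks look up documents by identifier and achieve high scalability and lookup efficiency, but because they index by the hash value of the content, the index is unrelated to the content's semantics, so true full-text retrieval is not possible.
     In research on P2P resource search, how the logical topology is organized and where data indexes are placed are two important topics.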
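     In the design of the keyword-based P2P platform, the network proximity of nodes is considered first. Nodes that are physically close are grouped into the same node class, so that most routing between nearby nodes completes within a group. This avoids the detour problem found in networks such as Chord, lowers routing time, and reduces the number of messages sent; on this basis, a method that uses a Landmark strategy to partition nodes into classes is proposed. Next, because network workloads exhibit spatial and temporal locality, and because users tend to look for resources they are interested in, which usually belong to the same category, data indexes are placed according to data category so that indexes for the same category sit on nearby nodes; a class lookup table then locates data of a given category quickly. In addition, because the small-world phenomenon is widespread in networks, a non-deterministic caching strategy is applied to the routing table together with an SW (Small World) cache replacement strategy, so that the P2P network gradually converges to the small-world model. The concluding theoretical analysis and performance evaluation show that this strategy improves lookup performance and reduces maintenance overhead.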
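The abstract does not spell out how the Landmark strategy forms node classes. As a minimal sketch, assuming one common form of landmark binning (each node measures its round-trip time to a fixed set of landmarks, and nodes that rank the landmarks in the same order share a class), the grouping step could look like the following. The names `landmark_bin` and `cluster_nodes` and the RTT values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def landmark_bin(rtts):
    """Return the landmark indices ordered by increasing round-trip time.

    Nodes that produce the same ordering are assumed to be close to each
    other in the physical network (an assumption of this sketch).
    """
    return tuple(sorted(range(len(rtts)), key=lambda i: rtts[i]))


def cluster_nodes(measurements):
    """Group node names by their landmark ordering."""
    clusters = defaultdict(list)
    for node, rtts in measurements.items():
        clusters[landmark_bin(rtts)].append(node)
    return dict(clusters)


# Hypothetical RTTs (ms) from three nodes to three landmarks.
measurements = {
    "n1": [10.0, 80.0, 150.0],
    "n2": [12.0, 75.0, 160.0],
    "n3": [140.0, 90.0, 20.0],
}
clusters = cluster_nodes(measurements)
```

Here `n1` and `n2` rank the landmarks identically and land in one class, while `n3` forms its own; routing between `n1` and `n2` would then stay inside a single group.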
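     In the design of the full-text P2P platform, the placement and location of data indexes are considered first. A balanced tree (the document clustering tree) organizes the shared data in the P2P environment; adjusting the balance factor controls and reduces the time complexity of document search. A simple placement strategy for tree nodes is then given, which ensures both load balance and fault tolerance. Next, the TRES-CORE query strategy is proposed, under which each query operates on each node only once; this lowers query time in the distributed environment and avoids routing detours during a query. In addition, extracting full-text data indexes with the Vector Space Model (VSM) typically yields thousands or tens of thousands of keywords, which correspond to a feature space with as many dimensions. Such high-dimensional feature sets are very harmful to building resource indexes. A document-space dimensionality reduction technique based on rough sets is therefore proposed to improve search performance.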
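To make the vector space model concrete, the sketch below represents each document as a raw term-frequency vector and ranks documents by cosine similarity to a query. It is an illustration only: the thesis's actual weighting scheme is not stated in the abstract, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a sparse term-frequency vector from whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document names with non-zero similarity, best match first."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(text)), name)
              for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical shared documents.
docs = {
    "d1": "chord distributed hash table lookup",
    "d2": "gnutella flooding search",
    "d3": "hash table routing",
}
ranking = rank("hash table", docs)
```

Every distinct word adds one dimension to the vectors, which is why a real corpus quickly reaches the very high dimensionality that the abstract identifies as a problem.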
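     The organization of the logical topology is the other important topic in the design of the full-text P2P platform. First, a general layered model for P2P resource retrieval is given, and the interfaces between layers are designed. Layering makes the model adaptable: when the strategy of one layer changes, the strategies of the other layers can stay the same. Then, with the goals of good scalability, good query results, no system bottleneck, and support for full-text retrieval, a semi-structured hybrid model is proposed. In this model, all nodes are divided into several node classes according to physical location. Each node class contains one super peer (SP) and several ordinary peers (OP). The super peers are organized as a distributed hash table (DHT), and all nodes within a node class are organized in an unstructured manner. Here, "semi-structured" means that structured and unstructured organization coexist in the system; "hybrid" means that super peers still exist, but an SP records classification information for nodes and documents and does not maintain index information for all shared documents in its node class, which partly solves the problem of the SP being a system bottleneck. The final experiments show that the design supports full-text search while also improving resource search efficiency to some extent.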
With the rapid growth of the Internet and of computing power, peer-to-peer (P2P) systems have gained much attention from both industry and academia. P2P systems share idle CPU power, free disk space, and network bandwidth among peer nodes in a distributed and equal way. Except for centralized systems based on an index server, P2P systems can be roughly classified into two categories: unstructured P2P systems and DHT-based structured P2P systems. As with any heavily used large distributed system, the effectiveness of a P2P system depends largely not only on its topology but also on the versatility and scalability of its retrieval mechanism.
     Resource retrieval mechanisms for P2P systems can be classified into two categories: keyword-based retrieval and full-text retrieval. Retrieval in unstructured P2P systems is inherently blind, which makes search inefficient and unscalable. Structured P2P networks provide search efficiency and scalability by using an identifier-based retrieval mechanism, but they fail to support the flexible full-text retrieval that unstructured P2P systems can provide.
     Research on resource retrieval in P2P systems has two important topics: the logical topology of the network and the placement of the resource index.
     For the P2P system that supports keyword-based retrieval, a Landmark scheme is first proposed to group all peer nodes into several clusters based on the physical topology of the network, so that peer nodes in the same cluster have low link latency to each other and peer nodes in different clusters have high link latency. This guarantees that most routing stays within one cluster, which avoids the "reroute" problem of the Chord system and reduces both routing time and the number of messages. Next, because P2P workloads clearly exhibit temporal and spatial locality, as web traffic does, and because users tend to retrieve data of the kinds they are interested in, the resource index is stored according to resource semantics, so that resources of the same kind are placed in the same cluster. A class cache table is then used to cache the identifier of each recently searched kind together with the identifier of the peer node that stores resources of that kind. If a resource of the same kind is searched for again, the cached information can be used directly. Lastly, because the small-world phenomenon has been observed to be pervasive in networks, a non-deterministic caching scheme is given to reduce the maintenance cost of updating the routing cache table, and an SW cache replacement scheme based on the small-world paradigm is proposed in place of the traditional LRU scheme to further improve object lookup performance. Both theoretical analysis and simulations show that this scheme improves lookup performance and reduces maintenance cost for the same routing table size.
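The abstract names the SW cache replacement scheme but does not define it. The sketch below is therefore only one plausible reading, not the thesis's algorithm: when the routing cache overflows, a victim is chosen at random with probability proportional to its ring distance from the owner, so nearby contacts tend to survive while a few distant ones are occasionally kept as long-range shortcuts. The class name, the ring-based identifier space, and the eviction weights are all assumptions of this sketch.

```python
import random


def ring_distance(a, b, size):
    """Shortest distance between two identifiers on a ring of `size` slots."""
    d = abs(a - b) % size
    return min(d, size - d)


class SmallWorldCache:
    """Routing cache biased toward near contacts plus a few far shortcuts."""

    def __init__(self, owner_id, capacity, ring_size, rng=None):
        self.owner_id = owner_id
        self.capacity = capacity
        self.ring_size = ring_size
        self.rng = rng or random.Random()
        self.entries = set()

    def insert(self, node_id):
        """Add a contact; evict one entry if the cache overflows."""
        if node_id == self.owner_id or node_id in self.entries:
            return
        self.entries.add(node_id)
        if len(self.entries) > self.capacity:
            self._evict()

    def _evict(self):
        # Assumption: eviction probability grows linearly with distance.
        candidates = sorted(self.entries)
        weights = [ring_distance(self.owner_id, c, self.ring_size)
                   for c in candidates]
        victim = self.rng.choices(candidates, weights=weights, k=1)[0]
        self.entries.remove(victim)


cache = SmallWorldCache(owner_id=0, capacity=8, ring_size=64,
                        rng=random.Random(7))
for node_id in range(0, 51):
    cache.insert(node_id)
```

Unlike LRU, the decision here ignores recency entirely; a production scheme would likely combine both signals.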
     For the P2P system that supports full-text retrieval, the placement of the resource index is considered first. A height-balanced tree structure, DOC-Tree, is proposed to organize data objects in vector format in the P2P system, which reduces the time complexity of search. A simple strategy for placing the tree's nodes is given, which guarantees both load balance and fault tolerance. After that, the TRES-CORE search scheme is used to reduce search time in the distributed environment. The resource index is extracted using the vector space model, which results in hundreds or thousands of dimensions in the resource vector space, so a dimension reduction technique based on rough sets is presented to improve the efficiency of the search mechanism.
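The abstract does not give the rough-set reduction algorithm itself. As a hedged illustration of the underlying idea, the sketch below drops any attribute (term) whose removal leaves the indiscernibility partition of the documents unchanged, which is the defining property of a rough-set reduct. The greedy order, the function names, and the toy term-presence table are assumptions made for this example.

```python
def partition(rows, attrs):
    """Group row indices that are indiscernible on the given attributes."""
    groups = {}
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(idx)
    return sorted(groups.values())


def greedy_reduct(rows, attrs):
    """Drop attributes one by one while the partition stays the same.

    The result distinguishes the rows exactly as well as the full
    attribute set does, but it is not guaranteed to be a minimum reduct.
    """
    kept = list(attrs)
    full = partition(rows, attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if trial and partition(rows, trial) == full:
            kept = trial
    return kept


# Hypothetical term-presence table: one row per document.
rows = [
    {"p2p": 1, "dht": 1, "peer": 1},
    {"p2p": 1, "dht": 0, "peer": 1},
    {"p2p": 0, "dht": 0, "peer": 0},
]
reduct = greedy_reduct(rows, ["p2p", "dht", "peer"])
```

In this toy table the term `p2p` is redundant given `dht` and `peer`, so the document space shrinks from three dimensions to two without losing the ability to tell the documents apart.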
     The logical topology of the network is the other research topic for the P2P system that supports full-text retrieval. First, a general hierarchical model for resource retrieval in P2P systems is presented, together with the interfaces between its levels. Then, a semi-structured and hybrid logical network structure (SSH) is proposed, which gives the system good scalability and the search mechanism good efficiency, while also avoiding a system bottleneck and supporting full-text retrieval. In the SSH model, all peer nodes are partitioned into several peer clusters according to their physical locations. Each peer cluster contains one super peer (SP) and several ordinary peers (OP). Super peers are organized as a Distributed Hash Table (DHT), and the peer nodes within each cluster are organized in an unstructured way. Finally, experimental results show that this design improves resource retrieval in the P2P system.
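The two-level lookup implied by the SSH description can be sketched as follows: a structured step selects a super peer on a hash ring, and an unstructured step scans the ordinary peers of that super peer's cluster. How the thesis maps a query to a super peer is not stated in the abstract; hashing a category name onto the ring, as done here, is an assumption, as are all class, function, and peer names.

```python
import hashlib
from bisect import bisect_left


def ring_hash(key, bits=16):
    """Map a string onto a small identifier ring using SHA-1."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << bits)


class SuperPeerRing:
    """Structured layer: super peers placed on a consistent-hashing ring."""

    def __init__(self, super_peers):
        self.ring = sorted((ring_hash(sp), sp) for sp in super_peers)

    def successor(self, key):
        """Return the first super peer at or after the key's ring position."""
        ids = [node_id for node_id, _ in self.ring]
        idx = bisect_left(ids, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]


def search(ring, clusters, category, term):
    """Route by category to a super peer, then scan its cluster for `term`."""
    sp = ring.successor(category)
    hits = []
    # Unstructured layer: every ordinary peer in the cluster is asked.
    for peer, documents in clusters[sp].items():
        hits.extend((peer, d) for d in documents if term in d.lower())
    return sp, hits


super_peers = ["sp-a", "sp-b", "sp-c"]
# Hypothetical clusters: each super peer fronts two ordinary peers.
clusters = {
    sp: {f"{sp}-op1": ["Chord lookup notes"], f"{sp}-op2": ["Gnutella trace"]}
    for sp in super_peers
}
ring = SuperPeerRing(super_peers)
sp, hits = search(ring, clusters, "dht", "chord")
```

The super peer in this sketch only decides which cluster to query and holds no document index, which mirrors the abstract's point that super peers keep classification information rather than a full index.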

