用户名: 密码: 验证码:
面向大图的可达性查询处理算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Efficient Processing Algorithm for Reachability Queries Based on Big Graph
  • 作者:陈子阳 ; 陈伟 ; 李娜 ; 周军锋
  • 英文作者:CHEN Zi-Yang;CHEN Wei;LI Na;ZHOU Jun-Feng;School of Information Management,Shanghai Lixin University of Accounting and Finance;School of Information Science and Engineering,Yanshan University;Department of Information Engineering,Hebei University of Environmental Engineering;School of Computer Science and Technology,Donghua University;
  • 关键词:大图 ; 有向无环图 ; 可达性查询处理 ; 最优生成树
  • 英文关键词:big graph;;directed acyclic graph;;reachability query processing;;optimal spanning tree
  • 中文刊名:JSJX
  • 英文刊名:Chinese Journal of Computers
  • 机构:上海立信会计金融学院信息管理学院;燕山大学信息科学与工程学院;河北环境工程学院信息工程系;东华大学计算机科学与技术学院;
  • 出版日期:2017-08-19 00:16
  • 出版单位:计算机学报
  • 年:2019
  • 期:v.42;No.435
  • 基金:国家自然科学基金(61472339,61572421,61272124)资助~~
  • 语种:中文;
  • 页:JSJX201903008
  • 页数:14
  • CN:03
  • ISSN:11-1826/TP
  • 分类号:132-145
摘要
图的可达性查询处理是生物信息领域的热点问题之一,用于测定蛋白质交互网络中任意两个蛋白质分子间是否存在交互作用.针对已有在可达查询比例增大时在线搜索算法效率下降明显及性能不稳定的问题,提出优化的OPT-R算法.首先,提出最优生成树的概念,使得采用最优生成树的OPT-R算法可以在常量时间回答更多的可达查询;同时提出基于栈的互逆拓扑顺序,使得OPT-R可以在常量时间回答更多的不可达查询.作者还提出相应的最优生成树及互逆拓扑顺序生成算法,并通过实验对基于20个不同规模的真实数据集从不同角度对算法的高效性进行了验证.
        Given a directed graph,a reachability query asks whether there exists a path from a node to another one in the directed graph.Reachability query processing is one of the fundamental operations in graph data processing,and is one of the hot issues in the field of bioinformatics,which focuses on the study of the interaction between any two proteins in the protein interaction network.Considering that each node can reach every other node in a strongly connected component(SCC)and all SCCs can be identified in linear time,existing algorithms always assume that the input graph is a directed acyclic graph(DAG).The two kinds of extreme methods to answer a reachability query are(a)computing transitive closure and answering the query in constant time by affording expensive storing cost,and(b)traversing the DAG to get the final answer without any pre-processing cost of index construction.With the increase of graph data,the two kinds of extreme methods are not scalable in practice.Existing methods always make trade-off between the two extremes,such that to accelerate index construction,reduce the index size,and improve the performance of query processing as more as possible.Existing methods can be generally classified into two categories:(1)the Label-Only methods,which assign each node a label to store the complete reachable information,such that when answering the given query,we do not need to traverse the input graph,(2)the Online-Search methods,which also assign each node a label,but only store partial reachable information to achieve fast index construction speed.However,we may need to traverse the input graph when we cannot get the final answer by the node labels.As a comparison,the Label-Only methods are not scalable for big graphs,and by designing efficient pruning techniques,the Online-Search methods can achieve similar query performance,and therefore are more applicable in practice.However,the current Online-Search methods are not efficient when processing reachable queries.In fact,many reachability queries will return positive result,and the pruning techniques of existing Online-Search methods focus on answering negative queries,they may not be efficient for reachable queries.Considering that existing algorithms on reachability query processing are inefficient and unstable when the ratio of reachable queries increases,we propose an optimized algorithm,namely OPT-R,to accelerate reachability query processing.To make OPT-R work,we first propose optimal spanning tree,which contains the maximum number of reachable queries compared with other spanning trees,such that OPT-R can answer more reachable queries in constant time on average.The main idea behind the optimal spanning tree is that to make a spanning tree covering the maximal number of reachable node pairs,each node in the spanning tree should have the most number of ancestors.Based on this idea,we take the topological level of each node as its tree height,and construct a spanning tree in linear time,for which each node satisfies that its tree height equals its topological level in the given graph.The optimal spanning tree is used to accelerate testing of reachable queries.To accelerate the testing of unreachable queries,we propose the DFS based topological order,namely ST-order,and the reverse ST-order to help OPT-R answer more unreachable queries in constant time.We further propose efficient index construction algorithms to get the optimal spanning tree and the two ST-orders that are reverse to each other.We conduct extensive experimental study based on 20 real datasets.The experimental results show that OPT-R is more efficient in index construction and query processing.
引文
[1]Zhang Yu,Liu Yan-Bing,Xiong Gang,et al.Survey on succinct representation of graph data.Journal of Software,2014,25(9):1937-1952(in Chinese)(张宇,刘燕兵,熊刚等.图数据表示与压缩技术综述.软件学报,2014,25(9):1937-1952)
    [2]Ding Yue,Zhang Yang,Li Zhan-Huai,Wang Yong.Research and advances on graph data mining.Journal of Computer Applications,2012,32(1):182-190(in Chinese)(丁悦,张阳,李战怀,王勇.图数据挖掘技术的研究与进展.计算机应用,2012,32(1):182-190)
    [3]Chen L,Gupta A,Kurul M E.Stack-based algorithms for pattern matching on DAGs//Proceedings of the International Conference on Very Large Data Bases(VLDB).Trondheim,Norway,2005:493-504
    [4]Cheng J F,Yu J X,Ding B L.Cost-based query optimization for multi reachability joins//Proceedings of the 12th International Conference on Database Systems for Advanced Applications(DASFAA).Bangkok,Thailand,2007:18-30
    [5]Tarjan R E.Depth-first search and linear graph algorithms.SIAM Journal on Computing,1972,1(2):146-160
    [6]Yildirim H,Chaoji V,Zaki M J.GRAIL:Scalable reachability index for large graphs.The Proceedings of the VLDB Endowment(PVLDB),2010,3(1):276-284
    [7]Veloso R R,Cerf L,Junior W M,Zaki M J.Reachability queries in very large graphs:A fast refined online search approach//Proceedings of the 17th International Conference on Extending Database Technology(EDBT).Athens,Greece,2014:511-522
    [8]Jagadish H V.A compression technique to materialize transitive closure.ACM Transactions on Database Systems,1990,15(4):558-598
    [9]Chen Y J,Chen Y B.An efficient algorithm for answering graph reachability queries//Proceedings of the IEEE International Conference on Data Engineering(ICDE).Cancun,Mexico,2008:893-902
    [10]Jin R M,Xiang Y,Ruan N,Wang H X.Efficiently answering reachability queries on very large directed graphs//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Vancouver,Canada,2008:595-608
    [11]Jin R M,Ruan N,Xiang Y,Wang H X.Path-tree:An efficient reachability indexing scheme for large directed graphs.ACM Transactions on Database Systems,2011,36(1):7:1-7:44
    [12]Agrawal R,Borgida A,Jagadish H V.Efficient management of transitive relationships in large sata and knowledge bases//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Portland,USA,1989:253-262
    [13]Wang H X,He H,Yang J,et al.Dual labeling:Answering graph reachability queries in constant time//Proceedings of the IEEE International Conference on Data Engineering(ICDE).Atlanta,USA,2006:75
    [14]Cheng J F,Yu J X,Lin X M,et al.Fast computation of reachability labeling for large graphs//Proceedings of the10th International Conference on Extending Database Technology(EDBT).Munich,Germany,2006:961-979
    [15]Jin R M,Xiang Y,Ruan N,Fuhry D.3-HOP:A highcompression indexing scheme for reachability query//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Providence,USA,2009:813-826
    [16]Cheng J,Huang S L,Wu H X,Fu W C.TF-Label:Atopological-folding labeling scheme for reachability querying in a large graph//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).New York,USA,2013:193-204
    [17]Jin R M,Wang G.Simple,fast,and scalable reachability oracle.The Proceedings of the VLDB Endowment,2013,6(14):1978-1989
    [18]Yano Y,Akiba T,Iwata Y,Yoshida Y.Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths//Proceedings of the Conference on Information and Knowledge Management(CIKM).San Francisco,USA,2013:1601-1606
    [19]Schaik S J V,Moor O D.A memory efficient reachability data structure through bit vector compression//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Athens,Greece,2011:913-924
    [20]Chen Y J,Chen Y B.Decomposing DAGs into spanning trees:A new way to compress transitive closures//Proceedings of the IEEE International Conference on Data Engineering(ICDE).Hannover,Germany,2011:1007-1018
    [21]Silke T,Leser U.Fast and practical indexing and querying of very large graphs//Proceedings of the ACM SIGMODInternational Conference on Management of Data(SIGMOD).Beijing,China,2007:845-856
    [22]Yildirim H,Chaoji V,Zaki M J.GRAIL:A scalable index for reachability queries in very large graphs.The International Journal on Very Large Data Bases,2012,21(4):509-534
    [23]Seufert S,Anand A,Bedathur S J,Weikum G.FERRARI:Flexible and efficient reachability range assignment for graph indexing//Proceedings of the IEEE International Conference on Data Engineering(ICDE).Brisbane,Australia,2013:1009-1020
    [24]Wei H,Yu J X,Lu C,Jin R M.Reachability querying:An independent permutation labeling approach.The Proceedings of the VLDB Endowment,2014,7(12):1191-1202
    [25]Jin R M,Ruan N,Dey S,Yu J X.SCARAB:Scaling reachability computation on large graphs//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Scottsdale,USA,2012:169-180
    [26]Cheng J,Shang Z C,Cheng H,et al.K-Reach:Who is in your small world.The Proceedings of the VLDB Endowment,2012,5(11):1292-1303
    [27]Zhu A D,Lin W Q,Wang S B,Xiao X K.Reachability queries on large dynamic graphs:A total order approach//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Snowbird,USA,2014:1323-1334
    [28]Simon K.An improved algorithm for transitive closure on acyclic digraphs.Theoretical Computer Science,1988,58:325-346
    [29]Wu H H,Huang Y Z,Cheng J,et al.Reachability and time-based path queries in temporal graphs//Proceedings of the IEEE International Conference on Data Engineering(ICDE).Helsinki,Finland,2016:145-156
    [30]Gurajada S,Theobald M.Distributed set reachability//Proceedings of the ACM SIGMOD International Conference on Management of Data(SIGMOD).Francisco,USA,2016:1247-1261
    [31]Cheng J,Shang Z C,Cheng H,et al.Efficient processing of k-hop reachability queries.The International Journal on Very Large Data Bases,2014,23(2):227-252
    [32]Zhou Jun-Feng,Chen Wei,Fei Chun-Pin,Chen Zi-Yang.BiRch:A bidirectional search algorithm for k-step reachability queries.Journal on Communications,2015,36(8):50-60(in Chinese)(周军锋,陈伟,费春苹,陈子阳.BiRch:一种处理k步可达性查询的双向搜索算法.通信学报,2015,36(8):50-60)
    [33]Jiang H F,Wang W,Lu H J,Yu J X.Holistic twig joins on indexed XML documents//Proceedings of the 29th International Conference on Very Large Data Bases(VLDB).Berlin,Germany,2003:273-284
    [34]Evgenios M K,Ioannis G T.Overloaded orthogonal drawings//Proceedings of the Graph Drawing-19th International Symposium.Eindhoven,The Netherlands,2011:242-253
    [35]Boldi P,Santini M,Vigna S.A large time-aware web graph.SIGIR Forum,2008,42(2):33-38
    (1)ecocyc.org
    (2)www.geneontology.org
    (3)www.uniprot.org
    (4)arxiv.org
    (5)citeseerx.ist.psu.edu
    (6)snap.stanford.edu
    (7)dbpedia.org

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700