Cloud Storage Caching Strategy Based on File Correlation
  • English title: Cloud storage caching strategy based on file correlation
  • Authors: XIAO Fang (肖芳); ZHOU Ke (周可)
  • Keywords: cloud storage; caching strategy; hit rate; file correlation; file prefetching
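  • Journal code: HZLG
  • Journal: Journal of Huazhong University of Science and Technology (Natural Science Edition)
  • Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology; Library, Huazhong University of Science and Technology
  • Publication date: 2019-04-12 11:28
  • Published by: Journal of Huazhong University of Science and Technology (Natural Science Edition) (华中科技大学学报(自然科学版))
  • Year: 2019
  • Volume/number: v.47; No.436
  • Language: Chinese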
  • Article ID: HZLG201904001
  • Page count: 6
  • Issue: 04
  • CN: 42-1658/N
  • Pages: 6-11
Abstract
Based on an analysis of the long-tail distribution of cloud storage data access, a caching strategy named MSU (most similar unit) was designed around file correlation. MSU performs file prefetching and replacement in a large-capacity cache by judging how strongly files are correlated with one another. First, MSU takes multiple access features of each file as input to a cosine distance computation, which yields a measure of file correlation. Then, MSU maintains two file sets: the replacement candidate set, consisting of the files currently in the cache, and the prefetching candidate set, consisting of the files evicted from the cache within a recent time period. When a file misses, the k non-nearest neighbors of the missing file in the replacement candidate set (the cached files least correlated with it) are selected for replacement, and the 1-nearest neighbor of the missing file in the prefetching candidate set is prefetched. Simulation experiments show that MSU outperforms LRU (least recently used), ARC (adaptive replacement cache) and GDS (GreedyDual-Size) in both hit rate and byte hit rate.
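The miss-handling procedure summarized above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not name the access features, so the two-element feature vectors below are invented and assumed to be pre-normalized; the count-based cache capacity, the time window measured in requests, the rule that k equals the number of slots needed, and the names `MSUCacheSketch`, `request` and `cosine_similarity` are likewise assumptions made for the example. Ranking by cosine similarity is equivalent to ranking by cosine distance (1 minus similarity), only in the opposite order.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MSUCacheSketch:
    capacity: int   # max number of cached files (count-based: an assumption)
    window: int     # requests an evicted file stays a prefetch candidate (assumption)
    features: dict  # file id -> access-feature vector (hypothetical, pre-normalized)
    sizes: dict     # file id -> size in bytes, used only for the byte hit rate
    cache: set = field(default_factory=set)         # replacement candidate set
    evicted_at: dict = field(default_factory=dict)  # prefetch candidate set: id -> eviction tick
    clock: int = 0
    requests: int = 0
    hits: int = 0
    total_bytes: int = 0
    hit_bytes: int = 0

    def _sim(self, f, g):
        return cosine_similarity(self.features[f], self.features[g])

    def _evict_least_similar(self, missing, k):
        # Replace the k cached files least correlated with the missing file
        # (its k non-nearest neighbours) and remember when they left the cache.
        victims = sorted(self.cache, key=lambda g: self._sim(missing, g))[:k]
        for v in victims:
            self.cache.discard(v)
            self.evicted_at[v] = self.clock

    def request(self, f):
        """Serve one request; return True on a hit, False on a miss."""
        self.clock += 1
        self.requests += 1
        self.total_bytes += self.sizes[f]
        # Drop prefetch candidates that were evicted outside the time window.
        self.evicted_at = {g: t for g, t in self.evicted_at.items()
                           if self.clock - t <= self.window}
        if f in self.cache:
            self.hits += 1
            self.hit_bytes += self.sizes[f]
            return True
        # Miss: prefetch the 1-nearest neighbour of f among recently evicted files.
        candidates = [g for g in self.evicted_at if g != f]
        prefetch = max(candidates, key=lambda g: self._sim(f, g), default=None)
        incoming = ([f] + ([prefetch] if prefetch is not None else []))[:self.capacity]
        # Here k is simply the number of slots needed for the incoming files (assumption).
        k = len(self.cache) + len(incoming) - self.capacity
        if k > 0:
            self._evict_least_similar(f, k)
        for g in incoming:
            self.cache.add(g)
            self.evicted_at.pop(g, None)
        return False

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0

    def byte_hit_rate(self):
        return self.hit_bytes / self.total_bytes if self.total_bytes else 0.0


# Toy trace: files a/b and c/d have similar (hypothetical) access features.
feats = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.1, 0.9)}
sizes = {"a": 100, "b": 200, "c": 300, "d": 400}
msu = MSUCacheSketch(capacity=2, window=10, features=feats, sizes=sizes)
print([msu.request(f) for f in ["a", "c", "b", "d", "c", "a", "b"]])
# → [False, False, False, False, True, False, True]
print(msu.hit_rate(), msu.byte_hit_rate())
```

In this toy trace both hits come from prefetching: the miss on `d` pulls back its nearest evicted neighbour `c`, and the miss on `a` pulls back `b`. Sorting the whole cache on every miss costs O(n log n) per miss, so a genuinely large-capacity cache would need some similarity index; the abstract does not say how MSU handles this.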