用户名: 密码: 验证码:
SSD文件系统优化技术的研究与实现
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着计算机系统的发展,外部存储和CPU处理速度之间的差距越来越大,磁盘作为当前主流的外部存储设备是提高整个系统性能的一个瓶颈。和磁盘相比,基于Flash存储的固态盘SSD有很小的读延迟,快速随机读写访问,低功耗,可靠性高等优点,并且固态存储制造工艺将会越来越成熟,I/O性能也会越来越高,促使SSD逐步取代目前的HDD磁盘。
     SSD的技术特性和磁盘不同,当前文件系统的设计和优化假定的存储是磁盘,使得传统文件系统没有完全发挥SSD的优势。本文研究SSD文件系统优化技术,考虑如何充分利用SSD的技术特性提高系统性能,为将来实现面向SSD的高性能文件系统打下了基础。
     从两个层次对SSD的文件系统优化技术进行研究:一个层次是文件系统的缓冲机制。SSD读、写和擦除操作的I/O性能开销不对称,面向SSD的页面置换算法不仅要考虑命中率还要考虑失效开销,应该在保持一定命中率情况下尽可能延迟置换脏页减少开销较大的写失效,减少写回操作,提高I/O效率加速系统运行。我们提出了一个面向SSD的高效页面置换算法AML(Adaptively Mixed List),利用冷检测机制分别实现冷干净页优先置换和基于访问模式的自适应窗口机制。通过一个全面系统的trace-driven测试说明AML在各种应用下和其他算法相比写次数和运行开销都有较大的改善,比CFLRU算法平均快了14%。
     另一个是文件系统本身的优化技术包括块分配策略和延迟隐藏技术,其中延迟隐藏技术有空间的延迟分配,延迟写,文件预读等。SSD有高效的读性能,重点应该提高文件系统随机写性能,削弱写前擦除等特性的影响。通过对基本块分配策略的分析,提出了FBA(Flash-aware Block Allocation)块分配策略,根据大小对文件分类并分别采取不同的块分配方式,另外还涉及了文件系统理想块大小的选择。FBA减少了文件系统碎片,从而降低写操作引起的合并开销,再加上延迟隐藏技术中对空间延迟分配和延迟写的优化和改进,提高了文件系统的随机写性能。由于SSD可供隐藏的读延迟很小,预读机制影响有限,我们对传统文件系统的预读机制进行优化。最后,在传统文件系统上实现这些优化技术并进行测试,证明优化之后的文件系统性能确有提升,随机写的性能有11%的提高。
With the development of computer system, the performance gap between external storage devices and CPU is increasingly larger. Hard disk driver (HDD) as mainstream storage is the bottleneck to improve overall system. Comparing to HDD, the flash-based solid state disk (SDD) has the advantages of low read delay, fast random read and write, low power consumption and high reliability. Moreover, the manufacturing process of SSD will be more mature and I/O performance will be improving. Eventually, it is advised that SSD will replace the current HDD in the future.
     The SSD`s technical characteristics is different from HDD, and traditional file system design and optimization is on the assumption that storage system consists of HDDs, so the file system cannot fully play the advantages of SSD. An investigation and implementation of optimized file system technologies for SSD is made in the paper which focuses on how to take full advantage of the SSD. What we do lays the foundation for high-performance file system for SSD.
     The research is divided into two levels. First, it is from the view of file system buffer cache. SSD has asymmetric write/read operation. The overhead of a write operation is many times higher than that of a read operation. Hence, we must keep acceptable hit ratio and delay evicting the dirty pages as late as possible to let down the write miss ratio. The page replacement algorithm for SSD aims to reduce the write counts so that it can enhance the overall I/O performance of storage system. We suggest a novel algorithm called AML (Adaptively Mixed List) which prefers to evict the cold clean page and changes the window size adaptively according the history of I/O access pattern. Finally, we conduct an overall trace-driven simulation. The result shows AML reduces the runtime on average by 14% compared to CFLRU.
     Second, the attention is paid to the file system technologies themselves including block allocation scheme, allocation delay of requested space, write delay, read ahead and so on. Due to SSD’s high read performance and overwriting before erase, the optimization of random-write performance is the most important for it. Through the analysis of basic block allocation scheme in the traditional file system, we propose the flash-aware block allocation scheme called FBA (Flash-aware Block Allocation). FBA classifies files by size, and takes different approach according the classification. Besides, ideal file system block size is involved in the FBA. By reducing the file system fragmentation, FBA decrease the write overhead caused by the merger. Along with space allocation delay and write-delay optimization of hidden delay technology, random-write performance of file system is enhanced a lot. Since the SSD's read latency which can be hidden is very small and read-ahead mechanism has a limited impact on system performance, we make an optimization on it. Eventually, we achieve these optimized technologies in the traditional file system and conduct tests to demonstrate higher performance of the file system after optimization. The performance of random write improves at most by 11%.
引文
[1] Fred Douglis, Ramon Caceres, M. Frans Kaashoek, et al. Storage alternatives for mobile computers [C]. Operating Systems Design and Implementation, 1994, 25-37.
    [2] L. Han, Y. Ryu. Performance Comparison of File Systems on Flash Disk and Hard Disk [C]. Proceeding of Korea Multimedia Society Fall Conference, 2004.
    [3] R. Cáceres, F. Douglis, K. Li, B. Marsh. Operating System Implications of Solid-State Mobile Computers [C]. Proceeding of the 4th IEEE Workshop on Workstation Operating Systems, October 1993.
    [4] N. Agrawal, V. Prabhakaran, T. Wobber, et al. Design Tradeoffs for SSD Performance [C]. Proceeding of the 2008 USENIX Annual Technical Conference, June 22-27, 2008. 57–70.
    [5] D. Robb. Intel sees gold in solid state storage [EB/OL]. http://www.enterprisestorageforum.com/technology/article.php/3782826, 2008.
    [6] Feng Chen, David A. Koufaty, and Xiaodong Zhang. Understanding Intrinsic Characteristics and System Implications of Flash Memory based Solid State Drives [C]. SIGMETRICS/Performance’09, Seattle, WA, USA. June 15–19, 2009.
    [7] L. Bouganim, B. T. Jónsson, P. Bonnet.μFLIP: Understanding Flash IO Patterns [C]. Proceeding of the 4th Biennial Conference on Innovative Data Systems Research, 2009.
    [8] E. Gal, S. Toledo. Algorithms and data structures for Flash memories [C]. Computing Survey’05, 2005.
    [9] Flash File System [P]. US Patent 540,448. In Intel Corporation.
    [10] FTL Logger Exchanging Data with FTL Systems [R]. Intel Corporation.
    [11] Flash-memory Translation Layer for NAND Flash (NFTL) [S]. M-Systems, 1998.
    [12] Understanding the Flash Translation Layer (FTL) Specification [EB/OL], http://developer.intel.com/. Intel Corporation, Dec 1998.
    [13] L.P. Chang and T.W. Kuo. An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems[C]. IEEE Real-Time and Embedded Technology and Applications Symposium, 2002. 187-196.
    [14] L.P. Chang and T.W. Kuo. An Efficient Management Scheme for Large-Scale Flash-Memory Storage Systems [C]. ACM Symposium on Applied Computing (SAC), Mar 2004. 862-868.
    [15] J.W. Hsieh, L.-P. Chang and T.-W. Kuo. Efficient On-Line Identification of Hot Data for Flash-Memory Management [C]. Proceeding of the 2005 ACM symposium on Applied computing, Mar 2005. 838-842.
    [16] J. C. Sheng. An Active Space Recycling Mechanism for Flash Storage Systems inReal-Time Application Environment [C]. 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Application (RTCSA'05), 2005. 53-59.
    [17] D. Rich. Authentication in Transient Storage Device Attachments [J]. Computer, 2007, Vol. 40(4):102-104,
    [18] K. Bellam, A. Manzanares, Xiaojun Ruan et al.Improving reliability and energy efficiency of disk systems via utilization control [C]. Computers and communications 2008(ISCC2008). IEEE Symposium on Marrakech, Morocco. 2008, 462-467
    [19]李锴,杨长兴.最新SSD技术与PC存储系统结构改进的研究[J].电脑知识与技术,2007,03:762-763
    [20] M-Systems. Two technologies compared: NOR vs NAND [R]. In White Paper, 2003.
    [21] Investigating Flash memory wear levelling and execution modes [R].
    [22] Super Talent Technology, Inc. SLC vs. MLC: An Analysis of Flash Memory[R]. In SLC vs. MLC Whitepaper.
    [23] Soojun Im and Dongkun Shin. Storage Architecture and Software Support for SLC/MLC Combined Flash Memory [C]. SAC’09, March 8-12, 2009, Honolulu, Hawaii, U.S.A.
    [24] T. Cho et al. A dual-mode NAND flash memory: 1-Gb multilevel and high-performance 512-Mb single-level modes [J]. IEEE Journal of Solid-State Circuits, Vol. 36, Issue 11, 2001.
    [25] K9F2808U0B 16M*8 Bit NAND Flash Memory Data Sheet [S], Samsung Electronics, 2001.
    [26] NAND Flash Memory MLC [R], Spectek, 2003.
    [27] H. J. Choi, S. Lim, and K. H. Park. JFTL: a flash translation layer based on a journal remapping for flash memory[C]. In ACM Transactions on Storage, volume 4, Jan 2009.
    [28] T. Chung, D. Park, S. Park, et al. System software for flash memory [C]. In Proceeding of ICEUC’06, 2006.
    [29] Gupta, Y. Kim, and B. Urgaonkar. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings [C]. In Proceeding of ASPLOS’09, 2009.
    [30] J. Kang, H. Jo, J. Kim, and J. Lee. A superblock-based flash translation layer for NAND flash memory [C]. In Proceeding of ICES’06, 2006.
    [31] Kawaguchi, S. Nishioka, and H. Motoda. A flash-memory based file system [C]. In Proceeding of USENIX Winter, 1995.
    [32] J. Kim, J. M. Kim, S. H. Noh, et al. A space-efficient flash translation layer forcompact flash systems [C]. In IEEE Transactions on Consumer Electronics, 2002 Vol. 48(2):366-375,
    [33] S. Park. K-leveling: an efficient wear-leveling scheme for Flash memory [C]. In UKC05: Proceedings of The 2005 US-Korea Conference on Science, Technology, and Entrepreneurship, 2005.
    [34] L.P.Chang. An Efficient Wear-Leveling for Large-Scale Flash-Memory Storage Systems [C]. Proceedings of the 22nd ACM Symposium on Applied Computing. 2007.
    [35] Sandisk Flash Memory Cards Wear Leveling [EB/OL], http://www.sandisk.com/Assets/File/OEM/WhitePapersAndBrochures/RS-MMC/WPaperWearLevelv1.0.pdf, 2003.
    [36] Wear Leveling in Single Level Cell NAND Flash Memories [J], ST Microelectronics Application Note (AN1 822), 2006.
    [37] D.Jung, Y.Chae, H.Jo, et al. A Group-based Wear-Leveling Algorithm for Large-Capacity Flash Memory Storage Systems [C]. Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), ISBN 978-59593-826-8. September 2007.
    [38] L. P. Chang and T. W. Kuo. Real-time garbage collection for flash memory storage systems of real-time embedded systems [C]. ACM Transactions on Embedded Computing Systems, 2004, Vol. 3: 837-863.
    [39] M. L. Chiang and R. C. Chang. Cleaning policies in mobile computers using flash memory [J]. The Journal of Systems and Software, 1999, Vol. 48:213-231.
    [40] M. L. Chiang, P. C. H. Lee, and R. C. Chang. Using data clustering to improve cleaning performance for flash memory [J]. Software-Practice and Experience, 1999, Vol.29: 267-290.
    [41] M. Wu and W. Zwaenepoel. eNvy: A non-volatile, main memory storage system [C]. Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994, 86-97.
    [42] Ban.Flash file system [P]. United States Patent, No. 404, 485. 1995.
    [43] Amir Ban. Flash file system optimized for page-mode flash technologies [P]. United States Patent, No.5, 937, 42.1999
    [44] Bum Kim, Gui Lee. Method of driving remapping in flash memory and flash memory architecture suitable therefore [P]. United Stated Patent. No. 6, 381, 176. 2002.
    [45] S.W. Lee, D.J. Park, T.S. Chung, et al. A log buffer-based flash translation layer using fully-associative sector translation [C]. ACM Transactions on Embedded Computing Systems, 2007 Vol. 6(3).
    [46] S. Lee, D. Shin, Y. Kim, and J. Kim. Last: locality-aware sector translation for nandflash memory-based storage systems [C]. In Proceeding of IEEE International Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED08), 2008.
    [47] Cagdas Dirik, Bruce Jacob. The Performance of PC Solid State Disks (SSDs)as a Function of Bandwidth, Concurrency, Device Architecture, and System Organization [C]. Proceeding 36th International Symposium on Computer Architecture (ISCA 2009). Austin TX, June.2009.
    [48] H. Kim and S. Ahn. BPLRU: a buffer management scheme for improving random writes in Flash storage [C]. Proceeding of the 6th USENIX Conference on File and Storage Technologies, 2008
    [49] SSD [EB/OL]. www.superSSD.com
    [50] David Woodhouse. JFFS: The Journalling Flash File System [C]. Ottawa Linux Symposium, 2001.
    [51] C. Manning. YAFFS: Yet another flash file system [EB/OL]. http://www.aleph1.co.uk/yaffs, 2004.
    [52] Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system [C]. ACM Transactions on Computer Systems, 1992, Vol. 10(1):26–52.
    [53] Birrell, M. Isard, C. Thacker, et al. A design for high-performance flash disks [R]. In Microsoft Research Technical Report, 2005.
    [54] J. Bonwick and B. Moore. ZFS: The Last Word in File Systems [EB/OL]. http://opensolaris.org/os/community/zfs/docs/zfs last.pdf.
    [55] Sun Microsystems. ZFS On-Disk Specification [EB/OL]. http://www.opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf.
    [56] R. Bitar. Deploying Hybrid Storage Pools with Sun Flash Technology and the Solaris ZFS File System [R]. Technical Report SUN-820-5881-10, Sun Microsystems, October 2008.
    [57] V. Aurora. A short history of btrfs [EB/OL]. www.LWN.net ,2009.
    [58] Btrfs [EB/OL]. http:// btrfs.wiki.kernel.org.
    [59] Oracle Corporation. Btrfs: A Checksumming Copy on Write Filesystem [EB/OL]. http://oss.oracle.com/projects/btrfs/.
    [60] B+tree [EB/OL]. http://en.wikipedia.org/wiki/B+tree
    [61] S. Jiang, X. Ding, F. Chen, et al. DULO: An effective buffer cache management scheme to exploit both temporal and spatial localities [C]. In Proceeding of FAST05, 2005.
    [62] Tanenbaum, S. Andrew. Operating Systems: Design and Implementation (Second Edition) [M]. New Jersey: Prentice-Hall 1997.
    [63] Samsun Semiconductor, Inc [R]. Product Selection Guide Memory and Storage January 2009.
    [64] S. Y. Park, D. Jung. CFLRU: A Replacement Algorithm for Flash Memory[C]. Proceeding of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems. 2006, 234-241.
    [65] H. Jung, H. Shim, S. Park, et al. LRU-WSR: Integration of LRU and writes sequence reordering for Flash memory [J]. IEEE Trans. Consumer Electron, 2008. Vol. 54(3): 1215-1223
    [66] H. Jo, J. Kang, S. Park, J. Kim et al. FAB: Flash-Aware Buffer Management Policy for Portable Media Players [J]. IEEE Trans on Consumer Electronics, 2006. Vol. 52(2): 485- 493.
    [67] Junseok Park, Hyejeong Lee. A Cost-aware Page Replacement Algorithm for NAND Flash Based Mobile Embedded Systems [C]. EMSOFT’09, Grenoble, France, October 12–16, 2009.
    [68] Y. S. Yoo, H. Lee, Y. Ryu, et al. Page Replacement Algorithms for NAND Flash Memory Storages [C]. Proceeding of International Conference on Computational Science and its Applications, 2007. 201-212.
    [69] S. Hyun, H. Bahn and K. Koh. LeCramFS: An Efficient Compressed File System for Flash-Based Portable Consumer Devices [J]. IEEE Trans on Consumer Electronics, 2007. Vol. 53(2):481-488.
    [70] SquashFS [EB/OL], http://squashfs.sourceforge.net/
    [71] CramFS [EB/OL], http://lxr.linux.no/source/fs/cramfs/README
    [72] S. Jiang, F. Chen and X. Zhang. CLOCK-Pro: An Effective Improvement of the CLOCK replacement [C]. Proceeding of USENIX Annual Technical Conference, 2005. 323-336.
    [73] S. Bansal and D. S. Modha. CAR: clock with adaptive replacement [C]. Proceedings of the 3rd USENIX Conference on File and Storage Technologies, 2004.
    [74] T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm [C]. Proceedings of the VLDB Conference, 1994.
    [75] Y. Zhou and J.F. Philbin. The multi-queue replacement algorithm for second level buffer caches[C]. Proceedings of the USENIX Annual Technical Conference, 2001.
    [76] S. Ryu, C. Lee, S. Yoo and S. Seo. Flash-Aware Cluster Allocation Method Based on Filename Extension for FAT File System [C]. SAC’10, Sierre, Switzerland. March 22-26, 2010.
    [77] J. R. Douceur, W. J. Bolosky. A large-scale study of file-system contents [C]. Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, Atlanta, Georgia, United States, 59-70
    [78] S. K. Kim, D. H. Lee, and S. L. Min. An Efficient Cluster Allocation Scheme forNAND Flash Memory Based FAT File Systems [C]. International Workshop on Software Support for Portable Storage, 2005.
    [79] O`REILLY.深入理解linux内核3版[M].北京:中国电力出版社,2007, 263-266.
    [80]毛德操,胡希明.Linux内核情景分析[M].杭州:浙江大学出版社,2001, 309-311.
    [81] J. D. C. Little. A proof of the queueing formula l=λw [J]. Operations Research, 1961, 9, 383-387.
    [82] Stephen W. Sherman. Trace Drive Modeling [C]. Symposium on the Simulation of Computer Systems IV, 1976.
    [83] X. Su, P. Q. Jin and L. H. Yue. Flash-DBSim: A Simulation Tool for Evaluating Flash-based Database Algorithms [C]. 2nd IEEE International Conference on Computer Science and Information Technology.
    [84] W. Norcott. IOzone file system benchmark [EB/OL]. http://www.iozone.org.
    [85] S. Kang, S. Park, H. Jung. Performance Trade-Offs in Using NVRAM Write Buffer for Flash Memory-Based Storage Devices [J]. IEEE Trans on Computers, 2009. Vol. 58(6):744-758.
    [86] Sang-Won Lee, Sooyong Kang. Report on Workshop on Operating Systems Support for Next Generation Large Scale NVRAM (NVRAMOS 2009) [J]. SIGMOD Record, June 2009 .Vol. 38(2):544-551.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700