Abstract
With the development of Tibetan information processing technology, Tibetan text compression has become an essential research topic. However, only one prior research result on the subject exists, while practical applications call for a Tibetan text compression technique that adapts to different settings. Based on the characteristics of Tibetan text, this paper proposes two improved LZW data compression algorithms that compress mixed Tibetan-English text and decompress it losslessly. Experimental results show that the proposed algorithms are an effective text compression technique adapted to different settings.
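For orientation, the following is a minimal sketch of the standard (unimproved) LZW scheme applied to the UTF-8 bytes of a mixed Tibetan-English string. It is not either of the two improved algorithms proposed here, whose details the abstract does not give. The function names `lzw_compress` and `lzw_decompress`, the byte-level initial dictionary, and the unbounded dictionary size are all assumptions made for illustration.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Baseline LZW: map a byte string to a list of dictionary codes."""
    # Assumption: the dictionary starts with all 256 single-byte strings.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: List[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the longest match found so far.
            current = candidate
        else:
            # Emit the code of the longest match and learn the new string.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Inverse of lzw_compress: rebuild the dictionary while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical sample: a short mixed Tibetan-English string, repeated.
    text = "བོད་ཡིག Tibetan text, བོད་ཡིག Tibetan text"
    raw = text.encode("utf-8")
    packed = lzw_compress(raw)
    assert lzw_decompress(packed) == raw  # lossless round trip
    print(len(raw), "bytes ->", len(packed), "codes")
```

Each Tibetan character in the U+0F00 block occupies three bytes in UTF-8, so a byte-level dictionary needs several steps before it learns whole characters or syllables. A variant tuned to Tibetan text might seed or index the dictionary differently, but that is speculation about the design space rather than a description of the proposed method.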