Research on Multi-Font Printed Mongolian Character Recognition Technology
Abstract
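Character recognition is a high technology that combines pattern recognition, artificial intelligence, and text processing. It enters text and other information into a computer automatically through intelligent recognition, replacing manual input. Character recognition not only has a wide range of applications but has also advanced pattern recognition and text processing techniques. The field has long been an international research focus in computer intelligence, and it is a key topic supported by China's National High-Tech Research and Development Program (the 863 Program). Mongolian is the language of the principal ethnic group of the Inner Mongolia Autonomous Region. In China, besides Inner Mongolia, Mongolian is also used in Heilongjiang, Jilin, Liaoning, Xinjiang, and other provinces and autonomous regions. At present, most research on input methods concentrates on keyboard encoding. Research on Mongolian character recognition is very scarce, and recognition-based input of printed Mongolian has not been studied at all. This seriously limits the spread and application of information technology in ethnic-minority regions. In response, we propose developing a multi-font printed Mongolian recognition system that gives Mongolian an intelligent input method. This is of great significance for preserving and developing minority cultures and for promoting social progress in ethnic-minority regions.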
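     Mongolian is widely used in the Inner Mongolia Autonomous Region, but it is entered only through keyboard encoding; automatic recognition-based input of Mongolian has not yet been explored. This project therefore offers a new, automated, and intelligent way to enter Mongolian text and raises Mongolian information processing to a new level. Mongolian is an alphabetic script, but its writing system is unique in today's world and differs greatly from both Chinese and Western scripts. Mongolian is written vertically from top to bottom, with columns running from left to right. All letters within a word are joined together to form a vertical stem (the backbone), and each letter takes a different form depending on whether it appears at the beginning, in the middle, or at the end of a word. These properties make Mongolian recognition very difficult. We must therefore not only fully absorb the techniques used in Western and Chinese character recognition but also innovate according to the characteristics of Mongolian writing in order to overcome these difficulties. The aims of the project are to study, from the standpoint of character recognition, the selection and extraction of Mongolian character features, primitive segmentation, and matching, and to develop a multi-font printed Mongolian recognition system with a good, easy-to-use human-computer interface.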
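The backbone structure described above suggests one simple way to locate a word's stem and cut it into candidate primitives. The sketch below is illustrative and assumes its own heuristic, not necessarily the method used in this work. The backbone is taken to be the band of columns whose ink count stays close to the peak of the vertical projection profile. Consecutive rows that contain ink outside that band are then grouped into candidate primitive regions. The names `column_profile`, `find_backbone`, and `segment_primitives`, and the `ratio` and `min_len` parameters, are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary word image: rows of 0/1 pixels, 1 = ink


def column_profile(img: Image) -> List[int]:
    """Vertical projection: number of ink pixels in each column."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def find_backbone(img: Image, ratio: float = 0.8) -> Tuple[int, int]:
    """Return the inclusive (left, right) column range of the backbone.

    Hypothetical heuristic: start at the column with the most ink and
    extend left/right while neighbouring columns keep at least `ratio`
    of that peak count.
    """
    prof = column_profile(img)
    peak = max(range(len(prof)), key=lambda c: prof[c])
    thresh = ratio * prof[peak]
    left = right = peak
    while left > 0 and prof[left - 1] >= thresh:
        left -= 1
    while right < len(prof) - 1 and prof[right + 1] >= thresh:
        right += 1
    return left, right


def segment_primitives(img: Image, min_len: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive rows that have ink off the backbone.

    Each returned (start_row, end_row) pair is a candidate primitive region;
    rows where only the backbone is inked act as cut points.
    """
    left, right = find_backbone(img)
    segments: List[Tuple[int, int]] = []
    start = None
    for r, row in enumerate(img):
        off_backbone = sum(v for c, v in enumerate(row) if c < left or c > right)
        if off_backbone and start is None:
            start = r  # a new primitive region begins
        elif not off_backbone and start is not None:
            if r - start >= min_len:
                segments.append((start, r - 1))
            start = None
    if start is not None and len(img) - start >= min_len:
        segments.append((start, len(img) - 1))  # region runs to the last row
    return segments


if __name__ == "__main__":
    # Toy word: backbone in column 2, side strokes on rows 1-2, 5 and 7.
    word = [
        [0, 0, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 0, 0],
    ]
    print(segment_primitives(word))  # [(1, 2), (5, 5), (7, 7)]
```

Real scanned words would also need noise removal, skew correction, and rules for strokes that only thicken the backbone. This sketch omits all three.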
Character recognition is a sophisticated emerging technology that integrates pattern recognition, artificial intelligence, and text processing. Through intelligent recognition, characters and other information can be entered into a computer automatically. Character recognition not only has a wide range of applications but also advances pattern recognition and text processing. It is an international research focus in computational intelligence and a key topic of China's national high-technology research program. Mongolian is the principal ethnic language of the Inner Mongolia Autonomous Region; in China, Mongolian is also used in Heilongjiang, Jilin, Liaoning, Xinjiang, and elsewhere. At present, most input methods rely on the keyboard, and almost no work has been done on Mongolian character recognition. This seriously impedes the development and application of information technology in ethnic-minority regions. Under these circumstances, we propose research on multi-font printed Mongolian character recognition. It would provide an automatic method of Mongolian input, and it also has far-reaching significance for inheriting and developing minority cultures.
    Mongolian is widely used in the Inner Mongolia Autonomous Region, but most Mongolian text is still entered by keyboard, and automatic recognition-based input is only just beginning. This project provides a new, automatic, and intelligent input method that carries Mongolian information processing to a new and higher level. Mongolian is an alphabetic script whose written structure is very different from that of Chinese and English. It is written vertically from top to bottom, with columns running from left to right. All letters in a word are connected to form a vertical backbone, and each letter may take a different shape depending on its position in the word (initial, medial, or final). These characteristics make recognition difficult. During the research, therefore, we draw on the experience and techniques of Chinese and English character recognition and, at the same time, devise new methods suited to the structure of Mongolian writing. Our aim is to address Mongolian feature selection, feature extraction, primitive segmentation, and matching from the standpoint of character recognition. We also aim to develop a multi-font printed Mongolian recognition system with a user-friendly human-computer interface.
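Matching position-dependent letter forms across several fonts can be sketched as nearest-neighbour classification over simple zoning (grid-density) features. Both the feature and the classifier are stand-ins chosen for illustration, not the method stated in this work. The template dictionary keyed by `(letter, position, font)` and the functions `zoning_features` and `classify` are hypothetical. The comparison is restricted to templates for the same word position because, as noted above, a letter's shape depends on where it occurs in the word.

```python
import math
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # binary glyph image: rows of 0/1 pixels, 1 = ink
TemplateKey = Tuple[str, str, str]  # (letter, position, font), hypothetical


def zoning_features(img: Image, grid: int = 2) -> List[float]:
    """Ink density of each cell in a grid x grid partition, row by row."""
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for gi in range(grid):
        r0, r1 = gi * h // grid, (gi + 1) * h // grid
        for gj in range(grid):
            c0, c1 = gj * w // grid, (gj + 1) * w // grid
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def classify(img: Image, position: str,
             templates: Dict[TemplateKey, List[float]]) -> str:
    """Nearest-neighbour match against templates for the same word position.

    `position` is e.g. "initial", "medial" or "final"; templates for other
    positions are skipped because the letter shapes differ. Templates must
    use the same feature length as zoning_features' default grid (4 values).
    """
    feats = zoning_features(img)
    best_label: Optional[str] = None
    best_dist = math.inf
    for (letter, pos, _font), tmpl in templates.items():
        if pos != position:
            continue
        d = math.dist(feats, tmpl)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = letter, d
    if best_label is None:
        raise ValueError(f"no templates for position {position!r}")
    return best_label
```

As a toy example, suppose initial-form templates exist for `a` (ink in the top-left quarter of a 4x4 bitmap) and `e` (ink in the bottom-right quarter). A copy of the `a` glyph with one pixel missing still matches `a` when classified with `position="initial"`. Supporting another font only means adding entries keyed by that font's name.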
