Design and Implementation of a Recognition System for Printed Mathematical Expressions

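Subtitle: Segmentation, Recognition and Reconstruction
Author: Hou Lichang (侯利昌)
Degree level: Master's
Discipline: Computational Mathematics
Keywords (Chinese, translated): pattern recognition; touching-character segmentation; moment features; principal component analysis; self-organizing map; BP neural network; expression reconstruction
Keywords (English): pattern recognition; character segmentation; moment invariants; principal component analysis; self-organizing feature map; BP neural network; expression formation
Degree year: 2004
Supervisor: Wu Wei (吴微)
Discipline code: 070102
Degree-granting institution: Dalian University of Technology (大连理工大学)
Submission date: 2004-06-01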

Abstract

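As computers have become ubiquitous, people increasingly use them for everyday work and for storing information. Current OCR systems achieve high recognition rates on both handwritten and printed text and are widely used in office automation and fast data entry, removing the time and labour cost of manual input. Scientific and technical documents, however, contain many mathematical expressions: complex structures built from special symbols, Greek letters, Latin characters and digits. Current OCR systems recognize only individual characters and cannot analyze the structure of an expression, so a recognized expression is merely a string of unrelated characters that has lost its mathematical meaning. To address this, we propose a new design for expression recognition, together with a complete algorithm, that converts printed mathematical expressions (in image form) into an editable electronic format (such as LaTeX or the Word equation editor).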
Following the processing flow of the expression recognition system, the thesis is organized into four parts:
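Segmentation of touching characters. Print quality, paper smoothness, scanner resolution, binarization and similar factors can cause characters in a scanned image to touch one another, which makes character recognition difficult. This thesis segments characters with a self-organizing map, modifying the classic self-organizing learning rule so that the map approximates the distribution of the white pixels of the touching characters with fewer neuron nodes and at higher speed. A comparison of shortest-path segmentation with self-organizing-map segmentation shows that the latter can separate some touching characters that the former cannot handle.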
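The abstract does not describe the shortest-path baseline in detail. A common formulation, assumed here, searches by dynamic programming for the cheapest top-to-bottom path through the character image, where stepping onto ink is expensive and stepping onto paper is cheap. The function name `shortest_cut`, the three-way move set (down, down-left, down-right) and the cost values are illustrative assumptions, not the thesis's implementation.

```python
from typing import List, Tuple


def shortest_cut(img: List[List[int]], black_cost: float = 10.0) -> Tuple[float, List[int]]:
    """Minimum-cost top-to-bottom cut through a binary image (1 = ink, 0 = paper).

    The path moves down one row per step and may shift at most one column
    left or right. Entering a white pixel costs 1 and entering a black pixel
    costs `black_cost`, so the cheapest path follows the white gap between
    two touching characters and crosses ink only where it must.
    Returns (total cost, column index for each row).
    """
    rows, cols = len(img), len(img[0])

    def cost(r: int, c: int) -> float:
        return black_cost if img[r][c] else 1.0

    acc = [[cost(0, c) for c in range(cols)]]   # acc[r][c]: cheapest cost of a path ending at (r, c)
    back = [[-1] * cols]                          # back[r][c]: column used in row r - 1
    for r in range(1, rows):
        acc_row, back_row = [], []
        for c in range(cols):
            # Best predecessor among the three neighbours above; ties go to the vertical move.
            prev = min(range(max(0, c - 1), min(cols, c + 2)),
                       key=lambda p: (acc[r - 1][p], abs(p - c)))
            acc_row.append(acc[r - 1][prev] + cost(r, c))
            back_row.append(prev)
        acc.append(acc_row)
        back.append(back_row)

    # Start from the cheapest end point on the last row and walk back up.
    c = min(range(cols), key=lambda p: acc[-1][p])
    total = acc[-1][c]
    path = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        path.append(c)
    return total, path[::-1]


# Hypothetical 4x5 image: two strokes that touch only in row 2.
demo = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
print(shortest_cut(demo))  # (13.0, [2, 2, 2, 2])
```

The cut is forced through ink in row 2, which is exactly where such a baseline can go wrong when the touching region is wide or curved.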
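Feature extraction and selection. A raw character image is only a representation in pattern space and cannot be classified directly; geometric features that are invariant to rotation, scaling and translation must first be extracted from it. The thesis describes three commonly used moment methods: regular moments, Zernike moments and B-spline wavelet moments. Computing a separability measure for each of the three shows that Zernike moments are the best suited as character features. The thesis also describes a neural-network-based principal component analysis (PCA) method that selects 18 principal features from the 38-dimensional moment features; this greatly reduces the dimensionality of the feature vector while retaining the information, removes the correlation among features and brings out the differences between samples.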
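The abstract does not reproduce the moment definitions. Under the standard definition, the Zernike moment of order n and repetition m is Z_nm = (n+1)/π · Σ_x Σ_y f(x, y) · V*_nm(ρ, θ) (times the pixel area), summed over pixels mapped into the unit disk, where V_nm(ρ, θ) = R_nm(ρ) · e^{imθ} and R_nm is the Zernike radial polynomial. Rotating the image only changes the phase of Z_nm, so |Z_nm| is rotation invariant. The minimal sketch below computes |Z_nm| for a small binary glyph and for the same glyph rotated by 90 degrees. The pixel-to-disk mapping, the toy glyph and the function names are assumptions; translation and scale invariance would additionally require normalizing the glyph (for example centring it on its centroid and rescaling), which the sketch omits.

```python
import math
from typing import List


def radial_poly(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_nm(rho); needs |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(img: List[List[int]], n: int, m: int) -> complex:
    """Discrete Zernike moment Z_nm of a square binary image mapped onto the unit disk."""
    size = len(img)
    centre = (size - 1) / 2.0   # the image centre becomes the origin
    radius = size / 2.0         # half the image width maps to rho = 1
    z = 0j
    for i, row in enumerate(img):
        for j, value in enumerate(row):
            if not value:
                continue
            x = (j - centre) / radius
            y = (centre - i) / radius
            rho = math.hypot(x, y)
            if rho > 1.0:       # pixels outside the unit disk are ignored
                continue
            theta = math.atan2(y, x)
            # Accumulate f(x, y) * conj(V_nm) = f * R_nm(rho) * exp(-i m theta)
            z += value * radial_poly(n, m, rho) * complex(math.cos(m * theta), -math.sin(m * theta))
    # (n + 1) / pi times the pixel area in unit-disk coordinates
    return (n + 1) / math.pi * z / radius ** 2


# Hypothetical 8x8 "L"-shaped glyph and the same glyph rotated 90 degrees clockwise.
glyph = [
    "........",
    ".#......",
    ".#......",
    ".#......",
    ".#......",
    ".#####..",
    "........",
    "........",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in glyph]
rot = [list(r) for r in zip(*img[::-1])]

for n, m in [(2, 0), (2, 2), (3, 1), (4, 2)]:
    a, b = abs(zernike_moment(img, n, m)), abs(zernike_moment(rot, n, m))
    print(f"|Z_{n}{m}|: original {a:.6f}, rotated {b:.6f}")
```

A 90-degree rotation maps this centred pixel grid onto itself, so the two magnitudes agree up to floating-point rounding; arbitrary rotation angles only approximately preserve |Z_nm| because of resampling.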
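Character recognition. The classifier is the core of the recognition system. Neural networks are widely used in pattern recognition; they overcome shortcomings of conventional pattern recognition methods and improve recognition rates. The thesis uses a self-organizing feature map (SOFM) for coarse classification, placing characters with similar features in the same group, and then a BP (back-propagation) neural network for fine classification, distinguishing the different characters within each group; this two-stage scheme improves classification accuracy.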
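A minimal Python sketch of the coarse-to-fine scheme follows. It assumes the SOFM has already been trained and its units labelled with group ids, so the coarse stage reduces to a nearest-unit lookup over hypothetical unit vectors; each group then gets its own one-hidden-layer BP network trained by online gradient descent on squared error. The class names, layer sizes, learning rate and the 2-D toy features are illustrative assumptions, not the thesis's configuration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BPNet:
    """One-hidden-layer sigmoid network trained by online backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        # The last weight of every row is a bias whose input is fixed at 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x: Sequence[float]):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train(self, xs, labels, epochs: int = 3000, lr: float = 1.0) -> None:
        n_hid, n_out = len(self.w1), len(self.w2)
        for _ in range(epochs):
            for x, label in zip(xs, labels):
                xb, hb, y = self.forward(x)
                t = [1.0 if k == label else 0.0 for k in range(n_out)]
                # Deltas of the squared error, computed before any weight changes.
                d_out = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
                d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(n_out))
                         for j in range(n_hid)]
                for k in range(n_out):
                    for j in range(n_hid + 1):
                        self.w2[k][j] -= lr * d_out[k] * hb[j]
                for j in range(n_hid):
                    for i in range(len(xb)):
                        self.w1[j][i] -= lr * d_hid[j] * xb[i]

    def predict(self, x: Sequence[float]) -> int:
        y = self.forward(x)[2]
        return max(range(len(y)), key=lambda k: y[k])


class TwoStageClassifier:
    """Coarse stage: nearest labelled SOFM unit. Fine stage: one BP net per group."""

    def __init__(self, units: List[Tuple[List[float], int]]):
        self.units = units                                  # (unit weight vector, group id)
        self.fine: Dict[int, Tuple[BPNet, List[str]]] = {}

    def coarse(self, x: Sequence[float]) -> int:
        return min(self.units, key=lambda u: math.dist(u[0], x))[1]

    def fit_group(self, group: int, xs: List[List[float]], symbols: List[str]) -> None:
        names = sorted(set(symbols))
        net = BPNet(len(xs[0]), n_hidden=3, n_out=len(names))
        net.train(xs, [names.index(s) for s in symbols])
        self.fine[group] = (net, names)

    def classify(self, x: Sequence[float]) -> str:
        net, names = self.fine[self.coarse(x)]
        return names[net.predict(x)]


# Hypothetical 2-D features: group 0 holds "l" and "1", group 1 holds "o" and "0".
clf = TwoStageClassifier([([0.1, 0.2], 0), ([0.9, 0.8], 1)])
clf.fit_group(0, [[0.1, 0.0], [0.12, 0.02], [0.1, 0.4], [0.08, 0.38]], ["l", "l", "1", "1"])
clf.fit_group(1, [[0.9, 0.6], [0.88, 0.62], [0.9, 1.0], [0.92, 0.98]], ["o", "o", "0", "0"])
print(clf.classify([0.1, 0.0]), clf.classify([0.9, 1.0]))
```

Each fine network only has to separate the few symbols of its own group, which keeps it small.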
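Expression reconstruction. How to determine the complex structure of an expression from a set of recognized characters has not yet been satisfactorily solved. The thesis presents a new reconstruction method consisting mainly of a procedure for locating superscripts and subscripts, formation rules for mathematical expressions that conform to an LL(1) grammar, and a parser. The parser turns the unordered character sequence into a syntax tree, which is finally converted into editable LaTeX.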
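The abstract names an LL(1) grammar and a parser but does not give the grammar. The sketch below assumes that layout analysis has already turned the symbols into a token stream in which scripts are marked with `_` or `^` followed by a braced group, and uses a small hypothetical grammar (shown in the comments). Each nonterminal becomes one recursive-descent method that chooses its next step from a single token of lookahead; the syntax tree is then walked to emit LaTeX. The grammar, token format and names are illustrative assumptions.

```python
from typing import List, Optional

# Hypothetical LL(1) grammar (terminals in quotes, SYMBOL = any other token):
#   expr   -> term { ("+" | "-" | "=") term }
#   term   -> factor { factor }                  implicit multiplication
#   factor -> atom [ "_" group ] [ "^" group ]
#   group  -> "{" expr "}"
#   atom   -> SYMBOL | "(" expr ")"
BINOPS = {"+", "-", "="}
SPECIAL = BINOPS | {"_", "^", "{", "}", "(", ")"}


class Parser:
    """Recursive-descent parser: one method per nonterminal, one token of lookahead."""

    def __init__(self, tokens: List[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse(self) -> tuple:
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")
        return tree

    def expr(self) -> tuple:
        node = self.term()
        while self.peek() in BINOPS:
            op = self.peek()
            self.pos += 1
            node = ("binop", op, node, self.term())
        return node

    def term(self) -> tuple:
        factors = [self.factor()]
        # Continue while the lookahead is in FIRST(factor) = {SYMBOL, "("}.
        while self.peek() == "(" or (self.peek() is not None and self.peek() not in SPECIAL):
            factors.append(self.factor())
        return factors[0] if len(factors) == 1 else ("seq", factors)

    def factor(self) -> tuple:
        base, sub, sup = self.atom(), None, None
        if self.peek() == "_":
            self.pos += 1
            sub = self.group()
        if self.peek() == "^":
            self.pos += 1
            sup = self.group()
        return base if sub is None and sup is None else ("script", base, sub, sup)

    def group(self) -> tuple:
        self.expect("{")
        node = self.expr()
        self.expect("}")
        return node

    def atom(self) -> tuple:
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return ("paren", node)
        if tok is None or tok in SPECIAL:
            raise SyntaxError(f"unexpected {tok!r} at {self.pos}")
        self.pos += 1
        return ("sym", tok)


def to_latex(node: tuple) -> str:
    """Walk the syntax tree and emit LaTeX source."""
    kind = node[0]
    if kind == "sym":
        return node[1]
    if kind == "paren":
        return "(" + to_latex(node[1]) + ")"
    if kind == "seq":
        return " ".join(to_latex(n) for n in node[1])
    if kind == "binop":
        return f"{to_latex(node[2])} {node[1]} {to_latex(node[3])}"
    _, base, sub, sup = node            # kind == "script"
    out = to_latex(base)
    if sub is not None:
        out += "_{" + to_latex(sub) + "}"
    if sup is not None:
        out += "^{" + to_latex(sup) + "}"
    return out


tokens = ["x", "^", "{", "2", "}", "+", "2", "x", "y", "_", "{", "i", "}",
          "=", "(", "a", "+", "b", ")", "^", "{", "n", "}"]
print(to_latex(Parser(tokens).parse()))  # x^{2} + 2 x y_{i} = (a + b)^{n}
```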
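Finally, experiments on a collection of English-language mathematical documents show that the system has some practical value, although it still needs further improvement.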

English Abstract

Computerized document-handling systems are widely used, but few of them can recognize and understand the mathematical expressions printed in documents. The system proposed in this thesis recognizes mathematical expressions in images scanned directly from paper and reconstructs them in a publication format such as LaTeX or Word.
The system works in four stages:
Merged-symbol segmentation. Because of print quality, paper cleanliness, scanner resolution, binarization and similar factors, symbols in a scanned document may be merged and are then hard to recognize. We propose a new method that segments merged symbols with a self-organizing feature map. By modifying the classic update rule of the self-organizing map, we obtain a network that approximates the distribution of white pixels between two symbols with fewer units and less training time.
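The abstract says the thesis modifies the classic update rule but does not give the modification, so the sketch below uses the unmodified Kohonen rule (winner search plus a Gaussian neighbourhood along the chain, both shrinking over time). A 1-D chain of units is trained on the (row, column) coordinates of the white pixels between two touching symbols, and the ordered unit positions trace a candidate separating line. The schedules, parameter values, the toy pixel set and the name `fit_som_chain` are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]   # (row, column) of a white pixel


def fit_som_chain(points: List[Point], n_units: int = 6, steps: int = 3000,
                  lr0: float = 0.5, sigma0: float = 2.0, seed: int = 0) -> List[List[float]]:
    """Fit a 1-D chain of SOM units to 2-D points with the classic Kohonen rule."""
    rng = random.Random(seed)
    units = [list(rng.choice(points)) for _ in range(n_units)]   # start on random samples
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac) + 0.01        # learning rate decays linearly towards 0.01
        sigma = sigma0 * (1.0 - frac) + 0.3   # neighbourhood width decays towards 0.3
        x = rng.choice(points)
        # Winner: the unit nearest to the sample in pixel space.
        win = min(range(n_units),
                  key=lambda k: (units[k][0] - x[0]) ** 2 + (units[k][1] - x[1]) ** 2)
        for k in range(n_units):
            # Gaussian neighbourhood measured along the chain index k.
            h = math.exp(-((k - win) ** 2) / (2.0 * sigma ** 2))
            units[k][0] += lr * h * (x[0] - units[k][0])
            units[k][1] += lr * h * (x[1] - units[k][1])
    return units


# Hypothetical white pixels in the gap between two touching glyphs: the gap runs
# down column 4, shifts to column 5, and is blocked at row 6 where the glyphs touch.
gap = [(float(r), 4.0) for r in range(6)] + [(float(r), 5.0) for r in range(7, 12)]
chain = sorted(fit_som_chain(gap), key=lambda u: u[0])
for row, col in chain:
    print(f"row {row:5.2f}  col {col:4.2f}")
```

Joining consecutive chain units gives a polyline through the gap; how the thesis turns the trained chain into the final cut is not stated in the abstract.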
Feature extraction and selection. A symbol in an image file cannot be classified directly, because its pixel representation is not invariant to translation, rotation and changes in size. We investigated three kinds of moment features as shape descriptors: regular moments, Zernike moments and B-spline wavelet moments. We also used a PCA neural network to select principal features, reducing the dimensionality of the feature space while retaining the useful information.
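The abstract does not name the PCA network. One standard choice, assumed here, is Sanger's generalized Hebbian algorithm (GHA): a single linear layer y = Wx whose j-th weight row is updated by Δw_j = η y_j (x − Σ_{i≤j} y_i w_i) and which, for zero-mean input and a small enough learning rate η, converges to the j-th principal eigenvector of the input covariance. Keeping the first 18 rows for a 38-dimensional moment vector would correspond to the 38-to-18 reduction described above. The sketch uses 2-D toy data with a known principal direction; `gha`, the learning rate and the data are illustrative assumptions.

```python
import math
import random
from typing import List

Vector = List[float]


def gha(samples: List[Vector], n_components: int, lr: float = 0.001,
        epochs: int = 10, seed: int = 0) -> List[Vector]:
    """Sanger's generalized Hebbian algorithm for zero-mean samples.

    Returns n_components weight rows; row j approximates the j-th
    principal direction (an eigenvector of the covariance, up to sign).
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_components)]
    for _ in range(epochs):
        for x in samples:
            y = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]   # linear outputs y = W x
            recon = [0.0] * dim
            new_w = []
            for j in range(n_components):
                # recon = sum_{i <= j} y_i * w_i, using the weights from before this step
                recon = [r + y[j] * wj for r, wj in zip(recon, w[j])]
                # Sanger's rule: w_j += lr * y_j * (x - recon)
                new_w.append([wj + lr * y[j] * (xi - ri) for wj, xi, ri in zip(w[j], x, recon)])
            w = new_w
    return w


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical 2-D data: variance 9 along u = (1, 1)/sqrt(2), variance 1 along v = (1, -1)/sqrt(2).
rng = random.Random(1)
s = 1.0 / math.sqrt(2.0)
u, v = [s, s], [s, -s]
data = []
for _ in range(2000):
    a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)
    data.append([a * u[0] + b * v[0], a * u[1] + b * v[1]])
mean = [sum(col) / len(data) for col in zip(*data)]
data = [[xi - mi for xi, mi in zip(x, mean)] for x in data]   # GHA assumes zero-mean input

W = gha(data, n_components=2)
print(abs(cosine(W[0], u)), abs(cosine(W[1], v)))
```

Projecting a sample onto the learned rows (y = Wx) gives the reduced, decorrelated feature vector that would be passed on to the classifier.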
Character recognition. The recognizer is the key component of the system. Neural networks overcome some disadvantages of traditional pattern recognition methods, have been used extensively in OCR, and have achieved higher recognition rates. We used an SOFM network as a coarse classifier that places similar symbols in the same group, followed by a BP network as a fine classifier that identifies the symbols within each group.
Expression formation. The problem of understanding complicated mathematical expressions in printed documents has not yet been completely solved. We introduce a formation algorithm that locates superscripts and subscripts and analyzes the two-dimensional layout of the symbols within an expression. The structure of a recognized expression is then represented as a tree, from which the original expression can be reproduced with a suitable formatter such as LaTeX.
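The abstract does not state how superscripts and subscripts are located. A simple geometric heuristic, assumed here, compares each symbol's vertical centre with the bounding box of the most recent baseline symbol: clearly above its top band means superscript, clearly below its bottom band means subscript. The `Box` type, the `ratio` threshold, the toy boxes and both function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Bounding box of one recognised symbol; image y grows downwards."""
    symbol: str
    left: float
    top: float
    right: float
    bottom: float


def locate_scripts(boxes: List[Box], ratio: float = 0.25) -> List[Tuple[str, str]]:
    """Label symbols, left to right, as 'base', 'sup' or 'sub'.

    Each symbol is compared with the most recent base symbol: if its vertical
    centre lies above top + ratio * height of that base it is a superscript;
    if it lies below bottom - ratio * height it is a subscript. Only one level
    of scripts is handled.
    """
    labelled: List[Tuple[str, str]] = []
    base: Optional[Box] = None
    for box in sorted(boxes, key=lambda b: b.left):
        centre = (box.top + box.bottom) / 2.0
        label = "base"
        if base is not None:
            height = base.bottom - base.top
            if centre < base.top + ratio * height:
                label = "sup"
            elif centre > base.bottom - ratio * height:
                label = "sub"
        if label == "base":
            base = box
        labelled.append((box.symbol, label))
    return labelled


def scripts_to_latex(labelled: List[Tuple[str, str]]) -> str:
    """Wrap runs of 'sup' / 'sub' symbols in ^{...} / _{...}."""
    out, mode = [], "base"
    for symbol, label in labelled:
        if label != mode:
            if mode != "base":
                out.append("}")                         # close the previous script run
            if label != "base":
                out.append("^{" if label == "sup" else "_{")
            mode = label
        out.append(symbol)
    if mode != "base":
        out.append("}")
    return "".join(out)


# Hypothetical boxes for the printed expression x^2 + y_i.
boxes = [
    Box("x", 0, 10, 8, 20),
    Box("2", 9, 4, 13, 10),
    Box("+", 16, 12, 22, 18),
    Box("y", 25, 10, 33, 20),
    Box("i", 34, 17, 37, 25),
]
labelled = locate_scripts(boxes)
print(labelled)
print(scripts_to_latex(labelled))  # x^{2}+y_{i}
```

Treating operators such as "+" as new base symbols, as this sketch does, is a simplification; nested scripts such as x^{a_i} would need a recursive version.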
The experimental results at the end of the thesis demonstrate the feasibility of the system, although the proposed model still needs further improvement before commercial application.
