Several Applied Studies of Text Mining in Traditional Chinese Medicine
Abstract
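Text mining is the product of combining theories and techniques from artificial intelligence, machine learning, natural language processing, data mining, and related automatic text processing such as information extraction, information retrieval, and text classification, and it has attracted the attention of a growing number of researchers. Text mining is the natural extension of data mining research to text data; the field is still in its infancy and is mature in neither methods nor applications. Traditional Chinese medicine (TCM), the distinctively Chinese component of traditional medicine within the life sciences, has distinctive features and notable clinical efficacy in disease diagnosis and treatment and in the use of formulas and drugs. It contains rich knowledge, and several thousand years of medical practice have accumulated a large amount of data. Conducting KDD research on the basis of TCM informatization is therefore of great significance. No text mining research yet exists in the TCM field. This thesis studies several aspects, including knowledge discovery of drug composition and plant-family composition in clinical formulas from the literature, extraction of TCM terms and relations, and knowledge discovery of relationships between TCM syndromes and genes. The research covers the following four aspects: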
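     ● Character-feature-based Chinese text classification is studied. Experiments show that character features are an efficient feature representation for Chinese text classification. A Distributional Character Clustering method is proposed; it requires no word segmentation, has a feature dimensionality as low as the order of 10^2, and achieves high performance. Combined with NB, its performance approaches that of a word-feature-based SVM classifier, with a micro-averaged precision of 86%.
     ● Information extraction from TCM literature is studied, and the Bubble-bootstrapping and ATP methods are proposed. The approach requires no shallow Chinese natural language processing, domain lexicon, or labeled training corpus, and is a nearly unsupervised, scalable, and portable information extraction method. In extraction experiments on formula names and disease names from nearly 400,000 bibliographic records, it achieved an average precision of 99% and an F1 score of about 65%. Applied to subheading extraction for automatic indexing of TCM literature, it reached an F1 score of 80%. ATP is a semi-hard pattern method and one of the technical directions for future information extraction research.
     ● Text mining of drug composition in clinical formulas from the literature is studied. The concept of plant-family composition of formulas is proposed, knowledge discovery on the plant-family composition of clinical formulas is carried out, and the MeDisco/3T text mining system is implemented. Experiments with MeDisco/3T show that formula text mining is of high quality and practical value, and that formula prescriptions follow plant-family composition regularities that can be discovered by mining.
     ● The TCM literature database and the biomedical literature database (Medline) are integrated for knowledge discovery of relationships between TCM syndromes and genes. The prototype system MeDisco/3S is implemented, and preliminary experiments and analysis show that MeDisco/3S can provide an intelligent knowledge discovery platform to support integrated Chinese and Western medicine research and interdisciplinary life science research. It is a typical example of biomedical text mining and multidisciplinary information integration research.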
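To illustrate the first item above, the following is a minimal sketch of character-level classification with Naive Bayes: each Chinese character is a feature, so no word segmentation is needed. It is plain character-unigram NB with add-one smoothing, not the thesis's Distributional Character Clustering method, and the training sentences, labels, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_char_nb(docs):
    """Train a multinomial Naive Bayes model whose features are single characters.

    docs is a list of (text, label) pairs. No word segmentation is performed.
    """
    char_counts = defaultdict(Counter)  # label -> character frequencies
    label_counts = Counter()            # label -> number of training documents
    vocab = set()
    for text, label in docs:
        chars = [c for c in text if not c.isspace()]
        char_counts[label].update(chars)
        label_counts[label] += 1
        vocab.update(chars)
    return char_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    char_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(char_counts[label].values()) + len(vocab)
        for c in text:
            if c in vocab:  # characters never seen in training are skipped
                score += math.log((char_counts[label][c] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
train = [
    ("感冒发热咳嗽", "medicine"),
    ("中药复方治疗头痛", "medicine"),
    ("股票市场价格上涨", "finance"),
    ("银行利率下调", "finance"),
]
model = train_char_nb(train)
predicted = classify("咳嗽发热的治疗", model)
```

The vocabulary here is the set of distinct characters, which stays far smaller than a word vocabulary; that small, nearly fixed feature space is the property the abstract highlights.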
Text Mining is a new interdisciplinary field that combines the disciplines of Artificial Intelligence,
    Machine Learning, Data Mining and automatic text processing techniques (e.g. Information
    Extraction, Information Retrieval and Text Classification), and it has attracted intensive
    research. Text Mining is the natural extension of traditional KDD to unstructured text data.
    However, it is still in its infancy, and much work remains on both its approaches and its
    applications. Traditional Chinese Medicine (TCM) is an important component of traditional
    medicine within the life sciences, with distinctively Chinese characteristics. TCM has played a
    significant role in the healthcare of the Chinese people. It has distinctive features and clinical
    effectiveness in disease diagnosis and treatment and in Chinese Medical Formula and drug
    therapies. An immense amount of highly valuable medical data has accumulated over several
    thousand years of practice. These data stores lay the foundation for KDD and give it significant
    practical use. Because text mining has rarely been studied in the TCM field, this thesis presents
    several techniques and applications for TCM text mining. The studies are as follows:
     ● Character-based Chinese text classification. A systematic comparative experiment shows that
    the character is an efficient and effective feature for Chinese text classification.
    Furthermore, a novel feature generation method named Distributional Character Clustering (DCC)
    is proposed and achieves state-of-the-art performance. Its advantages include a very low and
    almost fixed dimensionality (on the order of 10^2 features), no need for word segmentation, and
    high performance (DCC-based NB performs similarly to word-based SVM). It is a promising new
    feature representation method for Chinese text classification.
     ● TCM terminology extraction. Because TCM terms such as Chinese Medical Formula names and
    disease names need to be extracted from TCM bibliographic literature, this thesis also studies
    bootstrapping methods for terminology extraction. New methods called Bubble-bootstrapping
    and ATP are proposed. The approach is a scalable and almost unsupervised information extraction
    method that needs neither shallow Chinese NLP techniques nor a labeled training corpus.
    Experiments on about 400,000 bibliographic records show that ATP-based Bubble-bootstrapping
    achieves about 99% precision and an F1 score of about 65%. Furthermore, it achieves an F1
    score of about 80% when applied to subheading extraction for automatic subject indexing.
     ● Drug component frequent itemset discovery in clinical Chinese Medical Formulas (CMFs) from the literature. This thesis proposes the concept of CMF drug plant family composition and carries out a knowledge discovery study on it. A prototype text mining system named MeDisco is presented, which performs drug component frequent itemset discovery on clinical CMFs from TCM bibliographic literature. The experiments show that CMF drug knowledge discovery using text mining is practically useful and valuable. CMF use follows certain drug plant family compositional rules, and these can be mined automatically from data.
    
    
     ● Another text mining system, MeDisco/3S, has been developed to uncover hidden knowledge across TCM literature and modern biomedical literature (Medline); it offers an approach to finding functional relationships between TCM Symptom Complexes and genes. MeDisco/3S provides a promising intelligent knowledge discovery platform to facilitate interdisciplinary research in the life sciences. It is a first effort at biomedical literature discovery and information integration in the life sciences.
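The frequent itemset discovery described in the third item above can be sketched as follows. This is a brute-force support count over small itemsets, not MeDisco's actual algorithm; the formula records are toy data invented for illustration, and `HERB_FAMILY`, `frequent_itemsets`, and the support thresholds are assumptions. Mapping each herb to its plant family before counting gives a family-level view of the same records.

```python
from collections import Counter
from itertools import combinations

# Herb (pinyin) -> plant family; a hypothetical lookup table for this sketch.
HERB_FAMILY = {
    "huangqi": "Fabaceae",     # Astragalus root
    "gancao": "Fabaceae",      # licorice root
    "renshen": "Araliaceae",   # ginseng
    "danggui": "Apiaceae",     # Angelica sinensis
    "chuanxiong": "Apiaceae",  # Ligusticum chuanxiong
}


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset of up to max_size items and keep the frequent ones.

    Support is the number of transactions containing the itemset. Itemsets are
    returned as sorted tuples. Brute force is fine for short records like
    formulas; it does not use Apriori-style candidate pruning.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # ignore duplicates within one record
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy formula records, invented for illustration only.
formulas = [
    ["huangqi", "danggui", "gancao"],
    ["renshen", "huangqi", "gancao"],
    ["danggui", "chuanxiong", "gancao"],
    ["huangqi", "danggui", "chuanxiong"],
]

# Herb-level frequent itemsets.
herb_sets = frequent_itemsets(formulas, min_support=2, max_size=2)

# Family-level frequent itemsets: replace each herb by its plant family first.
family_formulas = [[HERB_FAMILY[h] for h in f] for f in formulas]
family_sets = frequent_itemsets(family_formulas, min_support=3, max_size=2)
```

In this toy data no herb pair appears in more than two of the four records, while the family pair Apiaceae and Fabaceae appears in three, which shows how lifting herbs to families can surface a pattern that is weaker at the herb level.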
引文
[1].Adamic L. A., Wilkinson D., Huberman B. A., Adar E., A Literature Based Method for Identifying Gene-Disease Connections. In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB2002). pp. 109-117. 2002
    [2].Agichtein E. and Gravano L., Snowball: Extracting relations from large plaintext collections. Proceedings of the 5th ACM International Conference on Digital Libraries, June 2000. http://citeseer.ist.psu.edu/agichtein00snowball. html.
    [3].Agrawal R., Srikant R., Fast Algorithms for Mining Association Rules, Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
    [4].Ahonen H., Finding all maximal frequent sequences in text. In ICML-99 Workshop, Machine Learning in Text Data Analysis, Bled, Slovenia, 1999. http://citeseer.ist.psu.edu/ahonen-myka99finding. html.
    [5].Ahonen-Myka H., Heinonen O., Klemettinen M., and Inkeri-Verkamo A., Finding co-occurring text phrases by combining sequence and frequent set discovery. In R. Feldman, editor, Proc. 16th Int. Joint Conference on Arti. cial Intelligence IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications, pages 1 - 9, 1999.
    [6].Ahonen H., Heinonen O., Klemettinen M., and Verkamo A. I., Mining in the Phrasal Frontier, 1st European Symposium of Principle on Data Mining and Knowledge Discovery (PKDD-97), Lecture Notes In Computer Science, Springer-Verlag, London, UK, pp. 343-350. 1997.
    [7].Andrade M. A., Borka P., Automated extraction of information in molecular biology. FEBS Letters 476 (2000) Issue: 1-2, 12-17.
    [8].Andrade M. A. and Valencia A, Automatic annotation for biological sequences by extraction of keywords from Medline abstracts. Development of a prototype system. Proc Int Conf Intell Syst Mol Biol. 5: 25-32.1997.
    [9].Armstrong S., editor. Using Large Corpora. MIT Press. 1994.
    [10].Arrowsmith home page. http://arrowsmith.psych.uic.edu/arrowsmith_uic/index.html.
    [11].Baeza-Yates R. A. and Ribeiro-Neto B. A., Modern Information Retrieval. ACM Press / Addison-Wesley, 1999.
    [12].Baker L. D., McCallum A., Distributional Clustering of words for text classification. Proceedings of 21st SIGIR. ACM Press, New York, NY, USA, pp. 96-103, 1998.
    [13].Barnes, J. C., Conceptual biology: a semantic issue and more. Nature 417, 587-588. 2002.
    [14].Bechhofer S., Horrocks I., Goble C, Stevens R., OilEd: a Reason-able Ontology Editor for the Semantic Web. In Proc. of the Joint German/Austrian Conf. on Artificial Intelligence, in Lecture Notes in Artificial Intelligence, number 2174, pages 396-408. Springer-Verlag, 2001.
    [15].Becker K, G., Hosack D. A., Dennis G. J., Lempicki R. A., Bright T. J., Cheadle C. and Engel J., PubMatrix: a tool for multiplex literature mining. BMC Bioinformatics 2003, 4: 61. This article is available from: http://www.biomedcentral.com/1471-2105/4/61.
    [16].Beil F., Ester M., Xu X., Frequent Term-Based Text Clustering. KDD-02. ACM Press, New York, NY, USA Pages: 436-442, 2002.
    [17].Bekkerman, R. et. al.: Distributional Word Clusters vs. Words for Text Categorization. JMLR, 1 (2002) 1-48.
    [18].Bekkerman R., EIYaniv R., Tishby N. and Winter Y., On feature distributional clustering for text categorization. Proc. of the 24st SIGIR, ACM Press, New York, NY, USA, pp.146-153. 2001.
    [19].BITOLA home page. http://www.mf uni-lj.si/bitola/.
    
    
    [20]. Blagosklonny M. V., & Pardee A. B., Unearthing the gems. (2002). Nature, 2002 416(6879). 373.
    [21]. Blake C. and Pratt W., Automatically Identifying Candidate Treatments from Existing Medical Literature. AAAI, 2002. http:/www.ischool.washington.edu/wpratt/publications.html.
    [22]. Blaschke C., Andrade M.A., Ouzounis C. et al., Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions, Proc. of ISMB, pp. 60-67, 1999.
    [23]. Blum A. and Mitchell T., Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the Workshop on Computational Learning Theory. ACM Press, New York, NY, USA. pp. 92-100, 1998.
    [24]. Bow library, http://www-2.cs.cmu.edu/~mccallum/bow/.
    [25]. Brin S., Extracting patterns and relations from the world wide web. In WebDB Workshop at EDBT-98, Lecture Notes In Computer Science, Springer-Verlag, London, UK. pp. 172-183, 1998.
    [26]. Bruijn B. and Maratin J., Getting to the (c)ore of knowledge: mining biomedical literature. International Journal of Medical Informatics 67 (2002) 7 -18.
    [27]. Bunescu R., Ge R., Rohit J. K. et al, Comparative Experiments on Learning Information Extractors for Proteins and their Interactions, Special Issue in the Journal Artificial Intelligence in Medicine on Summarization and Information Extraction from Medical Documents. 2004 (in press).
    [28]. Bunescu R., Ge R., Rohit J. K. et al, Learning to Extract Proteins and their Interactions from Medline Abstracts. Proceedings of ICML-2003 Workshop on Machine Learning in Bioinformatics, pp. 46-53, Washington DC, August 2003.
    [29]. Califf M. E. and Mooney R. J., Relational Learning of Pattern-Match Rules for Information Extraction. In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 6-11, Standford, CA, March 1998.
    [30]. Califf, M. E., Relational Learning Techniques for Natural Language Information Extraction. Ph.D. thesis, Department of Computer Sciences, University of Texas, Austin, TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276 (see http://www.cs.utexas.edu/users/ai-lab).
    [31].曹素丽,曾伏虎,曹焕光,基于汉字字频向量的中文文本自动分类系统,山西大学学报(自然科学版)22(2):144—149,1999.
    [32]. Chang J. T., Raychaudhuri S., Altman R. B., Including biological literature improves homology search. Pac Symp Biocomput 2001, 24 (1): 374-83.
    [33]. Cherfi H., Napoli A., Toussaint Y., Towards a Text Mining Methodology Using Frequent Itemsets and Association Rule Extraction. JIM Knowledge Discovery and Discrete Mathematics, France. 2003.
    [34]. Chilibot home page. http://www.chilibot.net/.
    [35]. Cios K. J. and Moore G. W., Uniqueness of Medical Data Mining, Artificial Intelligence in Medicine Volume: 26, Issue: 1-2, September-October, 2002, pp. 1-24.
    [36]. Ciravegna F., Learning to Tag for Information Extraction from Text. In Proceedings of the ECAI-2000 Workshop on Machine Learning for Information Extraction, F. Ciravegna et al. (Eds.), Berlin, August 2000.
    [37]. Clifton C. and Cooley R., TopCat: Data Mining for Topic Identification in a Text Corpus. IEEE trasactions on knowledge and data engineering, Vol. 16, No.8, pp.949-964.August, 2004.
    [38]. Clearforest homepage, http://www.clearforest.com/.
    [39]. R. Cooley, B. Mobasher, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web. 9th International Conference on Tools with Artificial Intelligence, p.558, IEEE Computer Society, Washington, DC, USA, November, 1997.
    [40]. Cowie, J., & Lehnert, W. Information Extraction. Communications of the ACM, 39(1), 1996, 80-91.
    [41]. Craven M., Learning to Extract Relations from Medline. AAAI-99 Workshop on Machine Learning for Information Extraction - July 19, 1999, Orlando Florida.
    
    
    [42].Craven M., DiPasquo D., Freitag D., McCallum A., Mitchell T., Nigam K. and Slattery S., Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, Volume 118, Issue 1-2, pp. 69-113, April, 2000.
    [43].Craven M. and Kumlien J., Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), pp. 77-86, 1999.
    [44].Craven M., The genomics of a signaling pathway: A kdd cup challenge task. Technical report, University of Wisconsin, December 2002.
    [45].Delgado M., Martin-Bautista M. J., Sanchez D. and Vila M. A., Mining Text Data: Special Features and Patterns, D. J. Hand et al. (Eds.): Pattern Detection and Discovery, LNAI 2447, pp. 140-153, 2002.
    [46].Delgado M., Martin-Bautista M. J., Sanchez D. and Vila M. A., Association Rule Extraction for Text Mining, T. Andreasen et al. (Eds.): FQAS 2002, LNAI 2522, pp. 154-162, 2002.
    [47].Dhillon I. S. and Mallela S., Enhance word clustering for hierarchical text classification. SIGKDD-02, pp. 23-26.
    [48].Diao Li-li, Hu Ke-yun, Lu Yu-chang, Shi Chun-yi, Improved Stumps Combined by Boosting for Text Categorization. Journal of Software, Vol.13, No.8, pages: 1361-07.
    [49].Ding J., Berleant D., Nettleton D. and Wurtele E., Mining medline: abstracts, sentences, or phrases? Pacific Symposium on Biocomputing, pages 326-337, 2002.
    [50].Dorre J., Gerstl P., Seiffert R., Text Mining: Finding Nuggets in Mountains of Text Data. Fifth SIGKDD, pp. 398-401, 1999.
    [51].Sebastiani F., Machine Learning in Automated Text Categorisation. ACM Computing Surveys, Vol. 34, No. 1, March 2002, pp. 1 - 47.
    [52].Fayyad U. M., Piatesky-Shapiro, G., Smyth, P., and Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAi/MIT Press, 1996.
    [53].Feldman R., eds. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99) Workshop on Text Mining: Foundations, Techniques and Applications. 1999.
    [54].Feldman, R. & Dagan, I., Knowledge discovery in textual databases (KDT). In proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, August 20-21, AAAI Press, 112-117. 1995
    [55].Feldman R., Dagan I., and Klosgen W, Efficient algorithms for mining and manipulating associations in texts. In Cybernetics and Systems, Vol. 2, The 13th European Meeting on Cybernetics and Systems Research, Vienna, Austria, April 1996.
    [56].Feldman R., Fresko M., Kinar Y. et al, Text Mining at the Term Level. Lecture Notes in Computer Science. Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery., pp.65-73, 1998.
    [57].Feldman R., Hirsh H., Exploiting Background Information in Knowledge Discovery from Text. Journal of Intelligent Information Systems, 9(1): 83-97, 1996.
    [58].Feldman R., Hirsh H., Mining Association in Text in the Presence of Background Knowledge. Proceedings of the 2nd International Conference on Knowledge Discovery (KDD-96), Portland. pp. 343-346, Aug 1996.
    [59].Feldman R., Aumann Y., Amir A., Klsgen W., Zilberstien A., Maximal Association Rules: a New Tool for Mining for Keyword co-occurrences in Document Collections. In: Proceedings of the 3rd International Conference on Knowledge Discovery (KDD), Newport Beach, pp.167-170, CA, Aug 1997.
    [60].Feldman R., Klosgen W., Ben-Yehuda Y., Kedar G., and Reznikow V., Pattern based browsing in document collections. Principles of data mining and knowledge discovery, Vol. 1263: 112-122, June 1997.
    
    
    [61].Feldman R., Regev Y., Finkelstein-Landau M., Hurvitz E. & Kogan B., Mining biomedical literature using information extraction, www.inpharm.com/static/intelligence/pdf/MAG_13758.pdf.
    [62].Pereira F., Tishby N. and Lee L., Distributional clustering of English words. ACL-93, pp. 183-190.1993.
    [63].Friedman C., Kra P., Kranthammer M., Yu H. and Rzhetsky A., GENIES: A Natural-Language Processing System for the Extraction of Molecular Pathways from Complete Journal Articles. Bioinformatics. 17.Suppl. 1 2001.pages.74-82.
    [64].Freitag D., Using Grammatical Inference to Improve Precision in Information Extraction, in Working Papers of the ICML-97 Workshop on Automata Induction, Grammatical Inference and Language Acquisition, P. Dupont (Ed.), 1997.
    [65].Freitag D., Information Extraction from HTML: Application of a General Learning Approach. Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence. AAAI Press, pp. 517-523, 1998.
    [66].Freitag D., Toward General-Purpose Learning for Information Extraction. in Proceedings of the Seventeenth International Conference on Computational Linguistics (COLING-ACL-98), Association for Computational Linguistics, Morristown, NJ, USA. pp. 404-408, 1998.
    [67].Freitag D., Multistrategy learning for information Extraction. In the Proceedings of the Fifteenth Machine Learning Conference (ML-98), J. Shavlik (Ed.), Madison, USA, Morgan Kaufmann, San Francisco, CA, pp. 161-169,1998.
    [68].Freitag D., Machine Learning for lnformation Extraction in Informal Domains. Machine Learning, 39 (2-3): 169-202, May-June, 2000.
    [69].Freitag D. and McCallum A., Information Extraction with HMMs and Shrinkage. Proc. Workshop on ML and IE, AAAI-99, 1999.
    [70].Freudenberg J. and Propping P., A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics, Vol. 18 Suppl.2 2002, Pages S110-S115.
    [71].Fu Y., Mostata J. and Seki K., Protein Association Discovery in Biomedical Literature. Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries. Houston, Texas. pp. 113-115, 2003.
    [72].Fukuda K., Tamura A., Tsunoda T., and Takagi T., Toward information extraction: Identifying protein names from biological papers. In Proc. Pacific Symposium on Biocomputing'98, pages 707-718, Maui, Hawaii, January 1998.
    [73].Fuller S, Revere D, Bugni P, Reber L, Fuller H, Martin G. M., Modeling a Concept-based Information System to Promote Scientific Discovery: The Telemakus System. In Proc of AMIA Annu Fall Syrup, pp. 1023, 2002.
    [74].Gordon M. D., Lindsay R. K., Toward discovery support systems: A replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil. J Am Soc Inf Sci, 47 (2): 116-128, 1996.
    [75].Gordon M. D., Lindsay R. K., Literature-based discovery by lexical statistics. J Am Soc Inf Sci 47 (2): 116-128, 1999.
    [76].Grishman R. and Kittredge R., editors. Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Assoc., Hillsdale, NJ, 1986.
    [77].Grishman R., Adaptive Information Extraction and Sublanguage Analysis. IJCAI-2001 Workshop on Adaptive Text Extraction and Mining. pp. 77-79, 2001.
    [78].Gruninger M. and Fox M. S., Methodology for the Design and Evaluation of Ontologies. Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95, Montreal. 1995.
    [79].G2D home page. http://www.bork.ernbl-heidelberg.de/g2d.
    [80].Hafner C. D., Baclawski K., Futrelle R. P., Fridman N., and Sampath S., Creating a knowledge base of
    
    biological research papers. ISMB, 2, 147-55. 1994.
    [81].HAPI Home Page. http://array.ucsd.edu/hapi/.
    [82].Hatzivassiloglou V. and Duboue P. A. and Rzhetsky A., Disambiguating proteins, genes, and RNA in text: a machine learning approach. Bioinformatics. 17 Suppl 1: S97-S106. 2001.
    [83].Hayes P. J., Andersen P. M., Nirenburg I. B. and Schmandt L. M., Tcs: a shell for content-based text categorization. In Proceedingsof CAIA-90, 6th IEEE Conference on Artificial Intelligence Applications (Santa Barbara, CA), 320-326. 1990.
    [84].He J. et. al., On Machine Learning Methods for Chinese Document Categorization. Applied Intelligence, 3,18, 311-322, 2003.
    [85].Hearst M. A., Untangling Text Data Mining, Proceedings of ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, June 20-26, 1999 (invited paper).
    [86].Hearst M. A., Text data mining: Issues, techniques, and the relationship to information access. Presentation notes for UW/MS workshop on data mining, July 1997.
    [87].Hearst M. A., Information integration. IEEE Intelligent Systems (Sept./Oct. 1998) 12-24.
    [88].Hirschman L., Park J.C., Tsujii J. et al, Accomplishments and challenges in literature data mining for biology. Bioinformatics Review. 2002, 18(12): 1553-1561.
    [89].Hoffmann R., Valencia A., A gene network for navigating the literature. Nature Genetics 36, 664 (2004).
    [90].Hofmann T., Probabilistic Latent Semantic Indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp.50-57,1999.
    [91].Holt J. D. and Chung S. M., Efficient Mining of Association Rules in Text Databases. Proceedings of the eighth international conference on Information and knowledge management (CIKM). pp. 234-242, 1999.
    [92].Houston A., Chen H., Hubbard S. M., Schatz B. R. et al., Medical data mining on the internet: Research on a cancer information system. Artificial Intelligence Review, special issue on the Application of Data Mining, 13(5-6): 437-466, 1999.
    [93].Hristovski D. et al, Literature Based Discovery Support System and its Application to Disease Gene Identification. Book Chapter, 2001.
    [94].Hristovski D., Peterlin B., Mitchell J. A. et al, Improving literature based discovery support by genetic knowledge integration. Stud Health Technol Inform 2003, 95:68-73.
    [95].Hristovski D, Stare J, Peterlin B, Dzeroski S. Supporting discovery in medicine by association rule mining in Medline and UMLS. Medinfo. 2001, 10 (Pt 2):1344-8.
    [96].Hsu, C.-N., Dung M.-T., Generating finite-state transducers for semi-structured data extraction from the Web. Information Systems. Volume 23, Issue 9, pp.521-538, December 1998.
    [97].Hu X. et al. Extracting and Mining Protein-Protein Interaction Network from Biomedical Literature. Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB).2004.
    [98].Hu X. and Yoo I., Scalable Learning Method to Extract Biological Information from Huge Online Biomedical Literature. Chapter 23 in Computational Web Intelligence: Intelligent Technology for Web Applications, World Scientific Publisher (in print), 2004.
    [99].Huffman, S. B., Learning information extraction patterns from examples. In Wermter, S., Riloff, E., & Scheler, G. (Eds.), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, pp. 246-260. Springer, Berlin. 1996.
    [100].Humphreys B. L. & Lindberg D. A., The Unified Medical Language System: an informatics research collaboration. Am Med Inform Assoc. 5(1998) 1-11.
    [101].IBM Intelligent Miner for Text. http://www.software.ibm.com/data/iminer/fortext/index.html.
    
    
    [102]. Ideker T. et al, A new approach to decoding life: Systems Biology. Annu. Rev. Genomics Hum. Genet. 2001, 2.
    [103]. International Human Genome Sequencing Consortium, Initial Sequencing and Analysis of the Human Genome. Nature, 2001, 409: 860-921.
    [104]. Itskevitch J., Automatic Hierarchical E-mail Classification Using Association Rules. M. Sc. thesis, Computing Science, Simon Fraser University, July 2001.
    [105]. Jenssen T.-K., et al, A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 28. 21-28(2001).
    [106]. Jhingran A. D., Mattos N., Pirahesh H., Information integration: a research agenda. IBM Systems Journal, Dec, 2002.
    [107]. Joachims T., A Statistical Learning Model of Text Classification for Support Vector Machines. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. pp. 128-136, 2001.
    [108]. Joachims T., Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proceedings of ECML-98. pp. 137-142.1998.
    [109]. Jones, R., McCallum, A., Nigam, K. and E. Riloff. Bootstrapping for Text Learning Tasks. In IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications. pp. 52-63. 1999.
    [110]. Keerthi S. S. et al, A machine learning approach for the curation of biomedical literature - kdd cup 2002 (task 1). Technical report, National University of Singapore, December 2002.
    [111]. Kim J. and Moldovan D., Acquisition of Semantic Patterns for Information Extraction from Corpora. In Proceedings of the Ninth IEEE Conference on Artificial Intelligence for Applications, pages 171-176, Los Alamitos, CA, IEEE Computer Society Press, 1993.
    [112]. Kodratoff Y., Knowledge Discovery in. Texts: A Definition, and Applications, Proc. ISMIS'99, Warsaw, June 1999.
    [113]. Koller D. and Sahami M., Hierarchically classifying documents using very few words. Proceedings of the Fourteenth International Conference on Machine Learning. pp. 170-178.1997.
    [114]. Kowalczyk A. and Raskutti B., One class svm for yeast regulation prediction. Technical report, Telstra Research Laboratories, December 2002.
    [115]. Kwok K. L., Comparing Representations in Chinese Information Retrieval, SIGIR'97, PAGES:34-41, Philadelphia, Pennsylvania, United States.
    [116]. Lagus K., Honkela T., Kaski S., Kohonen T., WEBSOM for Textual Data Mining. Artificial Intelligence Review. Volume 13, Issue 5-6, pp.345-364, December 1999.
    [117]. Lambert D. and Pinheiro J., Mining a stream of transactions for customer patterns. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001) (San Francisco, Aug. 26-29). ACM Press, New York, 2001, 305-310.
    [118].雷顺群,刘宝生.论中药品种与临床疗效的关系,《中国临床医生》2002年,第20卷第6期,56-57。
    [119]. Lent B., Agrawal R., Srikant R., Discovering Trends in Text Databases. In: Proceedings of the 3rd International Conference on Knowledge Discovery (KDD), AAAI Press, pp.227-230.1997.
    [120]. Lewis D. D., An evaluation of phrasal and clustered representations on a text categorization task. In Proceedings of SIGIR-92, 15th ACM International Conference on Research and Development in Information Retrieval (Kobenhavn, DK), pp. 37 - 50. 1992.
    [121]. Lewis, D. D, Feature Selection and Feature Extraction for Text Categorization. Proceedings of Speech and Natural Language Workshop. Morgan Kaufmann, San Mateo, California. pp. 212-217. 1992.
    [122]. Lewis, D. D, Representation Quality in Text Classification: An Introduction and Experiment. http://-
    
    www.research.att.com/~lewis/papers/lewis90g.ps.
    [123].Lewis D. D, Evaluating Text Categorization. Proceedings of Speech and Natural Language Workshop, Morgan Kaufmann, San Marco, California. pp. 312-318, 1991.
    [124].Li W., Han J., and Pei J., CMAR: Accurate and Efficient Classification Based on Multiple Class-association Rules (Regular paper). In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), San Jose, California, Novermber 29-December 2, 2001.
    [125].Lin D. and Pantei P., DIRT-Discovery of Inference Rules from Text.KDD-01. http://www.isi.edu/~pantel/Download/Papers/kdd01-1.pdf.
    [126].Lindberg D. A. B., Humphreys B. L., McCray A. T., The Unified Medical Language System. Meth Inform Med. 32(1993) 281- 91.
    [127].Loh S., Wives L. K., & de Oliveira J. P. M.. Concept-based knowledge discovery in texts extracted from the Web. SIGKDD Explorations, 2(1), 29-39.2000.
    [128].Long J. M., Irani E. A., and Slagle J. R., Automating the discovery of causal relationships in medical records database. In G. Piatetsky-Shapiro and W. J. Frawley (eds.), Knowledge Discovery in databases, pages 465-476. AAAI Press/MIT Press, 1991.
    [129].Mannila H. and Toivonen H., Discovering generalized episodes using minimal occurrences. In KDD-96, pp. 146-151, Portland, Oregon, USA, August 1996. AAAI Press.
    [130].Marcotte E. M., Xenarios L. and Eisenberg D., Mining literature for protein-protein interactions. Bioinformatics, 2001, 17(4): 359-363.
    [131].McCallum, A., &Jesen, D., A note on the unification of information extraction and data mining using conditional-probability, relational models. In Proceedings of the IJCAI-2003 Workshop on Learning Statistical Models from Relational Data. Acapulco, Mexico. 2003,
    [132].MedGene home page. http://hipseq.med.harvard.edu/MedGene/login.jsp.
    [133].MedMiner home page. http://discover.nci.nih.gov/textmining/main.jsp.
    [134].MeSHmap home page. http://geordi.info-science.uiowa.edu/cgi-bin/ManjalMain.cgi.
    [135].Mladenic D., Feature subset selection in text-learning. Proc. of the 10th European Conference on Machine Learning (ECML-98), LNCS, pp.95-100. 1998.
    [136].Nahm U. Y., Text Mining with Inforrnation Extraction. Ph. D. Thesis, Department of Computer Sciences, University of Texas at Austin, 217 pages, August 2004.
    [137].Nahm U. Y. and Mooney R. J., Text Mining with Information Extraction. To appear in the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002.
    [138].Nahm U. Y. and Mooney R. J., Mooney. Mining Soft-Matching Rules from Textual Data. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pp. 979 - 984, Seattle, WA, August 2001.
    [139].Nahm U. Y. and Mooney R. J., A Mutually Beneficial Integration of Data Mining and Information Extraction. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.
    [140].Nahm U. Y. and Mooney R. J., Mooney.Using Information Extraction to Aid the Discovery of Prediction Rules from Text. Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, pp. 51-58, Boston, MA, August 2000.
    [141].Nahm U. Y. and Mooney R. J., A Mutually Beneficial Integration of Data Mining and Information Extraction. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.
    [142].Nigam K., Lafferty J. and McCallum A., Using Maximum. Entropy for Text Classification. In IJCAI-99
    
    Workshop on Machine Learning for Information Filtering, pp. 61-67. 1999.
    [143]. Nigam K.P., Using Unlabeled Data to Improve Text Classification. PhD Thesis, May 2001. CMU-CS-01-126.
    [144]. Park J. C. and Kim H. S. and Kim J. J., Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar. Pac Symp Biocomput. 6(2001)396-407.
    [145]. Peng F.C., Schuurmans D. and Wang S. J., Augmenting Naive Bayes Classifiers with Statistical Language Models. JIR, 7, 317-345, 2004.
    [146]. Peng F. C., Huang, X.J., Schuurmans D., and Wang S, J., Text Classification in Asian Languages without Word Segmentation. In Proceedings of the The Sixth International Workshop on Information Retrieval with Asian Languages (IRAL 2003), July 7, 2003, Sapporo, Japan.
    [147]. Perez-Iratxeta C., Bork P. & Andrade M. A., Association of genes to genetically inherited diseases using data mining, letter to nature genetics, volume 31, july 2002.
    [148]. Piatetsky-Shapiro G., Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W. Frawley, editors, Knowledge Discovery in Databases, pages 229-238. AAAI/MIT Press, 1991.
    [149]. Protégé 2000 home page at http://protege.stanford.edu.
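    [150].Bu D., Bai S., Li G., The Granularity Principle in Clustering and Classification. 2002, 25(8): 810-816 (in Chinese).
    [151]. PubMatrix home page. http://pubmatrix.grc.nia.nih.gov/.
    [152].Qiao Y., Li P., Su G., Xiao P., Wang Y., The Significance of KDD Research and Development for Chinese Medicines (Compound Formulas). Journal of Beijing University of Traditional Chinese Medicine, 1998, 21(3): 15-17 (in Chinese).
    [153]. Rajman M. and Besancon R., Text mining: Natural language techniques and text mining applications. In Proc. of the 7th IFIP Working Conference on Database Semantics (DS-7). Chapman & Hall, 1997.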
    [154]. Rajman M. and Besancon R., Text Mining-Knowledge extraction from unstructured textual data. 6th Conference of International Federation of Classification Societies (IFCS-98), Rome.
    [155]. Ramadan, N. M., et al. Low Brain Magnesium in Migraine. Headache 29: 416-419, 1989.
    [156]. Regev Y. et al, Rule-based extraction of experimental evidence in the biomedical domain - the kdd cup 2002 (task 1). Technical report, ClearForest and Celera, December 2002.
    [157]. Revere D, Fuller S. S., Bugni P, Martin G. M., An information extraction and representation system for rapid review of the biomedical literature. Accepted for presentation at: MedInfo-2004, 09/2004.
    [158]. Revere D, Fuller S. S, Bugni P, Martin G. M., A new system to support knowledge discovery: Telemakus. In Proc of the Am Soc for Inform Sci & Tech Annu Meet, 2003, pp. 52-58.
    [159]. Riloff E., Little Words Can Make a Big Difference for Text Classification. 18th ACM International Conference on Research and Development in Information Retrieval, pp. 130-136, 1995.
    [160]. Riloff E. and Lehnert W., Information extraction as a basis for high-precision text classification. ACM Transactions on Information Systems (TOIS), Volume 12, Issue 3, July 1994.
    [161]. Riloff E., Automatically constructing a Dictionary for Information Extraction Tasks. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93), pp. 811-816, AAAI Press/The MIT Press, 1993.
    [162]. Riloff E., Automatically Generating Extraction Patterns from Untagged Text. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pp. 1044-1049, 1996.
    [163]. Roberts R. J., Varmus H. E., Ashburner M., Brown P. O., Eisen M. B., Khosla C., Kirschner M., Nusse R., Scott M., Wold B., Building A GenBank of the Published Literature. Science, Vol 291, Issue 5512, 2318-2319, 23 March 2001.
    [164]. Ross, D.T., Scherf, U., Eisen, M. B. et al, EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature. Nature Genetics, 2000 March, 24(3):227-234.
    [165]. Sahami M., Hearst M., Saund E., Applying the Multiple Cause Mixture Model to Text Categorization. Proceedings of 13th International Conference on Machine Learning (ICML-96), pp. 435-443, San Francisco, CA: Morgan Kaufmann, 1996.
    [166].Salton G. and Buckley C., Term-weighting approaches in automatic text retrieval. Inform. Process. Man. 24, 5, 513-523. Also reprinted in Sparck Jones and Willett [1997], pp. 323-328. 1988.
    [167].SAS Text Miner home page. http://www.sas.com/technologies/analytics/datamining/textminer/.
    [168].Saund E., A multiple cause mixture model for unsupervised learning, Neural Computation, vol. 7, pp. 51-71, 1995.
    [169].Sehgal A. K and Qiu X. Y. and Srinivasan P., Mining Medline Metadata to Explore Genes and their Connections. Proceedings of the SIGIR-03 Workshop on Bioinformatics, Toronto, Canada, August 2003. http://mingo.info-science.uiowa.edu/padmini/.
    [170].Sekimizu T., Park H. S., Tsujii J., Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts. In Genome Informatics. Universal Academy Press, Inc., pp. 62-71, 1998.
    [171].Shen Ziyin, The continuation of kidney study. Shanghai, Shanghai Scientific & Technical Publishers, 1990, 3-31.
    [172].Sholom M. Weiss, Chidanand Apte, Fred J. Damerau, David E. Johnson, Frank J. Oles, Thilo Goetz, Maximizing Text Mining Performance. IEEE Intelligent Systems, Volume 14, Issue 4, pp.63-69.July 1999.
    [173].Shusaku T., Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model. Information Sciences Volume: 162, Issue: 2, May 17, 2004, pp. 65-80.
    [174].Slonim N., Tishby N., The power of word clusters for text classification (Best Paper Award). In ECIR, 2001. http://citeseer.nj.nec.com/slonim01power.html.
    [175].Slonim N., Tishby N., Document Clustering using Word Clusters via the Information Bottleneck Method. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 208-215, 2000.
    [176].Soderland S., Fisher D., Aseltine J., & Lehnert W., CRYSTAL: Inducing a conceptual dictionary. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), pp. 1314-1319 Montreal, Canada. 1995.
    [177].Soderland S., Learning to extract text-based information from the World Wide Web. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), pp. 251-254 Newport Beach, CA. 1997.
    [178].Soderland S., Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning Journal, Volume 34, Issue 1-3, pp. 233-272, February 1999.
    [179].SPSS Lexiquest home page. http://www.spss.com/SPSSBI/LexiQuest/.
    [180].SRA NetOwl TextMine home page. http://www.textmining.com/.
    [181].Srikant R. and Agrawal R., Mining generalized association rules. In Proc. 21st Intl Conf. Very Large Data Bases, pages 407-419, September 1995.
    [182].Srinivasan P., Text Mining: Generating Hypotheses from Medline. Journal of the American Society for Information Science. 55(4): 396-413, 2004.
    [183].Srinivasan P., MeSHmap: A Text Mining Tool for Medline. Proceedings of the Annual Conference (2001) of the American Medical Informatics Association (AMIA), March 2001.
    [184].Stapley B. J. and Benoit G., Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in medline abstracts. In Pacific Symposium on Biocomputing, pages 529-40, 2000.
    [185].Stapley B. J. and Kelley L. A. and Sternberg M. J., Predicting the sub-cellular location of proteins from text using support vector machines. Pac Symp Biocomput. 2002: 374-385.
    [186].Stein L.D., Integrating biological databases. Nature Reviews Genetics, 4(5): 337-345, May 2003.
    [187].Stephens M., Palakal M., Mukhopadhyay S. et al, Detecting Gene Relations from Medline Abstracts. Pac Symp Biocomput. 2001: 483-95.
    [188].Swartout B., Patil R., Knight K., Russ T., Toward Distributed Use of Large-Scale Ontologies, in: AAAI-97 Spring Symposium on Ontological Engineering, March 1997, Stanford, California.
    [189].Swanson, D., Complementary structures in disjoint science literatures. In Proceedings of the 14th Annual International ACM/SIGIR Conference, 1991, pages 280-289.
    [190].Swanson D., Migraine and Magnesium: Eleven Neglected Connections. Perspect. Biol. Med. 1988, 31(4): 526-557.
    [191].Swanson D., Medical literature as a potential source of new knowledge. Bull Med Libr Assoc 1990 Jan, 78(1): 29-37.
    [192].Swanson, D., Two medical literatures that are logically but not bibliographically connected, Journal of the American Society for Information Science, 1987, 38(4): 228-233.
    [193].Swanson, D., and Smalheiser N. R., Assessing a gap in the biomedical literature: magnesium deficiency and neurologic disease, Neuroscience Research Communications, 1994, 15: 1-9.
    [194].Swanson, D. and Smalheiser N. R., An interactive system for finding complementary Literatures: a stimulus to scientific discovery, Artificial Intelligence, 1997, 91: 183-203.
    [195].Swanson D., Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge. Perspectives in Biology and Medicine. 30(1): 7-18, 1986.
    [196].Swanson D., Somatomedin C and Arginine: Implicit Connections Between Mutually-Isolated Literatures. Perspectives in Biology and Medicine. 33(2): 157-186, Winter 1990.
    [197].Tan A.-H. and Yu P., A Comparative Study on Chinese Text Categorization Methods, PRICAI 2000 Workshop on Text and Web Mining, Melbourne, pp. 24-35, August 2000.
    [198].Tan A.-H., Text Mining: The state of the art and the challenges. In Proceedings, PAKDD-99 Workshop on Knowledge discovery from Advanced Databases (KDAD-99), Beijing, pp. 71-76, April 1999.
    [199].Tan K. C., Yu Q., Heng C. M., Lee T. H., Evolutionary computing for knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine Volume: 27, Issue: 2, February, 2003, pp. 129-154.
    [200].Tanabe L., Scherf U., Smith L. H., Lee J. K., Hunter L., Weinstein J. N., MedMiner: An Internet Text-Mining Tool for Biomedical Information, with Application to Gene Expression Profiling. Biotechniques, vol. 27(6): 1210-4, 1216-7, December 1999.
    [201].TextAnalyst home page. http://www.megaputer.com/.
    [202].Text Mining tools on KDNuggets (A famous kdd website), http://www.kdnuggets.com.
    [203].Telemakus home page. http://www.telemakus.net/.
    [204].Thomas J., Milward D., Ouzounis C. et al, Automatic Extraction of Protein Interactions from Scientific Abstracts. PSB 2000, pp. 541-52.
    [205].Tsay J.-J. and Wang J.-D., Design and Evaluation of Approaches to Automatic Chinese Text Categorization. Computational Linguistics and Chinese Language Processing Vol. 5, No. 2, August 2000, pp. 43-58.
    [206].Tsay J.-J. and Wang J.-D., Improving Automatic Chinese Text Categorization by Error Correction. Proceedings of the 5th International Workshop Information Retrieval with Asian Languages. pp.1-8, November 2000, Hong Kong, China.
    [207].Zelenko D., Aone C., Richardella A., Kernel methods for relation extraction. JMLR, 3(Feb): 1083-1106, 2003.
    [208].Uschold M., Building Ontologies: Towards a Unified Methodology, in: 16th Annual Conf. of the British Computer Society Specialist Group on Expert Systems, 1996, Cambridge, UK.
    
    
    [209]. Viveros M. S., Nearhos J. P., and Rothman M. J., Applying data mining techniques to a health insurance information system. In T. M. Vijayaraman, A. Buchmann, C. Mohan, and N. L. Sarda, editors, 22nd International Conference on Very Large Data Bases (VLDB '96), pages 286-293, Mumbai, India, Morgan Kaufmann Publishers, Inc. San Francisco, USA.
    [210]. Volz R., Oberle D., Staab S., Motik B., KAON Server - A Semantic Web Management System. In Alternate Track Proceedings of the Twelfth International World Wide Web Conference, WWW2003, Budapest, Hungary, 20-24 May 2003. ACM, 2003.
    [211]. Weeber M., Klein H., Aronson A. R. et al., Text-Based Discovery in Biomedicine: The Architecture of the DAD-system. Proc AMIA Symp. 2000: 903-7.
    [212]. Wilkinson D. and Huberman B. A., A Method for Finding Communities of Related Genes. Proc. Natl. Acad. Sci. USA, 10.1073/pnas.0307740100.
    [213]. Wong C. K. P., Luk R. W. P., Wong K. F., Kwok K. L., Text Categorization using Hybrid (Mined) Terms, Proceedings of the fifth international workshop on Information retrieval with Asian languages, pages: 217-218, November 2000, Hong Kong, China.
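    [214].Wu J., Aspects of Systems Biology. Science in China, 2002, No. 6 (Forum) (in Chinese).
    [215].Wu J., Interdisciplinary Science in the Post-Genome Era: from "Bio-X" to "X biology". Science in China, 2002, No. 1 (Forum) (in Chinese).
    [216].Wu J., New Era, Big Science. Science in China, 2002, No. 2 (Forum) (in Chinese).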
    [217]. Yakushiji A. and Tateisi Y. and Miyao Y. and Tsujii J., Event Extraction from Biomedical Papers Using a Full Parser. Pac Symp Biocomput. 6(2001)408-419.
    [218]. Yandell M. D. and Majoros W. H., Genomics and Natural Language Processing. Nature Reviews Genetics,3, 2002, 601-610.
    [219]. Yang Y. and Pedersen J. O., A Comparative Study on Feature Selection in Text Categorization. Proceedings of the Fourteenth International Conference on Machine Learning. pp. 412-420. 1997.
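    [220]. Yang Y. and Liu X., A re-examination of text categorization methods. Proceedings of SIGIR-99, pages 42-49, 1999.
    [221]. Yang Y., An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval, Volume 1, Issue 1-2, pp. 69-90, 1999.
    [222].Yao M., Yuan Y., Ai L., Qiao Y., Data Mining and Its Application in Research on the Modernization of Traditional Chinese Medicine. Journal of Beijing University of Traditional Chinese Medicine, 2002, Vol. 25, No. 5: 20-23 (in Chinese).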
    [223]. Yarowsky D., Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora, in Proceedings of COLING'92, p. 454-460, Nantes, 1992.
    [224]. Yarowsky D. Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 189-196. 1995.
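    [225].Zhou M., An Automatic Indexing System for Traditional Chinese Medicine Literature Based on Rule Learning. Master's thesis, Zhejiang University, 2003 (supervisor: Prof. Wu Zhaohui) (in Chinese).
    [226]. Zhou X., Wu Z., Text Knowledge Discovery: Text Mining based on Information Extraction. Computer Science (in Chinese). 2003, 30(1): 63-66.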
    [227]. Zhou X., Fang Q., Wu Z., A Comparative Study on Text Representation and Classifiers in Chinese Text Categorization. ICCPOL-03, pp. 454-461.
