Research on a Shopping-Guide System for Mobile Commerce (移动商务导购系统的研究)

English title: The Research of Shopping-guide System Based on M-Commerce
Author: Li Hui (李辉)
Degree level: Doctoral
Discipline: Management Science and Engineering
Keywords: M-commerce; shopping-guide; message transfer; similarity computing; word sense disambiguation; spam text messages
Degree year: 2008
Advisor: Yang Deli (杨德礼)
Discipline code: 1201
Degree-granting institution: Dalian University of Technology (大连理工大学)
Thesis submission date: 2008-03-10

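Abstract

To lead consumers to product information quickly, large shopping venues such as department stores, supermarkets, and bookstores have adopted a variety of shopping-guide methods, and airports, parks, and hospitals have set up various forms of guidance that can also be called shopping guides in a broad sense. Shopping guidance in e-commerce is more convenient still: consumers can use Internet search tools to obtain product information through category browsing or keyword retrieval. As mobile commerce grows, it too needs shopping-guide methods suited to it.
     Against the current state of research on mobile commerce shopping guides, this thesis combines mobile communication technology and information technology to study shopping-guide methods suited to operation on mobile terminals, and proposes a model and system composition for a mobile commerce shopping-guide system. The research covers five aspects: the model of the shopping-guide system, its information transfer methods, word sense disambiguation based on HowNet, similarity computation based on HowNet, and the filtering of spam messages that threaten the system's security.
     For the system model, the thesis analyzes the characteristics of mobile commerce terminals and the purpose of shopping guidance, designs the model and system composition of mobile commerce shopping guidance, defines the function of each part, and explains the significance of the research.
     For information transfer, the thesis studies the technical characteristics and input/output modes of mobile devices and the three interaction modes for information services that mobile application platforms can provide, and designs an information transfer model for the shopping-guide system. The information transfer methods were designed and implemented, and the experimental results meet the system's application requirements.
     For the HowNet-based word sense disambiguation algorithm, the thesis introduces and analyzes HowNet and selects it as the disambiguation resource. Using HowNet's knowledge description language, it rewrites HowNet's word-sense file into a word-sense-number file, and introduces, as an original contribution, a distance coefficient that quantifies the influence of content words at different distances on the ambiguous word. On this basis a HowNet-based word sense disambiguation algorithm is implemented, and experiments show that the algorithm is effective.
     For HowNet-based similarity computation, the thesis summarizes the scope of application, strengths, and weaknesses of the various similarity computation methods and, after summarizing the difficulties of Chinese semantic analysis, proposes a similarity computation method that incorporates a disambiguation strategy and gives the model of the similarity algorithm. With HowNet as the semantic dictionary, a HowNet-based semantic similarity algorithm is implemented, and the question-answering module of the shopping-guide system is built on it. Experiments indicate that both the similarity algorithm and the shopping-guide question answering are effective.
     For system security, the thesis studies spam filtering for the shopping-guide system from the standpoint of denial of service caused by spam messages. Taking into account the textual characteristics of shopping-guide question-and-answer messages, and after comparing the strengths and weaknesses of various feature extraction algorithms, feature weighting methods, and classification algorithms in Chinese text classification, it proposes a message filtering algorithm based on minimum-risk Bayes. The algorithm's performance was tested on a self-built corpus of short messages; the experimental results show that it effectively blocks spam and meets the security requirements of the shopping-guide system.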
To guide consumers effectively through masses of merchandise information, multiple methods of guidance are adopted not only by shopping venues such as emporiums, supermarkets, and bookstores, but also by airports, parks, and hospitals. All of these can be broadly defined as shopping guides. E-commerce makes shopping guidance more convenient: consumers can obtain merchandise information over the Internet by category browsing or keyword search. With the rapid development of mobile commerce (M-commerce), new shopping-guide methods suited to M-commerce are now needed.
     Building on existing research into M-commerce shopping guides, this thesis proposes a shopping-guide system based on M-commerce and its components, combining mobile communication techniques and information techniques. The thesis mainly discusses the shopping-guide system model, methods of message transfer in the shopping-guide system, word sense disambiguation based on HowNet, the computing of semantic similarity based on HowNet, and the filtering of spam text messages, each introduced as follows.
     By analyzing the technology of mobile devices and the aims of shopping guidance, a shopping-guide system based on M-commerce is put forward. The composition of the system, the functions of its components, and the significance of the research are also presented.
     A model of message transfer in M-commerce shopping guidance is designed. It draws on a study of mobile devices' technical features, their input and output modes, and the three modes of interactive information service provided by mobile platforms. Methods of message transfer are designed and implemented, and the experimental results satisfy the application requirements.
     The word sense disambiguation algorithm is based on HowNet. The thesis introduces and analyzes HowNet and adopts it as the disambiguation resource. Using HowNet's knowledge description language, the algorithm converts the word-sense file into a word-sense-number file. It also introduces a distance coefficient to quantify the influence that notional words at different distances have on an ambiguous word. The algorithm is verified by experiments.
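The distance-weighting idea can be illustrated with a minimal Python sketch. The abstract does not give the form of the distance coefficient or the sense-scoring rule, so everything here is an assumption made for illustration: the coefficient is taken as 1/d, relatedness is a plain sememe-overlap count, and the `SENSES` table is a hypothetical stand-in for a HowNet word-sense-number file.

```python
from typing import Dict, List, Set

# Hypothetical sense inventory: word -> {sense number -> sememe set}.
# A real system would load this from a HowNet-derived file.
SENSES: Dict[str, Dict[int, Set[str]]] = {
    "apple": {1: {"fruit", "food"}, 2: {"company", "phone", "computer"}},
    "juice": {1: {"drink", "fruit", "food"}},
    "phone": {1: {"phone", "tool", "communicate"}},
    "buy": {1: {"buy", "commerce"}},
}


def distance_coefficient(distance: int) -> float:
    """Assumed decay: nearer context words weigh more (1, 1/2, 1/3, ...)."""
    return 1.0 / distance


def overlap(sememes: Set[str], word: str) -> int:
    """Largest sememe overlap between one sense and any sense of `word`."""
    senses = SENSES.get(word, {})
    return max((len(sememes & s) for s in senses.values()), default=0)


def disambiguate(tokens: List[str], index: int) -> int:
    """Return the sense number of tokens[index] with the highest weighted score."""
    scores: Dict[int, float] = {}
    for sense_no, sememes in SENSES[tokens[index]].items():
        score = 0.0
        for j, word in enumerate(tokens):
            if j == index:
                continue  # the ambiguous word does not vote for itself
            score += distance_coefficient(abs(j - index)) * overlap(sememes, word)
        scores[sense_no] = score
    # Ties resolve to the lowest sense number.
    return max(sorted(scores), key=lambda s: scores[s])
```

In this toy setup, `disambiguate(["juice", "buy", "buy", "apple", "phone"], 3)` picks the "company" sense: "phone" is adjacent and contributes 1.0, while "juice" has a larger raw overlap with the "fruit" sense but sits three positions away and contributes only 2/3.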
     After summing up the scope of application, merits, and defects of various similarity computing methods, and on the basis of a summary of the difficulties of Chinese semantic analysis, a new computing method that integrates a disambiguation strategy is proposed. With HowNet as the semantic dictionary, the similarity computation is implemented, and the Q&A module of the shopping-guide system is built on it. Both the similarity computation and the Q&A are verified by experiments.
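As a rough illustration of how similarity computed over disambiguated words can drive a question-answering module, the Python sketch below matches a user question against stored questions. The abstract does not state the similarity formula, so the measure used here (Jaccard overlap of sememe sets, averaged over best word matches in both directions) is an assumption, and the `SEMEMES` lexicon and sample answers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Sense = Tuple[str, int]  # (word, sense number), i.e. a word after disambiguation

# Hypothetical lexicon: disambiguated word -> sememes.
SEMEMES: Dict[Sense, FrozenSet[str]] = {
    ("price", 1): frozenset({"attribute", "money"}),
    ("cost", 1): frozenset({"attribute", "money"}),
    ("phone", 1): frozenset({"tool", "communicate"}),
    ("handset", 1): frozenset({"tool", "communicate"}),
    ("store", 1): frozenset({"place", "commerce"}),
    ("open", 1): frozenset({"time", "begin"}),
}


def word_sim(a: Sense, b: Sense) -> float:
    """Jaccard overlap of two senses' sememe sets (an assumed measure)."""
    sa = SEMEMES.get(a, frozenset())
    sb = SEMEMES.get(b, frozenset())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def directed_sim(src: List[Sense], dst: List[Sense]) -> float:
    """Average, over words in src, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)


def sentence_sim(q1: List[Sense], q2: List[Sense]) -> float:
    """Symmetric sentence similarity in [0, 1]."""
    return (directed_sim(q1, q2) + directed_sim(q2, q1)) / 2


def answer(question: List[Sense], faq: List[Tuple[List[Sense], str]]) -> str:
    """Return the answer attached to the most similar stored question."""
    return max(faq, key=lambda pair: sentence_sim(question, pair[0]))[1]
```

Resolving each word to a single sense before comparison is what keeps an ambiguous word from matching on the wrong meaning; here the sense numbers are supplied by hand rather than produced by a disambiguation step.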
     With regard to the security of the shopping-guide system, filtering of spam text messages is studied because spam messages can cause denial of service. Considering the characteristics of the text messages used in shopping-guide Q&A, and after comparing feature extraction algorithms, feature weighting methods, and classification algorithms for Chinese text classification, a filtering method based on minimum-risk Bayes decision is proposed. Its performance is tested on a self-built short-message corpus. The experiments show that the method blocks spam text messages and satisfies the security requirements.
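A minimum-risk Bayes decision differs from a plain maximum-posterior classifier in that each kind of error carries its own cost, and the label with the lowest expected cost is chosen. The sketch below shows the general technique in Python with a naive Bayes model and Laplace smoothing. It is not the thesis's implementation: the loss values, the choice to penalize blocking a legitimate message more heavily, the class name `MinRiskBayesFilter`, and the sample messages are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ("ham", "spam")  # "ham" = legitimate message


class MinRiskBayesFilter:
    def __init__(self, loss_ham_as_spam: float = 5.0, loss_spam_as_ham: float = 1.0):
        # Assumed loss matrix: blocking a legitimate question costs more.
        self.loss_ham_as_spam = loss_ham_as_spam
        self.loss_spam_as_ham = loss_spam_as_ham
        self.word_counts: Dict[str, Counter] = {label: Counter() for label in LABELS}
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, samples: List[Tuple[List[str], str]]) -> None:
        """Count tokens per class; both classes need at least one sample."""
        for tokens, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def posteriors(self, tokens: List[str]) -> Dict[str, float]:
        """Naive Bayes posteriors with Laplace smoothing; unseen tokens are skipped."""
        total_docs = sum(self.doc_counts.values())
        log_joint: Dict[str, float] = {}
        for label in LABELS:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                if token in self.vocab:
                    log_p += math.log((counts[token] + 1) / denom)
            log_joint[label] = log_p
        top = max(log_joint.values())  # subtract the max for numerical stability
        unnorm = {label: math.exp(v - top) for label, v in log_joint.items()}
        z = sum(unnorm.values())
        return {label: v / z for label, v in unnorm.items()}

    def classify(self, tokens: List[str]) -> str:
        """Pick the decision with the smaller expected loss."""
        post = self.posteriors(tokens)
        risk_if_spam = self.loss_ham_as_spam * post["ham"]
        risk_if_ham = self.loss_spam_as_ham * post["spam"]
        return "spam" if risk_if_spam < risk_if_ham else "ham"
```

With equal losses the rule reduces to picking the more probable class. Raising `loss_ham_as_spam` makes the filter more conservative, so a borderline message is delivered rather than blocked.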

