Research on the Theory and Methods of Interval-Valued Symbolic Data Analysis and Their Application in Finance
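Abstract
Symbolic data analysis (SDA) is a body of theory and methods for extracting systematic knowledge from massive data. It uses the idea of "data packaging" to reduce the dimension of a very large sample space. As a result, the nature of the sample data changes: what were "point data" become "symbolic data". Interval numbers are the most commonly used type of symbolic data. This dissertation studies the theory and methods of interval-valued symbolic data analysis and applies them to several problems in finance. The main contents are as follows:
     1. Foundations of interval-valued symbolic data analysis. This part studies the descriptive statistics of interval-valued symbolic variables. It first discusses how to compute the empirical density function of an interval-valued symbolic variable. On that basis it gives a method for drawing histograms and studies the computation of the mean, variance, covariance and correlation functions. This lays the foundation for the multivariate analysis of interval-valued symbolic data in the later chapters.
     2. Principal component analysis for interval-valued symbolic data. The two main existing methods of principal component analysis for interval data, the vertices method (V-PCA) and the centers method (C-PCA), are first compared. Then, from the viewpoint of common principal components, an interval PCA method based on common principal components (Common PCA) is proposed. The method also applies to interval PCA over different time periods, that is, dynamic interval PCA. A validity index is defined based on the Hausdorff distance, and the effectiveness of V-PCA, C-PCA and Common PCA is compared by simulation. Finally, the methods are applied to the Shanghai Stock Exchange market in China to study the relationship between the risk of listed companies and company size, and to evaluate the market performance of stocks dynamically.
     3. Regression analysis for interval-valued symbolic data. A method for estimating regression parameters based on the descriptive statistics of interval-valued symbolic data is given first. Next, for interval variables with a nonlinear relationship that can be linearized, a method of nonlinear regression for interval-valued symbolic variables based on error propagation theory is proposed. Because both the estimate ŷ_i of the dependent variable and the residual e_i are interval numbers, the dissertation proposes a rectangular residual plot for regression diagnostics. Further, indices for evaluating the quality of a model are defined based on the Hausdorff distance: RMSE_H, which reflects absolute error, and U_H, which reflects relative error. Finally, interval linear regression is applied in an empirical analysis of the correlation between the CSI 300 index and the CITIC size-style indices.
     4. Solving multi-objective linear programming problems with interval-valued symbolic coefficients (interval multi-objective linear programming for short). The solution of single-objective linear programming problems with interval coefficients is discussed first. On that basis, a Zimmermann fuzzy algorithm for solving interval multi-objective linear programming problems is studied. Finally, the algorithm is applied in an empirical analysis of a securities portfolio problem.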
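As an illustration of the descriptive statistics in item 1, the following is a minimal Python sketch, not the dissertation's own definitions, which the abstract does not state. It assumes that values are uniformly distributed within each interval, a convention commonly used in the symbolic data analysis literature (for example by Billard and Diday). The names `interval_mean`, `interval_variance` and `histogram_weights` are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper


def interval_mean(data: Sequence[Interval]) -> float:
    """Symbolic sample mean: the average of the interval midpoints."""
    return sum((a + b) / 2.0 for a, b in data) / len(data)


def interval_variance(data: Sequence[Interval]) -> float:
    """Symbolic sample variance under the uniform-within-interval assumption.

    The second moment of a uniform variable on [a, b] is (a^2 + ab + b^2) / 3,
    so the variance of the equal-weight mixture is the mean second moment
    minus the squared symbolic mean.
    """
    n = len(data)
    second_moment = sum(a * a + a * b + b * b for a, b in data) / (3.0 * n)
    return second_moment - interval_mean(data) ** 2


def histogram_weights(data: Sequence[Interval], edges: Sequence[float]) -> List[float]:
    """Relative frequency of each bin [edges[k], edges[k+1]].

    Each interval contributes to a bin in proportion to the share of its
    length that falls inside the bin. A degenerate interval (a == b) is
    counted as a single point.
    """
    n = len(data)
    last = len(edges) - 2
    weights = [0.0] * (len(edges) - 1)
    for a, b in data:
        for k in range(len(edges) - 1):
            lo, hi = edges[k], edges[k + 1]
            if b > a:
                overlap = max(0.0, min(b, hi) - max(a, lo))
                weights[k] += overlap / (b - a) / n
            elif lo <= a < hi or (k == last and a == hi):
                weights[k] += 1.0 / n
    return weights
```

With the two intervals [0, 2] and [2, 4], the sketch treats the sample as uniform on [0, 4], so the symbolic mean is 2 and a histogram with unit-width bins puts equal weight on each bin.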
Symbolic data analysis (SDA) is a method for analyzing massive data and extracting useful information from them. It works by summarizing a large dataset into a much smaller one. One consequence is that the data may no longer be single values but symbolic data. Interval numbers are a main type of symbolic data. This dissertation studies the theory and methodology of interval-valued symbolic data analysis and its application in finance. The main points of the dissertation are as follows.
     1. Foundation of interval-valued symbolic data analysis. Descriptive statistics for interval-valued symbolic data are the main subject. First, the empirical density function of an interval-valued symbolic variable is defined. Based on this, methods are given for drawing the histogram and for calculating the mean, variance, covariance and correlation functions of interval-valued symbolic variables. These results are the basis for the studies that follow.
     2. Principal component analysis (PCA) for interval-valued symbolic data. The two main methods of PCA for interval data are Vertices-PCA (V-PCA) and Centers-PCA (C-PCA). A comparative study of the two is made first. Then a new method of PCA for interval-valued symbolic data, called Common PCA, is proposed from the point of view of common principal components. One advantage of this method is that it allows dynamic PCA on time-series data. To compare the three methods further, an index indicating the goodness of fit of a method is defined using the Hausdorff distance, and the three methods are compared by simulation. Finally, an empirical study of the Shanghai stock market is carried out with the proposed method. The relation between risk and company size, and the dynamic evaluation of the market behavior of several stocks, are studied in turn.
     3. Regression analysis for interval-valued symbolic data. First, the estimation of regression parameters based on descriptive statistics for interval-valued symbolic data is given. Next, for data with a nonlinear relationship that can be linearized, a method of nonlinear regression analysis for interval-valued symbolic data is put forward based on error propagation theory. Because ŷ_i and e_i are interval numbers, a rectangular residual plot is proposed for regression diagnostics. Then, indices indicating the goodness of the model are defined based on the Hausdorff distance: RMSE_H, which reflects absolute error, and U_H, which reflects relative error. Finally, an empirical study of the correlation between the CSI 300 index and the style indices of China International Trust and Investment Corporation (CITIC) is carried out with the proposed method.
     4. Solving multi-objective linear programming with interval-valued symbolic coefficients. First, a method for solving single-objective linear programming with interval-valued symbolic coefficients is discussed. Then, a Zimmermann fuzzy method for solving multi-objective linear programming with interval-valued symbolic coefficients is put forward. Finally, an empirical study of portfolio investment is made.
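The Hausdorff-distance indices mentioned in items 2 and 3 can be illustrated with a short Python sketch. For two closed intervals [a, b] and [c, d], the Hausdorff distance reduces to max(|a - c|, |b - d|). The abstract does not give the formula for RMSE_H, so the root-mean-square form below is an assumption made for illustration, and the function names `hausdorff` and `rmse_h` are hypothetical. U_H is omitted because its definition cannot be recovered from the abstract.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper


def hausdorff(x: Interval, y: Interval) -> float:
    """Hausdorff distance between two closed intervals on the real line."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def rmse_h(observed: Sequence[Interval], fitted: Sequence[Interval]) -> float:
    """Root mean square of the Hausdorff distances between observed and
    fitted intervals.

    Assumption: this is one plausible reading of the RMSE_H index named in
    the abstract, not a formula taken from the dissertation.
    """
    if not observed or len(observed) != len(fitted):
        raise ValueError("observed and fitted must be non-empty and of equal length")
    squared = [hausdorff(o, f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    observed = [(1.0, 3.0), (2.0, 5.0)]
    fitted = [(1.0, 4.0), (0.0, 5.0)]
    # The two Hausdorff distances are 1.0 and 2.0.
    print(rmse_h(observed, fitted))
```

A measure of this kind penalizes the worse-fitting endpoint of each interval, so a fitted interval that matches the lower bound but misses the upper bound is still counted as an error.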
