A Study of the Box-Cox and Johnson Methods in Oil and Gas Well Data Preprocessing
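  • English title: A study of the methods of the Box-Cox transformation and Johnson transformation in data preprocessing of the oil and gas well data
  • Authors: 王跃; 张鹏新; 党跃武
  • Authors (English): WANG Yue; ZHANG Peng-xin; DANG Yue-wu; School of Public Administration, Sichuan University; Tarim Oilfield, CNPC
  • Keywords: data preprocessing; normality transformation; Box-Cox transformation; Johnson transformation
  • English keywords: data preprocessing; normalization; Box-Cox transformation; Johnson transformation
  • Chinese journal title: YNMZ
  • English journal title: Journal of Yunnan Minzu University (Natural Sciences Edition)
  • Institutions: School of Public Administration, Sichuan University; Tarim Oilfield Company, CNPC
  • Publication date: 2018-07-10 13:58
  • Publisher: Journal of Yunnan Minzu University (Natural Sciences Edition)
  • Year: 2018
  • Issue: v.27; No.110
  • Language: Chinese
  • Page: YNMZ201804015
  • Number of pages: 8
  • CN: 04
  • ISSN: 53-1192/N
  • Classification number: 80-87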
Abstract
The Box-Cox and Johnson methods are the two most commonly used normality transformations in data preprocessing. Studying how they differ and which kinds of data each one suits provides a reference for data preprocessing and a theoretical basis for developing methods with better applicability and validity. This paper first analyzes the two methods mathematically, then verifies their accuracy and the data they suit using several random samples with different distributional characteristics. The study reaches the following conclusions: first, the Box-Cox method transforms the original data in one direction, whereas the Johnson method is a two-directional, symmetric transformation; second, the Box-Cox method changes skewness markedly, while the Johnson method changes kurtosis markedly; third, the Johnson method is more complicated to apply and gives better overall results; finally, the Johnson method suits the normality transformation of variables without pronounced skewness but performs poorly on clearly skewed variables, whereas the Box-Cox method can actually increase skewness when applied to variables without pronounced skewness and outperforms the Johnson method on clearly skewed variables.
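The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure or data. It assumes NumPy and SciPy, uses synthetic samples, and takes the Johnson SU form z = gamma + delta * asinh((x - xi) / lam) as one representative member of the Johnson family. The helper name johnson_su and the parameter values are hypothetical and treated as known; in practice they would be estimated from the data.

```python
import numpy as np
from scipy import stats


def johnson_su(x, gamma, delta, xi, lam):
    """Johnson SU transform: z = gamma + delta * asinh((x - xi) / lam)."""
    return gamma + delta * np.arcsinh((x - xi) / lam)


rng = np.random.default_rng(0)

# Clearly right-skewed, strictly positive sample (Box-Cox requires x > 0).
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=2000)
# With no lambda supplied, SciPy picks lambda by maximum likelihood.
boxcox_values, boxcox_lambda = stats.boxcox(skewed)

# Roughly symmetric, heavy-tailed sample built by inverting the SU transform.
# Assumption: the SU parameters are known here instead of being fitted.
gamma, delta, xi, lam = 0.0, 1.0, 0.0, 1.0
z = rng.standard_normal(2000)
heavy = xi + lam * np.sinh((z - gamma) / delta)
johnson_values = johnson_su(heavy, gamma, delta, xi, lam)

# Compare skewness and excess kurtosis before and after each transform.
print("Box-Cox lambda:", boxcox_lambda)
print("skewness before/after Box-Cox:",
      stats.skew(skewed), stats.skew(boxcox_values))
print("excess kurtosis before/after Johnson SU:",
      stats.kurtosis(heavy), stats.kurtosis(johnson_values))
```

The Box-Cox step acts on the skewed sample, and the Johnson SU step acts on the heavy-tailed sample, whose tails it pulls in symmetrically on both sides. Because the SU parameters are the ones used to generate the sample, the second step recovers the normal draws exactly; with fitted parameters the result would only be approximately normal.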
