Internship Report, Fujitsu R&D Center Beijing — Qiu Cheng (邱诚)
Report Topics • Work at Fujitsu • Introduction to the Auto-Regressive and Moving Average (ARMA) model • Introduction to RHadoop
Work at Fujitsu • Studied data-selection methods: • TBSC • the averaging method • indicative segments • Optimized the ARMA and SVR models • Dynamically combined the ARMA and SVR models
The Averaging Method • Basic steps:
• Find the five days whose Euclidean distance to the prediction day over hours 1–9 is smallest;
• Expand this set of five days using the Euclidean distance over hours 10–20;
• Cluster all days obtained in the first two steps into two groups with k-means;
• Take the closest same weekday before the prediction day as the reference day, compute its Euclidean distance to the two cluster centers, and keep the cluster with the smaller distance;
• Average the days in the chosen cluster, hour by hour, to produce the prediction.
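The steps above can be sketched in Python as a standalone toy (the report's own tooling is R; the helpers `euclid` and `two_means` are hypothetical, and the "expand" step is one plausible reading of the slide, namely admitting each selected day's nearest neighbour over hours 10–20):

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length hourly profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_means(points, iters=20):
    """Minimal 2-cluster k-means; returns two (center, members) pairs."""
    c0, c1 = points[0], points[-1]
    g0, g1 = [], []
    for _ in range(iters):
        g0 = [p for p in points if euclid(p, c0) <= euclid(p, c1)]
        g1 = [p for p in points if euclid(p, c0) > euclid(p, c1)]
        if g0:
            c0 = [sum(v) / len(g0) for v in zip(*g0)]
        if g1:
            c1 = [sum(v) / len(g1) for v in zip(*g1)]
    return (c0, g0), (c1, g1)

def predict_by_averaging(history, morning, reference):
    """history: {day_name: 24-hour profile}; morning: hours 1-9 of the
    prediction day; reference: profile of the closest same weekday."""
    # Step 1: the five days closest to the prediction day over hours 1-9.
    nearest = sorted(history, key=lambda d: euclid(history[d][1:10], morning))[:5]
    # Step 2 (interpretation): also admit each selected day's closest
    # neighbour measured over hours 10-20.
    picked = set(nearest)
    for d in nearest:
        others = [o for o in history if o != d]
        picked.add(min(others,
                       key=lambda o: euclid(history[o][10:21], history[d][10:21])))
    # Step 3: cluster the selected days into two groups.
    profiles = [history[d] for d in sorted(picked)]
    (c0, g0), (c1, g1) = two_means(profiles)
    # Step 4: keep the cluster whose center is closer to the reference day.
    if not g1 or (g0 and euclid(c0, reference) <= euclid(c1, reference)):
        chosen = g0
    else:
        chosen = g1
    # Step 5: the hour-wise mean of the chosen cluster is the prediction.
    return [sum(v) / len(chosen) for v in zip(*chosen)]
```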
Introduction to the ARMA Model • How the ARMA model works • Optimizing the ARMA model • Using ARMA models in R
ARMA Basics • Auto-Regressive model • Moving Average model
[Figure: a time series X0 through X12, illustrating the dependence of each value on its predecessors]
ARMA Basics • The autoregressive (AR) part models the relationship between the current value and its past values; • The moving-average (MA) part models the accumulated errors of the autoregressive part; • The ARMA model combines the autoregressive prediction with these accumulated errors.
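A small simulation makes the combination concrete. Assuming the standard ARMA(p, q) recursion X_t = c + Σ φ_i·X_{t−i} + ε_t + Σ θ_j·ε_{t−j} (an illustrative Python sketch, not code from the report):

```python
import random

def simulate_arma(phi, theta, n, c=0.0, sigma=1.0, seed=0):
    """Simulate an ARMA(p, q) series:
        X_t = c + sum(phi[i] * X_{t-1-i}) + e_t + sum(theta[j] * e_{t-1-j})
    The AR sum ties X_t to its own history; the MA sum feeds the recent
    random shocks e_{t-j} (the accumulated errors) back into the series."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x, e = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x.append(c + ar + e_t + ma)
        e.append(e_t)
    return x
```

Setting all φ to zero leaves a pure MA process, and setting all θ to zero leaves a pure AR process, which is exactly the decomposition described above.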
Optimizing the ARMA Model • Akaike's Information Criterion (AIC) • AIC, Bias Corrected (AICc) • Bayesian Information Criterion (BIC) • All three criteria apply to ARMA models fitted by maximum-likelihood estimation.
The AIC Metric • AIC = 2k − 2 ln(L̂), where L̂ is the maximized likelihood of the model and k is the number of model parameters; a smaller AIC indicates a better trade-off between goodness of fit and model complexity.
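All three criteria are simple functions of the maximized log-likelihood; a Python sketch using the standard textbook formulas (not code from the report):

```python
import math

def aic(log_lik, k):
    """AIC = 2k - 2 ln(L-hat), with log_lik = ln(L-hat)."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """AICc: AIC with a small-sample bias correction for n observations."""
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(log_lik, k, n):
    """BIC = k ln(n) - 2 ln(L-hat); penalizes parameters more heavily
    than AIC once n > e^2 (about 8 observations)."""
    return k * math.log(n) - 2 * log_lik
```

When comparing candidate ARMA orders, the model with the smallest criterion value is preferred; AICc converges to AIC as n grows.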
Using ARMA Models in R • arima • auto.arima
The arima function

arima(x, order = c(0, 0, 0),
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL, include.mean = TRUE,
      transform.pars = TRUE, fixed = NULL, init = NULL,
      method = c("CSS-ML", "ML", "CSS"), n.cond,
      optim.method = "BFGS", optim.control = list(), kappa = 1e6)
The auto.arima function

auto.arima(x, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2,
           max.order = 5, start.p = 2, start.q = 2, start.P = 1, start.Q = 1,
           stationary = FALSE, ic = c("aicc", "aic", "bic"),
           stepwise = TRUE, trace = FALSE,
           approximation = (length(x) > 100 | frequency(x) > 12),
           xreg = NULL, test = c("kpss", "adf", "pp"),
           seasonal.test = c("ocsb", "ch"), allowdrift = TRUE,
           lambda = NULL, parallel = FALSE, num.cores = NULL)
Why RHadoop? • Big data sets call for more than counts and averages • Analyzing all of the data can reveal insights that sampling or subsets cannot
What RHadoop Is For • The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R, and to run R within the nodes of the Hadoop cluster; essentially, it transforms Hadoop into a massively parallel statistical computing cluster based on R.
rhdfs • Manipulate HDFS directly from R • Mimic as much of the HDFS Java API as possible
rmr • Designed to be the simplest and most elegant way to write MapReduce programs • Gives the R programmer the tools necessary to perform data analysis in a way that is “R” like • Provides an abstraction layer to hide the implementation details
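The map/shuffle/reduce pattern that rmr expresses in R can be sketched with a word-count example (an illustrative single-process Python analogue, not rmr code; in a real cluster the framework distributes the map and reduce phases and performs the shuffle between them):

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: sum the counts emitted for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases end to end over an iterable of lines."""
    pairs = (kv for line in lines for kv in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(pairs).items())
```

In rmr the programmer supplies only the map and reduce functions as R closures; the abstraction layer mentioned above hides the shuffle and the job plumbing.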