Improving Click-Through Rate Prediction Accuracy in Online Advertising by Transfer Learning. Yuhan Su 1, Zhongming Jin 2, Ying Chen 2, Xinghai Sun 2, Yaming Yang 2, Fangzheng Qiao 2, Fen Xia 2, Wei Xu 1. 1 Tsinghua University, 2 Baidu, Inc.
Online ads revenue: three factors
Revenue = PV × CTR × ACP
• PV: number of page views
• CTR: click-through rate = #clicks / #views
• ACP: average click price
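The decomposition above is simple enough to sketch directly. The numbers below are made-up illustrative values, not figures from the talk:

```python
def ad_revenue(page_views: int, ctr: float, acp: float) -> float:
    """Revenue = PV * CTR * ACP."""
    return page_views * ctr * acp

# 1M page views, 2% click-through rate, $0.50 average click price
revenue = ad_revenue(1_000_000, 0.02, 0.50)
print(revenue)  # 10000.0
```

Improving CTR is attractive because it multiplies revenue directly while PV and ACP stay fixed.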
Challenge: small products lack data
• Small, niche-market products
• Newly developed products
• These lack data for a CTR prediction model
• Idea: use large-product data to help the small products
• But different products have different data distributions
Transfer learning: from source to target
• Source: large product
• Target: small product
• The two have different distributions, so transfer learning is used to bridge them
Our contributions
• An effective transfer learning approach for small-product CTR prediction
• An efficient MapReduce implementation
• Experiments on real ads data
Related work
• CTR prediction (models, features)
  • They: a single advertisement product
  • We: multiple products
• Transfer learning: instance transfer, feature-representation transfer, parameter transfer, relational-knowledge transfer, deep transfer
  • They: few works on large ads data
  • We: handle a much larger dataset
Baidu Alliance Ads system
(1) The user surfs the website
(2) The website sends the request and related info to the ADX
(3) The ADX sends info to products 1 … n
(4) The products return bidding prices and materials
(5) The ADX returns the ads
(6) The user sees the ads
Our approach: framework
pre-train a target model;
loop for N times {
  sample source data;
  combined training;
  data reweighting;
}
output the ensemble model;
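The loop above can be sketched as runnable Python. Everything concrete here is an illustrative assumption rather than the paper's actual setup: the stand-in "model" just predicts the mean label, sampling is weight-proportional (the paper samples proportionally to the gradient), and the constants are arbitrary.

```python
import random

def train(data):
    # Stand-in "model": predicts the mean label of its training data.
    labels = [y for _, y in data]
    mean_label = sum(labels) / len(labels)
    return lambda x: mean_label

def transfer_train(target, source, n_rounds=4, sample_size=2, seed=0):
    rng = random.Random(seed)
    weights = [1.0] * len(source)          # per-instance source weights
    models = [train(target)]               # pre-train a target model
    for _ in range(n_rounds):              # loop for N times
        # sample source data (weight-proportional here; the paper
        # uses gradient-proportional sampling instead)
        sampled = rng.choices(source, weights=weights, k=sample_size)
        model = train(target + sampled)    # combined training
        for i, (x, y) in enumerate(source):  # data reweighting
            if abs(model(x) - y) > 0.5:      # misclassified source instance
                weights[i] *= 0.5            # -> decrease its weight
        models.append(model)
    n = len(models)
    later = models[(n + 1) // 2 - 1:]      # keep the later iterations only
    return lambda x: sum(m(x) for m in later) / len(later)  # ensemble

target = [((0.0,), 1), ((1.0,), 0)]
source = [((0.0,), 1), ((1.0,), 1), ((0.5,), 0)]
predict = transfer_train(target, source)
```

Since each base model outputs a value in [0, 1], the averaged ensemble prediction stays in [0, 1] as well.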
Our approach: intuition
Four stages, repeated over the target and source data: initialization → sampling → reweighting → training
Our approach: sampling strategy
• Source data sampling: the sampling probability is proportional to the gradient on the trained model
• Intuition: the larger the gradient, the more the model needs this data instance
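A minimal sketch of gradient-proportional sampling, assuming a logistic model (the paper does not fix the model form here; the weights and data below are made up). For logistic loss the gradient with respect to the logit is (p − y), so |p − y| serves as the per-instance gradient magnitude:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sampling_probs(w, source):
    # |p - y| measures how much the trained model still "needs"
    # each source instance; normalize to get sampling probabilities.
    grads = [abs(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y)
             for x, y in source]
    total = sum(grads)
    return [g / total for g in grads]

source = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
w = [0.5, -0.5]                       # a pre-trained model's weights
probs = sampling_probs(w, source)
sample = random.choices(source, weights=probs, k=2)
```

Instances the model already fits well get near-zero gradient and are rarely drawn, so each round focuses on the source data the current model handles worst.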
Our approach: data reweighting
• Correctly classified: the weight does not change
• Target data misclassified: increase the data weight
• Source data misclassified: decrease the data weight
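The three rules fit in a few lines. The multiplicative factor `beta` is a hypothetical placeholder; the paper's actual update factors are not given on this slide:

```python
def reweight(weight: float, misclassified: bool, is_target: bool,
             beta: float = 2.0) -> float:
    # Correctly classified: keep the weight unchanged.
    if not misclassified:
        return weight
    # Misclassified target instance: increase its weight (focus on it).
    # Misclassified source instance: decrease its weight
    # (it is likely off-distribution for the target product).
    return weight * beta if is_target else weight / beta
```

This is the TrAdaBoost-style asymmetry: hard target examples get more attention, while hard source examples are treated as distribution mismatch and faded out.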
Our approach: model ensemble
• As TrAdaBoost proves, if the algorithm runs for N iterations, the average weighted training loss on source data from the ⌈N/2⌉-th iteration to the N-th iteration converges to zero
• The output value is in [0, 1]
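One simple way to realize this ensemble, sketched under the assumption that the per-iteration models are combined by plain averaging (the slide does not spell out the combination rule):

```python
import math

def ensemble_predict(models, x):
    # Keep only the models from iteration ceil(N/2) to N (1-indexed);
    # per the TrAdaBoost analysis these are the iterations whose
    # weighted source loss converges to zero.
    n = len(models)
    later = models[math.ceil(n / 2) - 1:]
    return sum(m(x) for m in later) / len(later)

models = [lambda x: 0.0, lambda x: 0.2, lambda x: 0.4, lambda x: 0.8]
score = ensemble_predict(models, None)
```

Averaging predictions that each lie in [0, 1] keeps the ensemble output in [0, 1], matching the second bullet.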
Experiment settings
• Environment: internal MapReduce-like machine learning framework, 100 computing nodes
• Metric: AUC (area under the ROC curve), in [0, 1]
• Datasets:
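For reference, AUC can be computed from its probabilistic definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. A plain O(P·N) pairwise version (the labels and scores below are toy values):

```python
def auc(labels, scores):
    # Pairwise form of AUC: fraction of (positive, negative) pairs
    # where the positive outranks the negative; ties count 0.5.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # 1.0
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why it is a natural scale-free metric for CTR models.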
Experiment results
• Source and target have very different data distributions
Experiment results
• Directly combining source and target data does not work
Experiment results
• Our method achieves better AUC
• Our method needs less training time (TrAdaBoost: 220 min vs. ours: 70 min)
Parameter sensitivity: a small N makes the ensemble not work
• N: total number of iterations, i.e., the number of models in the ensemble
• If N is too small, the ensemble contains too few models and the algorithm does not work well
Parameter sensitivity: a large N makes the model overfit
• If N is too large, the ensemble contains too many models and the algorithm tends to overfit
Parameter sensitivity: zero alpha uses only target data
• α: the sampling parameter
• If α is zero, no source data is used; the algorithm trains only on target data and ensembles the models, so it degenerates into something similar to AdaBoost
Parameter sensitivity: a large alpha samples every source instance
• If α is too large, the sampling probability becomes larger than 1, so every source data instance is sampled
Data size ratio (#target / #source): neither too small nor too large
• Fix the target size and vary the source size
• AUC peaks at a ratio of about 0.8
• The data size ratio must be tuned carefully rather than over-utilizing the source data
Promising directions and approach limitations
• Our approach shows promising directions
  • It can be directly used in average-click-price (ACP) prediction
  • It applies to other similar transfer learning scenarios (e.g., user risk prediction)
• Some limitations in our approach
  • The current sampling strategy only uses gradient information
  • It does not take the sparsity of advertisement data into consideration
  • Efficient multi-source transfer remains challenging
Summary
• An iterative transfer learning method for CTR prediction in online ads
• A MapReduce-like implementation makes the approach scalable
• Experiments on real data show its effectiveness and promising directions
syhmartin@yeah.net
http://iiis.tsinghua.edu.cn/en/2014311424/
Thank you!