This study focuses on improving Click-Through Rate (CTR) prediction in online advertising through a transfer learning approach. Challenges arise with small, niche, or newly developed products lacking sufficient data for accurate CTR prediction. By leveraging transfer learning from large to small product data sets, the research presents an effective method for enhancing CTR prediction accuracy and optimizing revenue. Real ad data experiments demonstrate the feasibility and scalability of the proposed approach, highlighting its potential for application in various advertising scenarios. However, limitations include the reliance on gradient information for sampling and a need to address advertisement data sparsity. Future research directions include exploring applications beyond CTR prediction and addressing challenges in multiple-source transfer learning.
Improving Click-Through Rate Prediction Accuracy in Online Advertising by Transfer Learning
Yuhan Su1, Zhongming Jin2, Ying Chen2, Xinghai Sun2, Yaming Yang2, Fangzheng Qiao2, Fen Xia2, Wei Xu1
1Tsinghua University, 2Baidu, Inc.
Online ads revenue: three factors
• Revenue = PV * CTR * ACP
• PV: number of page views
• ACP: average click price
• CTR: click-through rate = #clicks / #views
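The revenue decomposition above can be sketched directly in code. This is a minimal illustration with made-up numbers; the function names are ours, not part of any ads system API.

```python
def ctr(clicks: int, views: int) -> float:
    """Click-through rate: fraction of views that led to a click."""
    return clicks / views

def revenue(page_views: int, ctr_value: float, avg_click_price: float) -> float:
    """Expected ads revenue = PV * CTR * ACP."""
    return page_views * ctr_value * avg_click_price

# Illustrative numbers: 1M views, 20k clicks (2% CTR), $0.50 per click.
r = revenue(1_000_000, ctr(20_000, 1_000_000), 0.50)  # 10000.0
```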
Challenge: small products lack data
• Small, niche-market products
• Newly developed products
• These lack data for a CTR prediction model
• Idea: use large-product data to help small products
• But different products have different distributions
Transferlearning:fromsourcetotarget Source Differentdistribution ? Differentdistribution LargeProduct Target Small Product Transferlearning
Our contributions
• An effective transfer learning approach for small-product CTR prediction
• An efficient MapReduce implementation
• Experiments on real ads data
Related work
• CTR prediction (models, features): prior work studies a single advertisement product; we handle multiple products
• Transfer learning (instance transfer, feature representation transfer, parameter transfer, relational knowledge transfer, deep transfer): few works target large ads data; we handle a much larger dataset
Baidu Alliance Ads system
(1) User surfs a website
(2) Website sends a request and related info to the ADX
(3) ADX sends info to products 1…n
(4) Products return bidding prices and materials
(5) ADX returns the ads
(6) User sees the ads
Our approach: framework
pre-train a target model;
loop for N times {
    sample source data;
    combined training;
    data reweighting;
}
output the ensemble model;
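The loop above can be sketched as runnable Python. Everything here is a stand-in for illustration only: `train` is a trivial majority-label "model", `sample_source` samples uniformly instead of by gradient, and `reweight` is a no-op; none of these names come from the paper.

```python
import random

def train(data):
    """Placeholder 'model': predicts the majority label of its training data."""
    majority = round(sum(y for _, y in data) / len(data))
    return lambda x: majority

def sample_source(source, k):
    """Stand-in for the paper's gradient-proportional sampling: uniform here."""
    return random.sample(source, k)

def reweight(data, model):
    """Stand-in for the reweighting step (no-op in this sketch)."""
    return data

def transfer_learn(target, source, n_iters=3, k=2):
    model = train(target)                      # pre-train a target model
    ensemble = []
    for _ in range(n_iters):                   # loop for N times
        sampled = sample_source(source, k)     # sample source data
        model = train(target + sampled)        # combined training
        target = reweight(target, model)       # data reweighting
        ensemble.append(model)
    # output the ensemble model: average the member predictions
    return lambda x: sum(m(x) for m in ensemble) / len(ensemble)

random.seed(0)
target = [((0.1,), 1), ((0.2,), 1), ((0.3,), 1)]
source = [((0.9,), 0), ((0.8,), 1), ((0.7,), 1), ((0.6,), 0)]
predict = transfer_learn(target, source)
p = predict((0.15,))  # a score in [0, 1]
```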
Our approach: intuition
Initialization → Sampling → Reweighting → Training, repeated over the target and source data
Our approach: sampling strategy
• Source data sampling: the sampling probability is proportional to the gradient on the trained model
• Intuition: the larger the gradient, the more the model needs this data instance
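Gradient-proportional sampling is a standard weighted-sampling recipe; a minimal sketch (not the paper's code, and with gradients reduced to scalar magnitudes for illustration):

```python
import random

def sample_by_gradient(instances, gradients, k, rng=random.Random(0)):
    """Draw k instances with probability proportional to |gradient|."""
    weights = [abs(g) for g in gradients]
    # random.choices performs sampling with replacement by weight
    return rng.choices(instances, weights=weights, k=k)

instances = ["a", "b", "c", "d"]
gradients = [0.01, 0.50, 0.30, 0.02]   # "b" and "c" are hardest for the model
picked = sample_by_gradient(instances, gradients, k=1000)
# Large-gradient instances dominate the sample.
```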
Our approach: data reweighting
• Correctly classified: weight does not change
• Target data misclassified: increase the data weight
• Source data misclassified: decrease the data weight
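The three-case rule above can be written as a one-line update. The multiplicative factors here are placeholder constants; TrAdaBoost-style methods derive them from the weighted training error each iteration.

```python
def reweight(weight, misclassified, is_target, up=2.0, down=0.5):
    """Correct: keep weight. Misclassified target: increase. Misclassified source: decrease."""
    if not misclassified:
        return weight
    return weight * (up if is_target else down)

w_target = reweight(1.0, misclassified=True, is_target=True)    # increased: 2.0
w_source = reweight(1.0, misclassified=True, is_target=False)   # decreased: 0.5
w_correct = reweight(1.0, misclassified=False, is_target=True)  # unchanged: 1.0
```

The asymmetry is the point: a misclassified target instance is genuinely hard and should matter more, while a misclassified source instance likely comes from a different distribution and should matter less.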
Our approach: model ensemble
• As TrAdaBoost proves, if the algorithm runs for N iterations, the average weighted training loss on source data from the ⌈N/2⌉-th iteration to the N-th iteration converges to zero
• The output value lies in [0, 1]
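Following the TrAdaBoost-style scheme described above, only the models from the ⌈N/2⌉-th iteration onward contribute, and averaging their scores keeps the output in [0, 1]. A sketch, with per-iteration models represented as plain scoring functions (an assumption for illustration):

```python
import math

def ensemble_predict(models, x):
    """Average the predictions of the models from iteration ⌈N/2⌉ to N."""
    n = len(models)
    start = math.ceil(n / 2) - 1           # ⌈N/2⌉-th iteration, 0-based index
    late = models[start:]
    return sum(m(x) for m in late) / len(late)   # score in [0, 1]

models = [lambda x: 0.0, lambda x: 1.0, lambda x: 1.0, lambda x: 0.8]
score = ensemble_predict(models, None)  # averages the last 3 models
```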
Experiment settings
• Environment: internal MapReduce-like machine learning framework, 100 computing nodes
• Metric: AUC (area under the ROC curve), in [0, 1]
• Datasets:
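For reference, AUC can be computed via the rank-sum (Mann–Whitney) formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative. A small self-contained sketch (an O(P·N) version for clarity, not the evaluation code used in the experiments):

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
a = auc(labels, scores)  # 3 of 4 pairs ordered correctly -> 0.75
```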
Experiment results
• Source and target have very different data distributions
Experiment results
• Directly combining source and target does not work
Experiment results
• Our method has better AUC
• Our method has less training time (TrAdaBoost 220 min vs. ours 70 min)
Parameter sensitivity: small N makes the ensemble not work
• N: total number of iterations, i.e., the number of ensemble models
• N too small: too few ensemble models, so the algorithm does not work well
Parameter sensitivity: large N makes the model overfit
• N too large: with too many ensemble models, the algorithm tends to overfit
Parameter sensitivity: zero alpha only uses target data
• α: sampling parameter
• α = 0: no source data is used; the algorithm only uses target data with model ensembling, becoming similar to AdaBoost
Parameter sensitivity: large alpha uses all source data
• α too large: the sampling probability exceeds 1, so every source data instance is sampled
Data size ratio (#target / #source): neither too small nor too large
• Fix the target size and vary the source size
• Performance peaks at a ratio of about 0.8
• The data size ratio must be tuned carefully instead of over-utilizing the source data
Promising directions and approach limitations
• Promising directions:
• Directly usable in average click price (ACP) prediction
• Other similar transfer learning scenarios (e.g., user risk prediction)
• Limitations:
• The current sampling strategy only uses gradient information
• It does not take the sparsity of advertisement data into consideration
• Efficient multiple-source transfer remains challenging
Summary
• An iterative transfer learning method for CTR prediction in online ads
• A MapReduce-like implementation makes the approach scalable
• Real-data experiments show effectiveness and promising directions
syhmartin@yeah.net
http://iiis.tsinghua.edu.cn/en/2014311424/
Thank you!