Applications of Machine Learning in Internet Advertising

Presentation Transcript


  1. Applications of Machine Learning in Internet Advertising. Zhuang Baotong (庄宝童)

  2. Agenda • Introduction • Machine learning applications • Common utility • Advertiser • Publisher • User • Summary

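  3. Why do we need Internet advertising? • Traffic (users) is a key asset of an Internet company • Under the free-content model, traffic must be monetized to keep operations running • Advertising as a share of revenue: • Google: 95% (2012, http://investor.google.com/financial/tables.html) • Facebook: 83% (2011) • Baidu: ? • Alibaba: ? • Characteristics: results can be quantified and tracked, little involvement from operations and sales staff, low cost per impression • For Internet advertising companies it is an ideal "money-printing machine" business model (Wu Jun, 《浪潮之巅》)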

  4. What kind of ads do we need? Find the best match between a given user in a given context and a suitable advertisement -- Andrei Broder and Dr. Vanja, 2011

  5. Pick best ads [Diagram: Advertisers supply Ads and Bids to the Ad Network; a Statistical model estimates response rates (click, conversion, ad-view) for the User viewing the Publisher's Page; the Auction selects argmax f(bid, rate)]
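The final step in the slide 5 diagram, choosing the ad that maximizes f(bid, rate), can be sketched as below. This is a minimal illustration under assumptions, not the system the slides describe: `Ad`, `pick_best_ad` and the sample numbers are hypothetical, and the default f(bid, rate) = bid * rate (expected revenue per impression) is only one possible choice of f.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Ad:
    ad_id: str   # hypothetical identifier
    bid: float   # advertiser's bid, e.g. per click
    rate: float  # response rate predicted by the statistical model, e.g. CTR


def expected_revenue(bid: float, rate: float) -> float:
    # Assumed scoring function: expected revenue per impression.
    return bid * rate


def pick_best_ad(ads: Sequence[Ad],
                 f: Callable[[float, float], float] = expected_revenue) -> Ad:
    """Return the candidate ad that maximizes f(bid, rate)."""
    if not ads:
        raise ValueError("no candidate ads")
    return max(ads, key=lambda ad: f(ad.bid, ad.rate))


candidates = [
    Ad("a1", bid=2.0, rate=0.01),  # score 0.020
    Ad("a2", bid=1.0, rate=0.03),  # score 0.030
    Ad("a3", bid=0.5, rate=0.05),  # score 0.025
]
best = pick_best_ad(candidates)
print(best.ad_id)  # a2
```

Passing a different `f` (for example, the bid alone) changes which ad wins, which is exactly the choice slide 8 discusses.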

  6. Players in the ecosystem • Publisher’s utility: revenue, user engagement • Advertiser’s utility: ROI • User’s utility: relevance

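  7. Mechanism design • Contract pricing (futures market), charged by CPM or CPT (cost per time period) • Auction pricing (spot market) • GFP (generalized first price) • GSP (generalized second price) • VCG (Vickrey-Clarke-Groves) • Billing models • CPM (Cost per Mille, i.e. per thousand impressions): lowest risk for the publisher, e.g. brand advertising on Yahoo and Sina • CPC (Cost per Click): risk is shared between publisher and advertiser; most systems, such as Google AdWords and Baidu Phoenix Nest (百度凤巢), fall into this category • CPA (Cost per Action): lowest risk for the advertiser, e.g. Taobaoke (淘宝客)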
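A small sketch can make the difference between the three auction pricing rules concrete. It assumes a textbook position auction: bidders are ranked by bid alone, each slot has a known click-through rate that does not depend on the ad, prices are per click, and there is no reserve price. `position_auction` and the numbers are hypothetical.

```python
from typing import List, Tuple


def position_auction(bids: List[float], slot_ctrs: List[float],
                     rule: str) -> List[Tuple[int, float]]:
    """Rank bidders by bid and return (bidder index, price per click) per filled slot.

    slot_ctrs are assumed position click-through rates, highest first.
    No reserve price is applied.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    k = min(len(slot_ctrs), len(order))
    alpha = list(slot_ctrs[:k]) + [0.0]  # alpha[k] = 0: nothing below the last slot

    def bid_at(rank: int) -> float:
        # Bid of the bidder at this rank, or 0.0 if there is no such bidder.
        return bids[order[rank]] if rank < len(order) else 0.0

    results = []
    for s in range(k):
        if rule == "GFP":    # pay your own bid
            price = bid_at(s)
        elif rule == "GSP":  # pay the next-highest bid
            price = bid_at(s + 1)
        elif rule == "VCG":  # pay the externality imposed on the bidders below
            externality = sum((alpha[j] - alpha[j + 1]) * bid_at(j + 1)
                              for j in range(s, k))
            price = externality / alpha[s]
        else:
            raise ValueError(f"unknown rule: {rule}")
        results.append((order[s], price))
    return results


bids = [4.0, 2.0, 1.0]   # bidders 0, 1, 2
slot_ctrs = [0.1, 0.05]  # two ad slots
for rule in ("GFP", "GSP", "VCG"):
    print(rule, [(i, round(p, 4)) for i, p in position_auction(bids, slot_ctrs, rule)])
```

With these numbers GFP charges 4.0 and 2.0 per click, GSP 2.0 and 1.0, and VCG about 1.5 and 1.0. Under VCG each winner pays a weighted average of the bids below it, so for the same bids it never pays more than under GSP.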

  8. Ranking functions for CPC • Bid ranking: bid • Originated at GoTo.com (the predecessor of Overture, which was later acquired by Yahoo) • Revenue ranking: CTR * bid • Pioneered by Google • Core problem: CTR prediction
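To illustrate the two ranking functions, the sketch below ranks three hypothetical ads by bid and by CTR * bid, assuming the CTRs are already predicted. It also computes the textbook GSP price under revenue ranking, where each ad pays the smallest per-click price that would still keep it above the next ad. This is the commonly described form, not necessarily the exact rule of any system named on the slide; `ADS`, `rank_ads` and `gsp_prices` are hypothetical.

```python
from typing import Dict, List

# Hypothetical ads: bid is a per-click bid, ctr a predicted click-through rate.
ADS: Dict[str, Dict[str, float]] = {
    "A": {"bid": 2.0, "ctr": 0.01},  # CTR * bid = 0.020
    "B": {"bid": 1.0, "ctr": 0.03},  # CTR * bid = 0.030
    "C": {"bid": 0.8, "ctr": 0.02},  # CTR * bid = 0.016
}


def rank_ads(ads: Dict[str, Dict[str, float]], by: str) -> List[str]:
    """Order ad ids by bid ('bid' ranking) or by CTR * bid ('revenue' ranking)."""
    keys = {
        "bid": lambda a: ads[a]["bid"],
        "revenue": lambda a: ads[a]["ctr"] * ads[a]["bid"],
    }
    if by not in keys:
        raise ValueError(f"unknown ranking: {by}")
    return sorted(ads, key=keys[by], reverse=True)


def gsp_prices(ads: Dict[str, Dict[str, float]], ranking: List[str]) -> Dict[str, float]:
    """Textbook GSP under revenue ranking: the lowest per-click price that keeps
    each ad's CTR * price equal to the CTR * bid of the ad ranked just below it."""
    prices = {}
    for pos, ad in enumerate(ranking):
        if pos + 1 < len(ranking):
            nxt = ads[ranking[pos + 1]]
            prices[ad] = nxt["ctr"] * nxt["bid"] / ads[ad]["ctr"]
        else:
            prices[ad] = 0.0  # no competitor below; real systems apply a reserve price
    return prices


print(rank_ads(ADS, "bid"))      # ['A', 'B', 'C']
print(rank_ads(ADS, "revenue"))  # ['B', 'A', 'C']
print(gsp_prices(ADS, rank_ads(ADS, "revenue")))
```

Bid ranking puts A first, while revenue ranking puts B first because its higher CTR yields more expected revenue per impression (0.03 vs 0.02). Under the GSP rule B pays about 0.67 per click and A pays 1.6, both below their bids.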

  9. Model P(click | user, ad, context) • ad: creative, bid terms, landing page, campaign, advertiser, format (text/image/video), size, etc. • user: cookie, demographics, geo, behavioral, activity history • context: query, publisher, page content, session, time
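One common way to turn the (user, ad, context) attributes above into model input is a sparse binary vector built with the hashing trick. The slides do not say how features are encoded, so the sketch assumes hashing; `featurize`, `bucket`, `NUM_BUCKETS`, the field names and the single cross feature are all hypothetical.

```python
import hashlib
from typing import Dict, List

NUM_BUCKETS = 2 ** 20  # assumed size of the hashed feature space


def bucket(token: str) -> int:
    # A stable hash; Python's built-in hash() of str is randomized per process.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % NUM_BUCKETS


def featurize(user: Dict[str, str], ad: Dict[str, str],
              context: Dict[str, str]) -> List[int]:
    """Map (user, ad, context) attributes to sorted indices of active binary features."""
    tokens = []
    for group, fields in (("user", user), ("ad", ad), ("context", context)):
        for name, value in sorted(fields.items()):
            tokens.append(f"{group}.{name}={value}")
    # One illustrative cross feature between a context field and an ad field.
    if "query" in context and "creative" in ad:
        tokens.append(f"query={context['query']}&creative={ad['creative']}")
    return sorted({bucket(t) for t in tokens})


x = featurize(
    user={"cookie": "c123", "geo": "beijing"},
    ad={"advertiser": "adv1", "creative": "cr9", "format": "text"},
    context={"query": "flowers", "time": "evening"},
)
print(len(x), x[:3])
```

Hashing keeps memory bounded at the cost of occasional collisions between unrelated tokens; a larger `NUM_BUCKETS` makes collisions rarer.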

  10. Algorithms • Logistic regression + feature engineering (Google, Yahoo, Baidu, Facebook, etc.) • Microsoft (Bayesian probit regression) • Google: boosting (http://users.soe.ucsc.edu/~niejiazhong/slides/chandra.pdf) • Taobao (mixture of logistic regression) • Trends: big data + nonlinear models / feature learning
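Since logistic regression on sparse features is the first approach listed above, here is a minimal single-machine sketch trained with plain stochastic gradient descent. The optimizer, learning rate, L2 strength and toy data are assumptions; the production systems listed above train at much larger scale.

```python
import math
import random
from typing import Dict, List, Tuple

Example = Tuple[List[int], int]  # (indices of active binary features, click label 0/1)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Dict[int, float], b: float, x: List[int]) -> float:
    """P(click | x) for a sparse binary feature vector x."""
    return sigmoid(b + sum(w.get(i, 0.0) for i in x))


def train_lr(data: List[Example], epochs: int = 20, lr: float = 0.1,
             l2: float = 1e-4, seed: int = 0) -> Tuple[Dict[int, float], float]:
    """SGD on the L2-regularized log loss; only active features are updated,
    so the L2 penalty is applied lazily (an approximation)."""
    w: Dict[int, float] = {}
    b = 0.0
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = predict(w, b, x) - y  # derivative of log loss w.r.t. the logit
            b -= lr * g
            for i in x:
                wi = w.get(i, 0.0)
                w[i] = wi - lr * (g + l2 * wi)
    return w, b


# Toy data: feature 1 co-occurs with clicks, feature 2 mostly does not.
data = [([0, 1], 1)] * 30 + [([0, 1], 0)] * 10 + [([0, 2], 1)] * 5 + [([0, 2], 0)] * 35
w, b = train_lr(data)
print(round(predict(w, b, [0, 1]), 2), round(predict(w, b, [0, 2]), 2))
```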

  11. Challenges • Sparsity: use natural hierarchies or auto-generated hierarchies • Missing data • Bias: position, ad category, etc. • Dynamic/seasonal effects • Spam/noisy data
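For the sparsity challenge, a simple way to use a natural hierarchy (advertiser, campaign, ad) is to shrink each node's CTR toward its parent's estimate. The sketch assumes Beta-prior-style smoothing with a fixed number of pseudo-impressions; `hierarchical_ctr`, `strength` and the counts are hypothetical.

```python
from typing import List, Tuple


def hierarchical_ctr(levels: List[Tuple[int, int]], global_ctr: float,
                     strength: float = 100.0) -> float:
    """Smoothed CTR for the finest node of a hierarchy.

    levels: (clicks, impressions) from the coarsest node (e.g. advertiser)
    down to the finest (e.g. ad). Each level is shrunk toward its parent's
    estimate, which acts as a prior worth `strength` pseudo-impressions.
    """
    estimate = global_ctr
    for clicks, impressions in levels:
        estimate = (clicks + strength * estimate) / (impressions + strength)
    return estimate


path = [(300, 10_000),  # advertiser
        (40, 1_000),    # campaign
        (1, 5)]         # ad: raw CTR 1/5 = 0.2 from only 5 impressions
print(round(hierarchical_ctr(path, global_ctr=0.02), 4))  # 0.0467
```

The ad's raw CTR of 0.2 rests on 5 impressions, so the estimate stays close to its campaign's. A larger `strength` trades more bias for less variance.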

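  12. Features • Features: • Click feedback features (COEC, clicks over expected clicks) • Query features • Query-ad text matching features • Preprocessing: • Discretization / bucketing • Feature crosses • Hierarchical features to handle sparsity (bias-variance trade-off) • Feature smoothing and transformation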
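COEC normalizes click feedback for position bias: an ad's clicks are divided by the clicks an average ad would have received in the same positions. The sketch assumes per-position average CTRs as the expected-click model and an optional pseudo-count smoothing; the log format, `position_priors` and `coec` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Impression = Tuple[str, int, int]  # (ad_id, position, clicked 0/1)


def position_priors(log: List[Impression]) -> Dict[int, float]:
    """Average CTR of each position over all ads."""
    clicks: Dict[int, int] = defaultdict(int)
    shows: Dict[int, int] = defaultdict(int)
    for _, pos, clicked in log:
        clicks[pos] += clicked
        shows[pos] += 1
    return {pos: clicks[pos] / shows[pos] for pos in shows}


def coec(log: List[Impression], priors: Dict[int, float],
         smoothing: float = 0.0) -> Dict[str, float]:
    """Clicks over expected clicks per ad; `smoothing` pseudo-counts pull sparse ads toward 1."""
    clicks: Dict[str, float] = defaultdict(float)
    expected: Dict[str, float] = defaultdict(float)
    for ad_id, pos, clicked in log:
        clicks[ad_id] += clicked
        expected[ad_id] += priors[pos]
    return {ad: (clicks[ad] + smoothing) / (expected[ad] + smoothing) for ad in expected}


log = ([("y", 1, 1)] * 10 + [("y", 1, 0)] * 90 +   # y: 10% CTR at position 1
       [("z", 1, 1)] * 10 + [("z", 1, 0)] * 90 +
       [("x", 3, 1)] * 3 + [("x", 3, 0)] * 97 +    # x: 3% CTR at position 3
       [("w", 3, 1)] * 1 + [("w", 3, 0)] * 99)
scores = coec(log, position_priors(log))
print({ad: round(s, 2) for ad, s in sorted(scores.items())})
```

Ad x has a lower raw CTR than y (3% vs 10%) but a higher COEC (1.5 vs 1.0), because it was shown only at position 3, where clicks are expected to be rarer.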

  13. Training • Training set • Stratified sampling of positive and negative examples (the imbalanced-training problem) • Instances: 1B • Features: 10B • Distributed training • MPI (Baidu, Taobao) • MapReduce (Google)
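A common way to handle the imbalance is to keep all positives and only a fraction of the negatives, then correct the predicted probabilities afterward. The sketch assumes uniform negative downsampling and the standard correction p = p' / (p' + (1 - p') / w), where w is the rate at which negatives are kept; `downsample_negatives` and `recalibrate` are hypothetical names.

```python
import random
from typing import List, Tuple

Example = Tuple[List[int], int]  # (active feature indices, click label 0/1)


def downsample_negatives(examples: List[Example], neg_rate: float,
                         seed: int = 0) -> List[Example]:
    """Keep every positive and each negative with probability neg_rate."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rng.random() < neg_rate]


def recalibrate(p_sampled: float, neg_rate: float) -> float:
    """Map a probability learned on downsampled data back to the original scale."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_rate)


# 100 clicks among 10,000 impressions (true CTR 1%); keep 10% of the negatives.
examples = [([0], 1)] * 100 + [([0], 0)] * 9_900
sampled = downsample_negatives(examples, neg_rate=0.1)
p_sampled = sum(y for _, y in sampled) / len(sampled)  # roughly 0.01 / 0.109
print(round(p_sampled, 3), round(recalibrate(p_sampled, 0.1), 4))
```

Without the correction, a model trained on the sampled data would overestimate CTR roughly ninefold here, which matters whenever the prediction is multiplied by a bid.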

  14. Evaluation • Offline evaluation • MSE, MAE • AUC • Online A/B testing • Layered experiment platform (Google, "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation") • Hypothesis testing for normally or binomially distributed samples
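The offline AUC and the online significance test can both be computed in a few lines. The sketch uses the rank-sum (Mann-Whitney) form of AUC and a pooled two-proportion z-test for comparing CTRs between two A/B buckets; it assumes independent impressions, which real traffic only approximates. The function names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney) formula; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def two_proportion_z_test(clicks_a: int, imps_a: int,
                          clicks_b: int, imps_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of CTR_B vs CTR_A; returns (z, two-sided p-value)."""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
z, p = two_proportion_z_test(1_000, 100_000, 1_100, 100_000)
print(round(z, 2), round(p, 3))
```

In the example, 3 of the 4 positive-negative pairs are ordered correctly, giving AUC 0.75, and a CTR lift from 1.0% to 1.1% on 100,000 impressions per bucket gives z of about 2.19, significant at the 5% level.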

  15. Practice • Real-time computation and performance issues • Simple, effective candidate-set selection • Precise computation • Online learning
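The slide does not name an online learning algorithm. As one illustration, the sketch below implements per-coordinate FTRL-Proximal for logistic regression (described by McMahan et al. at Google in 2013), which learns from one example at a time and uses L1 regularization to keep many weights exactly zero. The hyperparameters, the class name and the synthetic stream are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List


class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression over sparse binary features."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 1.0, l2: float = 1.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[int, float] = defaultdict(float)  # accumulated adjusted gradients
        self.n: Dict[int, float] = defaultdict(float)  # accumulated squared gradients

    def weight(self, i: int) -> float:
        # Closed-form lazy weight; L1 keeps coordinates with |z| <= l1 exactly at zero.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x: List[int]) -> float:
        s = max(min(sum(self.weight(i) for i in x), 35.0), -35.0)  # avoid exp overflow
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x: List[int], y: int) -> float:
        """Predict on one example, then learn from its label; returns the prediction."""
        weights = {i: self.weight(i) for i in x}
        s = max(min(sum(weights.values()), 35.0), -35.0)
        p = 1.0 / (1.0 + math.exp(-s))
        g = p - y  # log-loss gradient for every active binary feature
        for i, w in weights.items():
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


model = FTRLProximal(alpha=0.5)
for t in range(2_000):
    if t % 2 == 0:
        x, y = [0, 1], int(t % 10 != 0)  # feature 1: 80% clicks
    else:
        x, y = [0, 2], int(t % 10 == 1)  # feature 2: 20% clicks
    model.update(x, y)
print(round(model.predict([0, 1]), 2), round(model.predict([0, 2]), 2))
```

Because each example is seen once and only its active coordinates are touched, the per-update cost stays proportional to the number of non-zero features, which suits the real-time setting on this slide.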

  16. Explore/Exploit [Figure: probability density of CTR for Ad 1 and Ad 2] • Ads with a low mean but high variance should be given a chance to be shown • E.g. consider 2 ads (same bids) • Goal: select the most popular • CTR1 ~ (mean = .01, var = .1), CTR2 ~ (mean = .05, var ≈ 0)

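  17. Common E&E algorithms • Upper confidence bound policy (UCB) • Mean + uncertainty estimate • mean + k * sd(estimator) • Thompson sampling • Draw a random sample from the posterior; well suited to Bayesian models • Problems • The ad set is huge, so exploration is too costly • Unlike the classic multi-armed bandit problem, the ad set is dynamic and several ads are selected at a time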
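Both policies can be sketched on a Beta posterior over each ad's CTR, which also reproduces the situation on slide 16. The uniform Beta(1, 1) prior, k = 2, the counts and the function names are assumptions. The sketch handles a fixed set of ads and picks one at a time, so it does not address the dynamic-set and multiple-selection problems listed above.

```python
import math
import random
from typing import List, Tuple

Stats = Tuple[int, int]  # (clicks, impressions) observed for one ad


def beta_posterior(clicks: int, imps: int,
                   a0: float = 1.0, b0: float = 1.0) -> Tuple[float, float]:
    """Beta(a, b) posterior over CTR under an assumed Beta(a0, b0) prior."""
    return a0 + clicks, b0 + imps - clicks


def ucb_choice(stats: List[Stats], k: float = 2.0) -> int:
    """Index of the ad with the largest posterior mean + k * posterior sd."""
    def score(s: Stats) -> float:
        a, b = beta_posterior(*s)
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        return mean + k * sd
    return max(range(len(stats)), key=lambda i: score(stats[i]))


def thompson_choice(stats: List[Stats], rng: random.Random) -> int:
    """Index of the ad whose CTR sample, drawn from its posterior, is largest."""
    return max(range(len(stats)), key=lambda i: rng.betavariate(*beta_posterior(*stats[i])))


# As on slide 16: ad 0 has a low mean but high uncertainty,
# ad 1 has a higher mean that is already estimated precisely.
stats = [(0, 40), (500, 10_000)]
print(ucb_choice(stats), ucb_choice(stats, k=0.0))  # 0 1

rng = random.Random(0)
picks = [thompson_choice(stats, rng) for _ in range(10_000)]
print(picks.count(0) / len(picks))  # share of exploratory picks of ad 0
```

With k = 2, UCB picks ad 0 because its uncertainty is large, while pure exploitation (k = 0) picks ad 1. Thompson sampling shows ad 0 only on draws where its posterior sample happens to be highest, roughly one draw in eight here.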

  18. Advertiser’s perspective • Keyword selection • Bid optimization • Smart pricing • Anti-fraud • Impression forecasting: time series • Smooth delivery: allocation algorithms

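  19. CVR prediction • Uses: • Smart pricing: external traffic varies widely in quality, and advertisers have neither the time nor the ability to set bids per publisher, so clicks must be priced automatically according to their value (Google, "smart pricing grows the pie") to protect the advertiser's ROI • DSP: real-time bidding • Ranking function for the CPA model: ctr * cvr * bid • Approach: similar to CTR prediction, but harder • Conversion data is hard to obtain and even sparser • Different advertisers define conversions differently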
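The CPA ranking function and the idea of pricing a click by its value can be sketched as follows. The ads are hypothetical, and `smart_priced_cpc` is an invented rule for illustration only (scale the CPC bid by the traffic's conversion rate relative to a reference rate, capped at the bid); the slide does not describe Google's actual smart pricing formula.

```python
from typing import Dict, List

# Hypothetical CPA ads: predicted CTR, predicted CVR (conversions per click), bid per conversion.
ADS: Dict[str, Dict[str, float]] = {
    "A": {"ctr": 0.02, "cvr": 0.05, "cpa_bid": 10.0},  # 0.02 * 0.05 * 10 = 0.010
    "B": {"ctr": 0.04, "cvr": 0.01, "cpa_bid": 20.0},  # 0.04 * 0.01 * 20 = 0.008
}


def cpa_rank(ads: Dict[str, Dict[str, float]]) -> List[str]:
    """Order ads by ctr * cvr * bid, the expected revenue per impression in CPA mode."""
    return sorted(ads, key=lambda a: ads[a]["ctr"] * ads[a]["cvr"] * ads[a]["cpa_bid"],
                  reverse=True)


def smart_priced_cpc(cpc_bid: float, cvr_here: float, cvr_reference: float) -> float:
    """Hypothetical smart-pricing rule: scale a CPC bid by this traffic's relative
    conversion rate, never charging more than the advertiser's bid."""
    if cvr_reference <= 0:
        raise ValueError("reference CVR must be positive")
    return cpc_bid * min(1.0, cvr_here / cvr_reference)


print(cpa_rank(ADS))                      # ['A', 'B']
print(smart_priced_cpc(1.0, 0.01, 0.04))  # 0.25
```

Ad A ranks first even though B has the higher CTR, because A's higher conversion rate makes its impressions worth more (0.010 vs 0.008).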

  20. User’s perspective • User fatigue • User privacy • Behavioral targeting / retargeting • Query intent • Low-quality ad detection (Google, "Detecting Adversarial Advertisements in the Wild")

  21. Publisher’s perspective • Revenue • User engagement

  22. Thank you
