Modeling the Dynamics of Online Auctions Using a Functional Data Analytic Approach
Galit Shmueli (+ Wolfgang Jank), Dept of Decision & Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park. December 2004.
Overview
• Online auctions
  • Importance
  • How they work
  • “Classical” empirical research and new opportunities
  • Where are the statisticians?
• Using FDA (functional data analysis) for
  • Representing auctions
  • Studying auction dynamics
  • Comparing auctions
  • Exploring relations with other variables
• Current & Future directions
Online Auctions
• Central in the eMarketplace (eBay, Yahoo!, Amazon.com…)
• High accessibility, low transaction costs
• eBay has more than 27M active users (out of over 61M registered). At any moment there are ~10M items listed across more than 43,000 product categories, amounting to nearly $15 billion in gross merchandise sales (BusinessWeek, 2003)
Online Auctions
• The focus of much empirical research
• Players: IS researchers and economists
• We’re looking at this from a whole new perspective! (And lots of this can be applied to other eCommerce data.)
eBay.com
• By far the largest C2C auction site
• Buy/sell anything imaginable
• (Almost) anyone can buy/sell. You need a credit card to register (registration is free).
• Operates in many countries
How eBay auctions work: Selling an item
• Set some auction features (duration, opening price, …)
• Describe item
• Bells & whistles, plus more info on shipping, text description, payment options, etc.
How eBay auctions work: Bidding on an item
• Choose auction
• Proxy bidding:
  • Place max bid
  • eBay bids for you
  • Price increases by one increment
• Highest bidder pays 2nd highest bid
• Highest bid is not disclosed!
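The proxy-bidding rules on this slide can be sketched in a few lines. This is a minimal illustration, not eBay's actual implementation: the function name `proxy_auction`, the single fixed `increment`, and the tie rule (the earlier bidder keeps the lead) are assumptions made for the example.

```python
def proxy_auction(max_bids, opening_price, increment):
    """Replay proxy bids in arrival order.

    max_bids: list of (bidder, max_bid) pairs in arrival order.
    Returns a list of (current_leader, displayed_price) after each bid.
    Assumes one fixed increment (hypothetical simplification).
    """
    history = []
    leader, leader_max, price = None, None, opening_price
    for bidder, max_bid in max_bids:
        if leader is None:
            # First valid bid: price stays at the opening price.
            if max_bid >= opening_price:
                leader, leader_max = bidder, max_bid
        elif bidder == leader:
            # Leader raises own maximum: displayed price does not move.
            leader_max = max(leader_max, max_bid)
        elif max_bid > leader_max:
            # New leader pays the old maximum plus one increment,
            # but never more than the new leader's own maximum.
            price = min(max_bid, leader_max + increment)
            leader, leader_max = bidder, max_bid
        elif max_bid > price:
            # Losing bid pushes the price up; the leader's maximum stays hidden.
            price = min(leader_max, max_bid + increment)
        history.append((leader, price))
    return history


# Toy usage: C wins and pays just above the second-highest maximum (50),
# while C's own maximum of 60 is never displayed.
steps = proxy_auction([("A", 50), ("B", 30), ("C", 60)], opening_price=10, increment=1)
```

The displayed price only ever reveals the second-highest maximum, which is why the observed bid history understates the winner's willingness to pay.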
Bidding on an item – cont.
• Auction theory: bid your max and leave
• In practice: lots of sniping
• Sniping agents (wow – more data!)
Research Q’s Asked by Economists and IS Researchers
• Auction design mechanisms – mostly regressions on final price
  • Lucking-Reiley et al: Opening Bid, Number of Bidders, Number of Bids, Length of Auction, Reputation of Seller
  • Bapna et al: Bid increments
• Winner’s Curse – structural model + prior
  • Winner likely to over-pay (Bajari & Hortacsu)
• Bid Shilling – t-tests
  • Fraudulent “price-pushing” by the seller (Kauffman & Wood)
• Reputation and trust – regression, probit model
  • Seller rating effect on price or P(+ rating) (Wood et al; Ba & Pavlov)
• Bid Sniping – bid time CDF
  • Last-minute bidding to increase chances of success (Roth & Ockenfels)
  • But early bidding also prevalent
• Bidding strategies – k-means clustering
  • 3 strategies: participators, evaluators, opportunists (Bapna et al.)
No statisticians playing the game!
Why? Data Accessibility?
• eBay displays data for all auctions completed in the last 30 days.
• Millions of auctions (how do you sample?)
• Data are in HTML format!!!!
• Researchers use spiders (web agents)
  • People usually write their own code
  • eBay changes the rules and formats
  • eBay does NOT like spiders
  • You really need some programming expertise
• Commercial software (Andale, Hammertap)
  • Data directly from eBay
  • Limited (mostly aggregates)
  • Expensive, unreliable
Lots of opportunities there!
• No statistical framing (sample/population, type of data, etc.)
• No data visualization
• Mostly “traditional” statistical methods
• Ignoring data
• Sampling issues
• and more…
Unstated assumptions in current (static) approach
• An auction is an observation from a population of eBay auctions (US market, certain time-frame, etc.)
• The sample collected by the web spider is random and representative of the population
• Data structure: multivariate, with a fixed set of measurements on each auction
• Auctions are independent
Visualizing Online Auction Data
• Lots of empirical research, but no one is LOOKING at the data!
• Ordinary displays not always useful
Shmueli & Jank, “Visualizing online auctions”, JCGS, forthcoming
Enlightening Visualizations
• Detecting fraud (color = seller rating)
Advanced visualizations for interpreting modeling results
Surplus from eBay auctions (Bapna, Jank, & Shmueli, 2004)
• Data from a sniping agent give the highest bid
• What factors affect surplus?
• Advanced, interactive visualizations help us learn the multidimensional structure of the data and interpret the results of complicated models!
• Beats heavy statistical software like SAS
Back to current research
• Almost exclusively static
• Auction = snapshot at end
• Response: price, # bids, …
• But eBay does show complete bid histories!
Our new dynamic approach
• Auction = complete bid history
• Response:
  • Price over time
  • # of bidders over time
  • Average bidder rating over time…
• Interested in auction dynamics!
• Car/horse race
Data Structure: Challenges
• Each bid history = time series measured at unequally spaced time points on a closed interval
• Bidding is usually sparse at mid-auction and dense at auction end
• Different auctions
  • Different number of bids, placed at different times
  • Different durations
• Much variability across auctions
• We have LOTS of auctions!
• How to represent an auction?
Alternative representation: Curves!
• Functional Data Analysis is a modern statistical approach suitable for modeling objects (curves, 3D objects, etc.), not just scalars/vectors
• Made famous by the two monographs of Ramsay & Silverman
• http://ego.psych.mcgill.ca/misc/fda
Example of FDA: Handwriting
• Possible goal: detect a fraudulent signature
• Twenty traces of writing “fda” by the same person
• We can think of these traces as functions with X,Y coordinates
• Use FDA to explore and model similarities and differences between the 20 traces
FDA for bidding data
• Bids from a single auction are represented by a single entity
• Assume a very flexible underlying curve for all auctions
• Storage and computation: represent each auction by a set of basis functions and their coefficients
• Perform statistical analyses on
  • the coefficients, or
  • a grid taken on the curves
The bidding path (= the functional object)
• An auction is represented by its bidding path, a continuous function of $ (or another measure!) over time
• In practice, bidding paths are observed only at random discrete time points; these make up the observed bid histories
• We aim to reconstruct the unobservable continuous profile from the observed discrete bid history
Recovering the bidding path
• Use smoothing to recover the bidding path
• One useful smoother is the Penalized Smoothing Spline
  • Piecewise polynomial with smooth breakpoints
  • Penalize curvature by minimizing a criterion with two terms: fit + curvature
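The "fit" and "curvature" terms correspond to the two parts of the usual penalized smoothing spline criterion. As a sketch in standard notation (the symbols $y_i$, $t_i$, $f$, $n$ and $\lambda$ are assumed here and are not taken from the slide), with $y_i$ the bid observed at time $t_i$:

```latex
\mathrm{PENSS}_{\lambda}(f)
  \;=\;
  \underbrace{\sum_{i=1}^{n} \bigl(y_i - f(t_i)\bigr)^{2}}_{\text{fit}}
  \;+\;
  \lambda \, \underbrace{\int \bigl[f''(t)\bigr]^{2}\, dt}_{\text{curvature}}
```

A larger $\lambda$ puts more weight on the curvature term and gives a smoother, less wiggly bidding path.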
Smoothing Splines for recovering bidding paths
• Strengths
  • Good tradeoff between fit and local variability
  • Computationally cheap (+ numerically stable): well approximated by a finite set of B-spline basis functions
  • For smooth derivatives, penalize higher-order derivatives
• Challenges
  • Must determine λ and knots
  • Requires prior interpolation + smoothing
  • Curves not necessarily monotone
From bid histories to bidding paths: potential enhancements
• Use live bids rather than proxy bids
• Use monotone splines (non-decreasing)
• Integrate auction theory into curve requirements (knot positions, polynomial order, etc.)
Learning about Auction dynamics (the auction as a car race)
• 1st derivative = velocity, 2nd = acceleration, 3rd = ?
[Figure panels: Auction #1, Auction #2]
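To make the velocity and acceleration idea concrete, here is a minimal sketch that differentiates a curve sampled on a grid. It uses finite differences as a stand-in; with a fitted spline one would normally differentiate the basis expansion instead. The function name `derivative` and the toy price path are assumptions for illustration only.

```python
def derivative(grid, values):
    """Numerical derivative on a grid (needs at least two points).

    Central differences at interior points, one-sided differences at the ends.
    """
    n = len(grid)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / (grid[hi] - grid[lo]))
    return out


# Toy bidding path over a 7-day auction: price(t) = t^2 (made-up data).
days = [0, 1, 2, 3, 4, 5, 6, 7]
price = [t ** 2 for t in days]

velocity = derivative(days, price)         # price velocity, $ per day
acceleration = derivative(days, velocity)  # price acceleration, $ per day^2
```

Plotting `acceleration` against `velocity` gives a phase-plane view of the same auction.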
A sample of auctions
• 158 auctions for new Palm M515 PDAs
• 7-day auctions; price new: $250
Curve fitting: Sensitivity Analysis
• Smoothing splines + pre-smoothing; monotone smoothing splines
• Choice of knots hardly influential
• Smoothing parameter chosen ad hoc
Basis function expansions
• Splines: linear combination of B-splines
• Monotone: the ratio can be approximated by a linear combination of basis functions φ_j
• Fitted function:
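The formulas for this slide are not preserved in the text. A plausible reading, assuming the monotone smoother follows Ramsay's monotone smoothing formulation (all symbols below are assumed notation, not taken from the slide), is:

```latex
% Ordinary spline: linear combination of B-spline basis functions B_k
f(t) = \sum_{k} c_k \, B_k(t)

% Monotone case: the ratio w(t) is expanded in basis functions \phi_j
w(t) = \frac{f''(t)}{f'(t)} = \sum_{j} c_j \, \phi_j(t)

% Fitted monotone function, recovered from w by integrating twice
f(t) = \beta_0 + \beta_1 \int_{0}^{t} \exp\!\left\{ \int_{0}^{u} w(v)\, dv \right\} du
```

Because the exponential is always positive, $f$ is monotone for any coefficients $c_j$ (increasing when $\beta_1 > 0$), but $f$ is no longer linear in those coefficients.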
“Handling” the curves
• Two approaches
• Functional datum (fd):
  • Use curve coefficients directly in analysis
  • When: linear representation + linear operations
• Grid:
  • Use a set of discrete values from a grid taken on the curves
  • When: nonlinear operations and nonlinear representation (e.g. monotone splines)
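The grid approach can be sketched as follows: evaluate every auction's curve on one common time grid, then work with the resulting vectors. In this illustration linear interpolation of the raw bids stands in for evaluating a fitted spline, and the helper names `interp`, `to_grid`, and `mean_curve` and the toy data are all assumptions.

```python
from bisect import bisect_right


def interp(times, values, t):
    """Linear interpolation at t; end values are held flat outside the range.

    `times` must be strictly increasing.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)          # times[j-1] <= t < times[j]
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return values[j - 1] + w * (values[j] - values[j - 1])


def to_grid(times, values, grid):
    """Evaluate one auction's curve at every point of a common grid."""
    return [interp(times, values, t) for t in grid]


def mean_curve(curves):
    """Pointwise average of curves that share the same grid."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Two toy auctions with different numbers of bids at different times.
grid = [0, 3.5, 7]
auction_a = to_grid([0, 7], [10, 80], grid)
auction_b = to_grid([0, 3.5, 7], [20, 20, 100], grid)
average = mean_curve([auction_a, auction_b])
```

Once all auctions live on the same grid, unequal bid counts and bid times no longer matter, and ordinary multivariate tools apply pointwise.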
Exploring & Modeling the Auction Curves
• Summaries of curves
  • Average curve
  • 95% CI for curve
  • Bid paths and/or derivative curves
• Compare subsets of auctions
Exploratory analysis: Auction Clustering
• Using the bidding curve coefficients, we apply cluster analysis (k-medoids)
[Figure: two clusters, labeled Early bidding and Sniping]
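A minimal sketch of k-medoids on coefficient vectors is shown below. It uses the simple alternating scheme (assign to nearest medoid, then re-pick each cluster's medoid), which is one common variant and not necessarily the algorithm used in the talk. The function names, the squared Euclidean dissimilarity, and the toy vectors are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: sq_dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing.

    Returns (medoid_indices, labels). Medoids are always observed points.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:              # empty cluster: keep the old medoid
                new_medoids.append(old)
                continue
            # The medoid minimizes total dissimilarity to its cluster members.
            best = min(
                members,
                key=lambda i: sum(sq_dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy "coefficient vectors": two well-separated groups of auctions.
coefs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoids, labels = k_medoids(coefs, initial_medoids=[0, 1])
```

Unlike k-means, each cluster centre is an actual auction, so the medoid's bidding path can be plotted directly as the cluster's representative curve.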
Comparing cluster dynamics: Phase-plane plots
[Figure: phase-plane plots for the Sniping and Early bidding clusters]
Characterizing the 2 Profiles
Profile | Opening Bid | Seller Rating | Bidder Rating | # Bids
Early | 46.01 (7.94) | 908.16 (106.08) | 101.86 (10.42) | 7.04 (0.52)
Late | 22.31 (6.94) | 1171.54 (292.89) | 94.29 (13.29) | 11.13 (0.83)
• The two profiles differ with respect to Opening Bid
• Investigate this influence dynamically via Functional Regression
Functional PCA: When do auctions behave differently?
• When during the auction do bid curves deviate most/least?
• PCA + varimax
• 300 premium wristwatches
[Figure: principal components as perturbations of the mean]
Functional Regression Models
• Involve a curve as a response/predictor
• In our case, response = bidding path
• Predictors:
  • Static: opening price, seller rating, etc.
  • Dynamic: current # bidders, current avg bidder rating
• Grid: fit a regression model at each grid point and then interpolate the coefficients
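The grid approach to functional regression can be sketched with one static predictor: fit an ordinary least-squares line at every grid point and collect the coefficients into a parameter curve. The function names and the toy numbers below are assumptions for illustration; a real analysis would use several predictors and then smooth or interpolate the coefficient sequences.

```python
def ols_line(x, y):
    """Simple least-squares fit y = intercept + slope * x.

    Returns (intercept, slope); x must not be constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pointwise_regression(predictor, curves):
    """Fit one regression per grid point.

    predictor[i]: static covariate of auction i (e.g. its opening bid).
    curves[i][g]: value of auction i's curve at grid point g.
    Returns a list of (intercept, slope), one pair per grid point.
    """
    return [ols_line(predictor, column) for column in zip(*curves)]


# Toy data: 3 auctions, curves evaluated at 2 grid points.
opening_bid = [1, 2, 3]
curves = [[1, 8], [2, 6], [3, 4]]
param_curve = pointwise_regression(opening_bid, curves)
```

The sequence of slopes across grid points is the discretized version of the estimated parameter curve: it shows when during the auction the predictor matters most.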
Functional Regression of Bidding Path vs. Opening Bid
[Figure: estimated parameter curve]
Functional Regression of Bidding Acceleration vs. Opening Bid
[Figure: estimated parameter curve]
Interpretation: Opening Bid and Auction Energy
[Figure: two panels, each marking Open Bid against Value of Item; labels: Potential Market, Energy left in the auction]
Current & Future Directions
• Real-time forecasting of bidding paths of ongoing auctions
• Representing an auction in 2D (price + # bids over time)
• Modeling other aspects of auction data
  • Consumer surplus – with Ravi Bapna
  • Bid arrival process – with Ralph Russo (Iowa)
• New predictors: currency, category, and dynamic ones
• Effects of auction design changes
• eBay addiction
• Other eCommerce and IT applications
• Papers: http://www.smith.umd.edu/ceme/statistics
Extras
Smoothing Spline Parameters
• Order of the spline
  • Cubic spline: popular, provides a smooth fit; continuous 2nd derivative (curvature), no visible breakpoints
  • To obtain m smooth derivatives, use a spline of order m+2
• Knot locations (breakpoints)
  • The more knots, the more flexible (wiggly) the fit
  • Tradeoff between data fit and variability of the function
• Smoothness penalty parameter λ
  • λ → 0: fit approaches exact interpolation
  • λ → ∞: fit approaches linear regression
Alternatively: B-spline basis functions
• B-splines on a fixed grid of knots (s1 < s2 < … < sq) give a good approximation to most smooth functions
• Computational aspect: numerical stability, especially for irregularly distributed time points
• They form a set of natural cubic splines with limited support
• Expansion: a weighted sum of the basis functions, with one coefficient per basis function i
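The limited-support property and the basis expansion can be illustrated with the Cox-de Boor recursion for B-splines. This is a minimal sketch: the function names, the clamped knot vector, and the choice of interior knots at days 2 and 5 are assumptions made for the example, not values from the talk.

```python
def bspline_basis(i, k, knots, t):
    """Value at t of the i-th B-spline of order k (degree k - 1).

    Uses the Cox-de Boor recursion. Intervals are half-open, so t equal to
    the last knot evaluates to 0.
    """
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:                                   # skip zero-width spans
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, knots, t)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, knots, t)
    return left + right


def spline_value(coefs, k, knots, t):
    """Evaluate the expansion sum_i coefs[i] * B_i(t)."""
    return sum(c * bspline_basis(i, k, knots, t) for i, c in enumerate(coefs))


# Cubic B-splines (order 4) on a 7-day auction, interior knots at days 2 and 5.
ORDER = 4
knots = [0, 0, 0, 0, 2, 5, 7, 7, 7, 7]
n_basis = len(knots) - ORDER                     # 6 basis functions

# With all coefficients equal to 1 the basis functions sum to 1 on [0, 7).
total_at_3 = spline_value([1.0] * n_basis, ORDER, knots, 3.0)
# Limited support: the first basis function vanishes beyond day 2.
first_at_3 = bspline_basis(0, ORDER, knots, 3.0)
```

Because each basis function is nonzero on only a few adjacent knot intervals, the resulting design matrices are banded, which is where the numerical stability mentioned on the slide comes from.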