Users‘ Search Behaviour Analysis. Berlin, 9 November 2016. Introduction : our site. Introduction : Car categories. We have 9 car categories corresponding to Acriss codes. Acriss code example: CDMR. Car category. Here to insert the screenshot from the site. Goals.
Users' Search Behaviour Analysis. Berlin, 9 November 2016
Introduction: Car categories
We have 9 car categories corresponding to ACRISS codes.
ACRISS code example: CDMR, where the first letter (C) is the car category.
[Screenshot from the site to be inserted here]
Goals
• Analysis part: find any interesting patterns in users' search behaviour
• Modeling part: build a model able to predict the car category a user is most likely to choose, using knowledge from the analysis
• Usage of discovery: testing
Analysis part: find any interesting patterns in users' search behaviour.
Search flow steps
Step 1: Search widget. Location: Berlin Flughafen. Rental days: 2 days. Pickup date: 28 Nov, Friday.
Step 2: Offer select step.
Search flow steps
Step 3: Offer view step. Location: Berlin Flughafen. Duration: 2 days. Pickup date: 28 Nov, Friday. Car category: BMW 3er. Too expensive? Maybe not the airport, and a cheaper car model.
Search flow steps
Step 3: Offer view step. Location: Berlin Alexanderplatz. Duration: 2 days. Pickup date: 28 Nov, Friday. Car category: BMW 1er. Too expensive? I want a BMW, at least a 1er; maybe 1 rental day, just Saturday?
Search flow steps
Step 3: Offer view step. Location: Berlin Alexanderplatz. Duration: 1 day. Pickup date: 28 Nov, Friday. Car category: BMW 1er. Almost as expensive? Then back to 2 days, maybe the next weekend? And so on…
Analysis: Introduction
We will highlight approaches for the following questions; some of the findings will be used in a model.
• Which attributes affect the number of searches users make?
• Changes in which attributes affect the user's decision to go to step 3 (offer view page)?
• Is there any geographical effect, in terms of the visitor's region and the station where the car is searched?
• How does the rental cost change from search to search, and can we cluster users based on this?
• Can we cluster users on their activities, i.e. how much they interact with which attributes?
• Can we cluster users based on their patterns, i.e. the way they change different attributes?
Analysis: Part 0. Data preparation
Sample of the dataset for one visitor. Each line represents an action: either Offer select step, Offer view step, Customer details or Order (Search view = step 2, Offer view = step 3).
* We exclude corporate users.
** Data logs come from the Omniture web analytics system.
Analysis: Part 0. Data preparation
Converting car lines to ordinal variables:
M (Mini) -> 1
E (Economy) -> 2
C (Compact) -> 3
I (Intermediate) -> 4
S (Standard) -> 5
F (Fullsize) -> 6
L (Luxury) -> 7
P (Premium) -> 8
X (Special) -> 9
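The mapping above can be written down as a small lookup. This is only a sketch: the function name is hypothetical, and it assumes the car category is the first letter of the ACRISS code, as in the CDMR example.

```python
# Ordinal encoding of car categories, copied from the slide.
CAR_CATEGORY_RANK = {
    "M": 1,  # Mini
    "E": 2,  # Economy
    "C": 3,  # Compact
    "I": 4,  # Intermediate
    "S": 5,  # Standard
    "F": 6,  # Fullsize
    "L": 7,  # Luxury
    "P": 8,  # Premium
    "X": 9,  # Special
}


def car_category_to_ordinal(acriss_code: str) -> int:
    """Return the ordinal rank for the first letter of an ACRISS code.

    Hypothetical helper: raises KeyError for letters not in the table.
    """
    return CAR_CATEGORY_RANK[acriss_code[0].upper()]
```

For example, `car_category_to_ordinal("CDMR")` looks up the letter C (Compact).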
Analysis: Part 1
Let's start with a simple regression. The model would look like:
Search number ~ daystopickup + pickuphour + returnhour + rentaldays + weekday + city
Problem: repeated measures, as one user can make multiple searches in a row (marked green in the table below).
Analysis: Part 1
Our approach to solve the problem: generalized linear mixed models (GLMMs).
GLMMs are an extension of generalized linear models (e.g., logistic regression) that include both fixed and random effects (hence mixed models). The general form of the model (in matrix notation) is:
$y = X\beta + Zu + \varepsilon$
where the left part of the right side, $X\beta$, accounts for the fixed effects and the design matrix $Z$ on the right accounts for the random effects $u$.
Analysis: Part 1
Generalized linear mixed models: advantages
• We don't separately characterize subpopulations: instead the joint model is used.
• Helps with repeated measures.
• The response variables can come from distributions other than the Gaussian. In addition, rather than modeling the responses directly, some link function is often applied, such as a log link. In our case of searches we are dealing with a count outcome and will be using the Poisson distribution.
Analysis: Part 1
Updated model:
Search number ~ daystopickup + pickuphour + returnhour + rentaldays + weekday + (1|visitorID) + (1|StationID) + (1|city)
Random effects: by (1|visitorID) we mean an intercept varying by visitor ID; similarly for station IDs and city.
Search number is the count outcome (Poisson distribution).
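Written out, the updated model could look as follows. This is a sketch under assumptions: the index functions $v(i)$, $s(i)$, $c(i)$ (visitor, station and city of search record $i$) and the normality of the random intercepts are the usual GLMM conventions, not something stated on the slide.

```latex
\begin{aligned}
y_i \mid u &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \beta_0 + x_i^{\top}\beta
  + u^{\mathrm{visitor}}_{v(i)} + u^{\mathrm{station}}_{s(i)} + u^{\mathrm{city}}_{c(i)}, \\
u^{\mathrm{visitor}}_{v} &\sim \mathcal{N}(0, \sigma^2_{\mathrm{visitor}}), \quad
u^{\mathrm{station}}_{s} \sim \mathcal{N}(0, \sigma^2_{\mathrm{station}}), \quad
u^{\mathrm{city}}_{c} \sim \mathcal{N}(0, \sigma^2_{\mathrm{city}}).
\end{aligned}
```

Here $y_i$ is the search number, $x_i$ collects the fixed-effect predictors (days to pickup, pickup hour, return hour, rental days, weekday), and the three variances are what the random-effects output reports.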
Analysis: Part 1
The output for random effects
The point of interest here is the variance: in this case the variability in the intercept (on the log scale, since the model is Poisson with a log link) between visitors, between stations and between cities. The standard deviation is also displayed (simply the square root of the variance, not the standard error of the estimate of the variance).
Analysis: Part 1
The output for fixed effects
The estimates can be interpreted essentially as always. For example, a one-unit increase in return hour (i.e. closer to the evening) is associated with a 0.0745 unit increase in the log of the expected number of searches. Similarly, people who are looking for a car on Friday are expected to make considerably more searches than people who are interested in renting on Saturday.
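Because the Poisson model uses a log link, exponentiating a coefficient turns it into a multiplicative effect on the expected number of searches. The short sketch below does this for the return-hour estimate quoted above; the function name is hypothetical.

```python
import math

# Fixed-effect estimate for return hour, as quoted on the slide.
RETURN_HOUR_COEFFICIENT = 0.0745


def rate_ratio(coefficient: float, units: float = 1.0) -> float:
    """Multiplicative change in the expected count for a `units` increase."""
    return math.exp(coefficient * units)


ratio = rate_ratio(RETURN_HOUR_COEFFICIENT)
percent_change = (ratio - 1.0) * 100.0
print(f"rate ratio: {ratio:.4f}, change: {percent_change:.1f}%")
```

Under this reading, each additional hour in the return time corresponds to roughly 7.7% more expected searches, holding the other predictors and the random effects fixed.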
Changes in which attributes affect the user's decision to go to step 3 (offer view page)?
Analysis: Part 2
Model
• The outcome: is the user going to move on to step 3, the offer view page? (binomial type)
• Input variables:
  • Search number
  • Days to pickup change (delta)
  • Pickup time change (delta)
  • Return time change (delta)
  • Rental days change (delta)
  • Weekday of pickup date
Problem: repeated measures, i.e. multiple searches for every user
Analysis: Part 2
Generalized linear mixed model:
Offer Viewed 1/0 ~ searchNum + daysToPickupChange + pickupTimeChanged + returnTimeChanged + rentalDaysChange + weekday_pickupDate + (1|visitorID) + (1|city) + (1|StationID)
Output
Is there any geographical effect, in terms of the visitor's region and the station where the car is searched?
Analysis: Part 3
Dataset sample
Our belief: we consider the user's city and the station as location random effects and want to see how geographic location affects the variation in the number of searches users make. We include geography by modeling the number of searches with a Generalized Additive Model, which models the interaction between the longitude and latitude of either the city's center or the station.
Analysis: Part 3
Generalized Additive Model
Solution: the functions f may be functions with a specified parametric form (for example a polynomial, or a spline depending on the levels of a factor variable) or may be specified non-parametrically, or semi-parametrically, simply as 'smooth functions', to be estimated by non-parametric means.
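For reference, the standard form of a generalized additive model, in which the functions $f$ appear, is given below. The second line is an assumed instance for this analysis (a single two-dimensional smooth of longitude and latitude with a log link for the search count); the deck does not spell out the exact specification.

```latex
\begin{aligned}
g\bigl(\mathbb{E}[Y]\bigr) &= \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m), \\
\log \mathbb{E}[\text{search number}] &= \beta_0 + f(\text{longitude}, \text{latitude}).
\end{aligned}
```

Here $g$ is the link function and each $f_j$ is one of the parametric or smooth terms described above.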
Analysis: Part 3
Visualization of the results
Darker areas correspond to user regions with a stronger location effect on the number of searches. We can see from this graph that people from the southern areas are likely to make more searches, and for them we have a stronger geo effect in the model.
Analysis: Part 3
Visualization for the GAM model for stations: Search number ~ Station Latitude + Station Longitude
Darker areas correspond to the stations for which users make more searches. For example, this graph shows that people looking for car rentals in the north of the country are more affected by the geo variables and tend to make more searches than people looking for cars in the south.
How does the rental cost change from search to search and can we cluster users based on this?
Analysis: Part 4
Assumptions
• Users who make at least 4 searches on the DE site
• Car lines converted into ordinal variables (Part 0, data preparation)
• We apply a log transformation. It's a good way to make a series stationary in variance.
Analysis: Part 4
Log(Cost) over the searches
• Each box is a random user.
• A simple linear approximation is used here for log(cost).
• We can see that in many cases even a simple linear approximation describes the search patterns quite well.
Analysis: Part 4
1st Clustering Approach: linear approximation
Based on the angle of the slope we can cluster users into three categories:
• Users with a decreasing trend (orange)
• Users with an increasing trend (green)
• No trend (yellow)
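The slope-based grouping can be sketched as follows. This is a minimal illustration, not the code used for the deck: the function names and the `tolerance` threshold that separates "no trend" from a real trend are assumptions, since the slides do not say where the cut-off lies.

```python
import math


def slope_of_log_cost(costs):
    """Least-squares slope of log(cost) against the search index 1..n."""
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two searches to fit a slope")
    xs = range(1, n + 1)
    ys = [math.log(c) for c in costs]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def trend_cluster(costs, tolerance=0.02):
    """Assign a user to one of the three trend groups.

    `tolerance` is a hypothetical threshold on the slope of log(cost).
    """
    slope = slope_of_log_cost(costs)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "no trend"
```

A user whose rental cost goes 100, 120, 150, 200 over four searches would land in the increasing group, and the reverse sequence in the decreasing group.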
Analysis: Part 4
Problem: volatile searches
We still have a lot of cases where users jump forward and backward with extreme values, i.e. they are very volatile. Examples are shown below.
Possible solution: wavelet transforms
Analysis: Part 4
2nd Clustering Approach: wavelet transforms
The assumption: a user's searches form a non-stationary time series. We use the wavelet transform, and here is an example of the wavelet spectrum for a random user. We can see that the wavelet transform is good at detecting spikes. Other examples are on the next page.
Analysis: Part 4
Spectrum example
The wavelet power spectrum, using the Morlet wavelet. The x-axis is the wavelet location in time, i.e. over the number of searches. The y-axis is the wavelet period. The black contours are the 15% significance regions, using a red-noise background spectrum. The red areas indicate the periods with high activity, i.e. where users were very volatile in choosing the rental option.
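For readers unfamiliar with the terms, the textbook definitions behind such a plot are sketched below. The symbols are assumptions for illustration: $x_{n'}$ is taken to be the log rental cost at search $n'$, $\delta t$ is one search step, $s$ is the wavelet scale, and $\omega_0$ is the non-dimensional frequency (a value of 6 is a common default; the deck does not state which was used).

```latex
\begin{aligned}
\psi_0(\eta) &= \pi^{-1/4}\, e^{i \omega_0 \eta}\, e^{-\eta^2/2}
  && \text{(Morlet wavelet)} \\
W_n(s) &= \sum_{n'=0}^{N-1} x_{n'}\, \psi^{*}\!\left[\frac{(n'-n)\,\delta t}{s}\right]
  && \text{(continuous wavelet transform)} \\
P_n(s) &= \lvert W_n(s) \rvert^{2}
  && \text{(wavelet power spectrum)}
\end{aligned}
```

Here $\psi$ is the scaled and normalized version of $\psi_0$ and $\psi^{*}$ its complex conjugate; large values of $P_n(s)$ are the red areas in the plot.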
Analysis: Part 4
Other examples of wavelet spectra for random users
Analysis: Part 4
Results
• Here are the clustered plots for a random selection of visitors. We obtained 3 clusters using wavelet clustering: computing wavelet spectra (wavelet transforms), then dissimilarity and distance matrices. Note how volatile the plots in the third cluster (the right one) are. The wavelet transform works well for clustering users with volatile searches.
Can we cluster users based on their activities? I.e. how much they interact with which attributes?
Analysis: Part 5
Problem: clustering users based on how many times they interacted with attributes such as:
• Date interactions
• Station type interactions (three levels: city stations, Bahnhof stations and Flughafen stations)
• Car model interactions
• Rental days interactions
• Pickup time interactions
• Return time interactions
Our approach: K-means clustering algorithm
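To make the approach concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on per-user interaction counts. Everything specific in it is an assumption for illustration: the four toy users, the hand-picked starting centroids, and the use of two clusters instead of the six found in the analysis.

```python
import math


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical interaction counts per user, in the order:
# (date, station type, car model, rental days, pickup time, return time)
users = [
    (0, 0, 5, 0, 0, 0),  # mostly compares car models
    (1, 0, 4, 0, 0, 0),
    (0, 0, 1, 0, 3, 3),  # mostly changes pickup and return times
    (0, 0, 1, 1, 4, 3),
]
centroids, labels = kmeans(users, initial_centroids=[users[0], users[2]])
```

The resulting centroids are the average interaction counts per cluster, which is the kind of profile used to describe each cluster.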
Analysis: Part 5
Table based on clusters, with a brief description of each cluster
Cluster 1: Focus on car model comparison (4.46 interactions on average)
Cluster 2: Focus on changing rental time & car model (days to pick-up, pickup & return time)
Cluster 3: Focus on car model, rental days & days to pick-up
Cluster 4: Focus on car models and station location
Cluster 5: Focus on car models, days to pick-up and station location
Cluster 6: Strong focus on days to pick-up, car model & rental days
Can we cluster users based on their patterns? I.e. the way of changing different attributes.
Analysis: Part 6
First, let's see which factors explain the rental cost change the most. We build a mixed model with a Gaussian response. Random factors are visitor IDs and cities.
carRentalCost ~ daysToPickup_delta + carRentalDays_delta + carCategory_delta + pickupDate_time_delta + StationID_delta + (1|visitorID)
Analysis: Part 6
Factors correlation
Users are more likely to interact with different search parameters sequentially, rather than changing several parameters in the same search.
Upcoming problem: we can't directly measure the correlation between search parameters.
Analysis: Part 6
Our proposed solution: work with the delta values of each attribute on a visitor level. Each delta is the difference between a search and the previous one.
Example:
1st search: 8 rental days
2nd search: 8 rental days (delta = 0)
3rd search: 10 rental days (delta +2)
4th search: 9 rental days (delta -1)
5th search: 12 rental days (delta +3)
The mean of the delta vector (0, +2, -1, +3) is +1.
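The delta calculation in the example can be reproduced with a few lines. This is a sketch: the function names are hypothetical, and it assumes the deltas are taken between consecutive searches, which is what the numbers in the example imply.

```python
def consecutive_deltas(values):
    """Differences between each search and the previous one."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def mean_delta(values):
    """Mean of the consecutive deltas; 0.0 if there is only one search."""
    deltas = consecutive_deltas(values)
    if not deltas:
        return 0.0
    return sum(deltas) / len(deltas)


# Rental days over five searches, taken from the example on the slide.
rental_days = [8, 8, 10, 9, 12]
```

Applying `mean_delta` per attribute and per visitor yields one number per attribute, which can then be used as a clustering feature.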
Analysis: Part 6
K-means clustering: visualization (plot dimensions: rental days delta, days to pickup delta, car categories)
Build a model able to predict the car category a user is most likely to choose.
Model: input variables
• Change in rental costs (Analysis part 4)
• First search input:
  • Car category
  • Rental days
  • Days to pick-up
• Distance between the first and the last selected locations, since we assume the first one was the most suitable
• Number of searches
• Number of visits
• Attribute clusters, patterns (Analysis part 6)
• Interaction segments (Analysis part 5)
Model: algorithm
Our approach: extreme gradient boosting algorithm
Extreme gradient boosting is a good solution here, as it is a robust classifier that can perform well on a dataset on which minimal cleaning effort has been spent and can learn complex non-linear decision boundaries via boosting.
Gradient boosting combines many weak predictive models into a strong one, in the form of an ensemble of weak models.
Model: extreme gradient boosting
Tree ensemble
[Figure: example trees with yes/no splits on "Number of searches > 6", "Revenue class decreasing" and "First viewed category C", and leaf scores -1, -0.9, +0.9, +2, +0.1]
K is the number of trees, F is the set of all possible trees.
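The symbols K and F belong to the usual tree-ensemble prediction formula, sketched here in its standard form (the notation $\hat{y}_i$, $x_i$ and $f_k$ is assumed, as the formula itself is not preserved in this transcript):

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}
```

Each $f_k$ is one tree that maps the inputs $x_i$ of a user to a leaf score, and the prediction is the sum of the leaf scores over all $K$ trees.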
Model: extreme gradient boosting
Objective function: Training Loss + Regularization
We want to minimize this objective.
Commonly used training loss function: squared error between the actual and predicted values.
The loss function is differentiable.
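In symbols, the objective described above is usually written as follows. This is the standard form for gradient-boosted trees; the notation ($l$ for the training loss, $\Omega$ for the regularization term, $n$ for the number of training examples) is an assumption, since the slide's own formula is not preserved.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
l\bigl(y_i, \hat{y}_i\bigr) = \bigl(y_i - \hat{y}_i\bigr)^2 \ \text{for squared error}
```

The first sum measures how well the ensemble fits the training data, and the second penalizes the complexity of each tree.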
Model: extreme gradient boosting
Additive training
Using additive training, at each step we add the tree that minimizes our objective function, where g and h are the first and second derivatives of the loss function with respect to the previous prediction.
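As a worked illustration of g and h, the sketch below computes them for the squared error loss and derives the score of a single leaf from them. It assumes the common second-order formulation with an L2 penalty `lam` on leaf scores, where the best score for a leaf is -G / (H + lam) with G and H the sums of g and h over the examples in that leaf. The function names and the toy numbers are hypothetical.

```python
def squared_error_grad_hess(y_true, y_pred):
    """g and h for the loss (y - y_hat)^2, taken with respect to y_hat."""
    g = [2.0 * (p - y) for y, p in zip(y_true, y_pred)]
    h = [2.0 for _ in y_true]
    return g, h


def optimal_leaf_weight(g, h, lam=1.0):
    """Best score for one leaf holding these examples: -G / (H + lam)."""
    return -sum(g) / (sum(h) + lam)


# Toy example: three users fall into the same leaf of the first tree,
# and the previous prediction for each of them is 0.
y_true = [1.0, 2.0, 3.0]
y_prev = [0.0, 0.0, 0.0]
g, h = squared_error_grad_hess(y_true, y_prev)
leaf_score = optimal_leaf_weight(g, h, lam=0.0)
```

Without regularization (`lam=0.0`) the leaf score equals the mean residual of the examples in the leaf; a positive `lam` shrinks it towards zero.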