
Motivations

Benefits of InterSite Pre-Processing and Clustering Methods in E-Commerce Domain Sergiu Chelcea, Alzennyr Da Silva, Yves Lechevallier, Doru Tanasa, Brigitte Trousse AxIS Research Team INRIA Sophia Antipolis and Rocquencourt. Motivations. To show on the clickstream dataset proposed


Presentation Transcript


  1. Benefits of InterSite Pre-Processing and Clustering Methods in E-Commerce Domain. Sergiu Chelcea, Alzennyr Da Silva, Yves Lechevallier, Doru Tanasa, Brigitte Trousse. AxIS Research Team, INRIA Sophia Antipolis and Rocquencourt

  2. Motivations
     - To show, on the clickstream dataset proposed for the ECML/PKDD 2005 Discovery Challenge, the benefits of our InterSite pre-processing method, proposed by Tanasa in his PhD thesis (2005)
     - And the benefits of a new crossed clustering method, developed by Lechevallier & Verde (published 2003, 2004), applied to Web logs
     - 2 main viewpoints: user and web site load

  3. Plan
     1. Intersite Data Pre-Processing
        - introduction of the user's intersite visit (« Group of SessionIDs »)
        - first statistical intersite analysis
     2. Crossed Clustering Approach
        - confusion table with classes of time periods and classes of product types
        - analysis of the most used shop: shop 4
     3. Conclusions

  4. Data pre-processing. Initial data: Table 1. Format of page requests; Table 2. Number of requests per shop

  5. Data pre-processing: Tanasa & Trousse (IEEE Intelligent Systems, 2004); Tanasa's thesis (2005)

  6. Data pre-processing
     • Data fusion, data cleaning (Table 3. Transformed log lines)
     • Data structuring
       - SessionID: a single visit on each shop
       - Towards the notion of a user's intersite visit: we group the SessionIDs that belong to a single user (same IP) into a « Group of SessionIDs »
       - We compare the Referer with the URLs previously accessed (in a reasonable time window)
       - 522,410 SessionIDs grouped into 397,629 Groups, equivalent to a 23.88% reduction
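The grouping step on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Request` record, the `group_sessions` function, and the 30-minute `window` default are hypothetical (the slide only says "a reasonable time window"). A new SessionID joins an existing group when the same IP recently accessed the URL given as its Referer; otherwise it starts a new group.

```python
from dataclasses import dataclass


@dataclass
class Request:
    session_id: str
    ip: str
    time: float    # seconds, any common origin
    url: str
    referer: str   # empty string when absent


def group_sessions(requests, window=1800.0):
    """Map each SessionID to a group id (hypothetical heuristic).

    A SessionID seen for the first time joins the group that, from the
    same IP, accessed the request's Referer URL at most `window` seconds
    earlier; otherwise it opens a new group.
    """
    group_of = {}    # session_id -> group id
    recent = {}      # (ip, url) -> (time, group id) of the latest access
    next_group = 0
    for r in sorted(requests, key=lambda r: r.time):
        if r.session_id not in group_of:
            hit = recent.get((r.ip, r.referer))
            if hit is not None and r.time - hit[0] <= window:
                group_of[r.session_id] = hit[1]
            else:
                group_of[r.session_id] = next_group
                next_group += 1
        recent[(r.ip, r.url)] = (r.time, group_of[r.session_id])
    return group_of


def reduction_percent(n_sessions, n_groups):
    """Relative reduction obtained by merging sessions into groups."""
    return 100.0 * (n_sessions - n_groups) / n_sessions
```

With the counts on the slide, `reduction_percent(522410, 397629)` gives a value just under 23.89, in line with the reduction reported above.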

  7. Relational DB model Data summarisation

  8. Data pre-processing. Fig. 1. Visits per day and hour: (a) globally, (b) multi-shop
     • Low number of new visits on Saturdays and Sundays at lunchtime
     • High number of new visits on Tuesdays and Wednesdays
     • Same results for (a) and (b)

  9. Crossed Clustering Approach for Time Periods/Product Analysis. Method developed by Yves Lechevallier & Rosanna Verde (2003, 2004). Data: selection of ls pages in shop 4 (the most used)

  10. Crossed Clustering Approach for Time Periods/Product Analysis. Method developed by Yves Lechevallier & Rosanna Verde (2003, 2004). Relational DB model: a crossed table is easily added.
      - Row: an individual (weekday, one hour); 7 days × 24 hours = 168 individuals
      - Column: a multi-categorical variable representing the number of products requested by users in that time slice
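As a rough illustration of the crossed table described on this slide, the sketch below counts product requests per (weekday, hour) row. The function name `crossed_table`, the input shape (pairs of timestamp and product type), and the sample product names are assumptions made for the example; the original table is built in a relational database.

```python
from collections import Counter
from datetime import datetime


def crossed_table(requests, product_types):
    """Build a 168-row table of request counts.

    requests: iterable of (datetime, product_type) pairs (assumed shape).
    Returns {(weekday, hour): [count for each entry of product_types]},
    with weekday 0 = Monday ... 6 = Sunday, as in datetime.weekday().
    """
    counts = Counter((ts.weekday(), ts.hour, p) for ts, p in requests)
    return {
        (day, hour): [counts[(day, hour, p)] for p in product_types]
        for day in range(7)
        for hour in range(24)
    }


# Made-up requests, for illustration only.
sample = [
    (datetime(2004, 1, 23, 17, 30), "refrigerators"),
    (datetime(2004, 1, 23, 17, 45), "refrigerators"),
    (datetime(2004, 1, 20, 9, 5), "cameras"),
]
table = crossed_table(sample, ["refrigerators", "cameras"])
print(len(table))  # 7 * 24 rows, one per (weekday, hour) individual
```

Rows with no requests are kept with zero counts, so every one of the 168 individuals is present before clustering.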

  11. Crossed Clustering Approach for Time Periods/Product Analysis. Table 4. Quantity of products requested by weekday × hour and registered on shop 4

  12. Crossed Clustering Approach for Time Period/Product Analysis. Table 5. Confusion table (57.7%)

  13. Crossed Clustering Approach for Time Period/Product Analysis. Example of one surprising result: the class Product 5 is defined by a single product type, « Free standing combi refrigerators », consulted predominantly on Fridays from 17:00 to 20:00 (class Period 6); 57.7% of the requests for this product type fall in this period.
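A percentage such as the 57.7% above can be read off a confusion table by normalising each product-class column by its total. The sketch below assumes that reading (share of a product class's requests falling in each period class), which the slide does not spell out; the function name and the toy counts are made up.

```python
def column_shares(table):
    """Convert counts to percentages of each column total.

    table: list of rows (period classes), each a list of counts per
    product class. Columns whose total is zero yield 0.0 everywhere.
    """
    totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * value / total if total else 0.0
         for value, total in zip(row, totals)]
        for row in table
    ]


# Toy counts: 2 period classes (rows) x 2 product classes (columns).
shares = column_shares([[1, 3], [3, 1]])
```

Each column of `shares` sums to 100, so a large entry flags a period class that concentrates most of the requests for a product class.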

  14. Conclusions
      1. Intersite Data Pre-Processing
         - structuring into users' intersite visits (« Group of SessionIDs »)
         - first statistical intersite analysis
         - anomalies and recommendations for the dataset
      2. Crossed Clustering Approach
         - first application of such a method to time periods of Web logs and in the e-commerce domain
         - promising results

  15. Data pre-processing. Inconsistency problems:
      - table kategorie: repeated entries and different entries with the same ID
      - for some page types (dt, df), the given parameter actually represented a specific product, not the given product description (from the products table)
      - extra parameters equivalent to the given ones for some page types: e.g. for the ct page type, id is equivalent to the given c parameter
      - missing values (descriptions) in tables: 3 values in the product table and 64 in the category table
      - multiple-site SessionIDs: 13 cross-server visits had the same SessionID on the visited sites (up to 4 sites); the SessionID should change on each new site
      - multiple-IP SessionIDs: 3690 visits (SessionIDs) were made from more than one IP (anonymization proxies?)
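Checks of the kind listed on this slide could be expressed along the following lines. This is a hedged sketch: the row layout (dicts with `session_id`, `ip`, `shop` keys), the (id, description) pairs, and all function names are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict


def sessions_with_many(rows, field):
    """Return {session_id: values} for SessionIDs seen with more than
    one distinct value of `field` (for example "ip" or "shop")."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["session_id"]].add(row[field])
    return {sid: vals for sid, vals in seen.items() if len(vals) > 1}


def conflicting_ids(entries):
    """entries: iterable of (id, description) pairs.
    Return the ids that map to more than one distinct description."""
    seen = defaultdict(set)
    for key, description in entries:
        seen[key].add(description)
    return {key for key, vals in seen.items() if len(vals) > 1}


def missing_descriptions(entries):
    """Return the ids whose description is empty or None."""
    return [key for key, description in entries if not description]
```

Calling `sessions_with_many(rows, "ip")` would list the multiple-IP SessionIDs, and `sessions_with_many(rows, "shop")` the SessionIDs reused across sites.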
