Improving the Web Design
Mining Web Data at Cityjob.com
Hing-Po Lo, Linda Lu, Miriam Chan
Department of Management Sciences
City University of Hong Kong, Hong Kong
mshplo@cityu.edu.hk
I. Introduction
• Customer Relationship Management
• Data Mining
• The Web
A. The Web
• More than 200 million surfers per day
• Huge volume of data captured from the Web
• Only 2% of web data is analyzed
B. Customer Relationship Management
• DOT COM companies:
  • work in an “information-intensive” and “ultra-competitive” mode
  • require the use of CRM to establish a personalized relationship with their customers
C. Data Mining Tools
• Many software and web vendors offer tools to explore and mine web log files.
• Most of these tools study the “clickstream” at the “session level”. To conduct CRM, one has to analyze the web log file at the “customer level”.
• Tailor-made software using SAS macros and Enterprise Miner has been developed for this purpose.
Cityjob.com
• It offers information on almost all posts available from major companies in HK.
• It receives several thousand visitors per day on average.
II. The Data
• Study Period: 11 December 2000 to 4 February 2001
• Three types of data files:
  • Web log files
  • Subscribers’ profiles
  • Jobs’ profiles
Web log files
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2000-12-11 00:00:00
#Fields: date time c-ip cs-username s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken cs(Cookie)
2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 GET /default.asp - 200 0 15838 645 1297 RMID=d0dfa603398e0850;+CityjobID=LASTUPD=20001130&LOGIN=sloo;+IND=000;+OPN=000;+CTY=091;+RDB=c80200000000000000020028311b1b0000000000000000;+ASPSESSIO
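Reading a log in this format amounts to taking the column names from the #Fields directive and splitting each record on spaces. The following is a minimal sketch in Python, not the SAS macro used in the study. The function names are hypothetical, the cookie in the sample record is shortened for illustration, and the layout of the CityjobID cookie (an "&"-separated list containing LOGIN=...) is an assumption read off the single record shown above.

```python
def parse_w3c_log(lines):
    """Parse W3C extended log lines into a list of dicts keyed by field name."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names for the records that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date)
        values = line.split(" ", len(fields) - 1)
        if len(values) != len(fields):
            continue  # skip malformed or truncated records
        records.append(dict(zip(fields, values)))
    return records


def login_from_cookie(cookie):
    """Return the LOGIN value inside the CityjobID cookie, or None if absent.

    Assumes cookies are separated by ';+' (the server logs spaces as '+').
    """
    for part in cookie.split(";+"):
        name, _, value = part.partition("=")
        if name == "CityjobID":
            for pair in value.split("&"):
                key, _, val = pair.partition("=")
                if key == "LOGIN":
                    return val
    return None


sample_lines = [
    "#Software: Microsoft Internet Information Server 4.0",
    "#Version: 1.0",
    "#Date: 2000-12-11 00:00:00",
    "#Fields: date time c-ip cs-username s-sitename s-computername s-ip "
    "cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes "
    "cs-bytes time-taken cs(Cookie)",
    # Cookie shortened for the example.
    "2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 "
    "GET /default.asp - 200 0 15838 645 1297 "
    "RMID=d0dfa603398e0850;+CityjobID=LASTUPD=20001130&LOGIN=sloo;+IND=000",
]

records = parse_w3c_log(sample_lines)
login = login_from_cookie(records[0]["cs(Cookie)"])
print(records[0]["cs-uri-stem"], login)
```

Extracting the login name from the cookie is what allows the log to be analyzed at the “customer level” rather than only at the “session level”.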
2. Subscribers’ profiles Cont’d
Web log files Jobs’ files Subscribers’ files
SAS macros were written to perform the following tasks:
A: Reading the web log files
B: Cleaning the data files
C: Creating new variables
D: Merging the data files
E: Preparing different SAS data files
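Tasks B to D can be illustrated with a small sketch. It is written in Python rather than SAS, and it is only an illustration under stated assumptions: the slides do not give the actual cleaning rules or derived variables, so the rule "keep requests with status 200", the derived variable "hour", and all field names and sample rows below are hypothetical.

```python
def prepare(log_rows, subscribers):
    """Clean log rows, derive a new variable, and merge in subscriber profiles."""
    profiles = {s["user_id"]: s for s in subscribers}
    prepared = []
    for row in log_rows:
        # B: cleaning (assumed rule: keep only successful requests)
        if row["status"] != "200":
            continue
        # D: merging needs a matching subscriber profile
        if row["user_id"] not in profiles:
            continue
        merged = dict(row)
        # C: new variable (assumed: hour of day of the request)
        merged["hour"] = int(row["time"][:2])
        merged.update(profiles[row["user_id"]])
        prepared.append(merged)
    return prepared


# Hypothetical sample data.
log_rows = [
    {"user_id": "u1", "time": "09:15:00", "status": "200", "uri": "/job.asp"},
    {"user_id": "u1", "time": "09:16:10", "status": "404", "uri": "/old.asp"},
    {"user_id": "u9", "time": "23:05:00", "status": "200", "uri": "/job.asp"},
]
subscribers = [{"user_id": "u1", "gender": "F", "education": "UC"}]

prepared = prepare(log_rows, subscribers)
print(prepared)
```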
Useful Summary Information
A. Subscribers’ profiles
B. Jobs’ profiles
C. Web log files
D. Web log files + User ID
E. Web log files + Job ID
Collaborative Filtering
1. By Association Rules
• Whenever a visitor enquires about a particular job, we can “cross sell” similar jobs by recommending other jobs that have the highest association with the original one.
• The association is based on the click history of all visitors to the Web site.
For example, if
• Job A: cityjobCF520
  • Title: Assistant Accountant; Qualification: Diploma; Working experience: one year
then
• Job B: cityjobCF180
  • Title: Assistant Accountant; Qualification: Diploma; Working experience: three years
• Job C: cityjobCF100
  • Title: Assistant Accountant; Qualification: University/College; Working experience: not specified
• Job D: cityjobCEUJ0
  • Title: Assistant Accountant; Qualification: Not specified; Working experience: two years
This group of 4 jobs has a
• Confidence Value of 50.3%: given that a visitor enquires about job A, the probability that the visitor would also enquire about jobs B, C, and D is 0.503;
• Lift Value of 298.46: a visitor who has enquired about job A is almost 300 times more likely to enquire about jobs B, C, and D than a visitor chosen at random.
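The two measures can be checked with a small calculation. The sketch below is a minimal illustration in Python, not the Enterprise Miner procedure used in the study; the click histories and the function name confidence_and_lift are hypothetical. It uses the standard definitions: confidence = P(consequent | antecedent) and lift = confidence / P(consequent). Under these definitions, the values reported above imply that roughly 0.503 / 298.46 ≈ 0.17% of all visitors enquired about jobs B, C, and D.

```python
def confidence_and_lift(histories, antecedent, consequent):
    """Compute confidence and lift of the rule antecedent -> consequent.

    histories: list of sets, one set of job IDs per visitor.
    """
    antecedent = set(antecedent)
    consequent = set(consequent)
    n = len(histories)
    n_antecedent = sum(1 for h in histories if antecedent <= h)
    n_consequent = sum(1 for h in histories if consequent <= h)
    n_both = sum(1 for h in histories if (antecedent | consequent) <= h)
    if n == 0 or n_antecedent == 0 or n_consequent == 0:
        raise ValueError("rule is undefined for these histories")
    confidence = n_both / n_antecedent      # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)  # confidence / P(consequent)
    return confidence, lift


# Hypothetical click histories of eight visitors.
histories = [
    {"A", "B"}, {"A", "B"}, {"A"}, {"A", "C"},
    {"B"}, {"C"}, {"C"}, {"D"},
]

confidence, lift = confidence_and_lift(histories, {"A"}, {"B"})
print(confidence, lift)
```

A lift above 1 means that visitors who enquired about the first job enquire about the recommended jobs more often than visitors in general.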
2. By Popularity Index
For example, if
• Job A: cityjobCDU20
  • Title: EXECUTIVE TRAINEE - INVESTMENT PRODUCTS, Type: FIN, Working Experience: 0, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 64.9.
then (with same type, industry and qualification)
• Job B: cityjobCM470
  • Title: ASSOCIATE (TREASURY), Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 59.2.
• Job C: cityjobCM470
  • Title: ASSOCIATES (CRM), Type: FIN, Working Experience: 2, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 44.6.
• Job D: cityjobCFLC0
  • Title: DEALER & INVESTOR ADVISOR, Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: PRO, Index of popularity: 36.6.
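The selection step in this example can be sketched as a filter followed by a sort. The Python below is an illustration only: the slides do not say how the index of popularity is computed, so it is taken as given, and the Job class, the function name, and the fifth job (type ENG) are hypothetical. The four finance jobs use the values from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    title: str
    job_type: str
    industry: str
    qualification: str
    popularity: float  # index of popularity, taken as given


def recommend_by_popularity(target, jobs, top_n=3):
    """Return the most popular jobs sharing type, industry and qualification."""
    key = (target.job_type, target.industry, target.qualification)
    candidates = [
        job for job in jobs
        if job is not target
        and (job.job_type, job.industry, job.qualification) == key
    ]
    return sorted(candidates, key=lambda job: job.popularity, reverse=True)[:top_n]


job_a = Job("EXECUTIVE TRAINEE - INVESTMENT PRODUCTS", "FIN", "BNK", "UC", 64.9)
jobs = [
    job_a,
    Job("DEALER & INVESTOR ADVISOR", "FIN", "BNK", "UC", 36.6),
    Job("ASSOCIATES (CRM)", "FIN", "BNK", "UC", 44.6),
    Job("ASSOCIATE (TREASURY)", "FIN", "BNK", "UC", 59.2),
    Job("SITE ENGINEER", "ENG", "BLD", "UC", 70.0),  # hypothetical, filtered out
]

recommended = [job.title for job in recommend_by_popularity(job_a, jobs)]
print(recommended)
```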
Predictive Models
• Churn (Attrition) model
  • To identify subscribers with a high likelihood of ceasing to visit the Web site, so that Cityjob.com can take action to retain them. It is often less expensive to retain subscribers than to win them back.
• Popular job model
  • What are the characteristics of jobs that attract more visitors? Are they related to job type and job industry?
1. The Churn (Attrition) Model
• Sample: All subscribers of Cityjob.com.
• Dependent Variable: Visit = 1 if the subscriber has visited Cityjob.com during the study period; Visit = 0 otherwise.
• Factors used: Gender; Age; Educational Level; dummy variables for interest and country; no. of days since registration.
• Sampling procedure: Stratified sampling based on the variable “Visit” is used to obtain an equal number of observations from the two groups of subscribers (Visit = 1 and Visit = 0).
• Data partition: Training data 70%, Validation data 30%
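The sampling and partition steps can be sketched as follows. This is a minimal Python illustration, not the Enterprise Miner configuration used in the study: the function names, the fixed random seed, and the synthetic subscriber rows (30 with Visit = 1, 70 with Visit = 0) are assumptions made for the example.

```python
import random


def balanced_sample(rows, label_key, rng):
    """Draw the same number of rows from each value of label_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    size = min(len(group) for group in groups.values())
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], size))
    return sample


def partition(rows, train_fraction, rng):
    """Shuffle a copy of rows and split it into training and validation sets."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(2001)  # fixed seed so the example is repeatable

# Synthetic subscribers: 30 visited during the study period, 70 did not.
subscribers = [{"id": i, "visit": 1 if i < 30 else 0} for i in range(100)]

sample = balanced_sample(subscribers, "visit", rng)
training, validation = partition(sample, 0.7, rng)
print(len(sample), len(training), len(validation))
```

Balancing the two groups before fitting keeps the rarer outcome from being swamped by the more common one; predicted probabilities from a model fitted on a balanced sample reflect the sample, not the original population proportions.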
Lift Chart: churn model (logistic regression)
Important factors:
• No. of days since registration
• Educational level
• Gender
• Whether the subscriber has an interest in computer games
2. The Popular Job Model
• Sample: All jobs advertised on Cityjob.com.
• Dependent Variable: Popular = 1 if the job has been visited at least 20 times; Popular = 0 otherwise.
• Factors used: Dummy variables for different job types, job industries, job level, qualification required, working experience.
• Data partition: Training data 70%, Validation data 30%
• Missing values: missing values for working experience and qualification required were replaced by 0 and 3 (Secondary school completed) respectively.
Lift Chart: popular job model (logistic regression)
Important factors:
1. Higher qualification (more likely to be popular)
2. Higher level (more likely to be popular)
3. Job industries: accounting, banking, building, construction (more likely to be popular)
4. Job types: art/design/creative, engineering, sales (less likely to be popular)
Recommendation
1. Web Design
a. To develop a collaborative filtering system
b. To include a popularity index
2. Marketing Strategies
a. To develop appropriate marketing strategies for customer retention
b. To develop Cityjob.com’s own web monitoring system
Unexpected Discovery
One user came every day during the study period at exactly the same time (4:00 a.m. HK time) and stayed for one to three hours, browsing more than 500 pages each time (average 5 sec. per page).
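A pattern like this one can be screened for automatically once visits are summarized per user. The sketch below is a hedged illustration in Python, not part of the study's software: the function name, the visit tuple layout, and the thresholds (at least 500 pages and at most 10 seconds per page) are assumptions chosen to match the description above.

```python
from datetime import datetime


def looks_automated(visits, min_pages=500, max_seconds_per_page=10.0):
    """Flag a user whose visits all start at the same clock time and are
    both very long (in pages) and very fast (in seconds per page).

    visits: list of (start: datetime, duration_seconds: float, pages: int).
    """
    if not visits:
        return False
    start_times = {(start.hour, start.minute) for start, _, _ in visits}
    if len(start_times) != 1:
        return False
    for _, duration, pages in visits:
        if pages < min_pages:
            return False
        if duration / pages > max_seconds_per_page:
            return False
    return True


# Hypothetical visit summaries.
suspect = [
    (datetime(2000, 12, 11, 4, 0), 3600.0, 720),
    (datetime(2000, 12, 12, 4, 0), 5400.0, 1080),
    (datetime(2000, 12, 13, 4, 0), 3000.0, 600),
]
ordinary = [
    (datetime(2000, 12, 11, 9, 30), 600.0, 10),
    (datetime(2000, 12, 14, 20, 5), 900.0, 15),
]

print(looks_automated(suspect), looks_automated(ordinary))
```

Flagged users could then be excluded from the click histories before association rules or popularity indexes are computed, so that automated browsing does not distort them.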