Mapping Visitors’ Behavior to Business Goals through Click Stream Analysis Prof. Vishnuprasad Nagadevara Indian Institute of Management Bangalore nagadev@iimb.ernet.in
Definition Web Analytics as defined by the Web Analytics Association: “Web Analytics is the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.” Clickstream as defined by the Internet Advertising Bureau (IAB): “The electronic path a user takes while navigating from site to site, and from page to page within a site. It is a comprehensive body of data describing the sequence of activity between a user’s browser and any other Internet resource, such as a Web site or third party ad server”. http://www.webanalyticsassociation.org/aboutus/
Information from Web Analytics • How many visitors visit the page daily? • Who are the regular visitors? • What percentage of the visitors to the page are registered users? • What are the most visited pages on the website? • What is the average visit time on the website? • How often does a visitor return to the site? • What is the average page depth of a visitor? • What is the geographic distribution of users of the website?
Measures • Clicks: The interaction between the user and the web server, measured by clicks of the mouse. • Visits: The number of times a user visits a specific website. Every new session is counted as a new visit. • Hits: The total number of requests serviced by the web server. • Exits: Site exits, counted when a user is inactive on the site for more than 30 minutes. • Unique Visitors: Distinct users who access the site in a specified period of time; each user is counted once, however many visits they make. • Repeat Visitors: The average number of times a user returns to a site over a specific time period. • Page views: The view of any page by the user. A page may contain text, images and other online elements, may be statically or dynamically generated, and may consist of a single frame or screen or of several. • Sessions: The IAB defines a session as “A sequence of Internet activity made by one user at one site. If a user makes no request from a site during a 30 minute period of time, the next content or ad request would then constitute the beginning of a new visit”. • Unique authenticated visitors: Unique visitors who log on to the site through a registration method, using their user ID and password.
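To illustrate the difference between hits and page views, the following Python sketch counts every request as a hit but counts only non-asset requests as page views. The list of extensions in `STATIC_EXTENSIONS` is an assumption, not something the slides specify; the sample resources are taken from the log records shown in the data preparation slide, with long query strings shortened.

```python
from urllib.parse import urlsplit

# Assumption: requests for these file types are page components (hits),
# not page views. Adjust the tuple to the site's actual asset types.
STATIC_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def is_page_view(resource: str) -> bool:
    """True if the requested resource looks like a page rather than an asset."""
    path = urlsplit(resource).path.lower()
    return not path.endswith(STATIC_EXTENSIONS)


def count_hits_and_page_views(resources: list[str]) -> tuple[int, int]:
    """Every request is a hit; only non-asset requests are page views."""
    hits = len(resources)
    page_views = sum(1 for r in resources if is_page_view(r))
    return hits, page_views


# Resources taken from the sample log records (query strings shortened).
sample_resources = [
    "/billing/billing.php?user=",
    "/profile/js/common.js",
    "/P/css/comm_style.css",
    "/P/search.php?checksum=&searchchecksum=16465054",
    "/P/css/homestyle.css",
    "/profile/mainmenu.php?checksum=",
]
print(count_hits_and_page_views(sample_resources))  # (6, 3)
```

Checking the URL path rather than the full string keeps query parameters from affecting the decision.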
Metrics • Page views per visit: Average number of page views per visit. • Page views per session: Average number of page views per session. • Page views per hour/day: Average number of page views per hour/day. • Clicks per session: Average number of clicks per session. • Clicks per hour: Average number of clicks per hour. • Time between clicks: The average time between two consecutive clicks. • Hits per hour: Average number of hits to the web server per hour. • Busy hour of the day: The hour of the day with the highest number of hits to the web server.
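Several of these metrics can be computed once clicks have been grouped into sessions. The sketch below assumes each session is already a time-ordered list of click timestamps (Python `datetime` objects); the function names and the sample sessions are hypothetical, for illustration only.

```python
from collections import Counter
from datetime import datetime


def page_views_per_session(sessions: list[list[datetime]]) -> float:
    """Average number of page views (clicks) per session."""
    return sum(len(s) for s in sessions) / len(sessions)


def mean_time_between_clicks(sessions: list[list[datetime]]) -> float:
    """Average gap, in seconds, between consecutive clicks in the same session."""
    gaps = [
        (later - earlier).total_seconds()
        for s in sessions
        for earlier, later in zip(s, s[1:])
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0


def busy_hour(hit_times: list[datetime]) -> int:
    """Hour of the day (0-23) with the most hits."""
    return Counter(t.hour for t in hit_times).most_common(1)[0][0]


# Illustrative sessions: each is a time-ordered list of click timestamps.
sessions = [
    [datetime(2008, 5, 23, 0, 0), datetime(2008, 5, 23, 0, 1), datetime(2008, 5, 23, 0, 3)],
    [datetime(2008, 5, 23, 9, 15)],
]
print(page_views_per_session(sessions))    # 2.0
print(mean_time_between_clicks(sessions))  # 90.0
print(busy_hour([t for s in sessions for t in s]))  # 0
```

Gaps are measured only within a session, so the idle time between two sessions does not inflate the time-between-clicks metric.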
Implementing Web Analytics • Define your business objectives. • Define the KPIs that matter for your business, based on its objectives and goals. • Identify the data that needs to be collected. • Identify the process for collecting the data. • Prepare, analyze and interpret the data. • Design and implement a plan of action. • Monitor the data for continuous feedback.
Objectives of the Study • The objectives of this study are to: • Explore web analytics and its usefulness to web-based businesses. • Identify the techniques used in click stream analysis. • Identify applications of click stream analysis by analyzing click stream data obtained from a particular website with appropriate click stream analysis techniques.
Methodology • This study analyzes click stream data obtained from a website that provides an online information exchange service to help users identify suitable partners, in India and other countries. • The site has an unusual revenue model. Visitors can browse the site without any initial payment and can view the profiles of prospective partners free of charge. Visitors need to become members, by making a one-time payment, only when they want to contact prospective brides or grooms. • Users can search for profiles using advanced search options on a range of preferences, from basic details of the preferred partner to lifestyle, career, education and profession.
Methodology • Members can make initial contact with each other through chat, SMS and e-mail. • Registration on the website is free, and users are assured of privacy and confidentiality. Registered users can create a profile at no cost, search for other profiles, express interest in other profiles and contact others. • Registered users can upgrade to paid membership, which allows them to contact others, view other members’ contact details, write personalized messages, initiate chats and let other members view their contact details. Paid memberships are provided for a specified duration.
Methodology • The click stream data is analyzed to identify the different paths visitors take and the sequences of pages that lead to payment of the membership fee. Based on this analysis, specific strategies are recommended to maximize the website’s revenue.
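A minimal sketch of this path analysis, assuming sessions are time-ordered lists of page paths and that reaching any URL under `/billing/` (the billing page seen in the sample log) marks a conversion. The goal prefix and the number of preceding pages examined (`depth`) are assumptions for illustration; the slides do not say how the paths were extracted.

```python
from collections import Counter


def paths_to_conversion(sessions, goal_prefix="/billing/", depth=3):
    """Count the last `depth` pages viewed before the first goal page of each session.

    `sessions` is a list of time-ordered page-path lists. The default goal
    prefix is an assumption based on the billing URL in the sample log.
    """
    paths = Counter()
    for pages in sessions:
        for i, page in enumerate(pages):
            if page.startswith(goal_prefix):
                paths[tuple(pages[max(0, i - depth):i])] += 1
                break  # count only the first conversion in a session
    return paths


# Illustrative sessions built from page paths seen in the sample log.
sessions = [
    ["/profile/mainmenu.php", "/P/search.php", "/billing/billing.php"],
    ["/P/search.php", "/billing/billing.php"],
    ["/profile/mainmenu.php", "/P/search.php", "/billing/billing.php"],
    ["/P/search.php", "/profile/mainmenu.php"],
]
paths = paths_to_conversion(sessions, depth=2)
print(paths.most_common(1))  # [(('/profile/mainmenu.php', '/P/search.php'), 2)]
print(sum(paths.values()) / len(sessions))  # conversion rate: 0.75
```

The most frequent preceding sequences point to the pages worth strengthening, and sessions with no goal page are candidates for the exit analysis.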
DATA PREPARATION
10.208.65.96 172.16.8.37, 124.124.35.130 - - [23/May/2008:00:00:00 -0400] "GET /billing/billing.php?user=&cid=22401528da14a61c43512fa025b59578i353273 HTTP/1.0" 200 1832
10.208.65.96 68.126.193.219 - - [23/May/2008:00:00:00 -0400] "GET /profile/js/common.js HTTP/1.1" 200 12462
10.208.65.96 59.95.71.32 - - [23/May/2008:00:00:00 -0400] "GET /P/css/comm_style.css HTTP/1.1" 200 2640
10.208.65.96 122.163.70.145 - - [23/May/2008:00:00:00 -0400] "GET /P/search.php?checksum=&searchchecksum=16465054&j=300&newsearch=&inf_checksum=&castemapping=&crmback=&searchorder=T&label_select_no=&savesearch=&from_index=&viewall=&save_search_redirect=&hide_search_bar=y HTTP/1.1" 200 21561
10.208.65.96 61.1.81.153 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.1" 304 26
10.208.65.96 68.197.236.117 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum=3590208069017f9d75933dfa9ac9005d|i|537f26ca181f05c308393257397ab261i2810388 HTTP/1.1" 200 3333
10.208.65.96 172.16.25.60, 59.145.189.43 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.0" 304 26
10.208.65.96 10.232.65.96, 10.232.49.1, 203.126.136.220 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum= HTTP/1.1" 200 3329
Problem: Format of the data • Clickstream data files are neither delimited nor fixed-length files.
Solution: • The date in each clickstream record was used as the delimiter to import the data into a database. • String handling in the database was then needed to separate out the fields.
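The same delimiter idea can be mirrored outside a database. The Python sketch below is not the author's database implementation: it locates the bracketed timestamp with a regular expression, cuts the record into the text before and after it, and then splits those two pieces into fields. The field layout is inferred from the sample records; the handling of a `-` byte count is an assumption.

```python
import re
from datetime import datetime

# The bracketed timestamp has a fixed shape, so it serves as the delimiter
# between the variable-length prefix and the request part of each record.
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_record(record: str) -> dict:
    """Split one log record into fields, using the timestamp as the delimiter."""
    match = DATE_RE.search(record)
    if match is None:
        raise ValueError("record has no bracketed timestamp")
    prefix = record[:match.start()].strip()
    suffix = record[match.end():].strip()

    # Prefix: '<server IP> <client IP>[, <client IP> ...] - -'
    prefix = prefix.removesuffix("- -").strip()
    server_ip, _, client_part = prefix.partition(" ")
    client_ips = [ip.strip() for ip in client_part.split(",")]

    # Suffix: '"<method> <resource> <protocol>" <status> <bytes>'
    # Assumes the resource contains no spaces, as in the sample records.
    request, _, tail = suffix[1:].partition('"')
    method, resource, protocol = request.split(" ")
    status, size = tail.split()

    return {
        "server_ip": server_ip,
        "client_ips": client_ips,
        # %b assumes English month abbreviations (default C locale).
        "timestamp": datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "resource": resource,
        "protocol": protocol,
        "status": int(status),
        # Assumption: a '-' byte count (no body sent) is stored as 0.
        "bytes": 0 if size == "-" else int(size),
    }


record = ('10.208.65.96 172.16.8.37, 124.124.35.130 - - [23/May/2008:00:00:00 -0400] '
          '"GET /billing/billing.php?user=&cid=22401528da14a61c43512fa025b59578i353273 HTTP/1.0" 200 1832')
fields = parse_record(record)
print(fields["client_ips"], fields["status"], fields["bytes"])
# ['172.16.8.37', '124.124.35.130'] 200 1832
```

Anchoring on the timestamp sidesteps the variable number of comma-separated client addresses, which is what prevents a simple fixed-position split.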
Data • Data is obtained from the site in the form of click stream records. Each record describes one click (request) by a visitor and contains the following details: • Server IP • Client IP • Timestamp (date and time) • Status: HTTP status code • URL requested: three subfields, namely the request method, the resource requested and the protocol used • Number of bytes transferred • The country of origin of each request is identified from the IP address.
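Several sample records carry a comma-separated list of client addresses (for example `172.16.8.37, 124.124.35.130`), some from private ranges that cannot be geolocated. One plausible rule, assumed here rather than stated in the slides, is to take the first non-private address in the list and pass it to an IP-to-country database; that lookup is not shown. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Optional


def public_client_ip(client_ips: list[str]) -> Optional[str]:
    """Return the first non-private address in a client-IP list, or None.

    Assumption: private (e.g. 10.x, 172.16-31.x) entries are internal or
    proxy addresses and are skipped for geolocation.
    """
    for ip in client_ips:
        try:
            addr = ipaddress.ip_address(ip.strip())
        except ValueError:
            continue  # skip malformed entries
        if not addr.is_private:
            return str(addr)
    return None


print(public_client_ip(["172.16.8.37", "124.124.35.130"]))                   # 124.124.35.130
print(public_client_ip(["10.232.65.96", "10.232.49.1", "203.126.136.220"]))  # 203.126.136.220
print(public_client_ip(["10.0.0.1"]))                                        # None
```

Records that yield `None` have no routable client address and would have to be left out of the geographic distribution.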
Data • The URL is used to identify the information or web page browsed by the visitor. • The timestamp of each click is used to sequence the visitor’s movement across the pages of the website. • Identifying unique user sessions is an important step in the analysis of click stream data. Inactivity for more than 30 minutes is treated as the end of a session. • This is an approximation, since multiple users may access the site from the same IP address (for example, behind a proxy), or the same user may access it from different IP addresses. • Because no further identifying data is available, hits from each unique IP address are treated as belonging to a unique user for a unique session.
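The 30-minute rule, combined with the one-user-per-IP assumption, can be sketched as follows. Hits are assumed to be `(client_ip, timestamp, resource)` tuples (a hypothetical shape chosen for the example); a new session starts whenever the gap since the same IP's previous hit exceeds 30 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits):
    """Group (client_ip, timestamp, resource) hits into sessions.

    All hits from one IP are treated as one user; a gap of more than
    SESSION_TIMEOUT between consecutive hits starts a new session.
    """
    by_ip = defaultdict(list)
    for ip, ts, resource in hits:
        by_ip[ip].append((ts, resource))

    sessions = []
    for ip, clicks in by_ip.items():
        clicks.sort(key=lambda c: c[0])  # stable: equal timestamps keep log order
        current = [clicks[0]]
        for prev, click in zip(clicks, clicks[1:]):
            if click[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((ip, current))
                current = []
            current.append(click)
        sessions.append((ip, current))
    return sessions


t0 = datetime(2008, 5, 23, 0, 0)
hits = [
    ("59.95.71.32", t0, "/P/search.php"),
    ("59.95.71.32", t0 + timedelta(minutes=10), "/profile/mainmenu.php"),
    ("59.95.71.32", t0 + timedelta(minutes=45), "/P/search.php"),  # 35-minute gap
    ("61.1.81.153", t0 + timedelta(minutes=5), "/profile/mainmenu.php"),
]
for ip, clicks in sessionize(hits):
    print(ip, [resource for _, resource in clicks])
# 59.95.71.32 ['/P/search.php', '/profile/mainmenu.php']
# 59.95.71.32 ['/P/search.php']
# 61.1.81.153 ['/profile/mainmenu.php']
```

The gap is measured between consecutive hits, not from the start of the session, which matches the inactivity wording of the IAB definition.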
Summary and Conclusions • Usage of the website by time of day • This helps identify the busy hour, estimate the server capacity the website requires, and decide when a maintenance window can be scheduled. • Usage of the website from different geographic locations • This shows how users are distributed across geographic locations. • Exit screens • These show where users exit the website. This input can guide a redesign of the web pages by revealing which pages break the flow of a user session.
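Exit screens can be found by taking the last page of each session. The sketch below also computes an exit rate (exits divided by page views for each page), a common normalization that the slides do not mention; it is included as an assumption so that heavily viewed pages are not flagged merely for being popular. Sessions are assumed to be time-ordered lists of page paths.

```python
from collections import Counter


def exit_pages(sessions):
    """Count how often each page is the last page of a session."""
    return Counter(pages[-1] for pages in sessions if pages)


def exit_rate(sessions):
    """Exits divided by page views, per page (assumed normalization)."""
    views = Counter(page for pages in sessions for page in pages)
    exits = exit_pages(sessions)
    return {page: exits[page] / views[page] for page in views}


# Illustrative sessions: time-ordered lists of page paths.
sessions = [
    ["/profile/mainmenu.php", "/P/search.php"],
    ["/P/search.php", "/billing/billing.php"],
    ["/P/search.php"],
]
print(exit_pages(sessions).most_common())
# [('/P/search.php', 2), ('/billing/billing.php', 1)]
print(exit_rate(sessions))
```

An exit from the billing page may simply mark a completed payment, so exit counts are best read alongside the conversion paths.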
Summary and Conclusions • Most accessed and least accessed pages • This can be used for variable pricing of advertising on the web pages. It can also be used for better user interface design and space utilization, by removing or repositioning links that are infrequently accessed. • Associations • These provide information on the distinct actions performed on the website and the sequence in which the user performed them. This can be used for better user interface design. • Web diagrams • These give information on the co-occurrence of actions on the web page and their significance, and also provide inputs for user interface design.
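The co-occurrence idea can be illustrated with simple pairwise rules between pages. Here support is the share of sessions containing both pages, and confidence for `A -> B` is the share of sessions containing `A` that also contain `B`. This order-insensitive sketch is an assumed stand-in for the tool used in the study. It ignores the sequence of actions, which a sequential pattern analysis would capture.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.0):
    """Support and confidence for page-pair rules A -> B (order within a session ignored)."""
    n = len(sessions)
    page_sets = [set(pages) for pages in sessions]
    single = Counter(page for s in page_sets for page in s)
    pairs = Counter(pair for s in page_sets for pair in combinations(sorted(s), 2))

    rules = {}
    for (a, b), both in pairs.items():
        support = both / n  # share of sessions containing both pages
        if support < min_support:
            continue
        rules[(a, b)] = (support, both / single[a])  # confidence of a -> b
        rules[(b, a)] = (support, both / single[b])  # confidence of b -> a
    return rules


# Illustrative sessions: lists of page paths seen in the sample log.
sessions = [
    ["/P/search.php", "/profile/mainmenu.php", "/billing/billing.php"],
    ["/P/search.php", "/billing/billing.php"],
    ["/P/search.php", "/profile/mainmenu.php"],
    ["/profile/mainmenu.php"],
]
for (a, b), (support, confidence) in pair_rules(sessions, min_support=0.3).items():
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Sorting each session's pages before pairing ensures every unordered pair is counted under a single key.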
Questions? • Suggestions? • Comments?