Challenges and successes in predicting web form user inputs

Sumit Amar • Research Developer • Microsoft Corporation • samar@microsoft.com

Objectives • Motivation behind logging user actions • How to log your web application usage • Basics of Bayesian Inference • A Naïve Bayesian Classifier design to build form predictions

Source: http://www.magnetism.co.nz/Files/Blogs/Using%20Web%20Forms(1).png

Web UI Instrumentation • Designed to capture user interactions such as text inputs, dropdown and checkbox selections, etc. • Little to no code required to plug into existing websites • Batches multiple interactions • Online or offline propagation (DB, or file to DB) • Cross-browser • Can be pipelined to analytics systems (such as Omniture)
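The batching bullet above can be illustrated with a minimal sketch. It is written in Python for brevity, not in the language of the instrumentation framework itself, and every name in it (`Interaction`, `InteractionBatcher`, `send`, `batch_size`) is an assumption made for illustration rather than part of the framework described in the slides.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, List


@dataclass
class Interaction:
    page: str      # resource the form lives on
    control: str   # control name, e.g. "txtName"
    value: str     # what the user typed or selected


@dataclass
class InteractionBatcher:
    """Hypothetical helper: buffers interactions and sends them in groups."""
    send: Callable[[str], None]   # receives one JSON payload per batch
    batch_size: int = 3
    _buffer: List[Interaction] = field(default_factory=list)

    def record(self, interaction: Interaction) -> None:
        self._buffer.append(interaction)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        self.send(json.dumps([asdict(i) for i in self._buffer]))
        self._buffer.clear()


sent = []  # stands in for a network call or a file append
batcher = InteractionBatcher(send=sent.append, batch_size=3)
for control, value in [("txtName", "Ann"), ("txtLocation", "Pune"),
                       ("chkNews", "on"), ("ddlTopic", "Billing")]:
    batcher.record(Interaction("/ask", control, value))
batcher.flush()  # push the remainder, e.g. when the page unloads
print(len(sent))  # 2
```

Four interactions produce two payloads instead of four requests; a larger `batch_size` trades freshness of the logged data for fewer round trips.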

Rationale to instrument web interfaces • Understand user behavior, intentions, and trends • Gauge usability of the system • Capture true performance metrics • Generate test automation code or smoke tests • Use data mining to enhance user experience

Bayesian approach to building predictions for form entries • Based on Thomas Bayes’ ~250-year-old theorem: P(H|E) = (P(E|H) * P(H)) / P(E) • The probability of a hypothesis given the evidence equals the probability of the evidence given the hypothesis, multiplied by the probability of the hypothesis, then normalized by the probability of the evidence.

Bayesian approach to building predictions for form entries • For example, for the observations below, P(H=0|E=1) = (P(E=1|H=0) * P(H=0)) / P(E=1) = (2/3 * 3/6) / (3/6) => 0.667
E | H
1 | 0
1 | 0
2 | 8
2 | 0
3 | 5
1 | 6
• However, E can span multiple columns, i.e. E = [C1, C2, ..., Cn], where each C is a column
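As a check on the arithmetic, the following sketch recomputes the example from the six (E, H) rows by counting. The function name `posterior` is an assumption for illustration, and reading the worked figure as P(H=0|E=1) is inferred from the counts (H=0 occurs in 3 of 6 rows, E=1 in 3 of 6 rows, and both together in 2 rows).

```python
from fractions import Fraction

# The six (E, H) observations from the slide.
rows = [(1, 0), (1, 0), (2, 8), (2, 0), (3, 5), (1, 6)]


def posterior(rows, e, h):
    """P(H=h | E=e) via Bayes' theorem, using simple counts over rows."""
    n = len(rows)
    n_h = sum(1 for _, hv in rows if hv == h)
    n_e = sum(1 for ev, _ in rows if ev == e)
    n_eh = sum(1 for ev, hv in rows if ev == e and hv == h)
    if n_h == 0 or n_e == 0:
        return Fraction(0)
    p_e_given_h = Fraction(n_eh, n_h)   # P(E|H) = 2/3
    p_h = Fraction(n_h, n)              # P(H)   = 3/6
    p_e = Fraction(n_e, n)              # P(E)   = 3/6
    return p_e_given_h * p_h / p_e


result = posterior(rows, e=1, h=0)
print(result, round(float(result), 3))  # 2/3 0.667
```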

Building a classifier for form data • Data is captured with the instrumentation framework • But the capture contains far more data than the classifier needs

Building a classifier for form data • [Filtered view of] captured data • But the data is not yet laid out the way the classifier needs

Building a classifier for form data • Transposed form of the data (computed on page loads) • E = (C1, C2, ..., Cn), where each Cx is an input/evidence variable • Let C1 = txtName, C2 = txtLocation, H = txtQuestion • For each hypothesized value of the output variable: P(E|H) = P(C1|H) * P(C2|H) --- (i) • Likelihood = (i) * P(H) • Probability = likelihood normalized to the 0-1 range
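The transposition step might look roughly like the sketch below. The log layout (one `(session_id, control, value)` tuple per captured interaction) and the function name `transpose` are assumptions; the slides do not show the actual schema. The control names txtName, txtLocation, and txtQuestion are the ones used on the slide.

```python
from collections import defaultdict

# Hypothetical captured log: one row per interaction.
log = [
    ("s1", "txtName", "Ann"),
    ("s1", "txtLocation", "Pune"),
    ("s2", "txtName", "Bob"),
    ("s1", "txtQuestion", "Billing"),
    ("s2", "txtQuestion", "Login"),
    ("s1", "btnSubmit", "click"),   # not a classifier column, so it is dropped
]


def transpose(log, columns):
    """Turn per-interaction rows into one row per session, one field per column."""
    sessions = defaultdict(dict)
    for session_id, control, value in log:
        if control in columns:
            sessions[session_id][control] = value   # the last value entered wins
    # Columns a session never filled in come back as None.
    return [tuple(fields.get(c) for c in columns) for fields in sessions.values()]


columns = ("txtName", "txtLocation", "txtQuestion")
table = transpose(log, columns)
print(table)  # [('Ann', 'Pune', 'Billing'), ('Bob', None, 'Login')]
```

Each output row is one (C1, C2, H) observation, which is the shape the counting steps for P(C1|H), P(C2|H), and P(H) operate on.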

Probability computation logic • Based on the hypothesis variable and the resource (page), look up the classifier source table • Retrieve the cardinality of each distinct hypothesis value by grouping the possible hypotheses (used to calculate P(H)) • Create a likelihood dictionary keyed by evidence variable name, whose values map each hypothesis value to its likelihood P(E|H) • For each input/evidence variable E: (a) retrieve all possible hypotheses H from rows where the variable has the observed value of E; (b) compute P(E|H) for each H and store it in a list, with the hypothesis value as the key and the likelihood as the value • Multiply all P(E|H) values to obtain the likelihoods: P(E|H) = P(C1|H) * P(C2|H) * ... * P(Cn|H) • Multiply by P(H), i.e. the number of rows with hypothesis H divided by the number of rows across all hypotheses • Normalize the likelihoods so that they fall within the 0 to 1 probability range • Return each possible hypothesis value along with its probability
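The steps above can be sketched in a few lines of Python. This is an in-memory illustration under assumptions, not the demo code referenced in the slides: the sample rows, the function name `predict`, and the use of a list of dicts in place of the classifier source table are all made up for the example.

```python
from collections import Counter

# Hypothetical transposed rows for one page.
rows = [
    {"txtName": "Ann", "txtLocation": "Pune", "txtQuestion": "Billing"},
    {"txtName": "Ann", "txtLocation": "Pune", "txtQuestion": "Billing"},
    {"txtName": "Ann", "txtLocation": "Delhi", "txtQuestion": "Login"},
    {"txtName": "Bob", "txtLocation": "Pune", "txtQuestion": "Login"},
    {"txtName": "Bob", "txtLocation": "Delhi", "txtQuestion": "Login"},
]


def predict(rows, evidence, hypothesis_column):
    """Return {hypothesis value: probability} for the observed evidence."""
    # Cardinality of each distinct hypothesis value, used for P(H).
    h_counts = Counter(r[hypothesis_column] for r in rows)
    total = sum(h_counts.values())

    likelihoods = {}
    for h, n_h in h_counts.items():
        likelihood = n_h / total                      # P(H)
        for column, value in evidence.items():
            n_match = sum(1 for r in rows
                          if r[hypothesis_column] == h and r.get(column) == value)
            likelihood *= n_match / n_h               # P(Ci|H)
        likelihoods[h] = likelihood

    # Normalize so the values sum to 1.
    norm = sum(likelihoods.values())
    if norm == 0:
        return {h: 0.0 for h in likelihoods}
    return {h: value / norm for h, value in likelihoods.items()}


probs = predict(rows, {"txtName": "Ann", "txtLocation": "Pune"}, "txtQuestion")
print({h: round(p, 3) for h, p in probs.items()})  # {'Billing': 0.857, 'Login': 0.143}
```

With these rows, Billing scores 2/5 * 1 * 1 = 0.4 and Login scores 3/5 * 1/3 * 1/3 ≈ 0.067 before normalization. The sketch uses raw counts, so a single evidence value never seen with a hypothesis drives that hypothesis to zero; additive smoothing is a common remedy, although the slides do not say whether the original design used it.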

Challenges and recommendations • Missing values in inputs: Monte Carlo sampling, Gaussian approximation, and several more • Privacy: don’t log PII (personally identifiable information) • Performance: batch requests; use longer intervals/timeouts
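One way to act on the privacy recommendation is to scrub interactions before they are logged. The sketch below is only an illustration: the name hints, the email pattern, and the function name `scrub` are assumptions, and a real deployment would need a list of sensitive fields reviewed for its own forms.

```python
import re

# Assumed substrings that mark a control as sensitive; adjust per application.
PII_NAME_HINTS = ("password", "email", "ssn", "phone", "card")
# Deliberately loose pattern: anything that looks like an email address.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def scrub(interactions):
    """Drop (control, value) pairs that look like they carry PII."""
    clean = []
    for control, value in interactions:
        if any(hint in control.lower() for hint in PII_NAME_HINTS):
            continue                      # sensitive by control name
        if EMAIL_RE.search(value):
            continue                      # sensitive by content
        clean.append((control, value))
    return clean


captured = [
    ("txtEmail", "a@b.com"),
    ("txtQuestion", "Billing"),
    ("txtNote", "mail me at x@y.org"),
    ("txtPassword", "hunter2"),
]
print(scrub(captured))  # [('txtQuestion', 'Billing')]
```

Filtering before the data leaves the page keeps sensitive values out of both the online and the offline propagation paths.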

Resources • Sumit Amar – samar@microsoft.com • Slides – www.amar.co.in/sumit/Web2.0TalkPredictingInputs.ppt • Demo code (PHP/MySQL) – www.amar.co.in/sumit/i.zip
