Challenges and successes in predicting web form user inputs • Sumit Amar • Research Developer • Microsoft Corporation • samar@microsoft.com
Objectives • Motivation behind logging user actions • How to log your web application usage • Basics of Bayesian Inference • A Naïve Bayesian Classifier design to build form predictions
Source: http://www.magnetism.co.nz/Files/Blogs/Using%20Web%20Forms(1).png
Web UI Instrumentation • Designed to capture user interactions such as text inputs and dropdown and checkbox selections • Little to no code required to plug into existing websites • Batches multiple interactions • Online or offline propagation (DB, or file to DB) • Cross-browser • Can be pipelined to analytics systems (such as Omniture)
Rationale to instrument web interfaces • Understand user behavior, intentions, and trends • Gauge usability of the system • Capture true performance metrics • Generate test automation code or smoke tests • Use data mining to enhance user experience
Bayesian approach to building predictions for form entries • Based on Thomas Bayes’ ~250-year-old theorem:
P(H|E) = (P(E|H) * P(H)) / P(E)
Probability of a hypothesis given the evidence = probability of the evidence given the hypothesis * probability of the hypothesis, then normalized by the probability of the evidence.
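The "then normalized" step can be made concrete with a minimal sketch. It assumes a small, closed set of hypotheses, so that P(E) is the sum of P(E|H) * P(H) over all of them. The function name `posteriors` and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return P(H|E) for every hypothesis H.

    priors[h] is P(H=h); likelihoods[h] is P(E|H=h) for the observed evidence.
    P(E) is obtained by summing P(E|H) * P(H) over all hypotheses.
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    p_e = sum(joint.values())
    if p_e == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: joint[h] / p_e for h in joint}


# Hypothetical numbers, not taken from the talk.
priors = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
likelihoods = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
result = posteriors(priors, likelihoods)
print(result["a"], result["b"])  # 3/4 1/4
```

Exact fractions are used here so the normalization is easy to check by hand; floats would work equally well in practice.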
Bayesian approach to building predictions for form entries • For example, with the data below, taking E = 1 and H = 0:
P(H|E) = (P(E|H) * P(H)) / P(E) = (2/3 * 3/6) / (3/6) => 0.667
E | H
1 | 0
1 | 0
2 | 8
2 | 0
3 | 5
1 | 6
However, E could span multiple columns, i.e. E = [C1, C2, ..., Cn], where C = column.
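The same figure can be reproduced by counting rows. This is a minimal sketch that uses the six (E, H) pairs from the example table and assumes the query is E = 1, H = 0, which is the only pair consistent with the fractions 2/3 and 3/6. The helper name `posterior_from_counts` is hypothetical.

```python
from fractions import Fraction

# (E, H) rows from the example table.
rows = [(1, 0), (1, 0), (2, 8), (2, 0), (3, 5), (1, 6)]


def posterior_from_counts(rows, e, h):
    """Estimate P(H=h | E=e) from row counts via Bayes' theorem."""
    total = len(rows)
    n_h = sum(1 for _, hv in rows if hv == h)
    n_e = sum(1 for ev, _ in rows if ev == e)
    n_eh = sum(1 for ev, hv in rows if ev == e and hv == h)
    if n_h == 0 or n_e == 0:
        raise ValueError("value of E or H never observed")
    p_e_given_h = Fraction(n_eh, n_h)  # 2/3 in the example
    p_h = Fraction(n_h, total)         # 3/6
    p_e = Fraction(n_e, total)         # 3/6
    return p_e_given_h * p_h / p_e


p = posterior_from_counts(rows, e=1, h=0)
print(p)  # 2/3
```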
Building a classifier for form data • Data is captured with the instrumentation framework • But the capture contains more data than the classifier needs
Building a classifier for form data • [Filtered view of] captured data • But the data is not yet in the format the classifier needs
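One plausible way to reshape captured data is to pivot it so that each form submission becomes a single row with one column per field. The sketch below assumes the capture is a list of (submission_id, field, value) tuples; that layout, the `transpose` helper, and the sample values are assumptions for illustration, not the talk's actual schema.

```python
from collections import defaultdict


def transpose(events):
    """Turn (submission_id, field, value) rows into one dict per submission."""
    by_submission = defaultdict(dict)
    for submission_id, field, value in events:
        # If a field was edited several times, the last captured value wins.
        by_submission[submission_id][field] = value
    return [by_submission[key] for key in sorted(by_submission)]


# Hypothetical captured interactions.
events = [
    (1, "txtName", "Ann"),
    (1, "txtLocation", "Seattle"),
    (1, "txtQuestion", "Pricing"),
    (2, "txtName", "Bob"),
    (2, "txtLocation", "Boston"),
    (2, "txtQuestion", "Support"),
]
table = transpose(events)
```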
Building a classifier for form data • Transposed form of data (computed on page loads)
Because E = [C1, C2, ..., Cn], where Cx = input/evidence variables
Let C1 = txtName, C2 = txtLocation, H = txtQuestion
For each hypothesized value of the output variable:
P(E|H) = P(C1|H) * P(C2|H) --- (i)
Likelihood = (i) * P(H)
Probability = likelihood normalized to the 0-1 range
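Written out for the two evidence columns, the computation takes the following form. The second line relies on the naive Bayes assumption that C1 and C2 are conditionally independent given H; the summation index h' ranges over all observed values of the hypothesis variable and is notation introduced here.

```latex
\begin{align}
P(H = h \mid C_1, C_2)
  &= \frac{P(C_1, C_2 \mid H = h)\, P(H = h)}{P(C_1, C_2)} \\
  &\approx \frac{P(C_1 \mid H = h)\, P(C_2 \mid H = h)\, P(H = h)}
               {\sum_{h'} P(C_1 \mid H = h')\, P(C_2 \mid H = h')\, P(H = h')}
\end{align}
```

The numerator is the "likelihood" from the slide, and dividing by the sum over all hypotheses is the normalization to the 0-1 range.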
Probability computation logic
• Based on the hypothesis variable and resource (page), look up the classifier source table
• Retrieve the cardinality of each distinct hypothesis value by grouping on the hypothesis (used for the P(H) calculation)
• Create a likelihood dictionary keyed by evidence variable name, whose values hold the hypothesis values with their likelihoods P(E|H)
• For each input/evidence variable E:
  • Retrieve all possible hypotheses H where the evidence variable had the value of E
  • Compute P(E|H) for each H and store it in a list, with the hypothesis value as key and the likelihood as value
• Multiply all P(E|H) values, i.e. P(E|H) = P(C1|H) * P(C2|H) * ... * P(Cn|H), to obtain the likelihoods
• Multiply by P(H), i.e. the count of H divided by the total count across all hypotheses
• Normalize the likelihoods to bring them into the 0 to 1 probability range
• Return each possible hypothesis value along with its probability
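The steps above can be sketched in a few lines. This is an illustrative in-memory version, not the talk's PHP/MySQL demo: it assumes the classifier source table is a list of dicts, one per past submission, and the function name `predict` and the sample rows are hypothetical. No smoothing is applied, so an evidence value never seen with a hypothesis drives that hypothesis to zero.

```python
from collections import Counter
from fractions import Fraction


def predict(rows, hypothesis_field, evidence):
    """Return {hypothesis value: probability} given observed evidence.

    rows: list of dicts, one per past form submission.
    evidence: dict mapping evidence field name -> observed value.
    """
    total = len(rows)
    # Cardinality of each distinct hypothesis value, used for P(H).
    h_counts = Counter(row[hypothesis_field] for row in rows)

    likelihoods = {}
    for h, n_h in h_counts.items():
        score = Fraction(n_h, total)  # P(H)
        for field, value in evidence.items():
            # P(C|H): share of rows with this hypothesis that match the evidence.
            n_match = sum(
                1 for row in rows
                if row[hypothesis_field] == h and row.get(field) == value
            )
            score *= Fraction(n_match, n_h)
        likelihoods[h] = score

    norm = sum(likelihoods.values())
    if norm == 0:
        return {}  # this evidence was never seen with any hypothesis
    return {h: score / norm for h, score in likelihoods.items()}


# Hypothetical past submissions.
rows = [
    {"txtName": "Ann", "txtLocation": "Seattle", "txtQuestion": "Pricing"},
    {"txtName": "Ann", "txtLocation": "Seattle", "txtQuestion": "Pricing"},
    {"txtName": "Ann", "txtLocation": "Boston", "txtQuestion": "Support"},
    {"txtName": "Bob", "txtLocation": "Seattle", "txtQuestion": "Support"},
]
result = predict(rows, "txtQuestion", {"txtName": "Ann", "txtLocation": "Seattle"})
print(result)  # Pricing: 4/5, Support: 1/5
```

Here "Pricing" scores 1/2 * 1 * 1 = 1/2 and "Support" scores 1/2 * 1/2 * 1/2 = 1/8, which normalize to 4/5 and 1/5.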
Challenges and recommendations
• Missing values in inputs
  • Monte Carlo sampling
  • Gaussian approximation, and several more
• Privacy?
  • Don’t log PII (personally identifiable information)
• Performance?
  • Batch requests
  • Use longer intervals/timeouts
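The privacy and performance recommendations can be combined in a small logging buffer. The sketch below is an assumption-laden illustration in Python rather than the talk's instrumentation framework: the class `InteractionBatcher`, its parameters, and the field names are all hypothetical. It drops fields flagged as PII before they are buffered and sends events in batches instead of one request per interaction.

```python
class InteractionBatcher:
    """Buffer interaction events and hand them to `send` in batches."""

    def __init__(self, send, batch_size=20, pii_fields=()):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send
        self._batch_size = batch_size
        self._pii_fields = frozenset(pii_fields)
        self._buffer = []

    def log(self, field, value):
        if field in self._pii_fields:
            return  # PII is never buffered or sent
        self._buffer.append({"field": field, "value": value})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, e.g. on a timer or on page unload."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


sent = []  # stands in for a network call or DB write
batcher = InteractionBatcher(sent.append, batch_size=2, pii_fields={"txtEmail"})
batcher.log("ddlCountry", "US")
batcher.log("txtEmail", "ann@example.com")  # dropped as PII
batcher.log("chkNewsletter", True)          # second kept event: triggers a send
batcher.log("txtQuestion", "Pricing")
batcher.flush()                             # sends the remaining event
```

A larger `batch_size` or a longer flush interval trades freshness of the logged data for fewer requests.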
Resources • Sumit Amar – samar@microsoft.com • Slides – www.amar.co.in/sumit/Web2.0TalkPredictingInputs.ppt • Demo code (PHP/MySQL) – www.amar.co.in/sumit/i.zip