
Challenges and successes in predicting web form user inputs

Learn the basics of Bayesian Inference and how to use a Naïve Bayesian Classifier to predict user inputs on web forms. Understand the Bayesian approach for form predictions and building classifiers. Explore the logic behind likelihood probability computation and challenges like missing values and privacy concerns.

ckristofer

Presentation Transcript


  1. Challenges and successes in predicting web form user inputs • Sumit Amar • Research Developer • Microsoft Corporation • samar@microsoft.com

  2. Objectives • Motivation behind logging user actions • How to log your web application usage • Basics of Bayesian Inference • A Naïve Bayesian Classifier design to build form predictions

  3. Source: http://www.magnetism.co.nz/Files/Blogs/Using%20Web%20Forms(1).png

  4. Web UI Instrumentation • Designed to capture user interactions such as text inputs, dropdown and checkbox selections, etc. • Little to no code required to plug into existing websites • Batches multiple interactions • Online or offline propagation (DB, or file to DB) • Cross-browser • Can be pipelined to analytics systems (such as Omniture)

  5. Rationale to instrument web interfaces • Understand user behavior, intentions, and trends • Gauge usability of the system • Capture true performance metrics • Generate test automation code or smoke tests • Use data mining to enhance user experience

  6. Bayesian approach to build predictions for form entries • Based on Thomas Bayes’ ~250-year-old theorem: $P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)}$ • The probability of a hypothesis given the evidence equals the probability of the evidence given the hypothesis, times the prior probability of the hypothesis, normalized by the probability of the evidence.

  7. Bayesian approach to building predictions for form entries • For example, given the observed (E, H) pairs (1, 0), (1, 0), (2, 8), (2, 0), (3, 5), (1, 6): $P(H{=}0 \mid E{=}1) = \frac{P(E{=}1 \mid H{=}0) \, P(H{=}0)}{P(E{=}1)} = \frac{(2/3)(3/6)}{3/6} \approx 0.667$ • However, E can span multiple columns, i.e. $E = [C_1, C_2, \ldots, C_n]$, where each $C_i$ is a column.
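The arithmetic in the slide 7 example can be checked with a few lines of Python. This is a minimal sketch, not the presenter's code: the function name `posterior` is an illustrative assumption, and the six (E, H) pairs are copied from the slide's table. The values 2/3, 3/6, and 3/6 on the slide correspond to the hypothesis H = 0 and the evidence E = 1.

```python
from fractions import Fraction

# (E, H) pairs copied from the example table on slide 7.
observations = [(1, 0), (1, 0), (2, 8), (2, 0), (3, 5), (1, 6)]

def posterior(pairs, evidence, hypothesis):
    """Return P(H=hypothesis | E=evidence) using Bayes' theorem on raw counts.

    Hypothetical helper for illustration only.
    """
    total = len(pairs)
    h_count = sum(1 for _, h in pairs if h == hypothesis)
    e_count = sum(1 for e, _ in pairs if e == evidence)
    both = sum(1 for e, h in pairs if e == evidence and h == hypothesis)
    if h_count == 0 or e_count == 0:
        return Fraction(0)  # nothing observed, so no estimate is possible
    likelihood = Fraction(both, h_count)  # P(E|H)
    prior = Fraction(h_count, total)      # P(H)
    marginal = Fraction(e_count, total)   # P(E)
    return likelihood * prior / marginal

result = posterior(observations, evidence=1, hypothesis=0)
print(result, round(float(result), 3))  # 2/3 0.667
```

Using `Fraction` keeps the intermediate values exact, so the printed result matches the slide's 0.667 without floating-point noise.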

  8. Building a classifier for form Data • Data captured with instrumentation framework • But contains too much data for the classifier’s purpose

  9. Building classifier for the form Data • [Filtered view of] captured data • But the data is not in the format the classifier needs

  10. Building classifier for the form Data • Transposed form of data (computed on page loads), because $E = (C_1, C_2, \ldots, C_n)$, where each $C_x$ is an input/evidence variable • Let $C_1$ = txtName, $C_2$ = txtLocation, H = txtQuestion • For each hypothesized value of the output variable: $P(E \mid H) = P(C_1 \mid H) \cdot P(C_2 \mid H)$ (i) • Likelihood = (i) $\cdot \, P(H)$ • Probability = likelihood normalized to the 0 to 1 range
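One way to picture the transposition on slide 10 is as a pivot from one-row-per-interaction to one-row-per-submission. The sketch below assumes a log layout with `submission_id`, `field`, and `value` keys; that layout, the sample values, and the `transpose` function are illustrative assumptions, since the slides do not show the actual schema. Only the field names txtName, txtLocation, and txtQuestion come from the slide.

```python
from collections import defaultdict

# Hypothetical long-format log: one row per captured interaction.
events = [
    {"submission_id": 1, "field": "txtName", "value": "Ann"},
    {"submission_id": 1, "field": "txtLocation", "value": "Seattle"},
    {"submission_id": 1, "field": "txtQuestion", "value": "billing"},
    {"submission_id": 2, "field": "txtName", "value": "Bob"},
    {"submission_id": 2, "field": "txtQuestion", "value": "shipping"},
    {"submission_id": 3, "field": "txtName", "value": "Ann"},
    {"submission_id": 3, "field": "txtLocation", "value": "Redmond"},
    {"submission_id": 3, "field": "txtQuestion", "value": "shipping"},
]

def transpose(events, columns):
    """Pivot interaction rows into one tuple per submission, ordered by `columns`.

    Submissions missing any requested column are skipped; if a field was
    captured more than once, the last captured value wins.
    """
    by_submission = defaultdict(dict)
    for event in events:
        if event["field"] in columns:
            by_submission[event["submission_id"]][event["field"]] = event["value"]
    return [
        tuple(fields[c] for c in columns)
        for _, fields in sorted(by_submission.items())
        if all(c in fields for c in columns)
    ]

# Evidence columns first, hypothesis column last.
table = transpose(events, ["txtName", "txtLocation", "txtQuestion"])
print(table)
```

Submission 2 has no txtLocation and is dropped here; that is one simple policy for incomplete rows, not necessarily the one used in the talk.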

  11. Probability computation logic • Based on the hypothesis variable and resource (page), look up the classifier source table • Retrieve the cardinality of each distinct hypothesis value by grouping the possible hypotheses (used for the $P(H)$ calculation) • Create a likelihood dictionary keyed by evidence variable name, whose values map each hypothesis value to its likelihood $P(E \mid H)$ • For each input/evidence variable E: retrieve all possible hypotheses H from rows where that variable holds the observed value of E, then compute $P(E \mid H)$ for each H and store it in a list keyed by hypothesis value • Multiply all $P(E \mid H)$ values to obtain the likelihoods: $P(E \mid H) = P(C_1 \mid H) \cdot P(C_2 \mid H) \cdots P(C_n \mid H)$ • Multiply by $P(H)$, i.e. the number of rows with hypothesis value H divided by the total number of rows across all hypotheses • Normalize the likelihoods so they fall within the 0 to 1 probability range • Return each possible hypothesis value along with its probability
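The steps on slide 11 can be sketched in Python as follows. This is an in-memory approximation under stated assumptions, not the PHP/MySQL demo code: the `predict` function, the row layout (evidence columns first, hypothesis last), and the sample data are all hypothetical, and the table lookup is replaced by a plain list of tuples. No smoothing is applied, so a hypothesis never seen with a given evidence value gets likelihood 0.

```python
from collections import Counter

def predict(rows, evidence):
    """Naive Bayes over categorical rows.

    rows:     list of tuples (c1, ..., cn, h); the last item is the hypothesis.
    evidence: tuple (c1, ..., cn) of observed input values.
    Returns {hypothesis_value: probability}.
    """
    total = len(rows)
    # Cardinality of each distinct hypothesis value, used for P(H).
    h_counts = Counter(row[-1] for row in rows)

    likelihoods = {}
    for h, h_count in h_counts.items():
        likelihood = h_count / total  # start from the prior P(H)
        for i, value in enumerate(evidence):
            matches = sum(1 for row in rows if row[-1] == h and row[i] == value)
            likelihood *= matches / h_count  # multiply in P(Ci | H)
        likelihoods[h] = likelihood

    # Normalize so the likelihoods sum to 1.
    norm = sum(likelihoods.values())
    if norm == 0:
        return {h: 0.0 for h in likelihoods}
    return {h: value / norm for h, value in likelihoods.items()}

# Hypothetical training rows: (txtName, txtLocation, txtQuestion).
rows = [
    ("Ann", "Seattle", "billing"),
    ("Ann", "Seattle", "billing"),
    ("Ann", "Redmond", "shipping"),
    ("Bob", "Seattle", "shipping"),
    ("Bob", "Redmond", "billing"),
]

probs = predict(rows, ("Ann", "Seattle"))
for h, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{h}: {p:.3f}")
```

For this sample, "billing" has likelihood 3/5 · 2/3 · 2/3 = 4/15 and "shipping" has 2/5 · 1/2 · 1/2 = 1/10, which normalize to 8/11 and 3/11.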

  12. Challenges and recommendations • Missing values in inputs: Monte Carlo sampling, Gaussian approximation, and several more • Privacy: don’t log PII (personally identifiable information) • Performance: batch requests and use longer intervals/timeouts
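The privacy and performance recommendations on slide 12 can be combined in a small buffering component. The sketch below is purely illustrative: the `InteractionBatcher` class, its parameters, and the field names `txtEmail`, `ddlCountry`, and `chkNewsletter` are assumptions, not part of the presenter's framework. It shows size-based batching only; a timer calling `flush()` at longer intervals would be a separate addition.

```python
class InteractionBatcher:
    """Buffers interaction events and hands them to `send` in batches.

    Hypothetical sketch: fields listed in `pii_fields` are dropped before
    they are ever buffered, so they cannot reach the log.
    """

    def __init__(self, send, batch_size=20, pii_fields=frozenset()):
        self.send = send              # callable that receives a list of events
        self.batch_size = batch_size  # events per request
        self.pii_fields = pii_fields  # field names that must never be logged
        self.buffer = []

    def record(self, field, value):
        if field in self.pii_fields:
            return  # skip PII entirely
        self.buffer.append({"field": field, "value": value})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered (for example on page unload)."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []

sent = []  # stands in for the logging endpoint
batcher = InteractionBatcher(sent.append, batch_size=2, pii_fields={"txtEmail"})
batcher.record("ddlCountry", "NZ")
batcher.record("txtEmail", "someone@example.com")  # dropped as PII
batcher.record("chkNewsletter", "on")              # second event triggers a batch
batcher.record("txtLocation", "Seattle")
batcher.flush()                                    # send the remainder
print(len(sent), "batches sent")
```

A larger `batch_size` means fewer requests at the cost of losing more events if the page closes before a flush.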

  13. Resources • Sumit Amar – samar@microsoft.com • Slides – www.amar.co.in/sumit/Web2.0TalkPredictingInputs.ppt • Demo code (PHP/MySQL) – www.amar.co.in/sumit/i.zip
