
Experimental Design …a personal perspective

Experimental Design …a personal perspective. Jim Waters Cabrini College Dept of IS&T. A Graphical password system – Hurrah !. A collaborative venture (2003 – 2005). Drexel iSchool Dr Susan Wiedenbeck Jim Waters Rutgers, Camden, Computer Science Jean-Camille Birget


Presentation Transcript


  1. Experimental Design…a personal perspective Jim Waters Cabrini College Dept of IS&T

  2. A Graphical password system – Hurrah !

  3. A collaborative venture (2003 – 2005) • Drexel iSchool • Dr Susan Wiedenbeck • Jim Waters • Rutgers, Camden, Computer Science • Jean-Camille Birget • Polytechnic of Brooklyn, Computer Science • Nasir Memon • Alex Brodskiy (System Developer)

  4. The Password Problem • Conflicting requirements • Easy to remember • Hard to guess • Hard to crack

  5. The Solution • Cued recall not pure recall • Use some aspects of an image for password • Many prior attempts • Choose from a set of faces • Low Entropy • Young males always chose attractive females

  6. Our Solution • Choose a meaningful (rich) picture • Select some “points” in a picture as password

  7. Point recorded by system (X, Y) is center of square • Square shows system tolerance – square is 20 x 20 pixels excl. line • Order is part of password
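
The matching rule on this slide can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and treating a click exactly on the square's edge as a match is an assumption the slides do not settle.

```python
from typing import List, Tuple

Point = Tuple[int, int]
TOLERANCE = 20  # side of the tolerance square in pixels, as stated on the slide


def click_matches(stored: Point, click: Point, tolerance: int = TOLERANCE) -> bool:
    """True if the click lands inside the square centered on the stored point."""
    half = tolerance / 2
    # Assumption: a click exactly on the edge of the square counts as a match.
    return abs(click[0] - stored[0]) <= half and abs(click[1] - stored[1]) <= half


def password_matches(stored: List[Point], clicks: List[Point]) -> bool:
    """Order is part of the password: click i must match stored point i."""
    return len(stored) == len(clicks) and all(
        click_matches(s, c) for s, c in zip(stored, clicks)
    )
```

With stored points `[(100, 100), (200, 50)]`, the clicks `[(105, 95), (195, 58)]` would be accepted, while the same two clicks in reverse order would be rejected.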

  8. Was it any good ? • How many unique passwords could be created from a given X by Y canvas ? • Some very clever calculations by the CS folks • LOTS ! • 7.2 × 10^12 with our small picture
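
A rough way to see where a figure of this size can come from is to tile the canvas with non-overlapping tolerance squares and count ordered sequences of squares. The sketch below is a simplification and not the CS team's calculation. The 451 x 331 pixel canvas is an assumed size chosen for illustration (the slides do not give the picture dimensions), and the 5 click points come from the experiment description on a later slide.

```python
def password_space(width: int, height: int, tolerance: int = 20, clicks: int = 5) -> int:
    """Approximate number of distinct passwords on a width x height canvas.

    Simplifying assumption: the canvas is tiled with non-overlapping
    tolerance x tolerance squares and each click may fall in any square.
    """
    squares = (width * height) // (tolerance * tolerance)
    return squares ** clicks


# Assumed canvas size, for illustration only.
space = password_space(451, 331)
print(f"{space:.2e}")  # on the order of 7.2 x 10^12
```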

  9. The iSchool (initial) Focus • Was the system effective from a user point of view • How did it compare against traditional alphanumeric passwords • Objective performance data • Time • Errors • Failure rate • Undo – Clear and Show • The passwords themselves • User perceptions (Likert Scales) • Parametric and nonparametric statistics

  10. Time • - needed to create the valid password (creation phase) • - spent creating each bad password (invalid attempt) • - needed to input password correctly 10 times (practice) • - spent on each and every attempt (good or bad) • - needed to input password correctly once after delay • - spent on each and every attempt (good or bad)

  11. Errors • Number of errors (invalid password entry attempts) • - made when creating a valid password • - made when inputting password correctly 10 times (practice) • - made when inputting password correctly once after delay • Magnitude of errors per click (distance between actual click and required click)

  12. Failure Rate • 4 invalid attempts at password input (after delay) = FAIL ! • But kept recording attempts • (Some subjects had over 30) • Even after repeatedly viewing their passwords • ALSO • How many times subject clicked Undo on a password point • How many times subject cleared password • How many times subject viewed password
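
The failure rule can be expressed as a small scoring function over a subject's attempt log. This is a hedged sketch: the function name, the boolean log format and the returned field names are all hypothetical, and the real system's log layout is not described in the slides.

```python
from typing import Dict, List

MAX_INVALID = 4  # invalid attempts after which the retention trial counts as a FAIL


def retention_outcome(attempts: List[bool]) -> Dict[str, object]:
    """Score one retention trial.

    attempts: chronological log, True = correct entry, False = invalid entry.
    Every attempt is kept, even those made after the FAIL threshold.
    """
    invalid_before_success = 0
    for ok in attempts:
        if ok:
            break
        invalid_before_success += 1
    return {
        "failed": invalid_before_success >= MAX_INVALID,
        "invalid_attempts": invalid_before_success,
        "total_recorded": len(attempts),
    }
```

A subject who logs in on the third try is scored as a success with 2 invalid attempts. A subject who needs six tries is scored as a FAIL, but all six attempts stay in the record.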

  13. The Basic Experiment • Graphical group vs. alphanumeric group • 20 subjects in each group • Randomly assigned to alphanumeric or graphical • Eight character alphanumeric password • 5 pass points • $10 per subject • Some subjects waived fee • One subject received a “Heidegger was a Nazi” T-Shirt instead
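
Random assignment into two equal groups can be done by shuffling the subject list and splitting it in half. The sketch below assumes subjects are identified by the integers 1 to 40 and uses an arbitrary seed for reproducibility. Neither detail comes from the slides.

```python
import random
from typing import Dict, Iterable, List, Optional


def assign_groups(subject_ids: Iterable[int], seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Shuffle subjects and split them into two equal-sized conditions."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "graphical": sorted(shuffled[:half]),
        "alphanumeric": sorted(shuffled[half:]),
    }


# Hypothetical subject IDs 1..40 and an arbitrary seed.
groups = assign_groups(range(1, 41), seed=2003)
```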

  14. The Basic Experiment • What to record ? • Creation and practice • Time, Errors, Failures, Undos, Clears and Shows • Decay of passwords over time (retention phase) • Time, Errors, Failures, Undos, Clears and Shows • Short-term after distraction task (questionnaire Q1) • Medium term after 1 week • Long term retention after a further 5 weeks

  15. User Perceptions • Embedded online questionnaire (Q1 and Q2) • 33 Likert Scale questions • Five questions negative on the left (recoded) • Q2 after last session = Q1 plus some open questions
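
Recoding the reversed questions means flipping their responses so that every item points the same way before analysis. A minimal sketch, assuming the 7-point scale described on a later slide. The question keys and the set of reversed items are hypothetical.

```python
from typing import Dict, Set

SCALE_MAX = 7  # assumes a 1..7 Likert scale


def recode(response: int, scale_max: int = SCALE_MAX) -> int:
    """Flip a response on a 1..scale_max scale (1 <-> 7, 2 <-> 6, ...)."""
    if not 1 <= response <= scale_max:
        raise ValueError(f"response {response} outside 1..{scale_max}")
    return scale_max + 1 - response


def recode_row(row: Dict[str, int], reversed_items: Set[str]) -> Dict[str, int]:
    """Return a copy of one subject's answers with reversed items flipped."""
    return {q: recode(v) if q in reversed_items else v for q, v in row.items()}
```

For example, `recode_row({"q1": 2, "q2": 6}, {"q2"})` leaves `q1` at 2 and flips `q2` from 6 to 2.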

  16. Subjects • Experienced computer users • 40 members of a North East American university community • Students, staff and faculty • Convenience sample

  17. The Experiment • Demonstration Phase • Verbal and Visual explanation of the purpose of the system and experiment protocol (magic lantern show) • Invitation to participate and earn $10 • Subjects completed IRB approved (Human Subjects) Consent Forms

  18. The Experiment • Creation Phase • Subjects created password using randomly assigned system • Practice Phase • Subjects practiced entering password correctly • Practice until password entered correctly 10 times (in total) • On-screen count of how many correct and incorrect entries • No limit to number of attempts • After 4 failures could view password

  19. The Experiment • Distraction Phase (after completing practice) • Subjects filled in online questionnaire Q1 • Retention Phase • Enter password correctly once (fail after 4 errors !) • R1 – immediately after Distraction phase • R2 – one week later • R3 – six weeks after R1 – plus complete Q2

  20. Results: Creation • Used SPSS version 10.0 • T-tests • The graphical group took significantly fewer attempts to create a valid password: t(38)=3.13, p<.005
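
The 38 degrees of freedom are what an independent-samples t-test with pooled variance gives for two groups of 20 (20 + 20 - 2). The analysis in the study was run in SPSS. The sketch below only shows how such a statistic is computed, using made-up attempt counts that are not the study's data.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Made-up attempt counts for illustration only (20 subjects per group).
alphanumeric = [1, 2, 3, 2, 1, 4, 2, 3, 2, 1, 2, 3, 1, 2, 5, 2, 1, 3, 2, 2]
graphical = [1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1]
t_stat, df = pooled_t(alphanumeric, graphical)
print(f"t({df}) = {t_stat:.2f}")
```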

  21. Likert Scale questions • 1 is strongly agree, 7 is strongly disagree • Nonparametric Mann-Whitney U test • For the first question there was a significant difference (U=127.00, p<.043)
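
The U statistic itself is simple to compute: rank all responses together, sum the ranks of one group, and subtract the smallest rank sum that group could have. The sketch below is an illustration in plain Python, not the SPSS procedure used in the study. It returns the smaller of the two U values and does not compute a p-value.

```python
from typing import List, Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U (smaller of U_a and U_b), using average ranks for ties."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    values = [v for v, _ in combined]
    ranks: List[float] = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        # Tied values at sorted positions i..j-1 share the mean of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)
```

With 20 subjects per group, U can range from 0 (no overlap between groups) to 200 (the two groups fully interleaved) under this smaller-of-two convention.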

  22. Learning Phase • The two measures were analyzed using t-tests • Significant differences in favor of the alphanumeric group in both cases • Number of incorrect inputs: t(38)=-2.73, p<.013 • Total practice time: t(38)=-4.24, p<.0001

  23. Variability in practice phase

  24. Retention Phase • R1 – immediately after Distraction phase • R2 – one week later • R3 – six weeks after R1 • Effect of mode not significant (ANOVA)

  25. Concept proved – what next ? • What impacts performance ? • Can we change system design to alter performance • Change picture (Expt 2) • Change tolerance around selected point (Expt 3) • Interference (Expts 4 thru 6) • Nobody has just one password • Will 2 passwords interfere with each other

  26. Experiment 2: More Subjects • Recruited 5 entire MS and BS classes from the iSchool • 3 Different pictures • Re-used data from Graphical subjects from Experiment 1 as baseline

  27. Worth a 1000 words ? • Tea • Pool (baseline) • Mural • Map

  28. Experiment 2 • Total new subjects = 71 at $10 a pop ! • Randomly assigned Mural, Tea and Map pictures • Tolerance as per experiment 1 (20 x 20 pixel square) • Similar routine • Demonstration • Creation • Practice • Distraction (questionnaire Q1) • R1 – at end of session • R2 – one week later – plus questionnaire 2 • No later retention tests

  29. Results • Means (SD) in R1 first retention trial • No significant differences at all !

  30. Retention 1 week later • A two-way mixed model ANOVA was used for the analyses, with image as the between-subjects factor and retention trial (R1/R2) as the within-subjects factor • There was a marginal effect of image, F(3,79)=2.55, p<.062 • Tukey’s HSD indicated that performance of the MAP group was lower than that of the TEA group

  31. Experiment 3 Tolerance • Conditions • Base group (20 x 20 pixels) - group from Experiment 1 • Harder (14 x 14 pixels) • Hardest (10 x 10 pixels) • 32 Undergraduate iSchool students (another $320 ) • Demonstration Phase • Creation Phase • Practice Phase • Distraction phase – Questionnaire Q1 • Retention Phase • R1 after distraction Q1 - R2 one week later

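  32. [Image slide: picture with five numbered click points, 1 to 5]

  33. [Image slide: picture with five numbered click points, 1 to 5]

  34. [Image slide: picture with five numbered click points, 1 to 5]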

  35. Experiment 3 Tolerance • ANOVA – one way • There were no significant differences between groups in the number of attempts or the time to create a valid password • No significant differences in number of attempts or time required during practice phase (10 passwords input correctly) • No significant differences between groups during 1st retention phase (after questionnaire)

  36. Decay ! • Smallest tolerance group (10 x 10) made significantly more errors during the 2nd retention phase (one week later) • In the 10 x 10 group 7 of 16 participants (43.75 percent) failed to log in • In the 14 x 14 group only 2 of 16 failed (12.5 percent) • There was a significant difference in failure between the groups (t(30)=2.63, p<.015)

  37. Tolerance Experiment • Smallest tolerance group rated system significantly worse on three perceptual measures • Non-parametric Mann-Whitney U-test • “It did not take me long to input my password correctly 10 times.” • “Inputting my password was easy.” • “I think that the password system was pleasant to use.”

  38. Interference Experiments • Will having to create, practice and remember 2 different passwords be more difficult than just one? • Is it harder to use two different pictures or the same picture? • What about 2 alphanumeric passwords? • Experiment 4 (2 passwords 1 picture) ($420) • Experiment 5 (2 passwords 2 pictures) ($450) • Experiment 6 (2 alphanumeric passwords) ($450)

  39. Protocol same for each • Demonstration and consent form completion • Create 1st password for HOME system • Practice entering HOME password 10 times • Create 2nd password for OFFICE system • Practice entering OFFICE password 10 times • Distraction task (Questionnaire Q1) • Retention Phase • R1: Immediately after questionnaire • Enter each password (random order) correctly once • R2: One week later • Enter each password (random order) correctly once

  40. So……..? • All groups better with practice • Graphical group benefitted more from practice • Sliced 57 seconds off average practice time for 2nd password • No other significant differences • No effect of 2 graphical passwords vs. 1 graphical password • Password 2 retention same time and # errors as password 1 • No effect of 2 pictures vs. 1 picture

  41. Issues • Juggernaut approach – recorded all conceivable data for every click for every subject for every trial • X,Y location for every single password attempt click • How many pixels away from the correct point each click was • Up to 50 slots for practice trials for each password • Up to 20 slots for retention tests for each password • Online questionnaire stored in database
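
One way to picture the per-click data described here is as a flat record per click, with the distance from the required point derived from the stored coordinates. This is a hypothetical schema, not the project's database layout: all field names are invented, and using straight-line (Euclidean) distance is an assumption, since the slides do not say how "pixels away" was measured.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class ClickRecord:
    """One recorded click (hypothetical field names)."""
    subject_id: int
    phase: str        # e.g. "creation", "practice", "retention"
    attempt: int      # attempt number within the phase
    position: int     # which password point (1..5) this click was meant to be
    x: int
    y: int
    target_x: int
    target_y: int

    @property
    def pixels_off(self) -> float:
        # Assumption: Euclidean distance between actual and required click.
        return hypot(self.x - self.target_x, self.y - self.target_y)


click = ClickRecord(1, "practice", 1, 1, x=103, y=104, target_x=100, target_y=100)
print(click.pixels_off)  # 3-4-5 triangle, so 5.0
```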

  42. Issues • Exceeded capability of SPSS v10.0 (Student) • Over 450 variables and over 270 subjects • SPSS output alone in excess of 1000 pages • Much data of limited value (Q1 vs. Q2) • Hard to extract meaningful findings from morass of results • Many findings may be significant by chance • Ran out of time and money • Unanswered questions
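
The "significant by chance" worry is easy to quantify. If every null hypothesis were true, testing at the .05 level would still flag about 5 percent of tests. The sketch below works through that arithmetic and shows the Bonferroni-adjusted threshold as one common remedy. Bonferroni is not mentioned in the slides, and treating each of the 450 variables as one independent test is a simplifying assumption.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spurious 'significant' results if all nulls are true."""
    return n_tests * alpha


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: one test per recorded variable.
n = 450
print(expected_false_positives(n))     # about 22.5 chance findings
print(f"{bonferroni_alpha(n):.6f}")    # roughly 0.000111
```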

  43. Unanswered Questions • Analysis of nature of errors • Order errors and memory failure errors • Needed to analyze each attempt one by one • Manually using recorded X,Y coordinates • Proximal points confused this • What makes a good password picture ? • Memory strategies (geometry and semantics)

  44. Low Entropy (Hotspots)

  45. Better

  46. Best
