Experimental Design …a personal perspective. Jim Waters Cabrini College Dept of IS&T. A Graphical password system – Hurrah !. A collaborative venture (2003 – 2005). Drexel iSchool Dr Susan Wiedenbeck Jim Waters Rutgers, Camden, Computer Science Jean-Camille Birget
Experimental Design…a personal perspective Jim Waters Cabrini College Dept of IS&T
A collaborative venture (2003 – 2005) • Drexel iSchool • Dr Susan Wiedenbeck • Jim Waters • Rutgers, Camden, Computer Science • Jean-Camille Birget • Polytechnic of Brooklyn, Computer Science • Nasir Memon • Alex Brodskiy (System Developer)
The Password Problem • Conflicting requirements • Easy to remember • Hard to guess • Hard to crack
The Solution • Cued recall not pure recall • Use some aspects of an image for password • Many prior attempts • Choose from a set of faces • Low Entropy • Young males always chose attractive females
Our Solution • Choose a meaningful (rich) picture • Select some “points” in a picture as password
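[Figure: a click point plotted against X and Y axes, surrounded by its tolerance square] • Point recorded by system (X,Y) is center of square • Square shows system tolerance – square is 20 x 20 pixels excluding the line • Order is part of password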
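The tolerance rule on this slide can be sketched in a few lines. This is an illustration only, not the project's code: the function names, the convention that a click counts when it lies within half the square's width of the stored point on each axis, and the sample coordinates are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) in pixels


def click_matches(click: Point, stored: Point, tolerance: int = 20) -> bool:
    """True if the click lies inside the tolerance square centered on `stored`."""
    half = tolerance / 2  # assumed convention: square extends half the width each way
    return abs(click[0] - stored[0]) <= half and abs(click[1] - stored[1]) <= half


def password_matches(clicks: List[Point], stored: List[Point],
                     tolerance: int = 20) -> bool:
    """Every click must hit the stored point at the same position in the sequence."""
    if len(clicks) != len(stored):
        return False
    return all(click_matches(c, s, tolerance) for c, s in zip(clicks, stored))


# Hypothetical two-point password
stored = [(12, 8), (49, 55)]
print(password_matches([(10, 10), (50, 50)], stored))  # right points, right order
print(password_matches([(50, 50), (10, 10)], stored))  # right points, wrong order
```

Because the comparison is positional, clicking the right points in the wrong order is rejected, which is what "order is part of password" implies.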
Was it any good ? • How many unique passwords could be created from a given X by Y canvas ? • Some very clever calculations by the CS folks • LOTS ! • 7.2 x 10^12 with our small picture
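The slides do not give the CS team's calculation, so the following is only a rough counting sketch. It assumes the picture is tiled into non-overlapping tolerance squares and that the same square may be chosen more than once. The 400 x 300 canvas is hypothetical, not the study's picture.

```python
def password_space(width: int, height: int,
                   tolerance: int = 20, points: int = 5) -> int:
    """Count ordered sequences of `points` tolerance squares on a width x height canvas."""
    squares = (width // tolerance) * (height // tolerance)
    return squares ** points


# Hypothetical canvas: 20 x 15 = 300 squares, so 300**5 ordered sequences
print(password_space(400, 300))  # 2430000000000
```

Even under these simplifying assumptions, a modest canvas with five ordered points yields a space on the order of 10^12.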
The iSchool (initial) Focus • Was the system effective from a user point of view • How did it compare against traditional alphanumeric passwords • Objective performance data • Time • Errors • Failure rate • Undo – Clear and Show • The passwords themselves • User perceptions (Likert Scales) • Parametric and nonparametric statistics
Time • - needed to create the valid password (creation phase) • - spent creating each bad password (invalid attempt) • - needed to input password correctly 10 times (practice) • - spent on each and every attempt (good or bad) • - needed to input password correctly once after delay • - spent on each and every attempt (good or bad)
Errors • Number of errors (invalid password entry attempts) • - made when creating a valid password • - made when inputting password correctly 10 times (practice) • - made when inputting password correctly once after delay • Magnitude of errors per click (distance between actual click and required click)
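One way to express the magnitude of an error per click is the pixel distance between the actual and required click. The sketch below assumes straight-line (Euclidean) distance; the slides do not say which measure was used, and the function names and coordinates are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]


def click_error(actual: Point, required: Point) -> float:
    """Straight-line distance in pixels between the actual and required click."""
    return math.hypot(actual[0] - required[0], actual[1] - required[1])


def attempt_errors(clicks: List[Point], required: List[Point]) -> List[float]:
    """Per-click error magnitudes for one password attempt, compared position by position."""
    return [click_error(c, r) for c, r in zip(clicks, required)]


# Hypothetical attempt: first click is off by 5 pixels, second is exact
print(attempt_errors([(3, 4), (50, 50)], [(0, 0), (50, 50)]))  # [5.0, 0.0]
```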
Failure Rate • 4 invalid attempts at password input (after delay) = FAIL ! • But kept recording attempts • Some subjects had over 30 • Even after repeatedly viewing their passwords • ALSO • How many times subject clicked Undo on a password point • How many times subject cleared password • How many times subject viewed password
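The failure rule can be sketched as follows. The reading assumed here is that a trial counts as a failure once 4 invalid attempts occur before the first correct entry, while every attempt is still recorded. The function name and the boolean encoding of attempts are assumptions.

```python
from typing import List, Tuple


def retention_outcome(attempts: List[bool], max_invalid: int = 4) -> Tuple[bool, int]:
    """Return (failed, invalid_count) for one retention trial.

    `attempts` lists each entry in order: True = correct, False = invalid.
    Invalid attempts are counted up to the first correct entry.
    """
    invalid = 0
    for ok in attempts:
        if ok:
            break
        invalid += 1
    return invalid >= max_invalid, invalid


print(retention_outcome([False, False, True]))   # (False, 2)
print(retention_outcome([False] * 4 + [True]))   # (True, 4)
```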
The Basic Experiment • Graphical group vs. alphanumeric group • 20 subjects in each group • Randomly assigned to alphanumeric or graphical • Eight character alphanumeric password • 5 pass points • $10 per subject • Some subjects waived fee • One subject received a “Heidegger was a Nazi” T-Shirt instead
The Basic Experiment • What to record ? • Creation and practice • Time, Errors, Failures, Undos, Clears and Shows • Decay of passwords over time (retention phase) • Time, Errors, Failures, Undos, Clears and Shows • Short-term after distraction task (questionnaire Q1) • Medium term after 1 week • Long term retention after a further 5 weeks
User Perceptions • Embedded online questionnaire (Q1 and Q2) • 33 Likert Scale questions • Five questions negative on the left (recoded) • Q2 after last session = Q1 plus some open questions
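Recoding the negatively worded questions normally means reversing the scale so that all items point the same way. A minimal sketch, assuming the 1 to 7 scale reported in the results; the question IDs and responses below are hypothetical.

```python
from typing import Dict, Set


def recode(score: int, scale_max: int = 7) -> int:
    """Reverse a Likert score: 1 becomes scale_max, scale_max becomes 1."""
    if not 1 <= score <= scale_max:
        raise ValueError(f"score {score} is outside 1..{scale_max}")
    return scale_max + 1 - score


def recode_responses(responses: Dict[str, int], reversed_items: Set[str]) -> Dict[str, int]:
    """Reverse only the items listed in `reversed_items`; leave the rest unchanged."""
    return {q: recode(v) if q in reversed_items else v for q, v in responses.items()}


# Hypothetical question IDs; q2 is one of the negatively worded items
print(recode_responses({"q1": 2, "q2": 6}, {"q2"}))  # {'q1': 2, 'q2': 2}
```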
Subjects • Experienced computer users • 40 members of a North East American university community • Students, staff and faculty • Convenience sample
The Experiment • Demonstration Phase • Verbal and Visual explanation of the purpose of the system and experiment protocol (magic lantern show) • Invitation to participate and earn $10 • Subjects completed IRB approved (Human Subjects) Consent Forms
The Experiment • Creation Phase • Subjects created password using randomly assigned system • Practice Phase • Subjects practiced entering password correctly • Practice until password entered correctly 10 times (in total) • On-screen count of how many correct and incorrect entries • No limit to number of attempts • After 4 failures could view password
The Experiment • Distraction Phase (after completing practice) • Subjects filled in online questionnaire Q1 • Retention Phase • Enter password correctly once (fail after 4 errors !) • R1 – immediately after Distraction phase • R2 – one week later • R3 – six weeks after R1 – plus complete Q2
Results: Creation Used SPSS version 10.0 T-tests The graphical group took significantly fewer attempts: t(38)=3.13, p<.005 to create a valid password
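The 38 degrees of freedom are what an equal-variance (pooled) t-test gives for two groups of 20, since 20 + 20 - 2 = 38. The original analysis was run in SPSS; the sketch below only shows how such a statistic is computed. The data are invented, not the study's.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


# Invented data for illustration only
t, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(t, 3), df)  # -1.0 8
```

Turning t into a p-value needs the t distribution, which the Python standard library does not provide; a statistics package would be used for that step.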
Likert Scale questions 1 is strongly agree 7 is strongly disagree Nonparametric Mann-Whitney U test For the first question there was a significant difference (U=127.00, p<.043)
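For ordinal data such as Likert responses, the Mann-Whitney U statistic is computed from ranks rather than means. A minimal sketch using midranks for ties; the responses are invented and the function name is an assumption.

```python
from typing import Dict, Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the smaller of the two U statistics, using midranks for tied values."""
    combined = sorted(list(a) + list(b))
    ranks: Dict[float, float] = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each gets their average
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_a = sum(ranks[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Invented 7-point responses for two small groups
print(mann_whitney_u([1, 2, 2], [2, 3, 4]))  # 1.0
```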
Learning Phase The two measures were analyzed using t-tests. Significant differences in favor of the alphanumeric group in both cases: Number of incorrect inputs t(38)=-2.73, p<.013; Total practice time t(38)=-4.24, p<.0001.
Retention Phase R1 – immediately after Distraction phase R2 – one week later R3 – six weeks after R1 Effect of mode not significant (ANOVA)
Concept proved – what next ? • What impacts performance ? • Can we change system design to alter performance • Change picture (Expt 2) • Change tolerance around selected point (Expt 3) • Interference (Expts 4 thru 6) • Nobody has just one password • Will 2 passwords interfere with each other
Experiment 2: More Subjects • Recruited 5 entire MS and BS classes from the iSchool • 3 Different pictures • Re-used data from Graphical subjects from Experiment 1 as baseline
Worth a 1000 words ? • Pictures shown: Tea • Pool (baseline) • Mural • Map
Experiment 2 • Total new subjects = 71 at $10 a pop ! • Randomly assigned Mural, Tea and Map pictures • Tolerance as per experiment 1 (20 x 20 pixel square) • Similar routine • Demonstration • Creation • Practice • Distraction (questionnaire Q1) • R1 – at end of session • R2 – one week later – plus questionnaire 2 • No later retention tests
Results Means (SD) in R1 first retention trial No Significant differences at all !
Retention 1 week later A two-way mixed model ANOVA was used for the analyses with image as the between-subjects factor and retention trial (R1/R2) as the within-subjects factor. There was a marginal effect of image, F(3,79)=2.55, p<.062. Tukey’s HSD indicated that performance of the MAP group was lower than that of the TEA group
Experiment 3 Tolerance • Conditions • Base group (20 x 20 pixels) - group from Experiment 1 • Harder (14 x 14 pixels) • Hardest (10 x 10 pixels) • 32 Undergraduate iSchool students (another $320) • Demonstration Phase • Creation Phase • Practice Phase • Distraction phase – Questionnaire Q1 • Retention Phase • R1 after distraction Q1 • R2 one week later
Experiment 3 Tolerance • ANOVA – one way • There were no significant differences between groups in the number of attempts or the time to create a valid password • No significant differences in the number of attempts or time required during practice phase (10 passwords input correctly) • No significant differences between groups during 1st retention phase (after questionnaire)
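A one-way ANOVA compares the variation between group means with the variation inside the groups. The sketch below computes the F statistic and its degrees of freedom by hand. The three groups are invented and do not represent the tolerance conditions' data.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group size times squared distance of group mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each value from its group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented data: three groups of three
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)
```

As with the t-test, the p-value comes from the F distribution and would be looked up with a statistics package.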
Decay ! • Smallest tolerance group (10 x 10) made significantly more errors during the 2nd retention phase (one week later) • In the 10 x 10 group 7 of 16 participants (43.75 percent) failed to log in • In the 14 x 14 group only 2 of 16 failed (12.5 percent) • There was a significant difference on failure between the groups (t(30)=2.63, p<.015)
Tolerance Experiment • Smallest tolerance group rated system significantly worse on three perceptual measures • Non-parametric Mann-Whitney U-test • It did not take me long to input my password correctly 10 Times • Inputting my password was easy. • I think that the password system was pleasant to use.
Interference Experiments • Will having to create, practice and remember 2 different passwords be more difficult than just one? • Is it harder to use two different pictures or the same picture? • What about 2 alphanumeric passwords? • Experiment 4 (2 passwords 1 picture) ($420) • Experiment 5 (2 passwords 2 pictures) ($450) • Experiment 6 (2 alphanumeric passwords) ($450)
Protocol same for each • Demonstration and consent form completion • Create 1st password for HOME system • Practice entering HOME password 10 times • Create 2nd password for OFFICE system • Practice entering OFFICE password 10 times • Distraction task (Questionnaire Q1) • Retention Phase • R1: Immediately after questionnaire • Enter each password (random order) correctly once • R2: One week later • Enter each password (random order) correctly once
So……..? • All groups better with practice • Graphical group benefitted more from practice • Sliced 57 seconds off average practice time for 2nd password • No other significant differences • No effect of 2 graphical passwords vs. 1 graphical password • Password 2 retention same time and # errors as password 1 • No effect of 2 pictures vs. 1 picture
Issues • Juggernaut approach – recorded all conceivable data for every click for every subject for every trial • X,Y location for every single password attempt click • How many pixels away from the correct point each click was • Up to 50 slots for practice trials for each password • Up to 20 slots for retention tests for each password • Online questionnaire stored in database
Issues • Exceeded capability of SPSS v10.0 (Student) • Over 450 variables and over 270 subjects • SPSS output alone in excess of 1000 pages • Much data of limited value (Q1 vs. Q2) • Hard to extract meaningful findings from morass of results • Many findings may be significant by chance • Ran out of time and money • Unanswered questions
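The worry about findings being significant by chance can be put in numbers. The sketch below assumes one test per variable at alpha = .05 and independent tests; neither assumption comes from the slides, which say only that there were over 450 variables.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


print(expected_false_positives(450))    # about 22.5
print(round(bonferroni_alpha(450), 6))  # 0.000111
```

Under these assumptions roughly 22 of 450 tests would come out significant with no real effect at all, which is why a correction such as Bonferroni is commonly applied.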
Unanswered Questions • Analysis of nature of errors • Order errors and memory failure errors • Needed to analyze each attempt one by one • Manually using recorded X,Y coordinates • Proximal points confused this • What makes a good password picture ? • Memory strategies (geometry and semantics)
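The manual error analysis could in principle be automated from the recorded X,Y coordinates. The sketch below is one hypothetical scheme, not something the project built: the category names, the half-width tolerance convention, and the coordinates are assumptions. A click that falls inside the tolerance squares of two stored points is flagged as ambiguous, which is the proximal-points problem noted above.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def classify_attempt(clicks: List[Point], stored: List[Point],
                     tolerance: int = 20) -> str:
    """Label an attempt 'correct', 'order', 'memory', or 'ambiguous'."""
    half = tolerance / 2

    def hit(c: Point, s: Point) -> bool:
        return abs(c[0] - s[0]) <= half and abs(c[1] - s[1]) <= half

    if len(clicks) != len(stored):
        return "memory"  # wrong number of clicks
    if all(hit(c, s) for c, s in zip(clicks, stored)):
        return "correct"
    matched = []
    for c in clicks:
        hits = [i for i, s in enumerate(stored) if hit(c, s)]
        if not hits:
            return "memory"  # click is near no stored point
        if len(hits) > 1:
            return "ambiguous"  # proximal stored points: cannot tell which was meant
        matched.append(hits[0])
    # Every stored point hit exactly once, but not in sequence
    return "order" if sorted(matched) == list(range(len(stored))) else "memory"


stored = [(10, 10), (100, 100), (200, 50)]  # hypothetical password
print(classify_attempt([(100, 100), (10, 10), (200, 50)], stored))  # order
print(classify_attempt([(10, 10), (100, 100), (300, 300)], stored))  # memory
```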