Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications
Mark Ritzmann
Pace University
March 17, 2007

Contents
• Overview
• Essence and Significance of Work
• Experiment Design
• Outcomes
• Analysis
• Future Work

Overview: Essence of This Work
• Address the problem of missing or incomplete data and put forward strategies to overcome it
• Improve the accuracy of the existing Keystroke Biometric Recognition System
• Apply the findings to other application areas

Overview: The Impact of Missing Data
• <1% considered trivial
• 1-5% considered manageable
• 5-15% requires sophisticated methods
• >15% may severely impact any interpretation
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006

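The bands above can be turned into a quick check on a data set. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, missing values are assumed to be stored as `None`, and the handling of the exact 5% and 15% boundaries is an assumption because the slide leaves them ambiguous.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Return the share of entries that are missing (stored as None)."""
    if not values:
        raise ValueError("cannot compute a missing fraction for an empty sample")
    return sum(v is None for v in values) / len(values)


def impact_band(fraction: float) -> str:
    """Map a missing-data fraction to the bands quoted on the slide.

    Assumption: a value exactly on a boundary (5% or 15%) is placed
    in the lower band.
    """
    if fraction < 0.01:
        return "trivial"
    if fraction <= 0.05:
        return "manageable"
    if fraction <= 0.15:
        return "requires sophisticated methods"
    return "may severely impact any interpretation"


# Example: two of four keystroke durations are missing.
durations = [112.0, None, 98.0, None]
print(impact_band(missing_fraction(durations)))
```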
Overview: Missing Data Mechanisms
• MCAR – Missing Completely At Random
• MAR – Missing At Random
• NMAR – Not Missing At Random
Most missing data treatment methods assume that the data are MAR.
Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006

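To make the difference between the first two mechanisms concrete, here is a small simulation sketch. It is an illustration only and not from the slides: under MCAR every value has the same chance of being missing, while under MAR the chance depends on another variable that is fully observed. The function names, the covariate, and the threshold are all hypothetical. NMAR, where missingness depends on the unobserved value itself, is not simulated here.

```python
import random
from typing import List, Sequence


def mcar_mask(n: int, p: float, rng: random.Random) -> List[bool]:
    """MCAR: each of n values is marked missing with the same probability p."""
    return [rng.random() < p for _ in range(n)]


def mar_mask(observed: Sequence[float], threshold: float,
             p_high: float, p_low: float, rng: random.Random) -> List[bool]:
    """MAR: the missing probability depends only on a fully observed covariate.

    Entries whose covariate exceeds `threshold` are marked missing with
    probability p_high, all others with probability p_low.
    """
    return [rng.random() < (p_high if x > threshold else p_low) for x in observed]


rng = random.Random(0)
# Hypothetical covariate: session length in minutes for five typing samples.
session_minutes = [2.0, 15.0, 4.0, 30.0, 1.0]
print(mcar_mask(len(session_minutes), 0.2, rng))
print(mar_mask(session_minutes, 10.0, 0.6, 0.05, rng))
```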
Overview: Missing Data Treatment, High Level
• Statistical
  • Existing data used to calculate missing data
  • Care needs to be taken not to overfit
  • Mean/mode is the prime example
• Heuristic
  • Based on established rules and guidelines
  • Similar to an expert system
  • Association is the prime example

Overview: Missing Data Treatment Methods
• Case Deletion
• Parameter Estimation
• Mean/Mode Imputation
• Method of Assigning All Possible Values of the Attribute
• Regression Imputation
• Hot Deck Imputation and Cold Deck Imputation
• Multiple Imputation
• K-Nearest Neighbor Imputation
• Internal Treatment Method

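Two of the simpler methods in this list, case deletion and mean/mode imputation, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the implementation used in the study: rows are tuples, missing entries are `None`, numeric columns are filled with the mean, and all other columns with the mode. The function names and the sample data are hypothetical.

```python
from statistics import mean, mode
from typing import List, Tuple


def case_deletion(rows: List[Tuple]) -> List[Tuple]:
    """Drop every row that has at least one missing (None) entry."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_mode_impute(rows: List[Tuple]) -> List[Tuple]:
    """Fill missing entries column by column.

    Numeric columns get the mean of the observed values; all other
    columns get the most frequent observed value (the mode).
    """
    fills = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        if not observed:
            raise ValueError("a column with no observed values cannot be imputed")
        if all(isinstance(v, (int, float)) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(mode(observed))
    return [tuple(fill if v is None else v for v, fill in zip(row, fills))
            for row in rows]


# Hypothetical rows: (key-press duration in ms, keyboard type).
rows = [(100.0, "desktop"), (None, "laptop"), (140.0, None), (120.0, "desktop")]
print(case_deletion(rows))
print(mean_mode_impute(rows))
```

Case deletion keeps only the complete rows and so shrinks the sample, whereas imputation keeps every row at the cost of reducing the variance of the filled column.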
Overview: Biometric Background
• Roots in CIA & Dept of Defense work
• Early issues – technology, cost, lack of standards
• Basic uses
  • Verification (easier of the two; yes/no)
  • Identification (harder of the two; 1 of n)
• Basic types
  • Physiological – generally do not change
  • Behavioral – can change, easier to mimic

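The distinction between verification (a yes/no decision against one claimed identity) and identification (choosing 1 of n enrolled users) can be sketched as follows. This is only an illustration: the Euclidean distance, the threshold, the feature vectors, and the function names are assumptions and are not taken from the slides.

```python
import math
from typing import Dict, Sequence


def verify(sample: Sequence[float], template: Sequence[float],
           threshold: float) -> bool:
    """Verification: accept the claimed identity if the sample is close enough."""
    return math.dist(sample, template) <= threshold


def identify(sample: Sequence[float],
             templates: Dict[str, Sequence[float]]) -> str:
    """Identification: return the enrolled user whose template is nearest."""
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


# Hypothetical two-feature templates (for example, mean durations of two keys).
templates = {"alice": [100.0, 120.0], "bob": [150.0, 90.0]}
sample = [102.0, 118.0]
print(verify(sample, templates["alice"], threshold=5.0))
print(identify(sample, templates))
```

Identification is harder because it must rank the sample against every enrolled template, so its error rate tends to grow with the number of users.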
Overview: Biometric Issues
BIOMETRICS: CHALLENGES & CAVEATS
• Business
  • Financial feasibility
  • Interaction with traditional controls
  • Application not subject to rigor
  • Incompatibility with business partners
  • Transition to e-business
  • Control locus
• People
  • User confidence
  • Privacy issues
  • User preferences
  • User acceptance
  • User profile
  • Trust
• Operational
  • Lab vs Field
  • Scalability
  • Continuous Authentication
  • Security
• System
  • Business Process
  • Design
  • Control
  • Enrollment Challenge
  • System Downtime
  • Availability of template database
  • Effects of malicious code
• Legal & Regulatory
  • Lack of precedence
  • Ambiguous process
  • Imprecise definition
  • Logistics of proof of defense
• Technical
  • Adaptation
  • Hardware
  • Evolving nature of technology
  • Scattered proliferation & polarization
  • Uniqueness of biometric
  • Scalability
Source: A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics Information Systems, Communications of the ACM, December 2005, Vol 48, No 12

Overview: Privacy Issues – Special Mention
• Opt in/Opt out
  • Any application or web site that used this system would need to do so with full disclosure. The user could then knowingly decide.
• Dictated environment
  • Any corporate or instructional e-mail system where the ultimate ownership of the keystroke resides with that entity
• Capture results, not text itself
  • Use keystrokes to authenticate/identify, not the words themselves or the intact messages

Overview: Keyboard Biometric Studies in the Literature
• Key Concepts
  • Copy vs Free
  • Authentication vs Identification
• Classic Studies
  • Gaines, 1980
  • Umphress & Williams, 1985
  • Leggett & Williams, 1988
  • Joyce & Gupta, 1990
  • Bleha et al, 1990
  • Brown & Rogers, 1993
• Recent Studies – University of Torino
• Pace University contributions

Essence and Significance of Work: High Level Objectives
1. Develop strategies to manage the significant problem of missing or incomplete data
2. Improve the accuracy of the current Keystroke Biometric Recognition System
3. Apply findings to other areas

Essence and Significance of Work: Detailed Objectives
• First Objective: Gain insight as to the effectiveness and application of MISSING DATA strategies and decision making with incomplete information
• Second Objective: Improve the accuracy of the current Keystroke Biometric Recognition System by improving the FALLBACK model invoked when a sample is of insufficient size
• Third Objective:
  • Identify a potential application for a Keystroke Biometric Recognition System
  • Project the findings to other potential areas

Experiment Design: Re-use of Assets from Previous Pace Work
• Data set
• Features/feature extraction
• Tests
• Optimal settings

Experiment Design: 6 Test Scenarios
[Figure not reproduced] Source: Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Feature Set
[Figure not reproduced] Source: Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Summary of Subject Participation
• Subjects by Experiment
  • 36 subjects: all four quadrants
  • 52 subjects: 1. Copy Task
  • 40 subjects: 2. Free Text
  • 93 subjects: 3. Desktop
  • 47 subjects: 4. Laptop
  • 41 subjects: 5. Desk Copy / Lap Free
  • 40 subjects: 6. Lap Copy / Desk Free
Source: Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Data/Sample Capture Application
[Figure not reproduced] Source: Dr. Mary Vilani, Spring 2006. Used with permission.

Experiment Design: Application Version 2.0 (developed Fall 2006)
• Development and implementation of 2 additional Fallback Models
• Greatly enhanced testing functionality
• Development and implementation of a Trace Mechanism

Experiment Design: High Level Overview of Fallback Models
[Diagram: the Linguistic Model, Statistical Model, and Touch Type Model placed under the headings Statistical and Heuristic, with the new models marked.]

Experiment Design: Overview of Models
• Linguistic
• Touch Type
• Statistical

Experiment Design: Touch Type Fallback Model - Background
• Touch Type approach invented by Frank Edgar McGurrin in the late 1800s
  • Won speed contest on July 25, 1888
  • Was front page news
• Touch Type idea: use the sense of touch rather than sight (looking at the key label)
• Most keyboards still have a raised indicator on “f” and “j” to indicate the home position

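Experiment Design: Touch Type Fallback Model - Duration
[Diagram: fallback hierarchy for key-press duration, from the root down to individual keys]
• Root: All Keys
• Hands: All Left Hand, All Right Hand
• Fingers: Left Little, Left Ring, Left Middle, Left Index, Right Index, Right Middle, Right Ring, Right Little
• Leaves (individual keys): A Q ; P Z 1 S / 0 L O W D E K I F G R T V H J Y U N 2 X . 9 C 3 , 8 B 4 5 M 6 7
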
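The hierarchy above suggests a lookup that climbs from a key to its finger, then to its hand, then to all keys until enough samples are available. The sketch below illustrates that idea under several assumptions that are not stated on the slide: the key-to-finger mapping follows the conventional touch-typing assignment (the diagram itself did not survive extraction), the sufficiency rule is a hypothetical `min_samples` count, and pooled durations are simply averaged.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumption: conventional touch-typing finger assignment.
FINGER_KEYS = {
    "left little": "1qaz",
    "left ring": "2wsx",
    "left middle": "3edc",
    "left index": "45rtfgvb",
    "right index": "67yuhjnm",
    "right middle": "8ik,",
    "right ring": "9ol.",
    "right little": "0p;/",
}


def fallback_groups(key: str) -> List[Tuple[str, str]]:
    """Return (level name, keys pooled at that level) from leaf to root."""
    finger = next((name for name, keys in FINGER_KEYS.items() if key in keys), None)
    if finger is None:
        raise ValueError(f"key {key!r} is not in the touch-type map")
    hand = finger.split()[0]
    hand_keys = "".join(k for name, k in FINGER_KEYS.items() if name.startswith(hand))
    all_keys = "".join(FINGER_KEYS.values())
    return [(key, key), (finger, FINGER_KEYS[finger]),
            (f"all {hand} hand", hand_keys), ("all keys", all_keys)]


def estimate_duration(key: str, durations: Dict[str, List[float]],
                      min_samples: int = 3) -> Tuple[str, float]:
    """Use the lowest level of the hierarchy that has at least min_samples values."""
    for level, keys in fallback_groups(key):
        pooled = [d for k in keys for d in durations.get(k, [])]
        if len(pooled) >= min_samples:
            return level, mean(pooled)
    raise ValueError("not enough samples even at the root level")


# Hypothetical durations in milliseconds.
durations = {"f": [90, 100, 110], "j": [80], "u": [100, 120]}
print(estimate_duration("f", durations))  # enough samples for the key itself
print(estimate_duration("j", durations))  # falls back to the right index finger
print(estimate_duration("p", durations))  # falls back to the whole right hand
```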
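Experiment Design: Touch Type Fallback Model - Transition
[Diagram: fallback hierarchy for key-to-key transitions]
• Root: Letter/letter
• Hand pairs: Left/left, Left/right, Right/left, Right/right
• Leaves (individual transitions): I/N O/N N/D O/R H/E E/R S/T E/S T/H A/N A/T R/E E/A E/N T/I
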
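The transition hierarchy can be sketched the same way: a specific letter pair falls back to its hand pair (for example left/right), and then to all letter pairs. As before, this is an illustration under assumptions not stated on the slide: the hand membership of each key follows conventional touch typing, `min_samples` is a hypothetical sufficiency rule, and pooled values are averaged.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumption: conventional touch-typing split of the keyboard.
LEFT_KEYS = set("12345qwertasdfgzxcvb")
RIGHT_KEYS = set("67890yuiophjkl;nm,./")


def hand_of(key: str) -> str:
    """Return 'left' or 'right' for a key, based on the assumed split."""
    if key in LEFT_KEYS:
        return "left"
    if key in RIGHT_KEYS:
        return "right"
    raise ValueError(f"key {key!r} is not in the touch-type map")


def hand_pair(pair: Tuple[str, str]) -> str:
    """Classify a transition such as ('t', 'h') as 'left/right'."""
    return f"{hand_of(pair[0])}/{hand_of(pair[1])}"


def estimate_transition(pair: Tuple[str, str],
                        transitions: Dict[Tuple[str, str], List[float]],
                        min_samples: int = 3) -> Tuple[str, float]:
    """Use the lowest level (pair, hand pair, all pairs) with enough samples."""
    target_class = hand_pair(pair)
    levels = [
        (f"{pair[0]}/{pair[1]}", lambda p: p == pair),
        (target_class, lambda p: hand_pair(p) == target_class),
        ("letter/letter", lambda p: True),
    ]
    for name, belongs in levels:
        pooled = [t for p, values in transitions.items() if belongs(p) for t in values]
        if len(pooled) >= min_samples:
            return name, mean(pooled)
    raise ValueError("not enough samples even at the root level")


# Hypothetical transition times in milliseconds.
transitions = {("t", "h"): [50, 60, 70], ("e", "r"): [40], ("e", "s"): [30, 50]}
print(estimate_transition(("t", "h"), transitions))
print(estimate_transition(("e", "r"), transitions))
print(estimate_transition(("i", "n"), transitions))
```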
Experiment Design: Statistical Fallback Model
• For Duration – Mean Imputation
• For Transition – Multiple Imputation
  • Mean and standard deviation calculated on the full transition data set
  • Any value more than 1 standard deviation from the mean was removed
  • New mean and standard deviation calculated on the remaining data
  • Process repeated 3 times

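The transition procedure described above (compute mean and standard deviation, remove values more than one standard deviation away, recompute, repeat) can be sketched as follows. Two details are assumptions because the slide does not specify them: the sample standard deviation is used rather than the population one, and "repeated 3 times" is read as three passes in total. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def outlier_wash(values: Sequence[float], passes: int = 3,
                 k: float = 1.0) -> List[float]:
    """Repeatedly drop values more than k standard deviations from the mean.

    Assumptions: sample standard deviation, and `passes` counts the total
    number of removal rounds. The loop stops early when fewer than two
    values remain or when the remaining values are all identical.
    """
    remaining = list(values)
    for _ in range(passes):
        if len(remaining) < 2:
            break
        center, spread = mean(remaining), stdev(remaining)
        if spread == 0:
            break
        remaining = [v for v in remaining if abs(v - center) <= k * spread]
    return remaining


# Hypothetical transition times with one extreme value.
washed = outlier_wash([10, 10, 10, 10, 100])
print(washed, mean(washed))
```

Because each pass tightens the spread, a one-standard-deviation cut can discard a large share of the data, so it is worth tracking how much of the sample survives.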
Experiment Design: Statistical Fallback Model – Duration Clusters
[Figure not reproduced]

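Experiment Design: Statistical Fallback Model - Duration
[Diagram: fallback tree for key-press duration. The root, All Keys, splits into Under 100 and Over 100 branches (Node A, Node B), which lead to Clusters 1 through 9. Leaf keys shown: B Y H G U I N A ‘ - L M , S T F P O C . E D R W. The assignment of keys to clusters is not recoverable from this text.]
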
Experiment Design: Statistical Fallback Model – Transition Development
[Figure: Data Compacting process. Labels: % of sample left after outlier wash; Sample Size.]

Experiment Design: Statistical Fallback Model – Transition, Raw Order
[Figure not reproduced]

Experiment Design: Statistical Fallback Model – Transition, Cluster Development
[Figure not reproduced]

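Experiment Design: Statistical Fallback Model - Transition
[Diagram: fallback tree for transitions. The root, Any/Any, splits into Over 50 and Under 50 branches through Nodes A to D, which lead to Nodes 1 to 4. Leaf transitions shown: E-S, E-A, S-T, N-D, A-T, T-I, O-R, E-R, R-E, E-N, A-N, H-E, O-N, I-N, T-H. The assignment of transitions to nodes is not recoverable from this text.]
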
Experiment Design: Linguistic Fallback Model – Duration (repeat of previous)

Analysis: Proposed Second Generation Touch Type Fallback Model - Duration
[Diagram: revised fallback hierarchy. Keys marked with red circles (markings not reproduced here) remain as leaves; all other keys fall back to the next level.]
• Root: All Keys
• Hands: All Left Hand, All Right Hand
• Fingers: Left Little, Left Ring, Left Middle, Left Index, Right Index, Right Middle, Right Ring, Right Little
• Keys shown: A Q P Z S L O W D E K I F G R T V H J Y U N X C B M

Future Work: Two Main Areas
• Academic
  • Hybrid system development – keystroke, mouse movement, stylistic
  • Principal Components
  • Eigenvalues
• Application
  • For the Keystroke Biometric system:
    • Academic – online testing
    • Biometric Marketing
  • For general missing data, analytical applications

Future Work: Key Success Factors to System Acceptance
• Robustness – level of trust
• Acceptance Level – support by third party processes
• Cost – hardware/software, communications and support
• Ease of Use/Portability – extent of support across client machines
• Security – privacy, integrity, and non-repudiation
“future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability.”

Future Work: Biometric Marketing
• Use of biometric technology to identify and segment users/consumers
• What you have to believe:
  • Segmentation is better
  • Short + short + short = long for sampling
  • Chat rooms, e-mails etc.