Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications

Mark Ritzmann, Pace University, March 17, 2007

Presentation Transcript


  1. Strategies for Managing Missing or Incomplete Data in Biometric and Business Applications. Mark Ritzmann, Pace University, March 17, 2007

  2. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  3. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  4. Overview: Essence of this work • Address the problem of missing or incomplete data and put forth strategies to overcome it • Improve the accuracy of the existing Keystroke Biometric Recognition System • Apply the findings to other application areas

  5. Overview: The Impact of Missing Data • <1%: considered trivial • 1-5%: considered manageable • 5-15%: requires sophisticated methods • >15%: may severely impact any interpretation. Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006
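The severity bands on this slide can be expressed as a small lookup. This is a minimal sketch: the function name and the handling of the exact boundary values (1%, 5%, 15%) are assumptions, since the slide does not say which band a boundary rate belongs to.

```python
def missing_data_severity(n_missing: int, n_total: int) -> str:
    """Map a missing-data rate to the bands quoted from Liu & Lei (2006).

    Boundary handling is an assumption: exactly 5% is treated as
    "manageable" and exactly 15% as "requires sophisticated methods".
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_missing <= n_total:
        raise ValueError("n_missing must be between 0 and n_total")

    rate = n_missing / n_total
    if rate < 0.01:
        return "trivial"
    if rate <= 0.05:
        return "manageable"
    if rate <= 0.15:
        return "requires sophisticated methods"
    return "may severely impact interpretation"
```

For example, a data set with 10 missing values out of 100 falls in the band that calls for sophisticated methods.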

  6. Overview: Missing Data Mechanisms • MCAR: Missing Completely At Random • MAR: Missing At Random • NMAR: Not Missing At Random. Most missing data treatment methods assume the data are MAR. Source: P. Liu & L. Lei, Missing Data Treatment Methods and NBI Models, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, IEEE, 2006

  7. Overview: Missing Data Treatment, High Level. Statistical: • Existing data used to calculate missing data • Care needs to be taken not to overfit • Mean/mode is the prime example. Heuristic: • Based on established rules and guidelines • Similar to an expert system • Association is the prime example

  8. Overview: Missing Data Treatment Methods • Case Deletion • Parameter Estimation • Mean/Mode Imputation • Method of Assigning All Possible Values of the Attribute • Regression Imputation • Hot Deck Imputation and Cold Deck Imputation • Multiple Imputation • K-Nearest Neighbor Imputation • Internal Treatment Method
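Two of the simplest methods in this list, case deletion and mean/mode imputation, can be sketched in a few lines. The function names and the use of `None` as the missing-value marker are assumptions made for illustration; the presentation does not show its own implementation.

```python
from statistics import mean, mode


def case_deletion(rows):
    """Drop every record (row) that contains at least one missing value."""
    return [row for row in rows if None not in row]


def mean_mode_impute(column):
    """Fill missing entries with the mean (numeric) or mode (categorical).

    The fill value is computed from the observed entries only.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")

    is_numeric = all(isinstance(v, (int, float)) for v in observed)
    fill = mean(observed) if is_numeric else mode(observed)
    return [fill if v is None else v for v in column]
```

Case deletion shrinks the data set, while mean/mode imputation keeps every record but reduces the variance of the imputed attribute, which is one reason the heavier methods in the list exist.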

  9. Overview: Biometric background • Roots in CIA & Dept of Defense work • Early issues: technology, cost, lack of standards • Basic uses: Verification (easier of the two; yes/no) and Identification (harder of the two; 1 of n) • Basic types: Physiological (generally do not change) and Behavioral (can change, easier to mimic)
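The difference between the two basic uses can be shown with a toy example: verification compares a sample with one claimed template and answers yes or no, while identification searches all n enrolled templates for the best match. The Euclidean distance, the threshold, and the feature vectors below are illustrative assumptions, not the classifier used in the Pace system.

```python
import math


def verify(sample, template, threshold):
    """Verification: does the sample match the single claimed identity?"""
    return math.dist(sample, template) <= threshold


def identify(sample, templates):
    """Identification: which of the n enrolled users is closest to the sample?"""
    if not templates:
        raise ValueError("no enrolled templates")
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


# Hypothetical enrolled feature vectors (e.g. mean key durations in ms).
enrolled = {"alice": [100.0, 120.0], "bob": [150.0, 90.0]}
sample = [102.0, 118.0]

print(verify(sample, enrolled["alice"], threshold=5.0))
print(identify(sample, enrolled))
```

Identification is the harder task because every additional enrolled user adds another candidate that the sample could be confused with.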

  10. Overview: Biometric Issues (Biometrics: Challenges & Caveats) • Business: financial feasibility; interaction with traditional controls; application not subject to rigor; incompatibility with business partners; transition to e-business; control locus • People: user confidence; privacy issues; user preferences; user acceptance; user profile; trust • Operational: lab vs field; scalability; continuous authentication; security • System: business process; design; control; enrollment challenge; system downtime; availability of template database; effects of malicious code • Legal & Regulatory: lack of precedence; ambiguous process; imprecise definition; logistics of proof of defense • Technical: adaptation; hardware; evolving nature of technology; scattered proliferation & polarization; uniqueness of biometric; scalability. Source: A. Chandra & T. Calderon, Challenges and Constraints to the Diffusion of Biometrics Information Systems, Communications of the ACM, December 2005, Vol 48, No 12

  11. Overview: Privacy Issues (special mention) • Opt in/Opt out: any application or web site that uses this system would need to do so with full disclosure, so that the user can knowingly decide • Dictated environment: any corporate or instructional e-mail system where the ultimate ownership of the keystrokes resides with that entity • Capture results, not the text itself: use keystrokes to authenticate/identify, not the words themselves or the intact messages

  12. Overview: Keyboard Biometric Studies in the Literature • Key concepts: Copy vs Free; Authentication vs Identification • Classic studies: Gaines, 1980; Umphress & Williams, 1985; Leggett & Williams, 1988; Joyce & Gupta, 1990; Bleha et al, 1990; Brown & Rogers, 1993 • Recent studies: University of Torino • Pace University contributions

  13. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  14. Essence and Significance of Work: High Level Objectives • 1. Develop strategies to manage the significant problem of missing or incomplete data • 2. Improve the accuracy of the current Keystroke Biometric Recognition System • 3. Apply findings to other areas

  15. Essence and Significance of Work: Detailed Objectives • First Objective: gain insight into the effectiveness and application of MISSING DATA strategies and decision making with incomplete information • Second Objective: improve the accuracy of the current Keystroke Biometric Recognition System by improving the FALLBACK model invoked when a sample is of insufficient size • Third Objective: identify a potential application for a Keystroke Biometric Recognition System and project the findings to other potential areas

  16. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  17. Experiment Design: Re-use of assets from previous Pace work • Data set • Features/feature extraction • Tests • Optimal settings

  18. Experiment Design: Future inclusion?

  19. Experiment Design: 6 Test scenarios (Dr. Mary Villani, Spring 2006; used with permission)

  20. Experiment Design: Feature set (Dr. Mary Villani, Spring 2006; used with permission)

  21. Experiment Design: Summary of Subject Participation • Subjects by experiment: • 36 subjects: all four quadrants • 52 subjects: 1. Copy Task • 40 subjects: 2. Free Text • 93 subjects: 3. Desktop • 47 subjects: 4. Laptop • 41 subjects: 5. Desk Copy / Lap Free • 40 subjects: 6. Lap Copy / Desk Free (Dr. Mary Villani, Spring 2006; used with permission)

  22. Experiment Design: Data/Sample Capture Application (Dr. Mary Villani, Spring 2006; used with permission)

  23. Experiment Design: Application Version 2.0, developed Fall 2006 • Development and implementation of 2 additional Fallback Models • Greatly enhanced testing functionality • Development and implementation of a Trace Mechanism

  24. Experiment Design: New Bio Feature Extractor Interface

  25. Experiment Design: New Classifier Interface

  26. Experiment Design: High Level Overview of Fallback Models (diagram). Labels: Statistical; Heuristic; Linguistic Model; Statistical Model; Touch Type Model; New Models

  27. Experiment Design: Overview of Models • Linguistic • Touch Type • Statistical

  28. Experiment Design: Linguistic Fallback Model - Duration

  29. Experiment Design: Linguistic Fallback Model - Transition

  30. Experiment Design: Touch Type Fallback Model - Background • Touch Type approach invented by Frank Edgar McGurrin in the late 1800s • Won a speed contest on July 25, 1888 • Was front page news • Touch Type idea: use the sense of touch rather than sight (looking at the key label) • Most keyboards still have a raised indicator on “f” and “j” to indicate the home position

  31. Experiment Design: Touch Type Fallback Model

  32. Experiment Design: Touch Type Fallback Model - Duration (tree diagram). Levels: All Keys; All Left Hand, All Right Hand; Left Little, Left Ring, Left Middle, Left Index, Right Index, Right Middle, Right Ring, Right Little. Leaf keys: A Q ; P Z 1 S / 0 L O W D E K I F G R T V H J Y U N 2 X . 9 C 3 , 8 B 4 5 M 6 7
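The hierarchy on this slide (key, finger, hand, all keys) suggests a lookup that climbs one level at a time until enough samples are available. The sketch below is an illustration under stated assumptions: the finger assignment follows the conventional touch-typing layout rather than the slide's own diagram, and the `min_samples` threshold and function name are hypothetical.

```python
from statistics import mean

# Assumed conventional touch-typing finger assignment (not taken from the slide).
FINGER_KEYS = {
    ("left", "little"): "QAZ1",
    ("left", "ring"): "WSX2",
    ("left", "middle"): "EDC3",
    ("left", "index"): "RFVTGB45",
    ("right", "index"): "YHNUJM67",
    ("right", "middle"): "IK,8",
    ("right", "ring"): "OL.9",
    ("right", "little"): "P;/0",
}
KEY_TO_FINGER = {k: f for f, keys in FINGER_KEYS.items() for k in keys}


def fallback_duration(key, samples, min_samples=5):
    """Return (level_used, mean_duration) for a key.

    `samples` maps a key to its observed durations. If the key itself has
    fewer than `min_samples` observations, pool the finger, then the hand,
    then all keys.
    """
    key = key.upper()
    hand, finger = KEY_TO_FINGER[key]
    levels = [
        ("key", [key]),
        (f"{hand} {finger}", list(FINGER_KEYS[(hand, finger)])),
        (f"all {hand} hand",
         [k for (h, _), keys in FINGER_KEYS.items() if h == hand for k in keys]),
        ("all keys", list(KEY_TO_FINGER)),
    ]
    for name, keys in levels:
        pooled = [d for k in keys for d in samples.get(k, [])]
        if len(pooled) >= min_samples:
            return name, mean(pooled)
    raise ValueError("not enough samples even at the 'all keys' level")


observed = {"E": [90, 95, 100, 105, 110], "D": [120, 130], "C": [125], "S": [100, 110]}
```

With these hypothetical observations, "E" is answered at the key level, "D" falls back to the left middle finger, and "S" has to climb to the whole left hand.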

  33. Experiment Design: Touch Type Fallback Model - Transition (tree diagram). Levels: Letter/letter; Left/left, Left/right, Right/left, Right/right. Leaf letter pairs: I/N O/N N/D O/R H/E E/R S/T E/S T/H A/N A/T R/E E/A E/N T/I
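For transitions, the same idea applies to letter pairs: a pair without enough data can fall back to its hand-to-hand class and then to the generic letter/letter node. The sketch below only builds that chain of node names. The hand assignment is the conventional touch-typing split and the most-specific-first ordering is an assumption drawn from the tree layout.

```python
# Assumed conventional split of the keyboard between the two hands.
LEFT_HAND = set("QWERTASDFGZXCVB12345")
RIGHT_HAND = set("YUIOPHJKLNM67890;,./")


def hand_of(key):
    """Return 'left' or 'right' for a key, by the assumed touch-typing split."""
    k = key.upper()
    if k in LEFT_HAND:
        return "left"
    if k in RIGHT_HAND:
        return "right"
    raise KeyError(f"no hand assignment for key {key!r}")


def transition_fallback_chain(first, second):
    """List the tree nodes to try for a transition, most specific first."""
    return [
        f"{first.upper()}/{second.upper()}",
        f"{hand_of(first)}/{hand_of(second)}",
        "letter/letter",
    ]


print(transition_fallback_chain("t", "h"))
```

Under this split, T/H is a left/right transition and N/D is a right/left transition.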

  34. Experiment Design: Statistical Fallback Model • For Duration: Mean Imputation • For Transition: Multiple Imputation: • Mean and standard deviation calculated on the full transition data set • Any value more than 1 standard deviation from the mean was removed • New mean and standard deviation calculated on the remaining data • Process repeated 3 times
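The transition procedure listed on this slide (compute mean and standard deviation, drop values more than one standard deviation from the mean, recompute, repeat three times) can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the slide does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def trim_outliers(values, passes=3, k=1.0):
    """Repeatedly remove values more than k standard deviations from the mean.

    Returns (kept_values, mean_of_kept, fraction_of_original_kept).
    Uses the sample standard deviation (an assumption).
    """
    data = list(values)
    if not data:
        raise ValueError("values must not be empty")

    for _ in range(passes):
        if len(data) < 2:
            break  # a standard deviation needs at least two values
        centre, spread = mean(data), stdev(data)
        data = [v for v in data if abs(v - centre) <= k * spread]

    return data, mean(data), len(data) / len(values)


transitions_ms = [100, 102, 98, 101, 99, 300]
kept, trimmed_mean, fraction_kept = trim_outliers(transitions_ms)
print(kept, trimmed_mean, fraction_kept)
```

Each pass tightens the band around the mean, so the fraction of the sample that survives shrinks quickly. The fraction returned here corresponds to the "% of sample left after outlier wash" tracked later in the deck.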

  35. Experiment Design: Statistical Fallback Model - Duration Clusters

  36. Experiment Design: Statistical Fallback Model - Duration (cluster tree diagram). Root: All Keys. Branch labels: Under 100, Over 100. Internal nodes: Node A, Node B. Clusters 1 through 9. Leaf keys: B Y H G U I N A ' - L M , S T F P O C . E D R W

  37. Experiment Design: Statistical Fallback Model - Transition development (chart). Labels: Data Compacting process; % of sample left after outlier wash; Sample Size; 100%

  38. Experiment Design: Statistical Fallback Model - Transition, Raw Order

  39. Experiment Design: Statistical Fallback Model - Transition, Cluster Development

  40. Experiment Design: Statistical Fallback Model - Transition (tree diagram). Root: Any/Any. Branch labels: Over 50, Under 50. Internal nodes: Node A, Node B, Node C, Node D, Node 1, Node 2, Node 3, Node 4. Leaf letter pairs: E-S E-A S-T N-D A-T T-I O-R E-R R-E E-N A-N H-E O-N I-N T-H

  41. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  42. Outcomes: Results Comparison

  43. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  44. Analysis: Fallback Trace

  45. Experiment Design: Linguistic Fallback Model - Duration (repeat of previous)

  46. Analysis: Proposed Second Generation Touch Type Fallback Model - Duration (tree diagram). Red circles remain as leaves; all else falls back to the next level. Levels: All Keys; All Left Hand, All Right Hand; Left Little, Left Ring, Left Middle, Left Index, Right Index, Right Middle, Right Ring, Right Little. Leaf keys: A Q P Z S L O W D E K I F G R T V H J Y U N X C B M

  47. Contents • Overview • Essence and Significance of Work • Experiment Design • Outcomes • Analysis • Future Work

  48. Future Work: Two Main Areas • Academic: • Hybrid system development: keystroke, mouse movement, stylistic • Principal Components • Eigenvalues • Application: • For the Keystroke Biometric system: academic (online testing) and Biometric Marketing • For general missing data: analytical applications

  49. Future Work: Key success factors for system acceptance • Robustness: level of trust • Acceptance level: support by third party processes • Cost: hardware/software, communications and support • Ease of use/portability: extent of support across client machines • Security: privacy, integrity, and non-repudiation. “future research into the use of biometric technology in online marketing applications must consider not only technical feasibility, but also social and legal acceptability.”

  50. Future Work: Biometric Marketing • Use of biometric technology to identify and segment users/consumers • What you have to believe: • Segmentation is better • Short + short + short = long for sampling • Chat rooms, e-mails, etc.
