Generating Password Challenge Questions

Chuong Ngo

Online Services and the Problem of Account Security • E-commerce, banking, e-mail, etc. • Average user: 26 different online accounts, 5 unique passwords • 25 to 30: 40+ accounts • 2012 online fraud cases: 3x 2010 case count • 90% of accounts require user ID and password • Passwords need to be strong and unique

Passwords: So secure you can't remember them? • Memorability vs. security: negative correlation • Password recovery systems a must • SMS • E-mail • Snail mail • Challenge questions

Why Challenge Questions? • User must answer agreed-upon questions to validate identity • Most commonly used system

Just How Safe and Secure? • System is weak & exploitable • Answers easy to obtain/public domain • Social media • 12% answerable with social media info • Applicability & repeatability

Can It be Salvaged? • Treat challenge questions like passwords • Must value memorability • Avoid too many “easy” answers • Large pool of challenge questions • What if the questions were targeted and personal?

Targeted Challenge Questions • Applicability and repeatability negligible • More personal, more secure & memorable • Greater answer variety from long-form answers • Make a system that uses or generates challenge questions that target the user's strong, personal memories.

System Concept • Data Ingest • Data Retrieval

The Natural Language Processing Engine at the Heart of it All

The NLP Engine • Uses Stanford CoreNLP • Pipeline includes: • Tokenizer • Sentence Splitter • PoS Tagger • Morpha Annotator • NER • Parsing • Coreferencer

Notable Pipeline Absences • No sentiment analyzer • Requires training for individuals • No real advantage • No relationship analyzer • Beyond scope • Limited use of the coreferencer and dependency tree. • Focused on named noun entities (NN) to simplify implementation.

Fill-in-the-Blanks (FitB) Approach: A First Step

FitB Approach Overview • Challenge question is open-ended and general. • User provides a long-form response. • At authentication, the system presents the user with a modified version of that stored answer. • User must “fill-in-the-blanks”/correct the mistakes. • Authentication done by comparing user's responses to the missing entities. • Match must meet or exceed a threshold.

An Example • Original answer: Bob is a great uncle. He loved to fish and would do so as often as he could near his home in Minnesota. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the Mississippi River. • Presented to user: [Blank] is a great uncle. He loved to fish and would do so as often as he could near his home in [Blank]. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the [Blank] River.
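The blanking and threshold check can be sketched in a few lines of Python. This is a minimal illustration, not the presented system: the entity list is written by hand where the real pipeline would take it from the NER step, and the names `blank_out`, `match_score`, `authenticate`, and the 0.6 threshold are all hypothetical.

```python
import re

def blank_out(answer: str, entities: list[str], marker: str = "[Blank]") -> str:
    """Replace each whole-word occurrence of a stored entity with the marker."""
    for entity in entities:
        answer = re.sub(rf"\b{re.escape(entity)}\b", marker, answer)
    return answer

def match_score(expected: list[str], responses: list[str]) -> float:
    """Fraction of blanks filled correctly, ignoring case and outer spaces."""
    if not expected:
        return 0.0
    hits = sum(
        e.strip().lower() == r.strip().lower()
        for e, r in zip(expected, responses)  # missing responses count as misses
    )
    return hits / len(expected)

def authenticate(expected: list[str], responses: list[str],
                 threshold: float = 0.6) -> bool:
    """Accept when the score meets or exceeds the (assumed) threshold."""
    return match_score(expected, responses) >= threshold

# Shortened version of the example story; entities are hand-listed here.
story = ("Bob is a great uncle. He loved to fish and would do so as often as "
         "he could near his home in Minnesota. The stream would later feed "
         "into the Mississippi River.")
entities = ["Bob", "Minnesota", "Mississippi"]
prompt = blank_out(story, entities)
```

With these definitions, a user who recalls two of the three entities passes the assumed 0.6 threshold, while one who recalls only a single entity does not. Exact token comparison is kept on purpose, since it mirrors the low tolerance for deviation noted among the shortcomings.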

Why does it work? • It is a single story. • Multiple NNs related to the same idea. • It is memorable. • Prompt helps to kick start memory. • Simple and fast • Does not overly burden the user. • Avoids the problem of question generation. • Easily extensible • No web of knowledge – preserves privacy.

Where does it fall short? • Potentially low entropy in question pool. • Question is not generated. • No web of knowledge – no context. • Unable to correlate multiple stored user responses. • Dependent on large number of NNs. • Needs clean, non-noisy input. • Token matching does not tolerate much deviation. • Some private information may be leaked. • Unable to be integrated into other sources of information. • Significant setup time.

Future Work • Different user interfaces • Example: pictures • Incorporate additional processors • Example: relationship analyzer • Increase the number of data points to match.

Document Retrieval Approach: A Slight Twist

Document Retrieval Approach • Similar to FitB approach. • User is prompted to answer the same challenge question they originally wrote an answer for. • User's new answer is run through the NLP engine and its NNs are extracted. • NNs used to search through all registered answer documents, matching via bag-of-words count. • Authenticated if match is above a specified threshold.
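A rough Python sketch of the retrieval step follows. It assumes the NNs have already been extracted from the new answer and are single tokens, scores each registered document by the fraction of those NNs it contains, and accepts only if the best-scoring document belongs to the claimed user. The function names, the scoring rule, and the 0.6 threshold are illustrative assumptions rather than details from the slides.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts for one document."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def overlap_score(query_terms: list[str], document: str) -> float:
    """Fraction of query terms that occur at least once in the document."""
    if not query_terms:
        return 0.0
    bag = bag_of_words(document)
    found = sum(1 for term in query_terms if bag[term.lower()] > 0)
    return found / len(query_terms)

def authenticate(claimed_user: str, query_terms: list[str],
                 documents: dict[str, str], threshold: float = 0.6) -> bool:
    """Accept if the claimed user's document is the best match and the
    score is above the (assumed) threshold."""
    if not documents:
        return False
    scores = {user: overlap_score(query_terms, doc)
              for user, doc in documents.items()}
    best_user = max(scores, key=scores.get)
    return best_user == claimed_user and scores[best_user] > threshold

# Hypothetical registered answers, keyed by user id.
documents = {
    "alice": "Bob is a great uncle. He lived in Minnesota near a stream "
             "that feeds the Mississippi River.",
    "carol": "My first dog Rex lived with us in Ohio.",
}
```

Because the score only asks whether each NN appears somewhere in the stored document, it tolerates rewording around the entities but still fails when the user recalls different entities.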

Not Quite Right... • Cannot use regular bag-of-words approach. • Source document and user-provided answer document may differ too much. • Not backed by web of knowledge. • Does not reveal private information.

Future Work • May benefit from existing search engine technologies (e.g., Lucene). • May benefit from more data points to match.

Generating Questions from a Web of Knowledge (WOK) Approach: Now I Understand Why This is Still Unsolved

WOK Approach Overview • NLP engine extracts the NNs from the user's initial response. • User is prompted to provide more information for the NNs. • Information stored in WOK. • Challenge questions generated from WOK. • Answers compared to the information in the WOK.

Making the WOK • Utilized Protégé • Popular Java-based ontology editor and framework for OWL and RDF. • Information stored as OWL data models.

Generating the Questions • A random class is chosen from the WOK. • Question is generated using a property's label id and a template question. • User's response is matched against the property's value.

An Example • Template: What is the [Blank] of your [Blank]? • What is the livesIn of your Bob Ngo? • What is the name of your Uncle? • What is the relation of your Minnesota?
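The naive generation step can be sketched in Python as below. A plain dictionary stands in for the OWL model (the presented system stored its data through Protégé), and the entries, `generate_question`, and `check_answer` are hypothetical names chosen to reproduce two of the example questions.

```python
import random

TEMPLATE = "What is the {label} of your {subject}?"

# Toy stand-in for the WOK: subject -> {property label: value}.
wok = {
    "Bob Ngo": {"livesIn": "Minnesota"},
    "Uncle": {"name": "Bob Ngo"},
}

def generate_question(wok: dict[str, dict[str, str]],
                      rng: random.Random) -> tuple[str, str]:
    """Pick a random subject and property, then fill the template.

    Returns the question text and the expected answer."""
    subject = rng.choice(sorted(wok))
    label = rng.choice(sorted(wok[subject]))
    return TEMPLATE.format(label=label, subject=subject), wok[subject][label]

def check_answer(response: str, expected: str) -> bool:
    """Exact match only, as the approach requires."""
    return response == expected

question, expected = generate_question(wok, random.Random(0))
```

Filling the template with raw property labels is what yields awkward questions such as "What is the livesIn of your Bob Ngo?", which illustrates why the generation algorithm is described as too naive.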

Why doesn't it work? • Question generation algorithm needs to be less naive. • Generated questions are very impersonal. • Not really an improvement over current method. • Creation of WOK is not automatic/semi-automatic. • Expected answer must be an exact match. • Greater invasion of privacy – has WOK. • Significant setup time.

Future Work • Question generation algorithm must be improved. • Incorporation of additional NLP technologies for a smarter WOK. • Ontology is the wrong technology?

Conclusion • FitB approach is the most ready for deployment. • Document retrieval approach evaluation incomplete. • WOK approach needs a lot more work.

Questions?
