
Moving Ahead: Creative Feature Extraction and Error Analysis Techniques

Explore new feature creation methods, error analysis strategies, and advanced feature editing types for optimizing text analysis accuracy. Learn how to apply rule language, create error analysis files, and identify key patterns for performance enhancement in data analysis.

Presentation Transcript


  1. Moving Ahead: Creative Feature Extraction and Error Analysis Techniques • Carolyn Penstein Rosé, Carnegie Mellon University • Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division

  2. Outline • New Feature Creation • Error Analysis

  3. New Feature Creation

  4. Why create new features? • You may want to generalize across sets of related words • Color = {red,yellow,orange,green,blue} • Food = {cake,pizza,hamburger,steak,bread} • You may want to detect contingencies • The text must mention both cake and presents in order to count as a birthday party • You may want to combine these • The text must include a color and a food

  5. Why create new features by hand? • More likely to capture meaningful generalizations • Build in knowledge so you can get by with less training data

  6. Rule Language • ANY() is used to create lists • COLOR = ANY(red,yellow,green,blue,purple) • FOOD = ANY(cake,pizza,hamburger,steak,bread) • ALL() is used to capture contingencies • ALL(cake,presents) • More complex rules • ALL(COLOR,FOOD)
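
The ANY() and ALL() operators on this slide can be mimicked in a few lines of Python. This is only a sketch of the idea, not TagHelper's implementation: the helper names `_match` and `tokenize` are hypothetical, and it assumes a rule is tested against the set of lowercased words in one text.

```python
import re

def _match(item, tokens):
    # A plain word matches if it occurs in the text; a nested rule is evaluated.
    return item(tokens) if callable(item) else item in tokens

def ANY(*items):
    """Rule that matches when at least one word or sub-rule matches."""
    return lambda tokens: any(_match(item, tokens) for item in items)

def ALL(*items):
    """Rule that matches only when every word or sub-rule matches."""
    return lambda tokens: all(_match(item, tokens) for item in items)

def tokenize(text):
    # Assumption: simple lowercase word split; the real tool may tokenize differently.
    return set(re.findall(r"[a-z']+", text.lower()))

# The rules from the slide.
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)

print(COLOR_AND_FOOD(tokenize("She ate a green pizza")))
print(BIRTHDAY(tokenize("We had cake but no gifts")))
```

Because rules are plain functions, a named list such as COLOR can be nested inside ALL() exactly as in the slide's ALL(COLOR,FOOD) example.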

  7. Group Project: Make a rule that will match against questions but not statements

  8. Possible Rule • ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
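
A rough Python reading of this rule follows. It assumes that BOL_WDT and BOL_WRB mean "the text begins with a word tagged WDT (wh-determiner) or WRB (wh-adverb)", which the slide does not spell out. To stay self-contained it replaces a real part-of-speech tagger with two small hand-written word lists, so it is an approximation of the rule rather than an equivalent.

```python
import re

# Assumption: tiny stand-ins for the WDT and WRB part-of-speech classes.
WDT_WORDS = {"which", "what", "whichever", "whatever"}
WRB_WORDS = {"how", "when", "where", "why"}

def looks_like_question(text):
    """Approximate ANY(ALL(tell,me),BOL_WDT,BOL_WRB) without a POS tagger."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return False
    tell_me = "tell" in tokens and "me" in tokens   # ALL(tell,me)
    bol_wdt = tokens[0] in WDT_WORDS                # BOL_WDT
    bol_wrb = tokens[0] in WRB_WORDS                # BOL_WRB
    return tell_me or bol_wdt or bol_wrb            # ANY(...)

for sentence in ["How do I install this?",
                 "Tell me about the assignment.",
                 "The cake was good."]:
    print(sentence, looks_like_question(sentence))
```

Like the rule on the slide, this sketch misses yes/no questions such as "Is this right?", which is the kind of gap an error analysis would surface.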

  9. Advanced Feature Editing * Click here

  10. Types of Basic Features • Primitive features include unigrams, bigrams, and POS bigrams

  11. Types of Basic Features • The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists • You can choose to remove stopwords or not • You can choose whether or not to strip endings off words with stemming • You can choose how frequently a feature must appear in your data in order for it to show up in your lists
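
The effect of these three options can be illustrated with a small Python sketch. It is not TagHelper's code: the stopword list, the `crude_stem` suffix stripper, and the choice to count a feature once per text are all assumptions made for illustration, and POS bigrams are left out because they would need a tagger.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}

def crude_stem(word):
    # Assumption: naive suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def extract_features(texts, remove_stopwords=True, stem=True, min_count=2):
    """Return per-text unigram/bigram feature sets and the surviving vocabulary."""
    counts = Counter()
    per_text = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        if stem:
            tokens = [crude_stem(t) for t in tokens]
        unigrams = set(tokens)
        bigrams = {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}
        features = unigrams | bigrams
        per_text.append(features)
        counts.update(features)  # each feature counted once per text
    # Frequency threshold: keep features that appear in at least min_count texts.
    vocab = {f for f, n in counts.items() if n >= min_count}
    return [features & vocab for features in per_text], sorted(vocab)

texts = ["The cats are playing", "A cat played with the toys", "Dogs bark"]
per_text, vocab = extract_features(texts)
print(vocab)
```

Turning stemming off in this toy example leaves "cats" and "cat" as separate features, so neither reaches the frequency threshold; this is the kind of trade-off the options control.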

  12. Types of Basic Features * Now let’s look at how to create new features.

  13. Creating New Features * The feature editor allows you to create new feature definitions * Click on + to add your new feature

  14. Examining a New Feature • Right click on a feature to examine where it matches in your data

  15. Examining a New Feature

  16. Error Analysis

  17. Create an Error Analysis File

  18. Use TagHelper to Code Uncoded File • The output file contains the codes TagHelper assigned. • What you want to do now is to remove the prediction column and insert the correct answers next to the TagHelper assigned answers.

  19. Load Error Analysis File

  20. Load Error Analysis File

  21. Error Analysis Strategies • Look for large error cells in the confusion matrix • Locate the examples that correspond to that cell • What features do those examples share? • How are they different from the examples that were classified correctly?
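
Outside the TagHelper interface, the same strategy can be sketched in Python. The column names `text`, `answer`, and `predicted` and the toy rows are made up for illustration; the sketch assumes each example carries a correct answer alongside the assigned code, as in the error analysis file described earlier.

```python
from collections import Counter

def confusion_matrix(rows):
    """Count (correct answer, predicted code) pairs."""
    return Counter((r["answer"], r["predicted"]) for r in rows)

def largest_error_cell(matrix):
    """Return the off-diagonal cell with the most examples, or None."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(rows, cell):
    """Locate the examples that fall in a given confusion-matrix cell."""
    return [r for r in rows if (r["answer"], r["predicted"]) == cell]

def shared_words(rows):
    """Words that occur in every one of the given examples."""
    word_sets = [set(r["text"].lower().split()) for r in rows]
    return set.intersection(*word_sets) if word_sets else set()

# Toy rows invented for this sketch.
rows = [
    {"text": "how do i install linux", "answer": "comp", "predicted": "comp"},
    {"text": "the game last night was great", "answer": "sports", "predicted": "sports"},
    {"text": "the new graphics card game benchmark", "answer": "comp", "predicted": "sports"},
    {"text": "this game needs a faster cpu", "answer": "comp", "predicted": "sports"},
    {"text": "who won the match", "answer": "sports", "predicted": "comp"},
]

cell = largest_error_cell(confusion_matrix(rows))
mistakes = examples_in_cell(rows, cell)
print(cell, shared_words(mistakes))
```

Comparing the shared words of the misclassified examples with those of the correctly classified ones points to features that mislead the model, which in turn suggests new features to create.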

  22. Group Project • Load in the NewsGroupTrain.xls data set • What is the best performance you can get by playing with the standard TagHelper tools feature options? • Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls • Copy in Answer column from NewsGroupAnswers.xls • Now do an error analysis to determine why frequent mistakes are being made • How could you do better?

  23. Questions?
