Explore new feature creation methods, error analysis strategies, and advanced feature editing types for optimizing text analysis accuracy. Learn how to apply rule language, create error analysis files, and identify key patterns for performance enhancement in data analysis.
Moving Ahead: Creative Feature Extraction and Error Analysis Techniques
Carolyn Penstein Rosé, Carnegie Mellon University
Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division
Outline • New Feature Creation • Error Analysis
Why create new features? • You may want to generalize across sets of related words • Color = {red,yellow,orange,green,blue} • Food = {cake,pizza,hamburger,steak,bread} • You may want to detect contingencies • The text must mention both cake and presents in order to count as a birthday party • You may want to combine these • The text must include a color and a food
Why create new features by hand? • More likely to capture meaningful generalizations • Build in knowledge so you can get by with less training data
Rule Language • ANY() is used to create lists • COLOR = ANY(red,yellow,green,blue,purple) • FOOD = ANY(cake,pizza,hamburger,steak,bread) • ALL() is used to capture contingencies • ALL(cake,presents) • More complex rules • ALL(COLOR,FOOD)
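The rule language above can be sketched in a few lines of Python. This is only an illustration of the ANY/ALL semantics described on the slide, not TagHelper's actual implementation; the helper names `tokenize` and `_matches` are hypothetical, and the sketch assumes a rule is checked against the set of lowercased words in a text.

```python
import re

def _matches(part, tokens):
    # A part is either a literal word or a nested rule (a callable).
    return part(tokens) if callable(part) else part in tokens

def ANY(*parts):
    """Rule that matches when at least one part matches (a list)."""
    return lambda tokens: any(_matches(p, tokens) for p in parts)

def ALL(*parts):
    """Rule that matches only when every part matches (a contingency)."""
    return lambda tokens: all(_matches(p, tokens) for p in parts)

def tokenize(text):
    # Assumption: features are lowercased word types; punctuation is dropped.
    return set(re.findall(r"[a-z']+", text.lower()))

# The rules from the slide
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)

print(COLOR_AND_FOOD(tokenize("She ate a green cake")))
print(BIRTHDAY(tokenize("We had cake but no gifts")))
```

Because nested rules are ordinary callables, a named rule such as COLOR can be reused inside ALL() or ANY() just as on the slide.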
Group Project: Make a rule that will match against questions but not statements
Possible Rule • ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
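To see how such a rule behaves, here is a rough Python sketch. It assumes that BOL_WDT and BOL_WRB mean "the line begins with a word tagged WDT (wh-determiner) or WRB (wh-adverb)". A real system would take those tags from a part-of-speech tagger; the small word lists below are a hypothetical stand-in, and `features` and `is_question` are illustrative names only.

```python
import re

# Assumption: crude stand-ins for POS tags. A real tagger decides WDT vs. other
# tags from context; these lists only approximate that.
WDT_WORDS = {"which", "whichever", "what", "whatever"}
WRB_WORDS = {"how", "when", "where", "why"}

def features(text):
    """Return the word features of a text plus beginning-of-line POS markers."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = set(words)
    if words:
        if words[0] in WDT_WORDS:
            feats.add("BOL_WDT")
        if words[0] in WRB_WORDS:
            feats.add("BOL_WRB")
    return feats

def is_question(text):
    """Evaluate ANY(ALL(tell,me),BOL_WDT,BOL_WRB) against one text."""
    f = features(text)
    return {"tell", "me"} <= f or "BOL_WDT" in f or "BOL_WRB" in f

for line in ["Why is the sky blue", "Tell me about the weather",
             "The sky is blue", "I know why it is blue"]:
    print(line, "->", is_question(line))
```

Note that the rule only looks at how the line begins, so a statement that contains "why" in the middle is not matched, while yes/no questions such as "Is it raining" are missed entirely; that kind of gap is what error analysis is meant to expose.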
Advanced Feature Editing * Click here
Types of Basic Features • Primitive features include unigrams, bigrams, and POS bigrams
Types of Basic Features • The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists • You can choose to remove stopwords or not • You can choose whether or not to strip endings off words with stemming • You can choose how frequently a feature must appear in your data in order for it to show up in your lists
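The effect of these options can be illustrated with a small Python sketch. It is not TagHelper's code: the stopword list is a tiny illustrative sample, `crude_stem` is a hypothetical stand-in for a real stemmer, and the frequency threshold is assumed to count how many texts contain a feature. POS bigrams are left out because they need a part-of-speech tagger.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}  # illustrative only

def crude_stem(word):
    # Very rough suffix stripping; a stand-in for a real stemming algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text, remove_stopwords=False, stem=False):
    words = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    return words

def feature_lists(texts, remove_stopwords=False, stem=False, min_count=1):
    """Return (unigrams, bigrams) that occur in at least min_count texts."""
    unigram_df, bigram_df = Counter(), Counter()
    for text in texts:
        words = tokenize(text, remove_stopwords, stem)
        unigram_df.update(set(words))                  # count each text once
        bigram_df.update(set(zip(words, words[1:])))   # adjacent word pairs
    unigrams = sorted(w for w, n in unigram_df.items() if n >= min_count)
    bigrams = sorted(b for b, n in bigram_df.items() if n >= min_count)
    return unigrams, bigrams

texts = ["The cats played", "A cat is playing", "Dogs bark"]
print(feature_lists(texts))
print(feature_lists(texts, remove_stopwords=True, stem=True, min_count=2))
```

With all options off, every surface form is its own feature; turning on stopword removal, stemming, and a threshold of 2 collapses "cats"/"cat" and "played"/"playing" and drops features seen in only one text.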
Types of Basic Features * Now let’s look at how to create new features.
Creating New Features * The feature editor allows you to create new feature definitions * Click on + to add your new feature
Examining a New Feature • Right click on a feature to examine where it matches in your data
Use TagHelper to Code Uncoded File • The output file contains the codes TagHelper assigned. • What you want to do now is remove the prediction column and insert the correct answers next to the TagHelper-assigned answers.
Error Analysis Strategies • Look for large error cells in the confusion matrix • Locate the examples that correspond to that cell • What features do those examples share? • How are they different from the examples that were classified correctly?
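These steps can be walked through in Python. The sketch below assumes you already have three parallel lists (the texts, the correct codes, and the predicted codes); the function names and the toy data are hypothetical, and "shared features" is simplified to words that occur in every example of the error cell.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count how often each (correct code, predicted code) pair occurs."""
    return Counter(zip(gold, predicted))

def largest_error_cell(matrix):
    """Return the off-diagonal cell with the highest count, or None."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(texts, gold, predicted, cell):
    """Locate the examples that fall into one confusion-matrix cell."""
    return [t for t, g, p in zip(texts, gold, predicted) if (g, p) == cell]

def shared_words(examples):
    """Words that appear in every one of the given examples."""
    df = Counter()
    for text in examples:
        df.update(set(text.lower().split()))
    return sorted(w for w, n in df.items() if n == len(examples))

# Toy data, invented for illustration
texts = ["the game was great", "the game went long",
         "new graphics card", "the match was great"]
gold = ["sports", "sports", "tech", "sports"]
pred = ["tech", "tech", "tech", "sports"]

matrix = confusion_matrix(gold, pred)
cell = largest_error_cell(matrix)
wrong = examples_in_cell(texts, gold, pred, cell)
right = examples_in_cell(texts, gold, pred, (cell[0], cell[0]))
print(cell, shared_words(wrong), shared_words(right))
```

Comparing the shared words of the misclassified examples with those of the correctly classified examples from the same class suggests which features the model is over- or under-weighting, and so which new features might help.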
Group Project • Load in the NewsGroupTrain.xls data set • What is the best performance you can get by playing with the standard TagHelper tools feature options? • Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls • Copy in the Answer column from NewsGroupAnswers.xls • Now do an error analysis to determine why frequent mistakes are being made • How could you do better?