Development of a Naïve Bayesian Classifier for “Big Five” Item Domains Alan D. Mead Cassia K. Carter Illinois Institute of Technology
Agenda • The problem: items and domains • Bayesian classification • Research Questions • Method • Results • Future Directions
Items and Domains • In most tests, items are assigned to domains • To meet content specifications • To provide feedback by domain • Items are usually assigned by the item writers and double-checked during item review • For many tests, manual classification is reliable and easy
Personality Domains • This research stemmed from two projects aimed at automating the development of personality items • Project 1: Personality items generated from templates • Project 2: LSA was used to assemble items from a large pool according to the semantic similarity between items and a construct definition • Manual classification is not perfectly reliable • It would be good to have a systematic, automated way to classify items into domains
Big Five Dimensions • The Big Five is a common, high-level taxonomy for personality constructs: • Conscientiousness (“I like order” C+) • Agreeableness (“I insult people” A-) • Neuroticism (“I often feel blue” N+) • Openness (“I do not have a good imagination” O-) • Extraversion (“I am the life of the party” E+) • “I am a warm, nurturing person” E+ or A+? • “I am very traditional” O- or C+?
Choice of Methodology • In this classification problem, there will be no response data • If we had response data, we could use EFA/CFA • Predictors are the presence of specific words • These data are probably of a nominal level of measurement • Sample size is the number of items to classify • We might easily have many more predictors than rows of data (even ignoring interaction terms)
Choice of Methods • When X/predictors and Y/domains are metric, many techniques exist (LDF/Regression, LCA, factor analytic approaches, etc.) • When X is metric but Y is categorical, logistic regression is suitable • What to use when X and Y are not metric? • Naïve Bayesian Classifiers are one solution
Bayesian Classification • Predict nominal classes (domain) from nominal predictors (presence of specific words) • Handle problems with many predictors • Have a history of successful application • Are computationally simple • Have been shown to be robust to technical issues like high degrees of multidimensionality and noise
Bayesian Classification (cont.) • Compute P(domain|item) for each domain • Classify as the domain with the maximum probability: Predicted domain = argmax P(domain|item) = argmax P(item|domain)P(domain)/P(item) = argmax P(item|domain)P(domain), since P(item) is constant across domains = argmax P(w1,w2,…,wn|domain)P(domain) ≈ argmax P(w1|d)P(w2|d)…P(wn|d)P(d) • “Naïve” refers to this assumption that the words are conditionally independent given the domain
Example of NB Classification • I am the life of the party (E+) • Classified as extraversion
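The following is a minimal Python sketch of the scoring rule from the previous slide, applied to this example item. The toy training items and all names (`TRAINING`, `classify`) are illustrative, not taken from the study; log probabilities are summed to avoid numerical underflow, and add-one smoothing is used here only to avoid zero probabilities (the original classifier simply discarded unknown words).

```python
from collections import Counter, defaultdict
import math

# Illustrative (not actual) training items, already pre-processed into word stems.
TRAINING = [
    (["life", "parti"], "E"),          # "I am the life of the party"
    (["like", "order"], "C"),          # "I like order"
    (["insult", "peopl"], "A"),        # "I insult people"
    (["often", "feel", "blue"], "N"),  # "I often feel blue"
    (["good", "imagin"], "O"),         # "I do not have a good imagination"
]

# Per-domain word counts and domain frequencies.
word_counts = defaultdict(Counter)
domain_counts = Counter()
for words, domain in TRAINING:
    domain_counts[domain] += 1
    word_counts[domain].update(words)

VOCAB = {w for words, _ in TRAINING for w in words}
N_ITEMS = sum(domain_counts.values())

def classify(words):
    """Return argmax_d P(d) * prod_i P(w_i | d), computed in log space."""
    best_domain, best_score = None, float("-inf")
    for domain, n_items in domain_counts.items():
        score = math.log(n_items / N_ITEMS)          # log P(domain)
        n_words = sum(word_counts[domain].values())
        for w in words:
            count = word_counts[domain][w]
            # Add-one smoothing (an assumption of this sketch) avoids log(0).
            score += math.log((count + 1) / (n_words + len(VOCAB)))
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain

print(classify(["life", "parti"]))  # -> "E"
```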
Research Questions • RQ1: How well does this method work? • Can it be improved? • RQ2: Does adding additional items improve classification accuracy? • RQ3: Does the type of item added matter? • RQ4: How should unknown words be handled?
Method • Compiled a database of five forms from various Big Five personality tests; N = 655 items • Leave-one-out cross-validation (LOOCV) was used: • Hold out item 1; train the classifier on the remaining items; classify item 1 • Hold out item 2; train the classifier on the remaining items; classify item 2 • Repeat for items 3, 4, …, N • Compare each predicted domain to the actual domain
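A sketch of that leave-one-out loop, assuming a `train_and_classify(train_items, train_labels, held_out_item)` helper (a hypothetical name; any train-then-predict routine, such as the classifier sketched earlier, would do):

```python
def leave_one_out_accuracy(items, labels, train_and_classify):
    """For each item: train on the other N-1 items, predict the held-out
    item's domain, and compare the prediction against the actual domain."""
    hits = 0
    for i in range(len(items)):
        train_items = items[:i] + items[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = train_and_classify(train_items, train_labels, items[i])
        hits += int(predicted == labels[i])
    return hits / len(items)
```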
Pre-processing & Processing • Force all terms to lowercase • Discard any punctuation • Discard common words (I, am, a, the, etc.) • Use Porter stemming to produce rough lemmas (annoyed, annoy, annoys, annoying -> “annoy”) • Ignore unknown words (i.e., discard them)
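A sketch of that pipeline. The stop-word list here is illustrative, and the Porter stemmer is taken from NLTK as an assumption; the slides do not say which implementation was used.

```python
import re
from nltk.stem import PorterStemmer  # assumes NLTK; any Porter implementation works

STOPWORDS = {"i", "am", "a", "an", "the", "of", "to", "and"}  # illustrative list
stemmer = PorterStemmer()

def preprocess(item_text):
    """Lowercase, drop punctuation, remove stop words, and stem what remains."""
    text = re.sub(r"[^a-z\s]", " ", item_text.lower())
    return [stemmer.stem(w) for w in text.split() if w not in STOPWORDS]

print(preprocess("I am the life of the party."))  # -> ['life', 'parti']
```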
RQ1: Classification Results • 70.5% accuracy (diagonal of the confusion matrix, not shown here) • Too few items were classified as Extraversion
RQ2: Adding In Items • Added items written as part of three graduate-level classes • All Big Five items, classified by the students who wrote them • Blind manual classification • The final item set included only items where the original classification and two independent raters agreed • New N = 1,116
Additional Items Results • Accuracy 75.3% • Only about a 3% increase over the 655-item set • Openness is now lowest; Extraversion is still low
RQ3: Type of Item • Does the type of item added to the database matter? • Template Group 1: Template items where frequency words varied (“I {always/sometimes/never/rarely/often} enjoy spending time with other people”) • N = 940 • Template Group 2: Manually generated templates based on IPIP items (“I have difficulty {dreaming up | conceiving of | brainstorming | devising | inventing | making up | planning | scheming | visualizing} things.”) • N = 194 • Template Group 3: “I am a BLANK person” items (“I am an energetic person”) • N = 1,239 • Student Item Set: Another group of student-written Big Five items, reviewed by only one rater
Type of Item Results • Adjective-based items had the lowest accuracy • These items come down to a single word, which is often unique • Template items with high redundancy were classified best on their own • However, accuracy for this group dropped when it was added to the overall set • Template items with less redundancy improved overall accuracy somewhat • Adding more items doesn’t help dramatically • But adding items that carry more information does help
RQ4: Unknown Terms • Unknown terms are a real problem • “I am filled with doubts about things” was effectively reduced to “things” because “doubts” and “filled” appeared only in this item, so under LOOCV they were unknown when the item was held out • Many items hinge upon a single word (e.g., “workaholic”) • Solution: Replace each unknown term with sense 1 from wiktionary.org; e.g.: • http://en.wiktionary.org/wiki/advance
Unknown Term Example • “I sometimes feel bashful.” • “bashful” is not known • Look up “bashful”: “inclined to avoid notice” • “I sometimes feel inclined to avoid notice.” • Simplistic approach: • Ignored grammatical implications • No word-sense matching was possible with this approach, so sometimes the wrong definition was used • Did not check that the definition itself used known terms
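A minimal sketch of that substitution step. The gloss table is hard-coded here because the study pulled sense 1 from wiktionary.org; no particular lookup API is assumed, and in the full pipeline the expanded text would then go back through the same pre-processing (stop-word removal and stemming).

```python
# Illustrative sense-1 glosses; in the study these came from wiktionary.org.
SENSE_ONE = {
    "bashful": "inclined to avoid notice",
}

def expand_unknown_terms(words, known_vocab):
    """Replace any word absent from the training vocabulary with the words
    of its first dictionary sense, when a gloss is available."""
    expanded = []
    for w in words:
        if w in known_vocab or w not in SENSE_ONE:
            expanded.append(w)
        else:
            expanded.extend(SENSE_ONE[w].split())
    return expanded

print(expand_unknown_terms(["sometimes", "feel", "bashful"],
                           {"sometimes", "feel"}))
# -> ['sometimes', 'feel', 'inclined', 'to', 'avoid', 'notice']
```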
Results: Unknown Terms • 84% unchanged • Originally 48% correct; After defining unknown terms, 58% correct • 4 items (13%) improved; 1 item (3%) became a miss
Unknown Terms • Small improvements using this method • Would work better if the correct sense could be chosen • Often sense 1 was not the correct part of speech • Some words did not have correct senses on Wiktionary • Could try using synonyms
Future Directions • Find more personality items • Explore better ontologies (e.g., WordNet) • Analyze words more carefully • Part-of-speech (POS) tagging • Try using word-sense disambiguation • Search definitions for “personality-ish” senses • Use Laplace smoothing and POS tags to handle unknown terms algorithmically