
Fex Feature Extractor - v2


halle

Presentation Transcript


  1. Fex Feature Extractor - v2

  2. Topics • Vocabulary • Syntax of scripting language • Feature functions • Operators • Examples • POS tagging • Input Formats

  3. Vocabulary
     • example: A list of active records for which Fex produces a single SNoW example. Usually a sentence.
     • record: A single position in an example (sentence). Contains a list of fields, each of which holds a different piece of information, e.g. NLP: word, tag; vision: color, etc.
     • Raw input to Fex: A list of valid examples (raw sentences, tagged corpora, etc.).
     • Fex's output: Lexical features are written to the lexicon file. Their corresponding numeric IDs are written to the example file.
     • feature function: A relation among one or more records.

  4. Example: Feature Functions

  5. Script Syntax
     • A Fex script file contains a list of definitions, each of which rewrites the given observation into a set of active features.
     • Definition format (terms in parentheses are optional):
         target (inc) (loc): FeatureFunc ([left, right])
     • target - Target index or word. To treat each record in the observation as a target, use -1, which is a macro for "all words".
     • inc - Include the target word instead of the placeholder (*) in some features.
     • loc - Generate features with their location relative to the target.

  6. FeatureFunc - A feature function defined in terms of certain unary and n-ary relations, and operators. • left - Left offset of scope for generating features. Negative values are left of the target, positive to the right. • right - Right offset of scope.
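The definition format above can be split into its parts mechanically. The following is a minimal Python sketch, not part of Fex itself: the names `Definition` and `parse_definition` are hypothetical, and the regular expression only covers the forms that appear in these slides (a target, optional `inc`/`loc` flags, a feature function, and an optional `[left, right]` scope).

```python
import re
from typing import NamedTuple, Optional


class Definition(NamedTuple):
    """One line of a Fex script, split into its parts (hypothetical helper)."""
    target: str            # target index or word; "-1" means every record
    inc: bool              # include the target word instead of "*"
    loc: bool              # generate location-relative features
    func: str              # feature function expression, e.g. "w", "coloc(t,t,t)"
    left: Optional[int]    # left offset of the scope, if given
    right: Optional[int]   # right offset of the scope, if given


_DEFINITION = re.compile(
    r"""
    ^\s*(?P<target>\S+?)              # target index or word
    (?P<flags>(?:\s+(?:inc|loc))*)    # optional inc / loc flags
    \s*:\s*
    (?P<func>.+?)                     # feature function expression
    (?:\s*\[\s*(?P<left>-?\d+)\s*,\s*(?P<right>-?\d+)\s*\])?   # optional scope
    \s*$
    """,
    re.VERBOSE,
)


def parse_definition(line: str) -> Definition:
    match = _DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a definition: {line!r}")
    flags = match.group("flags").split()
    left, right = match.group("left"), match.group("right")
    return Definition(
        target=match.group("target"),
        inc="inc" in flags,
        loc="loc" in flags,
        func=match.group("func"),
        left=int(left) if left is not None else None,
        right=int(right) if right is not None else None,
    )
```

For instance, `parse_definition("-1 loc: t [-2,2]")` yields target `"-1"`, `loc=True`, function `"t"` and scope `(-2, 2)`, while `parse_definition("dog: w(x=is)")` has no scope.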

  7. Basic Feature Functions
     Type | Fex notation | Interpretation | Output to lexicon
     Label | lab | Produces a label feature | lab[target word]
     Label | lab(t) | Produces a label feature | lab[target tag]
     Word | w | Active if the word(s) in the current record is within scope | w[current word]
     Tag (pos) | t | Active if the tag(s) in the current record is within scope | t[current tag]
     Vowel | v | Active if the word(s) in the current record begin with a vowel | v[initial vowel]
     Prefix | pre | Active if the word(s) in the current record begin with a prefix in a given list | pre[active prefix]

  8. Basic Feature Functions (continued)
     Type | Fex notation | Interpretation | Output to lexicon
     Suffix | suf | Active if the word(s) in the current record end with a suffix in a given list | suf[the active suffix]
     Baseline | base | Active if a baseline tag from a prepared list exists for the word(s) in the current record | base[baseline tag]
     Lemma | lem | Active if a lemma from the WordNet database exists for the word(s) in the current record | lem[active lemma]

  9. Example
     Sentence = "(DET The) (NN dog) (V is) (JJ mad)"
     • Method 1
       Script definitions:
         dog: w [-1,1]
         dog: t [1,2]
       Output to lexicon:
         10001 w[The]
         10002 w[is]
         10003 t[V]
         10004 t[JJ]
       Output to example file:
         10001, 10002, 10003, 10004:
     • Method 2
       Script definitions:
         -1: lab
         -1: w [-1,1]
         -1: t [1,2]
       Output to lexicon:
         10001 w[The]
         10002 w[is]
         10003 t[V]
         10004 t[JJ]
       Output to example file:
         1, 10001, 10002, 10003, 10004:
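To make the mapping from script to lexicon to example file concrete, here is a minimal Python sketch that reproduces Method 1 for the target "dog". It is an illustration under stated assumptions rather than Fex's actual implementation: `window_features` and `Lexicon` are hypothetical names, the target position is skipped because the slide's output contains no w[dog], and IDs are assumed to be handed out from 10001 in first-seen order.

```python
sentence = [("DET", "The"), ("NN", "dog"), ("V", "is"), ("JJ", "mad")]


def window_features(records, target, name, left, right):
    """Emit name[value] for every record in [left, right] around the target.

    Assumption: the target itself (offset 0) is skipped, as in the slide.
    """
    field = {"t": 0, "w": 1}[name]   # records are (tag, word) pairs
    features = []
    for offset in range(left, right + 1):
        position = target + offset
        if offset == 0 or not 0 <= position < len(records):
            continue
        features.append(f"{name}[{records[position][field]}]")
    return features


class Lexicon:
    """Maps lexical features to numeric IDs, allocating new IDs on demand."""

    def __init__(self, first_id=10001):
        self.ids = {}
        self.next_id = first_id

    def lookup(self, feature):
        if feature not in self.ids:
            self.ids[feature] = self.next_id
            self.next_id += 1
        return self.ids[feature]


target = 1  # index of "dog"
lexicon = Lexicon()
active = (window_features(sentence, target, "w", -1, 1)
          + window_features(sentence, target, "t", 1, 2))
example_line = ", ".join(str(lexicon.lookup(f)) for f in active) + ":"
print(example_line)
```

Keeping the lexicon separate from feature generation mirrors the two output files: the lexicon file holds the feature strings, and the example file holds only their IDs.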

  10. Operators & Complex Functions
      Sentence = "(DET The) (NN dog) (V is) (JJ mad)"
      • (X) operator - Indicates that a feature is active without any specific instantiation.
        Script definition:
          dog: v(X) [-1,1]
        Output to lexicon:
          10001 v[]
      • (x=y) operator - Creates an active feature iff the active instantiation matches the given argument.
        Script definition:
          dog: w(x=is)
        Output to lexicon:
          10001 w[is]

  11. Operators & Complex Functions
      Sentence = "(DET The) (NN dog) (V is) (JJ mad)"
      • & operator - Conjoins two features, producing a new feature that is active iff the record fulfills both constituent features.
        Script definition:
          dog: w&t [-1,-1]
        Output to lexicon:
          10001 w[The]&t[DET]
      • | operator - Disjunction of two features, outputting a feature for each term of the disjunction that is active in the current record.
        Script definition:
          dog: w|t [-1,-1]
        Output to lexicon:
          10001 w[The]
          10002 t[DET]
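The two operators can be sketched as higher-order functions over per-record feature functions. This is an illustrative Python sketch only: `conj`, `disj`, and the representation of a feature function as "record in, list of active feature strings out" are assumptions, and the exact string emitted by `v` is a guess based on the v[initial vowel] entry in the feature table.

```python
def w(record):
    """Word feature: always active for a record."""
    tag, word = record
    return [f"w[{word}]"]


def t(record):
    """Tag feature: always active for a record."""
    tag, word = record
    return [f"t[{tag}]"]


def v(record):
    """Vowel feature: active only if the word begins with a vowel."""
    tag, word = record
    return [f"v[{word[0]}]"] if word[0].lower() in "aeiou" else []


def conj(f, g):
    """f&g: one combined feature, active only if both parts are active."""
    def feature(record):
        return [f"{a}&{b}" for a in f(record) for b in g(record)]
    return feature


def disj(f, g):
    """f|g: one feature for each part that is active."""
    def feature(record):
        return f(record) + g(record)
    return feature


the = ("DET", "The")
print(conj(w, t)(the))   # ['w[The]&t[DET]']
print(disj(w, t)(the))   # ['w[The]', 't[DET]']
print(conj(w, v)(the))   # [] because "The" does not start with a vowel
print(disj(w, v)(the))   # ['w[The]']
```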

  12. Operators & Complex Functions
      • coloc function - Consecutive feature function: takes two or more features as arguments to produce a consecutive collocation over two or more records. The order of the arguments is preserved in the active feature.
        Script definition:
          mad: coloc(w, t) [-3,-1]
        Output to lexicon:
          10001 w[The]-t[NN]
          10002 w[dog]-t[V]
      • scoloc function - Sparse consecutive feature function: operates similarly to coloc, except that active collocations need not be consecutive. However, the order of the arguments is still preserved in determining whether a feature is active.
        Script definition:
          mad: scoloc(w,t) [-3,-1]
        Output to lexicon:
          10001 w[The]-t[NN]
          10002 w[dog]-t[V]
          10003 w[The]-t[V]
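The difference between coloc and scoloc comes down to consecutive windows versus order-preserving subsequences. A minimal Python sketch under that reading follows; the function names mirror the Fex functions but the code is an assumption about their behaviour derived from the slide, and it does not handle a target placeholder (*) inside the scope.

```python
from itertools import combinations

sentence = [("DET", "The"), ("NN", "dog"), ("V", "is"), ("JJ", "mad")]


def field_feature(name, record):
    """Render one record as w[word] or t[tag]."""
    tag, word = record
    return f"{name}[{word if name == 'w' else tag}]"


def scope(records, target, left, right):
    """Records whose offset from the target lies in [left, right]."""
    return [records[target + offset]
            for offset in range(left, right + 1)
            if 0 <= target + offset < len(records)]


def coloc(names, records):
    """Apply the feature names to every run of consecutive records."""
    n = len(names)
    return ["-".join(field_feature(name, rec)
                     for name, rec in zip(names, records[i:i + n]))
            for i in range(len(records) - n + 1)]


def scoloc(names, records):
    """Like coloc, but the records only need to appear in order."""
    return ["-".join(field_feature(name, rec)
                     for name, rec in zip(names, combo))
            for combo in combinations(records, len(names))]


window = scope(sentence, target=3, left=-3, right=-1)   # The, dog, is
print(coloc(["w", "t"], window))
print(scoloc(["w", "t"], window))
```

`itertools.combinations` yields subsequences in input order, which is what keeps the argument order intact in the sparse case. The sketch emits the same three scoloc features as the slide, though not necessarily in the same order.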

  13. Example: POS tagging
      Useful features for POS tagging:
      • The preceding word is tagged c.
      • The following word is tagged c.
      • The word two before is tagged c.
      • The word two after is tagged c.
      • The preceding word is tagged c and the following word is tagged t.
      • The preceding word is tagged c and the word two before is tagged t.
      • The following word is tagged c and the word two after is tagged t.
      • The current word is w.
      • The most probable part of speech for the current word is c.

  14. Given the sentence:
        (t1 The) (t2 dog) (t3 ran) (t4 very) (t5 quickly)
      the following Fex script will produce the features from the previous slide:
        -1: lab(t)
        -1 loc: t [-2,2]
        -1: coloc(t,t,t) [-2,2]
        -1 inc: w[0,0]
        -1: base[0,0]
      • To do POS tagging, an example needs to be generated for each word in the observation.

  15. For the third word, "ran", the script produces the following output:
      Script | Lexicon output
      -1: lab(t) | 1 lab[t3]
      -1 loc: t [-2,2] | 10001 t[t1_*], 10002 t[t2*], 10003 t[*t4], 10004 t[*_t5]
      -1: coloc(t,t,t) [-2,2] | 10005 t[t1]-t[t2]-*, 10006 t[t2]-*-t[t4], 10007 *-t[t4]-t[t5]
      -1 inc: w [0,0] | 10008 w[ran]
      -1: base [0,0] | 10009 base[V]
      • And an example in the example file:
          1, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009:
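The location-relative tag features in this output follow a visible pattern: `*` marks the target, and each `_` stands for one skipped position between the target and the tagged word. The Python sketch below reproduces that pattern. It is inferred from the slide's output only, so the rule and the name `located_tag_features` are assumptions rather than documented Fex behaviour.

```python
def located_tag_features(tags, target, left, right):
    """Render tags in [left, right] with their position relative to '*'.

    Assumed notation: '*' is the target, '_' is one skipped position.
    """
    features = []
    for offset in range(left, right + 1):
        position = target + offset
        if offset == 0 or not 0 <= position < len(tags):
            continue
        gap = "_" * (abs(offset) - 1)
        if offset < 0:
            features.append(f"t[{tags[position]}{gap}*]")
        else:
            features.append(f"t[*{gap}{tags[position]}]")
    return features


tags = ["t1", "t2", "t3", "t4", "t5"]   # The dog ran very quickly
located = located_tag_features(tags, target=2, left=-2, right=2)
print(located)   # ['t[t1_*]', 't[t2*]', 't[*t4]', 't[*_t5]']
```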

  16. Input Formats
      Fex can presently accept data in two formats:
      • w1 w2 w3 w4 …
      • (t1 w1) (t2 w2) (t3 w3) (t4 w4) …
      • w1 (t2 w2) (t3 t3a; w3) (t4; w4 w4a) …
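A reader for the first two layouts (bare words and parenthesized tag/word pairs) can be sketched in a few lines of Python. The function name `parse_line` and the `(tag, word)` tuple representation are assumptions made for illustration; the semicolon-separated multi-field form in the last bullet is deliberately not handled.

```python
import re

# Either "(tag word)" or a bare token without parentheses or whitespace.
_RECORD = re.compile(r"\(([^()\s]+)\s+([^()\s]+)\)|([^()\s]+)")


def parse_line(line):
    """Split one input line into (tag, word) records; tag is None if absent."""
    records = []
    for tag, word, bare in _RECORD.findall(line):
        if bare:
            records.append((None, bare))
        else:
            records.append((tag, word))
    return records


print(parse_line("(DET The) (NN dog) (V is) (JJ mad)"))
print(parse_line("The dog is mad"))
```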

  17. Using Fex (command line)
        fex [options] script-file lexicon-file corpus-file example-file
      Options:
      • -t: target file
        • The file must not contain any empty lines.
        • Each target goes on a separate line.
      • -r: test mode
        • Does not create new features.
      • -h, -I
        • Creates a histogram of active features.

  18. Using Fex (command line)
      • Target file = targ:
          dog
          cat
      • Script file = script:
          -1 : lab
          -1 : w [-1,-1]
          -1 : t [-1,-1]
      • Corpus file = corpus:
          (DET The) (NN dog) (V is) (JJ mad)
      • Lexicon file = lexicon
      • Example file = example
      • Command:
          fex -t targ script lexicon corpus example

  19. SNoW

  20. Word representation

  21. Restrictions on the learning approach
      • Multi-class
      • Variable number of features
        • per class
        • per example
      • Efficient learning
      • Efficient evaluation

  22. SNoW
      • Network of threshold gates.
      • Target nodes represent class labels.
      • Input nodes (features) and links are allocated in a data-driven way (on the order of 10^5 input features for many target nodes).
      • Each sub-network (target node) is learned autonomously as a function of the features.
      • A presented example is positive for one network and negative for the others (depending on the algorithm).
      • Allocation of nodes (features) and links is data-driven: a link between feature fi and target tj is created only when fi was active with target tj.
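The data-driven allocation described above can be illustrated with a small sparse Winnow network in Python. This is a sketch of the general idea, not SNoW's implementation: the class name `SparseWinnow`, the parameter values, and the reading that a link is allocated the first time a feature is active in an example labeled with that target are all assumptions. The update is the standard Winnow rule: multiply the weights of active features by alpha after a missed positive and by beta after a false positive.

```python
class SparseWinnow:
    """One Winnow threshold gate per target, with links allocated on demand."""

    def __init__(self, alpha=2.0, beta=0.5, threshold=2.5, init_weight=1.0):
        # Illustrative parameter values, not SNoW defaults.
        self.alpha = alpha
        self.beta = beta
        self.threshold = threshold
        self.init_weight = init_weight
        self.weights = {}   # target -> {feature id: weight}

    def activation(self, target, features):
        links = self.weights[target]
        return sum(links[f] for f in features if f in links)

    def train(self, label, features):
        # Allocate links only between the labeled target and its active features.
        links = self.weights.setdefault(label, {})
        for f in features:
            links.setdefault(f, self.init_weight)
        # The example is positive for its own target, negative for the others.
        for target, links in self.weights.items():
            predicted = self.activation(target, features) >= self.threshold
            if target == label and not predicted:
                factor = self.alpha      # promotion
            elif target != label and predicted:
                factor = self.beta       # demotion
            else:
                continue
            for f in features:
                if f in links:
                    links[f] *= factor

    def predict(self, features):
        # Winner-take-all over the target nodes.
        return max(self.weights, key=lambda target: self.activation(target, features))


net = SparseWinnow()
data = [(1, [10001, 10002]), (2, [10003, 10004])]
for _ in range(2):
    for label, features in data:
        net.train(label, features)
print(net.predict([10001, 10002]))
```

Because weights live in per-target dictionaries, memory grows with the number of links actually observed rather than with the product of features and targets.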

  23. Word prediction using SNoW
      • Target nodes: each word in the set of candidate words is a target node.
      • Input nodes: an input node for feature fi is allocated only if fi was active with some target.
      • Decision task: we need to choose one target among all possible candidates.

  24. SNoW (Command line)
        snow -train -I inputfile -F networkfile [-ABcdePrsTvW]
        snow -test -I inputfile -F networkfile [-bEloRvw]
      Architecture:
      • Winnow: -W [, , , init weight] :targets
      • Perceptron: -P [, , init weight] :targets
      • NB: -B :targets

  25. SNoW parameters (training)
      • -d <none | abs:<k> | rel> : discarding method
      • -e <i> : eligibility threshold
      • -r <i> : number of cycles
      Output modes:
      • -c <i> : interval for network snapshot
      • -v <off | min | med | max> : details for the output to the screen

  26. SNoW parameters (testing)
      • -b <k> : smoothing for NB
      • -w <k> : smoothing for W, P
      Output modes:
      • -E : error file
      • -o <accuracy | winners | allpred | allact | allboth> : details for the output
      • -R : results file (stdout)

  27. File Format (Example file)
        6, 10034, 10141, 10151, 10158, 10179:
        177, 10034, 10035, 10047:
      With weights:
        6, 10034(1), 10141(1.5), 10151(0.4), 10158(2), 10179(0.1):
        177, 10034(2), 10035(4), 10047(0.6):
      • Only active features appear in an example.
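Both variants of the example-file line can be read with one small parser. The Python sketch below is illustrative only: `parse_example` is a hypothetical name, the result is kept as an ordered list of (id, weight) pairs without deciding which entry is the label, and treating a missing weight as 1.0 is an assumption.

```python
import re

# An entry is an integer ID, optionally followed by a weight in parentheses.
_ENTRY = re.compile(r"^(\d+)(?:\(([-+.\deE]+)\))?$")


def parse_example(line):
    """Parse '6, 10034(1), 10141(1.5):' into [(6, 1.0), (10034, 1.0), ...]."""
    body = line.strip()
    if not body.endswith(":"):
        raise ValueError(f"example line must end with ':': {line!r}")
    entries = []
    for token in body[:-1].split(","):
        match = _ENTRY.match(token.strip())
        if match is None:
            raise ValueError(f"bad entry {token!r} in {line!r}")
        feature_id = int(match.group(1))
        # Assumption: an entry without an explicit weight counts as 1.0.
        weight = float(match.group(2)) if match.group(2) else 1.0
        entries.append((feature_id, weight))
    return entries


print(parse_example("177, 10034, 10035, 10047:"))
print(parse_example("177, 10034(2), 10035(4), 10047(0.6):"))
```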

  28. File Format (Network file)
      • NB:
          target 111 0 1 135 1 naivebayes 0 0.1 0.5
          111 : 0 : 10020 : 4 0 -3.518980417
          111 : 0 : 10021 : 1 0 -4.905274778
      • Winnow:
          target 111 1 1 135 1562 winnow 0 1.1 0.9 15 1
          111 : 0 : 10020 : 4 1 1.1
          111 : 0 : 10021 : 1 0 1
      • Perceptron:
          target 111 2 1 2701 perceptron 0 0.1 4 0.2
          111 : 0 : 10020 : 4 1 0.3
          111 : 0 : 10021 : 1 0 0.2

  29. File Format (Error file)
        Algorithms:
        Perceptron: (1, 30, 0.05)
        Targets: 3, 53, 73
        Ex: 8 Prediction: 3 Label: 53
        3: 0.5866
        53: 0.2592*
        73: 0.1192
        Ex: 15 Prediction: 3 Label: 73
        3: 0.5987
        73: 0.001229*
        53: 0.0002248
