Machine Learning in GATE

Valentin Tablan

Machine Learning in GATE
• Uses classification: [Attr1, Attr2, Attr3, … Attrn] → Class
• Classifies annotations. (Documents can be classified as well, using a simple trick.)
• Annotations of a particular type are selected as instances.
• Attributes refer to instance annotations.
• Attributes have a position relative to the instance annotation they refer to.
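As a rough illustration of how annotations become instances, the Java sketch below selects every annotation of one type and turns each into an attribute vector followed by its class value. The `Ann` record, the assumed Token features (`orth`, `kind`, `category`) and the convention of putting the class value last are choices made for this example only; this is not GATE's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InstanceSelection {
    // Hypothetical, simplified annotation: type, offsets and a feature map.
    // (GATE's real annotations are richer; this is for illustration only.)
    record Ann(String type, long start, long end, Map<String, String> features) {}

    // Select every annotation of the instance type and turn it into
    // [Attr1, ..., Attrn, Class]; absent features become null (missing).
    static List<List<String>> toInstances(List<Ann> anns, String instanceType,
                                          List<String> attrFeatures, String classFeature) {
        List<List<String>> rows = new ArrayList<>();
        for (Ann a : anns) {
            if (!a.type().equals(instanceType)) continue; // not an instance annotation
            List<String> row = new ArrayList<>();
            for (String f : attrFeatures) row.add(a.features().get(f));
            row.add(a.features().get(classFeature)); // class value goes last (assumption)
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Ann> doc = List.of(
                new Ann("Token", 0, 4, Map.of("orth", "upperInitial", "kind", "word", "category", "NNP")),
                new Ann("SpaceToken", 4, 5, Map.of()),
                new Ann("Token", 5, 10, Map.of("orth", "lowercase", "kind", "word", "category", "VBZ")));
        // prints [[upperInitial, word, NNP], [lowercase, word, VBZ]]
        System.out.println(toInstances(doc, "Token", List.of("orth", "kind"), "category"));
    }
}
```

The SpaceToken annotation is skipped because only annotations of the instance type become instances.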

Attributes
Attributes can be:
• Boolean: the presence (or absence) of an annotation of a particular type that [partially] overlaps the referred instance annotation.
• Nominal: the value of a particular feature of the referred instance annotation. The complete set of acceptable values must be specified a priori.
• Numeric: the numeric value (converted from a String) of a particular feature of the referred instance annotation.
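The three attribute kinds can be pictured as three small extraction functions. This is a minimal Java sketch assuming a simplified `Ann` record (type, start/end offsets, feature map); returning `null` for an undeclared nominal value or an unparsable number is an assumption about missing-value handling, not documented GATE behaviour.

```java
import java.util.List;
import java.util.Map;

final class AttributeKinds {
    // Hypothetical minimal annotation: type, offsets and features.
    record Ann(String type, long start, long end, Map<String, String> features) {}

    // Boolean: true if some other annotation of the given type [partially] overlaps the instance.
    static boolean booleanAttr(Ann instance, List<Ann> all, String type) {
        for (Ann a : all) {
            if (a != instance && a.type().equals(type)
                    && a.start() < instance.end() && instance.start() < a.end()) {
                return true;
            }
        }
        return false;
    }

    // Nominal: the feature value, but only if it is in the set declared a priori.
    static String nominalAttr(Ann instance, String feature, List<String> allowed) {
        String v = instance.features().get(feature);
        return (v != null && allowed.contains(v)) ? v : null; // null = missing (assumption)
    }

    // Numeric: the feature's String value converted to a number.
    static Double numericAttr(Ann instance, String feature) {
        String v = instance.features().get(feature);
        if (v == null) return null;
        try {
            return Double.valueOf(v.trim());
        } catch (NumberFormatException e) {
            return null; // unparsable value treated as missing (assumption)
        }
    }

    public static void main(String[] args) {
        Ann tok = new Ann("Token", 0, 6, Map.of("category", "NNP", "length", "6"));
        Ann lookup = new Ann("Lookup", 3, 10, Map.of("majorType", "location"));
        List<Ann> all = List.of(tok, lookup);
        System.out.println(booleanAttr(tok, all, "Lookup"));                    // true (partial overlap)
        System.out.println(nominalAttr(tok, "category", List.of("NN", "NNP"))); // NNP
        System.out.println(numericAttr(tok, "length"));                         // 6.0
    }
}
```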

Implementation
Machine Learning PR (processing resource) in GATE. It has two modes of operation:
• training
• application
It uses an XML file for configuration:
  <?xml version="1.0" encoding="windows-1252"?>
  <ML-CONFIG>
    <DATASET> … </DATASET>
    <ENGINE> … </ENGINE>
  </ML-CONFIG>
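A configuration file of this shape can be read with the JDK's standard DOM parser. The sketch below (class and method names are hypothetical) pulls out the `<WRAPPER>` and `<INSTANCE-TYPE>` values; it is not how the ML PR itself loads its configuration. It also shows why the root must be closed with `</ML-CONFIG>`: an unclosed root element makes the parse fail.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MlConfigReader {
    // Parse an ML-CONFIG string; malformed XML (e.g. an unclosed root) becomes an exception.
    static Document parse(String xml) {
        try {
            // Reading from a character stream, so the prolog's encoding is not used for decoding.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed ML-CONFIG: " + e.getMessage(), e);
        }
    }

    // Trimmed text of the first element with this tag name, or null if there is none.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = """
                <?xml version="1.0" encoding="windows-1252"?>
                <ML-CONFIG>
                  <DATASET><INSTANCE-TYPE>Token</INSTANCE-TYPE></DATASET>
                  <ENGINE><WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER></ENGINE>
                </ML-CONFIG>
                """;
        Document doc = parse(xml);
        System.out.println(firstText(doc, "WRAPPER"));       // gate.creole.ml.weka.Wrapper
        System.out.println(firstText(doc, "INSTANCE-TYPE")); // Token
    }
}
```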

<DATASET>
  <DATASET>
    <INSTANCE-TYPE>Token</INSTANCE-TYPE>
    <ATTRIBUTE>
      <NAME>POS_category(0)</NAME>
      <TYPE>Token</TYPE>
      <FEATURE>category</FEATURE>
      <POSITION>0</POSITION>
      <VALUES>
        <VALUE>NN</VALUE>
        <VALUE>NNP</VALUE>
        <VALUE>NNPS</VALUE>
        …
      </VALUES>
      [<CLASS/>]
    </ATTRIBUTE>
    …
  </DATASET>
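One way to picture an `<ATTRIBUTE>` definition in code is as a record with the same fields. In this hypothetical Java sketch, a nominal value is encoded as its index in the declared `<VALUES>` list (the way WEKA stores nominal values internally), and the sketch assumes exactly one attribute carries `<CLASS/>`.

```java
import java.util.List;

final class DatasetDef {
    // Mirrors the fields of one <ATTRIBUTE> element (hypothetical Java shape).
    record AttributeDef(String name, String type, String feature, int position,
                        List<String> values, boolean isClass) {
        boolean isNominal() {
            return values != null && !values.isEmpty();
        }

        // Index of a nominal value in the declared <VALUES> list, or -1 if undeclared.
        int encode(String value) {
            return isNominal() ? values.indexOf(value) : -1;
        }
    }

    // Exactly one attribute is expected to carry <CLASS/> (assumption); return its index.
    static int classIndex(List<AttributeDef> attrs) {
        int found = -1;
        for (int i = 0; i < attrs.size(); i++) {
            if (attrs.get(i).isClass()) {
                if (found != -1) throw new IllegalStateException("more than one <CLASS/> attribute");
                found = i;
            }
        }
        if (found == -1) throw new IllegalStateException("no <CLASS/> attribute");
        return found;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("NN", "NNP", "NNPS"); // only the values shown on the slide
        AttributeDef prev = new AttributeDef("POS_category(-1)", "Token", "category", -1, tags, false);
        AttributeDef self = new AttributeDef("POS_category(0)", "Token", "category", 0, tags, true);
        System.out.println(classIndex(List.of(prev, self))); // 1
        System.out.println(self.encode("NNP"));              // 1
    }
}
```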

<ENGINE>
  <ENGINE>
    <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
    <OPTIONS>
      <CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>
      <CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>
      <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
    </OPTIONS>
  </ENGINE>
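`<CONFIDENCE-THRESHOLD>` suggests that low-confidence predictions are discarded. A minimal sketch of that idea, assuming the classifier yields one probability per class (WEKA classifiers expose such a distribution through `distributionForInstance`): the best class is kept only when its probability reaches the threshold. The exact rule the Wrapper applies is an assumption here, and the class name is hypothetical.

```java
final class ThresholdDecision {
    // Pick the most probable class; return null (no decision) if it is below the threshold.
    static String decide(String[] labels, double[] dist, double threshold) {
        if (labels.length == 0 || labels.length != dist.length) {
            throw new IllegalArgumentException("labels and distribution must be non-empty and aligned");
        }
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) best = i;
        }
        return dist[best] >= threshold ? labels[best] : null;
    }

    public static void main(String[] args) {
        String[] labels = {"NN", "NNP", "VB"};
        System.out.println(decide(labels, new double[] {0.10, 0.88, 0.02}, 0.85)); // NNP
        System.out.println(decide(labels, new double[] {0.40, 0.45, 0.15}, 0.85)); // null
    }
}
```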

Attribute Positions
[Figure: attributes positioned relative to the instance annotations; instances type: Token]
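Positions can be read as offsets in the sequence of instance annotations: position -1 is the previous Token, +1 the next one. A minimal Java sketch under that assumption, working on one feature value per Token and using `null` where a position falls outside the document (how GATE fills such gaps is not stated here; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class RelativePositions {
    // Value at relative position `pos` for instance index `i`, or null outside the document.
    static String at(List<String> featureValues, int i, int pos) {
        int j = i + pos;
        return (j >= 0 && j < featureValues.size()) ? featureValues.get(j) : null;
    }

    // One attribute vector per instance, one entry per requested position.
    static List<List<String>> window(List<String> featureValues, int[] positions) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < featureValues.size(); i++) {
            List<String> row = new ArrayList<>();
            for (int p : positions) row.add(at(featureValues, i, p));
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> categories = List.of("DT", "NN", "VBZ"); // Token.category, in document order
        // prints [[null, DT, NN], [DT, NN, VBZ], [NN, VBZ, null]]
        System.out.println(window(categories, new int[] {-1, 0, 1}));
    }
}
```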

Machine Learning PR
• Can save a learnt model to an external file for later use. Both the actual model and the collected dataset are saved.
• Can export the collected dataset in .arff (WEKA) format.
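For a feel of what an .arff export contains, here is a hypothetical Java sketch that writes nominal attributes in WEKA's ARFF format: an `@relation` line, one `@attribute` line per attribute with its value set, then `@data` rows with `?` for missing values. The quoting rule (quote anything that is not a plain identifier, which covers names like `POS_category(0)` and the `,` tag) is a simplification chosen for the sketch, not the PR's actual exporter.

```java
import java.util.Arrays;
import java.util.List;

final class ArffExport {
    // Quote anything that is not a plain identifier (e.g. "POS_category(0)" or ",").
    static String q(String s) {
        if (s.matches("[A-Za-z0-9_.-]+")) return s;
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Nominal attributes only; a null value is written as '?' (missing).
    static String toArff(String relation, List<String> names,
                         List<List<String>> domains, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("@relation ").append(q(relation)).append('\n');
        for (int i = 0; i < names.size(); i++) {
            sb.append("@attribute ").append(q(names.get(i))).append(" {");
            List<String> values = domains.get(i);
            for (int k = 0; k < values.size(); k++) {
                if (k > 0) sb.append(',');
                sb.append(q(values.get(k)));
            }
            sb.append("}\n");
        }
        sb.append("@data\n");
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) sb.append(',');
                String v = row.get(i);
                sb.append(v == null ? "?" : q(v));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tags = List.of("DT", "NN", "VBZ");
        System.out.print(toArff("pos_context",
                List.of("POS_category(-1)", "POS_category(0)"),
                List.of(tags, tags),
                List.of(Arrays.asList(null, "DT"), Arrays.asList("DT", "NN"))));
    }
}
```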

Standard Use Scenario
Application
• Prepare the data by enriching the documents with the annotations used as attributes (e.g. run the Tokeniser, POS tagger, Gazetteer, etc.).
• [ Load the previously saved model. ]
• Run the ML PR in application mode.
• [ Save the learnt model. ]
Training
• Prepare the training data by enriching the documents with the annotations used as attributes (e.g. run the Tokeniser, POS tagger, Gazetteer, etc.).
• Run the ML PR in training mode.
• Export the dataset as .arff and run experiments in the WEKA interface to find the best attribute set / algorithm / algorithm options.
• Update the configuration file accordingly.
• Run the ML PR again to collect the actual data.
• [ Save the learnt model. ]

An Example
Learn the POS category of a token from the POS categories of its context.
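To make the task concrete without WEKA, the sketch below swaps in a very simple counting model: it records which POS category appears between each (previous tag, next tag) pair and predicts the most frequent one. It only illustrates the learning problem; the slides' configuration uses the classifier named in `<ENGINE>`, and the class name and the `<s>`/`</s>` boundary markers are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PosContextCounter {
    // counts.get(context).get(tag) = how often `tag` occurred between those neighbours
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Context = previous and next POS tag; <s> and </s> mark sentence edges (hypothetical markers).
    private static String context(List<String> tags, int i) {
        String prev = i > 0 ? tags.get(i - 1) : "<s>";
        String next = i + 1 < tags.size() ? tags.get(i + 1) : "</s>";
        return prev + "|" + next;
    }

    // Training: count the tag at each position under its context.
    void train(List<String> tags) {
        for (int i = 0; i < tags.size(); i++) {
            counts.computeIfAbsent(context(tags, i), k -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
        }
    }

    // Application: most frequent tag for the context of position i (ties broken
    // alphabetically), or null for a context never seen in training.
    // The tag currently at position i is ignored; only its neighbours are used.
    String predict(List<String> tags, int i) {
        Map<String, Integer> seen = counts.get(context(tags, i));
        if (seen == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PosContextCounter model = new PosContextCounter();
        model.train(List.of("DT", "NN", "VBZ", "DT", "NN"));
        model.train(List.of("DT", "JJ", "NN"));
        System.out.println(model.predict(List.of("DT", "?", "VBZ"), 1)); // NN
        System.out.println(model.predict(List.of("DT", "?", "NN"), 1));  // JJ
    }
}
```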

Using Other ML Libraries
The MLEngine Interface
Method Summary
• void addTrainingInstance(List attributes)
  Adds a new training instance to the dataset.
• Object classifyInstance(List attributes)
  Classifies a new instance.
• void init()
  Called after an engine has been created and its dataset and options have been set.
• void setDatasetDefinition(DatasetDefintion definition)
  Sets the definition of the dataset used.
• void setOptions(org.jdom.Element options)
  Sets the options from an XML JDOM element.
• void setOwnerPR(ProcessingResource pr)
  Registers with the engine the PR that is using it.
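A wrapper for another library mainly has to implement the two data-moving methods. The Java sketch below uses a cut-down, hypothetical `SimpleEngine` interface rather than GATE's `MLEngine` (so it compiles without GATE, and omits `init`, `setOptions`, `setDatasetDefinition` and `setOwnerPR`), and implements it with a 1-nearest-neighbour classifier over nominal values. Treating the last element of a training instance as the class is an assumption of the sketch; in GATE the class attribute is the one marked `<CLASS/>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class EngineSketch {
    // Hypothetical stand-in for the data-moving half of MLEngine.
    interface SimpleEngine {
        void addTrainingInstance(List<Object> attributes); // last element = class (assumption)
        Object classifyInstance(List<Object> attributes);  // attributes without the class
    }

    // 1-nearest-neighbour over nominal attributes, using the number of mismatches (Hamming distance).
    static final class NearestNeighbourEngine implements SimpleEngine {
        private final List<List<Object>> data = new ArrayList<>();

        @Override
        public void addTrainingInstance(List<Object> attributes) {
            data.add(new ArrayList<>(attributes)); // defensive copy
        }

        @Override
        public Object classifyInstance(List<Object> attributes) {
            Object best = null;
            int bestDist = Integer.MAX_VALUE;
            for (List<Object> row : data) {
                int n = row.size() - 1; // number of non-class attributes
                int dist = 0;
                for (int i = 0; i < n; i++) {
                    Object v = i < attributes.size() ? attributes.get(i) : null;
                    if (!Objects.equals(row.get(i), v)) dist++;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = row.get(n); // first closest instance wins ties
                }
            }
            return best; // null when no training data has been added
        }
    }

    public static void main(String[] args) {
        SimpleEngine engine = new NearestNeighbourEngine();
        engine.addTrainingInstance(List.of("DT", "VBZ", "NN"));
        engine.addTrainingInstance(List.of("DT", "NN", "JJ"));
        System.out.println(engine.classifyInstance(List.of("DT", "VBZ"))); // NN
    }
}
```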
