100 likes | 245 Views
Beespace Prototype Design Meeting Entity Recognition. Jing Jiang 09/28/2005. Entity Recognition in Prototype V1. Target entities: gene names Supervised learning: LingPipe (word trigram and tag bigram model) Training data: BioCreative (manually annotated)
E N D
Beespace Prototype Design MeetingEntity Recognition Jing Jiang 09/28/2005
Entity Recognition in Prototype V1 • Target entities: gene names • Supervised learning: LingPipe (word trigram and tag bigram model) • Training data: • BioCreative (manually annotated) • Drosophila (generated from gene lists)
Sample Results • http://sifaka.cs.uiuc.edu/jiang4/Beespace
Performance • Some gene names without explicit mention of “gene” can be captured • E.g., “glutathione S-transferase” • Problems • Gene-like phrases, e.g., “China 2”, “13.8” • Mismatch of gene name boundaries and noun phrase boundaries, e.g., “nicotinic” in “nicotinic pathway”
V2 -- Entity Types • Annotation guideline for BioCreative • Guideline for Beespace? • Ontology? (GENIA ontology) • What to tag? • Genes and proteins • Family of genes • Gene descriptions • Entity boundaries and noun phrase boundaries • Tag only noun phrases that refer to genes or tag any occurrence of a gene name inside a noun phrase?
Sample Sentences • A dose-dependent transactivation of human hARE-mediated chloramphenicol acetyltransferase (cat) geneexpression was observed upon treatments of the Hepa-1 transfectants with TPA, a known inducer, as well as with CAPE. • In the present study, we identified its preferred binding sequence as 5'-CCCTATCGATCG-ATCTCTACCT-3' and characterized its DNA -binding properties using truncated Mblk-1 mutants.
Sample Sentences (cont.) • At least two kinds of nicotinic receptors seem to be involved in honeybee memory, an alpha-bungarotoxin-sensitive and an alpha-bungarotoxin-insensitive receptor. • The involvement of nicotinic pathways in memory formation and retrieval processes was tested by injecting…
Sample Sentences (cont.) • We report the cloning of a honeybee CSP gene calledASP3c, as well as the structural and functional characterization of the encoded protein. • Natural occurring variatioin in npr-1, a gene encoding a putative receptor for an NPY-like molecule, causes variation in feeding behaviour.
Sample Sentences (cont.) • The gene encoding ZENK, an EARLY IMMEDIATEGENE well known in other learning and memory contexts, has figured prominently in molecular songbird research thus far. • This is because frequent contacts of these types cause an increase in the expression of the gene encoding a glucocortocoidreceptor in the hippocampus, and…
Training Data • Dictionary • Rules/guidelines • Bootstrapping • Cross-domain training • Can training data in other domains (fly, human, etc.) still be useful?