PattArAn – From Annotation Triplets to Sentence Fingerprints
The PattArAn Team at the University of Maryland, the University of Iowa, and St. Bonaventure University

Motivation
Gene-GO-PO Triplets
• Scientific concepts are annotated with controlled vocabulary (CV) terms from ontologies such as Gene Ontology (GO) and Plant Ontology (PO).
• Our Arabidopsis-specific tool, Patterns in Arabidopsis Annotation (PattArAn), will focus on pattern creation from the annotation knowledge in (gene, GO, PO) triplets and on triplet validation using the scientific literature.
• PattArAn will help scientists to scour the literature, to understand the connection between annotation evidence and biological knowledge, and to develop hypotheses.
• Goals:
• (1) Explore new research ideas in three areas of interest using PattArAn.
• (2) Build a gold standard dataset using manual annotation of triplet fingerprints.
• Area 1: regulation of flower and fruit development by genes and signal pathways (e.g., genes TSO1, TSO2, MSI1).
• Area 2: signal transduction of the plant hormone ethylene (e.g., genes ETR1, ERS1, ETR2).
• Area 3: integration of metabolite transporters with plant growth, development and survival (e.g., genes AtCHX17, AtNHX1, AtKEA2).

Observations
Document Annotation Guidelines
• GO and PO combinations centered on a gene.
• Documents supporting annotations identified and collected.
• Annotations: Triplets are represented by sentences to varying degrees. Supplementary material is quite rich. Doublets have the most potential.
• Knowledge Underlying Triplets: The annotations of document (16399800) explain a biological process of Arabidopsis thaliana well. The TSO2 gene relates to cell division by controlling dNTP balance. All annotating GO terms are linked through the function of TSO2. TSO2 is also expressed in the organs named by the PO terms. Thus, this paper nicely links the PO terms and the GO terms.
• Cross-document inference: Document 9880378 indicates that the redox gene AtCB5-D is expressed at varying levels across plant tissues. Document 17028151 indicates that upon infection with Pseudomonas syringae, expression levels drop significantly in Arabidopsis leaves. This process is one aspect of a complex, genome-wide response to bacterial infection involving many genes.
• Inferred Triplet: Using doublets in document (18305484) we may infer that: “The plasma membrane protein SLAC1 is essential for stomatal closure in response to CO2, abscisic acid, ozone, light/dark transitions, humidity change, calcium ions, hydrogen peroxide and nitric oxide.” This is interesting because it describes a single protein that is involved in many responses to various environmental signals.

Summary
Using our triplets, we could identify connections between a specific research area and other fields of biology in under four weeks. It is also interesting to see how biologists’ genes of interest may function in concert to influence different bioprocesses. This serves well as the beginning of an exploration that may eventually lead to new hypotheses and discoveries.

Future Work
• Check inter-annotator agreement.
• Extract gene interaction sentences in the context of our annotation triplets.
• Develop algorithms to rank sentences by importance with this gold standard data.
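
The observation that triplets are represented by sentences to varying degrees, and the planned sentence ranking, can be illustrated with a minimal Python sketch. This is not the PattArAn implementation: the `Triplet` class, the `coverage` and `rank_sentences` functions, the plain substring matching, the PO label "flower", and the example sentences are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triplet:
    """A (gene, GO, PO) annotation triplet, stored as plain text labels."""
    gene: str
    go_term: str
    po_term: str


def coverage(sentence: str, triplet: Triplet) -> Tuple[str, ...]:
    """Return which triplet parts the sentence mentions.

    Assumption: a part counts as mentioned when its label occurs as a
    case-insensitive substring. Real text would need synonym handling.
    """
    text = sentence.lower()
    parts = (("gene", triplet.gene), ("GO", triplet.go_term), ("PO", triplet.po_term))
    return tuple(name for name, label in parts if label.lower() in text)


def rank_sentences(sentences: List[str], triplet: Triplet) -> List[Tuple[int, str]]:
    """Order sentences by how many triplet parts they mention (3 = full triplet,
    2 = doublet, 1 = single term, 0 = no match). Ties keep their input order."""
    scored = [(len(coverage(s, triplet)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical inputs: the labels and sentences below are made up for this sketch.
triplet = Triplet(gene="TSO2", go_term="cell division", po_term="flower")
sentences = [
    "TSO2 is expressed in the flower.",             # gene + PO (doublet)
    "TSO2 affects cell division in the flower.",    # gene + GO + PO (triplet)
    "Cell division requires balanced dNTP pools.",  # GO only
    "This sentence mentions none of the terms.",    # no match
]

for score, sentence in rank_sentences(sentences, triplet):
    print(score, sentence)
```

Counting matched parts is only the simplest possible score. A ranking trained on the gold standard data could weight doublets and full triplets differently.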