Mutation Grounding Algorithm
Jonas B. Laurila (1), Rajaraman Kanagasabai (2) and Christopher J. O. Baker (1)
(1) University of New Brunswick, Saint John. (2) Institute for Infocomm Research, Singapore.
April 13th, 2010
Motivation
Protein mutations are derived from in vitro experimental analysis, and their impacts are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics for which mutation grounding is a critical step. We present a method for grounding textual mentions, taken from scientific papers, that describe mutational changes made to proteins. We distinguish between grounding mutation entities to database entries and positionally correct grounding on amino acid sequences extracted from protein databases.

[Figure: Mutation Annotation Example (from within GATE)]

Conclusion
Automated reuse of mutation impact information from documents is now an achievable milestone, given the respectable performance of our grounding algorithm. In combination with mutation impact extraction from sentences, the mutation grounding algorithm will facilitate the construction of unique datasets suitable as training material for predicting the impacts of genomic variations and for extracting genotype-phenotype relations.

[Figure: Entity Recognition & Grounding Framework]

Grounding Workflow
• Retrieve protein and gene mentions.
• Retrieve all related accession numbers from MGDB and discard all but the most frequently occurring.
• Retrieve all organism mentions and discard accession numbers not related to the retrieved organisms.
• Retrieve all unique mutation mentions, normalize them with MutationFinder and try to fit as many as possible onto the sequences corresponding to the remaining accession numbers.
• The accession number and corresponding sequence onto which the most mutations are grounded is then considered the correct one for the entire document.
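The workflow steps can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lookup dictionaries stand in for MGDB, the regular expression stands in for MutationFinder output (mentions are assumed to be already normalized to the form wild-type residue, 1-based position, mutant residue), "most frequently occurring" is read as "highest mention count, ties kept", and all names, accession identifiers and sequences are hypothetical toy values.

```python
import re
from collections import Counter

# Assumed normalized form (stand-in for MutationFinder output), e.g. "E4K":
# wild-type residue, 1-based position, mutant residue.
POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def fits(mention, sequence):
    """Return True if the mention's wild-type residue is found at its position."""
    match = POINT_MUTATION.match(mention)
    if match is None:
        return False
    wild_type, position = match.group(1), int(match.group(2))
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type


def ground_document(protein_mentions, organism_mentions, mutation_mentions,
                    name_to_accessions, accession_to_organism,
                    accession_to_sequence):
    """Pick one accession number for a document and list the mutations grounded on it."""
    # Steps 1-2: count accession numbers over all protein/gene mentions and
    # keep only the most frequent ones (ties are kept; this is an assumption).
    counts = Counter(acc for name in protein_mentions
                     for acc in name_to_accessions.get(name, ()))
    if not counts:
        return None, []
    top = max(counts.values())
    candidates = [acc for acc, n in counts.items() if n == top]

    # Step 3: discard accession numbers whose organism is not mentioned.
    organisms = set(organism_mentions)
    candidates = [acc for acc in candidates
                  if accession_to_organism.get(acc) in organisms]

    # Step 4: try to fit every unique mutation onto each remaining sequence.
    unique_mutations = sorted(set(mutation_mentions))
    grounded = {acc: [m for m in unique_mutations
                      if fits(m, accession_to_sequence[acc])]
                for acc in candidates}
    if not grounded:
        return None, []

    # Step 5: the accession with the most grounded mutations wins.
    best = max(grounded, key=lambda acc: len(grounded[acc]))
    return best, grounded[best]


# Toy data: hypothetical names, accession identifiers and sequences.
name_to_accessions = {"geneA": ["ACC_HUMAN", "ACC_MOUSE"], "geneB": ["ACC_OTHER"]}
accession_to_organism = {"ACC_HUMAN": "human", "ACC_MOUSE": "mouse",
                         "ACC_OTHER": "human"}
accession_to_sequence = {"ACC_HUMAN": "MKTEHLA", "ACC_MOUSE": "MKTQHLV",
                         "ACC_OTHER": "MAAAA"}

accession, grounded = ground_document(
    ["geneA", "geneA", "geneB"],
    ["human", "mouse"],
    ["E4K", "H5R", "A7T", "W9C", "E4K"],
    name_to_accessions, accession_to_organism, accession_to_sequence)
print(accession, grounded)  # prints: ACC_HUMAN ['A7T', 'E4K', 'H5R']
```

In this toy run both the human and the mouse entries survive the frequency and organism filters, and the positional check decides between them: three mutations fit the human sequence, only one fits the mouse sequence, and "W9C" fits neither because position 9 lies beyond the end of both sequences.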
Grounding Algorithm Evaluation
To evaluate the mutation grounding method, a gold standard corpus was built using the COSMIC database. Three target proteins/genes were considered: PIK3CA, FGFR3 and MEN1. Full-text papers that contain more than one single point mutation and concern only a single gene were chosen, giving a total of 63 documents.

Acknowledgements
- New Brunswick Innovation Foundation
- NSERC Discovery Grant awards to Christopher J. O. Baker

[Figure: Performance]