Miguel Andrade, Ottawa Health Research Institute. August 30, 2006. Indiana University, Bloomington, IN. Workshop on Scholarly Databases.
Workshop on Scholarly Databases
Miguel Andrade, Ottawa Health Research Institute
August 30, 2006, Indiana University, Bloomington, IN
Introduction
• Miguel Andrade
• Ottawa Health Research Institute
• Discuss collaborative projects to distribute the annotation of scientific databases amongst scientists.
Data
• Data description: MEDLINE
• Purpose: describe scientific literature
• Audience: scientists, medical doctors, general public
Data Statistics
• Data statistics: 16M entries
• Data fields: authors, publication journal, abstract, keywords, …
• Years covered: exhaustive since the 1960s (I believe earlier years are being added, working backwards)
Data Formats
• Data formats / types: what type of data is stored and in what format?
We work with all kinds of biomedical data: sequences, text, diseases, functional annotations, gene expression, …
Data Example (four slides; only the slide title survives in this transcript)
Data Management
• Database Technology – MySQL or basic scripts
• Storage Technology – Automated backup to tape
• Backup Strategy – Daily. Managed by the local IT group. We tested them ;-)
Organization
• Partners – Largely on our own. Some collaborations with other groups (Jonathan Wren, Oklahoma University; Peer Bork, EMBL)
• Funding / Ownership – No funding for text mining projects.
Integration Challenges
• What data integration issues are you facing/addressing in your research?
Policy: A data embargo is needed when there are sensitive issues. Default applied. Not applicable to MEDLINE (academic license from NLM).
Technical: The main problem is data updates and inconsistencies.
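The update problem can be made concrete with a small sketch. Assuming two hypothetical snapshots of an annotation table held as plain dictionaries (the identifiers and the function name `diff_annotations` are invented for illustration and do not come from any database mentioned here), the following Python reports which entries were added, removed, or reassigned between releases.

```python
def diff_annotations(old, new):
    """Compare two snapshots that map an entry ID to its annotation."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    # Entries present in both releases whose annotation differs.
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of a probe-to-gene mapping (made-up identifiers).
release_1 = {"probe_A": "GENE1", "probe_B": "GENE2", "probe_C": "GENE3"}
release_2 = {"probe_A": "GENE1", "probe_B": "GENE9", "probe_D": "GENE4"}

report = diff_annotations(release_1, release_2)
print(report)
```

Running such a comparison on every update gives a simple measure of how much a source has drifted between releases.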
Integration Solutions
• What data integration solutions do you use/have you developed?
We have tried several data and text mining strategies. We rely a lot on the links between entries across different databases.
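As a rough illustration of relying on links between entries, the sketch below chains two hypothetical link tables (gene to article, article to keyword) to relate genes to keywords through the literature. The table contents and the helper name `follow_links` are assumptions made for the example, not a description of the group's actual tools.

```python
from collections import defaultdict


def follow_links(first_links, second_links):
    """Chain two link tables: A -> B and B -> C yield A -> set of C."""
    chained = defaultdict(set)
    for source, middles in first_links.items():
        for middle in middles:
            # Entries missing from the second table contribute nothing.
            chained[source].update(second_links.get(middle, ()))
    return dict(chained)


# Hypothetical cross-references between two databases (made-up identifiers).
gene_to_articles = {"GENE1": ["art1", "art2"], "GENE2": ["art3"]}
article_to_keywords = {
    "art1": ["apoptosis"],
    "art2": ["apoptosis", "stem cells"],
    "art3": [],
}

gene_to_keywords = follow_links(gene_to_articles, article_to_keywords)
print(gene_to_keywords)
```

Using sets for the chained targets removes duplicates that arise when several intermediate entries point to the same term.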
Research References
• References/links/pointers to relevant work, papers, and efforts.
Muro et al. 2006. Amplification of the Gene Ontology annotation of Affymetrix probe sets. BMC Bioinformatics. 7, 159.
Perez-Iratxeta et al. 2005. Inconsistencies over time in 5% of NetAffx probe-to-gene annotations. BMC Bioinformatics. 6, 183.
Perez-Iratxeta et al. 2005. Study of stem cell function using microarray experiments. FEBS Letters. 579, 1795-1801.
Suomela and Andrade. 2005. Ranking the whole MEDLINE database according to a large training set using text indexing. BMC Bioinformatics. 6, 75.
Netzel et al. 2003. Country-specific variations of English in the scientific literature. EMBO Reports. 4, 446-451.
Perez-Iratxeta and Andrade. 2002. Worldwide scientific publishing activity. Science. 297, 519.
Perez-Iratxeta et al. 2002. Association of genes to genetically inherited diseases using data mining. Nature Genetics. 31, 316-319.
Perez-Iratxeta et al. 2001. XplorMed: A tool for exploring MEDLINE abstracts. Trends Biochem Sci. 26, 573-575.
Research References
• Links
• StemBase: http://www.scgp.ca:8080/StemBase/ A database of stem cell related gene expression data.
• XplorMed: http://www.ogic.ca/projects/xplormed/ A tool for the analysis of bibliographic searches in MEDLINE.
• G2D: http://www.ogic.ca/projects/g2d_2/ A tool to predict genes associated with inherited disease.
• MarkerServer: http://www.ogic.ca/projects/markerserver/enter.php Detection of markers in large collections of gene expression data.
• Kfinder: http://www.ogic.ca/projects/kfinder/ A server for text analysis to support scientific publishing.