Workshop on Scholarly Databases

Miguel Andrade, Ottawa Health Research Institute
August 30, 2006
Indiana University, Bloomington, IN

Introduction
• Miguel Andrade
• Ottawa Health Research Institute
• Discuss collaborative projects to distribute the annotation of scientific databases among scientists.

Data
• Data description: MEDLINE
• Purpose: describe scientific literature
• Audience: scientists, medical doctors, general public

Data Statistics
• Size: 16M entries
• Data fields: authors, publication journal, abstract, keywords, …
• Years covered: exhaustive since the 1960s (I believe NLM is working backwards to add earlier years)
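The fields listed on this slide could be held in a small record type. What follows is a minimal sketch in Python, not the NLM schema: the class name `MedlineEntry`, its attribute names, the `pmid` and `year` fields, and the helper `entries_by_decade` are all hypothetical, chosen only to mirror the fields named above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class MedlineEntry:
    """Hypothetical in-memory record mirroring the fields named on the slide."""
    pmid: int                       # assumed identifier; not listed on the slide
    authors: List[str]
    journal: str
    year: int                       # assumed field, used to inspect coverage
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)


def entries_by_decade(entries: Iterable[MedlineEntry]) -> Dict[int, int]:
    """Count entries per decade, e.g. to see how far back coverage reaches."""
    counts: Dict[int, int] = {}
    for entry in entries:
        decade = (entry.year // 10) * 10
        counts[decade] = counts.get(decade, 0) + 1
    return counts
```

A per-decade count of this kind is one simple way to check a claim such as "exhaustive since the 1960s" against a local copy of the data.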

Data Formats
• Data formats / types: what type of data is stored and in what format?
Working with all kinds of biomedical data: sequences, text, diseases, functional annotations, gene expression, …

Data Example

Data Management
• Database technology: MySQL or basic scripts
• Storage technology: backup automated to tapes
• Backup strategy: daily, managed by the local IT group. We tested them ;-)

Organization
• Partners: largely on our own, with some collaborations with other groups (Jonathan Wren, Oklahoma University; Peer Bork, EMBL)
• Funding / ownership: no funding for text mining projects.

Integration Challenges
• What data integration issues are you facing/addressing in your research?
Policy: a data embargo is needed when there are sensitive issues. Default applied. Not applicable to MEDLINE (academic license from NLM).
Technical: the main problem is data updates and inconsistencies.
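One way to make the update problem concrete is to compare two releases of the same identifier-to-annotation mapping and list what was added, removed, or changed. The sketch below is illustrative only and is not the group's actual pipeline; the function name `diff_snapshots` and the dictionary layout are assumptions.

```python
from typing import Dict, List


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two releases of a hypothetical identifier -> annotation mapping.

    Returns the identifiers that were added, removed, or whose annotation
    changed between the old and the new release.
    """
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    changed = sorted(key for key in old if key in new and old[key] != new[key])
    return {"added": added, "removed": removed, "changed": changed}


def changed_fraction(old: Dict[str, str], new: Dict[str, str]) -> float:
    """Fraction of the old release's identifiers that changed or disappeared."""
    if not old:
        return 0.0
    report = diff_snapshots(old, new)
    return (len(report["changed"]) + len(report["removed"])) / len(old)
```

Running a comparison like this after every update gives a simple measure of how unstable the upstream annotations are.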

Integration Solutions
• What data integration solutions do you use/have you developed?
We have tried several data and text mining strategies. We rely heavily on the links between entries across different databases.
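Relying on links between entries amounts to joining databases on their cross-reference identifiers and keeping track of links that point nowhere. The following is a minimal sketch under assumed data shapes, not a description of the tools used here; `resolve_links` and both dictionary layouts are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_links(
    source_links: Dict[str, List[str]],
    target_records: Dict[str, dict],
) -> Tuple[Dict[str, List[dict]], List[Tuple[str, str]]]:
    """Follow cross-references from one database into another.

    source_links:   {source_id: [target_id, ...]}  (assumed layout)
    target_records: {target_id: record}            (assumed layout)

    Returns the resolved records per source entry, plus the dangling
    (source_id, target_id) pairs whose target is missing.
    """
    resolved: Dict[str, List[dict]] = {}
    dangling: List[Tuple[str, str]] = []
    for source_id, target_ids in source_links.items():
        for target_id in target_ids:
            record = target_records.get(target_id)
            if record is None:
                dangling.append((source_id, target_id))
            else:
                resolved.setdefault(source_id, []).append(record)
    return resolved, dangling
```

The list of dangling pairs is often the more useful output, since it points at entries that were updated or withdrawn in one database but not the other.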

Research References
• References/links/pointers to relevant work, papers, and efforts.
Muro et al. 2006. Amplification of the Gene Ontology annotation of Affymetrix probe sets. BMC Bioinformatics. 7, 159.
Perez-Iratxeta et al. 2005. Inconsistencies over time in 5% of NetAffx probe-to-gene annotations. BMC Bioinformatics. 6, 183.
Perez-Iratxeta et al. 2005. Study of stem cell function using microarray experiments. FEBS Letters. 579, 1795-1801.
Suomela and Andrade. 2005. Ranking the whole MEDLINE database according to a large training set using text indexing. BMC Bioinformatics. 6, 75.
Netzel et al. 2003. Country-specific variations of English in the scientific literature. EMBO Reports. 4, 446-451.
Perez-Iratxeta and Andrade. 2002. Worldwide scientific publishing activity. Science. 297, 519.
Perez-Iratxeta et al. 2002. Association of genes to genetically inherited diseases using data mining. Nature Genetics. 31, 316-319.
Perez-Iratxeta et al. 2001. XplorMed: A tool for exploring MEDLINE abstracts. Trends Biochem Sci. 26, 573-575.

Research References
• Links
• StemBase: http://www.scgp.ca:8080/StemBase/ A database of stem cell related gene expression data.
• XplorMed: http://www.ogic.ca/projects/xplormed/ A tool for the analysis of bibliographic searches in MEDLINE.
• G2D: http://www.ogic.ca/projects/g2d_2/ A tool to predict genes associated with inherited disease.
• MarkerServer: http://www.ogic.ca/projects/markerserver/enter.php Detection of markers in large collections of gene expression data.
• Kfinder: http://www.ogic.ca/projects/kfinder/ A server for text analysis to support scientific publishing.
