Basic reading, writing and informatics skills for biomedical research
Segment 4. Other types of database and browser
Ganesha Associates
Biological databases
• A database is an indexed collection of information.
• Some databases contain mainly text, but others contain image, sequence or structural data.
• A browser is a means of visualising this information and the relationships between data elements.
• There is a growing amount of information in publicly available databases.
• For example, in 2011 the Nucleic Acids Research online Molecular Biology Database Collection listed 1380 databases.
• The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.
• Wikipedia also contains a list of biological databases.
• Which databases are relevant to your project?
Data, data everywhere…
• “Rapid release of prepublication data has served the field of genomics well.”
• “With close to one million gene-expression data sets now in publicly accessible repositories, researchers can identify disease trends without ever having to enter a laboratory.”
• “Most researchers agree that open access to data is the scientific ideal, so what is stopping it happening [in other fields]?”
• “Earth scientists need better incentives, rewards and mechanisms to achieve free and open data exchange.”
The database problem
• Volume of digital data (both high-throughput and text)
• One second of HD video = 2000 pages of text
• Distributed systems and databases, lack of data standards, incompatible data formats
• Costs of creation, curation and maintenance
• Retrieval: semantic search, metadata, images…
The problem – biomedical research
[Figure: diagram of databases and data types. Labels: ExPASy, SwissProt, PDB, Gene Expression Warehouse, ExPASy Enzyme, OMIM, Enzyme, Disease, Protein, LocusLink, Affy Fragment, Known Gene, MGD, Sequence, SPAD, Pathway, SNP, Metabolite, Sequence Cluster, NCBI dbSNP, GenBank, KEGG, NMR, UniGene]
Cross-database search today – NCBI
The problem – biomedical research
The problem – biomedical research
The problem – healthcare
The problem – healthcare
Journal of the American Medical Association (JAMA), Vol 284, No 4, July 26th 2000
• 12,000 deaths/year from unnecessary surgery
• 7,000 deaths/year from medication errors in hospitals
• 20,000 deaths/year from other errors in hospitals
• 80,000 deaths/year from infections in hospitals
• 106,000 deaths/year from non-error, adverse effects of medications
These total 225,000 deaths per year in the US from iatrogenic causes, which makes them the third leading cause of death. "Iatrogenic" describes harm that results directly from medical treatment, whether through misdiagnosis of the ailment or through adverse reactions to the drugs used to treat the illness (drug reactions are the most common cause).
The problem – healthcare
• 17-year innovation adoption curve from discovery to accepted standards of practice
• Even if a standard is accepted, patients have a 50:50 chance of receiving appropriate care, and a 5-10% probability of incurring a preventable, anticipatable adverse event
• Medical literature doubling every 19 years
• Doubles every 22 months for AIDS care
• 2 million facts needed to practice
• Genomics and personalized medicine will increase the problem exponentially
• A typical drug order today with decision support accounts for, at best: age, weight, height, labs, other active meds, allergies, diagnoses
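To make the two doubling times on this slide comparable, they can be converted into annual growth rates. The sketch below assumes steady exponential growth, which the slide does not state; the function names are illustrative.

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Fractional growth per year implied by a constant doubling time."""
    return 2 ** (1 / doubling_time_years) - 1


def growth_factor(doubling_time_years: float, elapsed_years: float) -> float:
    """How many times larger the literature is after elapsed_years."""
    return 2 ** (elapsed_years / doubling_time_years)


# Doubling times taken from the slide; steady exponential growth is assumed.
general = annual_growth_rate(19)        # doubling every 19 years
aids = annual_growth_rate(22 / 12)      # doubling every 22 months

print(f"General medical literature: {general:.1%} per year")
print(f"AIDS care literature:       {aids:.1%} per year")
print(f"Growth over 19 years: x{growth_factor(19, 19):.0f} "
      f"versus x{growth_factor(22 / 12, 19):.0f}")
```

Under that assumption, a 22-month doubling time implies growth of roughly 46% per year, against under 4% per year for a 19-year doubling time.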
So how will we find things in databases?
• A search engine collects, indexes, parses, and stores data to facilitate fast and accurate information retrieval.
• Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics (statistics), informatics, physics and computer science.
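The collect, parse, index and retrieve steps can be sketched with a small inverted index. This is a minimal illustration, not how any particular search engine is built; the documents and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parse text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini-collection standing in for database records
documents = {
    "doc1": "Mitochondrial P450 has monooxygenase activity",
    "doc2": "Gene expression data sets in public repositories",
    "doc3": "Monooxygenase gene expression in the mitochondrial inner membrane",
}
index = build_index(documents)
print(sorted(search(index, "mitochondrial monooxygenase")))
```

Because the index is built once and each lookup is a set intersection, retrieval stays fast as the collection grows. Accuracy is a separate matter: this sketch matches exact words only.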
Semantic levels
The Gene Ontology organisation
• The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.
• These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them.
• The controlled vocabularies of terms are structured to allow both attribution and querying to be at different levels of granularity.
• http://www.geneontology.org
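Querying at different levels of granularity can be sketched as follows: a gene product annotated with a specific term is also returned by a query for any broader term. The hierarchy and the gene names below are a simplified, hypothetical example and are not the actual GO graph.

```python
# Toy hierarchy (child term -> broader parent terms). Illustrative only.
PARENTS = {
    "mitochondrial inner membrane": ["mitochondrial membrane"],
    "mitochondrial membrane": ["membrane"],
    "membrane": ["cellular component"],
    "cellular component": [],
}

# Hypothetical annotations: gene product -> terms attributed to it
ANNOTATIONS = {
    "geneA": {"mitochondrial inner membrane"},
    "geneB": {"mitochondrial membrane"},
    "geneC": {"membrane"},
}


def ancestors(term):
    """Return the term itself plus every broader term reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def genes_annotated_to(query_term):
    """Gene products annotated to query_term or to any more specific term."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query_term in ancestors(term) for term in terms)
    )


print(genes_annotated_to("membrane"))
print(genes_annotated_to("mitochondrial inner membrane"))
```

A broad query such as "membrane" returns all three gene products, while the most specific term returns only the one annotated at that level.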
An example of annotation
Mitochondrial P450 (CC24 PR01238; MITP450CC24)
GO cellular component term: mitochondrial inner membrane; GO:0005743
GO molecular function term: monooxygenase activity; GO:0004497
GO biological process term: electron transport; GO:0006118
Microarray data analysis with GO
[Figure: microarray results grouped by GO term. Axis and condition labels: time, control, attacked. Group labels: Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
GoPubMed
• GoPubMed is a knowledge-based search engine for biomedical texts. The Gene Ontology (GO) and Medical Subject Headings (MeSH) serve as a "table of contents" to structure the millions of articles in the MEDLINE database.
• GoPubMed is one of the first Web 2.0 search engines.
• The system was developed at the Technical University of Dresden by Michael Schroeder and his team, and at Transinsight.
• http://www.gopubmed.org
Medline Cognition
Cognition's Semantic NLP understands:
• Word stems: the roots of words
• Words and phrases, with the individual meanings of ambiguous words and phrases listed out
• The morphological properties of each word or phrase, e.g. what type of plural it takes, what type of past tense, and how it combines with affixes like "re" and "ation"
• How to disambiguate word senses, which allows Cognition's technology to pick the correct meaning of ambiguous words in context
• The synonym relations between word meanings
• The ontological relations between word meanings; one can think of this as a hierarchical grouping of meanings or a gigantic "family tree of English" with mothers, daughters, and cousins
• The syntactic and semantic properties of words. This is particularly useful with verbs: Cognition encodes the types of objects that different verb meanings can occur with.
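Two of the ideas listed above, word stems and synonym relations, can be illustrated with a small query-expansion sketch. This is not Cognition's technology: the suffix list is deliberately crude, the synonym table is a hypothetical two-word example, and word-sense disambiguation is not attempted.

```python
# Crude suffix list for illustration; real stemmers are far more careful.
SUFFIXES = ("ation", "ing", "ed", "es", "s")

# Hypothetical synonym sets, keyed by stem
SYNONYMS = {
    "tumor": {"tumor", "neoplasm"},
    "neoplasm": {"tumor", "neoplasm"},
}


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query):
    """Replace each query word by its stem plus any known synonyms."""
    expanded = set()
    for word in query.split():
        root = stem(word)
        expanded |= SYNONYMS.get(root, {root})
    return expanded


def matches(query, text):
    """True if any expanded query term equals a stem found in the text."""
    text_stems = {stem(word.strip(".,;:()")) for word in text.split()}
    return bool(expand_query(query) & text_stems)


print(expand_query("tumors"))
print(matches("tumors", "Benign neoplasms were observed."))
```

With stemming and synonyms combined, a search for "tumors" can retrieve a sentence that only mentions "neoplasms", which exact word matching would miss.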
iHOP
Information Hyperlinked over Proteins. iHOP provides the network of genes and proteins as a natural way of accessing the millions of abstracts in PubMed.
iHOP
• The minimal information view contains general information, such as the symbol, name and organism of a gene. It also provides:
  • Useful links to external resources (e.g. UniProt, NCBI, OMIM)
  • Links to other iHOP views on this gene
  • Homologues
• Other views contain all sentences found in the literature:
  • That mention the main gene of a page together with other genes (gene B) with which it interacts
  • That mention the main gene together with relevant biomedical terms such as lymphoma
• Sentences are ranked by significance, so that screening a few sentences is usually sufficient to gain an idea of a gene's function.
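The sentence views described above can be sketched as a co-mention filter: keep sentences that name the main gene together with another gene or a biomedical term, then sort them. The slide does not say how iHOP computes significance, so the score here (a simple count of co-mentioned items) is a placeholder assumption, and the gene names and abstracts are invented for the example.

```python
import re


def split_sentences(abstract):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [part.strip() for part in parts if part.strip()]


def co_mentions(abstracts, main_gene, other_genes, keywords=()):
    """Return (score, sentence, partner genes, terms), highest score first."""
    hits = []
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            words = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
            if main_gene.lower() not in words:
                continue
            partners = [g for g in other_genes if g.lower() in words]
            terms = [k for k in keywords if k.lower() in words]
            if partners or terms:
                # Placeholder score: number of co-mentioned genes and terms
                score = len(partners) + len(terms)
                hits.append((score, sentence, partners, terms))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Invented abstracts with placeholder gene names
abstracts = [
    "GeneA binds GeneB in vitro. GeneB is widely expressed.",
    "GeneA and GeneB are both altered in lymphoma. GeneA was also sequenced.",
]
for score, sentence, partners, terms in co_mentions(
    abstracts, "GeneA", ["GeneB"], ["lymphoma"]
):
    print(score, sentence)
```

Sentences that mention only one gene, or only the partner gene, are dropped, so the reader sees the few sentences most likely to describe a relationship.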
GenMAPP
• GenMAPP is a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes.
• Integrated with GenMAPP are programs to perform a global analysis of gene expression or genomic data in the context of hundreds of pathway MAPPs and thousands of Gene Ontology terms.
Automatic rendering of pathway interactions
Other ways to search – BLAST, PubChem, UCSC Genome Browser
By sequence – BLAST:
>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTGGGGCTGGGGGGGGGCG
By structure – PubChem:
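Searching by sequence starts from a FASTA record like the one above. The sketch below parses such a record and finds short exact word matches ("seeds") between a query and a subject sequence, which is the first step of BLAST-style heuristics. It is a simplification, not BLAST itself: there is no extension, scoring or statistics. Only the first 40 nucleotides of the slide's sequence are used, and the subject sequence is constructed for the example.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    return lines[0][1:], "".join(lines[1:]).upper()


def kmer_index(sequence, k):
    """Map every k-letter word in the sequence to its start positions."""
    index = {}
    for i in range(len(sequence) - k + 1):
        index.setdefault(sequence[i:i + k], []).append(i)
    return index


def seed_hits(query, subject, k=11):
    """List (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


record = """>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGA
"""
header, query = parse_fasta(record)

# Constructed subject: 20 nt copied from the query, flanked by filler bases
subject = "TTTT" + query[10:30] + "CCCC"

hits = seed_hits(query, subject, k=11)
diagonals = {q - s for q, s in hits}
print(header)
print(len(hits), "seed hits on diagonals", diagonals)
```

Seeds that share the same diagonal (query position minus subject position) belong to one ungapped matching region, which is the kind of region a real aligner would then try to extend.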
Example of BLAST search results
PC Compound Record
UCSC Genome Browser
• The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide.
• The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways.
• Blat quickly maps your sequence to the genome.
• The Table Browser provides convenient access to the underlying database.
• VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns.
• Genome Graphs allows you to upload and display genome-wide data sets.