Primig lab michael.primig@unibas.ch http://www.bioz.unibas.ch/primig Thomas Aust Roopa Basavaraj (visiting scientist) Michel Bellis (visiting scientist) Guenda Berthold Philippe Demougin Leandro Hermida Reinhold Koch Ulrich Schlecht Christa Wiederkehr Roland Zuest
Bioinformatics I -- Databases
Primig lab: Microarray Data
Schwede lab Torsten.schwede@unibas.ch http://www.bioz.unibas.ch/schwede Jozef Aerts Juergen Kopp Flavio Monigatti Franziska Roeder Rainer Poehlmann
SWISS-MODEL Protein Database
• What is a database?
• How do you make one?
• Biological Databases
• Knowledgebases
• Novel ideas…
More info at http://www.biozentrum.unibas.ch/personal/primig/ (follow the >>>teaching<<< link).
What is a database?
A database is a structured collection of data.
Data INPUT >>> Information OUTPUT
What is a relational database?
A relational database is a set of tables containing data belonging to defined categories: each table stores one kind of entity, each column one attribute, and each row one record.
Data INPUT >>> Information OUTPUT
How do you make one?
A relational database management system (RDBMS) lets you construct, update, and administer a relational database. An RDBMS takes Structured Query Language (SQL) statements entered by a user and creates, updates, or provides access to the database.
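The three kinds of SQL statement mentioned above (create, update, access) can be tried without installing a server by using SQLite through Python's built-in `sqlite3` module. SQLite is swapped in here only as a convenient stand-in for the RDBMSs listed on the next slide; the `person` table and its rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for an RDBMS such as MySQL or PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: a table whose columns are the defined data categories.
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)")

# Update: insert rows (with ? placeholders), then change one of them.
cur.executemany(
    "INSERT INTO person (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.org"), ("Ben", "ben@example.org")],
)
cur.execute("UPDATE person SET email = ? WHERE name = ?", ("ben@example.net", "Ben"))
conn.commit()

# Access: query the table.
rows = cur.execute("SELECT name, email FROM person ORDER BY name").fetchall()
print(rows)  # [('Ada', 'ada@example.org'), ('Ben', 'ben@example.net')]
```

The same SQL text would run, with minor dialect differences, on the server-based systems; only the connection step changes.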
RDBMS
• Open source: MySQL | PostgreSQL
• Commercial: IBM DB2 | Oracle
Accessing relational databases
You also need an interface, typically a web-based Graphical User Interface (GUI), written in a scripting language such as:
• PHP (recursive acronym for "PHP: Hypertext Preprocessor"), a widely used open-source general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.
• Perl, which is derived mostly from the C programming language. Perl's process, file, and text manipulation facilities make it particularly well suited for tasks such as database access, graphical programming, and World Wide Web programming.
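The slide names PHP and Perl. As a sketch of the same idea in Python (swapped in here; it is not one of the tools the slide mentions), a script can query a database and wrap the rows in HTML for a web page. The `person` table, its rows and the `rows_to_html` helper are hypothetical; `html.escape` keeps data such as `<` from being interpreted as markup.

```python
import html
import sqlite3


def rows_to_html(conn: sqlite3.Connection) -> str:
    """Render the person table as an HTML table, escaping every value."""
    cells = []
    for name, email in conn.execute("SELECT name, email FROM person ORDER BY name"):
        cells.append(f"<tr><td>{html.escape(name)}</td><td>{html.escape(email)}</td></tr>")
    return "<table>\n" + "\n".join(cells) + "\n</table>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ada", "ada@example.org"), ("Ben <lab 2>", "ben@example.org")],
)
page = rows_to_html(conn)
print(page)
```

A PHP page would do the same inside the HTML file itself; the principle (query, then format rows as markup) is unchanged.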
How do you make one?
• Database Model:
  • Analyse aims (submission/curation system)
  • Define entities = tables (user, submission)
  • Define attributes (name, phone, email)
  • Define relationships between entities (user makes submission)
  • Draw diagram
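Before any tables exist, the model can be written down as plain data types. A minimal sketch in Python, assuming one `User` can make many `Submission`s; the `user_id` field in `Submission` is what expresses the relationship. All names, values and the `submissions_by` helper are hypothetical.

```python
from dataclasses import dataclass


# Entity "user" with the attributes named on the slide.
@dataclass
class User:
    user_id: int
    name: str
    phone: str
    email: str


# Entity "submission"; its user_id attribute encodes the relationship
# "user makes submission" (one user, many submissions).
@dataclass
class Submission:
    submission_id: int
    title: str
    user_id: int


def submissions_by(user: User, submissions: list[Submission]) -> list[Submission]:
    """Follow the relationship from a user to the submissions they made."""
    return [s for s in submissions if s.user_id == user.user_id]


ada = User(1, "Ada", "+41 00 000 00 00", "ada@example.org")
subs = [Submission(10, "SPO11 annotation", 1), Submission(11, "DMC1 annotation", 2)]
print([s.title for s in submissions_by(ada, subs)])  # ['SPO11 annotation']
```

Each class later becomes a table and each field a column; the shared `user_id` becomes a foreign key.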
[Figure: submission/curation workflow diagram (Christa Wiederkehr). Labels: New, Assign Submission, Curate Submission, Revise, Assign Revision, Curate Revision, Delete Revision, Accepted, Rejected, Deleted, Publication, Author, Curator, GeO.]
• Orf: #orf_id, #nomenclature_id, #orf_name
• Submitstate: #submitstate_id, *submitstate
• Term: #term_id, *name, *term_type
• Termassign: #go_acc, *submission_id, *ontology
• User: #user_id, *name, *email, *login, *password, *lab_id, *user_role_id
• Submission: #submission_id, *title, *description, °submitstate_id, °orf_id, *user_id, °curator_id
• Reference: #reference_id, *title, *authors, *journal, *pubmed, °url_pdf, *submission_id
• User_role: #user_role_id, *user_role
• Comment: #comment_id, *text, *submission_id, *user_role_id
(# unique identifier, * mandatory attribute, ° optional attribute; diagram: Christa Wiederkehr)
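The diagram translates almost directly into SQL table definitions. The sketch below covers only the user-role, user, submission-state and submission entities and assumes the usual reading of the markers: `#` becomes a primary key, `*` a `NOT NULL` column and `°` a nullable column. Column types, the table name `app_user` (USER is a reserved word in some RDBMSs, e.g. PostgreSQL), the assumption that `curator_id` points at a user, and all inserted values are guesses for illustration, run here on SQLite via Python's `sqlite3`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_role (
    user_role_id INTEGER PRIMARY KEY,
    user_role    TEXT NOT NULL
);
CREATE TABLE app_user (                -- 'User' in the diagram
    user_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    email        TEXT NOT NULL,
    login        TEXT NOT NULL,
    password     TEXT NOT NULL,        -- a real system would store a hash
    lab_id       INTEGER NOT NULL,
    user_role_id INTEGER NOT NULL REFERENCES user_role(user_role_id)
);
CREATE TABLE submitstate (
    submitstate_id INTEGER PRIMARY KEY,
    submitstate    TEXT NOT NULL
);
CREATE TABLE submission (
    submission_id  INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    description    TEXT NOT NULL,
    submitstate_id INTEGER REFERENCES submitstate(submitstate_id),  -- optional
    orf_id         INTEGER,                                         -- optional
    user_id        INTEGER NOT NULL REFERENCES app_user(user_id),
    curator_id     INTEGER REFERENCES app_user(user_id)             -- optional, assumed
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when asked
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO user_role VALUES (?, ?)", [(1, "author"), (2, "curator")])
conn.executemany(
    "INSERT INTO app_user VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, "Ada", "ada@example.org", "ada", "<hash>", 1, 1),
     (2, "Cy", "cy@example.org", "cy", "<hash>", 1, 2)],
)
conn.execute("INSERT INTO submitstate VALUES (1, 'new')")
conn.execute("INSERT INTO submission VALUES (10, 'SPO11', 'Meiotic DSB formation', 1, NULL, 1, 2)")

# Join across the relationships: who submitted what, who curates it, and its state.
query = """
SELECT s.title, author.name, curator.name, st.submitstate
FROM submission s
JOIN app_user author ON author.user_id = s.user_id
LEFT JOIN app_user curator ON curator.user_id = s.curator_id
LEFT JOIN submitstate st ON st.submitstate_id = s.submitstate_id
"""
print(conn.execute(query).fetchall())  # [('SPO11', 'Ada', 'Cy', 'new')]
```

The `LEFT JOIN`s keep submissions whose optional columns are still empty, which matches the `°` attributes in the diagram.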
Biological Databases: DNA
DNA Sequence Data
• EBI: http://www.ebi.ac.uk/
• NCBI: http://www.ncbi.nlm.nih.gov/
• DDBJ: http://www.ddbj.nig.ac.jp/
Global data synchronization
EBI – EMBL Release 72 contains 18,324,246 sequence entries comprising 23,090,186,146 nucleotides.
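Dividing the two release figures gives a rough average entry length. This is simple arithmetic on the numbers above, not a statistic published by EMBL, and the average hides a wide spread between short and long entries:

```latex
\frac{23\,090\,186\,146\ \text{nucleotides}}{18\,324\,246\ \text{entries}} \approx 1\,260\ \text{nucleotides per entry}
```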
Biological Databases: DNA
DNA sequence data submission at http://www3.ebi.ac.uk/Services/webin/Sbm.cgi
Biological Databases: proteins
Protein Structure Data
Protein Data Bank (PDB) at http://www.rcsb.org/pdb/
Search 17,107 peptide, protein and virus structures.
Biological Databases: proteins
Protein structure data submission at http://deposit.pdb.org/adit/
Biological Databases: compounds
Small Molecules
• Klotho DB: Biochemical Compounds Declarative Database at http://www.biocheminfo.org/klotho/
• LIGAND DB at http://www.genome.ad.jp/kegg/catalog/compounds.html
Biological Databases: RNA
• Expression data - RNA
• Microarray data repositories:
  • Gene Expression Omnibus (NCBI) at http://www.ncbi.nlm.nih.gov/geo/
  • ArrayExpress (EBI) at http://www.ebi.ac.uk/arrayexpress/
• MIAME: Minimal Information About a Microarray Experiment
Biological Databases: RNA
• Expression data - RNA
• Expression data visualization:
  • Stanford Expression Connection at http://genome-www4.Stanford.EDU/cgi-bin/SGD/expression/expressionConnection.pl
  • GermOnline at http://germonline.org
  • RIKEN mouse at http://read.gsc.riken.go.jp/
Biological Databases: RNA
• Expression data - RNA
  • Yeast Cell Cycle at http://genome-www.stanford.edu/cellcycle
  • Human Cell Cycle at http://genome-www.stanford.edu/Human-CellCyle/Hela
  • Human & Mouse tissue profiling at http://expression.gnf.org
Biological Databases: proteins
• Post-translational data: protein-protein interaction in yeast
  • Biochemical studies:
    • Cellzome at http://yeast.cellzome.com
    • BIND at http://bind.mshri.on.ca
    • MDS Proteomics at http://www.mdsp.com
  • Two-hybrid studies:
    • Curagen’s PathCalling at http://portal.curagen.com
Access the data through http://germonline.bioz.unibas.ch and click on S. cerevisiae. Search for any gene, e.g. SPO11, and go to the Protein/Proteome Information section of the Locus Report page.
Biological Databases: literature
PubMed contains the abstracts of peer-reviewed publications in the field of biomedical research: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
Scientific journals are often available online (sometimes even for free): http://www.ub.unibas.ch/vlib/vbbiol.htm
Knowledgebases: a common language
The Gene Ontology project: http://www.geneontology.org
The objective of GO is to provide controlled vocabularies for the description of gene products. These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them. The three organizing principles of GO are molecular function, biological process and cellular component. A gene product has one or more molecular functions and is used in one or more biological processes; it may be, or may be associated with, one or more cellular components.
Knowledgebases: a common language
• The Gene Ontology evidence codes: http://www.geneontology.org/doc/GO.evidence.html
  • IC inferred by curator (no evidence but reasonable)
  • IDA inferred from direct assay (enzyme, EMSA)
  • IEA inferred from electronic annotation (BLAST hit)
  • IEP inferred from expression pattern (RNA, protein)
  • IGI inferred from genetic interaction (suppressors, synthetic lethals, complementation)
  • IMP inferred from mutant phenotype (deletion, insertion)
  • IPI inferred from physical interaction (co-IP, 2-hybrid)
  • ISS inferred from sequence or structural similarity (homolog)
  • NAS non-traceable author statement (quote cannot be found)
  • ND no biological data available
  • TAS traceable author statement
  • NR not recorded
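Because GO terms and evidence codes are shared identifiers, annotations from different databases can be pooled and queried the same way. A minimal Python sketch: the evidence-code table is copied from the slide and the three aspects follow GO's organizing principles, while the database names (`db_A`, `db_B`), gene names, annotation records and both helper functions are hypothetical. The GO identifiers are real terms (DNA binding, DNA repair, nucleus), but the annotations themselves are invented. Dropping IEA is shown as one possible filter when only non-electronic evidence is wanted, not as a rule.

```python
# Evidence codes as listed on the slide.
EVIDENCE = {
    "IC": "inferred by curator", "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation", "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction", "IMP": "inferred from mutant phenotype",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement", "ND": "no biological data available",
    "TAS": "traceable author statement", "NR": "not recorded",
}
ASPECTS = {"F": "molecular function", "P": "biological process", "C": "cellular component"}

# Invented records (source db, gene, GO term, aspect, evidence code); not real data.
ANNOTATIONS = [
    ("db_A", "geneX", "GO:0003677", "F", "IDA"),  # DNA binding
    ("db_A", "geneX", "GO:0006281", "P", "IEA"),  # DNA repair
    ("db_A", "geneX", "GO:0005634", "C", "IPI"),  # nucleus
    ("db_B", "geneY", "GO:0006281", "P", "IMP"),  # DNA repair
]


def by_aspect(annotations, exclude=("IEA",)):
    """Group GO terms per gene and aspect, skipping the excluded evidence codes."""
    table: dict[str, dict[str, list[str]]] = {}
    for _source, gene, go_id, aspect, code in annotations:
        if code not in EVIDENCE:
            raise ValueError(f"unknown evidence code {code!r}")
        if code in exclude:
            continue
        table.setdefault(gene, {}).setdefault(ASPECTS[aspect], []).append(go_id)
    return table


def genes_with(annotations, go_id):
    """One GO identifier retrieves matching genes from every source database."""
    return sorted({(source, gene) for source, gene, term, _, _ in annotations if term == go_id})


print(by_aspect(ANNOTATIONS))
print(genes_with(ANNOTATIONS, "GO:0006281"))  # [('db_A', 'geneX'), ('db_B', 'geneY')]
```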
Biological Databases: GO-based species-specific databases
• Annotation: covers knowledge from genetics, molecular biology and functional genomics
  • SGD for S. cerevisiae: http://genome-www.stanford.edu/Saccharomyces/
  • TAIR for A. thaliana: http://www.arabidopsis.org/
  • WormBase for C. elegans: http://www.wormbase.org
  • FlyBase for D. melanogaster: http://flybase.bio.indiana.edu/
  • Mouse Genome Database for M. musculus: http://www.informatics.jax.org
Knowledgebases: SWISS-PROT >>> UniProt
Release 40.31 of 25-Oct-2002 of SWISS-PROT contains 116,776 sequence entries, comprising 42,881,496 amino acids abstracted from 100,002 references.
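As with the EMBL figures, dividing the release totals gives a rough average protein length. This is arithmetic on the numbers above, not a figure from the release notes:

```latex
\frac{42\,881\,496\ \text{amino acids}}{116\,776\ \text{entries}} \approx 367\ \text{amino acids per entry}
```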
Knowledgebases: SWISS-PROT >>> UniProt
KEY FEATURES
• Minimal redundancy: data from different sources are merged; if conflicts exist between various sequencing reports, they are indicated in the feature table of the corresponding entry.
• Annotation:
  • Function(s) of the protein
  • Post-translational modification(s), for example carbohydrates, phosphorylation, acetylation, GPI-anchor, etc.
  • Domains and sites, for example calcium-binding regions, ATP-binding sites, zinc fingers, homeobox, kringle, etc.
  • Secondary structure
  • Quaternary structure, for example homodimer, heterotrimer, etc.
  • Similarities to other proteins
  • Disease(s) associated with deficiencies in the protein
  • Sequence conflicts, variants, etc.
• Integration: SWISS-PROT is currently linked to about 60 external databases (list at http://www.expasy.org/cgi-bin/lists?dbxref.txt).
Knowledgebases: SWISS-PROT >>> UniProt
In SWISS-PROT, information is given in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW). Most comments are classified by 'topics'; this approach permits the easy retrieval of specific categories of data from the database.
ID   SP11_YEAST     STANDARD;      PRT;   398 AA.
AC   P23179;
CC   -!- FUNCTION: REQUIRED FOR MEIOTIC RECOMBINATION. MEDIATES DNA
CC       CLEAVAGE THAT FORMS THE DOUBLE-STRAND BREAKS (DSB) THAT INITIATE
CC       MEIOTIC RECOMBINATION.
CC   -!- SUBCELLULAR LOCATION: Nuclear.
CC   -!- DEVELOPMENTAL STAGE: MEIOSIS-SPECIFIC.
CC   -!- SIMILARITY: BELONGS TO THE TOP6A FAMILY.
FT   ACT_SITE    135    135       DNA CLEAVAGE (PROBABLE).
FT   MUTAGEN     135    135       Y->F: LOSS OF ACTIVITY.
KW   Hydrolase; DNA-binding; Sporulation; Meiosis; Nuclear protein.
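Because every line starts with a two-letter code and each comment topic starts with `-!- TOPIC:`, a few lines of code can pull out one category of data. This is a minimal Python sketch that handles only the ID, AC, CC, FT and KW lines shown above, assuming the layout of a two-letter code followed by three spaces; it is not a complete parser of the format (Biopython's `Bio.SwissProt` module is an existing, fuller one). The `Entry` class and `parse_entry` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Subset of a SWISS-PROT flat-file entry (only the line types shown above)."""
    entry_name: str = ""
    accessions: list[str] = field(default_factory=list)
    comments: dict[str, str] = field(default_factory=dict)  # topic -> text
    features: list[tuple[str, int, int, str]] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def parse_entry(text: str) -> Entry:
    entry = Entry()
    topic = None       # CC topic currently being continued
    keyword_text = ""  # KW lines may span several lines; join before splitting
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "ID":
            entry.entry_name = data.split()[0]
        elif code == "AC":
            entry.accessions += [a.strip() for a in data.split(";") if a.strip()]
        elif code == "CC":
            if data.startswith("-!- "):
                # New topic line: "-!- TOPIC: free text"
                topic, _, body = data[4:].partition(": ")
                entry.comments[topic] = body
            elif topic is not None:
                # Continuation line of the current topic
                entry.comments[topic] += " " + data
        elif code == "FT":
            parts = data.split(None, 3)
            # Skip continuation lines or fuzzy positions (e.g. "<1"), not handled here.
            if len(parts) == 4 and parts[1].isdigit() and parts[2].isdigit():
                entry.features.append((parts[0], int(parts[1]), int(parts[2]), parts[3]))
        elif code == "KW":
            keyword_text += " " + data
    entry.keywords = [k.strip() for k in keyword_text.rstrip(". ").split(";") if k.strip()]
    return entry


SAMPLE = """\
ID   SP11_YEAST     STANDARD;      PRT;   398 AA.
AC   P23179;
CC   -!- FUNCTION: REQUIRED FOR MEIOTIC RECOMBINATION. MEDIATES DNA
CC       CLEAVAGE THAT FORMS THE DOUBLE-STRAND BREAKS (DSB) THAT INITIATE
CC       MEIOTIC RECOMBINATION.
CC   -!- SUBCELLULAR LOCATION: Nuclear.
CC   -!- DEVELOPMENTAL STAGE: MEIOSIS-SPECIFIC.
CC   -!- SIMILARITY: BELONGS TO THE TOP6A FAMILY.
FT   ACT_SITE    135    135       DNA CLEAVAGE (PROBABLE).
FT   MUTAGEN     135    135       Y->F: LOSS OF ACTIVITY.
KW   Hydrolase; DNA-binding; Sporulation; Meiosis; Nuclear protein.
"""

entry = parse_entry(SAMPLE)
print(entry.comments["DEVELOPMENTAL STAGE"])  # MEIOSIS-SPECIFIC.
```

Retrieving one topic is then a dictionary lookup, which is the "easy retrieval of specific categories" the slide describes.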
Novel ideas…
A database that contains large-scale automatic structure predictions: the SWISS-MODEL Repository. Models from the SWISS-MODEL server and from non-curated external sources will be available.
Novel ideas…
The SWISS-MODEL server at http://www.expasy.org/swissmod/ is an automated modelling system that serves all scientists as a tool to study the putative 3D structure of a protein using comparative modelling.
Novel ideas…
The GermOnline server at http://germonline.bioz.unibas.ch (also http://germonline.org) is a platform for online submission and curation. It enables scientists who work in the field of meiosis and gametogenesis to create, update and curate a knowledgebase that uses a controlled vocabulary (GO) and free text to describe the roles of genes in sexual reproduction.
Major DB info
• EBI: http://www.ebi.ac.uk/Databases
• Nucleic Acids Res. 2002, database issue: http://nar.oupjournals.org/content/vol30/issue1/
• GermOnline: http://germonline.unibas.ch
• Primig lab: http://www.bioz.unibas.ch/personal/primig/ (follow the teaching link to check out literature and info, and to download the ppt presentation on databases)
• Life Sciences Training Facility: http://www.bioz.unibas.ch/corelab (you will find more links on bioinformatics)
Bioinformatics I -- Projects
We would like to collaborate with you on our ongoing GermOnline project. You will be asked to use online sources (species-specific and general knowledgebases, PubMed) to collect information about the genomes of S. pombe, A. thaliana, C. elegans, D. melanogaster, M. musculus and H. sapiens. This information should be presented in a concise paragraph like the one written by Peter Philippsen for the genome of S. cerevisiae (click on S. cerevisiae and follow the "more" link in the Genome Information section). You should include two complete references. Furthermore, we ask you to search for knowledge about a list of conserved genes important for meiosis and gametogenesis. You are asked to identify the homologs and orthologs of the yeast genes DMC1, MLH3, MRE11, MSH4, MSH5 and SPO11 and to provide curated information about them. Your search should include literature, knowledgebases and protein structures. More info at http://www.biozentrum.unibas.ch/personal/primig/teaching/bioinfo_I_literature.html
The information you provide will be integrated into GermOnline by Ulrich Schlecht. You will be credited for your contribution. The results you produce will be recorded and (if everything works out) will count toward the exam. We look forward to getting your feedback.