Bioinformatics. Databases. Expression Data. Nucleotide Sequences. Protein Sequences. Structural Data. Protein Domains. Metabolic Pathways. Disease Links. Computational Biology. Biology. Computer Science. Introduction/Brief History. Protein Database.
Bioinformatics
Databases: Expression Data, Nucleotide Sequences, Protein Sequences, Structural Data, Protein Domains, Metabolic Pathways, Disease Links
Computational Biology: Biology, Computer Science
Introduction/Brief History: Protein Database
1. In 1974, Margaret Dayhoff at the National Biomedical Research Foundation (NBRF) devised the concept of the protein family and superfamily, defined by sequence similarity, as a means of organizing and classifying proteins. The collection center became the Protein Information Resource (PIR). In 1988, PIR became PIR-International through a collaboration among NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID).
2. In 1986, the SWISS-PROT database was founded by Amos Bairoch of the Department of Medical Biochemistry at the University of Geneva. TrEMBL is a computer-annotated supplement to SWISS-PROT that contains translations of nucleotide sequences from EMBL. [http://us.expasy.org/sprot/] The data can be accessed through the Sequence Retrieval System (SRS). It is maintained at the Swiss Institute of Bioinformatics.
A set of matrices (tables) was devised to reflect percent accepted mutations (PAM); they reflect the probability of one amino acid mutating to another.
Introduction/Brief History PAM-250 Matrix
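The PAM idea from the previous slide can be sketched numerically. A PAM1 matrix holds substitution probabilities for one small evolutionary step, and a PAM-n matrix is PAM1 multiplied by itself n times. The sketch below uses a hypothetical three-letter alphabet with made-up probabilities (not Dayhoff's data), and it assumes rows are the original residue and columns the replacement; the names `TOY_PAM1`, `mat_mul` and `mat_pow` are illustrative only.

```python
# Toy illustration of how a PAM-250 matrix is derived from PAM1.
# The alphabet and all probabilities here are invented for the example.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

# Row i, column j: probability that residue i is replaced by residue j
# in one PAM unit. Each row sums to 1 (hypothetical values).
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)

if __name__ == "__main__":
    for row in toy_pam250:
        print(["%.3f" % p for p in row])
```

Published PAM-250 tables are usually shown as log-odds scores derived from such probabilities rather than as the raw probabilities themselves.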
Introduction/Brief History: DNA Database
1. DNA sequence databases were first assembled at Los Alamos National Laboratory (LANL), New Mexico, by Walter Goad and colleagues (GenBank) and at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. In 1979, Goad established the Los Alamos Sequence Database, which became GenBank in 1982. LANL collected GenBank data until 1992, when GenBank moved to the National Center for Biotechnology Information (NCBI). It can be accessed through Entrez.
2. In 1980, the EMBL database was founded. It is maintained by the European Bioinformatics Institute (EBI) in Hinxton, Cambridge, UK, and can be accessed through the SRS system.
3. In 1984, the DNA Data Bank of Japan (DDBJ) was founded in Mishima, Japan.
4. Other databases:
* UniGene [www.ncbi.nlm.nih.gov/UniGene/].
* Saccharomyces Genome Database (SGD) [www.stanford.edu/Saccharomyces/].
* EBI Genomes [www.ebi.ac.uk/genomes/].
* Genome Biology [www.ncbi.nlm.nih.gov/Genomes/].
Introduction/Brief History: Protein Motifs Database
Motifs are short sequences of amino acids that reflect a functional aspect of a protein. Motif databases also cover protein domains such as the ATP-binding cassette (ABC domain) or the kinase domain.
1. Protein Family database (Pfam). Founded in 1996 and maintained by a consortium of scientists including Erik Sonnhammer (CGB, KI, Sweden), Sean Eddy (WashU, St Louis, USA), and Richard Durbin, Alan Bateman and Ewan Birney (Sanger Centre, UK).
2. PROSITE. Developed by Amos Bairoch; it is part of SWISS-PROT.
Introduction/Brief History: Macromolecular 3D Structures Database
“Protein Data Bank (PDB). The primary database for 3D structures of biological molecules. Started in the 1970s at the Brookhaven Lab on Long Island, New York State, US. In 1999, the management was moved to the Research Collaboratory for Structural Bioinformatics (RCSB).”
“The SCOP (Structural Classification of Proteins) database was started by Alexey Murzin in 1994 (Lab of Molecular Biology, MRC, Cambridge, UK).”
“The CATH database (Class, Architecture, Topology, Homologous superfamily) was started by Christine Orengo in Janet Thornton's lab (University College London) in 1996.”
Introduction/Brief History Metabolic Pathways Database
[Figure: metabolic pathway map of a cell, showing membrane transporters for sugars, amino acids, peptides and ions around central metabolism (Glucose-6-P to Pyruvate and Acetyl-CoA), with the pentose phosphate pathway, amino acid synthesis, fatty acid synthesis, heme synthesis and the purine/pyrimidine salvage pathway.]
Organization of GenBank: Traditional Divisions
Records are divided into 17 divisions: 11 traditional and 6 bulk.
PRI (28) Primate
PLN (13) Plant and Fungal
BCT (11) Bacterial and Archaeal
INV (7) Invertebrate
ROD (15) Rodent
VRL (4) Viral
VRT (7) Other Vertebrate
MAM (1) Mammalian
PHG (1) Phage
SYN (1) Synthetic (cloning vectors)
UNA (1) Unannotated
Traditional divisions:
• Direct submissions (Sequin and BankIt)
• Accurate
• Well characterized
Entrez query: gbdiv_xxx[Properties]
From www.ncbi.nlm.nih.gov
Organization of GenBank: Bulk Divisions
Records are divided into 17 divisions: 11 traditional and 6 bulk.
EST (355) Expressed Sequence Tag
GSS (132) Genome Survey Sequence
HTG (62) High Throughput Genomic
STS (5) Sequence Tagged Site
HTC (6) High Throughput cDNA
PAT (17) Patent
Bulk divisions:
• Batch submission (email and FTP)
• Inaccurate
• Poorly characterized
Entrez query: gbdiv_xxx[Properties]
From www.ncbi.nlm.nih.gov
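The `gbdiv_xxx[Properties]` pattern on the two division slides can be turned into a concrete search term by substituting a division code for `xxx`. The sketch below only builds strings; it makes no network request. The E-utilities `esearch.fcgi` endpoint and the `nuccore` database name are assumptions that do not appear on the slides, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The 17 division codes listed on the slides.
TRADITIONAL = {"PRI", "PLN", "BCT", "INV", "ROD", "VRL", "VRT", "MAM", "PHG", "SYN", "UNA"}
BULK = {"EST", "GSS", "HTG", "STS", "HTC", "PAT"}

def division_query(code):
    """Build an Entrez term of the form gbdiv_xxx[Properties]."""
    code = code.upper()
    if code not in TRADITIONAL | BULK:
        raise ValueError("unknown GenBank division: %s" % code)
    return "gbdiv_%s[Properties]" % code.lower()

def esearch_url(term, db="nuccore"):
    """Return an ESearch URL for the term (string only, nothing is fetched)."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term})

if __name__ == "__main__":
    print(division_query("PLN"))
    print(esearch_url(division_query("EST")))
```

The same term can be typed directly into the Entrez search box, optionally combined with other fields using AND.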
Other NCBI Databases
• dbSNP: nucleotide polymorphism
• GEO: Gene Expression Omnibus; microarray and other expression data
• Gene: gene records; unifies LocusLink and Microbial Genomes
• Structure: imported structures (PDB); Cn3D viewer, NCBI curation
• CDD: conserved domain database; protein families (COGs), single domains (Pfam, SMART, CD)
From www.ncbi.nlm.nih.gov
BLAST: Sequence Similarity Searches
Paste your protein sequence here
From www.ncbi.nlm.nih.gov
File Formats of the Sequence Databases
Each sequence is represented by a text record called a flat file.
• GenBank/GenPept (useful for scientists)
• FASTA (the simplest format)
• ASN.1 & XML (useful for programmers)
From www.ncbi.nlm.nih.gov
A Traditional GenBank Record: The Flatfile Format
Header
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
Feature Table
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
Sequence
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
//
From www.ncbi.nlm.nih.gov
The Header
The header runs from the LOCUS line through the COMMENT line of the record above: LOCUS, DEFINITION, ACCESSION, VERSION, KEYWORDS, SOURCE/ORGANISM, REFERENCE and COMMENT.
From www.ncbi.nlm.nih.gov
Header: Locus Line
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
• Locus name: AY182241
• Length: 1931 bp
• Molecule type: mRNA
• Division: PLN
• Modification date: 04-MAY-2004
From www.ncbi.nlm.nih.gov
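The fields labelled on the Locus Line slide can be pulled out of the line programmatically. This is a minimal sketch that splits on whitespace and assumes the eight-token layout of this particular record; it is not a general GenBank parser, since other records can have a different set of tokens. The function name `parse_locus_line` is hypothetical.

```python
def parse_locus_line(line):
    """Split a LOCUS line laid out like the AY182241 example into named fields."""
    tokens = line.split()
    # Expected tokens: LOCUS, name, length, "bp", molecule, topology, division, date
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("LOCUS line does not match the assumed layout")
    return {
        "locus_name": tokens[1],
        "length_bp": int(tokens[2]),
        "molecule_type": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "modification_date": tokens[7],
    }

LOCUS_LINE = "LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004"

if __name__ == "__main__":
    for key, value in parse_locus_line(LOCUS_LINE).items():
        print(key, "=", value)
```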
Header: Database Identifiers
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
• Accession: stable, reportable, universal
• Version: tracks changes in sequence
• GI number: NCBI internal use
From www.ncbi.nlm.nih.gov
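The three identifiers on this slide are all carried by the VERSION line, so they can be separated with a few string operations. The sketch below assumes the `ACCESSION.VERSION  GI:number` layout shown in this record; the function name `parse_version_line` is hypothetical.

```python
def parse_version_line(line):
    """Return (accession, version, gi) from a line such as
    'VERSION     AY182241.2  GI:32265057'. gi is None if no GI token is present."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "VERSION":
        raise ValueError("not a VERSION line")
    accession, dot, version = tokens[1].partition(".")
    if not dot or not version.isdigit():
        raise ValueError("expected ACCESSION.VERSION")
    gi = None
    for token in tokens[2:]:
        if token.startswith("GI:"):
            gi = int(token[3:])
    return accession, int(version), gi

if __name__ == "__main__":
    print(parse_version_line("VERSION     AY182241.2  GI:32265057"))
```

The accession stays the same when a sequence is updated, while the version suffix and the GI number change, which is why the COMMENT line of the record mentions a replaced gi.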
Header: Organism
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
The lineage follows the NCBI-controlled taxonomy.
From www.ncbi.nlm.nih.gov
The Feature Table
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS
                     LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW
                     ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS
                     EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT
                     KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
• Coding sequence: CDS 54..1784, from the start codon (atg) to the stop codon (tag)
• GenPept identifiers: /protein_id="AAO22848.2" and /db_xref="GI:32265058"
• Implied protein: /translation
From www.ncbi.nlm.nih.gov
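The CDS location `54..1784` uses 1-based, inclusive coordinates, which is easy to get wrong when slicing a sequence in code. The sketch below handles only simple `start..end` locations (no joins, complements or partial markers) and assumes the final codon is the stop codon, as the slide's "stop (tag)" label indicates. All function names are hypothetical.

```python
def parse_simple_location(location):
    """Parse a plain 'start..end' feature location into two integers."""
    start_text, sep, end_text = location.partition("..")
    if not sep:
        raise ValueError("only simple start..end locations are handled")
    return int(start_text), int(end_text)

def cds_summary(location):
    """Length arithmetic for a CDS, assuming the last codon is a stop codon."""
    start, end = parse_simple_location(location)
    length_nt = end - start + 1          # inclusive coordinates
    if length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length_nt // 3
    return {"start": start, "end": end, "length_nt": length_nt,
            "codons": codons, "protein_length": codons - 1}

def slice_1based(sequence, start, end):
    """Return sequence[start..end] using 1-based inclusive coordinates."""
    return sequence[start - 1:end]

# First 60 bases of the ORIGIN block of AY182241, spaces removed.
FIRST_60 = "ttcttgtatcccaaacatctcgagcttcttgtacaccaaattaggtattcactatggaat"

if __name__ == "__main__":
    print(cds_summary("54..1784"))
    print(slice_1based(FIRST_60, 54, 56))  # the start codon
```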
The Sequence: 99.99% Accurate
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
     1741 ggacccacat cctgtcttta ctattccaac ctcttgtaaa ctagtactca tatagtttga
     1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt
     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa
     1921 aaaaaaaaaa a
//
From www.ncbi.nlm.nih.gov
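The ORIGIN block stores the sequence in numbered lines of six ten-base groups, so recovering the plain sequence means dropping the leading position number and the spaces. A minimal sketch, assuming the layout shown on this slide; the function name `origin_to_sequence` is hypothetical.

```python
def origin_to_sequence(lines):
    """Join the base groups of ORIGIN lines, skipping the leading position
    number on each line, the ORIGIN keyword and the '//' terminator."""
    groups = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] in ("ORIGIN", "//"):
            continue
        if not tokens[0].isdigit():
            raise ValueError("sequence line should start with a position number")
        groups.extend(tokens[1:])
    return "".join(groups)

# The last two sequence lines of AY182241 as shown on the slide.
TAIL_LINES = [
    "ORIGIN",
    "     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa",
    "     1921 aaaaaaaaaa a",
    "//",
]

if __name__ == "__main__":
    tail = origin_to_sequence(TAIL_LINES)
    # Line 1861 starts at base 1861, so the last base is 1861 + len(tail) - 1.
    print(len(tail), 1861 + len(tail) - 1)
```

For the two lines above, the last base lands on position 1931, which agrees with the length given on the LOCUS line.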
GenPept: FASTA format
>gi|32265058|gb|AAO22848.2| (E,E)-alpha-farnesene synthase [Malus x domestica]
MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWKNDFLDQSLISKYDGDEYRKLSEKLIE
EVKIYISAETMDLVAKLELIDSVRKLGLANLFEKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQH
GYKVSQDIFGRFMDEKGTLENHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSN
LSRDVVHSLELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWWANLG
IADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGSEEELKHFTNAVDRWDS
RETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLTKVWADFCKALLVEAEWYNKSHIPTLEEY
LRNGCISSSVSVLLVHSFFSITHEGTKEMADFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIV
CYMREVNASEETARKNIKGMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEK
GPRTHILSLLFQPLVN
>gi|32265070|gb|AAP75563.1| putative doublecortin domain-containing protein
MAKTGAEDHREALSQSSLSLLTEAMEVLQQSSPEGTLDGNTVNPIYKYILNDLPREFMSSQAKAVIKTTD
DYLQSQFGPNRLVHSAAVSEGSGLQDCSTHQTASDHSHDEISDLDSYKSNSKNNSCSISASKRNRPVSAP
VGQLRVAEFSSLKFQSARNWQKLSQRHKLQPRVIKVTAYKNGSRTVFARVTAPTITLLLEECTEKLNLNM
AARRVFLADGKEALEPEDIPHEADVYVSTGEPFLNPFKKIKDHLLLIKKVTWTMNGLMLPTDIKRRKTKP
VLSIRMKKLTERTSVRILFFKNGMGQDGHEITVGKETMKKVLDTCTIRMNLNLPARYFYDLYGRKIEDIS
KGKH
From www.ncbi.nlm.nih.gov
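FASTA is simple enough to read with a few lines of code: a `>` line starts a record and every following line is sequence until the next `>`. The sketch below also splits the `gi|number|db|accession|` definition line used on this slide; that five-field layout is assumed from the two examples only, and the function names are hypothetical. The demo text keeps just the first sequence line of each record.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def parse_gi_defline(header):
    """Split a 'gi|number|db|accession| description' definition line."""
    parts = header.split("|", 4)
    if len(parts) != 5 or parts[0] != "gi":
        raise ValueError("definition line does not match the assumed layout")
    return {"gi": int(parts[1]), "db": parts[2],
            "accession": parts[3], "description": parts[4].strip()}

# Shortened demo: only the first sequence line of each record is included.
DEMO = """>gi|32265058|gb|AAO22848.2| (E,E)-alpha-farnesene synthase [Malus x domestica]
MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWKNDFLDQSLISKYDGDEYRKLSEKLIE
>gi|32265070|gb|AAP75563.1| putative doublecortin domain-containing protein
MAKTGAEDHREALSQSSLSLLTEAMEVLQQSSPEGTLDGNTVNPIYKYILNDLPREFMSSQAKAVIKTTD
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(DEMO):
        info = parse_gi_defline(header)
        print(info["accession"], info["gi"], len(sequence))
```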
Abstract Syntax Notation: ASN.1
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    title "Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds." ,
    source {
      org {
        taxname "Malus x domestica" ,
        common "cultivated apple" ,
        db {
          {
            db "taxon" ,
            tag id 3750 } } ,
        orgname {
          name binomial {
            genus "Malus" ,
            species "x domestica" } ,
          mod {
            {
              subtype cultivar ,
              subname "'Law Rome'" } ,
            {
              subtype old-name ,
              subname "Malus domestica" ,
              attrib "(10)cultivar='Law Rome'" } } ,
          lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus" ,
          gcode 1 ,,
From www.ncbi.nlm.nih.gov
Type the protein name(s) here Choose a reference organism
Paste your protein sequence here
Select Gram stain Select output format Paste your sequence here