What is Bioinformatics?

mirit

Presentation Transcript


  1. What is Bioinformatics? • Bioinformatics: collection and storage of biological information • Computational biology: development of algorithms and statistical models to analyze biological data

  2. Jobs for bioinformaticians

  3. Databases make biological data available to scientists • As biology has increasingly turned into a data-rich science, the need for storing and communicating large datasets has grown tremendously. • Nucleotide and protein sequences • Protein structures • Expression data • Gene/protein networks

  4. Nucleotide Databases • EMBL www.ebi.ac.uk/embl/ • The EMBL (European Molecular Biology Laboratory) nucleotide sequence database is maintained by the European Bioinformatics Institute (EBI) in Hinxton, Cambridge, UK.

  5. Nucleotide Databases cont. • GenBank: maintained by the National Center for Biotechnology Information (NCBI); its nucleotide and protein sequences, annotations, etc. are accessed through the Entrez search system. www.ncbi.nlm.nih.gov/Genbank/ • UniGene: a non-redundant set of gene-oriented clusters www.ncbi.nlm.nih.gov/UniGene/

  6. Protein Databases • SWISS-PROT: a protein sequence database that provides a high level of annotation (such as a description of the protein's function, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases. www.expasy.ch/sprot/

  7. Protein Databases • PIR http://pir.georgetown.edu/ - The Protein Information Resource (PIR) is a division of the National Biomedical Research Foundation (NBRF) in the US. It collaborates with the Munich Information Center for Protein Sequences (MIPS) and the Japanese International Protein Sequence Database (JIPID). Release 67.00 (31 Dec 2000) contains 198,801 entries.

  8. Sequence Motif Databases • Pfam www.sanger.ac.uk/Software/Pfam/ • Pfam is a database of protein families defined as domains (contiguous segments of entire protein sequences). For each domain, it contains a multiple alignment of a set of defining sequences (the seeds) and the other sequences in SWISS-PROT that can be matched to that alignment.

  9. 3D-Structure Databases • PDB www.rcsb.org/pdb/ - The PDB is the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR. Structural biologists usually deposit their structures in the PDB on publication, and some scientific journals require this before accepting a paper. It also accepts the experimental data used to determine the structures.

  10. How to get sequences? • The Entrez database system provides nucleotide and protein sequences in different formats. • One of these formats is FASTA.
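
As a minimal sketch of retrieving a sequence from Entrez programmatically, the code below builds a request URL for NCBI's E-utilities efetch service, which returns plain FASTA text when rettype=fasta and retmode=text are set. The helper name efetch_fasta_url and the "ACCESSION" placeholder are hypothetical; the actual download needs network access, and NCBI's usage guidelines (such as request rate limits) should be checked before fetching many records.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch service
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nucleotide") -> str:
    """Build an efetch URL that returns one record as FASTA text.

    efetch_fasta_url is a hypothetical helper written for this example.
    """
    params = {
        "db": db,            # Entrez database, e.g. "nucleotide" or "protein"
        "id": accession,     # accession or UID of the record to fetch
        "rettype": "fasta",  # ask for FASTA format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = efetch_fasta_url("ACCESSION")  # hypothetical placeholder accession
    print(url)
    # To actually download the record (requires network access):
    # from urllib.request import urlopen
    # print(urlopen(url).read().decode())
```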

  11. FASTA FORMAT • Each sequence begins with a single description line that starts with the '>' character; the lines that follow hold the sequence itself.

  12. A protein in FASTA format
     >HBA_ALLMI
     VLSMEDKSNVKAIWGKASGHLEEYGAEALEMFCAYPQTKIYFPHFDMSHNSAQIRAHGKKVFSALHEAVNHIDDLPGALCRLSELHAHSLRVDPVNFKFLAHCVLVVFAIHHPSALSPEIHASLDKFLCAVSAVLTSKYR
     • The first line is the description line; the '>' character at its start marks the beginning of a new sequence record. The string following the '>' and ending at the first space (' ') is the sequence ID (HBA_ALLMI).
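
The ID rule on the FASTA slides can be turned into a small parser. This is a minimal sketch, assuming well-formed input in which every record starts with a '>' header line; parse_fasta is a hypothetical helper, records with a repeated ID overwrite earlier ones, and any sequence lines appearing before the first header are skipped.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {sequence_id: sequence} dictionary.

    The ID is the text after '>' up to the first space; all sequence lines
    that follow a header are joined into one string.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: store the record we were building, if any
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split(" ", 1)[0]  # ID ends at the first space
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence data belonging to the current record
    if current_id is not None:
        records[current_id] = "".join(chunks)  # store the final record
    return records


if __name__ == "__main__":
    example = """>HBA_ALLMI
VLSMEDKSNVKAIWGKASGHLEEYGAEALEMFCAYPQTKIYFPHFDMSHNSAQIRAHGKKVFSALHEAVNHIDDLPGALCRLSELHAHSLRVDPVNFKFLAHCVLVVFAIHHPSALSPEIHASLDKFLCAVSAVLTSKYR
"""
    records = parse_fasta(example.splitlines())
    print(list(records))  # ['HBA_ALLMI']
```

In practice, an established library such as Biopython (Bio.SeqIO.parse(handle, "fasta")) handles this and many edge cases.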

  13. A DNA sequence in FASTA format
     >X sequence
     ATGAATAGCACAGAGAGACCAAGAGAGAGAGAGAGACCCAGATATATCAGATAGAGA

  14. Why align sequences? • Find evolutionary relationships between species and/or genes. • Identify novel genes and find similar genes in other species. • Study genomes and how they change.

  15. Sequence Alignment • Homology means that two (or more) sequences share a common ancestor. • An example of a sequence alignment: [figure: Sequence 1 aligned with Sequence 2]
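
One standard way to compute a pairwise global alignment is Needleman-Wunsch dynamic programming. The sketch below assumes a deliberately simple scoring scheme (match +1, mismatch -1, gap -1) chosen only for illustration; real tools use substitution matrices (such as BLOSUM for proteins) and affine gap penalties, and multiple-alignment programs like CLUSTALW build on pairwise alignment rather than being this exact code.

```python
from typing import List, Tuple


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    def sub(x: str, y: str) -> int:
        # Substitution score: reward identical residues, penalize mismatches
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```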

  16. CLUSTALW: a program for multiple sequence alignment http://www.ebi.ac.uk/clustalw/

  17. Genome Databases • www.ensembl.org

  18. Genome Databases: Gene Prediction • Define the location of genes (coding sequences, regulatory regions) • Gene prediction using software based on rules and patterns: find Open Reading Frames (ORFs), with additional criteria for a good start sequence for a gene. • Gene identification through alignment with known proteins and ESTs (Expressed Sequence Tags; short sequences derived from mRNA). • Gene prediction through similarity with proteins or ESTs in other organisms. • Gene prediction through comparison with other genomes; conserved regions are likely to be coding or regulatory regions.
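
The rule-based ORF step can be sketched as a scan for an ATG start codon followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). This is a simplified illustration under stated assumptions: only the three forward reading frames are scanned (no reverse complement), the first ATG in a frame opens a candidate ORF, and the min_codons threshold is an arbitrary value chosen so the toy example works; real gene finders use much longer minimum lengths and the additional start-site criteria mentioned on the slide. find_orfs is a hypothetical helper.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for ORFs on the forward strand.

    start is the index of the ATG; end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to (not including) the stop.
    """
    dna = dna.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"  # hypothetical toy sequence
    print(find_orfs(seq))  # [(2, 14)]
```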

  19. Genome Databases: Annotation • Annotation of the genes: compare with genes/proteins of known function in other organisms. • Functional classification: broad functional groups, such as 'ribosomal proteins', 'nucleotide metabolism', 'signal transduction'.

  20. Genome Databases: Evolution • Evolutionary history • Genome duplications • Gene loss

  21. Transcription Databases • Microarrays can measure thousands of transcripts simultaneously. • They allow identification of genes whose expression is higher or lower in one condition than in another, for example in diseased versus normal tissue. • Microarray databases store large amounts of expression data. • Stanford Microarray Database:
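
A common first look at "high or low in expression between normal and disease" is the log2 fold change of mean expression per gene. The sketch below assumes positive, already-normalized intensities and uses made-up values for hypothetical genes; a threshold of 1.0 corresponds to a two-fold change. Real microarray analyses add normalization and statistical testing across replicates, which this example omits. The helper names log2_fold_changes and flag_genes are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_changes(normal: Dict[str, List[float]],
                      disease: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-gene log2(mean disease / mean normal); intensities must be positive."""
    return {
        gene: math.log2(mean(disease[gene]) / mean(normal[gene]))
        for gene in normal
        if gene in disease  # only genes measured in both conditions
    }


def flag_genes(fold_changes: Dict[str, float],
               threshold: float = 1.0) -> Dict[str, str]:
    """Label genes whose |log2 fold change| >= threshold as 'up' or 'down'."""
    return {
        gene: ("up" if lfc > 0 else "down")
        for gene, lfc in fold_changes.items()
        if abs(lfc) >= threshold
    }


if __name__ == "__main__":
    # Hypothetical, made-up intensities from two replicate arrays per condition
    normal = {"geneA": [100, 120], "geneB": [50, 50], "geneC": [80, 90]}
    disease = {"geneA": [400, 480], "geneB": [12, 13], "geneC": [85, 88]}
    print(flag_genes(log2_fold_changes(normal, disease)))
    # {'geneA': 'up', 'geneB': 'down'}
```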

  22. Signaling & Metabolic Pathways • Analyze how genes/proteins interact and learn about the function of genes • KEGG: Kyoto Encyclopedia of Genes and Genomes • http://www.genome.ad.jp/kegg/
