REMINDERS

REMINDERS • 2nd Exam on • Coverage: • Central Dogma of DNA • Replication • Transcription • Translation • Recombinant DNA technology and molecular biology • Protein analysis

BIOINFORMATICS

BIOINFORMATICS • Study of the structure of biological information and biological systems • Integrates theories and tools of mathematics/statistics, computer science and information technology • Involves the use of hardware and software to study vast amounts of biological data

What is Bioinformatics? • the field of science in which biology, computer science, and information technology merge to form a single discipline • application of information technology to the storage, management and analysis of biological information • facilitated by the use of computers

FUNCTIONS • Data Management • Storage • Retrieval • Data Analysis *Literature/Bibliography, Sequence, Structure, Taxonomy, Expression, etc.

BIOLOGICAL DATABASES • Systematic data storage/retrieval • Maintained on a regular basis • Can contain various types of data (integration) • Sequence • Structure • Other pertinent information • Nucleic acids and proteins are most common

DATABASES • a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system • Biological databases consist usually of the nucleic acid sequences of the genetic material of various organisms as well as protein sequences and structures

DATABASES • e.g. a nucleotide sequence database typically contains information such as • contact name • the input sequence with a description of the type of molecule • the scientific name of the source organism from which it was isolated • additional requirements • easy access to the information • a method for extracting only the information needed to answer a specific biological question
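
The record fields and the "extract only what is needed" requirement on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `SequenceRecord`, the helper `find_by_organism`, and the two sample entries are hypothetical and do not reflect the schema of any real database.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    """Hypothetical record holding the fields listed on the slide."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str


def find_by_organism(records, organism):
    """Return only the sequences whose source organism matches (case-insensitive)."""
    wanted = organism.lower()
    return [r.sequence for r in records if r.organism.lower() == wanted]


# Made-up entries, for illustration only.
records = [
    SequenceRecord("A. Researcher", "ATGGCC", "DNA", "Saccharomyces cerevisiae"),
    SequenceRecord("B. Researcher", "AUGGCC", "mRNA", "Homo sapiens"),
]

yeast_sequences = find_by_organism(records, "saccharomyces cerevisiae")
print(yeast_sequences)
```

A real database adds indexing, persistence and curation on top of this, but the idea of querying by one field and returning only the relevant part of each record is the same.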

DATABASES • Sequence • GenBank, European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ); managed by the International Nucleotide Sequence Database Collaboration (INSDC) • UniGene • Saccharomyces Genome Database (SGD) • UniProtKB (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL) • ExPASy

DATABASES • Structure • Nucleic Acid Database (NDB) • Protein Data Bank (PDB) • Worldwide Protein Data Bank (wwPDB) • ExPASy

DATA MINING • Process by which testable hypotheses are created regarding function/structure of gene/protein of interest through identifying similar sequences in “more established” organisms • Tools: • Text-term search • Sequence similarity search

Machine Learning • Studies methods for designing computer programs that improve based on past experience • Why? • New methods are being introduced • Old ones should be improved

“Units” of Information • DNA (genome) • RNA (transcriptome) • Protein (proteome)

What is Being Analyzed? • Sequence • Structure • Interactions • Pathways • Mutations/Evolution

Why? • Increasing amount of biological information entails • Organization • Archiving • Global unification/harmonization • More biological discoveries • Functional/Structural similarities • Phylogenetic/Evolutionary patterns

Applications • Medicine • Pharmaceuticals • Biotechnology • Agriculture

STRUCTURE DATABASES

Molecular Data • When you draw a molecule, • You start with atoms • Then proceed with the structure • And the three-dimensional data • What can be stored? • Coordinates • Sequences • Chemical graphs • Atoms and bonds

Databases • Protein Data Bank (PDB) • Molecular Modeling Database (MMDB)

Techniques in the Laboratory • X-ray Crystallography • Nuclear Magnetic Resonance

Formats • PDB • mmCIF • MMDB
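
As a rough illustration of how the PDB format stores coordinates, the sketch below reads one fixed-column ATOM record. The column ranges follow the published PDB layout; the function name `parse_atom_line` is hypothetical, and the sample record is built field by field with made-up coordinate values.

```python
def parse_atom_line(line):
    """Extract a few fields from one fixed-width ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


# Illustrative record assembled column by column (values are made up).
sample = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and blanks
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
)

atom = parse_atom_line(sample)
print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

For real files a maintained parser is the safer choice; this sketch only shows that each atom is one line of fixed-width fields.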

Structure Viewers • Cn3D • RasMol • WebMol • Mage • VRML • CAD • Swiss PDB Viewer

Promises of bioinformatics • Medicine • Knowledge of protein structure facilitates drug design • Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up • Genome analysis allows the targeting of genetic diseases • The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated • The same techniques can be applied to biotechnology, crop and livestock improvement, etc...

Challenges in bioinformatics • Explosion of information • Need for faster, automated analysis to process large amounts of data • Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels etc…) • Need for “smarter” software to identify interesting relationships in very large data sets • Lack of “bioinformaticians” • Software needs to be easier to access, use and understand • Biologists need to learn about the software, its limitations, and how to interpret its results

SEQUENCE ALIGNMENT

Two or More Sequences • Measure similarity • Determine correspondences between residues • Find patterns of conservation • Derive evolutionary relationships

Alignment • Correspondences of nucleotides/amino acids in two sequences or more are assigned • An assignment of correspondences that preserves the order of the residues within the sequences is an alignment • Gaps are used to achieve this • Sequence alignment refers to the identification of residue-residue correspondences

Uses • Homology • Similarities • “Ancestry” • Genome annotation • Assigning structure and function to genes • Database queries • For newly-discovered/unknown sequences

Tools • Dot Plots • Diagonal lines of dots showing similarities between two sequences • Scoring Matrices • Score reflects quality of each possible alignment; best possible score is identified • Scoring scheme is crucial • PAM (Point Accepted Mutation) and BLOSUM (BLOCKS Substitution Matrix) • Dynamic Programming • Algorithmic technique that reuses previous computations
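
A dot plot is simple enough to sketch directly. The function below (the name `dot_plot` is an assumption) marks every position where a residue of the first sequence equals a residue of the second; runs of matches show up as diagonal lines.

```python
def dot_plot(seq_a, seq_b):
    """Return one text row per residue of seq_a: '*' on a match with seq_b, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


rows = dot_plot("GATTACA", "GATTACA")
print("\n".join(rows))
```

Comparing a sequence with itself, as here, always fills the main diagonal; off-diagonal dots come from repeated residues. Practical dot-plot tools usually compare short windows rather than single residues to reduce this noise.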

Scoring • Penalties/Scores • Match (e.g. A – A) • Mismatch (e.g. A – C) • Gap (e.g. A – _) • Linear Gap Penalty: Uniform • Affine Gap Penalty: Gap Existence vs. Gap Extension
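
The difference between linear and affine gap penalties is easiest to see by scoring one fixed alignment both ways. The sketch below assumes a simple convention that the slides do not specify: the first position of a gap costs `gap_open` and each further position of the same gap costs `gap_extend`. All parameter values are illustrative.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-2):
    """Score two aligned strings of equal length, with '-' marking gap positions.

    With gap_open == gap_extend the penalty is linear (uniform);
    with different values it is affine.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap = None  # which sequence had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is an extension.
            score += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


linear = score_alignment("GATTACA", "GA--ACA")                              # 5 matches, 2 gap positions at -2
affine = score_alignment("GATTACA", "GA--ACA", gap_open=-4, gap_extend=-1)  # 5 matches, -4 to open, -1 to extend
print(linear, affine)
```

Affine schemes make one long gap cheaper than several short ones, which tends to match how insertions and deletions actually occur.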

Local vs. Global Alignments • Global Alignment • Similarities between majority of two sequences • Local Alignment • Similarities between specific parts of two sequences
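
To make the contrast concrete, the sketch below computes the best local alignment score with the Smith-Waterman recurrence. That algorithm is not named on the slides and is used here only as a standard example of local alignment; the scoring values are illustrative.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two sequences that share only the central region GATTACA.
shared = local_alignment_score("TTTGATTACATTT", "CCCGATTACACCC")
print(shared)
```

The local score rewards the shared core and ignores the unrelated flanks, whereas a global alignment of the same pair would also have to pay for the mismatched ends.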

Programs • Pairwise Sequence Alignment • BLAST • VAST (compares 3D structures rather than sequences) • FASTA • Multiple Sequence Alignment • MAFFT

Needleman-Wunsch Algorithm • Used for global alignments • Maximum-value function • A simple scoring scheme is assumed • Three steps • Initialization • Matrix fill (scoring) • Traceback (alignment)
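
The three steps listed on this slide can be sketched as follows. This is a minimal version assuming a simple scoring scheme (match +1, mismatch -1, linear gap -1); the values are illustrative and the function name is an assumption. When several alignments tie for the best score, this traceback returns just one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        """Substitution score for a[i-1] against b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 1: initialization. The first row and column hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Step 2: matrix fill. Each cell takes the maximum of its three predecessors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Step 3: traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


nw_score, aligned_a, aligned_b = needleman_wunsch("GATTACA", "GATACA")
print(nw_score)
print(aligned_a)
print(aligned_b)
```

The matrix fill is where dynamic programming reuses previous computations: each cell is computed once from three already-filled neighbors, so the running time grows with the product of the two sequence lengths.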
