Lecture #4 : Comparing genes 9/14/09
This week • Homework #2 due on Wed • Email with questions • Email me answers or hand in in class • Wed - I will be at Dept of Biology retreat • Lecture will be given by Kelly O’Quin - expert in phylogenetics • He will go over homework so it must be done before class
Questions for today • 0. More BLAST • 1. Where do we get high quality gene sequences? • 2. How do genes evolve? • 3. How do we compare genes?
How to find genes • Start with genes which are known from model organisms • Use these to pull out genes from genomes • Compare genes to learn about sensory evolution
Blast - Genbank • What database do you want to search? • What do you want to compare? • What program do you want to do the searching?
Defaults Database Program Confirm
Nucleotide BLAST = DNA nucleotide query vs nucleotide database
Choices for programs • Megablast: highly similar sequences (>95%), word length 28 • Discontiguous megablast: fairly similar sequences, word length 11 • Blastn: dissimilar sequences, word length 11
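To see why word length matters, here is a minimal sketch of the word-matching idea: a search starts from short exact matches ("words") shared by the query and the database sequence. This is a simplification for illustration, not BLAST's actual implementation; the function name `shared_words` and the two short test sequences are assumptions chosen for the example.

```python
def shared_words(query, subject, word_length):
    """Return (query_pos, subject_pos) pairs where an exact word matches.

    Illustrative only: real BLAST adds scoring, extension and statistics.
    """
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_length + 1):
        index.setdefault(subject[j:j + word_length], []).append(j)

    # Look up every word of the query in that index.
    hits = []
    for i in range(len(query) - word_length + 1):
        for j in index.get(query[i:i + word_length], []):
            hits.append((i, j))
    return hits


# Two hypothetical 12-base sequences that differ at one position.
QUERY = "ATGCCGTGACGT"
SUBJECT = "ATGCCTTGACGT"

short_word_hits = shared_words(QUERY, SUBJECT, 4)
long_word_hits = shared_words(QUERY, SUBJECT, 7)
```

With a short word the single mismatch still leaves several exact words in common, while with a long word every word overlaps the mismatch and nothing is found. That is the trade-off behind the program choices: long words are fast but only work for highly similar sequences.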
BLAST a genome Request ID AWJ4D4B7012
BLASTing is fun • This is meant to be enjoyable • Be a genome explorer • Find out what kind of data is out there • Find out what kind of data isn’t there • QUESTIONS?????
Q1. There is so much data in Genbank. How do you find GOOD data? • Example: bovine rhodopsin, the 1st G protein coupled receptor to be sequenced • Search Genbank with text • 49 entries
Searching for genes • Searching by text is fraught with peril • Genbank has too many links • Pull up many things that are not what you want • BLAST is better approach • NCBI has also made records which combine all similar sequences into one
NCBI has done some of the work • They have hand-curated data for some species to make a set of reference sequences • Nucleotide sequences - NM_xxxxxx • Protein sequences - NP_xxxxxx • For human rhodopsin • NM_000539 • NP_000530 • These are the gold standard for sequences
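As a small illustration of the naming scheme above, the sketch below sorts accessions into nucleotide and protein reference sequences by their prefix. The function name `refseq_kind` is hypothetical, and the pattern only covers the two prefixes named on this slide (NM_ and NP_), with an optional version suffix such as ".3".

```python
import re

# Only the two prefixes from the slide are handled; other prefixes
# exist but are outside this sketch.
_REFSEQ = re.compile(r"^(NM|NP)_(\d+)(?:\.(\d+))?$")


def refseq_kind(accession):
    """Return 'nucleotide', 'protein', or None for an accession string."""
    match = _REFSEQ.match(accession.strip())
    if match is None:
        return None
    return "nucleotide" if match.group(1) == "NM" else "protein"


kinds = {acc: refseq_kind(acc) for acc in ["NM_000539", "NP_000530", "NM000539"]}
```

The last entry shows that an accession written without the underscore is not recognized, which is a common reason a text search fails to return the reference record.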
Homologs • Two genes descended from a single gene in the common ancestor of two organisms • Often implies the genes perform the same function in both organisms • Therefore they can be compared to learn about evolution
These 4 primates have many genes which are homologs and have been passed down from the primate ancestor: Human, Chimp, Macaque, Bushbaby
Location • Orthologues are predicted and linked • Links to transcript and protein
Good places to find genes • Model organisms: NCBI HomoloGene • Genes from models and other organisms: Sanger Ensembl gene families • NOTE: These are often predicted from genome sequences • If there is a sequence in NCBI HomoloGene, it may differ from (and be more accurate than) the Sanger predictions • OMIM is a good reference
Q2. How do genes change through time? • Change in actual sequence • Mutation • Recombination • Change in frequency of a sequence • Selection - “survive” better • Drift - get passed on by chance • Migration - move between populations
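Drift, "getting passed on by chance", can be made concrete with a tiny simulation. The sketch below assumes a simple Wright-Fisher style model, which is not named in the lecture: each generation, every one of `pop_size` gene copies is drawn at random from the previous generation, with no mutation, selection or migration. The function name and parameters are illustrative.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=0):
    """Return the frequency of one sequence variant in each generation.

    Assumed model: fixed population size, random sampling only.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each copy in the new generation carries the variant with
        # probability equal to its current frequency.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
        trajectory.append(freq)
    return trajectory


small_population = simulate_drift(pop_size=10, start_freq=0.5, generations=20)
large_population = simulate_drift(pop_size=1000, start_freq=0.5, generations=20)
```

Plotting the two trajectories typically shows much larger swings in the small population, even though nothing in the model favors one sequence over the other.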
Mutation vs selection • Mutation = sequence change • ATGCCGTGACGT • ATGCCTTGACGT • Selection/drift/migration = sequence frequency changes across a number of individuals • ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG • ATGTG ATGTG ATGTG ATGTG ATGTG ATGTT • ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT • ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT • ATGTT ATGTG ATGTG ATGTT ATGTT ATGTT
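The two ideas on this slide can be checked with a few lines of code. The sketch below uses the sequences from the slide: one helper lists the positions where two sequences differ (the mutation), and another computes how common a sequence is in each row of six individuals (the frequency change). The helper names are illustrative.

```python
def point_differences(before, after):
    """List (position, old_base, new_base) for equal-length sequences."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


def frequency(population, variant):
    """Fraction of individuals carrying the given sequence."""
    return population.count(variant) / len(population)


# Mutation: a single base changes (positions are counted from 0).
mutation = point_differences("ATGCCGTGACGT", "ATGCCTTGACGT")

# Frequency change: the five rows of six individuals from the slide.
rows = [
    "ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG",
    "ATGTG ATGTG ATGTG ATGTG ATGTG ATGTT",
    "ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT",
    "ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT",
    "ATGTT ATGTG ATGTG ATGTT ATGTT ATGTT",
]
counts = [row.split().count("ATGTT") for row in rows]
frequencies = [frequency(row.split(), "ATGTT") for row in rows]
```

The mutation is a single G to T change, while the ATGTT sequence goes from absent to carried by four of six individuals without any further change to the sequence itself.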
Evolution as tinkerer • Changes are typically small • Mutation is source of new sequence • Not all mutations are created equal • Some occur more often than others • Other forces shift frequency of particular sequence
Mutation causes nucleotide change • What about AA sequence? • Synonymous change • Syn = same • AA stays same • Nonsynonymous change • Not same • AA changes
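A short sketch makes the distinction testable: translate the codon before and after the change and compare the amino acids. The table below is the standard genetic code written in the usual T, C, A, G order; the function name `classify_change` is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same else "nonsynonymous"


# CCG and CCT both encode proline (P), so the amino acid stays the same.
silent = classify_change("CCG", "CCT")
# GAT encodes aspartic acid (D), GAA encodes glutamic acid (E).
changed = classify_change("GAT", "GAA")
```

Many third-position changes are synonymous, which is one reason some mutations are seen more often than others in real sequence comparisons.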
Amino acid (AA) types • Non-polar: A, F, G, I, L, M, P, V, W • Polar: N, Q, S, T, Y • Charged, +: H, K, R • Charged, -: D, E • Other: C • Often changing AA within a group does not affect protein function
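The grouping above can be written as a lookup table, which makes it easy to ask whether a substitution stays within a group. This is a sketch using exactly the groups listed on the slide; the names `AA_GROUPS`, `group_of` and `same_group` are illustrative.

```python
# Groups exactly as listed on the slide.
AA_GROUPS = {
    "non-polar": "AFGILMPVW",
    "polar": "NQSTY",
    "charged +": "HKR",
    "charged -": "DE",
    "other": "C",
}

# Invert to a one-letter code -> group name table.
GROUP_OF = {aa: name for name, members in AA_GROUPS.items() for aa in members}


def group_of(aa):
    """Return the group name for a one-letter amino acid code."""
    return GROUP_OF[aa.upper()]


def same_group(aa1, aa2):
    """True when a substitution stays within one group."""
    return group_of(aa1) == group_of(aa2)


conservative = same_group("I", "V")   # both non-polar
radical = same_group("D", "K")        # negative vs positive charge
```

A within-group change such as I to V is the kind that often leaves protein function intact, whereas D to K swaps the charge.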
Selection • Stabilizing selection - acts to keep protein function the same • Synonymous changes are more frequent than nonsynonymous changes • Amino acid changes within a group are much more common than changes between groups • Nonpolar to nonpolar • Polar to polar
Similarity matrix • A = alanine • C = cysteine • D = aspartic acid • E = glutamic acid • F = phenylalanine • G = glycine • H = histidine
Comparing sequences • Can do at either nucleotide or AA level • Gather sequences from a bunch of different organisms • Need to align them so that sites which perform the same function can be compared
Aligning sequences • Sequences may differ in length • Often have differences at amino- or carboxy- terminus of the protein • Need a way to align parts of protein that are performing the same function
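One standard way to line up two sequences of different length is global alignment by dynamic programming (the Needleman-Wunsch method). The lecture does not say which method it uses, so the sketch below is only an illustration; the scores (match +1, mismatch -1, gap -1) are assumptions, and real protein alignments usually score with a similarity matrix instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score).

    Scoring values are assumptions chosen for illustration.
    """
    n, m = len(a), len(b)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("ACGT", "AGT")
```

Gaps ("-") are inserted so that both output strings have the same length, which is what allows sites to be compared column by column.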
Example - RH2 opsin in fishes
Goldfish    MNGTEGNNFYVPLSNR
Medaka      MENGTEGKNFYIPMNNR
Zebrafish   MNGTEGSNFYIPMSNR
Killifish   MGYGPNGTEGNNFYIPMSNK
Trout       MQNGTEGSNFYIPMSNR
Halibut     MVWDGGIEPNGTEGKNFYIPMSNR
Cod         MRMEANGTEGKNFYIPMSNR
Tetraodon   MVWDGGIEPNGTEGKNFYIPMSNR
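These eight sequences differ mainly in how many residues come before a shared stretch, NGTEG. As a rough sketch (not a general alignment method), the code below pads the amino-terminal end with gaps so that this shared motif lands in the same column for every species. The choice of NGTEG as the anchor and the function name `anchor_on_motif` are assumptions made for this example.

```python
RH2_SEQS = {
    "Goldfish": "MNGTEGNNFYVPLSNR",
    "Medaka": "MENGTEGKNFYIPMNNR",
    "Zebrafish": "MNGTEGSNFYIPMSNR",
    "Killifish": "MGYGPNGTEGNNFYIPMSNK",
    "Trout": "MQNGTEGSNFYIPMSNR",
    "Halibut": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Cod": "MRMEANGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
}


def anchor_on_motif(seqs, motif):
    """Left-pad each sequence with '-' so the motif starts in one column.

    Raises ValueError if a sequence does not contain the motif.
    """
    offsets = {name: seq.index(motif) for name, seq in seqs.items()}
    widest = max(offsets.values())
    return {
        name: "-" * (widest - offsets[name]) + seq
        for name, seq in seqs.items()
    }


aligned = anchor_on_motif(RH2_SEQS, "NGTEG")
for name, seq in aligned.items():
    print(f"{name:<10} {seq}")
```

Once padded, every row has the same length and the columns after the motif can be compared site by site, for example I versus V at the position where goldfish differs from the other species.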