Homology and Homologs


  1. Homology and Homologs

     Homology just means sequence similarity by virtue of a common evolutionary ancestor.

     >gi|24640218|ref|NP_572350.2| CG3126-PA, isoform A [Drosophila melanogaster]
     Length=1571
     Score = 427 bits (1098), Expect = 6e-118
     Identities = 223/415 (53%), Positives = 297/415 (71%), Gaps = 19/415 (4%)
     Frame = +2

     Query 1901 SLVDHNEIMAKLTLKQEGDDGPDVRGGSGDILLVHATETDRKDLVLYFEAFLTTYRTFIT 2080
                ++++   I   L LK+  +DGP+V+GG  D L+VHA+   +     + EAF+TT+RTFI
     Sbjct 1151 NMLEEVNITRYLILKKREEDGPEVKGGYIDALIVHASRVQKVADNAFCEAFITTFRTFIQ 1210

     Query 2081 PEELIQKLQYRYERF-CHFQDTFKQRVSKNTFFVLVRVVDELCLVEMTDEILKLLMELVF 2257
                P ++I+KL +RY  F C  QD  KQ+ +K TF +LVRVV++L   ++T ++L LL+E V+
     Sbjct 1211 PIDVIEKLTHRYTYFFCQVQDN-KQKAAKETFALLVRVVNDLTSTDLTSQLLSLLVEFVY 1269

     Query 2258 RLVCKGELSLARILRKNILEKV---ENKRMLHHANS--ALKPLAARGVAARPG------- 2401
                +LVC G+L LA++LR   +EKV     +  ++          +   G+A   G
     Sbjct 1270 QLVCSGQLYLAKLLRNKFVEKVTLYKEPKVYGFVGELGGAGSVGGAGIAGSGGCSGTAGG 1329

     Query 2402 ----TLHDFHSLEIAEQLTLLDAELFYKIEIPEVLLWAKEQNEEKSPNLTQFTEHFNNMS 2569
                    +L D  SLEIAEQ+TLLDAELF KIEIPEVLL+AK+Q EEKSPNL +FTEHFN MS
     Sbjct 1330 GNQPSLLDLKSLEIAEQMTLLDAELFTKIEIPEVLLFAKDQCEEKSPNLNKFTEHFNKMS 1389

     Query 2570 YWVRSIIMLQEKAQDRERLLLKFIKIMKHLRKLNNFNSYLAILSALDSAPIRRLEWQKQT 2749
                YW RS I+  + A++RE+ + KFIKIMKHLRK+NN+NSYLA+LSALDS PIRRLEWQK
     Sbjct 1390 YWARSKILRLQDAKEREKHVNKFIKIMKHLRKMNNYNSYLALLSALDSGPIRRLEWQKGI 1449

     Query 2750 SEGLAEYCTLIDSSSSFRAYRAALAEVEPPCIPYLGLILQDLTFVHLGNPDHID-GKVNF 2926
                +E +  +C LIDSSSSFRAYR ALAE  PPCIPY+GLILQDLTFVH+GN D++  G +NF
     Sbjct 1450 TEEVRSFCALIDSSSSFRAYRQALAETNPPCIPYIGLILQDLTFVHVGNQDYLSKGVINF 1509

     Query 2927 SKRWQQFNILDSMRRFQQVHYEIRRNDEIISFFNDFSDHLAEEALWELSLKIKPR 3091
                SKRWQQ+NI+D+M+RF++  Y  RRN+ II FF++F D + EE +W++S KIKPR
     Sbjct 1510 SKRWQQYNIIDNMKRFKKCAYPFRRNERIIRFFDNFKDFMGEEEMWQISEKIKPR 1564

     These two sequences, my Xenopus query sequence and the matching Drosophila sequence, show strong (though variable) similarity along their length, but even if we knew the function of the Drosophila gene, it might not tell us much about the function of the Xenopus gene.

  2. Genes and Evolution - I

     [Diagram: one gene A becoming two copies, A and A.]

     Gene duplication through speciation. The two copies of Gene A will now evolve independently, but will continue to have the same function. They are ORTHOLOGS.

  3. Genes and Evolution - II

     [Diagram: gene A duplicated within a genome, giving copies A and A’.]

     Gene duplication through internal genome duplication. The two copies of Gene A will now evolve independently, but will probably not continue to have exactly the same function. They are PARALOGS.

  4. Homologs, orthologs & paralogs

     http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html

  5. Mutation and Evolution

     Translated part of mRNA sequence

     Ancestral sequence
         ATGAAGGCTGCCTACGACTGCCGTGCCAGAATGCTGAGG  MKAAYDCRARMLR

     In species A
         ATGAAGGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG  MKAAYDCRARMLR
         ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG  MNAAYDCRARMLR
         ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTAAGG  MNAAYDCRARMLR
         ATGAATGCTGCCTATGACTGCCGTG   GAATGCTAAGG  MNAAYDCR GMLR
         ATGAATGCAGCCTATGACTGCCGTG   GAATGCTAAGG  MNAAYDCR GMLR
         ATGAATGCAGCCTATGATTGCCGTG   GAATGCTAAGG  MNAAYDCR GMLR
         ATGAATGCAGCCTATGATTGCCGAG   GAATGCTAAGG  MNAAYDCR GMLR

     In species B
         ATGAAGGCTGCCTACGACTGCCGTGCCATAATGCTGAGG  MKAAYDCRAIMLR
         ATGAAGGCCGCCTACGACTGCCGTGCCATAATGCTGAGG  MKAAYDCRAIMLR
         ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGG  MKAAYDCRAIMLR
         ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGA  MKAAYDCRAIMLR
         ATGAAGGCCGCCTACGACTGTCGTGCCATAATCCTGAGA  MKAAYDCRAIILR
         ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA  MKAAYDCRAIILR

     Species A (top) against species B (bottom)
         ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG  MNAAYDCR-GMLR
         ||||| || || || || || || |    ||| || ||   | ||||||  +||
         ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA  MKAAYDCRAIILR
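The substitutions on this slide can be checked with a short Python sketch. It assumes the standard genetic code; the names `CODON_TABLE`, `translate`, `ancestral`, `silent` and `missense` are illustrative and not from the slides.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ancestral = "ATGAAGGCTGCCTACGACTGCCGTGCCAGAATGCTGAGG"
silent = "ATGAAGGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG"    # TAC -> TAT, still Y
missense = "ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG"  # AAG -> AAT, K becomes N

for name, seq in [("ancestral", ancestral), ("silent", silent), ("missense", missense)]:
    print(f"{name:10s} {seq}  {translate(seq)}")
```

The first change alters the DNA but not the protein, which is why DNA sequences drift apart faster than the proteins they encode.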

  6. Searching for Similarity

         DNA comparison                           amino acid comparison
         ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG  MNAAYDCR-GMLR
         ||||| || || || || || || |    ||| || ||   | ||||||  +||
         ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA  MKAAYDCRAIILR

     The DNA sequence can change while the amino acid sequence stays the same, so for protein-coding genes always look for similarities by comparing amino acid sequences.

     Evolution causes sequences to change by substitution, insertion or deletion, but not usually by small-scale re-ordering. So we need a tool that will find the ‘alignment’ between the two sequences that shows the greatest degree of similarity while introducing as few gaps as possible.
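One classic way to find such an alignment is Needleman-Wunsch global alignment by dynamic programming. The sketch below is only an illustration under simple assumptions: the scores of +1 for a match, -1 for a mismatch and -2 per gap position are arbitrary choices, and the function name is made up here. BLAST itself uses a substitution matrix and a heuristic local search rather than this exhaustive method.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


top, bottom, total = needleman_wunsch("MNAAYDCRGMLR", "MKAAYDCRAIILR")
print(top)
print(bottom)
print("score:", total)
```

With these scores the best alignment of the two peptides needs a single gap in the shorter sequence. Several placements of that gap tie, so the exact position printed depends on the order in which the traceback tries its moves.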

  7. The Downside of Gaps

     Take two random sequences, with no ‘real’ similarity:

         GACACTAGGTCGATGCGTGGTGGCGAGA
         ACGCATCCGGATGTGCACCGTGGAACTG

     And allow cost-free gaps:

         GAC--ACT----AGGTCGATGC---GTGG---TGGCGAGA
          ||  | |    |  | | |||   ||||   ||
          ACGCA-TCCGGA--T-G-TGCACCGTGGAACTG

     Although the alignment has no mismatches, it is obviously not biologically meaningful!

     The introduction of gaps into alignments should ideally reflect biological possibilities, but this is rather difficult. So the tendency is to make gaps ‘expensive’, and to introduce them only when the long-range matching they make possible outweighs the ‘un’-matching they introduce, e.g.

     Without gaps:

          TTCCCAACTCTCCTCTTTCACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
          | ||       |   || |||||||||||||||||||| ||||||||| ||| |||   |      |||   |||  |
         TTCCCACCTCTCCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTCCCCCAAAATCAAGCGCACCCCGTCCCAGAA

     With two gaps (shown as =):

         TTCCCAACTCTCCTCTTT=CACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
         |||||| ||||||||||| |||||||||||||||||||| ||||||||| |||||||||||||| |||||||||| ||||
         TTCCCACCTCTCCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTC=CCCCAAAATCAAGCGCACCCCGTCCCAGAA
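The effect of making gaps ‘expensive’ can be seen by scoring the random-sequence alignment from this slide twice, once with free gaps and once with a penalty. This is a minimal sketch: the scoring values and the function name `score_alignment` are assumptions for illustration, and the unpaired ends of the alignment are padded with `-` and charged like any other gap.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Sum column scores for two equal-length gapped strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a residue paired with nothing
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# The 'cost-free gaps' alignment of the two random sequences, ends padded with '-'.
top = "GAC--ACT----AGGTCGATGC---GTGG---TGGCGAGA"
bottom = "-ACGCA-TCCGGA--T-G-TGCACCGTGGAACTG------"

print(score_alignment(top, bottom, gap=0))   # free gaps: 16 matches, score 16
print(score_alignment(top, bottom, gap=-2))  # 24 gap columns at -2 each: score -32
```

With free gaps the random alignment looks good; once each gap costs something, its score collapses, which is the behaviour an alignment tool wants.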

  8. The Essential Task

     What we are trying to do is work out the function of an unknown gene by comparing its sequence with those of genes in other species whose function we already know. We can do this because the sequence of most genes is conserved to some extent as species evolve.

     The problem is that while a gene's function is probably related both to the overall three-dimensional structure of its product and to small regions of specific linear sequence, our only serious tool for discerning similarity between proteins is based firmly on long-range linear sequence similarity. And there is no obvious requirement on genes to conserve sequence in order to conserve function; it’s just easier that way…

     But it seems clear that we can only expect this to be effective if we are looking at true ORTHOLOGS.

  9. Finding Orthologs

     So how do we find orthologs, and can we know when we have? The simplest method is Reciprocal Best BLAST, but it implicitly relies on having all the protein sequences of your own organism, and of the one you wish to find an ortholog in.

     [Diagram labels: database of human proteins; database of frog proteins; best match; human protein; frog protein x]
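In a reciprocal best hit test, two proteins are called a candidate ortholog pair when each is the other's top-scoring match in a search against the other species' full protein set. The sketch below assumes the two searches have already been run and reduced to score tables; the protein names and bit scores are invented toy data, and `best_hits` and `reciprocal_best_hits` are hypothetical helper names.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of (query, subject) -> bit score.
    """
    best = {}
    for (query, subject), bits in scores.items():
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data (invented): human proteins searched against frog, and frog against human.
human_vs_frog = {("hA", "fA"): 400, ("hA", "fA2"): 350, ("hB", "fA2"): 120}
frog_vs_human = {("fA", "hA"): 410, ("fA2", "hA"): 300, ("fA2", "hB"): 90}

print(reciprocal_best_hits(human_vs_frog, frog_vs_human))  # [('hA', 'fA')]
```

Here hB's best frog match is fA2, but fA2's best human match is hA, so that pair fails the reciprocal test. The test also shows why complete protein sets matter: if the true ortholog is missing from either database, a paralog can become the "best" hit by default.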
