1 / 22

Comparative Genomics of the Eukaryotes

Ishay Ben-Zion. Comparative Genomics of the Eukaryotes. A paper by : Rubin, Yandell, Wortman,…. Motivation. Evolution – Charles Darwin (1838) Similarity between different species Model organisms A human shares 50% of his genes with a banana. How ? Humans and bananas are multi-cellular

charlier
Download Presentation

Comparative Genomics of the Eukaryotes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ishay Ben-Zion Comparative Genomicsof the Eukaryotes A paper by : Rubin, Yandell, Wortman,…

  2. Motivation • Evolution – Charles Darwin (1838) Similarity between different species Model organisms • A human shares 50% of his genes with a banana. How ? • Humans and bananas are multi-cellular • Other Similarities Humans share 23% of their genes with Yeast • Could banana be a good model organism ?

  3. Important requirements: Size Generation Time (for genetic research) Manipulation (genetic and not) Little “Junk DNA” (easy for sequencing) Money Model Organisms Heavily Studied – used as examples for other species Once it is studied enough – It is a good candidate

  4. This paper describes: A comparison between the genomes of 3 Eukaryotes: Eukaryote – Cell has inner structures with membranes (nucleus) 1) A fruit fly - Drosophila melanogaster 2) A worm – C. elegans 3) Yeast – S. cerevisiae Other model organisms (E. coli, mouse, Zebrafish, Arabidopsis)

  5. Taxonomic classification Cellular life Bacteria Archaea Eukaryota Domain: Kingdom: Animalia Protista Plantae Fungi H. influenzae Fly worm yeast Species:

  6. Drosophila melanogaster • Popular model organism (for developmental biology) • A trial for the human genome (sequenced at 2000) • Easily induce mutations

  7. Caenorhabditis elegans • Transparent, 1-mm long • Simple – 959 cells (300 neurons) • Eat, sleep & have sex (or self-fertilize) • Hermaphrodites – 99.95%, Males – 0.05%

  8. Caenorhabditis elegans Good as a model organism for: • Genetics: First multi-cellular sequenced genome • Developmental biology: cell fate mapping • Neurobiology: neurons connectivity map

  9. Saccharomyces cerevisiae • Also called Baker’s yeast • Single-celled • Diameter: 5-10 μ • Popular model organism • Simplest Eukaryote • First Eukaryotic sequenced genome

  10. The 1st comparison • Instead of counting genes - count gene families • What are gene families ? Paralogs = highly similar proteins in the same genome Similar functionality – but not always • Remark: proteins = genes Sets of paralogs

  11. Findings • Size of a family: one or more • No. of families – not a good measure for complexity

  12. The 2nd comparison • Pool genes of large families of 3 species: • For each protein – search for orthologs • Orthologs = Similar proteins in other species • Among families found in flies and worms (but not yeast): Responsible for multi-cellular development • Among families found only in flies: Responsible for immune response and fly specific

  13. A C G C T C G C A A C T A C G C T T G C Methods – BLAST algorithm • Basic Local Alignment Search Tool • For comparing biological sequences (to find Homology) Example: Proteins, DNA sequences Query Library of sequences (In the library – sequences of different lengths) • In the paper: Paralogy, Orthology - kinds of Homology

  14. BLAST – Step 1 • Separate query to k-letter words Example: Proteins – Letters are Amino acids (L=Leucine) Query sequence: RPPQGLF (k=3) 3-letter words: RPP PPQ PQG QGL GLF

  15. BLAST – Step 2 • Take one k-letter word – PQG • Search library for similar words – LGMCPQA, DPPEGVV • Define similarity: High score for 2 words Have common ancestor PQG – PQA : 12 PQG – PEG : 15 • Save similar words above a threshold T (save positions) • Repeat for all k-letter words in query Use scoring matrix for two k-letter words

  16. BLAST – Step 3 • Align at saved positions: - - - R P P Q G L F - - - - - - D P P E G V V - - - Scores: -2 7 7 2 6 1 -1 • Extend match right and left for positive score • New pairs are called High-scoring Segment Pairs (HSP) • Save significant HSPs (above a threshold S) Total: 15 + 7 + 1 = 23

  17. BLAST – Step 4 • Align saved HSPs (with gaps) Example: 2 Sequences with 2 HSPs Insert gap • Compute total score (involves gap penalties) • Report all matches above a threshold E

  18. BLAST – Whole process Separate query to k-letter words Search library for similar k-letter words and save Extend to HSPs and save Align whole sequences and compute total score Return sequences with score above E These are homologous to query

  19. The 3rd comparison • Compare all genes of three species with length limitation (80% of length) • 20% of the fly appear in worm and yeast They perform functions common to all eukaryotic cells

  20. The 4th comparison • Compare all genes of three species to mammalian sequences (without length limitation) • 50% of the fly proteins appear in mammals • 36% of the worm proteinsappear in mammals Fly is closer to mammals • Most of mammalian sequences used here were short The similarities reflect conserved domains

  21. What are conserved domains ? • Domains – independent parts that construct proteins • Appear in different combinations in different proteins • Similarity to short sequences Conserved domains Closeness in evolution ABC ADEG

  22. To conclude Significant similarity between genomes of ”distant” species (Man – Yeast 23%) Similarity increases for taxonomically close species ( ) No. of genes or gene families – bad measure for complexity Why ? More information that is not encoded in the genome (Protein interactions – e.g. physical proximity of genes) How to define complexity ?

More Related