
EXPLORING DEAD GENES

EXPLORING DEAD GENES. Adrienne Manuel, I400. What are they? Dead genes are also called pseudogenes. Pseudogenes are nonfunctional copies of genes in DNA. They result from reverse transcription of an mRNA transcript, or from gene duplication and subsequent disablement.


Presentation Transcript


  1. EXPLORING DEAD GENES Adrienne Manuel I400

  2. What are they? • Dead genes are also called pseudogenes • Pseudogenes are nonfunctional copies of genes in DNA • They result from reverse transcription of an mRNA transcript • Or from gene duplication and subsequent disablement

  3. Expression of Pseudogenes • Evidently transcribed • Expression of pseudogenes varies • Snail (Lymnaea stagnalis): example of an organism that still has functioning

  4. Pseudogenes, Good and Bad! • - Raised expression in tumor cells • + Useful in studying molecular evolution • + Helpful in determining rates of genomic DNA loss for an organism

  5. Size and Distribution of Pseudogenes: Defining Populations and Subpopulations • G is the total population of confirmed and predicted protein-encoding genes • ΨG is the estimated population of pseudogenes that correspond to G

  6. The set of genes with at least one verifying EST match was derived and denoted GE • A set of genes deemed to be highly expressed was derived from microarray expression data and denoted GM • The corresponding predicted pool of pseudogenes is denoted ΨGM

  7. Data Files • The Sanger Sequencing Centre FTP site (ftp://ftp.sanger.ac.uk) holds the six complete worm chromosome sequences • GFF data files provide annotations for genes and other genomic features that correspond to wormpep18 • The pseudogene population was assembled using a pipeline
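
The GFF annotation files mentioned above are tab-separated text, so reading them takes only a few lines. The following is a minimal sketch, not the authors' code: it assumes the classic GFF column order (seqname, source, feature, start, end, score, strand, frame, attributes), and the example records, source labels, and sequence names are invented for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    seqname: str   # chromosome or contig name
    source: str    # annotation source label
    feature: str   # feature type, e.g. "exon"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str    # "+", "-" or "."


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per GFF record, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # malformed record; simply skipped in this sketch
        yield Feature(cols[0], cols[1], cols[2],
                      int(cols[3]), int(cols[4]), cols[6])


# Hypothetical records, invented for illustration only.
example = [
    "# comment line",
    "CHROMOSOME_I\tcurated\texon\t100\t250\t.\t+\t.\tSequence \"geneA\"",
    "CHROMOSOME_I\tPseudogene\texon\t900\t1200\t.\t-\t.\tSequence \"psiA\"",
]
features = list(parse_gff(example))
for f in features:
    print(f.seqname, f.feature, f.start, f.end, f.strand)
```

Reading from a real file would only require passing an open file handle to `parse_gff` in place of the example list.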

  8. Pipeline Step 1: Sanger Centre pseudogene annotations • Start with a list of 332 pseudogenes • The pseudogene population was derived by looking for gene disablement Step 2: FASTA matching to find potential pseudogenes

  9. Pipeline (continued) • Worm genes were masked for low-complexity regions with the program SEG • TFASTX and TFASTY were then used to compare the complete wormpep18 against the worm genome • After the comparison, pseudogene matches were refined in the next step

  10. Pipeline (continued) Step 3: reduction for overlaps on the genomic DNA • Significant matches of protein sequences to the DNA were reduced for redundancy where homologs match the same segment of DNA • Matches are then sorted Step 4: prevention of overcounting for adjacent matches • Initial matches may correspond to the same pseudogene • To avoid overcounting, matches were realigned
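
Steps 3 and 4 can be pictured as interval operations on the genome. The sketch below is an illustration under stated assumptions, not the method used in the study: Step 3 is approximated by a greedy pass that keeps the best-scoring match for each overlapping region, and Step 4 is approximated by merging nearby matches to the same protein, which is a simpler stand-in for the realignment the slide describes. The `Match` type, the `max_gap` parameter, and all example values are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    protein: str
    evalue: float


def overlaps(a: Match, b: Match) -> bool:
    """True if both matches cover at least one common base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def reduce_overlaps(matches: List[Match]) -> List[Match]:
    """Greedy reduction: visit matches from best to worst e-value and keep
    a match only if it does not overlap one that was already kept."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.evalue):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: (m.chrom, m.start))


def merge_adjacent(matches: List[Match], max_gap: int) -> List[Match]:
    """Merge neighbouring matches to the same protein that lie within
    max_gap bases of each other, so one pseudogene is counted once."""
    merged: List[Match] = []
    for m in sorted(matches, key=lambda m: (m.chrom, m.start)):
        if merged:
            prev = merged[-1]
            if (prev.chrom == m.chrom and prev.protein == m.protein
                    and m.start - prev.end <= max_gap):
                merged[-1] = prev._replace(end=max(prev.end, m.end),
                                           evalue=min(prev.evalue, m.evalue))
                continue
        merged.append(m)
    return merged


# Invented example matches.
matches = [
    Match("I", 100, 400, "P1", 1e-20),
    Match("I", 150, 380, "P2", 1e-8),    # homolog hitting the same segment
    Match("I", 450, 700, "P1", 1e-12),   # adjacent fragment of the same hit
    Match("II", 50, 300, "P3", 1e-5),
]
reduced = reduce_overlaps(matches)
merged = merge_adjacent(reduced, max_gap=100)
print(len(matches), len(reduced), len(merged))
```

In this toy input the P2 match is dropped because it covers the same DNA as the stronger P1 match, and the two P1 fragments collapse into one candidate.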

  11. Pipeline Step 5: masking against Sanger Centre annotation and transposon library • Potential pseudogenes were filtered for overlap with any other annotations in the Sanger Centre GFF files, e.g. exons of genes, tandem or inverted repeats Step 6: reduction for possible additional repeat elements • At this point there is a set of 3814 pseudogenic fragments
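
The masking in Step 5 amounts to discarding any candidate that overlaps an already annotated feature. A minimal sketch of that filter follows; it assumes intervals are given as (chromosome, start, end) tuples with 1-based inclusive coordinates, treats any overlap at all as disqualifying, and uses invented coordinates. The actual overlap criterion used in the study is not stated on the slide.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def mask_against_annotations(candidates: List[Interval],
                             annotations: List[Interval]) -> List[Interval]:
    """Return the candidates that overlap no annotated interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in annotations:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept: List[Interval] = []
    for chrom, start, end in candidates:
        hit = any(start <= a_end and a_start <= end
                  for a_start, a_end in by_chrom.get(chrom, []))
        if not hit:
            kept.append((chrom, start, end))
    return kept


# Invented coordinates for illustration.
annotations = [("I", 100, 250), ("I", 1000, 1100)]   # e.g. exons, repeats
candidates = [("I", 200, 300), ("I", 400, 600), ("II", 100, 250)]
kept = mask_against_annotations(candidates, annotations)
print(kept)
```

The linear scan per candidate is fine for a sketch; a sorted list or interval tree would be the usual choice for genome-scale input.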

  12. Pipeline (final step) Step 7: reducing the e-value threshold • E-value match threshold reduced from 0.01 to 0.001 Check the web! • http://bioinfo.mbb.yale.edu/genome/womr/pseudogene • To explore the pseudogene population, the data can be viewed either by searching for a protein name or by viewing a specific range in a chromosome
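
Because a smaller e-value means a stronger match, lowering the cutoff from 0.01 to 0.001 admits fewer matches. The short sketch below illustrates this with invented hits; the (protein, e-value) tuple layout and the choice to keep hits exactly at the cutoff are assumptions.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (protein name, e-value); layout is an assumption


def filter_by_evalue(hits: List[Hit], threshold: float) -> List[Hit]:
    """Keep hits whose e-value is at or below the threshold."""
    return [hit for hit in hits if hit[1] <= threshold]


# Invented hits for illustration.
hits = [("P1", 5e-3), ("P2", 5e-4), ("P3", 1e-10)]
loose = filter_by_evalue(hits, 0.01)
strict = filter_by_evalue(hits, 0.001)
print(len(loose), len(strict))
```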

  13. Size of Pseudogene Population • Composed of 2168 sequences, about 12% of the total gene complement • Factors that affect the size: 1. Dead copies of transposable elements 2. The size of the pseudogene population is underestimated because pseudogenes with less obvious disablement aren't included 3. Annotated genes might be pseudogenes because disablement is undetectable 4. Pseudogenes still part of a functioning gene 5. Some pseudogenes arise due to sequencing errors 6. Possible genomic repeats
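
As a quick sanity check on the two figures on this slide, the implied size of the total gene complement can be back-calculated. This is only arithmetic on the slide's own numbers; since "about 12%" is rounded, the result is a rough figure, not a count reported by the source.

```python
def implied_total(part: int, fraction: float) -> float:
    """Total implied by a part and the fraction of the total it represents."""
    return part / fraction


total = implied_total(2168, 0.12)
print(round(total))  # roughly 18,000 genes
```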

  14. Subpopulations • Highly expressed genes have fewer dead gene copies • The most reliable subset of the pseudogene population is about half the total for ΨG • 39% of pseudogenes are intronic; these kinds of pseudogenes aren't ailing families of proteins

  15. Chromosomal Distributions • More abundant near the ends of the chromosomes (the “arms”) • For each chromosome, a proportion of dead genes is calculated
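
One way to picture the per-chromosome proportion is as pseudogenes divided by genes on each chromosome. The slide does not give the exact definition or the counts, so both the ratio used here and every number in the example are assumptions made for illustration.

```python
from typing import Dict


def dead_gene_proportions(gene_counts: Dict[str, int],
                          pseudogene_counts: Dict[str, int]) -> Dict[str, float]:
    """Pseudogenes per gene for each chromosome (assumed definition)."""
    return {
        chrom: pseudogene_counts.get(chrom, 0) / n_genes
        for chrom, n_genes in gene_counts.items()
        if n_genes > 0
    }


# Hypothetical counts, not taken from the study.
genes = {"I": 200, "II": 400}
pseudogenes = {"I": 20, "II": 100}
proportions = dead_gene_proportions(genes, pseudogenes)
print(proportions)
```

The same function could be applied to arm and central regions separately to compare their pseudogene density.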

  16. The data plot above indicates genome to genome over all age. • The percentage composition for each of the 20 amino acids is graphed in decreasing order of the implied amino acid composition in the pseudogene set. In the bottom part of the figure, the G difference for each amino acid composition is indicated by a bar.
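
The comparison described on this slide can be reproduced in outline: compute the percentage composition of the 20 amino acids in each set, order them by the pseudogene-set composition, and report the difference. This is a sketch under assumptions: the difference is taken as pseudogene set minus gene set, and the short sequences are invented placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Percentage of each standard amino acid across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def compare(pseudo_seqs: Iterable[str],
            gene_seqs: Iterable[str]) -> List[Tuple[str, float, float, float]]:
    """Rows of (amino acid, % in pseudogenes, % in genes, difference),
    in decreasing order of the pseudogene-set percentage."""
    psi = composition(pseudo_seqs)
    g = composition(gene_seqs)
    order = sorted(AMINO_ACIDS, key=lambda aa: psi[aa], reverse=True)
    return [(aa, psi[aa], g[aa], psi[aa] - g[aa]) for aa in order]


# Invented toy sequences.
rows = compare(["AAAL", "ALK"], ["ALKK"])
for aa, p, q, diff in rows[:3]:
    print(f"{aa}: {p:.1f}% vs {q:.1f}% (diff {diff:+.1f})")
```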

  17. Listed are the largest sequence families in the worm, ranked by genes and by pseudogenes • Each is named for a particular representative • Four of the top 10 paralogous gene families, ranked by number of genes, are functionally uncharacterized • Three of the top 10 pseudogene families are among the biggest families when ranked by number of genes

  18. Pseudofolds • These charts are ranked in terms of implied structural pseudofolds • Proteins encoded by the worm genome have been assigned to globular domain folds • From the SCOP database

  19. Why was this studied again? • To provide an initial estimate of the size, distribution, and characteristics of the pseudogene population in C. elegans, in an attempt to estimate the total number in humans • Found few pseudogenes in the worm genome that are apparently due to processing • Found a large uncharacterized gene family that makes up 2/3 of dead genes • Arms of the chromosomes are unreliable for encoding genes but more likely to spawn new proteins
