The Myth of Junk DNA
Dr. Raymond G. Bohlin
Fellow, Discovery Institute
Probe Ministries
Non-Protein-Coding DNA
• 2001 – 65,000 mRNAs, but only 4% from exons
• 2002 – ENCODE found 11,655 non-protein-coding RNAs
• 2005 – most of mammalian DNA is transcribed
• 2008 – both strands used in transcription, frequently from overlapping segments
Evolutionary predictions
• If a sequence is non-functional, then over time the sequence should degrade.
• If a sequence is functional, then the sequence should be conserved by natural selection.
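The two predictions above are usually checked by comparing the same region in two species and measuring how similar it still is. A minimal sketch, assuming the two sequences are already aligned to equal length; the function name `percent_identity` and the short example sequences are made up for illustration and are not real genomic data:

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap positions where both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip alignment gaps
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy sequences: an unchanged region and a mutated one.
conserved = percent_identity("ATGGCCTTAGGC", "ATGGCCTTAGGC")
degraded = percent_identity("ATGGCCTTAGGC", "ATGACGTTCGGA")
print(f"conserved: {conserved:.1f}%  degraded: {degraded:.1f}%")
```

Under the slide's reasoning, a high value would suggest conservation and a low value degradation. Real comparisons rely on proper alignment tools and statistical models rather than a raw percentage.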
Non-Protein-Coding DNA
• 2005 – non-coding regions hundreds of nucleotides long are identical in humans and mice.
• Such ultraconserved regions (UCRs) regulate developmentally important functions.
• This is not expected by evolution!
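One simple way to look for stretches like these is to scan an alignment for the longest run of consecutive identical positions. This is an illustrative sketch only: the function name `longest_identical_run` and the toy sequences are assumptions, and the input is assumed to be a pre-computed pairwise alignment of equal length.

```python
def longest_identical_run(seq_a: str, seq_b: str) -> int:
    """Length of the longest stretch of consecutive identical aligned bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    best = 0
    current = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == base_b and base_a != "-":
            current += 1
            best = max(best, current)
        else:
            current = 0  # a mismatch or gap breaks the run
    return best


# Hypothetical toy alignment with mismatches at positions 3 and 10.
run_length = longest_identical_run("ACGTTTGACCA", "ACGATTGACCT")
print(run_length)
```

On genome-scale data the same idea would be applied to whole-genome alignments, reporting only runs that are hundreds of nucleotides long.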
Introns
• Introns are not just inert spacers between exons.
• 2005 – intronic sequence is highly conserved between humans, mice, rats, dogs and chickens – likely functional.
• The mammalian thyroid receptor gene produces two variant proteins with opposite effects – splicing is regulated by an intron.
Co-expressed loci are clustered together in the nucleus, sometimes to “create” genes.
[Figure labels: nuclear compartment with concentrated transcription factors; chromosome 5 loop; chromosome 21 loop; chromosome 2 loop]
Pseudogenes
• A pseudogene is a gene that closely resembles a functional gene but appears to be a useless leftover.
• Pseudogenes as defined above would be predicted by evolution but are difficult to explain under intelligent design (ID).
• The human genome may have as many as 2000 pseudogenes.
Pseudogenes
• Some pseudogenes appear to suppress expression of the functional gene.
• The pseudogene can be transcribed, and this transcript binds to the mRNA sequence of the functional gene, thus blocking translation (“RNA interference”).
• Other transcribed pseudogenes serve as “perfect decoys” for RNA-degrading enzymes, thus enhancing expression of the functional gene.
Repetitive Sequences
• About half of the mammalian genome consists of various types of repetitive sequences.
• Long Interspersed Nuclear Elements – LINEs
• Short Interspersed Nuclear Elements – SINEs
• Endogenous Retroviruses – ERVs
Overview of LINEs
LINEs and SINEs have different structural arrangements. The major LINE in the human genome is L1. This sequence:
• Is found throughout Mammalia but is largely taxon-specific
• Is variously truncated at the 5’ end: ranges from 6-8 kb to a few hundred bp in length
• Has a biased chromosomal distribution: AT-rich chromosome bands and the X chromosome
[Figure labels, L1 structure: G-dense Pu:Py element; ORF2 (reverse transcriptase and endonuclease); ORF1; species-specific regulatory region; 3’ UTR; A-rich ‘tail’]
Chimp- vs. Human-Specific L1s*
Element       Human   Chimp
L1Hs(Ta)      271     0
L1 nonTa      252     210
L1Pa2         490     476
Human-chimp divergence: 5-6 million years ago
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.
Remember the layout of a mammalian gene? Many human gene folders are bordered by species-specific repertoires of L1s.
[Figure labels: “Gene” 1 through “Gene” 5; RNA outputs; L1s]
Almost forty percent of human nuclear matrix attachment elements are L1 sequences.
Overview of SINEs
The major SINE in the human genome is Alu. Unlike LINE-1, Alu (and other SINEs) do not encode enzymes for their mobilization. This sequence:
• Is primate-specific; subfamilies are distributed in a taxonomically hierarchical manner (as with LINE-1)
• Is ~300 bp in length; is a dimer built largely of two monomers (with sequence differences between them)
• Has a biased genomic distribution: GC-rich chromosome bands
[Figure labels, Alu structure: Monomer A; central A-stretch; Monomer B; 31 bp insert; A-rich ‘tail’]
Chimp- vs. Human-Specific SINEs*
Element       Human   Chimp
Other Alu     1,167   233
AluS          263     50
AluYa5        1,709   10
AluYb8        1,290   9
AluY          484     360
AluYc1        356     979
AluYg6        261     1
SVA (SINE)    864     396
Human-chimp divergence: 5-6 million years ago
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.
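The per-subfamily Alu counts on this slide can be totalled to compare the two lineages at a glance. A small sketch using only the numbers listed above; each pair is kept in the order the slide lists it (first count, second count), the SVA row is left out because it is not an Alu subfamily, and the variable names are illustrative:

```python
# (first-listed count, second-listed count) for each Alu subfamily on the slide.
alu_counts = {
    "other Alu": (1167, 233),
    "AluS": (263, 50),
    "AluYa5": (1709, 10),
    "AluYb8": (1290, 9),
    "AluY": (484, 360),
    "AluYc1": (356, 979),
    "AluYg6": (261, 1),
}

# Sum each column across all subfamilies.
first_total = sum(first for first, _ in alu_counts.values())
second_total = sum(second for _, second in alu_counts.values())

print(f"first-listed total:  {first_total}")
print(f"second-listed total: {second_total}")
print(f"ratio: {first_total / second_total:.2f}")
```

The totals come to 5,530 and 1,642 lineage-specific Alu insertions, roughly a 3.4-fold difference between the two genomes.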
No seemingly random aspect of chromosome sequence arrangement is actually random. A case in point involves endogenous retroviruses (ERVs):
A. Human ERVs contribute 51,197 promoter elements that initiate transcription at various stages (Conley et al., Bioinformatics 24: 1563-1567, 2008).
B. Mouse ERVs are highly expressed at the 2-cell embryo stage (and are the earliest to be transcribed in the zygote) and are essential for ontogenesis (Kigami et al., Biology of Reproduction 68: 651-654, 2003).
ERVs
• In humans, ERVs help regulate blood cell production and fat metabolism.
• ERVs also regulate gene expression in the gastrointestinal tract, mammary glands, and testes.
• The ERV-derived protein syncytin is required for the fusion of fetal and maternal cells in the placenta.
Although less than 2% of genomic DNA in many vertebrates (e.g., mammals) can be placed in the traditional “gene” category, nearly all sequences are transcribed in a cell- and tissue-specific manner.
DNA as Computer
• Information carried by DNA is bidirectional, multi-layered, and interleaved.
• Repetitive elements format and punctuate the information at different scales.
• Cells can write codes onto non-coding DNA, so phenotype is not always equal to genotype.
• “metaprogramming” – Cornell Conf.