Lamprey Genome Consortium. Analysis of conserved non-coding elements. Greg Elgar gelgar@nimr.mrc.ac.uk 01/12/2010. McEwen et al., PLoS Genet. 2009 Dec;5(12):e1000762.
Analysis of conserved non-coding elements: McEwen et al., PLoS Genet. 2009 Dec;5(12):e1000762. The previous analysis used 19 million sequence reads from the trace archive.
Analysis of conserved non-coding elements: McEwen et al., PLoS Genet. 2009 Dec;5(12):e1000762. In total, 246 high-confidence lamprey CNEs were identifiable in the trace archive, considerably fewer than we expected. They are still highly conserved, but across smaller regions (about 50% of the length of the jawed vertebrate alignments). They retain high functional homology despite their shorter length.
Analysis of conserved non-coding elements: analysis of current assembly. Approximately 160 CNEs are identifiable using the same non-stringent parameters (word size = 8, gap penalty = 1). They are found across a broad range of gene loci. These loci tend to be gene deserts and are therefore some of the more difficult regions to assemble and anchor. There are many very large gaps in the scaffolds (e.g. scaffold_845, which contains the tshz gene CNEs, is >400 kb but about 70% Ns).
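A figure such as "about 70% Ns" can be checked directly from the scaffold FASTA. The following is a minimal sketch, not the method used for the slides: the function name `n_fraction_per_record` and the toy record `scaffold_demo` are hypothetical, and the input is assumed to be plain FASTA text read line by line.

```python
from typing import Dict, Iterable


def n_fraction_per_record(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of N/n characters for each FASTA record.

    Hypothetical helper for illustration; record names are taken as the
    first whitespace-delimited token after '>'.
    """
    counts: Dict[str, list] = {}  # name -> [n_count, total_length]
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = [0, 0]
        elif name is not None:
            # Count gap characters case-insensitively.
            counts[name][0] += line.upper().count("N")
            counts[name][1] += len(line)
    return {k: (n / t if t else 0.0) for k, (n, t) in counts.items()}


# Toy input (made up for this sketch): 10 Ns out of 20 bases.
example = [">scaffold_demo toy record", "ACGTNNNNNN", "NNNNACGTAC"]
fractions = n_fraction_per_record(example)
print(fractions)
```

In practice the lines would come from an open file handle on the assembly FASTA rather than from a list; the function streams, so scaffold size should not be an issue.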
Analysis of conserved non-coding elements: analysis of current assembly. As a result, it is difficult to carry out multiple local alignments using tools such as MLAGAN. Some very highly conserved CNEs (identifiable in amphioxus) are not in the assembly or the trace archive. They are also not amplifiable from lamprey liver DNA (now trying sperm DNA).
Lamprey genome sequencing: Illumina sequencing of lamprey sperm DNA. Mate-pair (3, 4 and 5 kb fragments) and paired-end (800 bp) libraries have been prepared, all from the same individual. The plan is to run short reads (36-50 bases) off the mate-pair libraries and longer reads (up to 150 bases, if we get the HiSeq very soon) off the paired-end libraries.
Lamprey genome sequencing: assembly. The plan is to map the reads back to the existing assembly and also to attempt a de novo assembly. We will also attempt a de novo assembly of the non-mapping reads. Of course, all data will be available to the Consortium. We are currently waiting for a ‘slot’ on the machine.
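Selecting the non-mapping reads for a separate de novo assembly could be sketched as follows. This assumes the mapper writes SAM-format output, which the slides do not state; the helper name `unmapped_read_names` and the toy records are hypothetical. The only fact relied on is that the SAM FLAG bit 0x4 marks a read as unmapped.

```python
from typing import Iterable, List

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return names of reads whose FLAG has the unmapped bit set.

    Hypothetical helper for illustration; header lines (starting with '@')
    and blank lines are skipped.
    """
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


# Toy SAM records (made up for this sketch).
toy_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tscaffold_demo\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tACACACACAC\tIIIIIIIIII",
]
unmapped = unmapped_read_names(toy_sam)
print(unmapped)
```

For paired libraries, one design choice is whether to keep a pair when only one mate fails to map; the sketch above reports each read independently and leaves that decision to a later step.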