Curation Topics for Discussion. IWN Pre-meeting meeting
Topics
• Sequencing Errors
• Pseudogenes and non_coding_transcripts
• SWEEP
Sequencing errors in Sanger 1/2
• ZK1067, ZK1067.7, del: tgccaacaggcCtgccagccaac gttggctggcagGcctgttggca. Single bp deletion reported by Jeb Gaudet <jeb.gaudet@hci.utah.edu>. PCR, as it seems that the 2nd G is compressed under the CC even in the overlapping clone T26C5.
• K07A1, K07A1.16, deletion: TTTATtGGAACAT. Single base deletion in genomic, compared to research data. PCR, as there are 7 reads that are all inconclusive.
• Y111B2A, Y111B2A.23, poss 2 del: tcaAtaccacaaaatgctccaaAtc. 2 base deletion in region supported by mRNA and 2 EST reads. PCR, as there are 4 reads that are all inconclusive.
• C36A4, gap: EST suggests 3 c's at 9280 rather than 2. PCR, as all 5 reads have a large ballooning trace (probably is 3 c's).
• C34C12, C34C12.6, 1 deletion: gacttggcgggGaagttcttgcc. Single EST suggests this, but would make valid CDS.
• K08E5, K08E5.1, deletion in this region: gtttgatgatgagtgatgatcatgg. PCR; looks to be a possible 10+ bp deletion in the genomic sequence. Reappraise.
• T07D4, T07D4.2, 2bp del?: not in repos. Poss 2bp deletion in genomic.
• K12D12, K12D12.4, deletion: tttggcttgggCcacgaatcttt. Not in repository. Single base deletion in genomic, 1 OST suggests this.
• F11A10, F11A10.1, deletion: gatgtacactGctttgcagcg. ?? Single base deletion in genomic, 2 ESTs suggest this.
• ZK637, ZK637.15, g or A: aagcatttttttatGg. Needs oligos.
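A minimal sketch for reading the ZK1067.7 context strings, assuming (this is not stated on the slide) that the upper-case letter marks the suspect base and that the second string is the opposite strand of the first. The helper names `reverse_complement` and `flagged_positions` are hypothetical, introduced only for illustration.

```python
# Context strings copied from the ZK1067.7 entry.
FORWARD = "tgccaacaggcCtgccagccaac"
REVERSE = "gttggctggcagGcctgttggca"

# Complement map that keeps letter case, so the flagged base stays visible.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def flagged_positions(seq: str) -> list[int]:
    """0-based indices of upper-case bases (assumed to mark the suspect base)."""
    return [i for i, base in enumerate(seq) if base.isupper()]


# Ignoring case, the two strings should describe the same locus on both strands.
same_locus = reverse_complement(FORWARD).lower() == REVERSE.lower()
print(same_locus)
print(flagged_positions(FORWARD), flagged_positions(REVERSE))
```

Ignoring case, the two strings are reverse complements of each other. The capital letter sits on adjacent bases of the same cc/gg pair on the two strands, which fits the note that one base of the pair is compressed in the trace.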
Pseudogenes
• Pseudogene Definitions (source google)??
“A sequence of DNA similar to a gene but non-functional; probably the remnant of a once-functional gene that accumulated mutations.”
“An inactive gene derived from an ancestral active gene.”
“A DNA sequence which resembles a gene but which has been inactivated by mutation so that it cannot produce a functional product.”
“Pseudogenes are defined by their possession of sequences that are related to those of the functional genes, but that cannot be translated into a functional protein”
“A gene copy created by a gene duplication event that is no longer functional due to a disabling mutation.”
“A non-functional member of a gene family.”
“A gene showing significant sequence homoplasy (>75%) to a functional gene, but which has lost any normal function, often through gaining internal stop codons.”
Objectives
• Our definition.
• “A pseudogene is a sequence of DNA that is derived from an ancestral gene, possibly via a gene duplication event, that may show significant sequence homology to a functional gene but has acquired mutations that render the locus inactive or the gene product non-functional.”
• Separate out the dumping ground.
• Non_coding_transcripts: isoforms; blurred with real ncRNAs; Ignore tag populated to remove these from wormrna.
• Expressed_pseudogene: be aware of real ncRNAs; have identifiable ORFs and protein homology.
• Confirmed_pseudogene: experimentally confirmed; protein family analysis.
• Pseudogene_fragment: small blocks of protein homology.
• Pseudogene: gene predictions with problems; protein homology, no transcript data.
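One way to picture how the dumping ground could be separated is a small decision function. This is only a sketch: the evidence flags on `Locus`, the `classify` function and the order in which the rules are tried are all assumptions made for illustration. They are not WormBase tags and not a curation rule stated on the slide. Only the category names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    # Hypothetical evidence flags, invented for this sketch.
    protein_homology: bool = False
    homology_is_fragmentary: bool = False    # only small blocks of homology
    has_transcript_evidence: bool = False    # EST/OST/mRNA support
    experimentally_confirmed: bool = False
    is_isoform_of_coding_gene: bool = False


def classify(locus: Locus) -> str:
    """Assign one of the slide's categories; the rule order is an assumption."""
    if locus.is_isoform_of_coding_gene:
        return "non_coding_transcript"
    if locus.experimentally_confirmed:
        return "Confirmed_pseudogene"
    if locus.protein_homology and locus.has_transcript_evidence:
        return "Expressed_pseudogene"
    if locus.protein_homology and locus.homology_is_fragmentary:
        return "Pseudogene_fragment"
    if locus.protein_homology:
        return "Pseudogene"
    return "unclassified"  # needs a curator's judgement


print(classify(Locus(protein_homology=True)))
print(classify(Locus(protein_homology=True, has_transcript_evidence=True)))
```

Real ncRNAs are deliberately left out of the sketch. The slide warns that they blur with these categories, so they would need a separate check before any of these rules fire.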
Methods
• non_coding_transcript
• Transcript object
• Isoform
• EST/OST/mRNA data differs from other transcript data associated with this gene, e.g. alternate splicing that puts the prediction in a non-advantageous frame where the prediction will be grossly truncated.
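The frame argument can be illustrated with a toy example. The exon sequences below are invented for illustration and are not taken from any real gene. The sketch assumes the standard genetic code and shows how skipping an exon whose length is not a multiple of three shifts the frame and truncates the prediction at an early stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate from the first base and stop at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy exons (hypothetical). EXON2 is 4 bp long, so skipping it shifts the frame.
EXON1 = "ATGGCT"
EXON2 = "GAAC"
EXON3 = "TGATTGGAGCTAAATAA"

full_length = translate(EXON1 + EXON2 + EXON3)
exon_skipped = translate(EXON1 + EXON3)
print(full_length, exon_skipped)
```

In this toy case the exon-skipped isoform hits a stop codon right after the first exon, which is the kind of grossly truncated prediction the Isoform bullet describes.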