AceView
Danielle and Jean Thierry-Mieg, NCBI
http://www.aceview.org = global annotation of the whole human genome
- Restricted to the Gencode regions
- Reinforced filters to obtain Havana density

AceView Strategy
- Align all (4.3M) human mRNAs and ESTs
- Refine the introns by co-alignment
- Cluster into transcripts and genes
- Annotate and ignore the suspect clones
- Annotate the resulting proteins
- Do not use a priori splice consensus
- Do not mask the genome or the cDNAs

Gencode adaptation
- Ignore 5' 3' pairs (they were creating gaps)
- Reassess manually all the atypical introns and ask the code to ignore the corresponding dubious clones
- Ignore all the NMs
- Remove the pseudogenes
- To ignore a feature is the only manual operation authorized in the AceView annotation interface

About the intron boundaries
- 98.5% are gt-ag
- 1% are gc-ag
- Most others are dubious
- If the suspect intron is supported by fewer than N clones and its boundaries fall within exons supported by some other clone, we flag the supporting clones as 'suspected internal deletion' and ignore this intron
- For Gencode we chose N = 3
- Some of these suspect introns were kept in Havana
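
The support rule above can be sketched in a few lines of Python. This is only an illustration, not the AceView code: the `Exon` and `Intron` records, the function names, and the reading that both intron boundaries must lie inside an exon from a clone that does not support the intron are all assumptions made for the example. Whether the rule is applied to every intron or only to the atypical ones is not stated on the slide and is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    clone: str   # hypothetical clone identifier
    start: int   # first base of the exon (inclusive)
    end: int     # last base of the exon (inclusive)


@dataclass(frozen=True)
class Intron:
    start: int            # first intronic base
    end: int              # last intronic base
    clones: frozenset     # clones whose alignment supports this intron


def inside_other_exon(pos, intron, exons):
    """True if pos lies in an exon from a clone that does not support the intron."""
    return any(
        e.start <= pos <= e.end and e.clone not in intron.clones
        for e in exons
    )


def is_suspected_internal_deletion(intron, exons, n=3):
    """Assumed reading of the rule: fewer than n supporting clones,
    and both boundaries fall within exons seen in some other clone."""
    if len(intron.clones) >= n:
        return False
    return (inside_other_exon(intron.start, intron, exons)
            and inside_other_exon(intron.end, intron, exons))


exons = [Exon("c3", 100, 500)]
weak = Intron(200, 300, frozenset({"c1"}))
strong = Intron(200, 300, frozenset({"c1", "c2", "c4"}))
elsewhere = Intron(600, 700, frozenset({"c1"}))

print(is_suspected_internal_deletion(weak, exons))       # flagged
print(is_suspected_internal_deletion(strong, exons))     # enough support
print(is_suspected_internal_deletion(elsewhere, exons))  # no covering exon
```

With N = 3 as chosen for Gencode, an intron seen in one or two clones is dropped only when another clone reads straight through it; an intron with three or more supporting clones is always kept by this rule.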

New definition of a gene
- 2 transcripts belong to the same gene <==> they iteratively share at least one donor or one acceptor site
- Split genes touching by their UTR
- Ignore stupid EST bridges
- Shed unspliced mRNAs/ESTs
- Respect true complex loci
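
"Iteratively share" reads as a transitive closure: if A shares a site with B and B shares one with C, all three fall in one gene. A minimal sketch of that grouping with a union-find structure follows. It is an illustration under assumptions, not the AceView implementation: the `cluster_genes` name and the input layout (transcript name mapped to a list of `(donor, acceptor)` coordinates) are hypothetical, and strand and chromosome are ignored for brevity.

```python
from collections import defaultdict


def cluster_genes(transcripts):
    """Group transcripts that share a donor or an acceptor site, transitively.

    transcripts: dict mapping a transcript name to its list of
    (donor, acceptor) intron coordinates (hypothetical layout).
    Returns a sorted list of sorted name lists, one per gene.
    """
    parent = {name: name for name in transcripts}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (site kind, position) -> first transcript using it
    for name, introns in transcripts.items():
        for donor, acceptor in introns:
            for site in (("donor", donor), ("acceptor", acceptor)):
                if site in first_seen:
                    union(first_seen[site], name)
                else:
                    first_seen[site] = name

    genes = defaultdict(set)
    for name in transcripts:
        genes[find(name)].add(name)
    return sorted(sorted(members) for members in genes.values())


example = {
    "t1": [(100, 200)],
    "t2": [(100, 250), (300, 400)],  # shares donor 100 with t1
    "t3": [(350, 400)],              # shares acceptor 400 with t2
    "t4": [(900, 1000)],             # shares nothing
    "t5": [],                        # unspliced
}
print(cluster_genes(example))
```

In this sketch transcripts that merely overlap in their UTRs never merge, since only splice sites are compared. An unspliced transcript such as `t5` ends up alone; the slide says such mRNAs/ESTs are shed, a step not modeled here.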

Comparison AceView/Havana
- Good agreement on the number of genes and transcripts
- The reinforced filters will be used in the next global AceView release (later this month)
- Havana is missing some AceView score=1 transcripts fully supported in GenBank, mostly cassette exons, e.g. BI910947, CF593648
- A few dubious introns were kept in Havana