This article describes the efforts of the annotation group to manually annotate the RNAs associated with the clones present on the PancChip4.0, a mouse pancreas-specific microarray chip. It provides an overview of the manual annotation tasks performed, the progress made so far, the coverage achieved, and the computational and manual results obtained. The article also discusses what has been learned from this annotation effort.
Manual Annotation of a Mouse Pancreas-Specific Microarray Chip: PancChip4.0
Describe the efforts of the annotation group to annotate the RNAs associated with the clones present on the PancChip:
- General description of the PancChip
- How we perform the manual annotation tasks
- Progress so far, coverage
- Characterization of the PancChip: manual and computational results
- What we have learned so far from this effort
The PancChip was created to aid expression studies in diabetes and pancreatic development research. There are two versions of the PancChip:
- PancChip2.0: 3139 clones, selected from pancreas libraries in dbEST and with clone expression measured on a mouse array.
- PancChip4.0: the 3139 clones from version 2.0 plus clones from 20 pancreas libraries created and sequenced by the Endocrine Pancreas Consortium; 11154 clones in total.
Manual annotation efforts utilize:
- A web-based tool to update GUS
- A group of annotators: one full time and 13 part time
To create the list for manual annotation: for each clone we have the 5' and 3' ESTs in GUS, so we can determine which RNAs (DoTS assemblies) contain the PancChip clone sequences.
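The clone-to-RNA lookup described above can be sketched as follows. This is a minimal illustration and not the GUS schema: the dictionaries, the function name `rnas_for_clones`, and every identifier in the sample data are hypothetical.

```python
from collections import defaultdict

def rnas_for_clones(clone_ests, est_to_rna):
    """Map each clone to the set of RNAs (DoTS assemblies) containing its ESTs."""
    clone_rnas = defaultdict(set)
    for clone_id, ests in clone_ests.items():
        for est in ests:
            rna = est_to_rna.get(est)  # None if the EST is in no assembly
            if rna is not None:
                clone_rnas[clone_id].add(rna)
    return dict(clone_rnas)

# Hypothetical sample data: each clone has a 5' and a 3' EST.
clone_ests = {"cloneA": ["estA5", "estA3"], "cloneB": ["estB5", "estB3"]}
est_to_rna = {"estA5": "DT.1", "estA3": "DT.1", "estB5": "DT.2", "estB3": "DT.3"}

clone_rnas = rnas_for_clones(clone_ests, est_to_rna)

# The annotation list is the union of all RNAs that contain a chip clone sequence.
annotation_list = sorted(set().union(*clone_rnas.values()))
print(annotation_list)
```

In the sample data, the two ESTs of `cloneB` fall in different assemblies, which is the case where one clone contributes more than one RNA to the list.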
The list of RNAs to annotate is presented to the annotator. For Panc2.0, RNAs were annotated in order of ranked expression from the microarray experiment; later, the Panc4.0 chip list was displayed instead.
Gene Annotation Tasks
Annotation tasks include:
1. Assigning the reference RNA sequence (largest RNA, greatest number of contained sequences in the assembly).
2. Determining the members of the gene cluster (these are the similarity-based genes):
   - removing or adding cluster members
   - validating RNA transcripts assigned to the gene by BLAST similarity, IMAGE clone linkages, and BLAT alignment to genomic sequence
3. Adding the approved abbreviated gene name or symbol (if known) and the evidence for the MGI symbol.
Gene description field:
4. Adding the approved full gene name.
5. Adding gene (symbol) synonyms and the evidence for them.
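Task 1 above names two criteria for the reference sequence: largest RNA and greatest number of contained sequences. The slides do not say how the two are combined, so the sketch below assumes length is compared first and the contained-sequence count breaks ties. The `Rna` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rna:
    rna_id: str
    length: int       # assembled sequence length in bases
    n_contained: int  # number of sequences contained in the assembly

def pick_reference(cluster):
    """Pick a reference RNA for a gene cluster.

    Assumption: rank by length first, then by contained-sequence count.
    """
    return max(cluster, key=lambda r: (r.length, r.n_contained))

# Hypothetical gene cluster with two equally long candidates.
cluster = [
    Rna("DT.10", 1800, 12),
    Rna("DT.11", 2400, 7),
    Rna("DT.12", 2400, 15),
]
reference = pick_reference(cluster)
print(reference.rna_id)
```

An annotator can still override such a choice; the ranking only proposes a candidate.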
Editing Gene Page: Annotator Interface
RNA annotation tasks include:
1. Modifying the TS (RNA) description of the reference sequence to reflect the HUGO- or MGI-approved full gene name, if assigned.
2. GO Function assignment and verification: GO Functions are manually assigned; predicted GO Functions are verified.
Editing RNA Page: Annotator Interface
Coverage of Panc2.0, based on clone id (assuming that one of the sequences from the clone is in an RNA that has been reviewed): 2126 of the 3400 elements on the chip. Redundant clone sequences will be in the same assembly, so reviewing one RNA can cover multiple clones.
Coverage of Panc4.0 thus far, based on clone id: 5796 of the 11154 elements.
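The coverage rule above (a clone counts as covered once any RNA containing one of its sequences has been reviewed) can be sketched as below, together with the percentages implied by the reported counts. The function names and the small clone-to-RNA example are hypothetical; only the four counts come from the slides.

```python
def clone_is_covered(clone_rnas, reviewed_rnas):
    """A clone is covered if any RNA containing one of its ESTs was reviewed."""
    return any(rna in reviewed_rnas for rna in clone_rnas)

def percent(part, total):
    return 100.0 * part / total

# Hypothetical example: two redundant clones share assembly DT.1.
clone_to_rnas = {"cloneA": {"DT.1"}, "cloneB": {"DT.1"}, "cloneC": {"DT.2"}}
reviewed = {"DT.1"}
covered = [c for c, rnas in clone_to_rnas.items() if clone_is_covered(rnas, reviewed)]
print(covered)  # reviewing DT.1 covers both cloneA and cloneB

# Counts reported in the slides.
panc2 = percent(2126, 3400)
panc4 = percent(5796, 11154)
print(f"Panc2.0: {panc2:.1f}%  Panc4.0: {panc4:.1f}%")
```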
Consider PancChip4.0:
- How many RNAs contain clone sequences (ESTs from the same clone can be in separate RNAs when their sequences do not overlap)? 10635 RNAs
- How many RNAs have been annotated (EST(s) of a clone in an RNA that has been reviewed)? 5004 RNAs
- How many similarity-based genes contain these RNAs (genes on the chip)? 8308 genes
- How many similarity-based genes have been annotated thus far? 3509 genes
- How many RNAs (avg) per PancChip gene? 9.3 (musDoTS: 5)
- How many RNAs (avg) per reviewed gene? 12.7
- How many RNAs (avg) per un-reviewed gene? 6.8
If the RNAs in a cluster largely represent alternative forms, then we are annotating genes with multiple forms. There are 2540 PancChip genes that contain only one RNA.
- How many exons (avg) does the manually assigned reference sequence contain? 9.6 (assuming that this approximates the number of gene exons; the Ensembl gene figure is 8). This depends on the current BLAT alignment of the mouse RNAs (or DTs).
- If the reference sequence is assigned computationally, how many exons (avg) does the assigned reference contain? tbd
- How many approved gene symbols have been assigned so far? 2880
- For how many genes with no approved gene symbol has a gene (symbol) synonym been assigned? 48
- How many genes have at least one other gene symbol synonym in addition to the approved gene symbol? 2014
- How many PancChip RNAs have GO Functions? 4423 (42% of the RNAs)
- How many PancChip RNA GO Functions have been reviewed? 1483
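As a quick consistency check, the PancChip4.0 counts reported above can be turned into percentages. The sketch assumes the denominators are the chip totals (10635 RNAs and 8308 similarity-based genes); the dictionary labels are illustrative only.

```python
# (part, total) pairs taken from the reported PancChip4.0 counts.
counts = {
    "RNAs reviewed": (5004, 10635),
    "genes annotated": (3509, 8308),
    "RNAs with GO Function": (4423, 10635),
}

shares = {
    label: round(100.0 * part / total, 1)
    for label, (part, total) in counts.items()
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The GO Function share comes out just under 42%, which agrees with the rounded figure quoted in the slides.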
The chromosomes that the PancChip RNAs align to*

By chromosome     By count
CHR   CNT         CHR   CNT
1     182         11    245
2     242         2     242
3     158         5     219
4     169         7     183
5     219         1     182
6     151         4     169
7     183         17    162
8     127         9     162
9     162         3     158
10    145         6     151
11    245         10    145
12    112         15    136
13    108         8     127
14    115         16    120
15    136         14    115
16    120         12    112
17    162         19    110
18    94          13    108
19    110         18    94
X     91          X     91

* Saw a similar breakdown based on DoTS gene.
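The count-ordered view of the chromosome breakdown can be rebuilt from the per-chromosome counts with a single sort. This is a minimal sketch; the variable names are illustrative, and the order of chromosomes with equal counts (9 and 17) is not meaningful.

```python
# Number of PancChip RNAs aligning to each mouse chromosome (from the slide).
chr_counts = {
    "1": 182, "2": 242, "3": 158, "4": 169, "5": 219,
    "6": 151, "7": 183, "8": 127, "9": 162, "10": 145,
    "11": 245, "12": 112, "13": 108, "14": 115, "15": 136,
    "16": 120, "17": 162, "18": 94, "19": 110, "X": 91,
}

# Sort chromosomes by descending RNA count.
by_count = sorted(chr_counts.items(), key=lambda item: item[1], reverse=True)

total_aligned = sum(chr_counts.values())
for chrom, count in by_count[:3]:
    print(chrom, count)
print("total:", total_aligned)
```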
What is the similarity (with respect to NRDB) breakdown for the RNAs on the PancChip?
P value less than -200:      2603 (1493)
P value from -200 to -100:   2001 (824)
P value from -100 to -5:     3170 (575)
No NR similarities:          2861
Generally, annotators have been annotating known genes.
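The four similarity bins above add up to the 10635 RNAs on the chip, so they partition the whole set. The sketch below checks that and reports each bin's share; the bin labels are copied from the slide and the variable names are illustrative.

```python
# RNA counts per NRDB similarity bin (from the slide).
similarity_bins = {
    "P value less than -200": 2603,
    "P value from -200 to -100": 2001,
    "P value from -100 to -5": 3170,
    "no NR similarities": 2861,
}

total_rnas = sum(similarity_bins.values())

# Share of the chip's RNAs falling in each bin.
bin_shares = {label: 100.0 * n / total_rnas for label, n in similarity_bins.items()}

for label, pct in bin_shares.items():
    print(f"{label}: {pct:.1f}%")
print("total RNAs:", total_rnas)
```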
GO Function breakdown for the Panc4.0 chip, mDoTS
* Based on the last set of GO predictions
- Annotation for PancChip genes is displayed in allgenes.org; the additional information (e.g., gene synonyms) can be used in queries or data mining.
- Assists in validation and evaluation of computational predictions, and in algorithm improvement.
- Should be able to compare mouse and human annotation, since we are also annotating human PancChip RNAs (28971 RNAs, 1500 genes).
Issues:
- Maintaining manual annotation when RNAs or genes change during a DoTS update.
- Need to switch to human when updating mouse, and vice versa.
- Secure access to allgenes-dev (need IP).
New annotation tool:
- Gene-based annotation: creating a curated gene model with a stable sequence (curated gene instance).
- RNAs are created based on the gene model exons (curated RNA instances, each one representing a different alternative form).
- Will have the ability to work offline (no database connection), with storage of annotations.
An interesting mouse RNA on the PancChip: Gene ID 36131209; clone ids 5665212, 5665066