Manual Annotation of a Mouse Pancreas-Specific Microarray Chip: PancChip4.0
Describe the efforts of the annotation group to annotate the DoTS RNAs associated with the clone sequences present on the PancChip.
Outline:
- General description of the PancChip
- How we perform the manual annotation tasks
- Progress so far and coverage
- Characterization of PancChip manual and computational results
- What we have learned so far from this effort
The PancChip was created to aid expression studies in diabetes and pancreatic development research.
Two versions of the PancChip:
PancChip2.0
- 3139 clones selected from pancreas libraries in dbEST, with clone expression measured on a mouse array
- some mouse clones chosen based on similarity to human sequences expressed in pancreas
- number of clones on the 2.0 chip: 3139
PancChip4.0
- the 3139 clones from version 2.0, plus clones from 20 pancreas libraries created and sequenced by the Endocrine Pancreas Consortium
- number of clones on the 4.0 chip: 11154 (11400)
To create a list for manual annotation: for each clone we have the 5' and 3' EST or sequence in GUS, so we can determine which RNAs (DoTS assemblies) contain the PancChip clone sequences.
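The clone-to-RNA lookup described above can be sketched roughly as follows. This is only an illustration with made-up identifiers: the record layout, the `DT.n` RNA ids, and the function name `rnas_for_clones` are assumptions, not the GUS schema or the group's actual query.

```python
from collections import defaultdict

# Hypothetical (clone_id, est_id) pairs for the 5' and 3' reads of each clone.
clone_ests = [
    ("clone_A", "est_A5"),
    ("clone_A", "est_A3"),
    ("clone_B", "est_B5"),
]

# Hypothetical assembly membership: the DoTS RNA each EST was assembled into.
est_to_rna = {
    "est_A5": "DT.1",
    "est_A3": "DT.2",  # assumed not to overlap the 5' read, so it sits in another RNA
    "est_B5": "DT.1",
}


def rnas_for_clones(clone_ests, est_to_rna):
    """Return {clone_id: sorted RNA ids containing any EST from that clone}."""
    clone_rnas = defaultdict(set)
    for clone_id, est_id in clone_ests:
        rna_id = est_to_rna.get(est_id)
        if rna_id is not None:  # skip ESTs that are not part of any assembly
            clone_rnas[clone_id].add(rna_id)
    return {clone: sorted(rnas) for clone, rnas in clone_rnas.items()}


clone_rnas = rnas_for_clones(clone_ests, est_to_rna)

# The annotation list is the set of distinct RNAs hit by at least one chip clone.
annotation_list = sorted({rna for rnas in clone_rnas.values() for rna in rnas})
```

In this toy data one clone maps to two RNAs and two clones share one RNA, which is why the number of RNAs to annotate need not equal the number of clones on the chip.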
Example gene with RNA from the list, to illustrate the annotation process: the Frizzled 2 gene
- Frizzled receptors: Fzd1, Fzd2, Fzd3, Fzd4, Fzd5, Fzd7, Fzd8, Fzd9
- Ligands, the Wnt (wingless-related MMTV integration site) family: Wnt5a, Wnt5b, Wnt3a, Wnt4, Wnt6, Wnt7b, Wnt11
- Frizzled receptors and Wnts are expressed in the developing pancreas (RT-PCR experiments, mouse embryonic pancreas)
- cDNAs were present as clones from the libraries used to create the PancChip
Frizzled receptors are expressed in the developing pancreas (RT-PCR experiments, mouse embryonic pancreas).
Figure: Fzd2 in one of the three pathways in which Frizzled is involved (Science, June 6, 2003).
Manual annotation efforts utilize:
- A web-based tool to update GUS
- A group of annotators: one full time and 13 part time
The list of RNAs to annotate is presented to the annotator. For Panc2.0, the list was ranked by expression from a microarray experiment; later, the Panc4.0 chip list was displayed.
Gene Annotation Tasks
Annotation tasks include:
1. Assigning the reference RNA sequence (longest RNA, greatest number of contained sequences used to create the assembly consensus).
2. Determining the members of the gene cluster (these are the similarity-based genes):
   - removing or adding cluster members
   - validating RNA transcripts assigned to the gene by BLAST similarities, IMAGE clone linkages, and BLAT alignment to genomic sequence
3. Adding the approved abbreviated gene name or symbol (if known) and evidence for the MGI symbol.
Gene description field:
4. Adding the approved full gene name.
5. Adding gene (symbol) synonyms and evidence for them.
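Task 1 names two criteria for the reference RNA: length and the number of sequences in the assembly. A minimal sketch of such a selection is below. The slides do not say how the two criteria are combined, so the ordering here (length first, then sequence count as tie-breaker) is an assumption, as are the `Rna` record and the example values.

```python
from dataclasses import dataclass


@dataclass
class Rna:
    rna_id: str
    length: int        # consensus length in bases
    n_input_seqs: int  # sequences used to build the assembly consensus


def pick_reference(rnas):
    """Pick the longest RNA; break ties by number of contained sequences.

    The tie-break order is an assumption made for this sketch.
    """
    return max(rnas, key=lambda r: (r.length, r.n_input_seqs))


# Hypothetical RNAs assigned to one gene.
candidates = [
    Rna("DT.1", 2100, 14),
    Rna("DT.2", 3400, 9),
    Rna("DT.3", 3400, 22),
]
reference = pick_reference(candidates)
```

In practice an annotator makes this call by inspection, so an automatic pick like this would only be a starting suggestion.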
Editing Gene Page: Annotator Interface
* ‘Transcript Unit’, or RNAs associated with the Fzd2 gene
Editing RNA Page: Annotator Interface
RNA Annotation Tasks include:
1. Modifying the TS (RNA) description of the reference sequence to reflect the HUGO- or MGI-approved full gene name, if assigned.
RNA Annotation Tasks include:
2. GO Function assignment/verification: GO Functions are manually assigned; predicted GO Functions are verified.
GO ontology: a hierarchy of controlled-vocabulary terms describing functions.
Ex: nucleic acid binding -> DNA binding
For Frizzled 2: signal transducer activity -> transmembrane receptor activity -> non-G-protein coupled 7TM receptor activity -> frizzled receptor activity -> frizzled-2 receptor activity
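The hierarchy can be illustrated by walking from a specific term up to its top-level function. The sketch below uses only the terms quoted above and assumes, for simplicity, that each term has a single parent; the real GO is a directed acyclic graph in which a term may have several parents. The names `PARENT` and `path_to_root` are hypothetical.

```python
# child -> parent, using only the terms from the examples above.
# Simplifying assumption: one parent per term (real GO terms can have several).
PARENT = {
    "frizzled-2 receptor activity": "frizzled receptor activity",
    "frizzled receptor activity": "non-G-protein coupled 7TM receptor activity",
    "non-G-protein coupled 7TM receptor activity": "transmembrane receptor activity",
    "transmembrane receptor activity": "signal transducer activity",
    "DNA binding": "nucleic acid binding",
}


def path_to_root(term, parent=PARENT):
    """Return the list of terms from the top-level function down to `term`."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


fzd2_path = path_to_root("frizzled-2 receptor activity")
top_level = fzd2_path[0]  # the category used in a top-level distribution
```

Taking the first element of the path is one way a specific assignment could be rolled up into a top-level category such as "signal transducer".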
PancChip4.0
- How many RNAs contain Panc4 sequences (ESTs from the same clone can be in separate RNAs because of no sequence overlap)? 12323 PancChip DoTS RNAs
- How many of these RNAs have been annotated? 6194 RNAs
- How many PancChip genes contain these RNAs (genes on chip)? 8579 genes
- How many PancChip genes have been annotated thus far? 4004 genes
  - 3336 have approved gene symbols
  - 2339 have at least 1 synonym
  - 51 have a synonym but no approved gene symbol
- How many RNAs (avg) per PancChip gene? 1.8
- How many RNAs (avg) per reviewed PancChip gene? 3.6
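As a quick worked check, the coverage implied by these counts can be computed directly. The figures are taken from the slide; the helper name `percent` and the rounding to one decimal place are choices made for this sketch.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Counts as reported on the slide.
rnas_on_chip = 12323
rnas_annotated = 6194
genes_on_chip = 8579
genes_annotated = 4004
genes_with_symbol = 3336

rna_coverage = percent(rnas_annotated, rnas_on_chip)          # 50.3
gene_coverage = percent(genes_annotated, genes_on_chip)       # 46.7
symbol_fraction = percent(genes_with_symbol, genes_annotated)  # 83.3
```

So roughly half of the RNAs and a little under half of the genes had been annotated at this point, and most annotated genes carry an approved symbol.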
Predicted and reviewed GO Functions (top-level distribution) for PancChip RNAs
How many PancChip RNAs have predicted GO Functions? 4821 (reviewed: 1389)

GO Function                     Count
enzyme                          3359
ligand binding or carrier       2050
nucleic acid binding            1639
signal transducer               1115
structural protein               820
transporter                      541
obsolete                         280
enzyme regulator                 234
cell adhesion molecule           252
chaperone                        213
defense/immunity protein         115
molecular_function unknown       103
apoptosis regulator              143
cell cycle regulator              47
motor                             42
protein tagging                   22
antioxidant                       14
chaperone regulator                3
Similarity Breakdown for PancChip RNAs
The cDNAs corresponding to the clones on PancChip 4.0 were mapped to DoTS RNAs, then ranked by their BLASTX similarity to known protein sequences in the Non-Redundant (NR) protein database (NCBI). The number in parentheses is the number annotated.
PancChip RNAs align to multiple chromosomes
The PancChip RNAs were BLAT-aligned to genomic sequence, and their alignment distribution per chromosome was determined. Chromosome 2 was the most highly represented. Only quality alignments were considered (BA 1-3).

Chr   Count
2     870
11    852
7     684
5     638
1     616
4     614
9     574
3     560
6     517
8     500
10    482
17    474
14    394
15    378
12    373
13    373
16    342
19    332
X     302
18    296
RNAs on the PancChip that may encode transcription factors
nucleic acid binding -> DNA binding -> transcription factor activity
There are 421 RNAs with the predicted function of transcription factor activity. Examples (gene symbols): Tcf1, HNF4, Tcf2, Ipf1 and Isl1. Each has been shown to play a role in pancreatic gene expression.
Where the RNAs align (those with good alignments):

Chr   Count
11    39
2     36
5     27
10    21
15    19
8     19
9     19
7     19
17    18
4     17
14    16
3     16
13    15
1     14
18    14
12    13
6     13
16    12
X     11
19    9
Look at Frizzled 2 receptor expression on the PancChip (foreground median minus background median values)
- PancVsLiver: Cy5 (red) liver 509; Cy3 (green) pancreas 1780
- IsletVsPanc: Cy3 (green) islet 368; Cy5 (red) pancreas 89
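One common way to read two-channel values like these is as a log2 ratio between the channels. The slides do not state how the comparison was made, so the sketch below is only an illustration: it uses the background-subtracted medians quoted above, applies no normalization, and the function name `log2_ratio` is hypothetical.

```python
import math


def log2_ratio(sample, reference):
    """log2 of sample/reference for background-subtracted intensities."""
    return math.log2(sample / reference)


# Values from the slide (foreground median minus background median).
panc_vs_liver = log2_ratio(1780, 509)  # pancreas relative to liver
islet_vs_panc = log2_ratio(368, 89)    # islet relative to whole pancreas
```

Both ratios are positive, consistent with a higher Fzd2 signal in pancreas than in liver and in islet than in whole pancreas on these two arrays, although a single unnormalized spot is weak evidence on its own.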
* Channel 4 (Cy3) is pancreas; Channel 3 (Cy5) is liver
Look at the developmental time series on Panc2.0 and Panc4.0; include a Wnt1 clone on the chip.
New annotation tool:
- annotate genomic sequence (mouse Chr 11 and Chr 2 regions)
- create gene models
- RNA transcripts (alternative forms)