ESTminer CHADO adaptor
The University of Georgia
Alan Gingle, agingle@uga.edu
Yecheng Huang, yhuang@uga.edu
http://cggc.agtec.uga.edu/
Nov 1, 2004
Introduction
This presentation drafts an EST CHADO schema and opens it for community comment. Examples demonstrate our approach to applying CHADO to EST data.
Contents:
• ESTMiner_CHADO schema overview
• Controlled vocabulary: ontology and definition
• Features and their properties, relationships, and locations
• Appendix (example used in the slides, minor tables)
ESTminer CHADO schema overview
• The major parts of CHADO that are relevant to the ESTMiner project
EST Controlled vocabulary I - Ontology
[Figure: ontology diagram of term ids and names; "…" marks terms elided on the slide]
1: Read3’’, 2: Sequence, 3: Contig, 4: Cluster, 5: ESTName, 6: Library
7: GB_ACC_#, 8: Scr1o, 9: Scr1e, 10: Scr2o, 11: Scr2e, …
12: QUAL16o, 13: QUAL16e, 14: QUAL20o, 15: QUAL20e, 16: GB_Access, …
17: Identity_threshold, 18: Length_threshold, …
19: Library_name, 20: stage, 21: cultivar, 22: cell_type, 23: organ, 24: strain, 25: Organism
26: numofSeq, 27: numofcontig, …
26: imo, 27: ipo
EST Controlled vocabulary II - Definition
• insert into cv (cv_id, name, definition) values (1, 'CGGC_UGA', 'University of Georgia, Comparative Grass Genomic Center');
• insert into cvterm (cvterm_id, cv_id, name, definition, dbxref_id) values (1, 1, 'Read5', '5'' read', 1);
EST Feature
(See the example in the appendix.)
• insert into feature (feature_id, uniquename, residues, seqlen, type_id, …) values (1, 'IP1_1_F11.g1_A002', 'TGAG…CATTT', 788, 1, …);
EST Feature Relationship
feature_id 1 (sequence) is a member of feature_id 5 (contig), which is a member of feature_id 6 (cluster).
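The membership chain on this slide could be stored in CHADO's feature_relationship table, where subject_id is the member and object_id is the container. The following is a minimal sketch, not the ESTminer implementation: it uses Python's sqlite3 with a cut-down subset of the CHADO columns, the uniquenames for features 5 and 6 are hypothetical, and using term 26 ("imo") as the "member of" relationship type is an assumption.

```python
import sqlite3

# In-memory database with an assumed, reduced subset of the CHADO columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL
);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id    INTEGER NOT NULL
);
""")

MEMBER_OF = 26  # assumption: the "imo" term acts as the "member of" type

# Type ids follow the ontology slide (2: Sequence, 3: Contig, 4: Cluster).
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type_id) VALUES (?, ?, ?)",
    [
        (1, "IP1_1_F11.g1_A002", 2),
        (5, "contig_5", 3),    # hypothetical uniquename
        (6, "cluster_6", 4),   # hypothetical uniquename
    ],
)

# subject is a member of object: sequence -> contig -> cluster.
conn.executemany(
    "INSERT INTO feature_relationship (subject_id, object_id, type_id) "
    "VALUES (?, ?, ?)",
    [(1, 5, MEMBER_OF), (5, 6, MEMBER_OF)],
)


def containers(conn, feature_id):
    """Return the ids of every feature that directly or indirectly contains feature_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT object_id FROM feature_relationship WHERE subject_id = ?
            UNION
            SELECT fr.object_id
            FROM feature_relationship AS fr JOIN up ON fr.subject_id = up.id
        )
        SELECT id FROM up ORDER BY id
        """,
        (feature_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling containers(conn, 1) walks from the read up through its contig to its cluster, so one query answers "which cluster does this EST belong to" without a separate read-to-cluster row.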
EST Feature Location
[Figure: feature_id 4, feature_id 3, and feature_id 1 drawn along the sequence, with positions 1, 11, 589, 628, and 778 marked]
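CHADO's featureloc table stores locations in interbase coordinates (fmin is zero-based, fmax is exclusive), so positions like those on this slide would need converting if they are 1-based and inclusive. The sketch below assumes that they are, and also assumes that feature 4 is the Phred 20+ region (11 to 589) and feature 3 the Phred 16+ region (11 to 628) from the appendix example, both located on the read (feature 1). The table is a reduced subset of the CHADO columns, built with Python's sqlite3.

```python
import sqlite3


def to_interbase(start, end):
    """Convert a 1-based inclusive range to CHADO-style (fmin, fmax)."""
    return start - 1, end


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL,   -- the located feature
    srcfeature_id INTEGER NOT NULL,   -- the feature it is located on
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    CHECK (fmin <= fmax)
)
""")

READ_ID = 1
# Assumed mapping of feature ids to the quality ranges in the appendix.
regions = {4: (11, 589), 3: (11, 628)}

for feature_id, (start, end) in regions.items():
    fmin, fmax = to_interbase(start, end)
    conn.execute(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax) "
        "VALUES (?, ?, ?, ?)",
        (feature_id, READ_ID, fmin, fmax),
    )

# With interbase coordinates the region length is simply fmax - fmin.
lengths = dict(conn.execute("SELECT feature_id, fmax - fmin FROM featureloc"))
```

Under these assumptions the Phred 20+ region is stored as fmin 10, fmax 589, and its length falls out as a plain subtraction with no off-by-one correction.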
Appendix – Example of EST Library IP1
STAGE: N/A
FULL_NAME: Immature pannicle 1
CULTIVAR: BTx623
CELL_TYPE: N/A
STRAIN: N/A
ORGANISM: Sorghum bicolor L.
BOTANICAL_NAME: S. bicolor
ORGAN: Developing preanthesis pannicles
CELL_LINE: N/A
COMMENT_FOR_EST: Sequences have been trimmed to exclude PolyA, vector and regions below Phred quality 16. The threshold for high quality sequence is 20. Three-prime sequences, which are obtained with PolyTMix or T7 sequencing primer, are presented as the reverse complement.
PUBLISH: Y
HOST: N/A
SEX: N/A
RE_2: EcoRI
TISSUE: N/A
RE_1: XhoI
LIB_NAME: IP1
VECTOR: pBluescript II SK(-) from Lambda Zap II
V_TYPE: Plasmid
DESCR: The library was made from poly-A RNA in the cloning vector lambda ZAP II. Clones to be sequenced were prepared by mass excision.
Appendix – Example of EST Sequence
Sequence Name: IP1_1_F11.g1_A002
GenBank Accession Number: BG946868
• Length of Sequence: 788
• Screened Vector
• Phred Quality 20+ START: 11 END: 589
• Phred Quality 16+ START: 11 END: 628
• Phred Quality Below 16
Appendix – Example of EST Database
• insert into db (db_id, name …) values (1, 'CGGC_UGA', …);
• insert into dbxref (dbxref_id, db_id, …) values (1, 1 …);
• insert into dbxrefprop (dbxrefprop_id, dbxref_id, …) values (1, 1 …);
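The db and dbxref rows above have to exist before the cvterm row from the vocabulary definition slide, because cvterm.dbxref_id points at dbxref, which in turn points at db. The following is a minimal sketch of that load order using Python's sqlite3 with foreign keys enforced. The tables carry only an assumed subset of the CHADO columns, and the dbxref accession 'Read5' is a hypothetical value, since the slide elides it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE db (
    db_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db (db_id),
    accession TEXT NOT NULL
);
CREATE TABLE cv (
    cv_id      INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    definition TEXT
);
CREATE TABLE cvterm (
    cvterm_id  INTEGER PRIMARY KEY,
    cv_id      INTEGER NOT NULL REFERENCES cv (cv_id),
    name       TEXT NOT NULL,
    definition TEXT,
    dbxref_id  INTEGER NOT NULL REFERENCES dbxref (dbxref_id)
);
""")

CVTERM_INSERT = (
    "INSERT INTO cvterm (cvterm_id, cv_id, name, definition, dbxref_id) "
    "VALUES (?, ?, ?, ?, ?)"
)

# Load order: db -> dbxref -> cv -> cvterm.
conn.execute("INSERT INTO db (db_id, name) VALUES (?, ?)", (1, "CGGC_UGA"))
conn.execute(
    "INSERT INTO dbxref (dbxref_id, db_id, accession) VALUES (?, ?, ?)",
    (1, 1, "Read5"),  # hypothetical accession
)
conn.execute(
    "INSERT INTO cv (cv_id, name, definition) VALUES (?, ?, ?)",
    (1, "CGGC_UGA", "University of Georgia, Comparative Grass Genomic Center"),
)
# Bound parameters carry the apostrophe in "5' read" without manual escaping.
conn.execute(CVTERM_INSERT, (1, 1, "Read5", "5' read", 1))


def try_orphan_term(conn):
    """Try to insert a term whose dbxref does not exist; report whether it was accepted."""
    try:
        conn.execute(CVTERM_INSERT, (2, 1, "Contig", None, 99))
    except sqlite3.IntegrityError:
        return False
    return True
```

With enforcement on, try_orphan_term(conn) is rejected, which is the behaviour a loader would rely on to catch vocabulary terms that were inserted before their cross-references.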