The Refgene Database
Currently we use Google Spreadsheets to Track Reference Genes
Problems with Google Spreadsheets
• Occasionally a gene that was already selected gets picked again
• Every group has its own spreadsheet
  - Makes cross-access difficult
  - The spreadsheets are not integrated in any real way
• They are hard to maintain
  - Every month they are updated by hand
  - Hand editing leads to mistakes
It Would Be Easier to Have a Database to Keep Track of Reference Genes
• Data from all reference genomes would be integrated and searchable
• The data could be updated automatically
• New genes and their homologs in every species could be populated automatically from Kara's homology compute
• MODs could provide database reports to update statistics on annotation progress
• Statistics for publications and grant updates could be retrieved easily
Proposed Interfaces
• Refgene interface: used to set new target genes each month
  - As genes are targeted, homologs are automatically populated from P-POD clusters
• MOD interface: used by MOD curators
  - Allows MOD curators to hand-modify individual records
Fields Required
• Homolog gene identifiers for a MOD
• Date the gene was deemed comprehensively annotated
• Date the gene was chosen as a reference gene
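A minimal sketch of one record holding these three fields, assuming a single completion date per target gene (the slide does not say whether completion is tracked per gene or per MOD homolog). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class RefGeneRecord:
    """One reference (target) gene and the homologs tracked for it.

    Field names are illustrative only; they map onto the three required fields.
    """
    target_gene: str                       # target gene symbol or ID, e.g. "POLA"
    selected_on: date                      # date the gene was chosen as a reference gene
    completed_on: Optional[date] = None    # date deemed comprehensively annotated
    homologs: Dict[str, List[str]] = field(default_factory=dict)  # MOD -> homolog gene IDs

    def add_homolog(self, mod: str, gene_id: str) -> None:
        """Record a homolog for one MOD, ignoring exact repeats."""
        ids = self.homologs.setdefault(mod, [])
        if gene_id not in ids:
            ids.append(gene_id)

    @property
    def is_complete(self) -> bool:
        """True once a comprehensive-annotation date has been set."""
        return self.completed_on is not None
```

In a relational database the homologs would more naturally sit in their own table keyed by target gene and MOD; the in-memory dict just keeps the sketch short.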
Functionality that Should Be on the Ref. Gen. Curation Home Page
• Log in
• Enter new curation target genes
• Homologs:
  - Upload orthologs for those genes
  - Manually add orthologs
  - Enter 'curation status'
• Generate reports
The Database Should Also Accept Automated Loads from MODs
• Curation status
• Curation target genes
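One way an automated curation-status load could work, sketched under assumptions: each MOD sends a two-column tab-delimited file (gene ID, status), and the status vocabulary is the hypothetical set below. Neither the file layout nor the status values come from the slides. A small helper turns the parsed statuses into a progress percentage of the kind wanted for publication and grant statistics.

```python
import csv
import io
from typing import Dict

# Hypothetical status vocabulary; the slides do not define one.
ALLOWED_STATUSES = {"not started", "in progress", "complete"}


def load_mod_status_report(text: str, mod: str) -> Dict[str, str]:
    """Parse a tab-delimited MOD report of (gene ID, curation status) rows.

    Blank lines and lines starting with '#' are skipped. Any malformed row raises
    ValueError, so a bad automated load is rejected instead of half-applied.
    """
    statuses: Dict[str, str] = {}
    for lineno, row in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"{mod} line {lineno}: expected 2 columns, got {len(row)}")
        gene_id, status = row[0].strip(), row[1].strip().lower()
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"{mod} line {lineno}: unknown status {status!r}")
        statuses[gene_id] = status
    return statuses


def percent_complete(statuses: Dict[str, str]) -> float:
    """Share of reported genes marked complete, for progress statistics."""
    if not statuses:
        return 0.0
    done = sum(1 for s in statuses.values() if s == "complete")
    return 100.0 * done / len(statuses)
```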
1. Curator log in - 1
• Once logged in, the curator name and MOD can be assumed anywhere that data is needed.
• Login also keeps random people from editing. Should it also restrict editing to homologs from the species you are a curator of?
• However, we all still need to be able to add new target genes regardless of our MOD.
[Mockup: Curator Login page with Login and Password fields, a Login button, "To Curation Central", and a Reference Genomes Snapshot panel: Target genes: 275, 86%]
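The login rules above can be expressed as a few permission checks. This is a sketch, assuming a logged-in session yields a `Curator` with a login name and a MOD (`None` when nobody is logged in); all names here are hypothetical, and the homolog restriction is the rule the slide proposes rather than a settled decision.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Curator:
    login: str   # e.g. "doughowe"
    mod: str     # e.g. "ZFIN"


def can_add_target(user: Optional[Curator]) -> bool:
    """Any logged-in curator may add target genes, whatever their MOD."""
    return user is not None


def can_edit_homolog(user: Optional[Curator], homolog_mod: str) -> bool:
    """Proposed rule: edit only homologs from your own MOD's species."""
    return user is not None and user.mod == homolog_mod


def stamp(user: Curator, record: Dict[str, str]) -> Dict[str, str]:
    """Fill in curator name and MOD from the login instead of asking on each form."""
    return {**record, "curator": user.login, "mod": user.mod}
```

Keeping the checks as separate small functions makes it easy to drop the per-MOD restriction later if the group decides against it.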
1. Curator log in - 2
[Mockup: Curation Central page titled "Curation Targets", showing "Logged in: doughowe | ZFIN"]
• Upload New Targets: Target Completion Date (MM-YYYY): 06-2008; Targets TAB file: Browse; Upload
• Upload orthologs: Select month (pull-down list); GO
• Search Targets [view all]: Gene Symbol or Entrez Gene ID; Search
  - The search result takes you to the homolog Add/Update page shown on the bottom of slide 13 for the specific target gene located in the search.
  - If the search locates more than one gene, it shows a list of these genes, each linking to the homolog View/Add/Update page for that gene.
• Reports:
  - TO DO list (not comprehensive)
  - ISS could be added
  - Potential outliers
  - View reports by gene (enter gene symbol or MOD ID) or by organism (drop-down list); GO
  - Access to reports could be done a couple of ways.
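The search behaviour described above (one hit opens that gene's homolog page, several hits give a list) might look like the following sketch. It assumes an all-digit query is an Entrez Gene ID and anything else is a symbol matched case-insensitively; both rules, and the `Target` type and return labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Target:
    symbol: str      # gene symbol, e.g. "POLA"
    entrez_id: str   # Entrez Gene ID, stored as a string of digits


def search_targets(query: str, targets: List[Target]) -> List[Target]:
    """Match an all-digit query against Entrez Gene IDs, anything else against symbols."""
    q = query.strip()
    if q.isdigit():
        return [t for t in targets if t.entrez_id == q]
    return [t for t in targets if t.symbol.upper() == q.upper()]


def resolve_search(query: str, targets: List[Target]) -> Tuple[str, List[Target]]:
    """Decide what the page shows: one gene's homolog page, a pick list, or nothing."""
    hits = search_targets(query, targets)
    if not hits:
        return ("no_match", hits)
    if len(hits) == 1:
        return ("homolog_page", hits)
    return ("choose_from_list", hits)
```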
2. Upload New Targets - 1
• Check that all required data is included.
• Check that no gene on the new target list has previously been a target or been called as a homolog. If any has, reject the load and alert the curator to any and all duplications so they can select new targets and try again.
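A sketch of the all-or-nothing check described above, assuming each uploaded row is a dict and that the required columns are the hypothetical pair in `REQUIRED`. It collects every problem before rejecting, so the curator sees all duplications at once; flagging repeats within the same file is an extra check not listed on the slide. The duplicate message loosely echoes the error in the "Upload New Targets - 3" mockup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical required columns for one uploaded target row.
REQUIRED = ("gene", "target_date")


def validate_new_targets(rows: List[Dict[str, str]],
                         already_used: Set[str]) -> Tuple[bool, List[str]]:
    """Check a whole target upload before anything is written.

    `already_used` holds every gene previously chosen as a target or called as a
    homolog. All problems are collected and returned together; if the list is
    non-empty, the caller rejects the entire load.
    """
    errors: List[str] = []
    seen: Set[str] = set()
    for i, row in enumerate(rows, start=1):
        missing = [name for name in REQUIRED if not row.get(name)]
        if missing:
            errors.append(f"Row {i}: missing {', '.join(missing)}")
            continue
        gene = row["gene"]
        if gene in already_used:
            errors.append(f"Error: {gene} has already been selected.")
        elif gene in seen:
            errors.append(f"Row {i}: {gene} appears more than once in this upload")
        seen.add(gene)
    return (not errors, errors)
```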
2. Upload New Targets - 2
[Mockup: Upload New Targets form, shown as option 1 and option 2, with Target Completion Date (MM-YYYY): 06-2008, Targets TAB file: Browse, and Upload buttons]
• The curator enters the target date here, and that information is applied to the new table load file.
• (A) We need to be able to enter either an ID or a gene name (in the same or in a different column).
• (B) Need a check for genes already selected.
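A possible parser for the targets TAB file, sketched for the "different column" variant of note (A): it assumes column 1 is a gene ID and column 2 a gene symbol, either of which may be blank. The layout and function names are hypothetical. The MM-YYYY date typed on the form is validated once and copied onto every row.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Optional


def parse_target_date(text: str) -> str:
    """Validate the MM-YYYY target completion date and return it zero-padded."""
    return datetime.strptime(text.strip(), "%m-%Y").strftime("%m-%Y")


def parse_targets_tab(text: str, target_date: str) -> List[Dict[str, Optional[str]]]:
    """Read a TAB file whose first two columns are gene ID and gene symbol.

    Either column may be left blank, but not both. The date entered once on the
    form is copied onto every row of the load.
    """
    when = parse_target_date(target_date)
    rows: List[Dict[str, Optional[str]]] = []
    for lineno, cells in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), start=1):
        gene_id, symbol = ([c.strip() for c in cells] + ["", ""])[:2]
        if not gene_id and not symbol:
            if any(c.strip() for c in cells):
                raise ValueError(f"line {lineno}: need a gene ID or a gene symbol")
            continue  # skip blank lines
        rows.append({"gene_id": gene_id or None, "symbol": symbol or None,
                     "target_date": when})
    return rows
```

The parsed rows would then go through the duplicate check required by note (B) before anything is stored.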
2. Upload New Targets - 3
[Mockup: upload result messages]
• Success: "Your upload was successful!"
• Error: "CDC2 (SGD02541231) has already been selected. Please go back and replace this entry."
3. Upload orthologs - 1
[Mockup: the same Curation Central page as in "1. Curator log in - 2" (Logged in: doughowe | ZFIN); the relevant section is Upload orthologs: Select month (pull-down list), GO]
3. Upload orthologs: loads calculated data from P-POD and other available methods
[Mockup: Upload orthologs, Select month: 06-2008, GO]
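A sketch of how the ortholog load might populate homologs for the targets of the selected month. It assumes the cluster data has already been read into a dict mapping a cluster ID to its (species, gene ID) members; the actual P-POD export format is not described here, so that structure and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Member = Tuple[str, str]  # (species, gene ID)


def homologs_from_clusters(targets: List[str],
                           clusters: Dict[str, List[Member]],
                           already_used: Set[str]) -> Dict[str, List[Member]]:
    """Propose homologs for each target from the clusters that contain it.

    A gene already used as a target or homolog is skipped, and a gene is given
    to at most one target within a single load.
    """
    # Index each gene ID to the clusters it belongs to.
    gene_to_clusters: Dict[str, Set[str]] = defaultdict(set)
    for cid, members in clusters.items():
        for _species, gene in members:
            gene_to_clusters[gene].add(cid)

    used = set(already_used) | set(targets)
    result: Dict[str, List[Member]] = {}
    for target in targets:
        found: List[Member] = []
        for cid in sorted(gene_to_clusters.get(target, set())):
            for species, gene in clusters[cid]:
                if gene not in used:
                    found.append((species, gene))
                    used.add(gene)
        result[target] = found
    return result
```

Results from other methods (HomoloGene, InParanoid, and so on) could be merged the same way, given each in the same cluster-like structure.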
Clicking View/Enter homologs goes to a new target-gene-specific page to edit and add homologs.
[Mockup: View/Enter/Edit Homolog page for Target Gene: POLA (H.Sap). Columns: Species (pull-down); curator enters symbol; Homology Determination Methods (Published homolog, Manual Analysis, HomoloGene, InParanoid, OrthoMCL, TreeFam, P-POD); curation date; curator name (pull-down); [note]; Add button]
This interface supports adding individual homologs to specific target genes as well as editing previously added homologs. More detail on the next slide.
[Mockup: View/Enter/Edit Homolog page for Target Gene: POLA (H.Sap), with columns for curator-entered symbol/ID, Homology Determination Methods (Published homolog, Manual Curation, HomoloGene, InParanoid, OrthoMCL, TreeFam, P-POD), curation date, curator name and [note], plus a Submit button]
• Symbol/ID: curators enter gene symbols OR MOD object IDs, since TAIR gene symbols are not unique.
• Curator name: a pull-down menu listing curators. If user login is supported, this column could probably be dropped and the curator assumed behind the scenes based on the user login. The currently logged-in user is displayed at the top of the page like this: Logged In: dhowe|ZFIN.
• Homology Determination Methods: a checkbox set to assert which methods were used to support the homology call. The full set may be larger than shown here.
• Species: a pull-down menu, OR the organism is already chosen based on the curator's login.
• Submit: adds a new homolog to the DB after checking that the gene has not already been added as a target gene or homolog.
• Notes: used to describe anything specific about the orthology call.
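A sketch of the Submit step for one homolog row, assuming the curator and species are filled in from the login and that at least one method must be ticked (the second point is an assumption, not stated on the slide). `KNOWN_METHODS` copies the mockup's list, which the slide says may be incomplete; the function name and record layout are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Set

# Methods shown on the mockup; the slide notes the full set may be larger.
KNOWN_METHODS = {"Published homolog", "Manual Curation", "HomoloGene",
                 "InParanoid", "OrthoMCL", "TreeFam", "P-POD"}


def submit_homolog(target: str, gene: str, methods: Iterable[str],
                   curator: str, species: str, already_used: Set[str],
                   note: str = "", today: Optional[date] = None) -> Dict[str, object]:
    """Validate one homolog row and build the record to store.

    `gene` may be a symbol or a MOD object ID. Raises ValueError rather than
    adding a duplicate or a call with no supported method.
    """
    chosen = set(methods)
    unknown = chosen - KNOWN_METHODS
    if unknown:
        raise ValueError(f"unknown method(s): {', '.join(sorted(unknown))}")
    if not chosen:
        raise ValueError("tick at least one homology determination method")
    if gene in already_used:
        raise ValueError(f"{gene} is already a target gene or homolog")
    return {"target": target, "gene": gene, "species": species,
            "methods": sorted(chosen), "curator": curator,
            "curation_date": (today or date.today()).isoformat(), "note": note}
```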
4. Reports
[Mockup: View curation report: POLA]
• Options:
  - select all/unselect all
  - View annotations: all / non-IEA / experimental
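The three annotation views could be implemented as filters on GO evidence codes. This sketch assumes each annotation carries an `evidence` code and treats the standard GO experimental codes (EXP, IDA, IPI, IMP, IGI, IEP) as "experimental"; whether to also count the newer high-throughput codes is left as a choice. The function name and record layout are hypothetical.

```python
from typing import Dict, List

# Standard GO experimental evidence codes (high-throughput codes not included).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_annotations(annotations: List[Dict[str, str]], view: str) -> List[Dict[str, str]]:
    """Return the annotations shown under one of the report's view options."""
    if view == "all":
        return list(annotations)
    if view == "non-IEA":
        return [a for a in annotations if a["evidence"] != "IEA"]
    if view == "experimental":
        return [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]
    raise ValueError(f"unknown view: {view!r}")
```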