Modifying EnsMart Damian Smedley
Modifying EnsMart • Modifying the existing EnsMart system • Current status of the distributed, generic BioMart system and future plans • Modifying the BioMart system
EnsMart schema • (Figure: schema diagram showing the dimension tables %_gene_snp_dm and %_gene_xref_REFSEQ_dm linked to the main tables %_gene_main and %_transcript_main through gene_id and transcript_id, each table labelled with its attribute and/or filter columns, plus the %_karyotype_lookup and %_dna_chunks_support tables)
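To make the main/dimension split concrete, here is a small Python sketch using an in-memory SQLite database. It applies a filter to a column of a main table and fetches an attribute from a dm table joined on gene_id. The column set and the rows (ENSG_A, rs_a1 and so on) are made-up assumptions and are far smaller than real EnsMart tables.

```python
import sqlite3

# Toy schema (hypothetical rows): one main table plus one dm table that
# shares its gene_id key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hsapiens_ensemblgene_main (
    gene_id INTEGER, gene_stable_id TEXT, chr_name TEXT);
CREATE TABLE hsapiens_ensemblgene_snp_dm (gene_id INTEGER, snp_id TEXT);
INSERT INTO hsapiens_ensemblgene_main VALUES
    (1, 'ENSG_A', '22'), (2, 'ENSG_B', '22'), (3, 'ENSG_C', 'X');
INSERT INTO hsapiens_ensemblgene_snp_dm VALUES
    (1, 'rs_a1'), (1, 'rs_a2'), (3, 'rs_c1');
""")


def genes_with_snps(chromosome):
    """Filter on a main-table column; return an attribute from the dm table."""
    sql = """
        SELECT m.gene_stable_id, d.snp_id
        FROM hsapiens_ensemblgene_main AS m
        JOIN hsapiens_ensemblgene_snp_dm AS d ON m.gene_id = d.gene_id
        WHERE m.chr_name = ?
        ORDER BY m.gene_stable_id, d.snp_id
    """
    return conn.execute(sql, (chromosome,)).fetchall()


print(genes_with_snps("22"))  # [('ENSG_A', 'rs_a1'), ('ENSG_A', 'rs_a2')]
```

ENSG_B is on chromosome 22 but has no dm rows, so the inner join drops it. A LEFT JOIN would keep it with a NULL snp_id.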
Modifying EnsMart • Adding new data to an existing EnsMart database • Changing the species and foci available in EnsMart • Creating a completely new species/focus, e.g. a new chimpanzee-gene mart or a new protein focus
Adding new data to an existing mart • Modify the database • Modify the API • Modify the Web code
Modify database
• New table(s) following the mart naming convention
• New column(s) in existing tables
• Example: some human transcripts mapped to some IDs
    mysql> SELECT * FROM example_mappings;
    +----------------------+------------+
    | transcript_stable_id | EXAMPLE_id |
    +----------------------+------------+
    | ENST00000326632      | AK024481   |
    | ENST00000269816      | AK024448   |
    | ENST00000326447      | AF346307   |
    | ENST00000018806      | AF118890   |
    | ENST00000018806      | U92072     |
    | ENST00000018784      | AF118889   |
Modify database: new table
• Left join the mappings onto the main table to create a new dimension (dm) table. The LEFT JOIN keeps every transcript and gives NULL IDs where no EXAMPLE ID is mapped:
    mysql> CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm
        -> SELECT m.gene_id, m.gene_stable_id, m.transcript_id, m.transcript_stable_id,
        ->        m.translation_id, m.translation_stable_id,
        ->        e.EXAMPLE_id AS display_id, e.EXAMPLE_id AS dbprimary_id
        -> FROM hsapiens_ensembltranscript_main m
        -> LEFT JOIN example_mappings e ON m.transcript_stable_id = e.transcript_stable_id;
    mysql> SELECT * FROM hsapiens_ensemblgene_xref_EXAMPLE_dm;
    +---------+-----------------+---------------+----------------------+----------------+-----------------------+------------+--------------+
    | gene_id | gene_stable_id  | transcript_id | transcript_stable_id | translation_id | translation_stable_id | display_id | dbprimary_id |
    +---------+-----------------+---------------+----------------------+----------------+-----------------------+------------+--------------+
    |   97565 | ENSG00000023810 |        123373 | ENST00000036411      |         124178 | ENSP00000037729       | AK024495   | AK024495     |
    |   97565 | ENSG00000023810 |        123374 | ENST00000036403      |         124179 | ENSP00000038269       | AF123675   | AF123675     |
    |   97567 | ENSG00000014005 |        123376 | ENST00000018764      |         124181 | ENSP00000018764       | AK012367   | AK012367     |
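The dm-table build can be tried without a MySQL server. This Python sketch replays it in SQLite, which requires the CREATE TABLE ... AS SELECT spelling (MySQL also accepts the form without AS). Only the gene and transcript columns are modelled, and the IDs (ENST_A, EX_1 and so on) are placeholders, not real Ensembl data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hsapiens_ensembltranscript_main (
    gene_id INTEGER, gene_stable_id TEXT,
    transcript_id INTEGER, transcript_stable_id TEXT);
CREATE TABLE example_mappings (transcript_stable_id TEXT, EXAMPLE_id TEXT);

-- Placeholder rows: two transcripts, only the first has a mapping.
INSERT INTO hsapiens_ensembltranscript_main VALUES
    (1, 'ENSG_A', 10, 'ENST_A'),
    (2, 'ENSG_B', 20, 'ENST_B');
INSERT INTO example_mappings VALUES ('ENST_A', 'EX_1');

-- LEFT JOIN keeps every transcript; unmapped ones get NULL IDs.
CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm AS
SELECT m.gene_id, m.gene_stable_id, m.transcript_id, m.transcript_stable_id,
       e.EXAMPLE_id AS display_id, e.EXAMPLE_id AS dbprimary_id
FROM hsapiens_ensembltranscript_main m
LEFT JOIN example_mappings e
       ON m.transcript_stable_id = e.transcript_stable_id;
""")

rows = conn.execute(
    "SELECT transcript_stable_id, display_id, dbprimary_id "
    "FROM hsapiens_ensemblgene_xref_EXAMPLE_dm ORDER BY transcript_id"
).fetchall()
print(rows)  # [('ENST_A', 'EX_1', 'EX_1'), ('ENST_B', None, None)]
```

A transcript with several mapped IDs (as ENST00000018806 has in example_mappings) yields one dm row per ID.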
Modify database: new column
• Create flag columns in the gene and transcript main tables indicating whether a particular gene or transcript has a mapped EXAMPLE ID:
    mysql> SELECT gene_id, gene_stable_id, has_EXAMPLE FROM hsapiens_ensemblgene_main LIMIT 5;
    +---------+-----------------+-------------+
    | gene_id | gene_stable_id  | has_EXAMPLE |
    +---------+-----------------+-------------+
    |   97565 | ENSG00000023810 |        NULL |
    |   97565 | ENSG00000023810 |        NULL |
    |   97567 | ENSG00000014005 |           1 |
    |   97569 | ENSG00000023795 |        NULL |
    |   97569 | ENSG00000023795 |           1 |
    +---------+-----------------+-------------+
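The slides do not show how the flag is filled in. One possible approach, sketched here in Python/SQLite as an assumption rather than the EnsMart build procedure, is to add a nullable column and set it to 1 wherever the dm table has a non-NULL display_id. Every other row keeps NULL, as in the output above. The sketch works on a transcript main table keyed by transcript_id, and its rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hsapiens_ensembltranscript_main (transcript_id INTEGER);
CREATE TABLE hsapiens_ensemblgene_xref_EXAMPLE_dm (
    transcript_id INTEGER, display_id TEXT);
INSERT INTO hsapiens_ensembltranscript_main VALUES (1), (2), (3);
INSERT INTO hsapiens_ensemblgene_xref_EXAMPLE_dm VALUES
    (1, NULL), (2, 'EX_1'), (3, NULL);

-- New nullable flag column: every row starts as NULL.
ALTER TABLE hsapiens_ensembltranscript_main ADD COLUMN has_EXAMPLE INTEGER;

-- Set the flag only where a mapped EXAMPLE ID exists.
UPDATE hsapiens_ensembltranscript_main
SET has_EXAMPLE = 1
WHERE transcript_id IN (
    SELECT transcript_id FROM hsapiens_ensemblgene_xref_EXAMPLE_dm
    WHERE display_id IS NOT NULL);
""")

flags = conn.execute(
    "SELECT transcript_id, has_EXAMPLE FROM hsapiens_ensembltranscript_main "
    "ORDER BY transcript_id").fetchall()
print(flags)  # [(1, None), (2, 1), (3, None)]
```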
Modify API • MartGeneExtractor.pm
• Table name:
    my $META_TABLES = {
        example => ["%s_%sgene_xref_EXAMPLE_dm", "gene_id"],
        ...
• Attributes and filters:
    my $META_NAMES = {
        # attribute
        xexample_dis      => ["example", "example.display_id"],
        # filters
        FG_EXAMPLE_ID     => ["example", "example.display_id in(%s)"],
        example_exclusive => ["example", "example.display_id is not null"],
        example_excluded  => ["example", "example.display_id is null"],
        ...
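To show how these two hashes might combine into SQL, here is a hypothetical Python mirror of them. It rests on two assumptions. First, the first element of each META_NAMES entry is a META_TABLES key, and the SQL fragment uses that key as a table alias. Second, the two %s slots in the table template take the species and source prefixes (hsapiens, ensembl). The real Perl extractor may assemble its queries differently, and production code should pass IDs as bind parameters rather than quoting them by hand.

```python
# Hypothetical mirror of $META_TABLES and $META_NAMES from MartGeneExtractor.pm.
META_TABLES = {
    "example": ("%s_%sgene_xref_EXAMPLE_dm", "gene_id"),
}
META_NAMES = {
    "xexample_dis": ("example", "example.display_id"),  # attribute
    "FG_EXAMPLE_ID": ("example", "example.display_id in(%s)"),  # ID list
    "example_exclusive": ("example", "example.display_id is not null"),
    "example_excluded": ("example", "example.display_id is null"),
}


def table_for(key, species, source):
    """Fill the table-name template; return (table name, join column)."""
    template, join_column = META_TABLES[key]
    return template % (species, source), join_column


def filter_clause(name, ids=()):
    """Expand a filter's SQL template, quoting IDs for the in(%s) slot."""
    _, template = META_NAMES[name]
    if "%s" not in template:
        return template
    quoted = ", ".join("'" + i.replace("'", "''") + "'" for i in ids)
    return template % quoted


print(table_for("example", "hsapiens", "ensembl"))
# ('hsapiens_ensemblgene_xref_EXAMPLE_dm', 'gene_id')
print(filter_clause("FG_EXAMPLE_ID", ["AK024481", "U92072"]))
# example.display_id in('AK024481', 'U92072')
```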
Modify web code • MetaData.pm is organised into: • Stages: collections of blocks on one HTML page • Blocks: collections of related forms • Forms: collections of entries • Entries: individual HTML elements • (Figure: nesting of STAGE, BLOCK, FORM and ENTRY)
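The four-level nesting can be pictured as a simple tree. This Python sketch is a hypothetical model, not the MetaData.pm API. The class names, the add_* helpers and the stage/block/form names ('filter', 'gene', 'example') are assumptions chosen for illustration. Walking the tree from a stage down to its entries yields the HTML elements drawn on that one page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One HTML element."""
    label: str


@dataclass
class Form:
    name: str
    entries: List[Entry] = field(default_factory=list)

    def add_entry(self, label):
        entry = Entry(label)
        self.entries.append(entry)
        return entry


@dataclass
class Block:
    name: str
    forms: List[Form] = field(default_factory=list)

    def add_form(self, name):
        form = Form(name)
        self.forms.append(form)
        return form


@dataclass
class Stage:
    """One HTML page made of blocks."""
    name: str
    blocks: List[Block] = field(default_factory=list)

    def add_block(self, name):
        block = Block(name)
        self.blocks.append(block)
        return block

    def html_elements(self):
        """Labels of every entry on the page, in layout order."""
        return [e.label for b in self.blocks for f in b.forms for e in f.entries]


stage = Stage("filter")
form = stage.add_block("gene").add_form("example")
form.add_entry("Only")
form.add_entry("Excluded")
print(stage.html_elements())  # ['Only', 'Excluded']
```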
Modify web code
• MetaData.pm: add Example ID as an attribute (xexample_dis):
    FORM_XREF_ATTRIBUTES:{
        .....
        my %entry_labels =
        ( xgene_name_dis => [1, 'Gene Name'],
          .....
          xexample_dis   => [35, 'Example ID'],
Modify web code
• MetaData.pm (cont.): add a hyperlink definition for the Example ID attribute:
    my %hyperlinks =
    ( xhugo_dis    => ['exturl', 'HUGO'],
      xexample_dis => ['exturl', 'EXAMPLE'],
      ...
• exturl targets are defined in /conf/DEFAULTS.ini:
    [ENSEMBL_EXTERNAL_URLS]
    EXAMPLE = http://www.ebi.ac.uk/cgi-bin/emblfetch?###ID###
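Here is how the two pieces might fit together when a result value is turned into a link. This Python sketch is an illustration, not the web code. It reads the INI section with the standard-library configparser and assumes, from the placeholder's name, that ###ID### is replaced by the attribute value. HYPERLINKS and link_for are hypothetical names.

```python
import configparser

# Hypothetical mirror of the %hyperlinks entry above.
HYPERLINKS = {"xexample_dis": ("exturl", "EXAMPLE")}

INI_TEXT = """
[ENSEMBL_EXTERNAL_URLS]
EXAMPLE = http://www.ebi.ac.uk/cgi-bin/emblfetch?###ID###
"""

config = configparser.ConfigParser(interpolation=None)
config.optionxform = str  # keep key case; configparser lowercases by default
config.read_string(INI_TEXT)


def link_for(attribute, value):
    """Return the external URL for one attribute value, or None."""
    kind, db_name = HYPERLINKS.get(attribute, (None, None))
    if kind != "exturl":
        return None
    template = config["ENSEMBL_EXTERNAL_URLS"][db_name]
    return template.replace("###ID###", value)


print(link_for("xexample_dis", "AK024481"))
# http://www.ebi.ac.uk/cgi-bin/emblfetch?AK024481
```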
Modify web code
• MetaData.pm
• The FG_EXAMPLE_ID filter is picked up automatically by a method in the web code that treats all filters beginning with FG_ as ID list filters
• example_exclusive/excluded filters:
    FORM_EXAMPLE:{
        my $form_name = 'example';
        my $form = $block->addobj_form();
        add_available_by_api_filter( $form, 'example_exclusive' );
        $form->set_name($form_name);
        # CHECK_WITH_RADIO forms are laid out by gen_check_with_radio in PanelMain.pm
        $form->set_type('CHECK_WITH_RADIO');
        ENTRY_CHECK:{
            my $entry = $form->addobj_form_entry();
            $entry->set_value(1);
            $entry->set_label("Entries with an EXAMPLE ID");
        }
Modify web code
• Example exclusive/excluded filters (cont.):
        ENTRY_RADIO_1:{
            my $entry = $form->addobj_form_entry();
            $entry->set_name_suffix('_type');
            $entry->set_api_filter('example_exclusive');
            $entry->set_value('Only');
            $entry->set_default('Only');
            $entry->set_label('Only');
            $entry->set_label_summary("Has Example ID: %s");
            activate_filter_onchange( $entry );
            add_error_scalar( $entry );
        }
Modify web code
• Example exclusive/excluded filters (cont.):
        ENTRY_RADIO_2:{
            my $entry = $form->addobj_form_entry();
            $entry->set_name_suffix('_type');
            $entry->set_api_filter('example_excluded');
            $entry->set_value('Excluded');
            $entry->set_label('Excluded');
            $entry->set_label_summary("Has Example ID: %s");
            activate_filter_onchange( $entry );
            add_error_scalar( $entry );
        }
    }
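The two web-side rules in these slides can be sketched together in Python. Filters whose names start with FG_ become ID-list inputs, and the chosen radio value selects one of the two boolean API filters. The filter names and radio values come from the Perl code. The functions are hypothetical illustrations, and treating an unticked checkbox as "no filter" is an assumption.

```python
API_FILTERS = ["FG_EXAMPLE_ID", "example_exclusive", "example_excluded"]

# Radio value -> api_filter, as set on ENTRY_RADIO_1 and ENTRY_RADIO_2.
RADIO_TO_FILTER = {"Only": "example_exclusive", "Excluded": "example_excluded"}


def id_list_filters(filter_names):
    """Filters whose names start with FG_ are rendered as ID-list inputs."""
    return [name for name in filter_names if name.startswith("FG_")]


def resolve_example_form(checked, radio_value="Only"):
    """Return the API filter chosen on the CHECK_WITH_RADIO form.

    Assumption: an unticked checkbox means no example filter is applied.
    The default radio value 'Only' follows set_default('Only').
    """
    if not checked:
        return None
    return RADIO_TO_FILTER[radio_value]


print(id_list_filters(API_FILTERS))            # ['FG_EXAMPLE_ID']
print(resolve_example_form(True))              # example_exclusive
print(resolve_example_form(True, "Excluded"))  # example_excluded
```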
Changing species/focus available • Create an EnsMart database containing: • only the tables for the species/focus combinations of interest, including all their lookup and support tables, e.g.: • hsapiens* for a human-only EnsMart • hsapiens_ensemblgene* plus hsapiens_*lookup and hsapiens_*support for a human, ensemblgene-only EnsMart • the _meta* tables • evoc* and go* if expression vocabulary and GO searching are wanted
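Choosing which tables to copy amounts to glob matching on table names. This Python sketch applies the patterns above with fnmatchcase. The table list is a made-up sample standing in for the output of SHOW TABLES, and the copy step itself (for example with mysqldump) is left out.

```python
from fnmatch import fnmatchcase

# Patterns from the list above for a human, ensemblgene-only mart.
CORE_PATTERNS = ["hsapiens_ensemblgene*", "hsapiens_*lookup",
                 "hsapiens_*support", "_meta*"]
# Only needed for expression vocabulary and GO searching.
OPTIONAL_PATTERNS = ["evoc*", "go*"]


def tables_to_keep(table_names, with_evoc_and_go=False):
    """Return the tables matching the selection patterns, in input order."""
    patterns = CORE_PATTERNS + (OPTIONAL_PATTERNS if with_evoc_and_go else [])
    return [t for t in table_names
            if any(fnmatchcase(t, p) for p in patterns)]


# Made-up sample of table names from a full EnsMart database.
ALL_TABLES = [
    "hsapiens_ensemblgene_main", "hsapiens_ensemblgene_xref_EXAMPLE_dm",
    "hsapiens_karyotype_lookup", "hsapiens_dna_chunks_support",
    "mmusculus_ensemblgene_main", "_meta_release_info",
    "evoc_terms", "go_terms",
]
print(tables_to_keep(ALL_TABLES))
```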
Changing species/focus available
• Edit the _meta_release_info table:
• To only have human datasets:
    mysql> DELETE FROM _meta_release_info WHERE species != 'homo_sapiens';
• To further restrict the focus to Ensembl genes only:
    mysql> UPDATE _meta_release_info SET core_datasets = 'core' WHERE species = 'homo_sapiens';
    mysql> UPDATE _meta_release_info SET satellite_datasets = NULL WHERE species = 'homo_sapiens';
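These edits can be rehearsed on a throwaway table before touching a real mart. This Python sketch runs the same three statements in an in-memory SQLite database. The table keeps only the three columns the statements touch, and its values ('core,other_core', 'some_satellite' and the mouse row) are placeholders, not real EnsMart release data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _meta_release_info (
    species TEXT, core_datasets TEXT, satellite_datasets TEXT);
INSERT INTO _meta_release_info VALUES
    ('homo_sapiens', 'core,other_core', 'some_satellite'),
    ('mus_musculus', 'core', 'some_satellite');

-- The three statements from the slide.
DELETE FROM _meta_release_info WHERE species != 'homo_sapiens';
UPDATE _meta_release_info SET core_datasets = 'core'
    WHERE species = 'homo_sapiens';
UPDATE _meta_release_info SET satellite_datasets = NULL
    WHERE species = 'homo_sapiens';
""")

remaining = conn.execute("SELECT * FROM _meta_release_info").fetchall()
print(remaining)  # [('homo_sapiens', 'core', None)]
```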
Adding a new species • Create the tables conforming to the mart naming convention • Filters and attributes corresponding to equivalent columns in existing EnsMart tables will be picked up automatically (e.g. the chromosome name attribute is already defined by %_ensemblgene_main.chr_name) • Add new filters and attributes as detailed earlier
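Following the naming convention matters because a species-independent definition only works if the new species' table has the expected name. The hypothetical Python helper below substitutes a species prefix for % in a definition such as %_ensemblgene_main.chr_name and fails if no such table exists. The prefix 'ptroglodytes' for chimpanzee is an assumption modelled on the hsapiens style. EnsMart's own lookup code is not shown in these slides.

```python
def resolve(definition, species_prefix, existing_tables):
    """Substitute the species prefix for %; fail if no such table exists."""
    table, column = definition.split(".", 1)
    concrete = table.replace("%", species_prefix, 1)
    if concrete not in existing_tables:
        raise LookupError(concrete + " not found: check the naming convention")
    return concrete + "." + column


# Hypothetical table set for a new chimpanzee gene mart.
chimp_tables = {"ptroglodytes_ensemblgene_main",
                "ptroglodytes_ensembltranscript_main"}
print(resolve("%_ensemblgene_main.chr_name", "ptroglodytes", chimp_tables))
# ptroglodytes_ensemblgene_main.chr_name
```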
Adding a new focus • Requires a new Extractor module in the API. For example, a new protein focus would require a MartProteinExtractor.pm equivalent to MartGeneExtractor.pm • May require extra configuration methods in MartInfo.pm and MartDefs.pm • All filters and attributes need to be added to MetaData.pm in the web code
BioMart system • MartLib API allowing query chaining between distributed Marts • XML-based configuration system • MartEditor tool to create and edit the XML documents • MartShell command-line tool and MartExplorer GUI • MartWeb servlet planned for this year • Currently Java-based, but a Perl API and web interface are coming to replace the existing EnsMart site • The EBI Industry Programme on 19th March includes “BioMart – a distributed, query-oriented data integration architecture”
Adding new filters and attributes in BioMart
• Edit the XML file:
• Example ID attribute:
    <AttributeDescription description="EXAMPLE Ids" displayName="EXAMPLE ID"
        field="display_id" internalName="xref_example_id" homepageURL=""
        linkoutURL="" maxLength="8" source=""
        tableConstraint="gene_xref_EXAMPLE_dm"/>
• Example ID exclusive/excluded filter:
    <Option description="filter to include/exclude genes mapping to EXAMPLE Ids"
        displayName="with EXAMPLE ID(s)" field="has_EXAMPLE"
        internalName="example_id_xrefs" isSelectable="true"
        legal_qualifiers="only,excluded" tableConstraint="main" type="boolean"/>
• Example ID list filter:
    <Option description="filter to include genes with supplied list of EXAMPLE Ids"
        displayName="EXAMPLE ID(s):" field="display_id" internalName="example_id"
        isSelectable="true" legal_qualifiers="=,in" qualifier="="
        tableConstraint="gene_xref_EXAMPLE_dm" type="list"/>
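As a quick check that the three elements are well-formed and say what is intended, this Python sketch parses them with the standard-library ElementTree. It lists each filter's type and legal qualifiers, and the table and field each attribute reads. The <Fragment> wrapper is a made-up root added only so the snippet parses on its own, and this is not how MartLib loads its configuration. XML attribute values must be delimited by straight quotes (" or '), so typographic quotes would make the document unparseable.

```python
import xml.etree.ElementTree as ET

XML = """<Fragment>
  <AttributeDescription description="EXAMPLE Ids" displayName="EXAMPLE ID"
      field="display_id" internalName="xref_example_id" homepageURL=""
      linkoutURL="" maxLength="8" source=""
      tableConstraint="gene_xref_EXAMPLE_dm"/>
  <Option description="filter to include/exclude genes mapping to EXAMPLE Ids"
      displayName="with EXAMPLE ID(s)" field="has_EXAMPLE"
      internalName="example_id_xrefs" isSelectable="true"
      legal_qualifiers="only,excluded" tableConstraint="main" type="boolean"/>
  <Option description="filter to include genes with supplied list of EXAMPLE Ids"
      displayName="EXAMPLE ID(s):" field="display_id" internalName="example_id"
      isSelectable="true" legal_qualifiers="=,in" qualifier="="
      tableConstraint="gene_xref_EXAMPLE_dm" type="list"/>
</Fragment>"""

root = ET.fromstring(XML)


def filter_qualifiers(root):
    """Map each Option's internalName to (type, list of legal qualifiers)."""
    return {
        opt.get("internalName"):
            (opt.get("type"), opt.get("legal_qualifiers").split(","))
        for opt in root.iter("Option")
    }


def attribute_columns(root):
    """Map each AttributeDescription to the (table, field) it reads."""
    return {
        a.get("internalName"): (a.get("tableConstraint"), a.get("field"))
        for a in root.iter("AttributeDescription")
    }


print(filter_qualifiers(root))
print(attribute_columns(root))
```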
Adding new datasets in BioMart • Only the XML document has to be created once a Mart-compliant database exists • The MartEditor tool simplifies this task: it creates a naïve initial XML view of the dataset and allows further editing in a GUI environment • Compare this with the existing Perl system, where a whole new Perl module has to be coded and new code added to several other modules
Conclusions • Adding new filters, attributes or a whole new species to EnsMart requires some understanding of the mart schema and a bit of “copying and pasting” in MetaData.pm and the appropriate Extractor (e.g. MartGeneExtractor.pm) • Adding new datasets/foci requires a good understanding of the mart API, as a new Extractor module needs to be coded • The new BioMart system reduces all this to creating a mart-compliant schema and using a GUI editing tool to produce an XML configuration file
Acknowledgements • Arek Kasprzyk • EnsMart production and API • Damian Keefe • Darin London • Web code • Will Spooner • Java based Mart system • Craig Melsopp • Darin London • Katerina Tzouvara