Ensembl Funcgen Perl API

Ensembl Funcgen Perl API. Nathan Johnson njohnson@ebi.ac.uk EBI - Wellcome Trust Genome Campus, UK. Funcgen. What is Ensembl Funcgen/eFG? A local data storage and analysis platform OR An Ensembl functional genomics database providing epigenomic and regulatory annotations OR Both.

Presentation Transcript


  1. Ensembl Funcgen Perl API Nathan Johnson njohnson@ebi.ac.uk EBI - Wellcome Trust Genome Campus, UK Funcgen

  2. What is Ensembl Funcgen/eFG? A local data storage and analysis platform OR An Ensembl functional genomics database providing epigenomic and regulatory annotations OR Both

  3. eFG Dataflow [diagram; labels: Tab2MAGE, MAGE-ML, Experimental Data, Import API, FuncGen DB, Analysis Pipeline, Annotated Features, Export API, Web API, DAS, GFF]

  4. eFG data • Experimental: Experimental meta data; Raw & Normalised data • Technology: Arrays/Chips/Probes, e.g. Tiling arrays; Short reads, e.g. Solexa, SOLiD etc. • Processed: Peak Calls, e.g. Mpeak, TileMap, ChIPOTLE, Nessie; Combinatorial analysis, e.g. Regulatory Build; Externally curated, e.g. cisRED, miRanda, Vista

  5. eFG data
     • Ensembl v50 July '08:
       • >60 data sets (ChIP-chip, wiggle, bed, custom)
       • 3 species
       • 9 cell types
       • 24 Histone modifications, DHSS, CTCF, RNAPolII …
     • Regulatory Build v3:
       • Gene Associated: 1584
       • Gene Associated - Cell type specific: 5614
       • Non-Gene Associated: 799
       • Non-Gene Associated - Cell type specific: 520
       • Promoter Associated: 12022
       • Promoter Associated - Cell type specific: 1619
       • Unclassified: 24814
       • Unclassified - Cell type specific: 127633

  6. eFG Display [screenshot; tracks: cisRED, miRanda, Vista, Regulatory Features, CTCF Data, Methylation data]

  7. How eFG fits in. • ensembl-functgenomics API • Object-oriented Perl • Follows the Object/ObjectAdaptor paradigm: data objects hold values, adaptors fetch and store them • Fully integrated with the wider Ensembl family of MySQL DBs • Multi-Assembly: eFG stores a registry of core coordinate information which allows data to be stored using different core DBs and different genome assemblies. • Minimal maintenance: Designed to aid incremental updates to local installations. Patch and update rather than blow away and recreate. • Fully automated data import API and analysis pipeline
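The Object/ObjectAdaptor split mentioned on this slide can be illustrated with a minimal, self-contained sketch. The `Sketch::Array` and `Sketch::ArrayAdaptor` packages below are hypothetical stand-ins, not the real `Bio::EnsEMBL::Funcgen` classes, and an in-memory list is assumed in place of the MySQL tables a real adaptor would query. Only the method names `fetch_all` and `fetch_by_name_vendor` are borrowed from the slides.

```perl
use strict;
use warnings;

# Hypothetical data object: holds values only, knows nothing about storage.
package Sketch::Array;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, vendor => $args{vendor} }, $class;
}
sub name   { return $_[0]->{name} }
sub vendor { return $_[0]->{vendor} }

# Hypothetical adaptor: the only place that knows how records are fetched.
# An in-memory list stands in for the database a real adaptor would query.
package Sketch::ArrayAdaptor;
sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows] }, $class;
}
sub fetch_all {
    my ($self) = @_;
    return [ map { Sketch::Array->new(%$_) } @{ $self->{rows} } ];
}
sub fetch_by_name_vendor {
    my ($self, $name, $vendor) = @_;
    my ($hit) = grep { $_->name eq $name && $_->vendor eq $vendor }
                @{ $self->fetch_all };
    return $hit;    # undef when nothing matches
}

package main;

my $adaptor = Sketch::ArrayAdaptor->new(
    { name => '2005-05-10_HG17Tiling_Set', vendor => 'NIMBLEGEN' },
    { name => 'ENCODE3.1.1',               vendor => 'SANGER' },
);
my $array = $adaptor->fetch_by_name_vendor('ENCODE3.1.1', 'SANGER');
print $array->name, "\t", $array->vendor, "\n";
```

Calling code only ever talks to the adaptor's fetch methods, so the storage behind it could change without touching the data objects or the scripts that use them.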

  8. eFG Schema [diagram; table groups: Experimental, Array, Sets, Features]

  9. Features & Sets • Features: Probe > Annotated; External > Regulatory. • Sets: an abstract concept for manipulation of data collections: logical association/combination; access and administration. • Supporting/Product Set classes: • ResultSet - Chips/Channels > Replicates • ExperimentalSet - Feature only import • FeatureSet - e.g. Peak calls > AnnotatedFeatures • DataSet - Combines supporting Sets and product FeatureSet

  10. eFG data flow [diagram; labels: Raw Data, Result, Experimental and External Features, ResultSet1-3, SupportingSet1-2, DataSet1-4, HitList, External DB, Product FeatureSet, Combined FeatureSet, Export API, GFF]

  11. Technology data • Array: a definitive collection of chips. • name(), format(), vendor(), description(), type(). • fetch_by_name_vendor(), fetch_all_by_type(). • ArrayChip: an individual chip in an array collection. • name(), design_id(). • fetch_all_by_array_design_ids(), fetch_all_by_array_id(), fetch_all_by_ExperimentalChip(). • Probe: a unique probe sequence within a given array or set of arrays. • name(), class(), length(). • fetch_all_by_Array(), fetch_all_by_ArrayChip(), fetch_all_by_array_probe_probeset_name(). • ProbeFeature: an alignment of a Probe against the genome. • start(), end(), strand(), mismatches(), cigarline(), analysis(). • fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips().

  12. DBAdaptor example code
      use strict;
      use Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor;
      use Bio::EnsEMBL::DBSQL::DBAdaptor;
      # Core DB supplying the sequence and coordinate information
      my $dna_db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_core_37_35j',
        -group   => 'core',
      );
      # Funcgen DB, linked to the core DB through -dnadb
      my $efg_db = Bio::EnsEMBL::Funcgen::DBSQL::DBAdaptor->new(
        -user    => 'anonymous',
        -host    => 'ensembldb.ensembl.org',
        -species => 'Homo_sapiens',
        -dbname  => 'homo_sapiens_funcgen_48_36j',
        -group   => 'funcgen',
        -dnadb   => $dna_db,
      );

  13. Array example code
      use strict;
      use Bio::EnsEMBL::Registry;
      my $reg = "Bio::EnsEMBL::Registry";
      $reg->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
      );
      my $efg_db        = $reg->get_DBAdaptor('Human', 'funcgen');
      my $array_adaptor = $efg_db->get_ArrayAdaptor;
      my @arrays        = @{ $array_adaptor->fetch_all };
      # Print name, type and vendor for every array in the DB
      foreach my $array (@arrays) {
        print "\nArray:\t".$array->name."\n";
        print "Type:\t".$array->type."\n";
        print "Vendor:\t".$array->vendor."\n";
      }
      Output:
      Array: 2005-05-10_HG17Tiling_Set
      Type: OLIGO
      Vendor: NIMBLEGEN
      Array: ENCODE3.1.1
      Type: PCR
      Vendor: SANGER

  14. ArrayChip example code
      my $array = $array_adaptor->fetch_by_name_vendor(
        '2005-05-10_HG17Tiling_Set', 'NIMBLEGEN');
      my @achips = @{ $array->get_ArrayChips };
      foreach my $ac (@achips) {
        print "ArrayChip:".$ac->name."\tDesignID:".$ac->design_id."\n";
      }
      Output:
      ArrayChip:2005-05-10_HG17Tiling_Set31 DesignID:2061
      ArrayChip:2005-05-10_HG17Tiling_Set24 DesignID:2054
      ArrayChip:2005-05-10_HG17Tiling_Set12 DesignID:2042
      ArrayChip:2005-05-10_HG17Tiling_Set03 DesignID:2033
      ArrayChip:2005-05-10_HG17Tiling_Set04 DesignID:2034
      ArrayChip:2005-05-10_HG17Tiling_Set29 DesignID:2059
      ArrayChip:2005-05-10_HG17Tiling_Set13 DesignID:2043
      ArrayChip:2005-05-10_HG17Tiling_Set34 DesignID:2064
      ArrayChip:2005-05-10_HG17Tiling_Set07 DesignID:2037
      ArrayChip:2005-05-10_HG17Tiling_Set17 DesignID:2047
      ArrayChip:2005-05-10_HG17Tiling_Set23 DesignID:2053
      ArrayChip:2005-05-10_HG17Tiling_Set36 DesignID:2066
      ArrayChip:2005-05-10_HG17Tiling_Set08 DesignID:2038

  15. Probe example code
      my $probe_adaptor    = $efg_db->get_ProbeAdaptor;
      my $pfeature_adaptor = $efg_db->get_ProbeFeatureAdaptor;
      my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name(
        '2005-05-10_HG17Tiling_Set', 'chr22P38797630');
      print "Got ".$probe->class." probe ".$probe->get_probename."\n";
      # Fetch every genomic alignment of this probe
      my @pfeatures = @{ $pfeature_adaptor->fetch_all_by_Probe($probe) };
      print "Found ".scalar(@pfeatures)." ProbeFeatures\n";
      foreach my $pfeature (@pfeatures) {
        print "ProbeFeature found at:\t".$pfeature->feature_Slice->name."\n";
      }
      Output:
      Got EXPERIMENTAL probe chr22P38797630
      Found 1 ProbeFeatures
      ProbeFeature found at: chromosome:NCBI36:22:38803076:38803125:1

  16. ExperimentalData1 • Experiment provides a natural container for experimental metadata. • name(), group(), mage_xml(), primary_design_type(), description(), get_ExperimentalChips(). • fetch_by_name(), fetch_all_by_group(), get_all_experiment_names(). • ExperimentalChip represents a unique physical instance of an ArrayChip. • unique_id(), cell_type(), feature_type(), biological_replicate(), technical_replicate(). • fetch_all_by_experiment(), fetch_by_unique_id_vendor(). • Channel represents a control or experimental channel from an ExperimentalChip. • dye(), type(), sample_id(). • fetch_all_by_ExperimentalChip(), fetch_all_type_experimental_chip_id().

  17. ExperimentalData1 example code
      my $exp_adaptor = $efg_db->get_ExperimentAdaptor;
      my $exp         = $exp_adaptor->fetch_by_name('ctcf_ren');
      my $num_chips   = scalar(@{ $exp->get_ExperimentalChips });
      print $exp->name.' '.$exp->primary_design_type.
        " experiment contains $num_chips ExperimentalChips\n";
      Output:
      ctcf_ren binding_site_identification experiment contains 36 ExperimentalChips

  18. ExperimentalData2 • ResultSet provides easy access to discrete sets of experimental data, e.g. replicates. • name(), cell_type(), feature_type(), display_label(), get_ExperimentalChips(), get_ResultFeatures_by_Slice(). • fetch_all_by_name(), fetch_all_by_name_Analysis(), fetch_all_by_FeatureType(), fetch_all_by_Experiment(). • ResultFeature is a special lightweight Feature optimised for display and analysis purposes. • start(), end(), score(). • ResultSet::get_ResultFeatures_by_Slice().

  19. ExperimentalData2 example code
      my $resultset_adaptor = $efg_db->get_ResultSetAdaptor;
      my $slice_adaptor     = $efg_db->get_SliceAdaptor;
      # fetch_all_by_name returns a list reference; take the first ResultSet
      my ($result_set) = @{ $resultset_adaptor->fetch_all_by_name('ctcf_ren_BR1') };
      my $slice = $slice_adaptor->fetch_by_region('chromosome', 'X');
      my @result_features = @{ $result_set->get_ResultFeatures_by_Slice($slice) };
      print "Chromosome X has ".scalar(@result_features)." results\n";
      foreach my $rf (@result_features) {
        print "Locus:\t".$rf->start.'-'.$rf->end."\tScore:".$rf->score."\n";
      }
      Output:
      Chromosome X has 582133 results
      Locus: 429-478 Score:-0.1095
      Locus: 529-578 Score:-0.1155
      Locus: 629-678 Score:0.0135
      Locus: 729-778 Score:-0.1735
      Locus: 829-878 Score:0.256

  20. More Sets • Experimental(Sub)Sets are special placeholder sets which facilitate feature import without any underlying data. • name(), cell_type(), feature_type(), format(), get_subsets(), ExperimentalSubSet->name(). • fetch_by_name(), fetch_all_by_Experiment(), fetch_all_by_CellType(), fetch_all_by_FeatureType(). • FeatureSet is a generic set for containing features of various types e.g. AnnotatedFeatures, ExternalFeatures, RegulatoryFeatures. • name(), cell_type(), feature_type(), analysis(), get_Features_by_Slice(). • fetch_by_name(), fetch_all_by_type(), fetch_all_by_CellType(), fetch_all_by_FeatureType().

  21. More Sets • DataSet is the top level container which associates underlying data or ‘supporting sets’ and a product FeatureSet i.e. the results of an analysis based on the underlying data. Supporting sets can be any other type of ‘Set’. • name(), cell_type(), feature_type(), product_FeatureSet(), get_supporting_sets(). • fetch_by_name(), fetch_all_by_supporting_set(), fetch_all_by_product_FeatureSet().

  22. Set example code 1
      my $dataset_adaptor = $efg_db->get_DataSetAdaptor;
      my $data_set = $dataset_adaptor->fetch_by_name('Nessie_NG_STD_2_ctcf_ren_BR1');
      # Underlying data sets that the analysis was run on
      my @supporting_sets = @{ $data_set->get_supporting_sets };
      foreach my $sset (@supporting_sets) {
        print 'Supporting set: '.$sset->name."\n";
        print 'Produced by analysis '.$sset->analysis->logic_name."\n";
      }
      # FeatureSet produced from the supporting sets
      my $pfset = $data_set->product_FeatureSet;
      print "\nProduct FeatureSet is ".$pfset->name."\n";
      print 'Produced by analysis '.$pfset->analysis->logic_name."\n";
      Output:
      Supporting set: ctcf_ren_BR1_TR1
      Produced by analysis VSN_GLOG
      Product FeatureSet is Nessie_NG_STD_2_ctcf_ren_BR1
      Produced by analysis Nessie_NG_STD_2

  23. Set example code 2
      my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
      my @ext_fsets = @{ $featureset_adaptor->fetch_all_by_type('external') };
      foreach my $ext_fset (@ext_fsets) {
        print "External FeatureSet:\t".$ext_fset->name."\n";
      }
      Output:
      External FeatureSet: miRanda miRNA
      External FeatureSet: cisRED group motifs
      External FeatureSet: cisRED search regions
      External FeatureSet: VISTA enhancer set

  24. Features • ProbeFeature represents an individual alignment of a probe sequence. • probe(), probeset(), probelength(), get_result_by_ResultSet(). • fetch_all_by_Probe(), fetch_all_by_Slice_ExperimentalChips(). • AnnotatedFeature represents any feature based on experimental information i.e. ResultSet or ExperimentalSet data. • cell_type(), feature_type(), score(), display_label(). • ExternalFeature represents an individual feature from an externally curated set. • cell_type(), feature_type(), display_label().

  25. Features • RegulatoryFeature represents a feature generated by the Regulatory Build, a combinatorial analysis based on DNase1 HSSs, CTCF and histone modifications. • feature_type(), bound_start(), bound_end(), regulatory_attributes(), display_label(), stable_id(). • fetch_all_by_Slice(), fetch_by_stable_id().

  26. Features example code 1
      my $featureset_adaptor = $efg_db->get_FeatureSetAdaptor;
      my $feature_set = $featureset_adaptor->fetch_by_name('miRanda miRNA');
      # get_Features_by_Slice returns a list reference, so dereference it
      my @features = @{ $feature_set->get_Features_by_Slice($slice) };
      foreach my $feat (@features) {
        print $feat->display_label."\t".$feat->feature_Slice->name."\n";
      }
      Output:
      ENST00000390665:mmu-miR-712 chromosome:NCBI36:X:214111:214131:-1
      ENST00000390665:mmu-miR-673-5p chromosome:NCBI36:X:214115:214136:-1
      ENST00000390665:hsa-miR-22 chromosome:NCBI36:X:214125:214146:-1
      ENST00000390665:hsa-miR-887 chromosome:NCBI36:X:214138:214159:-1
      ENST00000390665:mmu-miR-696 chromosome:NCBI36:X:214149:214165:-1
      ENST00000390665:hsa-miR-328 chromosome:NCBI36:X:214178:214200:-1
      ENST00000390665:mmu-miR-669b chromosome:NCBI36:X:214228:214250:-1
      ENST00000390665:hsa-miR-197 chromosome:NCBI36:X:214264:214285:-1
      ENST00000390665:hsa-miR-220b chromosome:NCBI36:X:214265:214286:-1
      ENST00000390665:hsa-miR-636 chromosome:NCBI36:X:214341:214362:-1
      ENST00000390665:mmu-miR-689 chromosome:NCBI36:X:214424:214445:-1
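The slice names printed in the example output are colon-separated: coordinate system, assembly version, sequence region, start, end and strand. As a minimal sketch that needs no database connection, the hypothetical helper `parse_slice_name` below (not part of the Ensembl API) splits such a name so the location can be used in downstream scripts; the input string is copied from the output above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slice name into its six colon-separated fields.
sub parse_slice_name {
    my ($name) = @_;
    my @fields = split /:/, $name, -1;
    die "Unexpected slice name: $name\n" unless @fields == 6;
    my %slice;
    @slice{qw(coord_system version seq_region start end strand)} = @fields;
    return \%slice;
}

my $loc = parse_slice_name('chromosome:NCBI36:X:214111:214131:-1');

# Coordinates are inclusive, so the length is end - start + 1.
printf "%s %s:%d-%d (strand %d, %d bp)\n",
    $loc->{coord_system}, $loc->{seq_region}, $loc->{start}, $loc->{end},
    $loc->{strand}, $loc->{end} - $loc->{start} + 1;
# prints: chromosome X:214111-214131 (strand -1, 21 bp)
```

With a live connection, the same values are available directly from the feature and slice objects; parsing the name is only useful when working from saved text output.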

  27. Features example code 2
      my $regfeat_adaptor = $efg_db->get_RegulatoryFeatureAdaptor;
      # fetch_all_by_Slice returns a list reference, so dereference it
      my @reg_feats = @{ $regfeat_adaptor->fetch_all_by_Slice($slice) };
      foreach my $reg_feat (@reg_feats) {
        print $reg_feat->stable_id.' '.$reg_feat->feature_type->name."\n";
        # List the features that support this RegulatoryFeature
        foreach my $attr_feat (@{ $reg_feat->regulatory_attributes }) {
          print 'AttributeFeature '.$attr_feat->feature_type->name."\n";
        }
      }
      Output:
      ENSR00000175296 Promoter Associated - Cell type specific
      AttributeFeature H3K4me3
      AttributeFeature H3K4me3
      AttributeFeature DNase1
      AttributeFeature DNase1
      AttributeFeature H3K4me3
      ENSR00000092125 Unclassified - Cell type specific
      AttributeFeature DNase1

  28. eFG Environments • eFG environments provide useful functions, configuration and administration utilities: • efg • efg_pipeline • Coming soon… • Array mapping environment: • Affy, Illumina, Codelink, Agilent, Nimblegen. • Genomic & transcript mapping pipelines.

  29. eFG Import • efg environment • Arrays: • Nimblegen • Sanger ENCODE • Simple: • GFF • BED • Wiggle • External: • cisRED • miRanda • VISTA • redFLY

  30. eFG Import • ChIP-chip • Normalisation: VSN; TukeyBiweight. • Bio::MAGE/Tab2Mage • ResultSet nomenclature: EXP1 EXP1_BR1 EXP1_BR1_TR1 EXP1_BR1_TR2 • ChIP-Seq • Pre/Post analysis
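The ResultSet names on this slide nest an experiment name, a `_BR<n>` suffix and a `_TR<n>` suffix. The sketch below assumes BR and TR stand for the biological and technical replicates exposed by `biological_replicate()` and `technical_replicate()` on slide 16, and that a TR suffix only ever follows a BR suffix. `parse_result_set_name` is a hypothetical helper, not an eFG API method.

```perl
use strict;
use warnings;

# Hypothetical helper: split a ResultSet name into experiment, biological
# replicate (BR) and technical replicate (TR) parts.
sub parse_result_set_name {
    my ($name) = @_;
    # Lazy experiment name, then an optional _BR<n> with an optional nested _TR<n>.
    my ($exp, $br, $tr) = $name =~ /^(.+?)(?:_BR(\d+)(?:_TR(\d+))?)?$/
        or die "Cannot parse ResultSet name: $name\n";
    my $level = defined $tr ? 'technical replicate'
              : defined $br ? 'biological replicate'
              :               'experiment';
    return { experiment => $exp, br => $br, tr => $tr, level => $level };
}

# Names taken from the slides: the EXP1 series and the ctcf_ren example.
for my $name (qw(EXP1 EXP1_BR1 EXP1_BR1_TR2 ctcf_ren_BR1_TR1)) {
    my $p = parse_result_set_name($name);
    print join("\t", $name, $p->{experiment}, $p->{level}), "\n";
}
```

Under these assumptions, `ctcf_ren_BR1` from slide 19 parses as the first biological replicate of the `ctcf_ren` experiment, and `ctcf_ren_BR1_TR1` from slide 22 as its first technical replicate.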

  31. eFG Analysis • efg_pipeline environment • Pipeline - Ensembl gene build pipeline technology. • Analysis Runnables: • ACME • Chipotle • Splitter • TileMap • Nessie(unpublished) • SWEmbl(unpublished) • Regulatory Build

  32. eFG Analysis [figure; track labels: DNAse1, DNAse1, CTCF, H3K36me3, H3K4me3, H3K4me3, H3K27me3] • Regulatory Build - Feature construction: • Anchor/Focus sets: DNase1; CTCF. • Attribute sets: Histone Modifications; Transcription factors. • Regulatory Annotation - Patterns associated with: • Promoter regions • Gene regions • Non-Gene regions

  33. Getting More Information • Workshop material: http://www.ebi.ac.uk/~njohnson/courses/15.09.2008-GI-Hinxton • perldoc - viewer for inline API documentation: shell> perldoc Bio::EnsEMBL::Funcgen::RegulatoryFeature; online at: http://www.ensembl.org/info/software/Pdoc/ • eFG schema description, online at: http://www.ensembl.org/info/using/api/funcgen/funcgen_schema.html • eFG installation document, online at: http://www.ensembl.org/info/using/api/funcgen/efg_introduction.html • ensembl-dev mailing list: ensembl-dev@ebi.ac.uk
