
NeXML: A future data exchange standard for phylogenetics

Rutger Vos, University of British Columbia


Presentation Transcript


  1. NeXML: A future data exchange standard for phylogenetics. Rutger Vos, University of British Columbia

  2. Outline:
     Introduction: The problem; EvoInfo interests; This subproject; Nexus issues; Parsing; Extensibility; XML goodies
     Design: Principles; Re-use; Patterns; Inheritance; References
     Implementation: Approach; ERD; Inheritance; Anatomy; Characters; Trees
     Current status: Schema blocks; Parsers & writers; Experiments; To do
     Resources
     Introduction (1/7): The problem. Increased automation in evolutionary informatics is hampered by poorly defined “standards”.

  3. Introduction (2/7): EvoInfo interests. Addressing interoperability problems by coding our way out of them. Semantics: CDAO. Syntax: NeXML. Transport: PhyloWS.

  4. Introduction (3/7): This subproject’s mission. To create a file format like nexus*, but: fix (some) problems with nexus; give access to data at a higher level; be extensible; expose data to XML goodies. *Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621.

  5. #NEXUS
     BEGIN TAXA;
       DIMENSIONS NTAX=3;
       TAXLABELS taxon_1 taxon_2 taxon_3;
     END;
     BEGIN CHARACTERS;
       DIMENSIONS NCHAR=2;
       FORMAT DATATYPE=STANDARD GAP=- MISSING=? SYMBOLS="0 1 2";
       MATRIX
         taxon_1 00
         taxon_2 11
         taxon_3 22;
     END;
     BEGIN TREES;
       TRANSLATE 1 taxon_1, 2 taxon_2, 3 taxon_3;
       TREE Tree1 = ((1:0.12,2:0.12):9.88,3:10.0);
     END;

  6. Introduction (4/7): Nexus issues. • Hard/impossible to validate • No explicit versions • Nothing ever deprecated • No public extensions • Leads to hacks such as ‘mixed’ data, ‘hot comments’ • Phylogenetics post-’80s in private blocks. See https://www.nescent.org/wg_evoinfo/NEXUS_Problems

  7. Introduction (5/7): Parsing plain text versus parsing XML. Processing nexus data involves lexing + parsing + processing. XML allows choosing a parser library, so data can be processed as a structure that hides tokenization issues.
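
     To illustrate the contrast on this slide, the sketch below first lexes and parses a nexus fragment by hand and then reads a comparable XML fragment with Python's standard xml.etree.ElementTree parser. Both are assumptions made for illustration: the regular expression only copes with this simple fragment (no quoted labels, no comments), and the XML is a simplified stand-in rather than schema-valid NeXML.

```python
import re
import xml.etree.ElementTree as ET

NEXUS = "BEGIN TAXA; DIMENSIONS NTAX=3; TAXLABELS taxon_1 taxon_2 taxon_3; END;"

# Hand-written lexer: runs of word characters, or a single ';' or '='.
# Assumption: no quoted labels and no [comments]; real nexus has both.
TOKEN = re.compile(r"[A-Za-z0-9_.]+|[;=]")


def lex_nexus(text):
    """Lexing step: split raw text into tokens."""
    return TOKEN.findall(text)


def taxlabels(tokens):
    """Parsing step: collect the labels between TAXLABELS and the next ';'."""
    start = tokens.index("TAXLABELS") + 1
    end = tokens.index(";", start)
    return tokens[start:end]


# Hypothetical, simplified XML counterpart of the TAXA block.
XML = """<otus id="tax1">
  <otu id="t1" label="taxon_1"/>
  <otu id="t2" label="taxon_2"/>
  <otu id="t3" label="taxon_3"/>
</otus>"""


def otu_labels(xml_text):
    """The parser library handles tokenization; we only walk the structure."""
    root = ET.fromstring(xml_text)
    return [otu.get("label") for otu in root.iter("otu")]


nexus_labels = taxlabels(lex_nexus(NEXUS))
xml_labels = otu_labels(XML)
```

     In the nexus half the caller owns the token rules; in the XML half those rules live in the parser library, which is the point the slide makes.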

  8. Introduction (6/7): Extensibility. An extensible file format should provide the ability to: define new data types that implement described ‘interfaces’; attach typed data structures to core types; attach custom XML.

  9. Introduction (7/7): XML goodies. Large stack of off-the-shelf tools: editors / IDEs; XML parser libraries; serialization / data binding tools; web service toolkits; native XML databases.

  10. Design (1/5): Design principles. Re-use of prior art; follow design patterns; referencing; verbose and compact representations.

  11. Design (2/5): Re-use of prior art. Avoid tag soup! General file structure following nexus concepts, i.e. blocks that reference each other. Trees and networks following graphml. Generic key/value attachments following Apple’s plist semantics (will return to this later…): <dict> <key>prior</key> <float>0.78</float> </dict>
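
     As a minimal sketch of how a consumer might read such a plist-style attachment, the function below turns alternating key and value children into a Python dict. Only the float and string value elements appear in these slides; the integer converter is an assumption borrowed from plist, and read_dict is a hypothetical helper, not part of any NeXML library.

```python
import xml.etree.ElementTree as ET

# Map value element names to Python converters.
# Assumption: <integer> is borrowed from plist; the slides only show
# <float> and <string>.
CONVERTERS = {"float": float, "integer": int, "string": str}


def read_dict(dict_elem):
    """Turn alternating <key>/<value> children into a Python dict."""
    children = list(dict_elem)
    if len(children) % 2 != 0:
        raise ValueError("dict must contain key/value pairs")
    result = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError(f"expected <key>, got <{key_elem.tag}>")
        if value_elem.tag not in CONVERTERS:
            raise ValueError(f"unsupported value type <{value_elem.tag}>")
        result[key_elem.text] = CONVERTERS[value_elem.tag](value_elem.text)
    return result


annotation = read_dict(
    ET.fromstring("<dict><key>prior</key><float>0.78</float></dict>")
)
```

     Because the value element names its own type, the reader can convert values without any out-of-band knowledge about the key.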

  12. Design (3/5): XML design patterns. “Metadata first”; “Declare before use”; “Venetian blinds”; abstract inheritance through extension, concrete inheritance through restriction.

  13. Design (4/5): Inheritance. Base (optional base/lang/href attributes); Annotated (optional dict elements) extends Base; Labelled (optional label attribute) extends Annotated; IDTagged (required id attribute) extends Labelled; AbstractElement (in root schema) extends IDTagged; ConcreteElement (in instance document) restricts AbstractElement.

  14. Design (5/5): Referencing. Elements sometimes refer to other elements, much like in nexus. In nexml, an element refers to the id of another element through an attribute named after the referenced element:
      <otu id="t1"/>
      <!-- referenced later: -->
      <node id="n1" otu="t1"/>
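
      The referencing convention can be sketched as a lookup from id to element, followed by resolving each node's otu attribute. The flat wrapper element, the extra otu and node elements, and the resolve_otus helper are assumptions made for illustration; real documents nest these elements inside blocks.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical document: a flat wrapper instead of real blocks.
DOC = """<nexml>
  <otu id="t1"/>
  <otu id="t2"/>
  <node id="n1" otu="t1"/>
  <node id="n2" otu="t2"/>
  <node id="n3"/>
</nexml>"""


def resolve_otus(root):
    """Map each node id to the <otu> element its 'otu' attribute points at."""
    by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}
    links = {}
    for node in root.iter("node"):
        ref = node.get("otu")
        if ref is None:
            continue  # node without an otu reference
        target = by_id.get(ref)
        if target is None or target.tag != "otu":
            raise ValueError(f"node {node.get('id')} refers to unknown otu {ref}")
        links[node.get("id")] = target
    return links


links = resolve_otus(ET.fromstring(DOC))
```

      Naming the attribute after the referenced element tells the resolver which element type to expect, so a dangling or mistyped reference can be reported instead of silently accepted.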

  15. Implementation (1/6): Approach. Schema design; community feedback through wiki, email, telecon, projects (evoinfo, pPOD, MIAPA) etc.; development of processors (perl, java, python, c++, VB, JavaScript) in parallel; experiments with XML tools (ws, db, data binding tools).

  16. Implementation (2/6): Entity relationships

  17. Implementation (3/6): Inheritance tree for elements

  18. Implementation (4/6): Anatomy of a “block”
      <characters
          id="c1"
          xsi:type="nex:DnaSeqs"
          otus="t1">
        <dict> <key>desc</key> <string>description…</string> </dict>
        Contents…
      </characters>
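
      A hedged sketch of reading the parts of such a block with Python's standard xml.etree.ElementTree follows. The xsi prefix is bound to the standard XML Schema instance namespace; the nex prefix is left undeclared because it only appears inside an attribute value here, and describe_block is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, needed for the xsi:type attribute.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

BLOCK = f"""<characters xmlns:xsi="{XSI}"
    id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict><key>desc</key><string>description</string></dict>
</characters>"""


def describe_block(xml_text):
    """Pull out the identifying parts of a block element."""
    block = ET.fromstring(xml_text)
    # ElementTree exposes prefixed attributes as '{namespace}localname'.
    xsi_type = block.get(f"{{{XSI}}}type")
    return {
        "element": block.tag,
        "id": block.get("id"),
        # Drop the prefix, if any: 'nex:DnaSeqs' -> 'DnaSeqs'.
        "class": xsi_type.split(":")[-1],
        "otus": block.get("otus"),
        "has_dict": block.find("dict") is not None,
    }


info = describe_block(BLOCK)
```

      The element name gives the kind of block, xsi:type selects the concrete class, and the otus attribute is a reference of the kind described on the referencing slide.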

  19. Implementation (5/6): Character classes
                    Sequence          Cells
      DNA           DnaSeqs           DnaCells
      RNA           RnaSeqs           RnaCells
      Protein       ProteinSeqs       ProteinCells
      Standard      StandardSeqs      StandardCells
      Continuous    ContinuousSeqs    ContinuousCells
      Restriction   RestrictionSeqs   RestrictionCells

  20. Implementation (6/6): Tree classes
                 Float           Int
      Network    FloatNetwork    IntNetwork
      Tree       FloatTree       IntTree
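
      The class names on the two slides above follow a regular pattern: a data type prefix plus "Seqs" or "Cells" for characters, and an edge type prefix plus "Network" or "Tree" for trees. The sketch below spells that pattern out; the function and constant names are hypothetical and only the resulting class names come from the slides.

```python
# Prefixes and suffixes as they appear in the two class tables.
CHARACTER_TYPES = ("Dna", "Rna", "Protein", "Standard", "Continuous", "Restriction")
CHARACTER_LAYOUTS = ("Seqs", "Cells")
EDGE_TYPES = ("Float", "Int")
GRAPH_KINDS = ("Network", "Tree")


def character_class(datatype, layout):
    """Return e.g. 'DnaSeqs' for ('DNA', 'Seqs')."""
    prefix = datatype.capitalize()  # 'DNA' -> 'Dna', 'Protein' -> 'Protein'
    if prefix not in CHARACTER_TYPES:
        raise ValueError(f"unknown character type: {datatype}")
    if layout not in CHARACTER_LAYOUTS:
        raise ValueError(f"unknown layout: {layout}")
    return prefix + layout


def tree_class(edge_type, kind):
    """Return e.g. 'FloatTree' for ('Float', 'Tree')."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    if kind not in GRAPH_KINDS:
        raise ValueError(f"unknown graph kind: {kind}")
    return edge_type + kind


all_character_classes = [
    character_class(t, layout) for t in CHARACTER_TYPES for layout in CHARACTER_LAYOUTS
]
all_tree_classes = [tree_class(e, k) for k in GRAPH_KINDS for e in EDGE_TYPES]
```

      A processor could use such a mapping to pick the xsi:type value for a block from the data it is about to write.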

  21. Current status (1/4): Schema blocks. Done: OTUs; characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose); trees: graphml trees and networks, various edge formats and rootings.

  22. Current status (2/4): Parsers and writers. Nexml parsers and writers: mesquite (java NeXML class libraries); Bio::Phylo (BioPerl compatible); pyNexml (python); DAMBE (Visual Basic); NCL (C++); JavaScript.

  23. Current status (3/4): Experiments. • Scalability: indexed files in dbxml; created large files from tolweb, rbcl; XInclude with tinyseq xml • Semantic annotation (CDAO) using SAWSDL • REST web services: ToL service; validation service; nexml2json, nexus2xml • Schema inclusion in wsdl
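
      The slide names a nexml2json experiment without describing it. The sketch below is not that service; it is one generic way to turn an XML tree into JSON with the Python standard library, shown only to make the idea concrete. The output layout (tag, attributes, text, children) is an assumption.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an element into plain dicts and lists."""
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Serialize the converted tree as a JSON string with stable key order."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), sort_keys=True)


result = xml_to_json('<node id="n1" otu="t1"><dict><key>prior</key><float>0.78</float></dict></node>')
```

      Values stay strings in this layout; a real converter would presumably use the schema to emit typed JSON values.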

  24. Current status (4/4): To do. Publish standard; more restricted vocabulary attachments (e.g. Darwin Core, CDAO-mediated terms); substitution model descriptions; sets (in progress, using class identifiers); distances; splits.

  25. Resources. NeXML base URL: http://www.nexml.org (Wiki: /wiki; Mailing list: /mail; Issue tracker: /tracker; SVN repository: /code). EvoInfo: http://evoinfo.nescent.org. CDAO: http://www.evolutionaryontology.org

  26. Acknowledgements. Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia. Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison. Additional funding and support: NESCent, GSoC.
