
Nexml: A future data exchange standard for phylogenetics

Addressing the need for a new data exchange standard in phylogenetics, this project aims to create Nexml - an extensible file format to overcome issues present in existing standards like Nexus.


Presentation Transcript


  1. Nexml: A future data exchange standard for phylogenetics. Rutger Vos, University of British Columbia

  2. Outline:
     • Introduction: The problem • EvoInfo interests • This subproject • Nexus issues • Parsing • Extensibility • XML goodies
     • Design: Principles • Re-use • Patterns • Inheritance • References
     • Implementation: Approach • Example • Inheritance • Anatomy • Characters • Trees
     • Current status: Schema blocks • Parsers & writers • Experiments • To do
     • Resources
     Introduction (1/7): The problem
     • Increased automation in evolutionary informatics is hampered by poorly defined “standards”

  3. Introduction (2/7): EvoInfo interests
     Addressing interoperability problems by coding our way out of them:
     • Semantics: CDAO
     • Syntax: Nexml
     • Transport: PhyloWS

  4. Introduction (3/7): This subproject’s mission
     To create a file format like nexus*, but:
     • Fix (some) problems with nexus
     • Give access to data at a higher level
     • Be extensible
     • Expose data to xml goodies
     *Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621

  5. Introduction (4/7): Nexus problems
     • Hard/impossible to validate
     • No explicit versions
     • Nothing ever deprecated
     • No public extensions
     • Leads to hacks such as ‘mixed’ data, ‘hot comments’
     • Phylogenetics post-’80s in private blocks

  6. Introduction (5/7): Parsing plain text versus parsing XML
     • Processing nexus data involves lexing + parsing + processing
     • With XML you can choose an existing parser library, and the data can be processed as a structure that hides tokenization issues
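The contrast on slide 6 can be sketched in Python. This is a minimal illustration using the standard library's `xml.etree.ElementTree`; the fragment and the helper name `otu_labels` are made up for the example and are not taken from the nexml schema. The labels contain spaces, quotes and brackets, which a hand-written plain-text tokenizer would need special rules for, while the XML parser hands them back as ordinary strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; not a schema-valid nexml file.
FRAGMENT = """
<otus id="taxa1">
  <otu id="t1" label="Homo sapiens"/>
  <otu id="t2" label="Pan 'troglodytes' [chimp]"/>
</otus>
"""


def otu_labels(xml_text):
    """Map each otu id to its label; the parser library does all tokenizing."""
    root = ET.fromstring(xml_text)
    return {otu.get("id"): otu.get("label") for otu in root.iter("otu")}


labels = otu_labels(FRAGMENT)
print(labels)
```

No lexer or quoting rules appear in the calling code: it only walks the parsed structure.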

  7. Introduction (6/7): Extensibility
     An ‘extensible’ file format should provide the ability to:
     • define new data types that implement described ‘interfaces’
     • attach typed data structures to core types
     • attach custom XML

  8. Introduction (7/7): XML goodies
     Large stack of off-the-shelf tools:
     • XML parser libraries
     • Web service toolkits
     • Native XML databases
     • Editors / IDEs
     • Serialization / data binding tools

  9. Design (1/5): Design principles
     • Re-use of prior art
     • Follow design patterns
     • Referencing
     • Verbose and compact representations

  10. Design (2/5): Re-use of prior art
      • Generic key/value attachments following Apple’s plist semantics: <dict> <key>prior</key> <float>0.78</float> </dict>
      • Trees and networks following graphml
      • General file structure following nexus concepts, i.e. blocks that reference each other
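The plist-style key/value attachment on slide 10 can be read with a few lines of Python. This is a sketch under assumptions: the helper name `read_dict` and the converter table are hypothetical, and only the two value tags that appear in the slides (`float` and `string`) are handled; anything else is kept as text.

```python
import xml.etree.ElementTree as ET

# Converters for the value tags seen in the slides; other tags fall back to str.
CONVERTERS = {"float": float, "string": str}


def read_dict(dict_elem):
    """Turn alternating <key>/<value> children into a Python dict."""
    children = list(dict_elem)
    result = {}
    # Children come in (key, value) pairs, as in Apple's plist layout.
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError("expected <key>, found <%s>" % key_elem.tag)
        convert = CONVERTERS.get(value_elem.tag, str)
        result[key_elem.text] = convert(value_elem.text)
    return result


annotation = read_dict(
    ET.fromstring("<dict><key>prior</key><float>0.78</float></dict>")
)
print(annotation)
```

Because the value element names its own type, the reader can convert `0.78` to a number without guessing from the text.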

  11. Design (3/5): XML design patterns
      • http://www.xmlpatterns.com
      • “Declare before use”
      • “Metadata first”
      • “Venetian blinds”
      • Abstract inheritance through extension, concrete inheritance through restriction

  12. Design (4/5): Inheritance
      Each type extends the one listed before it; the final step is a restriction:
      • “Base”: optional base/lang/href attributes
      • “Annotated” (extends “Base”): optional dict elements
      • “Labelled” (extends “Annotated”): optional label attribute
      • “IDTagged” (extends “Labelled”): required id attribute
      • “AbstractElement” (extends “IDTagged”): in root schema
      • “ConcreteElement” (restricts “AbstractElement”): in instance document

  13. Design (5/5): Referencing
      • Elements sometimes refer to other elements, much like in nexus
      • In nexml, elements refer to the id of other elements by the name of the referenced element: <otu id="t1"/> <!-- i.e. OTU, referenced later as: --> <node id="n1" otu="t1"/>
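The referencing convention on slide 13 (an attribute named after the referenced element holds that element's id) can be resolved roughly as follows. This is a sketch only: the wrapper element `example` and the function name `resolve_references` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the two elements shown on the slide.
FRAGMENT = """
<example>
  <otu id="t1"/>
  <node id="n1" otu="t1"/>
</example>
"""


def resolve_references(root, target_tag):
    """Link every element carrying a `target_tag` attribute to the
    <target_tag> element whose id matches the attribute value."""
    by_id = {elem.get("id"): elem for elem in root.iter(target_tag)}
    links = {}
    for elem in root.iter():
        ref = elem.get(target_tag)
        if ref is None:
            continue
        if ref not in by_id:
            raise KeyError("no <%s> with id %r" % (target_tag, ref))
        links[elem.get("id")] = by_id[ref]
    return links


links = resolve_references(ET.fromstring(FRAGMENT), "otu")
print({node_id: otu.get("id") for node_id, otu in links.items()})
```

Since the attribute name already says which kind of element is referenced, the lookup only has to search elements of that one type.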

  14. Implementation (1/6): Approach
      • Schema design
      • Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
      • Development of processors (Perl, Java, Python, C++, VB) in parallel
      • Experiments with xml tools (ws, db, data binding tools)

  15. Implementation (2/6): Root element
      • version="1.0"
      • generator="mesquite"
      • Versioned namespace: xmlns:nex="http://www.nexml.org/1.0"
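A reader can use the attributes listed on slide 15 to check which version of the format it is dealing with before parsing further. The sketch below assumes the root element is called `nexml` in the `nex` namespace; the slide gives only the attributes and the namespace URI, so the element name is an assumption.

```python
import xml.etree.ElementTree as ET

# The element name "nexml" is assumed; the attributes are those on the slide.
DOC = (
    '<nex:nexml xmlns:nex="http://www.nexml.org/1.0" '
    'version="1.0" generator="mesquite"/>'
)

root = ET.fromstring(DOC)

# ElementTree expands prefixes, so the tag reads "{namespace-uri}localname".
namespace, local_name = root.tag[1:].split("}")

print(namespace)              # namespace URI carrying the format version
print(local_name)             # local name of the root element
print(root.get("version"))    # explicit version attribute
print(root.get("generator"))  # program that wrote the file
```

Both the version attribute and the versioned namespace give a parser something explicit to test, which slide 5 lists as missing from nexus.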

  16. Implementation (3/6): Inheritance tree for elements (diagram)

  17. Implementation (4/6): Anatomy of a “block”
      <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1"> <dict> <key>desc</key> <string>description…</string> </dict> Contents… </characters>
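The attributes of the block on slide 17 can be pulled apart as in the sketch below. Two things are assumptions here: the namespace declarations are added so that the fragment parses on its own (the `xsi` URI is the standard XML Schema instance namespace), and the variable names are illustrative. Note that ElementTree does not resolve the `nex:` prefix inside the attribute value, so the example splits it by hand.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Namespace declarations added so the fragment is parseable on its own.
BLOCK = """
<characters xmlns:nex="http://www.nexml.org/1.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            id="c1" xsi:type="nex:DnaSeqs" otus="t1"/>
"""

block = ET.fromstring(BLOCK)

# xsi:type names the concrete subtype of the generic <characters> element.
type_qname = block.get(f"{{{XSI}}}type")
prefix, concrete_type = type_qname.split(":")

print(block.get("id"))    # identifier of this block
print(concrete_type)      # concrete block type named by xsi:type
print(block.get("otus"))  # id of the OTUs block this block refers to
```

A processor could dispatch on `concrete_type` to pick the right handler for DNA, protein or other character data.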

  18. Implementation (5/6): Character classes (diagram; labels: Granularity, Data type)

  19. Implementation (6/6): Tree classes (diagram; labels: Branch type, Topology)

  20. Current status (1/4): Schema blocks
      Done:
      • OTUs
      • characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
      • trees: graphml trees and networks, various edge formats and rootings

  21. Current status (2/4): Parsers and writers
      Nexml parsers and writers:
      • mesquite, java, using xmlbeans
      • Bio::Phylo, perl
      • pyNexml, python
      • DAMBE, Visual Basic
      • stubs for c++ xmlbeans
      • plans for ruby?

  22. Current status (3/4): Experiments
      • Included schema in soap wsdl
      • Indexed files in dbxml
      • Created large files from tolweb, rbcl
      • XInclude with tinyseq xml
      • REST service described using nexml

  23. Current status (4/4): To do
      • Cross-reference with glossary, ontology
      • Substitution model descriptions
      • Publish standard
      • Follow up on earlier feedback (small fixes)
      • Sets (in progress, using class identifiers)
      • More restricted vocabulary attachments (Darwin core)
      • Distances
      • Splits

  24. Resources
      • Base URL: http://www.nexml.org
      • Wiki: https://www.nescent.org/wg_evoinfo/Future_Data_Exchange_Standard
      • SourceForge project: http://sourceforge.net/projects/nexml/

  25. Acknowledgements • Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia • Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison • Additional funding, support: NESCent, GSoC
