Addressing the need for a new data exchange standard in phylogenetics, this project aims to create Nexml, an extensible file format that overcomes issues present in existing standards such as Nexus.
Nexml: A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia

Introduction: The problem • EvoInfo interests • This subproject • Nexus issues • Parsing • Extensibility • XML goodies
Design: Principles • Re-use • Patterns • Inheritance • References
Implementation: Approach • Example • Inheritance • Anatomy • Characters • Trees
Current status: Schema blocks • Parsers & writers • Experiments • To do
Resources

Introduction (1/7) The problem
Increased automation in evolutionary informatics is hampered by poorly defined “standards”.

Introduction (2/7) EvoInfo interests
Addressing interoperability problems by coding our way out of it:
• Semantics: CDAO
• Syntax: Nexml
• Transport: PhyloWS

Introduction (3/7) This subproject’s mission
To create a file format like nexus*, but:
• Fix (some) problems with nexus
• Give access to data at higher level
• Be extensible
• Expose data to xml goodies
*Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621

Introduction (4/7) Nexus problems
• Hard/impossible to validate
• No explicit versions
• Nothing ever deprecated
• No public extensions
• Leads to hacks such as ‘mixed’ data, ‘hot comments’
• Phylogenetics post-’80s in private blocks

Introduction (5/7) Parsing plain text versus parsing XML
• Processing nexus data involves lexing + parsing + processing
• With XML, you choose an existing parser library and process the data as a structure that hides tokenization issues
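
The contrast on this slide can be sketched in Python as follows. This is only an illustration: the nexus-like fragment, the naive tokenizer, and the element and attribute names in the XML fragment are assumptions made for the example, not a complete nexus lexer and not the nexml schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical nexus-like fragment: the consumer has to lex it first.
NEXUS_TEXT = "BEGIN TAXA; DIMENSIONS NTAX=2; TAXLABELS 'Homo sapiens' Pan_paniscus; END;"


def tokenize(text):
    """Naive lexer: quoted strings, punctuation, and bare words."""
    return re.findall(r"'[^']*'|[;=]|[^\s;=']+", text)


def unquote(token):
    """Quoted tokens are taken literally; bare tokens use '_' for spaces."""
    if token.startswith("'"):
        return token[1:-1]
    return token.replace("_", " ")


# Plain text: lexing, then parsing, then processing.
tokens = tokenize(NEXUS_TEXT)
start = tokens.index("TAXLABELS") + 1
end = tokens.index(";", start)
nexus_labels = [unquote(t) for t in tokens[start:end]]

# Hypothetical XML equivalent: the parser library handles tokenization,
# and the data arrives as a tree of elements and attributes.
XML_TEXT = """<otus id="tax1">
  <otu id="t1" label="Homo sapiens"/>
  <otu id="t2" label="Pan paniscus"/>
</otus>"""
root = ET.fromstring(XML_TEXT)
xml_labels = [otu.get("label") for otu in root.findall("otu")]
```

Both routes yield the same two labels, but only the plain-text route needs hand-written rules for quoting and underscores.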

Introduction (6/7) Extensibility
An ‘extensible’ file format should provide the ability to:
• define new data types that implement described ‘interfaces’
• attach typed data structures to core types
• attach custom XML

Introduction (7/7) XML goodies
Large stack of off-the-shelf tools:
• XML parser libraries
• Web service toolkits
• Native XML databases
• Editors / IDEs
• Serialization / data binding tools

Design (1/5) Design principles
• Re-use of prior art
• Follow design patterns
• Referencing
• Verbose and compact representations

Design (2/5) Re-use of prior art
• Generic key/value attachments following Apple’s plist semantics:
    <dict>
      <key>prior</key>
      <float>0.78</float>
    </dict>
• Trees and networks following graphml
• General file structure following nexus concepts, i.e. blocks that reference each other
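
A minimal sketch of how a consumer might read such a plist-style attachment, assuming keys and values strictly alternate inside the dict element. The converter table is an assumption for the example and covers only the value element names that appear in these slides (float and string).

```python
import xml.etree.ElementTree as ET

DICT_XML = "<dict><key>prior</key><float>0.78</float></dict>"

# Assumed converters, one per value element name; only the value types
# that appear in the slides are covered here.
CONVERTERS = {"float": float, "string": str}


def read_dict(dict_elem):
    """Turn alternating <key>/value children into a Python dict."""
    children = list(dict_elem)
    result = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError(f"expected <key>, found <{key_elem.tag}>")
        convert = CONVERTERS[value_elem.tag]
        result[key_elem.text] = convert(value_elem.text)
    return result


annotations = read_dict(ET.fromstring(DICT_XML))
```

Because the value element names its own type, the reader can convert "0.78" to a number without guessing.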

Design (3/5) XML design patterns
• http://www.xmlpatterns.com
• “Declare before use”
• “Metadata first”
• “Venetian blinds”
• Abstract inheritance through extension, concrete inheritance through restriction

Design (4/5) Inheritance
Each type in the chain extends the one listed before it; the last step is a restriction:
• “Base”: optional base/lang/href attributes
• “Annotated”: adds optional dict elements
• “Labelled”: adds an optional label attribute
• “IDTagged”: adds a required id attribute
• “AbstractElement”: defined in the root schema
• “ConcreteElement”: restricts “AbstractElement”; used in the instance document

Design (5/5) Referencing
• Elements sometimes refer to other elements, much like in nexus
• In nexml, an element refers to another element’s id through an attribute named after the referenced element:
    <otu id="t1"/>
    <!-- i.e. OTU, referenced later as: -->
    <node id="n1" otu="t1"/>
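
Resolving such a reference can be sketched as an id lookup. The `<example>` wrapper element and the `resolve_otu` helper are hypothetical names introduced for the illustration; only the otu and node fragments come from the slide.

```python
import xml.etree.ElementTree as ET

# The <example> wrapper is hypothetical; it only makes the fragment
# from the slide a well-formed document.
DOC = """<example>
  <otu id="t1"/>
  <node id="n1" otu="t1"/>
</example>"""

root = ET.fromstring(DOC)

# Index every otu element by its id so references can be looked up.
otus_by_id = {otu.get("id"): otu for otu in root.iter("otu")}


def resolve_otu(node):
    """Return the otu element that a node's 'otu' attribute points at."""
    ref = node.get("otu")
    if ref not in otus_by_id:
        raise KeyError(f"node {node.get('id')!r} refers to unknown otu {ref!r}")
    return otus_by_id[ref]


node = root.find("node")
referenced = resolve_otu(node)
```

Since the attribute carries the name of the element it points at, the reader knows which index to consult without extra metadata.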

Implementation (1/6) Approach
• Schema design
• Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
• Parallel development of processors (perl, java, python, c++, VB)
• Experiments with xml tools (ws, db, data binding tools)

Implementation (2/6) Root element
• version="1.0"
• generator="mesquite"
• Versioned namespace: xmlns:nex="http://www.nexml.org/1.0"
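
A reader can use these root attributes to decide whether it understands a file before processing it. In the sketch below, the root element name `nexml`, the `SUPPORTED_VERSIONS` set, and the helper functions are assumptions for the example; the attribute values and namespace URI are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# The element name "nexml" is an assumption; the slide only gives the
# attributes and the namespace declaration.
ROOT_XML = (
    '<nex:nexml version="1.0" generator="mesquite" '
    'xmlns:nex="http://www.nexml.org/1.0"/>'
)

# Hypothetical set of versions this reader claims to understand.
SUPPORTED_VERSIONS = {"1.0"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into its two parts."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def describe_root(xml_text):
    """Check the version and report namespace, version, and generator."""
    root = ET.fromstring(xml_text)
    namespace, local_name = split_tag(root.tag)
    version = root.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {version!r}")
    return {
        "element": local_name,
        "namespace": namespace,
        "version": version,
        "generator": root.get("generator"),
    }


info = describe_root(ROOT_XML)
```

An explicit version and a versioned namespace address the "no explicit versions" problem listed for nexus: a reader can refuse a file up front rather than fail partway through.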

Implementation (3/6) Inheritance tree for elements

Implementation (4/6) Anatomy of a “block”
    <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
      <dict>
        <key>desc</key>
        <string>description…</string>
      </dict>
      Contents…
    </characters>
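
A sketch of how a reader might inspect such a block, assuming the dict annotation sits inside the characters element and that the nex and xsi prefixes are declared as shown. The namespace declarations are added here so the fragment parses on its own. Note that ElementTree keeps the value of xsi:type as a plain string and does not resolve its prefix.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Namespace declarations are added for the example so the fragment is
# well-formed by itself; the nesting of <dict> is an assumption.
BLOCK_XML = """<characters id="c1" xsi:type="nex:DnaSeqs" otus="t1"
    xmlns:nex="http://www.nexml.org/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dict>
    <key>desc</key>
    <string>description…</string>
  </dict>
</characters>"""

block = ET.fromstring(BLOCK_XML)

# Namespaced attributes are keyed as '{namespace}name' in ElementTree.
xsi_type = block.get(f"{{{XSI}}}type")

# The value is a prefixed name; this sketch keeps only the local part.
_prefix, _, concrete_type = xsi_type.partition(":")

summary = {
    "id": block.get("id"),
    "otus": block.get("otus"),
    "type": concrete_type,
    "annotated": block.find("dict") is not None,
}
```

The element name stays generic while xsi:type selects the concrete subtype, which is how one block element can carry different kinds of character data.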

Implementation (5/6) Character classes
[Figure: character classes by granularity and data type]

Implementation (6/6) Tree classes
[Figure: tree classes by branch type and topology]

Current status (1/4) Schema blocks
Done:
• OTUs
• characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
• trees: graphml trees and networks, various edge formats and rootings

Current status (2/4) Parsers and writers
Nexml parsers and writers:
• mesquite, java, using xmlbeans
• Bio::Phylo, perl
• pyNexml, python
• DAMBE, Visual Basic
• stubs for c++ xmlbeans
• plans for ruby?

Current status (3/4) Experiments
• Included schema in soap wsdl
• Indexed files in dbxml
• Created large files from tolweb, rbcl
• XInclude with tinyseq xml
• REST service described using nexml

Current status (4/4) To do
• Cross-reference with glossary, ontology
• Substitution model descriptions
• Publish standard
• Follow up on earlier feedback (small fixes)
• Sets (in progress, using class identifiers)
• More restricted vocabulary attachments (Darwin core)
• Distances
• Splits

Resources
• Base URL: http://www.nexml.org
• Wiki: https://www.nescent.org/wg_evoinfo/Future_Data_Exchange_Standard
• SourceForge project: http://sourceforge.net/projects/nexml/

Acknowledgements
• Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
• Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
• Additional funding, support: NESCent, GSoC