The DAS Protocol Andy Jenkinson, EBI
Summary of Topics
• Technical overview
• Principles of communication
• Pros and cons
• DAS capabilities
DAS Architecture
• A client asks for data from many servers
  • HTTP requests
  • identically structured URLs, the same parameters
• Each server behaves in the same way
  • pre-defined set of behaviours
  • e.g. provide a sequence, provide annotations of a sequence
• Each server provides different data in the same format
  • DAS-XML
DAS Concepts
• Reference object
  • usually a sequence
  • e.g. “chromosome X” or “NT_025741”
• Annotation
  • information attached to a location within a segment
  • e.g. “substitution at residue 326 of BRCA1”
DAS Concepts
• Reference server
  • server that provides “core” reference object data
  • e.g. GRCh37 sequence data
• Annotation server
  • server that provides annotations of reference objects
• Segment
  • part of a reference object
  • e.g. “bases 100 to 200 of chromosome X”
  • ties together annotation and reference servers
The DAS Protocol
• Defines 3 constraints
  • transport layer: HTTP
  • query format: constrained REST URLs
  • response format: constrained XML
• Keyword: constrained
The DAS Protocol: transport layer (HTTP)
• Data transport
  • Standard HTTP
  • Includes compression
  • Some additional headers, e.g. to indicate DAS version
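The transport points above can be sketched on the client side as follows. This is a minimal illustration, not a reference client: it assumes the server honours the standard `Accept-Encoding: gzip` request header and reports its protocol version in an `X-DAS-Version` response header (the slide only says that such a header exists). The function names and the example URL are hypothetical, and no network request is made.

```python
import gzip
import urllib.request


def build_request(url):
    """Build (but do not send) a DAS request that asks for gzip compression."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body, headers):
    """Return the response body as text, decompressing it if the
    response headers say it was gzip-encoded."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def das_version(headers):
    """Return the DAS version reported by the server, or None.
    The header name X-DAS-Version is an assumption of this sketch."""
    return headers.get("X-DAS-Version")


# Hypothetical URL, for illustration only.
request = build_request("http://example.org/das/some_source/features?segment=X:1,10")
```

To actually fetch data, the request would be passed to `urllib.request.urlopen` and the response's bytes and headers handed to `decode_body`.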
The DAS Protocol: query format (constrained REST URLs)
• Well-defined query URLs
  • A client can issue a command:
    http://das.sanger.ac.uk/das/ccds_mouse/features?segment=...
  • site prefix: http://das.sanger.ac.uk
  • das source: ccds_mouse
  • command: features
  • arguments: segment=...
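Because every query URL has the same shape, a client can take one apart mechanically. The sketch below splits the example URL into the four parts named on the slide; `split_das_url` is a hypothetical helper, and it assumes the literal path component `/das/` separates the site prefix from the source name.

```python
from urllib.parse import urlsplit


def split_das_url(url):
    """Split a DAS query URL into site prefix, source, command and arguments."""
    parts = urlsplit(url)
    before, sep, after = parts.path.partition("/das/")
    if not sep:
        raise ValueError("not a DAS URL: no '/das/' path component")
    source, _, command = after.partition("/")
    return {
        "site_prefix": f"{parts.scheme}://{parts.netloc}{before}",
        "source": source,
        "command": command,
        "arguments": parts.query,
    }


# The URL from the slide, with its arguments left elided as in the original.
info = split_das_url("http://das.sanger.ac.uk/das/ccds_mouse/features?segment=...")
print(info["site_prefix"], info["source"], info["command"], info["arguments"])
```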
The DAS Protocol: response format (constrained XML)
• XML format
  • server responds with a simple XML document
<SEGMENT id="X" start="1" stop="100">
  <FEATURE id="exon1">
    <TYPE id="exon">exon</TYPE>
Why DAS?
• Fast, targeted queries
  • suitable for visual display
• Based on existing simple tech
  • XML/HTTP/CGI
  • “dumb server, clever client” - relatively low knowledge barrier for bioinformaticians with data to expose
• Scalable
  • integrators (client software) get more data for zero cost
Why not DAS?
• One-dimensional queries
  • query only by sequence position
  • not by developmental stage, tissue type, etc. (yet)
• Constrained generic format
  • clients aren’t “tailored” to each data source
  • possible data types are to some extent limited
• Not semantically rich
  • ontology support optional
Commands: the basics
• Sequence
  • give me the DNA sequence for a given segment of a reference object
  • e.g. “bases 100k – 200k of chromosome 15”
• Features
  • give me all annotations offered by the data source that are attached to a given segment of the sequence
The sequence command
/das/<source>/sequence?<params>
• Parameters:
  • segment=ID:start,end (one or more), where ID is the ID of the reference object
• Example:
  /das/<source>/sequence?segment=X:100,200;segment=Y:500,600
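The `segment=ID:start,end` parameter can be modelled as a small value type. This is a sketch under stated assumptions: the `Segment` class and `sequence_path` helper are hypothetical names, and the validation (positions start at 1, start not after end) is an assumption of this example, not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Part of a reference object, e.g. bases 100 to 200 of chromosome X."""
    ref_id: str
    start: int
    end: int

    def __post_init__(self):
        # Assumed convention: positions count from 1 and start <= end.
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid range {self.start},{self.end}")

    def to_param(self):
        """Render as a 'segment=ID:start,end' query argument."""
        return f"segment={self.ref_id}:{self.start},{self.end}"


def sequence_path(source, segments):
    """Build the path and query of a sequence command for one or more segments."""
    if not segments:
        raise ValueError("at least one segment is required")
    query = ";".join(seg.to_param() for seg in segments)
    return f"/das/{source}/sequence?{query}"


# 'my_source' is a placeholder data source name.
print(sequence_path("my_source", [Segment("X", 100, 200), Segment("Y", 500, 600)]))
```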
The sequence command
Response:
<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="200" version="1.0">
    cctgagccagcagtggcaacccaatggggtccctttcca...
  </SEQUENCE>
  <SEQUENCE id="Y" start="500" stop="600" version="1.0">
    ctggacagcccggaaaatgagctcctcatctctaaccca...
  </SEQUENCE>
</DASSEQUENCE>
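A response of this shape can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; `parse_sequences` is a hypothetical helper. The sample document reuses only the truncated bases shown on the slide, so the strings are shorter than the requested ranges.

```python
import xml.etree.ElementTree as ET

# Sample built from the slide; the bases are the truncated prefixes shown
# there, so their lengths do not match the start/stop coordinates.
SAMPLE_SEQUENCE_XML = """<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="200" version="1.0">
    cctgagccagcagtggcaacccaatggggtccctttcca
  </SEQUENCE>
  <SEQUENCE id="Y" start="500" stop="600" version="1.0">
    ctggacagcccggaaaatgagctcctcatctctaaccca
  </SEQUENCE>
</DASSEQUENCE>"""


def parse_sequences(xml_text):
    """Map (id, start, stop) to the sequence string with whitespace removed."""
    root = ET.fromstring(xml_text)
    sequences = {}
    for element in root.iter("SEQUENCE"):
        key = (element.get("id"), int(element.get("start")), int(element.get("stop")))
        sequences[key] = "".join((element.text or "").split())
    return sequences


for (ref_id, start, stop), bases in parse_sequences(SAMPLE_SEQUENCE_XML).items():
    print(ref_id, start, stop, bases)
```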
The features command
/das/<source>/features?<params>
• Parameters:
  • segment=ID:start,end (one or more)
  • type=foo (zero or more)
  • category=bar (zero or more)
• Example:
  /das/<source>/features?segment=X:100,200;segment=Y:500,600;type=SNP
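Since each parameter may repeat, a features query is naturally built from a list of key/value pairs. The following is a sketch only: `features_path` is a hypothetical helper, it joins arguments with `;` as in the example above, and it percent-encodes values other than `:` and `,` as a precaution that the slide does not mention.

```python
from urllib.parse import quote


def features_path(source, segments, types=(), categories=()):
    """Build the path and query of a features command.

    segments:   one or more 'ID:start,end' strings
    types:      zero or more feature type filters
    categories: zero or more feature category filters
    """
    if not segments:
        raise ValueError("at least one segment is required")
    pairs = [("segment", s) for s in segments]
    pairs += [("type", t) for t in types]
    pairs += [("category", c) for c in categories]
    # Keep ':' and ',' readable, as in the slide's example URLs.
    query = ";".join(f"{key}={quote(value, safe=':,')}" for key, value in pairs)
    return f"/das/{source}/features?{query}"


# 'my_source' is a placeholder data source name.
print(features_path("my_source", ["X:100,200", "Y:500,600"], types=["SNP"]))
```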
The features command
Response:
<DASGFF>
  <GFF version="1.01" href="...">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="X">
        <START>100</START>
        <END>200</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      ...
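A client might turn such a response into plain records before drawing it. In this sketch, `parse_features` is a hypothetical helper, and the sample document closes the tags that the slide leaves elided so that it parses. Treating a score of `-` as "no score" is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, with closing tags added here so the sample is well-formed.
SAMPLE_FEATURES_XML = """<DASGFF>
  <GFF version="1.01" href="...">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="X">
        <START>100</START>
        <END>200</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return one dict per FEATURE element, tagged with its segment id."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            type_element = feature.find("TYPE")
            score = feature.findtext("SCORE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "type": type_element.get("id") if type_element is not None else None,
                "category": type_element.get("category") if type_element is not None else None,
                # Assumption: '-' or a missing element means the feature has no score.
                "score": float(score) if score not in (None, "-") else None,
                "orientation": feature.findtext("ORIENTATION"),
            })
    return features


for record in parse_features(SAMPLE_FEATURES_XML):
    print(record)
```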
Other Commands
• Stylesheet
  • hints on how to render different types of feature
  • e.g. “exons as blue boxes, SNPs as red triangles”
  • /das/<source>/stylesheet
• Types
  • lists the types of feature available
  • /das/<source>/types
Metadata
• Can make a client that knows how to query a server and parse the response
• BUT something missing…
  • which data sources are available on a server?
  • which commands does a source support?
  • what kind of reference objects does it know about?
The sources command
<server>/das/sources
• Lists a server’s data sources
• For each source:
  • text description
  • list of “capabilities” (commands)
  • list of coordinate systems (type of reference object)
  • etc.
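The slides do not show what a sources response looks like, so the following is only a rough sketch of how a client might collect the description, capabilities and coordinate systems listed above. The element and attribute names (`SOURCE`, `CAPABILITY`, `COORDINATES`, `type`, `description`) are assumptions based on the DAS 1.6 format and should be checked against the specification. The sample document and the `parse_sources` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response. Element and attribute names are assumed, not taken
# from the slides; verify them against the DAS specification before relying on them.
SAMPLE_SOURCES_XML = """<SOURCES>
  <SOURCE uri="example_source" title="Example" description="A made-up source">
    <VERSION uri="example_source" created="2010-01-01">
      <COORDINATES uri="http://example.org/coords/1" source="Chromosome"
                   authority="GRCh" version="37">GRCh_37,Chromosome,Homo sapiens</COORDINATES>
      <CAPABILITY type="das1:features"/>
      <CAPABILITY type="das1:types"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def parse_sources(xml_text):
    """Summarise each source: title, description, commands, coordinate systems."""
    root = ET.fromstring(xml_text)
    sources = []
    for source in root.iter("SOURCE"):
        sources.append({
            "title": source.get("title"),
            "description": source.get("description"),
            # Assumed form 'das1:<command>'; keep only the command name.
            "capabilities": [
                (cap.get("type") or "").split(":", 1)[-1]
                for cap in source.iter("CAPABILITY")
            ],
            "coordinates": [
                (coord.text or "").strip() for coord in source.iter("COORDINATES")
            ],
        })
    return sources


for entry in parse_sources(SAMPLE_SOURCES_XML):
    print(entry["title"], entry["capabilities"], entry["coordinates"])
```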
DAS Registry
• third component of DAS
• catalogue of DAS sources
• Human interface
  • validate, register, search, view statistics
• Programmatic interface
  • http://www.dasregistry.org/das/sources
  • http://www.dasregistry.org/das/coordinatesystem
  • http://www.dasregistry.org/das/organism
Links
• DAS Homepage: http://www.biodas.org/
• DAS Specification: http://www.biodas.org/documents/spec-1.6.html
• DAS in Ensembl: http://www.ensembl.org/info/docs/das/index.html
• Mailing list: http://biodas.org/mailman/listinfo/das