Explore critical design points of SimDAL, focus on simulation results access. SimTAP offers easy querying & publication methods, ensuring data standardization and protocol compatibility.
SimDAL discussions • NCSA, Urbana, May 22nd, 2012 • David Languignon
Content • (Re) introduce SimDAL • Highlight critical design points to discuss today
SimDAL goals • publish simulation results/metadata in an easy way • access simulation results/metadata in an easy and standard way • same for ‘raw’ simulation material (raw code output, etc.)
Proposal Overview • SimDAL components Registries
Discovery Use Case Summary • Simulation discovery • What kind of model is being offered? • What parameters characterize the model? • What is the physical meaning of those parameters? • Simulation output dataset discovery • What results can be retrieved? • What kind of results can be retrieved (meaning)? • All the information is available in SimDM
Problem • SimDM is hard to • Fill data in (publisher) • Query (user)
SimTAP to the rescue • “Formally, SimTAP is a TAP service on top of a table schema that is constrained by one or more instances of the SimDM:resource/protocol/Protocol class defined in the simulation data model” • as formulated by G. Lemson
SimTAP to the rescue • a protocol description file (SimDM Protocol entity) • + the corresponding xsd (SimDM XML serialization) • An algorithm to make a Simple, Flat DataModel from Protocol description • a TAP service to access the new DataModel implementation • a mapping between : • the new DataModel class attributes (# tables fields) • the original Protocol elements
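The derivation algorithm itself is not spelled out on these slides. As a rough illustration of the idea only (a flat data model plus a mapping back to the Protocol elements), here is a minimal Python sketch. The dictionary layout, the key names, the table naming scheme and the mapping paths are all assumptions made for this example; they are not the SimDM serialization and not the actual algorithm.

```python
# Hypothetical, simplified stand-in for a protocol description.
# Real protocol files are XML; every key below is an assumption.
protocol = {
    "name": "ramses",
    "input_parameters": [
        {"name": "boxlength", "datatype": "real"},
        {"name": "resolution", "datatype": "integer"},
    ],
    "object_types": [
        {
            "name": "halo",
            "properties": [
                {"name": "mass", "datatype": "real"},
                {"name": "npart", "datatype": "integer"},
            ],
        },
    ],
}


def flatten_protocol(protocol):
    """Derive a flat, join-free schema and a column-to-element mapping.

    Returns (tables, mapping) where
      tables  = {table_name: [(column_name, datatype), ...]}
      mapping = {"table.column": path of the originating protocol element}
    """
    tables = {}
    mapping = {}
    pname = protocol["name"]

    # One table for experiments: one column per input parameter.
    exp_table = f"{pname}_experiment"
    tables[exp_table] = []
    for param in protocol["input_parameters"]:
        tables[exp_table].append((param["name"], param["datatype"]))
        mapping[f"{exp_table}.{param['name']}"] = (
            f"{pname}/inputParameter/{param['name']}"
        )

    # One table per object type: one column per property.
    for obj_type in protocol["object_types"]:
        table = f"{pname}_{obj_type['name']}"
        tables[table] = []
        for prop in obj_type["properties"]:
            tables[table].append((prop["name"], prop["datatype"]))
            mapping[f"{table}.{prop['name']}"] = (
                f"{pname}/objectType/{obj_type['name']}/property/{prop['name']}"
            )

    return tables, mapping


tables, mapping = flatten_protocol(protocol)
```

The mapping is what lets a client go from a column in the flat table back to the semantic description held in the protocol file.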
SimTAP daily use • Querying (users) must be easy • simple TAP query (flat model) • use of the metadata in protocol.xml + the mapping file (semantics) • Publication (publishers) must be easy • protocol.xml file • mapping file • plaintext data file used to fill the database (csv-like)
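To make the publication step concrete, the sketch below loads a csv-like data file into a single flat table and runs a join-free query on it. It is an illustration under assumptions: SQLite stands in for whatever database sits behind the TAP service, and the table name, column names and values are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical csv-like data file; column names and values are illustrative.
data_file = io.StringIO(
    "mass,npart,x,y,z\n"
    "1.23e2,345000,1,2,0\n"
    "3.673e3,945000,4,5,3\n"
)

# In-memory SQLite database standing in for the service's real backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE halo (mass REAL, npart INTEGER, x REAL, y REAL, z REAL)"
)

# Convert each csv record to typed values and insert it into the flat table.
reader = csv.DictReader(data_file)
rows = [
    (float(r["mass"]), int(r["npart"]), float(r["x"]), float(r["y"]), float(r["z"]))
    for r in reader
]
conn.executemany("INSERT INTO halo VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# A single-table query of the kind a user would send: no joins needed.
heavy = conn.execute(
    "SELECT mass, npart FROM halo WHERE npart > 500000"
).fetchall()
```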
SimTAP pros • Mainly intended for single (flat) table queries • tiny subset of SQL (ADQL) : easy to use ! • fast, no joins (datawarehouse like schema) • tables easy to fill in • Publisher only has to make 2 simple files • TAP compliant • effort/code re-use ! • compliant with VO tools (TopCat etc...)
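As an illustration of how small such a query can be, the following sketch builds the URL of a synchronous TAP request carrying a single-table ADQL query. REQUEST, LANG, FORMAT and QUERY are standard TAP parameters; the service base URL, the table name and the column names are assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical service base URL, for illustration only.
TAP_BASE = "http://example.org/simtap"


def build_sync_query_url(base_url, adql):
    """Return the URL of a synchronous TAP query for the given ADQL string."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    # urlencode percent-encodes the query (spaces become '+', '>' becomes '%3E').
    return f"{base_url}/sync?{urlencode(params)}"


# Single flat table, no joins; table and column names are hypothetical.
adql = "SELECT TOP 10 x, y, z, mass, npart FROM simtap.halo WHERE npart > 20000"
url = build_sync_query_url(TAP_BASE, adql)
```

Because the request is plain TAP, the same URL can be issued by any TAP-aware client.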
SimTAP cons • yet another DM • may miss some SimDM information
Access raw data: cutout • huge data: need subset extraction • UWS service: async for large data extraction • Specification: cutout(dataset_id, attributes_list, attributes_restrictions_list, options_list) • Type: string, list, list, list -> dataset • Extracts a sub-dataset of dataset_id, restricted according to attributes_restrictions_list, in which only the attributes named in attributes_list are present on the sub-dataset's objects. The options given in options_list are applied.
cutout : example
original_dataset <- {
  id: Halo23_ramses_34,
  data: [
    {mass:1.23e2, nbr_part:3.45e5, ener_pot:2.01, x:1, y:2, z:0},
    {mass:1.03e2, nbr_part:2.89e5, ener_pot:1.71, x:23, y:4, z:4},
    {mass:3.673e3, nbr_part:9.45e5, ener_pot:2.41, x:4, y:5, z:3},
    {mass:1.2e1, nbr_part:1.45e3, ener_pot:0.81, x:3, y:7, z:3}
  ]
}
attribute_list <- [mass, nbr_part]
attribute_restriction_list <- [
  {attribute: x, condition: gt, restriction: 0},
  {attribute: x, condition: lt, restriction: 15},
  {attribute: y, condition: gt, restriction: 3},
  {attribute: y, condition: lt, restriction: 8},
  {attribute: z, condition: gt, restriction: 2},
  {attribute: z, condition: lt, restriction: 4},
  {attribute: mass, condition: ordered, restriction: asc}
]
cutout(Halo23_ramses_34, attribute_list, attribute_restriction_list) should produce:
[
  {mass:1.2e1, nbr_part:1.45e3},
  {mass:3.673e3, nbr_part:9.45e5}
]
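The cutout operation can be sketched in a few lines of Python. This is a minimal in-memory illustration under assumptions, not a service implementation: the dataset is passed directly instead of being looked up by id, options_list is accepted but ignored, and "desc" as the counterpart of "asc" is an assumption. Following the specification, only the attributes named in the attribute list are kept on the returned objects.

```python
import operator

# Comparison conditions named as in the slide's restriction list.
CONDITIONS = {"gt": operator.gt, "lt": operator.lt}


def cutout(dataset, attributes_list, attributes_restrictions_list, options_list=None):
    """Filter, order and project the objects of an in-memory dataset.

    Assumptions: the dataset is passed directly instead of being looked up
    by id, and options_list is accepted but ignored in this sketch.
    """
    filters = [r for r in attributes_restrictions_list if r["condition"] in CONDITIONS]
    orderings = [r for r in attributes_restrictions_list if r["condition"] == "ordered"]

    # Keep only the objects that satisfy every comparison restriction.
    selected = [
        obj for obj in dataset["data"]
        if all(
            CONDITIONS[f["condition"]](obj[f["attribute"]], f["restriction"])
            for f in filters
        )
    ]

    # Sort before projecting, so ordering may use attributes that are not returned.
    # Orderings are applied last to first; sorted() is stable, so the first wins.
    # Assumption: any restriction value other than "desc" means ascending.
    for o in reversed(orderings):
        selected = sorted(
            selected,
            key=lambda obj, a=o["attribute"]: obj[a],
            reverse=(o["restriction"] == "desc"),
        )

    # Projection: keep only the requested attributes.
    return [{a: obj[a] for a in attributes_list} for obj in selected]


original_dataset = {
    "id": "Halo23_ramses_34",
    "data": [
        {"mass": 1.23e2, "nbr_part": 3.45e5, "ener_pot": 2.01, "x": 1, "y": 2, "z": 0},
        {"mass": 1.03e2, "nbr_part": 2.89e5, "ener_pot": 1.71, "x": 23, "y": 4, "z": 4},
        {"mass": 3.673e3, "nbr_part": 9.45e5, "ener_pot": 2.41, "x": 4, "y": 5, "z": 3},
        {"mass": 1.2e1, "nbr_part": 1.45e3, "ener_pot": 0.81, "x": 3, "y": 7, "z": 3},
    ],
}
attribute_list = ["mass", "nbr_part"]
attribute_restriction_list = [
    {"attribute": "x", "condition": "gt", "restriction": 0},
    {"attribute": "x", "condition": "lt", "restriction": 15},
    {"attribute": "y", "condition": "gt", "restriction": 3},
    {"attribute": "y", "condition": "lt", "restriction": 8},
    {"attribute": "z", "condition": "gt", "restriction": 2},
    {"attribute": "z", "condition": "lt", "restriction": 4},
    {"attribute": "mass", "condition": "ordered", "restriction": "asc"},
]

result = cutout(original_dataset, attribute_list, attribute_restriction_list)
```

For datasets too large to hold in memory, the same filter, order and project steps would have to run on the server side, which is where the asynchronous UWS service comes in.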
Discussion cutout • Which output data format should we standardize? • VOTable • FITS • HDF5 • VTK
Registries Discussion • What to put in registries? • “SimDAL” service URL • SKOS concepts list • redundant with protocol.xml, but allows faster, direct searching at registry level • protocol.xml (see SimTAP presentation)
Discussion Preview • Should we define a preview feature in (Sim)TAP? • per-column preview (preview field in TAP_SCHEMA) • per-line preview (column name standardized but not mandatory) • URL pointing to • a file displayable in a web browser • XML (Datalink?) listing several browser-displayable files • VOTable integration? • through VOTable LINK with content-role = “preview”
preview : example
<?xml version="1.0" encoding="utf-8"?>
<DATALINK>
  <LINK>
    <URL>http://roxxor.obspm.fr/deuvo-ui/dfiles//simtap.objects_34_halo/votable?select=x%2Cy%2Cz%2Cmass%2Cnpart&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>subdataset of the fof halo finder postprocessing on top of a Ratra-Peebles universe simulation (boxlength 162, resolution 1024, z=1.5) output. Constraints are number of particles gt 2e4</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
  <LINK>
    <URL>http://roxxor.obspm.fr/deuvo-ui/dfiles//simtap.objects_32_halo/votable?select=x%2Cy%2Cz%2Cmass%2Cnpart&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>subdataset of the fof halo finder postprocessing on top of a Ratra-Peebles universe simulation (boxlength 162, resolution 1024, z=2.33) output. Constraints are number of particles gt 2e4</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
</DATALINK>
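A client could read such a preview document with Python's standard XML parser, as in the sketch below. The element names follow the ad hoc DATALINK layout of the example above, which is a proposal under discussion rather than a standardized format; the URL and description in the sample are shortened and hypothetical. Note that a literal "&" inside an XML element has to be written as "&amp;amp;" for the document to parse.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the ad hoc DATALINK layout; the URL is hypothetical.
DATALINK_XML = """<?xml version="1.0" encoding="utf-8"?>
<DATALINK>
  <LINK>
    <URL>http://example.org/preview/halo_34?select=x%2Cy&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>halo subset, z=1.5</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
</DATALINK>
"""


def parse_preview_links(xml_text):
    """Return one dict per LINK element with url, mime, description and size."""
    # Parse from bytes so the encoding declaration is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    links = []
    for link in root.findall("LINK"):
        links.append({
            "url": link.findtext("URL", default="").strip(),
            "mime": link.findtext("MIME", default="").strip(),
            "description": link.findtext("DESCRIPTION", default="").strip(),
            "size": link.findtext("SIZE", default="").strip(),
        })
    return links


links = parse_preview_links(DATALINK_XML)
```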
Discussion Groups • No group feature in standard TAP • Very useful (required?) for numerical simulations • Huge number of columns • Need a grouping feature (at least for display) • groups are OK in VOTable • The information is in SimDM, so it must be in SimDAL
Discussion SKOS • How to integrate SKOS concepts in TAP? • just put them in place of the UCD? • How to integrate SKOS in VOTable? • ucd • through VOTable LINK with content-role = “skos”
Discussion summary • SimTAP DM derivation algorithm (G. Lemson) • registries: SKOS list • preview: URL target, integration in VOTable • groups: how to integrate in TAP_SCHEMA • SKOS: how to integrate in TAP_SCHEMA, VOTable • cutout: output format