Explore the integration of SimDAL and SimTAP for easy simulations access, metadata publication, and data query. Discuss critical design points, SimDM utilization, and the benefits and limitations of SimTAP.
SimDAL discussions • NCSA, Urbana, May 22nd, 2012 • David Languignon
Content • (Re) introduce SimDAL • Highlight critical design points to discuss today
SimDAL goals • publish simulation results/metadata in an easy way • access simulation results/metadata in an easy and standard way • same for ‘raw’ simulation material (raw code output, etc.)
Proposal Overview • SimDAL components Registries
Discovery Use Case Summary • Simulation discovery • What kind of model is being offered ? • What parameters characterize the model ? • What is the physical meaning of those parameters ? • Simulation’s output datasets discovery • What results can be retrieved ? • What kind of results can be retrieved (meaning) ? All the information is available in SimDM
Problem • SimDM is hard to • Fill data in (publisher) • Query (user)
SimTAP to the rescue • “Formally, SimTAP is a TAP service on top of a table schema that is constrained by one or more instances of the SimDM:resource/protocol/Protocol class defined in the simulation data model” • as formulated by G. Lemson
SimTAP to the rescue • a protocol description file (SimDM Protocol entity) • + the corresponding xsd (SimDM XML serialization) • An algorithm to make a Simple, Flat DataModel from Protocol description • a TAP service to access the new DataModel implementation • a mapping between : • the new DataModel class attributes (# tables fields) • the original Protocol elements
SimTAP daily use • Query (users) must be easy • simple TAP query (flat model) • use of metadata of protocol.xml + mapping file (semantic) • Publication (publishers) must be easy • protocol.xml file • mapping file • plain-text data file used to fill the database (CSV-like)
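As a rough illustration of the "simple TAP query (flat model)" point, the sketch below builds a synchronous TAP request for a single-table ADQL query. REQUEST, LANG and QUERY are the standard TAP parameters; the service base URL is hypothetical, and the table and column names are assumptions borrowed from the preview example in these slides, not a confirmed SimTAP schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the base URL of a real TAP service.
TAP_BASE_URL = "http://example.org/simtap"


def build_sync_query_url(base_url, table, columns, where=None):
    """Build a synchronous TAP request URL for a flat, single-table ADQL query."""
    adql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        adql += " WHERE " + where
    # Standard TAP request parameters for an ADQL query.
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Assumed table and column names, for illustration only.
url = build_sync_query_url(
    TAP_BASE_URL,
    "simtap.objects_34_halo",
    ["x", "y", "z", "mass", "npart"],
    where="npart > 2e4",
)
print(url)
```

No join is needed because all columns live in one flat table, which is the property the SimTAP approach relies on.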
SimTAP pros • Mainly intended for single (flat) table queries • tiny subset of SQL (ADQL) : easy to use ! • fast, no joins (datawarehouse like schema) • tables easy to fill in • Publisher only has to make 2 simple files • TAP compliant • effort/code re-use ! • compliant with VO tools (TopCat etc...)
SimTAP cons • another DM • may miss some SimDM information
Access raw data: cutout • huge data: need subset extraction • UWS service: async for large data extraction • Specification: cutout: (string, list, list, list) -> dataset • cutout(dataset_id, attributes_list, attributes_restrictions_list, options_list) extracts a subdataset of dataset_id, restricted according to attributes_restrictions_list, in which only the attributes in attributes_list are present on the subdataset's objects. The options in options_list are then applied.
cutout : example

original_dataset <- {
  id: Halo23_ramses_34,
  data: [
    {mass:1.23e2, nbr_part:3.45e5, ener_pot:2.01, x:1, y:2, z:0},
    {mass:1.03e2, nbr_part:2.89e5, ener_pot:1.71, x:23, y:4, z:4},
    {mass:3.673e3, nbr_part:9.45e5, ener_pot:2.41, x:4, y:5, z:3},
    {mass:1.2e1, nbr_part:1.45e3, ener_pot:0.81, x:3, y:7, z:3}
  ]
}

attribute_list <- [mass, nbr_part]

attribute_restriction_list <- [
  {attribute: x, condition: gt, restriction: 0},
  {attribute: x, condition: lt, restriction: 15},
  {attribute: y, condition: gt, restriction: 3},
  {attribute: y, condition: lt, restriction: 8},
  {attribute: z, condition: gt, restriction: 2},
  {attribute: z, condition: lt, restriction: 4},
  {attribute: mass, condition: ordered, restriction: asc}
]

cutout(Halo23_ramses_34, attribute_list, attribute_restriction_list) should produce (only the attributes in attribute_list are kept, as stated in the specification):

[
  {mass:1.2e1, nbr_part:1.45e3},
  {mass:3.673e3, nbr_part:9.45e5}
]
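The cutout semantics can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the proposed service. It takes the dataset itself rather than a dataset_id, ignores options_list, and follows the specification's statement that only the attributes in attributes_list are kept. Only "gt", "lt" and "ordered" come from the slides; the other condition names are assumptions.

```python
import operator

# "gt" and "lt" appear in the slide example; "ge", "le" and "eq" are assumed.
_CONDITIONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "eq": operator.eq,
}


def cutout(dataset, attributes_list, attributes_restrictions_list, options_list=None):
    """Filter, order and project the objects of an in-memory dataset.

    options_list is accepted for signature compatibility but ignored here.
    """
    rows = dataset["data"]
    order_keys = []
    for r in attributes_restrictions_list:
        if r["condition"] == "ordered":
            order_keys.append((r["attribute"], r["restriction"] == "desc"))
        else:
            test = _CONDITIONS[r["condition"]]
            rows = [row for row in rows if test(row[r["attribute"]], r["restriction"])]
    # sorted() is stable, so applying the orderings in reverse makes the
    # first one listed the primary sort key.
    for attribute, descending in reversed(order_keys):
        rows = sorted(rows, key=operator.itemgetter(attribute), reverse=descending)
    # Keep only the requested attributes.
    return [{name: row[name] for name in attributes_list} for row in rows]


halo = {
    "id": "Halo23_ramses_34",
    "data": [
        {"mass": 1.23e2, "nbr_part": 3.45e5, "ener_pot": 2.01, "x": 1, "y": 2, "z": 0},
        {"mass": 1.03e2, "nbr_part": 2.89e5, "ener_pot": 1.71, "x": 23, "y": 4, "z": 4},
        {"mass": 3.673e3, "nbr_part": 9.45e5, "ener_pot": 2.41, "x": 4, "y": 5, "z": 3},
        {"mass": 1.2e1, "nbr_part": 1.45e3, "ener_pot": 0.81, "x": 3, "y": 7, "z": 3},
    ],
}
restrictions = [
    {"attribute": "x", "condition": "gt", "restriction": 0},
    {"attribute": "x", "condition": "lt", "restriction": 15},
    {"attribute": "y", "condition": "gt", "restriction": 3},
    {"attribute": "y", "condition": "lt", "restriction": 8},
    {"attribute": "z", "condition": "gt", "restriction": 2},
    {"attribute": "z", "condition": "lt", "restriction": 4},
    {"attribute": "mass", "condition": "ordered", "restriction": "asc"},
]
result = cutout(halo, ["mass", "nbr_part"], restrictions)
print(result)
```

A real service would resolve dataset_id against its storage and, for huge datasets, run the extraction as an asynchronous job rather than in memory.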
Discussion cutout • Which output data format should we standardize ? • votable • fits • hdf5 • vtk
Registries Discussion : What to put in registries ? • “SimDAL” service url • skos concepts list • redundant with protocol.xml but allows faster, direct search at registry level • protocol.xml..... see SimTAP presentation
Discussion Preview • Should we define a preview feature in (Sim)TAP ? • per column preview (preview field in TAP_SCHEMA) • per line preview (column name standardized but not mandatory) • URL toward • www browser displayable file • xml (Datalink ?) listing several browser displayable files • VOTable integration ? • through VOTable LINK with content-role = “preview”
preview : example

<?xml version="1.0" encoding="utf-8"?>
<DATALINK>
  <LINK>
    <URL>http://roxxor.obspm.fr/deuvo-ui/dfiles//simtap.objects_34_halo/votable?select=x%2Cy%2Cz%2Cmass%2Cnpart&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>subdataset of the fof halo finder postprocessing on top of a Ratra-Peebles universe simulation (boxlength 162, resolution 1024, z=1.5) output. Constraints are number of particles gt 2e4</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
  <LINK>
    <URL>http://roxxor.obspm.fr/deuvo-ui/dfiles//simtap.objects_32_halo/votable?select=x%2Cy%2Cz%2Cmass%2Cnpart&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>subdataset of the fof halo finder postprocessing on top of a Ratra-Peebles universe simulation (boxlength 162, resolution 1024, z=2.33) output. Constraints are number of particles gt 2e4</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
</DATALINK>
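A client could read such a link list with a standard XML parser. The sketch below is one possible way to do it, assuming the DATALINK/LINK layout of the example above, which is a proposal and not a standardized schema. The sample is cut down to one LINK with a shortened DESCRIPTION. The "&" in the URL must be written as "&amp;" for an XML parser to accept the document.

```python
import xml.etree.ElementTree as ET

# One LINK from the example, with a shortened description. "&" is escaped.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<DATALINK>
  <LINK>
    <URL>http://roxxor.obspm.fr/deuvo-ui/dfiles//simtap.objects_34_halo/votable?select=x%2Cy%2Cz%2Cmass%2Cnpart&amp;where=npart+%3E+2e4</URL>
    <MIME>application/xml</MIME>
    <DESCRIPTION>halo subdataset, z=1.5, npart gt 2e4</DESCRIPTION>
    <SIZE>unknown</SIZE>
  </LINK>
</DATALINK>"""


def parse_preview_links(xml_bytes):
    """Return one dict per LINK element, with surrounding whitespace stripped."""
    root = ET.fromstring(xml_bytes)
    links = []
    for link in root.findall("LINK"):
        links.append({
            "url": (link.findtext("URL") or "").strip(),
            "mime": (link.findtext("MIME") or "").strip(),
            "description": (link.findtext("DESCRIPTION") or "").strip(),
            "size": (link.findtext("SIZE") or "").strip(),
        })
    return links


links = parse_preview_links(SAMPLE)
for entry in links:
    print(entry["mime"], entry["url"])
```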
Discussion Groups • No group feature in standard TAP • Very useful (required ?) for numerical simulations • Huge amount of columns • Need a grouping feature (at least for display) • group ok in VoTable The information is in SimDM, so must be in SimDAL
Discussion SKOS • How to integrate skos concepts in TAP ? • just put it in place of ucd ? • How to integrate skos in VOTable ? • ucd • through VOTable LINK with content-role = “skos”
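The LINK-based option can be pictured with a short sketch that attaches a LINK child to a VOTable FIELD. It only illustrates the shape of the markup. The "skos" role value is the proposal under discussion in these slides, and the field name and concept URI are hypothetical.

```python
import xml.etree.ElementTree as ET


def field_with_link(name, datatype, role, href):
    """Build a FIELD element carrying a LINK with the given content-role."""
    field = ET.Element("FIELD", {"name": name, "datatype": datatype})
    ET.SubElement(field, "LINK", {"content-role": role, "href": href})
    return field


# Hypothetical column and concept URI, for illustration only.
field = field_with_link("mass", "double", "skos", "http://example.org/concepts#mass")
print(ET.tostring(field, encoding="unicode"))
```

The same helper would cover the preview proposal by passing "preview" as the role and a preview URL as the href.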
Discussion summary • SimTAP DM derivation algorithm : (G. Lemson) • registries : skos list • preview : url target, integration in VOTABLE • groups : how to integrate in TAP_SCHEMA • skos : how to integrate in TAP_SCHEMA, VOTABLE • cutout : output format