Homogeneous Access to Tabular Data
Aurélien Stébé, Iñaki Ortiz, Kona Andrews, Guy Rixon
Introduction
• Three input query methods:
  • Simple Access Query - Key Value Pair filters
  • Complete Access Query - ADQL (synchronous)
  • Asynchronous Querying - ADQL and UWS
• Output result formats / error response handling common to all methods
• New proposals for output format selection, empty results, error messages
http://esavo.esac.esa.int/doc/Homogeneous_Access.pdf
Simple Access Query
• Intended to be easy to implement for server and client
• Uses Key-Value Pairs to filter the data to be returned
• Allows minimal control over quantity of data to be returned
• Presents dataset as flat, single table, hiding inner structure
• Invoked via HTTP GET to:
  • http://service.endpoint.saq{?,&}PARAM=value[&…]
• PARAMs should not modify output format or alter data
• PARAMs may only limit the quantity of data (rows), the type of specific data or the quantity of information (columns)
• Define generic types / families of parameters
• Reserve a few parameter names (POS, SIZE, BAND, TIME, …)
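As an illustration of the GET invocation above, here is a minimal sketch of how a client might assemble a Simple Access Query URL. The helper name `build_saq_url` and the example values given to POS and SIZE are assumptions made for this sketch; the slides reserve those parameter names but do not define their value syntax.

```python
from urllib.parse import urlencode


def build_saq_url(endpoint: str, params: dict[str, str]) -> str:
    """Append PARAM=value pairs to a Simple Access Query endpoint.

    Hypothetical helper, not part of the proposal itself.
    """
    # The {?,&} notation on the slide: start the query string with '?'
    # unless the endpoint already carries one, in which case use '&'.
    separator = "&" if "?" in endpoint else "?"
    # Keep ',' and '/' unescaped, since the PARAM value types use them
    # as list and interval separators.
    return endpoint + separator + urlencode(params, safe=",/")


# Illustrative values only: the value syntax of POS and SIZE is assumed.
url = build_saq_url(
    "http://service.endpoint.saq",
    {"POS": "180.0,-30.0", "SIZE": "0.5"},
)
print(url)
```

The request itself would then be a plain HTTP GET on the resulting URL.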
Simple Access Query - PARAMs
• Single value type:
  • PARAM_NAME=value
  • PARAM_column = “value”
  • PARAM_column > value (upper limit parameter)
• List value type:
  • PARAM_NAME=value1,value2,value3
  • PARAM_column IN (“value1”, “value2”, “value3”)
• Interval value type:
  • PARAM_NAME=valueMIN/valueMAX
  • PARAM_column BETWEEN valueMIN AND valueMAX
  • PARAM_column > valueMIN (open upper limit interval)
• Interval PARAM type:
  • PARAM_columnMIN < value AND PARAM_columnMAX > value
  • PARAM_columnMIN < valueMAX AND PARAM_columnMAX > valueMIN
  • PARAM_columnMAX > valueMIN (open upper limit interval)
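The mapping from PARAM values to filter expressions listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names and column names are hypothetical, and the slides do not say how an open upper limit is written in the URL, so the sketch assumes the form `valueMIN/` (empty maximum). Predicates are built as plain strings for readability; a real service would bind values as query parameters rather than interpolating them.

```python
def scalar_predicate(column: str, raw: str) -> str:
    """Translate a PARAM value into a predicate on a single-valued column."""
    if "/" in raw:  # interval value type: valueMIN/valueMAX
        low, high = raw.split("/", 1)
        if low and high:
            return f"{column} BETWEEN {low} AND {high}"
        if low:  # assumed syntax "valueMIN/" for an open upper limit
            return f"{column} > {low}"
        raise ValueError(f"interval form not covered by the slides: {raw!r}")
    if "," in raw:  # list value type: value1,value2,value3
        items = ", ".join(f'"{value}"' for value in raw.split(","))
        return f"{column} IN ({items})"
    return f'{column} = "{raw}"'  # single value type


def interval_predicate(column_min: str, column_max: str, raw: str) -> str:
    """Translate a PARAM value into an overlap test on a MIN/MAX column pair."""
    if "/" in raw:
        low, high = raw.split("/", 1)
        if low and high:
            return f"{column_min} < {high} AND {column_max} > {low}"
        if low:  # assumed syntax "valueMIN/" for an open upper limit
            return f"{column_max} > {low}"
        raise ValueError(f"interval form not covered by the slides: {raw!r}")
    return f"{column_min} < {raw} AND {column_max} > {raw}"


print(scalar_predicate("band", "J,H,K"))
print(scalar_predicate("time", "10/20"))
print(interval_predicate("t_min", "t_max", "10/20"))
```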
Complete Access Query
• Intended to give total access and control over the dataset
• Uses full query language (defined in other specifications)
• Makes dataset’s inner structure available to the client
• Users become responsible for data-level considerations
• Invoked via HTTP POST to:
  • http://service.endpoint.caq
• Sending the message body:
  • queryType=queryString
• Three query types:
  • nativeADQL - ADQL against the service’s table/column names
  • uTypeADQL - ADQL against formal data model using uTypes
  • directQuery - pass-through to the DBMS, any query language
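A minimal sketch of preparing such a POST request is shown below. It assumes the `queryType=queryString` body is sent form-encoded (`application/x-www-form-urlencoded`), which the slides do not state; the helper name `build_caq_request` and the table name `catalogue` are hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The three query types named on the slide.
QUERY_TYPES = ("nativeADQL", "uTypeADQL", "directQuery")


def build_caq_request(endpoint: str, query_type: str, query_string: str) -> Request:
    """Build (but do not send) a Complete Access Query POST request."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type!r}")
    # Assumption: the queryType=queryString body is form-encoded.
    body = urlencode({query_type: query_string}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_caq_request(
    "http://service.endpoint.caq",
    "nativeADQL",
    "SELECT TOP 10 * FROM catalogue",  # hypothetical table name
)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call against a real endpoint.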
Asynchronous Querying
• Interface identical to the Complete Access Query, plus DEST
• Using the UWS for job management and workflow
• The DEST parameter for delivery:
  • DEST=LOCAL
    • For local staging of the data at the service
  • DEST=http://my.server/~john/out.vot
  • DEST=ftp://my.server/~john/out.vot
  • DEST=vos://my.server!vospace/john/out.vot
    • For respectively HTTP, FTP or VOSpace delivery
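How a service might interpret the DEST values above can be sketched as a small dispatch on the URL scheme. The function name `delivery_method` and the returned labels are assumptions for illustration; the slides only list the four example values.

```python
from urllib.parse import urlsplit

# URL schemes shown on the slide, mapped to the delivery they imply.
DELIVERY_BY_SCHEME = {"http": "HTTP", "ftp": "FTP", "vos": "VOSpace"}


def delivery_method(dest: str) -> str:
    """Classify a DEST value as local staging or a remote delivery."""
    if dest == "LOCAL":
        return "local staging"
    scheme = urlsplit(dest).scheme
    if scheme not in DELIVERY_BY_SCHEME:
        raise ValueError(f"unsupported DEST: {dest!r}")
    return DELIVERY_BY_SCHEME[scheme] + " delivery"


for dest in (
    "LOCAL",
    "http://my.server/~john/out.vot",
    "ftp://my.server/~john/out.vot",
    "vos://my.server!vospace/john/out.vot",
):
    print(dest, "->", delivery_method(dest))
```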
Output Result Formats
• Various formats possible, default is VOTable-v1.1
• Two methods to select the output format:
  • HTTP header “Accept:” with MIME types
  • OUTPUT parameter with fixed values
• The OUTPUT method always overrides the “Accept:” one
• Response to valid queries must have the “200 OK” status code and “Content-Type:” header with MIME type of the output format
• Empty response to valid queries must have the “204 No Content” status code and no message body
• Formats defined: VOTable, CSV or TSV, XML, …
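The selection rules above (OUTPUT overrides “Accept:”, VOTable by default, “204 No Content” for empty results) can be sketched on the server side as follows. The OUTPUT values and the MIME types in `FORMATS` are assumptions, since the slides do not list the fixed values; the “Accept:” parsing is deliberately simplified and ignores q-value ordering.

```python
from typing import Optional

# Assumed OUTPUT values and MIME types; the slides do not enumerate them.
FORMATS = {
    "VOTABLE": "application/x-votable+xml",
    "CSV": "text/csv",
    "TSV": "text/tab-separated-values",
    "XML": "text/xml",
}


def choose_format(output_param: Optional[str], accept_header: Optional[str]) -> str:
    """Return the MIME type of the output format for a request."""
    if output_param:  # OUTPUT always overrides the Accept header
        return FORMATS[output_param.upper()]
    if accept_header:
        # Simplified: take the first listed type we support, ignoring q-values.
        for item in accept_header.split(","):
            mime = item.split(";", 1)[0].strip()
            if mime in FORMATS.values():
                return mime
    return FORMATS["VOTABLE"]  # default format


def response_head(
    row_count: int,
    output_param: Optional[str] = None,
    accept_header: Optional[str] = None,
) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a valid query."""
    if row_count == 0:
        return 204, {}  # empty result: no message body, no Content-Type
    return 200, {"Content-Type": choose_format(output_param, accept_header)}


print(response_head(3, output_param="CSV", accept_header="text/xml"))
print(response_head(0))
```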
Output Result Formats
• The use of OUTPUT and “Accept:” HTTP header
  • Advantage: consistent across query methods, allows direct human readable webpage interface for a web browser
  • Disadvantage: output format must be a MIME type, need two methods to select format, because “Accept:” cannot always be used
  • Alternative: only use the OUTPUT parameter method
  • NOTE: OUTPUT is not the equivalent to FORMAT from DAL
• The use of “204 No Content” for empty responses
  • Advantage: consistent across result formats, processing power spared
  • Disadvantage: current clients may not check for this status code
  • Alternative: return empty VOTable (or equivalent in output format)
Metadata Access Format
• All the information needed to call the service should come from the service itself
• It should come from a unique endpoint, invoked via HTTP GET
• For Simple Access Query, we need: input PARAMs, output FIELDs
• For Complete Access Query, we need: tables/columns information
• Encoding this information in Registry format would ease many things
• If VOSI fulfils those requirements, it would be used for Metadata access
Error Responses
• Errors are reported using the HTTP status codes
• Here are a few examples:
  • 400 Bad Request - malformed input query
  • 404 Not Found - query to unsupported method
  • 500 Internal Server Error - general server error
  • 501 Not Implemented - unimplemented optional method
  • 502 Bad Gateway - backend error (DBMS, store, …)
• The message body should contain a textual explanation of the error
• Advantage: same system regardless of output format
• Disadvantage: limited list of codes we don’t control
• Alternative: classical VOTable-based error output
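The example codes above could be wired into a service roughly as sketched here. The error-kind keys and the helper name `error_response` are hypothetical labels chosen for this sketch; only the status codes and their meanings come from the slide.

```python
from http import HTTPStatus

# Hypothetical error kinds mapped to the status codes listed on the slide.
ERROR_STATUS = {
    "malformed_query": HTTPStatus.BAD_REQUEST,               # 400
    "unsupported_method": HTTPStatus.NOT_FOUND,              # 404
    "server_error": HTTPStatus.INTERNAL_SERVER_ERROR,        # 500
    "optional_method_missing": HTTPStatus.NOT_IMPLEMENTED,   # 501
    "backend_error": HTTPStatus.BAD_GATEWAY,                 # 502
}


def error_response(kind: str, explanation: str) -> tuple[int, str, bytes]:
    """Return (status code, status line text, message body) for an error."""
    # Unknown kinds fall back to the general 500 error.
    status = ERROR_STATUS.get(kind, HTTPStatus.INTERNAL_SERVER_ERROR)
    # The body carries the textual explanation, whatever the output format.
    return status.value, f"{status.value} {status.phrase}", explanation.encode("utf-8")


print(error_response("malformed_query", "Unbalanced parenthesis in ADQL query"))
```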
Conclusion
• Further work is needed on Metadata access, the reserved PARAMs list, output format, empty responses and the error mechanism for the proposal to be complete
• Compliant with second generation DAL services: the Simple Access Query method represents the first step (queryData method)
• Questions?