
SPARQL 201: Construct queries and data maintenance

SPARQL 201: Construct queries and data maintenance. Nicholas Rejack – nrejack@ufl.edu. VIVO Implementation Fest – Boulder, CO. Wednesday, May 16, 2012 – 3:30 – 4:15 PM.


Presentation Transcript


  1. SPARQL 201: Construct queries and data maintenance Nicholas Rejack – nrejack@ufl.edu VIVO Implementation Fest – Boulder, CO Wednesday, May 16, 2012 – 3:30 – 4:15 PM

  2. Queries for exploring unfamiliar data
     • When encountering unfamiliar endpoints, how do you explore them?
     • First: check what ontologies exist (Ex. (1))
     • Second: find out about the ontologies (2)
     • Third: look at all the classes (3), object properties (4), datatype properties (5)
     • Examine them further, and so on…
     • How do we know which classes are populated, and how many?
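
The last question on this slide can be answered with a query along the following lines. This is a minimal sketch, not one of the numbered examples from the talk, and it assumes the endpoint supports SPARQL 1.1 aggregates (COUNT and GROUP BY). It lists every class that has at least one instance, with the most heavily populated classes first.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# List every populated class together with its number of instances.
SELECT ?class (COUNT(?instance) AS ?instances)
WHERE {
  ?instance rdf:type ?class .
}
GROUP BY ?class
ORDER BY DESC(?instances)
```

On a large endpoint, adding a LIMIT clause keeps the first exploratory run short.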

  3. CONSTRUCT: creating data for removal and transformation
     • CONSTRUCT syntax:
         CONSTRUCT { graph pattern }
         WHERE { matching graph pattern }
     • Output to RDF/XML, etc.
     • Uses:
       • Get all the uses of one predicate out of VIVO
       • Transform all data in a predictable way
     • Examples (6) (7)
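
As an illustration of the "transform" use, the sketch below copies every rdfs:label value onto a different predicate. It is not one of the talk's numbered examples: the ex: namespace and ex:displayName are hypothetical names chosen only for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical namespace, for illustration only

# For every labelled resource, emit a new triple that carries the same
# value under a different predicate.
CONSTRUCT {
  ?s ex:displayName ?label .
}
WHERE {
  ?s rdfs:label ?label .
}
```

Using the same predicate in both the CONSTRUCT template and the WHERE pattern returns the existing triples unchanged, which is the form to use when collecting a batch of statements for removal.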

  4. Finding data that may not be there: OPTIONAL
     • A graph pattern matches only when every term in it matches
     • What if some terms are missing? Use an OPTIONAL clause
     • Warning: multiple OPTIONALs can slow a query down
     • Syntax:
         SELECT *
         WHERE
         {
           OPTIONAL { graph pattern }
         }
     • (8)
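
A small sketch of OPTIONAL in use, assuming people are typed as foaf:Person and that some of them carry a foaf:mbox value (which properties a given dataset actually uses is an assumption here). People without a mailbox still appear in the results, with ?mbox left unbound.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?label ?mbox
WHERE {
  # Required part: every result row must match these two triples.
  ?person rdf:type foaf:Person ;
          rdfs:label ?label .
  # Optional part: binds ?mbox when present, leaves it unbound otherwise.
  OPTIONAL { ?person foaf:mbox ?mbox . }
}
```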

  5. Sorting: ORDER BY
     • Order results with the ORDER BY keyword
     • You can even sort on a bound variable that does not appear in the results (i.e., one that is not in the SELECT clause)
     • Syntax:
         SELECT *
         WHERE
         {
           Graph pattern
         }
         ORDER BY bound variable
     • (9), (10)
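
The point about sorting on a variable that is not returned can be sketched as follows, assuming people are typed as foaf:Person and have an rdfs:label. Only ?person is projected, but the rows come back ordered by label.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# ?label is bound in the WHERE clause and used for sorting,
# but it is not part of the SELECT clause.
SELECT ?person
WHERE {
  ?person rdf:type foaf:Person ;
          rdfs:label ?label .
}
ORDER BY ?label
```

Wrapping the variable as DESC(?label) reverses the sort order.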

  6. Negation
     • Two options: NOT EXISTS and !bound
     • Syntax (NOT EXISTS is written inside a FILTER):
         SELECT * WHERE
         {
           Graph pattern
           FILTER NOT EXISTS { graph pattern }
         }
     • (11)
     • Syntax (!bound):
         SELECT *
         WHERE
         {
           Graph pattern
           OPTIONAL { graph pattern that binds ?var }
           FILTER (!bound(?var))
         }
     • (12) (12b)
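
A minimal sketch of the first option. In SPARQL 1.1, NOT EXISTS appears inside a FILTER, so the query below (an illustration, not one of the talk's numbered examples) finds people who have no rdfs:label at all. It assumes people are typed as foaf:Person.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person
WHERE {
  ?person rdf:type foaf:Person .
  # Keep only the people for whom no label triple exists.
  FILTER NOT EXISTS { ?person rdfs:label ?anyLabel . }
}
```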

  7. Data cleanup
     • SPARQL is one of the best tools for data cleanup.
     • Use cases:
       • Grab a batch of related statements and delete them en masse.
       • Generate missing data and upload it.
       • Find missing property statements, such as people with no link to a position. Use !BOUND.
       • Find values that have fewer than the required number of digits.
       • Find similar names. Match on last name and first initial, first 2 letters, etc.
       • Examine everything in a certain class that isn’t in another specified class.
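
The "people with no link to a position" case might look like the sketch below, using the OPTIONAL plus !bound idiom. The property vivo:personInPosition is an assumption about the VIVO core ontology version in use; substitute whatever property links a person to a position in your data.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vivo: <http://vivoweb.org/ontology/core#>

SELECT ?person
WHERE {
  ?person rdf:type foaf:Person .
  # Assumed property name; adjust to the ontology version you are running.
  OPTIONAL { ?person vivo:personInPosition ?position . }
  # Keep only the rows where the optional part found nothing.
  FILTER (!bound(?position))
}
```

Changing SELECT to a CONSTRUCT with a suitable template turns the same pattern into a batch of statements that can be reviewed before upload or removal.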

  8. Regex matching
     • A few variants:
     • Matching a string of a particular length:
         SELECT * WHERE
         {
           Graph pattern
           FILTER (regex(bound variable, "condition"))
         }
     • (14)
     • Match eight characters ending with 1:
         Condition = ".......1"
     • Require a minimum length (here, seven or more characters):
         Condition = ".......+"
     • Exact match:
         Condition = "matching_string" (note: regex is not required; you can match the literal directly, e.g. ?org ufVivo:harvestedBy "DSR-Harvester" .)
     • regex() matches anywhere in the string, so anchor the condition with ^ and $ when it must cover the whole value.
     • Use negation where applicable.
     • Match the beginning of a string:
         Condition = "^abcdef"
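
Combining regex with negation gives a simple check for malformed identifiers. The sketch below reports every value that is not exactly eight digits. The ex: namespace and ex:idNumber are hypothetical stand-ins for whatever identifier property your data uses.

```sparql
PREFIX ex: <http://example.org/ns#>   # hypothetical namespace, for illustration only

SELECT ?s ?id
WHERE {
  ?s ex:idNumber ?id .
  # ^ and $ anchor the pattern to the whole value; the leading ! negates
  # the match, so only values that are NOT exactly eight digits remain.
  FILTER (!regex(str(?id), "^[0-9]{8}$"))
}
```

The str() call converts typed literals to plain strings before matching, which avoids type errors when the identifier is stored with a datatype.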

  9. GROUP BY and HAVING
     • When variables have multiple bindings, you get a returned row for each.
     • E.g., "show me everyone’s label" returns one row per URI and label pair: if a URI has 4 labels, you get 4 rows
     • Not useful for counting
     • Use GROUP BY to collapse on a particular bound variable
     • Use HAVING to filter on a numeric expression
     • Example: find entities with > 1 label (15)
     • Side note: use >, < with numeric values to filter
     • E.g., "FILTER (?value < 10)"
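
The "more than one label" case can be sketched as follows, assuming a SPARQL 1.1 endpoint. This is an illustration written for this transcript, not the talk's example (15).

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One row per entity, with the number of labels it carries,
# restricted to entities that have more than one label.
SELECT ?entity (COUNT(?label) AS ?labelCount)
WHERE {
  ?entity rdfs:label ?label .
}
GROUP BY ?entity
HAVING (COUNT(?label) > 1)
```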

  10. The black art of query optimization
      • SPARQL can be very slow.
      • How to speed up results:
        1) Minimize the number of OPTIONALs you use
        2) "Pre-load" your queries by reducing your result set earlier:
           Instead of:
             ?x rdf:type foaf:Person .
             ?x ufVivo:ufid ?ufid .
           Reduce the result set with:
             ?x ufVivo:ufid ?ufid .   (we assume only people have UFIDs)
        3) Change scope by wrapping lines in { }. Experiment! (Thanks to Alex)
        4) Write your queries iteratively: comment out lines (#) and slowly increase the complexity
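
Tips 2 to 4 can be combined in one query, sketched below. The ufVivo namespace URI is a placeholder, since the slides do not give it, and whether the nested group actually helps depends on the query engine, so treat this as a starting point for experimentation rather than a guaranteed speed-up.

```sparql
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ufVivo: <http://example.org/ufVivo#>   # placeholder URI, not the real namespace

SELECT ?x ?ufid ?label
WHERE {
  # Tip 2: start with the most selective pattern.
  ?x ufVivo:ufid ?ufid .
  # Redundant if only people have UFIDs, so it stays commented out (tip 4):
  # ?x rdf:type foaf:Person .
  # Tip 3: a nested group changes the scope in which this pattern is evaluated.
  { ?x rdfs:label ?label . }
}
```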
