Open Data in Agriculture. Hands-on with data infrastructures that can power your agricultural data products. 12/12/2013 Athens, Greece. Supported by EU projects. OpenLearn and the SPARQL endpoint. Maths, Computing and Technology Faculty The Open University Walton Hall Milton Keynes MK7 6AA.
Jane Bromley, David King, David Morse (www.open.ac.uk, mct-research.open.ac.uk)
Objectives • An introduction to the Open University’s free material • Show the available metadata • Talk about RDF, the data model behind graph databases • How to query the material through SPARQL
http://www.open.edu/openlearn/body-mind/the-real-story-behind-cereals
http://www.open.edu/openlearn/nature-environment/good-food-destroying-biodiversity
http://www.open.edu/openlearn/science-maths-technology/science/biofuels/content-section-0
• Open Research Online: publications originating from OU researchers • OU Podcasts • Course descriptions • Some KMi (Knowledge Media Institute) datasets • And…
http://data.open.ac.uk/site/datasets.html All available through W3C standards: the data is published as RDF and can be queried with SPARQL
RDF: Resource Description Framework
• one of the basic building blocks of the web of semantic data
• describes data as a graph, the model stored by graph databases (triple stores)
• expresses data as statements, each made of a subject, a predicate (property) and an object; for example, the subject is the T-shirt, the predicate (property) is the colour and the object is white
• a subject->predicate->object statement is called a triple
RDF/XML, the XML form of RDF:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:feature="http://www.linkeddatatools.com/clothing-features#">
  <rdf:Description rdf:about="http://www.linkeddatatools.com/clothes#t-shirt">
    <feature:color rdf:resource="http://www.linkeddatatools.com/colors#white"/>
  </rdf:Description>
</rdf:RDF>
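To see how this RDF/XML encodes a single triple, here is a minimal sketch that uses only Python's standard `xml.etree.ElementTree`. It is not a full RDF/XML parser (a dedicated library such as rdflib covers the whole syntax); the helpers `expand` and `triples_from_rdfxml` are hypothetical and only understand the simple pattern on this slide: nodes carrying `rdf:about`, whose child properties hold either an `rdf:resource` or literal text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The T-shirt example from the slide, as well-formed RDF/XML bytes.
TSHIRT = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:feature="http://www.linkeddatatools.com/clothing-features#">
  <rdf:Description rdf:about="http://www.linkeddatatools.com/clothes#t-shirt">
    <feature:color rdf:resource="http://www.linkeddatatools.com/colors#white"/>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """ElementTree writes prefixed names as '{namespace}local'; join them into one URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def triples_from_rdfxml(data):
    """Yield (subject, predicate, object) triples from simple RDF/XML.

    Sketch only: handles top-level nodes with rdf:about whose child
    properties carry rdf:resource (a URI) or plain text (a literal).
    """
    root = ET.fromstring(data)
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            yield (subject, expand(prop.tag), obj)


# Prints one tuple: the T-shirt, the colour property, and white.
for triple in triples_from_rdfxml(TSHIRT):
    print(triple)
```

The predicate comes out as a full URI (namespace plus `color`), which is why RDF tools compare properties by URI rather than by the `feature:` prefix.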
The SPARQL endpoint http://data.open.ac.uk/query
select distinct ?props
from <http://data.open.ac.uk/context/openlearn>
where { ?subj ?props ?obj }
This lists every distinct property (predicate) used in the OpenLearn graph.
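One way to send such a query from Python is the SPARQL 1.1 Protocol: an HTTP GET with the query in a `query` URL parameter, asking for the standard SPARQL JSON results format. This is a sketch that assumes the endpoint at http://data.open.ac.uk/query (the address on the slide, which may have changed since 2013) accepts protocol-style GET requests; `build_sparql_request`, `binding_values` and `run_sparql` are illustrative names, not part of any OU library.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide; assumed, not guaranteed, to still be live.
ENDPOINT = "http://data.open.ac.uk/query"

PROPS_QUERY = """
select distinct ?props
from <http://data.open.ac.uk/context/openlearn>
where { ?subj ?props ?obj }
"""


def build_sparql_request(endpoint, query):
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def binding_values(results_json, var):
    """Return the value bound to `var` in each result row (rows without it are skipped)."""
    results = json.loads(results_json)
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


def run_sparql(endpoint, query, var):
    """Send the query and return the values of one variable; needs network access."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as resp:
        return binding_values(resp.read().decode("utf-8"), var)


# Example (requires a reachable endpoint):
# for prop in run_sparql(ENDPOINT, PROPS_QUERY, "props"):
#     print(prop)
```

Keeping request building and result parsing separate from the network call makes both easy to check offline.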
A three-step process: 1. Find all the subjects and choose those relevant to agriculture 2. Find all the OpenLearn units that have at least one of these subjects 3. Collect the metadata for each of the selected OpenLearn units
Topics: 1130 as of the end of October 2013 http://data.open.ac.uk/topic/psychology http://data.open.ac.uk/topic/sociology http://data.open.ac.uk/topic/social_care http://data.open.ac.uk/topic/educational_practice http://data.open.ac.uk/topic/biology http://data.open.ac.uk/topic/herbicides http://data.open.ac.uk/topic/energyofficial1342688874openlearn_teamadmin http://data.open.ac.uk/topic/unitsdefault1330523206frank_siebertzz884926 http://data.open.ac.uk/topic/pre_course_workdefault1263940536linda_smithlps32 http://data.open.ac.uk/topic/employmentofficial1342688874richard_howesrh4685 http://data.open.ac.uk/topic/using_mathsdefault1231080717peter_mcalisterzz298445 http://data.open.ac.uk/topic/numbersdefault1330523196elizabeth_ellisee944 http://data.open.ac.uk/topic/nuclearofficial1342688874lucy_hendylmf7 http://data.open.ac.uk/topic/environmental_science http://data.open.ac.uk/topic/audio http://data.open.ac.uk/topic/cctv http://data.open.ac.uk/topic/social_work http://data.open.ac.uk/topic/scotland http://data.open.ac.uk/topic/personalisation http://data.open.ac.uk/topic/religious_studies http://data.open.ac.uk/topic/religion …
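Step 1 starts from the full topic list. Below is a sketch of a query that could produce it, assuming units link to topics through `http://purl.org/dc/terms/subject`, the property the step 2 query filters on. The authors chose their agricultural topics by hand; the keyword shortlist here (`KEYWORDS`, `local_name`, `shortlist`, all hypothetical) is only a possible aid for that manual review, not their method.

```python
TOPICS_QUERY = """
select distinct ?topic
from <http://data.open.ac.uk/context/openlearn>
where { ?olu <http://purl.org/dc/terms/subject> ?topic }
"""

# Hypothetical starter keywords; a person still makes the final choice.
KEYWORDS = ("agri", "environment", "ecolog", "bio", "climate", "food", "plant")


def local_name(uri):
    """Return the last path segment, e.g. '.../topic/biofuels' -> 'biofuels'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def shortlist(topic_uris, keywords=KEYWORDS):
    """Keep topics whose local name contains any keyword, preserving order."""
    return [
        uri for uri in topic_uris
        if any(k in local_name(uri).lower() for k in keywords)
    ]


sample = [
    "http://data.open.ac.uk/topic/psychology",
    "http://data.open.ac.uk/topic/biology",
    "http://data.open.ac.uk/topic/environmental_science",
    "http://data.open.ac.uk/topic/audio",
]
print(shortlist(sample))
```

Keyword matching both over- and under-matches: it keeps `biology`, which is not among the 40 chosen topics, and misses `herbicides` and `overfishing`, which are. That is why it can only narrow the list before a human decides.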
Topics relevant to agriculture? 40 topics chosen: <http://data.open.ac.uk/topic/agriculture>, <http://data.open.ac.uk/topic/environment>, <http://data.open.ac.uk/topic/the_environment>, <http://data.open.ac.uk/topic/nature_&_environment>, <http://data.open.ac.uk/topic/environmental_science>, <http://data.open.ac.uk/topic/herbicides>, <http://data.open.ac.uk/topic/ecology>, <http://data.open.ac.uk/topic/genetics>, <http://data.open.ac.uk/topic/diversity>, <http://data.open.ac.uk/topic/global_warming>, <http://data.open.ac.uk/topic/biodiversity>, <http://data.open.ac.uk/topic/pollution>, <http://data.open.ac.uk/topic/conservation>, <http://data.open.ac.uk/topic/the_environment>, <http://data.open.ac.uk/topic/climate>, <http://data.open.ac.uk/topic/environmental_studies>, <http://data.open.ac.uk/topic/climate_change>, <http://data.open.ac.uk/topic/sustainability>, <http://data.open.ac.uk/topic/biogas>, <http://data.open.ac.uk/topic/biofuels>, <http://data.open.ac.uk/topic/photosynthesis>, <http://data.open.ac.uk/topic/waste_management>, <http://data.open.ac.uk/topic/landfill>, <http://data.open.ac.uk/topic/economic_growth>, <http://data.open.ac.uk/topic/waste>, <http://data.open.ac.uk/topic/acid_rain>, <http://data.open.ac.uk/topic/weather>, <http://data.open.ac.uk/topic/meteorology>, <http://data.open.ac.uk/topic/natural_resources>, <http://data.open.ac.uk/topic/animals>, <http://data.open.ac.uk/topic/ecological_sustainability>, <http://data.open.ac.uk/topic/overfishing>, <http://data.open.ac.uk/topic/ecosystem>, <http://data.open.ac.uk/topic/the_end_of_nature>, <http://data.open.ac.uk/topic/survival_of_the_fittest>, <http://data.open.ac.uk/topic/barter>, <http://data.open.ac.uk/topic/plants>, <http://data.open.ac.uk/topic/freshwater>, <http://data.open.ac.uk/topic/maps>, <http://data.open.ac.uk/topic/food> …
select distinct ?olu
from <http://data.open.ac.uk/context/openlearn>
where {
  ?olu <http://purl.org/dc/terms/subject> ?topic .
  filter ( ?topic in ( <http://data.open.ac.uk/topic/agriculture>, <http://data.open.ac.uk/topic/environment>, … etc. ) )
}
→ 85 OpenLearn units
Units are extracts from OU courses, with multiple pages of material, and are expected to take many hours of study.
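Typing 40 URIs into the `filter ( ?topic in ( … ) )` clause by hand is error-prone, and the chosen-topics list on the previous slide names `the_environment` twice. A small sketch, using a hypothetical helper `units_query`, that builds the step 2 query from a Python list of topic names and drops duplicates while keeping their order:

```python
TOPIC_BASE = "http://data.open.ac.uk/topic/"


def units_query(topic_names):
    """Build the step 2 query: OpenLearn units tagged with at least one topic."""
    unique = list(dict.fromkeys(topic_names))  # de-duplicate, keep first-seen order
    iris = ",\n      ".join(f"<{TOPIC_BASE}{name}>" for name in unique)
    return (
        "select distinct ?olu\n"
        "from <http://data.open.ac.uk/context/openlearn>\n"
        "where {\n"
        "  ?olu <http://purl.org/dc/terms/subject> ?topic .\n"
        f"  filter ( ?topic in (\n      {iris} ) )\n"
        "}\n"
    )


print(units_query(["agriculture", "environment", "the_environment", "the_environment"]))
```

`select distinct` already stops a unit from appearing twice when it carries several chosen topics; de-duplicating the topic list simply keeps the query short and readable.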
http://data.open.ac.uk/openlearn/s250_3 http://data.open.ac.uk/openlearn/sdk125_1 http://data.open.ac.uk/openlearn/t123_1 http://data.open.ac.uk/openlearn/t206_2 http://data.open.ac.uk/openlearn/t213_1 http://data.open.ac.uk/openlearn/s173_1 http://data.open.ac.uk/openlearn/u116_3 http://data.open.ac.uk/openlearn/s278_19 http://data.open.ac.uk/openlearn/t306_3 http://data.open.ac.uk/openlearn/s189_1 http://data.open.ac.uk/openlearn/s344_1 http://data.open.ac.uk/openlearn/s324_1 http://data.open.ac.uk/openlearn/s250_2 … …
http://data.open.ac.uk/openlearn/s250_2
http://www.open.edu/openlearn/science-maths-technology/science/environmental-science/social-issues-and-gm-crops/content-section-0
This unit is an adapted extract from the course Science in context (S250).
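The link from a unit URI to its web page is itself in the data. The RDF/XML for s250_2 on a later slide records the page with a `locator` property from the W3C Media Ontology draft namespace and the course with `relatesToCourse`. Below is a sketch of a query that fetches both for any unit. It assumes those triples live in the same OpenLearn graph. `unit_links_query` is a hypothetical helper, and the predicate URIs are copied from that output.

```python
LOCATOR = "http://www.w3.org/TR/2010/WD-mediaont-10-20100608/locator"
RELATES_TO_COURSE = "http://data.open.ac.uk/openlearn/ontology/relatesToCourse"


def unit_links_query(unit_id):
    """Query for a unit's web page (locator) and, if recorded, its course."""
    unit = f"<http://data.open.ac.uk/openlearn/{unit_id}>"
    return (
        "select ?page ?course\n"
        "from <http://data.open.ac.uk/context/openlearn>\n"
        "where {\n"
        f"  {unit} <{LOCATOR}> ?page .\n"
        f"  optional {{ {unit} <{RELATES_TO_COURSE}> ?course }}\n"
        "}\n"
    )


print(unit_links_query("s250_2"))
```

The `optional` block means a unit with no recorded course still returns its page, with `?course` left unbound.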
Python script to dump the metadata

import urllib.request

# To run: python get_SPARQL_from_OpenData.py
# Edit this file in two places to choose the output format: JSON or RDF/XML


def run_SPARQL(course_id):
    '''Return the RDF description of one OpenLearn unit as text.

    Despite the name, this does not send a SPARQL query: it requests the
    unit's URI directly and uses the Accept header to choose the RDF format.
    '''
    # EDIT HERE
    # place course_id in request
    # req = urllib.request.Request('http://data.open.ac.uk/openlearn/{}'.format(course_id), headers={'Accept': 'application/rdf+json'})
    req = urllib.request.Request('http://data.open.ac.uk/openlearn/{}'.format(course_id),
                                 headers={'Accept': 'application/rdf+xml'})
    # fire off the request; the with-block closes the connection afterwards
    with urllib.request.urlopen(req) as f:
        # pass back the result, decoded from bytes to readable text first
        return f.read().decode('utf-8')


if __name__ == '__main__':
    llist = ['a180_2', 'b823_1', 'd837_1', 'dd100_7', 'e500_11', 'k111_1',
             # … (further unit ids elided on the slide)
             ]
    for course_id in llist:
        print(course_id)
        # run query with chosen course id
        result = run_SPARQL(course_id)
        # EDIT HERE
        # with open('{}.json'.format(course_id), 'w', encoding='utf-8', newline='\n') as f:
        with open('{}.xml'.format(course_id), 'w', encoding='utf-8', newline='\n') as f:
            f.write(result)
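The script switches format by editing two places, which makes it easy to request JSON but still save to a `.xml` file. Here is a sketch of an alternative that takes the format as an argument, so the Accept header and file extension always agree. `FORMATS`, `unit_request` and `dump_unit` are hypothetical names. The two media types are the ones the script already uses.

```python
import urllib.request

# Media type to request and file extension to save, per output format.
FORMATS = {
    "xml": ("application/rdf+xml", "xml"),
    "json": ("application/rdf+json", "json"),
}


def unit_request(course_id, fmt="xml"):
    """Build the content-negotiated GET request for one OpenLearn unit."""
    media_type, _ = FORMATS[fmt]
    return urllib.request.Request(
        f"http://data.open.ac.uk/openlearn/{course_id}",
        headers={"Accept": media_type},
    )


def dump_unit(course_id, fmt="xml"):
    """Fetch one unit and save it as <course_id>.<ext>; needs network access."""
    _, ext = FORMATS[fmt]
    with urllib.request.urlopen(unit_request(course_id, fmt)) as resp:
        text = resp.read().decode("utf-8")
    filename = f"{course_id}.{ext}"
    with open(filename, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
    return filename


# Example (network access required):
# for course_id in ["a180_2", "b823_1"]:
#     dump_unit(course_id, fmt="json")
```

An unknown format name fails immediately with a `KeyError` from `FORMATS`, before any request is sent.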
JSON format
{ "http://data.open.ac.uk/openlearn/s250_2" : {
    "http://purl.org/dc/terms/language" : [ { "type" : "literal" , "value" : "en-gb" , "datatype" : "http://www.w3.org/2001/XMLSchema#string" } ] ,
    "http://data.open.ac.uk/openlearn/ontology/relatesToCourse" : [ { "type" : "uri" , "value" : "http://data.open.ac.uk/course/s250" } ] ,
    "http://purl.org/dc/terms/title" : [ { "type" : "literal" , "value" : "Social issues and GM crops" , "datatype" : "http://www.w3.org/2001/XMLSchema#string" }
    … …
RDF/XML format
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://dbpedia.org/property/"
         xmlns:j.1="http://xmlns.com/foaf/0.1/"
         xmlns:j.3="http://web.resource.org/cc/"
         xmlns:j.2="http://www.w3.org/TR/2010/WD-mediaont-10-20100608/"
         xmlns:j.4="http://purl.org/dc/terms/"
         xmlns:j.5="http://data.open.ac.uk/openlearn/ontology/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <j.1:Document rdf:about="http://data.open.ac.uk/openlearn/s250_2">
    <j.2:locator rdf:resource="http://www.open.edu/openlearn/nature-environment/the-environment/environmental-science/social-issues-and-gm-crops/content-section-0"/>
    <j.5:relatesToCourse rdf:resource="http://data.open.ac.uk/course/s250"/>
    <j.4:creator rdf:resource="http://data.open.ac.uk/organization/the_open_university"/>
    <j.4:subject rdf:resource="http://data.open.ac.uk/topic/risk"/>
    <j.4:published rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-06-02T23:00:00Z</j.4:published>
    … …
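The JSON output follows the RDF/JSON layout: subject URI, then predicate URI, then a list of value objects. Any property may therefore hold several values. Here is a sketch that flattens one unit into a simple record for a data product. `values` and `summarise_unit` are hypothetical helpers that read only the three properties visible on this slide. The sample reuses the slide's values and drops the `datatype` fields for brevity.

```python
import json

DCT = "http://purl.org/dc/terms/"
RELATES_TO_COURSE = "http://data.open.ac.uk/openlearn/ontology/relatesToCourse"


def values(node, predicate):
    """All 'value' strings recorded for one predicate (empty list if absent)."""
    return [v["value"] for v in node.get(predicate, [])]


def summarise_unit(rdf_json_text, unit_uri):
    """Pick title, language(s) and course(s) for one unit from RDF/JSON text."""
    node = json.loads(rdf_json_text).get(unit_uri, {})
    titles = values(node, DCT + "title")
    return {
        "unit": unit_uri,
        "title": titles[0] if titles else None,
        "language": values(node, DCT + "language"),
        "courses": values(node, RELATES_TO_COURSE),
    }


sample = json.dumps({
    "http://data.open.ac.uk/openlearn/s250_2": {
        DCT + "language": [{"type": "literal", "value": "en-gb"}],
        RELATES_TO_COURSE: [{"type": "uri", "value": "http://data.open.ac.uk/course/s250"}],
        DCT + "title": [{"type": "literal", "value": "Social issues and GM crops"}],
    }
})
print(summarise_unit(sample, "http://data.open.ac.uk/openlearn/s250_2"))
```

Multi-valued fields such as language and course stay as lists, so nothing is silently dropped if a unit relates to more than one course.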
Summary: A three-step process: 1. Find all subjects/keywords relevant to agriculture 2. Identify the OpenLearn units with these subjects 3. Collect the metadata for each OpenLearn unit. All the scripts (and more) are available.
Thanks j.m.bromley@open.ac.uk David.King@open.ac.uk David.Morse@open.ac.uk