Scratchpads for community involvement for natural history collections
Dr Dimitris Koureas
Biodiversity Informatics Group | Department of Life Sciences
Natural History Museum London
Fourth Annual Summit | Feb 21-23 | Tucson, AZ
What are we trying to tackle?
The long tail of biodiversity data
• Inaccessible | native formats, private silos
• Disconnected | not aggregated or discoverable
• Redundant | overlapping efforts, no coordination
• Cluttered | small and dispersed datasets
Typically produced by small communities
[Figure: long-tail chart, labelled 20% and 80%]
Can VREs help?
Virtual Research Environments are efficient in incentivising and enabling researchers to mobilise their data
• Online collaborative environments
• Underlying technologies that help semantically enrich and aggregate data at a higher level
• Provide efficient tools, simple interfaces and comprehensive documentation
• Incentivise researchers to enter, share and finally mobilise their datasets
"Our goal is to make every researcher digital"
Enter – Structure – Curate – Link – Publish
Biodiversity data online
7 years of continuing development | 3 major grants | Industry-leading platform
660 Scratchpads communities by 7,100 active registered users, covering 90,000 taxa in 615,000 pages
More than 1,600,000 visitors in total
65,000 unique visitors per month to Scratchpads sites
Taxa (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies) Conservation Projects Regions Societies
• Biodiversity standards (TDWG, DwC, Audubon)
• In-house data
• External data & services
A Scratchpad is a gateway to big data
Source data: Unstructured | Overlapping | Disconnected | Native formats & vocabularies
Atomisation
Controlled vocabularies are key for efficient data capture
Mark-up / Data annotation
Aggregate | Publish | Collaborate | Curate | Link
In order to capture and annotate data we need:
1. Fine-grained pre-defined fields
2. Comprehensive controlled vocabularies
What we currently use
DwC | IUCN | ISO web service | EOL
Capturing specimen record data
• Taxonomic identification
• Date
• Collector
• Location
• Continent
• Country
• State/Province
• GPS
• Locality
  - Area/place
  - Habitat
  - Substrate (Environmental material)
  - Biome
Generating character/trait projects
• Morphological/anatomical characters
• Ecological traits
Usually transcribed from label | Some inferred by curators | Provided in a highly inconsistent way
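The specimen-record fields listed above could be captured roughly as follows. This is a minimal sketch under stated assumptions, not the Scratchpads implementation: the field names follow Darwin Core term names where one exists, while `HABITAT_VOCABULARY`, its values, and the `validate_record` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary for habitat; a real one would be
# maintained and shared by the community rather than hard-coded.
HABITAT_VOCABULARY = {"forest", "grassland", "wetland", "marine", "freshwater"}

# Illustrative continent list used as a second controlled vocabulary.
CONTINENTS = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
}


@dataclass
class SpecimenRecord:
    """Fine-grained, pre-defined fields for one specimen record."""
    scientificName: str                       # taxonomic identification
    eventDate: str                            # collection date
    recordedBy: str                           # collector
    continent: str
    country: str
    stateProvince: Optional[str] = None
    decimalLatitude: Optional[float] = None   # GPS
    decimalLongitude: Optional[float] = None  # GPS
    locality: Optional[str] = None            # area/place
    habitat: Optional[str] = None


def validate_record(record: SpecimenRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if record.continent not in CONTINENTS:
        problems.append(f"continent not in vocabulary: {record.continent!r}")
    if record.habitat is not None and record.habitat.lower() not in HABITAT_VOCABULARY:
        problems.append(f"habitat not in vocabulary: {record.habitat!r}")
    if record.decimalLatitude is not None and not -90 <= record.decimalLatitude <= 90:
        problems.append("decimalLatitude out of range")
    if record.decimalLongitude is not None and not -180 <= record.decimalLongitude <= 180:
        problems.append("decimalLongitude out of range")
    return problems
```

A free-text habitat such as "under a log" would be reported as a problem here instead of being stored silently, which is the kind of inconsistency a controlled vocabulary is meant to catch at capture time.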
Users of Virtual Research Environments: consumers of, as well as contributors to, ontologies
Top-down approach
Communities working on ontologies
Biodiversity communities
Bottom-up approach
End-user community involvement
[Figure labels: Bottom-up approach | Community involvement | Top-down approach | Ontology granularity | Deep hierarchy | High-level approach]
1. Simple and intuitive end-user products
2. Controlled vocabularies over highly structured ontologies for data capture
3. Mechanisms for updating vocabularies based on users' custom entries
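One way the third point could work is sketched below. It is an assumption-laden illustration, not a description of how Scratchpads handles this: the `GrowingVocabulary` class, its method names, and the `promote_after` threshold are all hypothetical. The idea is that custom entries are accepted, counted, and offered to a curator as candidate terms once enough users have entered them.

```python
from collections import Counter


class GrowingVocabulary:
    """Controlled vocabulary that tracks custom entries as candidate terms."""

    def __init__(self, terms, promote_after=3):
        self.terms = {t.strip().lower() for t in terms}
        self.promote_after = promote_after  # hypothetical review threshold
        self.custom_counts = Counter()

    def record_entry(self, value):
        """Classify an entry and count it if it is not yet a controlled term."""
        term = value.strip().lower()
        if term in self.terms:
            return "controlled"
        self.custom_counts[term] += 1
        return "custom"

    def candidates(self):
        """Custom terms entered often enough to be worth a curator's review."""
        return sorted(
            t for t, n in self.custom_counts.items() if n >= self.promote_after
        )

    def promote(self, term):
        """Add a reviewed custom term to the controlled vocabulary."""
        term = term.strip().lower()
        self.terms.add(term)
        self.custom_counts.pop(term, None)
```

Keeping promotion as an explicit step leaves the final decision with a curator, so the vocabulary grows from the bottom up without every custom entry becoming a term automatically.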
Before we can widely implement the use of ontologies we need to shift from one to another
Ontologies as Research
• Good for knowledge representation and reasoning
• Concerns specific communities
• Experimentation
• Frequent changes
Ontologies as Infrastructure
• Good for data capturing
• Communal / agreed
• Persistent
• Essential
• Robust & reliable
The e-infrastructures pyramid
• Researchers & research communities
• VREs, online tools
• Web services, interoperable systems
• Ontologies, computing
Vertical approach | Horizontal approach
Thank you
Comments/questions?
@DimitrisKoureas | d.koureas@nhm.ac.uk