TUE / Information Retrieval
Reference Structures
part 2 – NSR architecture
Trezorix & RNA-project
TUE / Information Retrieval / Reference Structures (2)
contents
chapter 1 – flat file system: advantages / disadvantages
chapter 2 – search engine: features / performance comparison with database
chapter 3 – identifiers: scientific names / TCN code / URIs as identifiers / digital object identifiers
chapter 4 – external links: live queries / server side – client side / findability
chapter 5 – NSR between source databases: what are source databases? / related issues
chapter 6 – role of NSR towards companion databases or websites: naming system / presentation of taxonomic structure / how far can we go?
chapter 7 – data models: reference structures / taxonomic objects / extra data
flat file system / advantages and disadvantages
advantages
- all relevant information is gathered closely around the concept, so the data belonging to a concept do not have to be assembled with separate queries
- speed
disadvantages
- redundancy
- no guaranteed integrity
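To make the trade-off concrete, here is a minimal Python sketch using a hypothetical two-species data set (the field names and the renamed family are invented, not the NSR's actual structure). Reading a concept from the flat layout takes one lookup, while the normalised layout needs a second lookup (a join). Renaming the family, however, has to touch every redundant copy in the flat layout, and an interrupted update leaves the data inconsistent.

```python
import copy

# Normalised layout (hypothetical): the family name is stored once, referenced by id.
FAMILIES = {1: "Paridae"}
SPECIES_NORM = {
    "Parus major": {"family_id": 1},
    "Cyanistes caeruleus": {"family_id": 1},
}

# Flat layout (hypothetical): each concept record carries its own copy of the family name.
SPECIES_FLAT = {
    "Parus major": {"family": "Paridae"},
    "Cyanistes caeruleus": {"family": "Paridae"},
}


def family_normalised(name):
    # two lookups: species record, then the family table (the "join")
    return FAMILIES[SPECIES_NORM[name]["family_id"]]


def family_flat(name):
    # one lookup: everything is gathered around the concept
    return SPECIES_FLAT[name]["family"]


def rename_family_flat(records, old, new, stop_after=None):
    """Rename a family in the flat layout by updating every redundant copy.

    `stop_after` simulates an update that is interrupted after n records.
    Returns the number of records changed.
    """
    changed = 0
    for rec in records.values():
        if stop_after is not None and changed >= stop_after:
            break
        if rec["family"] == old:
            rec["family"] = new
            changed += 1
    return changed


def distinct_families(records):
    # all records here belong to one family, so a consistent state has one name
    return {rec["family"] for rec in records.values()}


flat = copy.deepcopy(SPECIES_FLAT)
rename_family_flat(flat, "Paridae", "Paridae-renamed", stop_after=1)
print(distinct_families(flat))  # two spellings of one family: integrity is lost
```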
search engine / features
- phrase searching
- boolean operators
- proximity searching
- directed proximity searching
- phonic searching
- stemming
- numeric range searching
- fuzzy searching
- concept searching
- automatic term weighting
- positional scoring
- variable term weighting
- combining nearly all search types
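Several of these features rest on one data structure, a positional inverted index. The Python sketch below is not dtSearch's implementation or API; it only shows how boolean AND, proximity, directed proximity and phrase searching can be answered from such an index. Stemming, fuzzy and phonic searching are deliberately left out, which the last query makes visible.

```python
import re
from collections import defaultdict


def tokenize(text):
    # lower-case word tokens; any non-word character acts as a separator
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def boolean_and(index, *terms):
    """Documents containing all terms."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def near(index, a, b, k, directed=False):
    """Documents where b occurs within k words of a (only after a if directed)."""
    hits = set()
    for doc_id in boolean_and(index, a, b):
        for pa in index[a][doc_id]:
            for pb in index[b][doc_id]:
                d = pb - pa
                if (0 < d <= k) if directed else (0 < abs(d) <= k):
                    hits.add(doc_id)
    return hits


def phrase(index, a, b):
    # a two-word phrase is directed proximity at distance exactly 1
    return near(index, a, b, 1, directed=True)


docs = {
    1: "The great tit is a common songbird.",
    2: "A tit, great and small, visited the feeder.",
    3: "Common songbirds in gardens.",
}
idx = build_index(docs)
print(phrase(idx, "great", "tit"))             # {1}
print(near(idx, "great", "tit", 1))            # {1, 2}
print(boolean_and(idx, "common", "songbird"))  # {1}: without stemming doc 3 is missed
```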
search engine / performance comparison with database search
comparison of MS-SQL vs. dtSearch for 14 queries
- the columns show, for each query, the time taken (in seconds), the total number of documents returned by each program, and the documents returned by one program but not by the other
- dtSearch is faster for all queries except two, which are marked with asterisks
- the documents returned by dtSearch are always equal to, or a superset of, those returned by MS-SQL; MS-SQL missed some documents because of malformed punctuation, e.g. no space after a period, so that a term of interest is “conjoined” to the first word of the next sentence
source: Journal of the American Medical Informatics Association
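The missed documents come down to how text is split into words. The Python illustration below is simplified and does not reproduce MS-SQL's actual word breaker: a tokenizer that splits only on whitespace turns "carcinoma.No" into a single token, whereas one that treats every non-word character as a separator still finds the term.

```python
import re


def whitespace_tokens(text):
    # naive: split on whitespace, then strip punctuation only at token edges
    return [t.strip(".,;:!?").lower() for t in text.split()]


def word_tokens(text):
    # robust: every run of non-word characters separates tokens
    return re.findall(r"\w+", text.lower())


doc = "The biopsy showed carcinoma.No further lesions were found."
print("carcinoma" in whitespace_tokens(doc))  # False: the token is "carcinoma.no"
print("carcinoma" in word_tokens(doc))        # True
```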
identifiers / scientific names
advantages
- straightforward
- human readable
disadvantages
- spelling mistakes
- difficult to use as unique IDs (special characters, etc.)
- homonyms occur, so a name does not identify a taxon uniquely
- there is a ‘system behind’ the names, which is dangerous; in fact, there is more than one ‘system behind’
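Two of these disadvantages can be made visible in a few lines of Python. The sketch assumes a hypothetical name index whose taxon ids are invented; "Morus" is used because it is a real genus name both for mulberries (plants) and for gannets (birds). A lookup then returns more than one id for a homonym, and a misspelled name returns no id at all, only near matches found with the standard library's difflib.

```python
import difflib

# Hypothetical name index: scientific name -> taxon ids (ids are invented).
NAME_INDEX = {
    "Morus": ["plant-0001", "bird-0002"],  # homonym: mulberry genus and gannet genus
    "Parus major": ["bird-0003"],
    "Erithacus rubecula": ["bird-0004"],
}


def resolve(name, cutoff=0.8):
    """Return (ids, suggestions) for a scientific name.

    More than one id means the name is a homonym and cannot identify a taxon
    on its own; no ids plus suggestions hints at a spelling mistake.
    """
    ids = NAME_INDEX.get(name, [])
    suggestions = [] if ids else difflib.get_close_matches(
        name, list(NAME_INDEX), n=3, cutoff=cutoff)
    return ids, suggestions


print(resolve("Morus"))               # (['plant-0001', 'bird-0002'], [])
print(resolve("Erithacus rubicula"))  # ([], ['Erithacus rubecula'])
```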
identifiers / TCN code
TCN code: Taxon Code Nederland
advantages
- good identifier, cannot get lost
- when a concept is split, two new identifiers are created
disadvantages
- when a concept is split, strange things happen
- not generally accepted (used mainly for freshwater organisms)
- there is a ‘system behind’, which is dangerous
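The "two new identifiers on a split" rule can be sketched as a small registry in which identifiers are never reused or deleted. This is a hypothetical Python illustration, not how TCN codes are actually issued: the "X00001" id format, the IdRegistry class and the placeholder names "Aus bus" / "Aus cus" are all invented. The retired id keeps pointing at its successors, which is what keeps it from getting lost.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class IdRegistry:
    """Hypothetical identifier registry: ids are never reused or deleted."""

    concepts: dict = field(default_factory=dict)  # valid id -> concept name
    retired: dict = field(default_factory=dict)   # retired id -> successor ids
    _counter: itertools.count = field(default_factory=lambda: itertools.count(1))

    def mint(self, name):
        new_id = f"X{next(self._counter):05d}"  # invented format, not real TCN codes
        self.concepts[new_id] = name
        return new_id

    def split(self, old_id, name_a, name_b):
        """Split a concept: both parts get NEW ids; the old id is retired, not lost."""
        if old_id not in self.concepts:
            raise KeyError(old_id)
        successors = [self.mint(name_a), self.mint(name_b)]
        del self.concepts[old_id]
        self.retired[old_id] = successors
        return successors

    def resolve(self, some_id):
        """Follow retirements down to the currently valid ids."""
        if some_id in self.concepts:
            return [some_id]
        return [v for s in self.retired.get(some_id, []) for v in self.resolve(s)]


reg = IdRegistry()
old = reg.mint("Aus bus sensu lato")
a, b = reg.split(old, "Aus bus", "Aus cus")
print(old, a, b)         # X00001 X00002 X00003
print(reg.resolve(old))  # ['X00002', 'X00003']
```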
identifiers / URIs as identifiers
advantages
- flexible
- human readable (to a certain extent)
disadvantages
- bound to a domain name, so there is no guarantee of persistence
- local rules within domains
identifiers / digital object identifiers
advantages
- accepted for ISO standardisation
- can be used to identify any media or content
- over 20 million DOIs already assigned
disadvantages
- a DOI has to be registered with a Registration Agency (for a small fee per DOI)
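A DOI name consists of a prefix beginning with "10." and a registrant-specific suffix, separated by a slash, and it is resolved through the https://doi.org/ proxy. The Python sketch below checks that shape with a simplified pattern (four to nine digits after "10.", which covers common DOIs but not every form the DOI syntax allows) and builds a resolver URL. 10.1000/182 is the DOI of the DOI Handbook; the function names are illustrative.

```python
import re
from urllib.parse import quote

# Simplified pattern: "10." + 4-9 digit registrant code, "/", non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def parse_doi(doi):
    """Split a DOI into (prefix, suffix); raise ValueError if it does not match."""
    m = DOI_PATTERN.match(doi.strip())
    if m is None:
        raise ValueError(f"not a DOI: {doi!r}")
    return m.group(1), m.group(2)


def resolver_url(doi):
    prefix, suffix = parse_doi(doi)
    # percent-encode characters that are unsafe in a URL path, keep slashes
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(parse_doi("10.1000/182"))     # ('10.1000', '182')
print(resolver_url("10.1000/182"))  # https://doi.org/10.1000/182
```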
external links
live queries to external websites:
- data of the Ministry of Agriculture (LNV)
- nature observations site (waarnemingen.nl)
external links
server side:
- caching possibilities
client side:
- scalability
- Ajax
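Server-side caching can be sketched as a time-to-live (TTL) cache placed in front of the live query. In the Python sketch below, `fetch` is a hypothetical stand-in for the real remote call (no actual waarnemingen.nl or LNV interface is assumed), and the 300-second TTL is an arbitrary example value. A fresh entry is served without contacting the external site; an expired one triggers a new live query.

```python
import time


class TTLCache:
    """Server-side cache for live queries to an external site (sketch).

    `fetch` is any function key -> result; it stands in for the real remote call.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh: no request to the external site
        value = self._fetch(key)  # missing or expired: query live
        self._store[key] = (now + self._ttl, value)
        return value


# Usage with a fake clock and a fake remote call (both hypothetical).
calls = []
fake_now = [0.0]


def fake_fetch(taxon):
    calls.append(taxon)
    return f"observations for {taxon}"


cache = TTLCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])
cache.get("Parus major")
cache.get("Parus major")  # served from cache
fake_now[0] = 301.0
cache.get("Parus major")  # expired: fetched again
print(len(calls))         # 2
```

The TTL trades freshness for fewer requests to the external site. The client-side alternative (Ajax) moves the request into the browser instead, which spreads the load but gives the server nothing to cache.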
external links
findability:
- data from external links is obtained dynamically, so it cannot be queried
- solution: spidering, and indexing with a free-text search engine
- solution: integration of databases
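The spidering solution can be sketched as a breadth-first crawl that feeds the text of each fetched page into a simple term index. The Python sketch below is hypothetical throughout: `fetch` stands in for a real HTTP client, the example.org pages are invented, and politeness rules (robots.txt, rate limits) are left out. It only uses the standard library's HTMLParser and urljoin.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Collects href targets of <a> tags and the visible text chunks."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def spider_and_index(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url; returns an index term -> set of URLs.

    `fetch(url)` returns HTML or None; it stands in for a real HTTP client.
    """
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkTextParser()
        parser.feed(html)
        parser.close()
        for term in re.findall(r"\w+", " ".join(parser.text).lower()):
            index[term].add(url)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen and len(seen) < max_pages:
                seen.add(target)
                queue.append(target)
    return index


pages = {  # hypothetical external pages
    "https://example.org/taxa/a": '<p>Parus major</p><a href="b">next</a>',
    "https://example.org/taxa/b": "<p>Erithacus rubecula</p>",
}
idx = spider_and_index("https://example.org/taxa/a", pages.get)
print(sorted(idx["rubecula"]))  # ['https://example.org/taxa/b']
```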
NSR between source databases
source databases: external databases with authorized data that supply essential elements to the system
examples:
- taxonomic thesaurus
- image library
related issues:
- bringing data from different sources together in one presentation
- the external source keeps its own independent existence
- live imports/updates, deletion of old records
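The last related issue (imports, updates and deletion of old records) can be sketched as a snapshot comparison. The Python sketch below assumes that the source database can export a full snapshot keyed by stable record ids; the ids, records and function names are hypothetical. It computes which records to insert, update and delete locally, and never writes to the source, which keeps its independent existence.

```python
def diff_snapshot(source, local):
    """Compare a full export of the source database with the local copy.

    Both arguments map a stable record id to a record (a dict).
    Returns (inserts, updates, deletes).
    """
    inserts = {k: v for k, v in source.items() if k not in local}
    updates = {k: v for k, v in source.items() if k in local and local[k] != v}
    deletes = [k for k in local if k not in source]
    return inserts, updates, deletes


def apply_sync(local, inserts, updates, deletes):
    """Bring the local copy in line with the source; the source stays untouched."""
    local.update(inserts)
    local.update(updates)
    for k in deletes:
        del local[k]
    return local


source = {"t1": {"name": "Parus major"}, "t3": {"name": "Cyanistes caeruleus"}}
local = {"t1": {"name": "Parus maior"}, "t2": {"name": "Periparus ater"}}
ins, upd, dels = diff_snapshot(source, local)
print(sorted(ins), sorted(upd), dels)  # ['t3'] ['t1'] ['t2']
apply_sync(local, ins, upd, dels)
```

A full comparison is simple but reads the whole export each time; a source that publishes a change log would allow incremental updates instead.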
NSR towards companion databases or websites
naming system:
- image library
- whale strandings and observations site (walvisstrandingen.nl)
presentation of taxonomic structure:
- nature observations site (waarnemingen.nl)
how far can we go?
- generalisation?
- specific applications?
- Google and other web search engines
data models / reference structures
SKOS: Simple Knowledge Organization System
- W3C standard
- RDF vocabulary
- compliant with ISO 2788 and ISO 5964 (thesaurus standards)
- for defining ‘simple’ structures, such as thesauri, glossaries, taxonomies, etc.
- undemanding in terms of expertise and effort
- complementary to OWL
OWL: Web Ontology Language
- W3C standard
- RDF vocabulary
- for defining complex conceptual structures
- demanding in terms of expertise and effort
- complementary to SKOS
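As an illustration of how little SKOS needs for a taxonomic structure, the Python sketch below writes a small concept scheme in Turtle using skos:ConceptScheme, skos:Concept, skos:inScheme, skos:prefLabel, skos:broader and skos:topConceptOf. The namespace http://example.org/nsr/ and the concept ids are hypothetical, not real NSR URIs. The output is plain Turtle that an RDF store such as Sesame should be able to load; loading it is not shown.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/nsr/"  # hypothetical namespace, not the real NSR URIs


def literal(text):
    # escape backslashes and double quotes for a Turtle string literal
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(scheme_id, concepts):
    """Serialise a taxonomy as a SKOS concept scheme in Turtle.

    concepts: iterable of (concept_id, {language: label}, parent_id or None)
    """
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix nsr: <{BASE}> .",
        "",
        f"nsr:{scheme_id} a skos:ConceptScheme .",
    ]
    for cid, labels, parent in concepts:
        lines.append(f"nsr:{cid} a skos:Concept ;")
        lines.append(f"    skos:inScheme nsr:{scheme_id} ;")
        for lang, label in sorted(labels.items()):  # one prefLabel per language
            lines.append(f"    skos:prefLabel {literal(label)}@{lang} ;")
        if parent is None:
            lines.append(f"    skos:topConceptOf nsr:{scheme_id} .")
        else:
            lines.append(f"    skos:broader nsr:{parent} .")
    return "\n".join(lines)


ttl = to_turtle("songbirds", [
    ("Paridae", {"nl": "mezen", "en": "tits"}, None),
    ("Parus_major", {"nl": "koolmees", "en": "great tit"}, "Paridae"),
])
print(ttl)
```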
data models / taxonomic objects
Darwin Core
- recommended standard (Taxonomic Databases Working Group)
- small set of data element definitions (44)
- flat structure
- for sharing and integration of primary biodiversity data
ABCD
- recommended standard (Taxonomic Databases Working Group)
- access to biological collections data (hence ABCD)
- comprehensive set of data elements (700)
- hierarchical structure, ontology
- exchange of primary biodiversity data
- compatible with Darwin Core
NBN data model
- National Biodiversity Network (UK)
- reliable exchange of biodiversity data from heterogeneous sources
- hierarchical structure, ontology
- mapping to Darwin Core, ABCD, etc.
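The difference between a flat and a hierarchical structure shows up as soon as a classification is mapped to Darwin Core. The Python sketch below flattens a taxon and its classification path into one record using Darwin Core term names (scientificName, scientificNameAuthorship, taxonRank, kingdom, phylum, class, order, family, genus, specificEpithet). The function and the chosen subset of terms are illustrative only; ranks outside that subset, such as the subphylum here, are simply dropped.

```python
# Darwin Core terms for the higher classification (a subset of the standard).
DWC_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def to_darwin_core(name, rank, authorship, path):
    """Flatten one taxon and its classification path into a flat record.

    path: list of (rank, name) pairs from the top of the hierarchy to the parent.
    Ranks outside DWC_RANKS are dropped in this sketch: a flat structure only
    keeps what it has columns for.
    """
    record = {
        "scientificName": name,
        "scientificNameAuthorship": authorship,
        "taxonRank": rank,
    }
    for r, n in path:
        if r in DWC_RANKS:
            record[r] = n
    parts = name.split()
    if rank == "species" and len(parts) >= 2:
        record["specificEpithet"] = parts[1]
    return record


path = [("kingdom", "Animalia"), ("phylum", "Chordata"), ("subphylum", "Vertebrata"),
        ("class", "Aves"), ("order", "Passeriformes"), ("family", "Paridae"),
        ("genus", "Parus")]
rec = to_darwin_core("Parus major", "species", "Linnaeus, 1758", path)
print(rec)
```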
data models / extra data
species counter:
- for each taxon, the number of underlying species is displayed
availability of photographs of a species:
- for each taxon, images of the underlying taxa are displayed
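Both kinds of extra data are aggregations over the taxonomic tree: each species (or each photographed taxon) contributes to itself and to every ancestor. The Python sketch below assumes a hypothetical parent map and rank map for a tree without cycles; the taxa are real tits, the image id is invented. Computing these values once, for example when data is imported, avoids walking the tree on every page view.

```python
from collections import defaultdict


def species_counts(parent_of, rank_of):
    """Number of species at or below every taxon (a species counts itself)."""
    counts = defaultdict(int)
    for taxon, rank in rank_of.items():
        if rank != "species":
            continue
        node = taxon
        while node is not None:  # walk up to the root, adding one at each level
            counts[node] += 1
            node = parent_of.get(node)
    return {t: counts[t] for t in rank_of}


def images_below(parent_of, images):
    """Image ids attached to every taxon or to any of its descendants."""
    below = defaultdict(list)
    for taxon, ids in images.items():
        node = taxon
        while node is not None:
            below[node].extend(ids)
            node = parent_of.get(node)
    return dict(below)


PARENT_OF = {"Paridae": None, "Parus": "Paridae", "Cyanistes": "Paridae",
             "Parus major": "Parus", "Cyanistes caeruleus": "Cyanistes"}
RANK_OF = {"Paridae": "family", "Parus": "genus", "Cyanistes": "genus",
           "Parus major": "species", "Cyanistes caeruleus": "species"}

print(species_counts(PARENT_OF, RANK_OF)["Paridae"])  # 2
print(sorted(images_below(PARENT_OF, {"Parus major": ["img-001"]})))
```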
TUE / Information Retrieval
Reference Structures
Trezorix & RNA-project
end of part 2
www.rnaproject.org
www.soortenregister.nl
TUE / Information Retrieval
Reference Structures
assignment
Trezorix & RNA-project
TUE / Information Retrieval / Reference Structures / assignment
assignment: make a website for digital access to (part of) the NSR collection
goal: to illustrate the use of different reference structures for browsing and searching digital collections
steps:
- describe an NSR data model
- represent the structure part of the NSR in SKOS
- represent as much of the ‘extra data’ as possible in SKOS
- represent the data that do not fit into SKOS in OWL
- present the result in a website
- use the open source RDF framework Sesame for storage of the structures
- use the open source search engine Lucene for findability of NSR elements
applicable ‘extra data’:
- change record data
- data for the species counter
- data about the availability of photographs of a species
what we supply:
- a limited NSR data set (for instance ‘songbirds’)
- extra data
- more technical background about the NSR
TUE / Information Retrieval
Reference Structures
Trezorix & RNA-project
end
www.rnaproject.org
www.soortenregister.nl