http://intranet.lternet.edu/im/news/committees/working_groups/controlled_vocabulary Working Group: “Synthesis through data discovery and use: Past, Present and Future”, Wed. 10-12pm. Long-Term Ecological Research. Controlled Vocabulary Working Group – Report September 2009.
Background and Past Activities • Finalizing the list – who approves? • Procedures for managing the list • Next steps • Tool development • Keywording • Searching • Hierarchies/polytaxonomies/thesauri/ontologies Agenda for Vocab Working Group
For past activities, see the report at: http://intranet.lternet.edu/im/node/114 and http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/06spring/ • Summary: • Eclectic keywords make searching difficult – most terms are used only once! • No easy way to group or organize similar datasets to facilitate “browse” searches The Problem
Assembled list of LTER EML Keywords • Cross-linked that list to: • NBII Thesaurus Words • GCMD Keywords • Metacat Searchers • Edited • Changed words to preferred forms (kept track of synonyms) • Removed specific places, taxonomic names Steps Taken
Selected • Keywords shared with GCMD and NBII, or • Keywords used at more than one LTER site • Reviewed • Removals and additions were suggested • Voting via SurveyMonkey • Edited • Added words voted for • Removed words voted against • When the vote was close – kept the word's current status Steps Taken
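The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the working group's actual procedure: the function name, the input shapes, and the reading of "shared with GCMD and NBII" as "present in either external list" are all assumptions.

```python
def select_candidates(site_keywords, gcmd_terms, nbii_terms):
    """Return keywords that pass the slide's selection rule.

    site_keywords: dict mapping a site code to the set of keywords it uses
                   (hypothetical input shape).
    gcmd_terms, nbii_terms: iterables of terms from the external lists.
    Assumption: a keyword qualifies if it appears in EITHER external list,
    or if more than one site uses it.
    """
    # Record which sites use each keyword, ignoring case and stray spaces.
    sites_using = {}
    for site, words in site_keywords.items():
        for word in words:
            sites_using.setdefault(word.strip().lower(), set()).add(site)

    external = {t.strip().lower() for t in gcmd_terms}
    external |= {t.strip().lower() for t in nbii_terms}

    return sorted(
        word
        for word, sites in sites_using.items()
        if word in external or len(sites) > 1
    )
```

With made-up input such as two sites that both use "biomass" and a third site whose only keyword also appears in one external list, both keywords are kept, while a keyword used at a single site and absent from the external lists is dropped.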
640 keywords • 148 synonyms • 201 NBII keywords • 21 GCMD keywords The List
Is additional editing required? • Who decides if it is an LTER “official” list? • And what does it mean if it is? • What procedures should be followed for subsequent editing of the list? • Who should manage the list database? • Term • Scope • Definition • Synonyms LTER SCIENCE KEYWORD LIST 1.0???
Autocomplete search tool - Duane Costa • Autocomplete keywording tool - Duane Costa • Update-document-keywords tool? • Advanced search tool? Next steps - Tools
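As a rough illustration of what an autocomplete tool does with the list, the sketch below matches a typed prefix against both preferred terms and synonyms and always returns the preferred form. It is not the tool named on the slide; the function name, parameters, and data shapes are hypothetical.

```python
def autocomplete(prefix, preferred_terms, synonyms, limit=10):
    """Suggest preferred keywords for a typed prefix.

    preferred_terms: iterable of preferred keywords.
    synonyms: dict mapping a synonym to its preferred keyword
              (hypothetical structure for the synonym list).
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []

    matches = set()
    for term in preferred_terms:
        if term.lower().startswith(prefix):
            matches.add(term)
    # A match on a synonym suggests the preferred form, not the synonym.
    for synonym, preferred in synonyms.items():
        if synonym.lower().startswith(prefix):
            matches.add(preferred)

    return sorted(matches)[:limit]
```

Returning the preferred form for a synonym match means the same lookup could serve both the search box and the keywording form.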
There is general agreement that keywords are most useful when they can be tied to other keywords • How do we create the needed keyword taxonomies? • Barbara Benson has done some work looking at other hierarchies (KNB, GCMD) • Giri Palanisamy has sent us the broader, narrower and related terms for the ~1/3 of the words that are also in the NBII thesaurus Next steps – Hierarchies
The existing KNB browse hierarchy is rather limited (the LTER version that gives the number of hits is a good feature) • A shared browse hierarchy could help sites develop their own • It could be hooked into any tools that are developed to assist in assigning keywords to datasets • It could be used in a tool that enables the creation of a browse hierarchy from a keyword list • It could assist keyword searches by offering an option to go up a level from the keyword to a broader concept and thus yield a higher number of hits Hierarchies - Rationale
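The last point, going up a level to a broader concept to get more hits, can be sketched as follows. This is a minimal illustration under assumed data structures: the (term, broader term) pairs, the keyword-to-dataset index, and all function names are hypothetical, and a term may have more than one broader term.

```python
def build_narrower(broader_pairs):
    """Invert (term, broader_term) pairs into broader_term -> narrower terms."""
    narrower = {}
    for term, parent in broader_pairs:
        narrower.setdefault(parent, set()).add(term)
    return narrower


def concept_and_descendants(term, narrower):
    """Return the term plus every term below it in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and shared children
        seen.add(current)
        stack.extend(narrower.get(current, ()))
    return seen


def search(term, narrower, index):
    """Datasets tagged with the term or any of its narrower terms."""
    hits = set()
    for t in concept_and_descendants(term, narrower):
        hits |= index.get(t, set())
    return hits


def broader_terms(term, broader_pairs):
    """The 'go up a level' options for a term."""
    return sorted(parent for child, parent in broader_pairs if child == term)
```

A search on a narrow keyword returns only the datasets tagged with it; repeating the search on one of its `broader_terms` also picks up datasets tagged with sibling keywords, which is where the higher hit count comes from.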
Taxonomic and place keywords were excluded from the science keywords • Do we need a gazetteer for places? • Do we need taxonomic lists & tools for taxonomic information? • Are there other types of lists that are needed? Next steps – Other lists
Feedback on tools • Ideas for additional tools • Hierarchy Discussion topics
LTER words emerging organically • Not just general search • Other efforts • Vegetation ecology community interested in ontologies for vegetation traits • LTER words are not specialized • Would be good to keep in touch with other efforts • SONET – intercommunication (Gries) critical • Rob Raskin taking GCMD and ontologizing it • NASA is developing “Suite” – upper level ontology • Semtools – (O’Brien) – using Morpho and making it a better database management system – using subsumption hierarchies in OWL • OWL allows use of generic applications (Jena) – standard format Around the room – next steps
Autocompletion tools are helpful for NEW EML • But we need tools for updating existing metadata • Having a first cut of recommendations would help • A tool that makes suggestions based on document content would be helpful • Semantic annotation • Hook to parents, children and related terms • Educating PIs on using the list is important • Just having the list available is important Around the room
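A tool that suggests keywords from document content could start as simply as scanning a title or abstract for list terms and synonyms. The sketch below is one such first cut, not a description of any existing tool; the function name and data shapes are hypothetical, and whole-phrase matching is a deliberate simplification (no stemming or plural handling).

```python
import re


def suggest_keywords(text, preferred_terms, synonyms):
    """Suggest preferred keywords whose term or synonym occurs in the text.

    synonyms: dict mapping a synonym to its preferred keyword
              (hypothetical structure).
    """
    lowered = text.lower()

    def occurs(phrase):
        # Match the phrase on word boundaries only, so "ice" does not
        # match inside "price".
        return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered)

    suggestions = {term for term in preferred_terms if occurs(term)}
    suggestions |= {pref for syn, pref in synonyms.items() if occurs(syn)}
    return sorted(suggestions)
```

The suggestions are meant as a starting list for a person to accept or reject when updating existing metadata, not as automatic assignments.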
Automatic annotation with broader terms • Identify “unfindable” datasets – what datasets have no LTER Keywords or synonyms? • Go dataset by dataset and see which have no hits • EML is limited in how it assigns keyword lists • Could target tools at keyword set • Namespacing control could be relaxed to go beyond “theme” and “place” Next Steps
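The "unfindable" check, going dataset by dataset to see which have no hits, could look roughly like this. It is a sketch under assumed inputs: the function name and the shape of `dataset_keywords` are hypothetical, and matching is a plain case-insensitive comparison.

```python
def unfindable_datasets(dataset_keywords, preferred_terms, synonyms):
    """Return IDs of datasets with no keyword from the list or its synonyms.

    dataset_keywords: dict mapping a dataset ID to its keywords
                      (hypothetical input shape).
    synonyms: iterable of synonym strings.
    """
    vocabulary = {t.strip().lower() for t in preferred_terms}
    vocabulary |= {s.strip().lower() for s in synonyms}

    return sorted(
        dataset_id
        for dataset_id, words in dataset_keywords.items()
        if not any(w.strip().lower() in vocabulary for w in words)
    )
```

Datasets with an empty keyword set are reported as unfindable as well, since they cannot match any term on the list.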
Ecotrends – predated LTER list • Would have been good to have LTER list • Eventually would like to integrate • May be able to exploit synonym rings • When title and dataset don’t match (title says “Productivity” but attribute is “biomass”), need to examine holistically • Linking terms to definitions needed • A taxonomic database would also be useful for “bugs” (true bugs vs insects) Next Steps
Practices in design • When developing tools – always think about how they are tied to organizational routines • Think proactively about how to make it routine – getting people to think in categories • Pursue polytaxonomies based on Barbara’s list • Develop synonym list further • See how keyword lists match • AND has 3-level hierarchy • Start at top or bottom in adding…. Next Steps