1 / 1

Abstract:

Search Form with Autocomplete. Language and Search System Selection. Translator and Search Enhancer. Preferred Term Lists. Multilingual Thesaurus. Metacat Data Catalog. List of Datasets. Multilingual and Thesaurus-Based Search Tools for International Long-Term Ecological Research Data.

trella
Download Presentation

Abstract:

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Search Form with Autocomplete Language and Search System Selection Translator and Search Enhancer Preferred Term Lists Multilingual Thesaurus Metacat Data Catalog List of Datasets Multilingual and Thesaurus-Based Search Tools for International Long-Term Ecological Research Data Kristin Vanderbilt1, Nicolas Bertrand2, David Blankman3, Xuebing Guo4, Don Henshaw5, Honglin He4, Karpjoo Jeong6, Eun-Shik Kim7, Chau-Chin Lin8, Sheng-Shan Lu8, Margaret O’Brien9, Éamonn Ó Tuama10, Takeshi Osawa11, John Porter12, Wen Su4 and other participants of the 2008 and 2012 ILTER workshops. 1Sevilleta LTER, University of New Mexico, Albuquerque, NM, USA; 2CEH Lancaster, Lancaster Environment Centre, Lancaster, United Kingdom; 3Israel LTER, Jerusalem, Israel; 4Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China; 5H.J. Andrews LTER, U.S. Forest Service Pacific Northwest Research Station, Corvallis, OR, USA; 6Department of Advanced Technology Fusion, Konkuk University, Seoul, Korea; 7Department of Forestry, Environment, and Systems, College of Forest Science, Kookmin University, Seoul, Korea; 8Taiwan Forestry Research Institute, Taipei, Taiwan; 9Santa Barbara Coast LTER, Marine Science Institute, University of California at Santa Barbara, Santa Barbara, CA, USA; 10Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark; 11National Institute for Agro-Environmental Sciences, Tsukuba, Japan; 12Virginia Coast Reserve LTER, Department of Environmental Sciences, University of Virginia, Charlottesville, VA, USA Scientists seeking data should be able to efficiently and reliably locate ILTER datasets through searching… Abstract: Data in ILTER data archives are only useful if it can be found by researchers. At the international-level the challenge of searching datasets is compounded by the need to deal with multiple languages. A series of workshops in China explored the challenges of creating an information management system for the International LTER. During a 2008 workshop at Lake Taihu, participants recommended that each ILTER region host a Metacat to house network metadata as Ecological Metadata Language documents. Since that time, Metacat-based metadata catalogs have been established in Taiwan, Japan, Spain, Brazil, and Malaysia. A second workshop in Shanghai, China in 2012 explored the options for using a multilingual controlled vocabulary that would allow researchers to discover ILTER data on an international scale. The latter workshop led to development of prototype search tools that incorporate both translation and search enrichment services. The search enrichment services allow automated search on more specific terms (i.e., “narrower terms”). Thus a search on “forest ecosystems” would include datasets whose metadata included "boreal forests", "clearcuts", "forests", "old-growth forests" and "old growth” as well. Adding a translation layer adds additional search terms ("Bosque", “ Foresta", "Forst", "Forêt", "Las", "Metsä", "Skog", "Wald", "森林", and "皆伐"). The prototype tools use EnvThes (an existing multilingual thesaurus that already fully incorporates the U.S. LTER controlled vocabulary) as their thesaurus, but are web-service based, allowing them to be incorporated into a wide array of customized searching applications. Prototype tools can be seen at: http://vocab.lternet.edu/ILTER • Prototype Multilingual Data Discovery System: • A system for searching multilingual metacats has been developed • A user selects a term from the controlled vocabulary in their language of choice, and the query will be translated via the multilingual thesaurus (Figure 3) to find datasets documented in other languages (Figure 4) Objective: Researchers need access to data, regardless of the language used in the metadata. Our objective is to facilitate discovery of ILTER data regardless of the languages used duiring their creation Figure 3. Prototype system for multilingual searching of ecological data. User selects the language they wish to use for search input and the Metacat to be used in the searching. Based on that selection an appropriate autocompletion list can be used to help guide the searcher towards terms in the thesaurus. The thesaurus can then be used to select additional terms such as synonyms or narrower terms, prior to preparing a search. The enhanced search “pathQuery” can then be sent to the Metacat, previously selected, after being translated into the language appropriate for a particular metacat. • Challenges: • Often several terms are used for one concept in a single network, making it necessary to search on multiple equivalent terms • One site uses CO2 another Carbon Dioxide, another Carbon-dioxide • Carbon to Nitrogen Ratio, C:N, C:N Ratio, Carbon-to-nitrogen Ratio • Searching on a broader term does not return narrower terms • Searching on “Landscape Change” doesn’t find data sets related to “desertification” even though desertification is a kind of landscape change • The meaning of a term in one language may fall along the continuum of exact equivalence to non-equivalence in another language (Figure 1 and Table 1) Table 1. Inexact equivalence of US English and Japanese ‘wetlands’ concepts Select Languages for input and searching Figure 1. Terms selected from more than one natural language vary in the extent to which they represent the same concepts. These variations can be seen as forming a continuum that ranges from exact equivalence in meaning through a) inexact equivalence, b) partial equivalence, c) single to many equivalence, to d) non-equivalence. Autocomplete helps guide choices to words in the Thesaurus Groups of terms can be selected based on their relationship the input term and searched Select a Metacat to search • Goals: • Identify solutions that will permit data discovery by a linguistically diverse set of researchers • Identify a list of preferred terms and create a controlled vocabulary that could be used by sites in creating metadata documents • Focus on ILTER-wide searches • Want to facilitate cross-site synthesis on an international basis • Translatepreferred terms to multiple languages to create a multilingual controlled vocabulary • Enable locating datasets through searching • Enhance existing data discovery tools to make them work better through the organization of terms in ways that facilitate linkages among them Results for a search of the TFRI Metacat using Chinese Terms Figure 4. Sequence of web pages used to conduct a multilingual search for data. The search process can be simplified by adding “default” choices. • Solutions: • US LTER adopted a 634 term controlled vocabulary in 2010. • It has been incorporated in to EnvThes, the EnvEurope project’s multilingual thesaurus(Figure 2) • Providing a rich context for terms by exploiting interrelationships with other terms helps overcome some of the ambiguity introduced through translation Figure 2. Lexical resources such as taxonomys, thesauri and ontologies help provide linkages between concepts. These linkages can be exploited to interconnect concepts in different languages. The EnvThes thesaurus fully incorporates the U.S. LTER Thesaurus (http://vocab.lternet.edu), along with other resources. It also has translations of many of the terms into 13 languages, including French, German, Chinese , Japanese, Italian, Finnish, Polish, Portuguese, Swedish and English. • Products: • The ILTER data discovery prototype interface is accessible at: http://vocab.lternet.edu/ILTER. • More about the EnvThes thesaurus is available here: • http://www.lter-europe.net/info_manage/EnvThes • The US LTER Controlled Vocabulary Working Group has developed a number of applications and web services to support the use of the LTER Controlled Vocabulary. This page provides a listing of resources: • http://im.lternet.edu/vocab_resources • Copies of code etc. are available on the LTER SVN site (http://svn.lternet.edu) in the "vocab" tree. Focus of U.S. Controlled Vocabulary WG and EnvThes Acknowledgements: The two workshops held to support development of the ILTER Multilingual Information Management System were held at Lake TaihuField Station and Eastern China Normal University in Shanghai with the generous support of the Chinese Ecological Research Network (CERN) and the National Ecosystem Research Network of China (CNERN). The U.S. National Science Foundation supported travel of U.S. participants.

More Related