Resource Discovery (metadata and searching) Working Group Report
Issues discussed • What kinds of resources should EMELD provide search services for? • What should the design be for an EMELD search interface? • How can EMELD get good metadata into its search database? • What level of metadata should be exposed?
What resources? • Anything that might be of value to linguists working on endangered languages. • Language data • Tools • Advice (including reviews) • People • "Gateway" websites
What resources? • But, there's no reason to rely on this working group for "what". • A questionnaire distributed via Linguist
What resources? • Two kinds of best practice resources • Resources with best practice metadata • These resources can be discovered • Non-digital resources encouraged • Digital resources discouraged, but allowed
What resources? • Best practice digital resources • All digital resources encouraged to be of this type • Benefits • Enhanced search features (due to document interoperability) • Special "BP globe of approval"
What resources? • Side Note • Best Practice "approval" system should be tied into a larger system through which digital resources could be listed as "publications" • A topic for another working group? (Perhaps OLAC?)
What resources? • Issues which need to be addressed • Metadata for resources interesting to linguists but which are not linguistic data • Needed: Best practice metadata standards for • Tools • Advice • People • ... • Test: EMELD could see how it would classify everything in BPU.
How to search? • Assumption: Metadata and data are distributed • Query Language • Metadata: OLAC standard • Data from interoperable documents: A new standard
How to search? • Resource Query Language Ideal • A generalized query protocol used across the linguistics community • A series of "methods" (still to be defined) that can be called on these resources to retrieve structured linguistic data matching query parameters
How to search? • Problems implementing ideal • No clear sense as to what "methods" are needed. • One solution: Examine results from questionnaire
How to search? • Problems implementing ideal • Very few repositories allow their data to be accessed in a generalized way • First step: Encourage documentation of repository data access systems and develop a metadata standard for this
How to search? • Long-term implementation issues • An OLAC Query Language Protocol • A well-defined linguistic query language • A system for "packaging" queries • Linguistic data search registry • Linguistic sites register that they are data access sites • They also register implemented search methods • EMELD will archive best-practice documents for data access for data creators not capable of implementing the query protocol
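The search registry idea above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the site names, and the method names (`find_by_language`, `find_by_gloss`) are hypothetical, since the working group had not yet decided which "methods" are needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SearchRegistry:
    """Hypothetical registry of data access sites and the search methods they implement."""

    sites: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, site: str, methods: List[str]) -> None:
        # A site registers itself together with the search methods it supports.
        self.sites.setdefault(site, set()).update(methods)

    def sites_supporting(self, method: str) -> List[str]:
        # Return, in sorted order, the sites that implement the given method.
        return sorted(s for s, m in self.sites.items() if method in m)


# Hypothetical sites and method names, for illustration only.
registry = SearchRegistry()
registry.register("archive-a.example", ["find_by_language", "find_by_gloss"])
registry.register("archive-b.example", ["find_by_language"])
```

A search service could consult `registry.sites_supporting(...)` to decide which sites to send a packaged query to.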
How to search? • Pilot project • Take some small subset of resources • Data entered via FIELD • Nijmegen? SIL? AIATSIS? AILLA? • Take FIELD search out of FIELD • Search over that small set of resources • Ideally, keep both resources in separate databases to begin to develop query interchange protocol
How to search? • Another project: Grammatical thesaurus • Develop a grammatical thesaurus that gives common synonyms for a given grammatical term (e.g., oral stop, plosive) • This could then be used to expand a user's search to include synonyms for a given term. • In all likelihood, there are other applications of this.
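Thesaurus-based query expansion might look like the minimal sketch below. The data structure and function name are assumptions made for illustration, and the only synonym pair included is the one given on the slide.

```python
from typing import List, Set

# Hypothetical thesaurus: each set groups terms treated as synonyms.
SYNONYM_SETS: List[Set[str]] = [
    {"oral stop", "plosive"},
]


def expand_query(term: str) -> List[str]:
    """Return the search term plus any listed synonyms, sorted for stable output."""
    normalized = term.strip().lower()
    expanded = {normalized}
    for group in SYNONYM_SETS:
        if normalized in group:
            expanded |= group
    return sorted(expanded)
```

A search for "plosive" would then also match resources described with "oral stop", while terms not in the thesaurus pass through unchanged.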
How to search? • Search interface • EMELD should implement a VISER-like service for access to its database • There are two distinct kinds of searches • Resource location • Resource data search
How to search? • Search interface • The details of the search interface implemented by EMELD are hard to conceive of until more resources can be accessed through it • A questionnaire can help with this area too. • EMELD could ask people to try the search and evaluate it • Starting with the people in this room
Getting the data • Sticks • EMELD Ambassadors • Assisted by Linguist Spider
Getting the data • Carrots • Support harvesting metadata in document headers for submitted URLs. • Resources with best practice metadata can be cited using a standard EMELD URI • These resources could be posted and advertised on Linguist • (but consult Baden first)
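Harvesting metadata from document headers could be sketched as below. This assumes, purely for illustration, that submitted pages embed Dublin Core style `<meta name="DC.…">` tags in their HTML `<head>`; the slides do not specify a header format. The class and function names are hypothetical, and fetching the submitted URL is left out so the sketch stays self-contained.

```python
from html.parser import HTMLParser


class HeaderMetadataHarvester(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.records = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            # Keep only Dublin Core style names (an assumed convention).
            if name and content and name.lower().startswith("dc."):
                self.records.setdefault(name, []).append(content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def harvest(html_text):
    """Return a dict mapping each metadata name to its list of values."""
    harvester = HeaderMetadataHarvester()
    harvester.feed(html_text)
    harvester.close()
    return harvester.records


# Made-up sample page; the meta tag in the body is outside the header.
sample = """<html><head>
<meta name="DC.title" content="Sample word list">
<meta name="DC.language" content="en">
<meta name="viewport" content="width=device-width">
</head><body><meta name="DC.title" content="ignored"></body></html>"""
```

Calling `harvest(sample)` keeps the two `DC.` entries from the header and skips both the unrelated `viewport` tag and the tag placed in the body.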
Getting the data • Juiciest Carrots (Best Practice resources only) • "Preferred" EMELD URIs • Marked as such in a search • Could undergo "advanced" search techniques • Be peer-reviewed and vetted by LDRA (Linguistic Digital Resource Association)* • *This organization does not exist, as far as I know.
Granularity • Right now there are no recommendations for the granularity of exposed metadata records • Large archives, for example, have hierarchical structure, one level of which must be isolated (the IMDI session, for example) • Cutting-edge archives don't work well with the resource=object model. Their resources are "created" based on the user's needs
Granularity • The lack of recommendations on this issue inhibits metadata creation • Granularity makes a big difference as to what content is searchable • Two different audiences in need of advice • "Real" archives (a.k.a. trusted repositories) • Individuals
Granularity • Recommendation: EMELD should encourage IMDI and OLAC to devise best-practice recommendations for granularity
The questionnaire • Two broad kinds of questions: • What kinds of things would you like? • What kinds of things would you hate? (Dafydd's Corollary)
The questionnaire • Part one: Search capabilities • How do you want to conduct your search (google-style, directory-style, pull-down menus...)? • What kinds of searches are you doing already on other sites? • Search within results? (We wanted this.) • Thesaurus-based search
The questionnaire • Part Two: Search content • Free entry (like Google) • Feature-based entry • Statistical questions • Phonetic characters • Geographical search • Time search • ...
The questionnaire • Part Three: Results • Google-like results • Journal abstract search-like results • Restricted results (only return web sites, .pdf documents, ...) • ...
The questionnaire • Format • Online submission • Combination multiple choice (for the uncreative) and free form (for the creative) • Encourage people to envision the search of the year 2503