Methods for Data Discovery – Portals
• Portal facilitates access to and also assimilation of data
• Portal is not simply a web site: it offers services such as data reformatting, subsetting, brokering, etc.
• Portal is not just a collection of information and links: portal takes you elsewhere through a service
• Portal answers questions: abstracts data or does simple analysis
• Identify phases:
• Phase 1: need a simple presence (web page) to start: avoid initial overreaching
• Could be multiple portals/interfaces
• Define discovery
• Identifying what you know you want
• Also, importantly, “accidental” discoveries that derive from the broad scope of disciplines and nations
• PIs want “definitive datasets”: vetted for quality, coverage, etc.
• Metadata is key
• In US, 10% of all IT spending is for metadata generation
• 85% of data is unstructured
• Need a new means (other than a list returned from a search) to present the data to the users
• Vetted datasets
• Desired and useful
• Danger of cliques taking control
• Root of ‘vet’ also leads to ‘veto’; overreaching?
• A desired interface: a list that is classified and aggregated
• Who are the users? Don’t forget education and outreach community
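One way to picture the "classified and aggregated" list mentioned on this slide is the minimal sketch below. It assumes each search hit already carries a discipline and a region label; the record fields and the sample entries are hypothetical and do not come from any real catalogue.

```python
from collections import defaultdict

# Hypothetical search hits; titles, disciplines and regions are invented for illustration.
hits = [
    {"title": "Sea ice extent", "discipline": "cryosphere", "region": "Arctic"},
    {"title": "Ice core chemistry", "discipline": "cryosphere", "region": "Antarctic"},
    {"title": "Ocean temperature profiles", "discipline": "oceanography", "region": "Arctic"},
]


def classify_and_aggregate(hits, key):
    """Group hits by one classification key and return {label: [titles]}."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[key]].append(hit["title"])
    # Sort labels and titles so the presentation is stable between searches.
    return {label: sorted(titles) for label, titles in sorted(groups.items())}


by_discipline = classify_and_aggregate(hits, "discipline")
for label, titles in by_discipline.items():
    # Show each class with its count, then the datasets inside it.
    print(f"{label} ({len(titles)})")
    for title in titles:
        print(f"  - {title}")
```

The same function can be called with "region" as the key, which gives a second view of the same hits without a new search.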
Methods for Data Discovery – Portals
• IPY legacy:
• Need long term stewardship of metadata and data
• Define audiences: scientists and public
• Public needs access to information products
• Phase 0: list of datasets and datacenters
• Phase 1: metadata for datasets
• Phase 2: publications
• Phase 3: services: visualizations
• Start with a single data center (NSIDC?)
• Stages:
• 1. IPY project honeycomb charts: identify sources of data
• Done by 2007
• Science base
• Dataflows:
• Regional focus, discipline focus which point to archive or individuals
• 2. Complementary portals (links)
• 3. Services that allow discovery (esp. databases) of unexpected connections
• Search – access
• Interactive – community tools
• Visualization
• Integrative
Methods for Data Discovery – Portals
• Portal must be accessible through search engines (Google)
• Alignment of commercial interests with IPY
• GoogleBase as a metadata service
• Target audiences: scientists and education and outreach
• Also recognize that:
• Not designing a portal; actually designing a process
• Portal captures user interaction and uses this to enhance future use (e.g. Amazon)
• Need to address ontology, metadata design, data collection design early in the process; counterpoint: we don’t have enough a priori information to design
• Data managers come up with good plans, but implementation is spotty unless compelled
• Location is a common element that could tie discovery and integration together
• Involve projects in classifying the honeycomb and building the initial lists in Stage 1
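The point that location could tie discovery and integration together can be illustrated with a small bounding-box search. This is only a sketch under stated assumptions: the dataset names and coordinates are hypothetical, and the boxes are assumed not to cross the 180° meridian, which a real polar catalogue would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent in decimal degrees (assumed not to cross 180° longitude)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        # Two boxes overlap when they overlap in both longitude and latitude.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


# Hypothetical catalogue entries; names and extents are invented for illustration.
catalogue = {
    "Svalbard glacier survey": BoundingBox(10.0, 76.0, 35.0, 81.0),
    "Barents Sea moorings": BoundingBox(20.0, 70.0, 50.0, 78.0),
    "Ross Sea plankton counts": BoundingBox(160.0, -78.0, 179.0, -70.0),
}


def discover_by_location(catalogue, query):
    """Return the names of all datasets whose extent overlaps the query box."""
    return sorted(name for name, box in catalogue.items() if box.intersects(query))


query = BoundingBox(15.0, 74.0, 30.0, 79.0)
print(discover_by_location(catalogue, query))
```

Because the query is by place rather than by discipline, datasets from unrelated fields can surface together, which is one route to the "accidental" discoveries discussed earlier.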
Methods for Data Discovery – Portals
• Addenda following group discussion
• Who is going to do this? (Implementation plan)
• Agencies
• National Committees
• PIs
• DIS
• Arctic Council working groups
• International bodies
• NGOs
• Use lessons learned from groups like ice coring, oceanographers, etc. who are already good at sharing data
• All of this goes into the “funding agency data management letter”; can this be articulated in time?
• Letter needs to go to agency IPY point of contact
• Three questions
• Who is responsible for IPY?
• How will info be used?
• Where will info go? (ipy.org)
• Create metadata to describe portals
• AMD is an example for metadata and services descriptions
• Enable search of portals
• Annotate with keywords to limit search results
• Geographic focus
• Stakeholders
• Disciplines
• Create an online mechanism for users to input list of portals and annotate them; that is, put the burden on the community
• Suggestions: use GCMD and AMD
• Use this to solicit feedback and ideas that are desired by the user community
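The idea of metadata that describes portals, annotated with keywords to limit search results, might look roughly like the sketch below. It is not the GCMD or AMD schema; the record layout, function names, portal names and example.org URLs are all hypothetical, chosen only to mirror the three annotation types listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class PortalRecord:
    """Hypothetical metadata record describing one portal."""
    name: str
    url: str
    geographic_focus: set = field(default_factory=set)
    stakeholders: set = field(default_factory=set)
    disciplines: set = field(default_factory=set)


# In-memory stand-in for a community-maintained list of portals.
registry = []


def register_portal(record):
    """Add a community-submitted portal description to the registry."""
    registry.append(record)


def search_portals(geographic_focus=None, stakeholder=None, discipline=None):
    """Return names of portals matching every keyword that was supplied."""
    names = []
    for record in registry:
        if geographic_focus and geographic_focus not in record.geographic_focus:
            continue
        if stakeholder and stakeholder not in record.stakeholders:
            continue
        if discipline and discipline not in record.disciplines:
            continue
        names.append(record.name)
    return names


# Invented entries for illustration only.
register_portal(PortalRecord(
    "Example Arctic Data Portal", "https://example.org/arctic",
    {"Arctic"}, {"scientists"}, {"oceanography", "sea ice"}))
register_portal(PortalRecord(
    "Example Polar Classroom", "https://example.org/classroom",
    {"Arctic", "Antarctic"}, {"education and outreach"}, {"sea ice"}))

print(search_portals(geographic_focus="Arctic"))
print(search_portals(discipline="sea ice", stakeholder="education and outreach"))
```

Each keyword that is supplied narrows the result, while leaving a keyword out means no restriction on that annotation.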