
Harmonization and Integration of Semi-Structured Data Through Wikis and Controlled Tagging

E. M. Robinson, R. B. Husar, Washington University, St. Louis, MO


Presentation Transcript


  1. Harmonization and Integration of Semi-Structured Data Through Wikis and Controlled Tagging E. M. Robinson, R. B. Husar Washington University, St. Louis, MO

  2. Abstract: The contents of cyberspace are increasingly generated and distributed by individuals, as shown by the explosive growth of web-based social software such as wikis, media-sharing services and blogs. This architectural, technological and cultural transformation of the Internet, commonly referred to as Web 2.0, is good news for the Earth Science community, since it offers new possibilities for sharing and harvesting community-provided content and for collaboratively creating new things. One key feature of all of these new tools is the end user's ability to add tags, which adds value by extending the metadata of the tagged object. Ad hoc tagging (folksonomy) gives a rich description of Internet resources, but it has the disadvantage of providing a fuzzy schema. The semantic uniformity of Internet resources can be improved by controlled tagging, which applies a consistent namespace and consistent tag combinations to diverse objects. We have used both tagging approaches to gather Internet resources pertaining to air quality events. Initial event analysis of the southern Georgia fires, which burned in April and May 2007, began with filtering and harvesting user-contributed web content. A Google Blog Search for 'Florida smoke' returned several thousand entries, many of them unrelated to the wildfires. Visually scanning the blog entries yielded a number of interesting posts, which were given the controlled tags '070508+Florida+Smoke' in the social bookmarking tool del.icio.us. Additional smoke photos were found in the photo-sharing service Flickr and given the same set of controlled tags. Together, these tools yielded a rich but only qualitative description of the Georgia fires. Because they shared a common set of controlled tags, these web objects (i.e. links and photos) could be harvested in a wiki environment, which also contained links to quantitative air quality analysis based on satellite and surface observations.
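A controlled tag set such as '070508+Florida+Smoke' can be built mechanically so that every contributor produces the same string. The sketch below is illustrative only: it assumes the leading digits are a YYMMDD date stamp (8 May 2007), which the slides do not state explicitly, and the function names `controlled_tags` and `tag_query` are hypothetical.

```python
from datetime import date


def controlled_tags(event_date: date, *keywords: str) -> list[str]:
    """Return a YYMMDD date stamp followed by capitalized keywords.

    Assumption: the numeric tag in the slides is a YYMMDD event date.
    """
    stamp = event_date.strftime("%y%m%d")
    return [stamp] + [k.strip().capitalize() for k in keywords]


def tag_query(tags: list[str]) -> str:
    """Join the tags with '+', the form in which the abstract writes the tag set."""
    return "+".join(tags)


tags = controlled_tags(date(2007, 5, 8), "florida", "smoke")
print(tag_query(tags))  # 070508+Florida+Smoke
```

Generating the tag from a date and a fixed keyword list removes the spelling and capitalization drift that makes ad hoc tags hard to harvest.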

  3. Goal:
     • Cross-leverage the shared resources on the Web while maintaining the autonomy of different services.
     • Better apply decision-support material in research, regulation and policy.
     • Amplify and connect minds.
     Approach:
     • Harvest and aggregate Web content.
     • Use collaborative wiki workspaces.
     • Create knowledge products through communal and individual analysis.

  4. User-Generated Content
     • Web 2.0 software allows users to easily add objects to the web:
       • Links
       • Photos
       • Video
       • Presentations
       • Blogs/Wikis
     • Structured metadata (date, user, type) is already encoded on these types of data.
     • All objects have a URL.

  5. Wiki View
     • Wikis were originally used just for collaborative writing.
     • Features:
       • Editable by web users
       • Tags
       • Discussion pages
       • Versioning
     • Now they are dynamic workspaces, able to embed web objects from disparate sources.
     • Add additional context and facilitate collaborative analysis.
     • Allow two-way transfer of knowledge.
     Figure labels: Edit, Discuss, Collaborate

  6. Tags
     • Keywords added to web objects either by provider or user
     Pro:
     • Tags can be added by anyone, to any URL
     • Allow for multiple types of categorization, not just one hierarchy
     • Can tag in any service
     Con:
     • Uncontrolled number of tags
     • Multiple words with same meaning
     • Can tag in any service
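The "multiple words with same meaning" problem can be reduced by mapping ad hoc tags onto a small controlled vocabulary before harvesting. This is a minimal sketch, not the authors' method: the synonym table `CONTROLLED` and the function `normalize` are hypothetical and would have to be agreed on by the community in practice.

```python
# Hypothetical synonym table: ad hoc spelling -> controlled tag.
CONTROLLED = {
    "smoke": "smoke",
    "smoky": "smoke",
    "smokey": "smoke",
    "smoke-plume": "smoke",
}


def normalize(tags):
    """Map ad hoc tags to controlled ones.

    Unknown tags are kept (lowercased), duplicates are dropped,
    and the original order of first appearance is preserved.
    """
    seen, out = set(), []
    for tag in tags:
        key = tag.strip().lower()
        controlled = CONTROLLED.get(key, key)
        if controlled not in seen:
            seen.add(controlled)
            out.append(controlled)
    return out


print(normalize(["Smokey", "smoke-plume", "Florida", "smoke"]))
# ['smoke', 'florida']
```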

  7. Controlled Tag-based Mediation
     • Users can be mediators of web-based content by “wrapping” it with a unique controlled tag (or set of tags) in two ways:
       • Use Del.icio.us to homogenize the heterogeneous objects.
       • Create a wiki page as the web object and add semantic tags.
     • Create a wiki page that harvests queries and adds context to create emergent, reusable knowledge.
     Figure title: Controlled Tag-based Connectivity

  8. Communal Event Analysis: Southern California Fire Smoke. Given the high density and short response time of user-generated content about air pollution events, it is said that the Earth has now acquired a "skin" for detecting changes in the environment.

  9. Control Tag: 071022SoCalSmoke
     • Quantitative:
       • Harvest links and relevant datasets
       • Controlled tagging in the wiki (datasets) and in Del.icio.us (links)
       • Query/RSS from Del.icio.us and wiki into EventSpace wiki page
     • Qualitative (Blogs, Flickr, YouTube):
       • Use service to perform coarse filtering
       • Controlled tagging in del.icio.us
       • RSS feed from del.icio.us into the EventSpace
     Figure labels: Datasets, Links
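The harvesting step, pulling tagged items from a feed into a wiki page, might look roughly like the following. This is a sketch under stated assumptions: `SAMPLE_FEED` is invented data in plain RSS 2.0 with `<category>` elements (not the actual del.icio.us feed format), the URLs are placeholders, and the output uses MediaWiki-style external links, which is an assumption about the wiki engine. The names `harvest` and `to_wiki_list` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real service's feed layout may differ.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>example bookmarks</title>
  <item><title>Smoke over the valley</title>
    <link>http://example.org/a</link>
    <category>071022SoCalSmoke</category><category>photo</category></item>
  <item><title>Unrelated post</title>
    <link>http://example.org/b</link>
    <category>cooking</category></item>
</channel></rss>"""


def harvest(feed_xml, control_tag):
    """Return (title, link) pairs for feed items carrying the control tag."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        tags = {c.text for c in item.findall("category")}
        if control_tag in tags:
            hits.append((item.findtext("title"), item.findtext("link")))
    return hits


def to_wiki_list(hits):
    """Render harvested items as a bulleted list of external links."""
    return "\n".join(f"* [{link} {title}]" for title, link in hits)


print(to_wiki_list(harvest(SAMPLE_FEED, "071022SoCalSmoke")))
# * [http://example.org/a Smoke over the valley]
```

Because only the exact control tag is matched, items that merely mention smoke or fire in free text are not pulled in; the coarse filtering stays with the human tagger.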

  10. Multiple Wiki Views: Data System Profiles. Figure labels: Data System, Wrap Data System Metadata, Semantic Tags

  11. Needed: a consistent description of multiple, autonomous data systems
     • All of the data systems were web-based; however, the metadata about them was distributed.
     • Used semantic tagging in the ESIP wiki to wrap distributed, heterogeneous data system metadata into a homogeneous view for easy comparison of systems.
     • Semantic tags are sets of tags with a specific type and attribute:
       • Type determines the kind of response that can be given (text, enumeration, date, location)
       • Attribute is the semantic tag name
     • Querying semantic tags returns a filtered list.
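One way to picture a semantic tag is as an attribute name plus a typed value, with queries filtering pages by attribute and value. The sketch below is an illustration, not the wiki's actual data model: the class `SemanticTag`, the `VALIDATORS` table, the representation of a location as a (lat, lon) pair, and the page names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# The four types named on the slide, each with a simple (assumed) value check.
VALIDATORS = {
    "text": lambda v: isinstance(v, str),
    "enumeration": lambda v: isinstance(v, str),
    "date": lambda v: isinstance(v, date),
    "location": lambda v: isinstance(v, tuple) and len(v) == 2,  # (lat, lon)
}


@dataclass(frozen=True)
class SemanticTag:
    attribute: str  # the semantic tag name
    type: str       # determines what kind of value is allowed
    value: object

    def __post_init__(self):
        if self.type not in VALIDATORS:
            raise ValueError(f"unknown tag type: {self.type}")
        if not VALIDATORS[self.type](self.value):
            raise TypeError(f"{self.attribute}: value does not match type {self.type}")


def query(pages, attribute, value):
    """Return names of pages that carry a tag with this attribute and value."""
    return [
        name
        for name, tags in pages.items()
        if any(t.attribute == attribute and t.value == value for t in tags)
    ]


# Hypothetical data system pages.
pages = {
    "SystemA": [SemanticTag("access", "enumeration", "web service")],
    "SystemB": [SemanticTag("access", "enumeration", "file download")],
}
print(query(pages, "access", "web service"))  # ['SystemA']
```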

  12. Multiple Views: Community Data Sharing - ‘DataSpaces’. Figure labels: Catalog – Find Dataset; Dataset; Wrap metadata with Semantic Tags; Reuse Meta; DataSpaces

  13. Two parts:
     • Semantic Tags: Structured
     • User-added content: Unstructured
     Semantic Tags:
     • Define common features of all datasets (BBox, time range, provider)
     • Can be queried within the wiki to show a subset of datasets
     • Ready for Export/Harvesting with RDF Feeds for use by Registries, Catalogs, XSLT transformations
     User-added Content:
     • Feedback/FAQs from users about datasets
     • Tagged, relevant papers about dataset
     • Dataset lineage
     • …
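Exporting semantic tags for harvesting could be as simple as writing one RDF triple per tag. The following sketch emits N-Triples by hand; it is not the wiki's actual RDF feed. The namespace `http://example.org/semtag#`, the page URL, and the tag values are placeholders, and attribute names are assumed to be safe to append to a URI.

```python
def escape(literal):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return (
        literal.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def to_ntriples(page_url, tags, ns="http://example.org/semtag#"):
    """Emit one triple per semantic tag: <page> <ns + attribute> "value" ."""
    lines = []
    for attribute, value in tags.items():
        lines.append(f'<{page_url}> <{ns}{attribute}> "{escape(str(value))}" .')
    return "\n".join(lines)


# Hypothetical dataset page and tag values.
print(to_ntriples(
    "http://example.org/wiki/DatasetX",
    {"provider": "ExampleProvider", "timeRange": "2007-04-01/2007-05-31"},
))
```

A registry or catalog can then read these triples without knowing anything about the wiki's page layout, which is the point of keeping the structured tags separate from the free-form user content.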

  14. Summary
     • By adding unique tags, groups can collaboratively curate lists of resources.
     • The wiki allows seemingly unrelated information from distributed web objects to be brought together by harvesting unique tags.
     • Tagging within the wiki allows emergent structure to evolve.

  15. Future Work
     • Continue to learn how to add structure with tagging.
     • Continue to mash structured tagging with the wiki ‘canvas’.
     • Use tagging as a way to allow feedback from user to provider.
     • Facilitate community tagging and collaboration.
