Data on the Web Life Cycle

A comprehensive overview of the data on the web life cycle, including data collection, generation, distribution, and usage, along with examples of best practices for each stage.

Presentation Transcript


  1. Data on the Web Life Cycle Bernadette Farias Lóscio bfl@cin.ufpe.br March, 2014

  2. Outline • Definition of data on the Web • Data on the Web life cycle • Spiral model • Overview • Data collection • Data generation • Data distribution • Data usage • Data on the Web life cycle + best practices • Examples of best practices

  3. Data on the Web Data from diverse domains (ex: governmental data, cultural heritage, scientific data, cross domain) available on the Web in a machine-processable format.

  4. Data on the Web Life Cycle A set of tasks or activities that take place during the process of publishing and using data on the Web. The process may pass through some number of iterations and may be represented using a spiral model.

  5. Data on the Web Life Cycle Author: Bernadette Lóscio

  6. An overview of the Data on the Web life cycle

  7. Data on the Web Life Cycle • Data collection • Source selection: identification of data sources that may offer relevant data (ex: relational databases, XML files, Excel documents)

  8. Data on the Web Life Cycle • Data Generation (1st iteration) • Dataset project • Define the schema of the target dataset (structural metadata) • Choose standard vocabularies • Data (ex: FOAF, DC, SKOS, Data Cube) • Dataset (ex: DCAT, PROV, VoID, Data Quality Vocab) • Data Catalog (ex: DCAT) • Choose data formats (machine-processable data) • Create new vocabularies • …
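
A minimal sketch of what the dataset project's vocabulary choices might look like in practice, assuming DCAT and Dublin Core Terms are the chosen vocabularies and JSON-LD is the chosen serialization. The function name `describe_dataset` and every example.org URL are hypothetical and used only for illustration; they do not come from the slides.

```python
import json

def describe_dataset(dataset_uri, title, description, download_url, media_type_uri):
    """Build a minimal JSON-LD description of a dataset with DCAT and DC Terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # One distribution per concrete, downloadable form of the dataset
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_uri},
            }
        ],
    }

# Hypothetical dataset used only to exercise the function
metadata = describe_dataset(
    dataset_uri="http://data.example.org/dataset/bus-stops",
    title="Bus stops",
    description="Locations of public bus stops.",
    download_url="http://data.example.org/files/bus-stops.csv",
    media_type_uri="https://www.iana.org/assignments/media-types/text/csv",
)
print(json.dumps(metadata, indent=2))
```

The same dictionary could describe a data catalogue entry by nesting it under a `dcat:Catalog` node; that extension is left out to keep the sketch short.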

  9. Data on the Web Life Cycle • Data Generation (2nd iteration) • ETL process (Extract, Transform and Load) • Extract data from the selected data sources, transform the data according to the decisions made during the dataset project, and load the data into the target dataset • Metadata generation • Produce (manually or automatically) structured metadata according to the metadata standards defined during the dataset project
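
The ETL step described above can be sketched roughly as follows, assuming a CSV source and a JSON target. The column names, the `FIELD_MAP` mapping, and the sample rows are all hypothetical; a real dataset project would supply its own schema decisions.

```python
import csv
import io
import json

# Hypothetical mapping from source columns to the target schema fields
# decided during the dataset project; unmapped columns are dropped.
FIELD_MAP = {"full_name": "name", "town": "city"}

SAMPLE_SOURCE = (
    "full_name,town,internal_id\n"
    "Ana Souza , Recife,17\n"
    "João Lima,Olinda,18\n"
)

def extract(source_text):
    """Read rows from a CSV source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(source_text)))

def transform(rows):
    """Rename mapped columns, trim whitespace, and drop unmapped columns."""
    return [
        {FIELD_MAP[key]: value.strip() for key, value in row.items() if key in FIELD_MAP}
        for row in rows
    ]

def load(records):
    """Serialize the target dataset as JSON (a machine-processable format)."""
    return json.dumps(records, ensure_ascii=False, indent=2)

records = transform(extract(SAMPLE_SOURCE))
print(load(records))
```

Keeping the three stages as separate functions makes it easier to document the generation process, which the best practices later in the deck ask for.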

  10. Data on the Web Life Cycle • Data Distribution (1st iteration) • URIs project • Design URIs that will persist and continue to mean the same thing in the long term • Choose one or more solutions for data publishing • data catalogue, API, SPARQL endpoint, dataset dump, …
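
One possible way to approach the URIs project is to mint identifiers from a stable base and a normalized label, leaving out details that tend to change (file extensions, server technology, query strings). The sketch below assumes that convention; the base `http://data.example.org` and the helper names `slugify` and `mint_uri` are hypothetical.

```python
import re
import unicodedata

BASE = "http://data.example.org"  # hypothetical, stable base for all identifiers

def slugify(label):
    """Turn a human-readable label into a lowercase ASCII slug."""
    # Decompose accented characters, then drop the combining marks
    normalized = unicodedata.normalize("NFKD", label)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")

def mint_uri(kind, label):
    """Build a URI free of file extensions and implementation details."""
    return f"{BASE}/{kind}/{slugify(label)}"

uri = mint_uri("dataset", "Orçamento Público")
print(uri)  # http://data.example.org/dataset/orcamento-publico
```

Because the URI carries no format or technology hints, the same identifier can keep working if the dataset later moves from a file dump to an API or a SPARQL endpoint.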

  11. Data on the Web Life Cycle • Data Distribution (2nd iteration) • Publish data and metadata • Make data and metadata available on the Web • Data Distribution (3rd iteration) • Update data • Make a new version of the dataset available on the Web • Update metadata • Make a new version of the metadata available on the Web
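
The update iteration above can be illustrated with a small in-memory history, assuming each new version gets its own URI under an unchanged dataset URI. The class `DatasetHistory`, its fields, and the `/version/N` URI pattern are hypothetical choices made for this sketch, not something the slides prescribe.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DatasetHistory:
    """Keeps every published version of a dataset under one stable dataset URI."""
    dataset_uri: str
    versions: list = field(default_factory=list)

    def publish(self, issued, note):
        """Record a new version; earlier versions stay in the history."""
        number = len(self.versions) + 1
        version = {
            "uri": f"{self.dataset_uri}/version/{number}",
            "issued": issued.isoformat(),
            "note": note,
        }
        self.versions.append(version)
        return version

    def latest(self):
        """Return the most recent version, or None if nothing was published."""
        return self.versions[-1] if self.versions else None

history = DatasetHistory("http://data.example.org/dataset/bus-stops")
history.publish(date(2014, 3, 1), "Initial publication")
history.publish(date(2014, 6, 1), "Added stops for new routes")
print(history.latest()["uri"])
```

Metadata updates could be tracked the same way, with a second history keyed on the metadata document rather than the data.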

  12. Data on the Web Life Cycle • Data usage • Explore data • Bring important aspects of the data into focus for further analysis • Analyze data • Develop applications, build visualizations, … • Give feedback • Provide useful information about the dataset (ex: dataset relevance, data quality, …) • Provide data usage descriptions

  13. An overview of the Data on the Web life cycle + best practices

  14. Data on the Web Best Practices • Best practices may be applied during the whole process of publishing and using data on the Web. • Best practices may be defined according to the activities performed in each one of the quadrants (or tasks).

  15. Data on the Web Life Cycle + Best Practices Author: Bernadette Lóscio

  16. Examples of Best Practices • Data collection • Best practices: • Have a catalogue to describe potential data sources, i.e., data sources that could provide data to be published on the Web • … • Data Generation • Best practices: • Document the process of data generation • Use standard vocabularies to describe data • Use standard vocabularies to describe datasets and data catalogues (ex: DCAT) • Provide stable URIs • Provide data in machine-processable formats • Provide metadata to describe data • …

  17. Examples of Best Practices • Data Distribution • Use standard ways to distribute data (ex: data catalogues and APIs) • Provide details about data access • Provide details about data licence • Provide details about dataset provenance and quality • Provide a schedule of dataset updates • Keep a dataset history • Provide ways to collect data consumers' feedback • Announce the publication of new datasets or new versions of existing datasets • … • Data usage • Provide feedback about datasets • Provide descriptions of how the dataset is used • …

  18. Data on the Web Best Practices • For each best practice, guidance on how to implement it must be provided • Some best practices may be implemented in more than one way (to be continued)
