Data on the Web Life Cycle. Bernadette Farias Lóscio bfl@cin.ufpe.br March, 2014. Outline. Definition of data on the Web Data on the Web life cycle Spiral model Overview Data collection Data generation Data distribution Data usage Data on the Web life cycle + best practices
Data on the Web Life Cycle Bernadette Farias Lóscio bfl@cin.ufpe.br March, 2014
Outline • Definition of data on the Web • Data on the Web life cycle • Spiral model • Overview • Data collection • Data generation • Data distribution • Data usage • Data on the Web life cycle + best practices • Examples of best practices
Data on the Web Data from diverse domains (ex: governmental data, cultural heritage, scientific data, cross-domain data) available on the Web in a machine-processable format.
Data on the Web Life Cycle A set of tasks or activities that take place during the process of publishing and using data on the Web. The process may pass through some number of iterations and may be represented using a spiral model.
Data on the Web Life Cycle Author: Bernadette Lóscio
An overview of the Data on the Web life cycle
Data on the Web Life Cycle • Data collection • Source selection: identification of data sources that may offer relevant data (ex: relational databases, XML files, Excel documents)
Data on the Web Life Cycle • Data Generation (1st iteration) • Dataset project • Define the schema of the target dataset (structural metadata) • Choose standard vocabularies • Data (ex: FOAF, DC, SKOS, Data Cube) • Dataset (ex: DCAT, PROV, VoiD, Data Quality Vocab) • Data Catalog (ex: DCAT) • Choose data formats (machine processable data) • Create new vocabularies • …
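The dataset project step above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the target schema, its field names, and the choice of terms for each field are hypothetical, while the FOAF and Dublin Core namespace URIs are the published ones. The sketch records the schema together with the chosen vocabulary terms and derives a JSON-LD `@context` from it.

```python
import json

# Published namespace URIs for two of the vocabularies named on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical target schema: field name -> (vocabulary term, expected Python type).
TARGET_SCHEMA = {
    "id": ("dct:identifier", str),
    "name": ("foaf:name", str),
    "homepage": ("foaf:homepage", str),
}


def build_context(schema, prefixes):
    """Build a JSON-LD @context that maps each field to its vocabulary term."""
    context = dict(prefixes)
    for field_name, (term, _expected_type) in schema.items():
        context[field_name] = term
    return context


def validate(record, schema):
    """Return the fields that are missing from the record or have the wrong type."""
    return [
        field_name
        for field_name, (_term, expected_type) in schema.items()
        if not isinstance(record.get(field_name), expected_type)
    ]


context = build_context(TARGET_SCHEMA, PREFIXES)
print(json.dumps({"@context": context}, indent=2))
```

Keeping the schema and the vocabulary mapping in one structure is only one possible design; it lets the same declaration drive both validation and the published context.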
Data on the Web Life Cycle • Data Generation (2nd iteration) • ETL process (Extract, Transform and Load) • Extract data from the selected data sources, transform the data according to the decisions made during the dataset project, and load the data into the target dataset • Metadata generation • Produce (manually or automatically) structured metadata according to the metadata standards defined during the dataset project
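The ETL and metadata generation steps above could look roughly like the following Python sketch. It is an illustration under stated assumptions: the CSV source, its column names, and its values are made up, JSON stands in for the target dataset, and the `describe` helper is hypothetical. Only the DCAT and Dublin Core namespace URIs and term names come from the published vocabularies.

```python
import csv
import io
import json

# Made-up source data standing in for a spreadsheet export.
SOURCE_CSV = "Code,City Name,Population\n001,City A,1000\n002,City B,\n"


def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: rename columns to the target schema and cast values."""
    records = []
    for row in rows:
        population = row["Population"].strip()
        records.append({
            "id": row["Code"].strip(),
            "name": row["City Name"].strip(),
            # An empty cell becomes None instead of raising on int("").
            "population": int(population) if population else None,
        })
    return records


def load(records):
    """Load: serialize the records as JSON, standing in for the target dataset."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def describe(records, title):
    """Metadata generation: derive a small DCAT-style description automatically."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": f"{len(records)} records generated by the ETL sketch",
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:mediaType": "application/json",
        },
    }


records = transform(extract(SOURCE_CSV))
print(load(records))
print(json.dumps(describe(records, "Example cities"), indent=2))
```

Generating the metadata from the same records that were loaded keeps the description in step with the data, which is one way to cover the "automatically" option mentioned on the slide.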
Data on the Web Life Cycle • Data Distribution (1st iteration) • URIs project • Design URIs that will persist and will continue to mean the same thing in the long term • Choose one or more solutions for data publishing • data catalogue, API, SPARQL endpoint, dataset dump, …
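One possible reading of the URIs project, sketched in Python. The base domain `https://data.example.org`, the `/dataset/` path pattern, and the helper names are assumptions made for illustration. The idea shown is to build URIs only from stable parts (a publisher domain, a dataset name, a local identifier) and to leave out details that tend to change, such as file extensions or server technology.

```python
import re
import unicodedata

BASE = "https://data.example.org"  # hypothetical publisher domain


def slugify(text):
    """Lowercase, strip accents, and replace runs of non-alphanumerics with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def dataset_uri(name):
    """URI for a dataset as a whole."""
    return f"{BASE}/dataset/{slugify(name)}"


def resource_uri(dataset_name, local_id):
    """URI for a single resource inside a dataset."""
    return f"{dataset_uri(dataset_name)}/{slugify(str(local_id))}"


print(dataset_uri("Orçamento Público 2014"))
print(resource_uri("Orçamento Público 2014", "Item 42"))
```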
Data on the Web Life Cycle • Data Distribution (2nd iteration) • Publish data and metadata • Make data and metadata available on the Web • Data Distribution (3rd iteration) • Update data • Make a new version of the dataset available on the Web • Update metadata • Make a new version of the metadata available on the Web
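The publish and update iterations above can be modelled with a small version history. The following Python sketch is hypothetical: the `Dataset` and `Version` classes and their fields are illustrative names, not an API from the slides. Each update adds a new version instead of overwriting the previous one, so earlier versions remain available.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Version:
    number: int
    issued: date
    note: str


@dataclass
class Dataset:
    title: str
    history: list = field(default_factory=list)

    def publish(self, issued, note):
        """Append a new version; earlier versions stay in the history."""
        version = Version(len(self.history) + 1, issued, note)
        self.history.append(version)
        return version

    @property
    def latest(self):
        """Most recent version, or None if nothing has been published yet."""
        return self.history[-1] if self.history else None


dataset = Dataset("Example dataset")
dataset.publish(date(2014, 3, 1), "Initial publication")
update = dataset.publish(date(2014, 4, 1), "Updated data and metadata")
print(dataset.latest)
```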
Data on the Web Life Cycle • Data usage • Explore data • Bring important aspects of the data into focus for further analysis • Analyze data • Develop applications, build visualizations, … • Give feedback • Provide useful information about the dataset (ex: dataset relevance, data quality, …) • Provide data usage descriptions
An overview of the Data on the Web life cycle + best practices
Data on the Web Best Practices • Best practices may be applied during the whole process of publishing and using data on the Web. • Best practices may be defined according to the activities performed in each one of the quadrants (or tasks).
Data on the Web Life Cycle + Best Practices Author: Bernadette Lóscio
Examples of Best Practices • Data collection • Best practices • Have a catalogue to describe potential data sources, i.e., data sources that could provide data to be published on the Web • … • Data Generation • Best practices • Document the process of data generation • Use standard vocabularies to describe data • Use standard vocabularies to describe datasets and data catalogues (ex: DCAT) • Provide stable URIs • Provide data in machine-processable formats • Provide metadata to describe data • …
Examples of Best Practices • Data Distribution • Use standard ways to distribute data (ex: data catalogues and APIs) • Provide details about data access • Provide details about data licence • Provide details about dataset provenance and quality • Provide a schedule of dataset updates • Keep a dataset history • Provide ways to collect data consumers' feedback • Announce the publication of new datasets or new versions of existing datasets • … • Data usage • Provide feedback about datasets • Provide descriptions of the usage of the dataset • …
Data on the Web Best Practices • For each best practice, guidance on how to implement it must be provided • Some best practices may be implemented in more than one way (to be continued)