Google Refine for Data Quality / Integrity
Context: BioVeL Data Refinement Workflow
• Synonym Expansion / Occurrence Retrieval
• Data Selection
• Data Quality / Integrity
In Google’s Own Words
“Google Refine is a power tool for
- working with messy data,
- cleaning it up,
- transforming it from one format into another,
- extending it with web services,
- and linking it to databases”
… and it can be run in isolation.
Installation
• Download the zip file from http://code.google.com/p/google-refine/wiki/Downloads
• Extract the file
• Run google-refine.exe (Windows)
Features: Clustering / Grouping
Use case: group taxon names and merge similar groups
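Refine's clustering dialog offers several methods; one of its key-collision methods, "fingerprint", reduces each value to a normalized key and groups values that share that key. The Python sketch below follows the fingerprint steps as the Refine documentation describes them (trim, lowercase, strip punctuation, fold to ASCII, sort unique tokens) and applies them to taxon names. Choosing the most frequent spelling as the merged value is an assumption for illustration, and the names list is made-up sample data.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Key-collision 'fingerprint' key: trimmed, lowercased, punctuation-free,
    ASCII-folded, with unique tokens sorted and rejoined."""
    s = value.strip().lower()
    s = re.sub(r"[^\w\s]", "", s)  # drop punctuation (keeps letters, digits, _)
    # Fold accented characters to plain ASCII, e.g. "á" -> "a"
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(s.split())))


def find_clusters(values):
    """Group distinct spellings that share a fingerprint key."""
    groups = defaultdict(Counter)
    for v in values:
        groups[fingerprint(v)][v] += 1
    # Only keys with more than one distinct spelling are real clusters
    return {k: c for k, c in groups.items() if len(c) > 1}


def merge_clusters(values):
    """Replace every value by the most frequent spelling in its cluster (assumed rule)."""
    groups = defaultdict(Counter)
    for v in values:
        groups[fingerprint(v)][v] += 1
    canonical = {k: c.most_common(1)[0][0] for k, c in groups.items()}
    return [canonical[fingerprint(v)] for v in values]


names = ["Puma concolor", "puma concolor", "Puma  concolor.", "Puma concolor", "Felis catus"]
print(find_clusters(names))
print(merge_clusters(names))
# ['Puma concolor', 'Puma concolor', 'Puma concolor', 'Puma concolor', 'Felis catus']
```

Because tokens are sorted, "concolor Puma" and "Puma concolor" also collide; that helps with sloppy input but is worth reviewing for names where word order matters, which is why Refine shows clusters for confirmation before merging.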
Features: Filtering
Use case: filter out records that do not have ‘museum’, ‘university’ or ‘marine’ in the data provider name
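In Refine this would typically be a text filter on the provider column with the regular-expression option enabled (e.g. museum|university|marine), or a custom facet. The Python sketch below expresses the same keep/drop rule outside Refine; the record layout, the dataProvider field name and the sample provider names are hypothetical.

```python
import re

# Hypothetical field holding the data provider name in each occurrence record
PROVIDER_FIELD = "dataProvider"
# Case-insensitive match on any of the keywords from the use case
KEYWORDS = re.compile(r"museum|university|marine", re.IGNORECASE)


def keep_record(record: dict) -> bool:
    """True if the provider name mentions one of the keywords."""
    provider = record.get(PROVIDER_FIELD) or ""  # missing/None -> empty string
    return KEYWORDS.search(provider) is not None


# Made-up sample records
records = [
    {"id": 1, "dataProvider": "City Museum of Natural History"},
    {"id": 2, "dataProvider": "Private collection"},
    {"id": 3, "dataProvider": "Coastal Marine Station"},
    {"id": 4, "dataProvider": None},
]
kept = [r for r in records if keep_record(r)]
print([r["id"] for r in kept])  # [1, 3]
```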
Features: Data Exclusion
Use case: exclude the records selected by a facet or filter
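In Refine, the rows currently selected by facets and filters can be removed in one step, e.g. via the All column menu, Edit rows > Remove all matching rows. A Python sketch of the same facet-then-exclude pattern: a text facet is essentially a count of records per value, and exclusion drops every record whose value is among the selected facet choices. The basisOfRecord field and the sample records are hypothetical.

```python
from collections import Counter


def text_facet(records, field):
    """Count how many records carry each value of `field` (like a text facet)."""
    return Counter(r.get(field) for r in records)


def exclude_matching(records, field, selected):
    """Drop records whose `field` value is one of the selected facet choices."""
    selected = set(selected)
    return [r for r in records if r.get(field) not in selected]


# Made-up sample records
records = [
    {"id": 1, "basisOfRecord": "PreservedSpecimen"},
    {"id": 2, "basisOfRecord": "FossilSpecimen"},
    {"id": 3, "basisOfRecord": "HumanObservation"},
    {"id": 4, "basisOfRecord": "FossilSpecimen"},
]
print(text_facet(records, "basisOfRecord"))
remaining = exclude_matching(records, "basisOfRecord", ["FossilSpecimen"])
print([r["id"] for r in remaining])  # [1, 3]
```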
Features: Extending Data
Use case: add an ISO country code column
Use case: add column(s) by parsing the taxon name
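In Refine both use cases would usually go through Edit column > Add column based on this column, with an expression that computes the new value. The Python sketch below shows the two derivations. The country table holds only a few ISO 3166-1 alpha-2 codes for illustration (a real workflow would need a complete table or a lookup service), and the regex is a deliberately naive split into genus, specific epithet and authorship, not a substitute for a proper name parser. Field names are hypothetical.

```python
import re

# Tiny illustrative subset of ISO 3166-1 alpha-2 codes (not a complete table)
ISO_ALPHA2 = {
    "france": "FR",
    "germany": "DE",
    "netherlands": "NL",
    "united kingdom": "GB",
}

# Naive pattern: capitalised genus, lowercase epithet, remainder = authorship
BINOMIAL = re.compile(r"^([A-Z][a-z]+)\s+([a-z-]+)\s*(.*)$")


def add_iso_code(record):
    """Return a copy of the record with a new 'countryCode' column (None if unknown)."""
    country = (record.get("country") or "").strip().lower()
    return {**record, "countryCode": ISO_ALPHA2.get(country)}


def add_name_parts(record):
    """Return a copy with genus / specificEpithet / authorship columns."""
    m = BINOMIAL.match((record.get("scientificName") or "").strip())
    genus, epithet, author = m.groups() if m else (None, None, None)
    return {**record, "genus": genus, "specificEpithet": epithet,
            "authorship": author or None}


row = {"country": "Germany", "scientificName": "Puma concolor (Linnaeus, 1771)"}
print(add_name_parts(add_iso_code(row)))
```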
Features: Reconciling Data
Use case: retrieve associated names from WoRMS (World Register of Marine Species)
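Reconciliation in Refine talks to a service implementing Refine's reconciliation API: the client sends a batch of queries keyed q0, q1, …, and each answer lists candidates with an id, name, score and a match flag. The Python sketch below builds such a batch and picks the auto-matched candidate. SERVICE_URL is a placeholder (no WoRMS endpoint is given here), the sample response and its ids are made up to show the shape, and retrieving synonyms or other associated names for a matched id would be a further, service-specific call not shown.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: URL of a reconciliation service for WoRMS names (assumption)
SERVICE_URL = "https://example.org/reconcile"


def build_queries(names):
    """Batch of reconciliation queries keyed q0, q1, ..."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def reconcile(names, service_url=SERVICE_URL, timeout=30):
    """POST the batch as the form field 'queries' and return the decoded JSON."""
    body = urllib.parse.urlencode({"queries": json.dumps(build_queries(names))}).encode("utf-8")
    with urllib.request.urlopen(service_url, data=body, timeout=timeout) as resp:
        return json.load(resp)


def best_matches(names, response):
    """Map each name to its first candidate flagged match=True, else None."""
    result = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        result[name] = next((c for c in candidates if c.get("match")), None)
    return result


# Made-up response in the reconciliation shape (ids are placeholders)
names = ["Gadus morhua", "Gadus morua"]
sample_response = {
    "q0": {"result": [{"id": "X1", "name": "Gadus morhua", "score": 100, "match": True}]},
    "q1": {"result": [{"id": "X1", "name": "Gadus morhua", "score": 80, "match": False}]},
}
print(best_matches(names, sample_response))
```

Leaving unmatched names as None mirrors how Refine leaves low-confidence cells for manual review instead of guessing.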
Features: Save / Replay User Actions
Use case: extract scientific names from name labels
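Refine records each transformation in an operation history that can be extracted as JSON (Undo / Redo > Extract…) and applied to another project. The Python sketch below imitates that record-and-replay idea; the operation format ("trim", "extract") is hypothetical and far simpler than Refine's own, and the regex only pulls a leading genus + epithet out of a label. The labels are made-up sample data.

```python
import json
import re

# A saved history in a hypothetical, simplified format (not Refine's own)
HISTORY_JSON = json.dumps([
    {"op": "trim", "column": "label"},
    {"op": "extract", "column": "label", "new_column": "scientificName",
     "pattern": r"^([A-Z][a-z]+ [a-z]+)"},
])


def apply_op(rows, op):
    """Apply one recorded operation to every row."""
    if op["op"] == "trim":
        for r in rows:
            r[op["column"]] = r[op["column"]].strip()
    elif op["op"] == "extract":
        pattern = re.compile(op["pattern"])
        for r in rows:
            m = pattern.search(r[op["column"]])
            r[op["new_column"]] = m.group(1) if m else None
    else:
        raise ValueError(f"unknown operation: {op['op']}")
    return rows


def replay(rows, history_json):
    """Replay a saved history on copies of the given rows."""
    rows = [dict(r) for r in rows]  # leave the caller's rows untouched
    for op in json.loads(history_json):
        rows = apply_op(rows, op)
    return rows


labels = [{"label": "  Puma concolor (Linnaeus, 1771) "}, {"label": "unlabelled jar"}]
print(replay(labels, HISTORY_JSON))
```

Keeping the history as data rather than code is what makes it shareable: the same JSON can be replayed on every new batch of occurrence records.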
Features: Build Extensions
Use case: BioVeL Extension
• interaction with Taverna
• additional functionality specific to the BioVeL context (e.g. the ECAT Name Parser)
Future Possibilities: Remote Server
Google Refine could be deployed as a remote server, allowing users to share resources (extensions, data, action histories)
Future Possibilities: Integration
Integration with existing applications, either as a module or through REST API calls
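Refine's browser front end already talks to its server through plain HTTP commands under /command/core/, which is what would make REST-style integration possible. A minimal Python client sketch, assuming a local instance on the default address http://127.0.0.1:3333 and the get-version and apply-operations commands. Newer OpenRefine releases also require a CSRF token (from get-csrf-token) on state-changing calls, so treat the parameter handling as an assumption to check against the version in use; the project id and operations in any call are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Default address of a locally running Refine instance (assumption)
REFINE_URL = "http://127.0.0.1:3333"


def get_version(base_url=REFINE_URL, timeout=10):
    """Ask the server for its version (a cheap 'is it up?' check)."""
    with urllib.request.urlopen(f"{base_url}/command/core/get-version", timeout=timeout) as resp:
        return json.load(resp)


def build_apply_request(base_url, project_id, operations, csrf_token=None):
    """URL and form-encoded body for the apply-operations command."""
    params = {"project": str(project_id), "operations": json.dumps(operations)}
    if csrf_token is not None:
        params["csrf_token"] = csrf_token  # required by newer releases (assumption)
    url = f"{base_url}/command/core/apply-operations"
    return url, urllib.parse.urlencode(params).encode("utf-8")


def apply_operations(project_id, operations, base_url=REFINE_URL,
                     csrf_token=None, timeout=60):
    """Replay an extracted operation history on an existing project."""
    url, body = build_apply_request(base_url, project_id, operations, csrf_token)
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        return json.load(resp)
```

In a workflow engine such as Taverna, a step could call apply_operations with a history previously extracted from the Undo / Redo panel, so the same cleaning recipe runs on every dataset without opening the browser UI.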
Future Possibilities: Central Application
A central application that could be used to run scripts, call web services and even interact with other software applications
Thanks
Questions / Suggestions / Comments