Working Group: DFT - some use cases - Peter Wittenburg, Raphael Ritz
Use Case: CLARIN Collection Builder
Collection building across repositories is essential for running analyses. PhD students, for example, create their own collections. Such a collection consists of a metadata (MD) object that has a PID and stores many PIDs referring to its components. It is an aggregation, but it has an identity, can be cited, etc.
[Figure: a collection builder spanning repository 1 and repository 2, each holding data objects with their MD; the figure labels also include PID and MD+.]
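The collection described above can be sketched as a small data structure: one metadata object with its own PID that stores the PIDs of its components. This is only an illustration; the class name `CollectionMD`, the helper `build_collection`, and the `hdl:0000/...` identifiers are hypothetical and not part of the CLARIN software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectionMD:
    """Metadata object for a collection (hypothetical structure)."""
    pid: str                      # PID of the collection itself, so it can be cited
    title: str
    member_pids: List[str] = field(default_factory=list)  # PIDs of the components

    def add(self, pid: str) -> None:
        # Only the reference is stored; the data stays in its repository.
        if pid not in self.member_pids:
            self.member_pids.append(pid)


def build_collection(pid: str, title: str, *repositories: List[str]) -> CollectionMD:
    """Aggregate component PIDs from several repositories into one collection."""
    collection = CollectionMD(pid=pid, title=title)
    for repository in repositories:
        for member_pid in repository:
            collection.add(member_pid)
    return collection


# Made-up PIDs standing in for objects held by two repositories.
repository_1 = ["hdl:0000/r1-a", "hdl:0000/r1-b"]
repository_2 = ["hdl:0000/r2-a", "hdl:0000/r1-b"]
thesis = build_collection("hdl:0000/coll-1", "PhD corpus", repository_1, repository_2)
```

Because the collection holds references only, the same object can belong to many collections without being copied.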
Use Case: Replication in EUDAT Data Domain
Replicating data from different types of data organizations in different communities, i.e. creating a coherent domain of data, is not trivial.
[Figure: data and MD from community 1, community 2 and community 3 feed into the EUDAT data domain, whose objects are labelled data, MD+ and PID.]
Use Case: EUDAT Data Domain
PID information types are urgently needed:
• cksm
• cksm type
• data_URL+
• metadata_URL+
• ROR_flag
• mutability_flag
• access_rights_store
• etc.
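A PID record carrying the information types listed above might look like the following sketch. Several points are assumptions made for illustration: `cksm` is read as a checksum, the trailing `+` as "one or more values", the two flags as booleans, and `access_rights_store` as a plain string reference. The names `PIDRecord`, `make_record` and `verify` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class PIDRecord:
    """Hypothetical PID record; field types are assumptions, not a specification."""
    cksm: str                      # checksum of the data object
    cksm_type: str                 # algorithm name, e.g. "sha256"
    data_urls: List[str]           # data_URL+ (one or more locations, assumed)
    metadata_urls: List[str]       # metadata_URL+ (one or more locations, assumed)
    ror_flag: bool = False         # assumed boolean
    mutability_flag: bool = False  # assumed boolean
    access_rights_store: str = ""  # assumed to be a reference to a rights store


def make_record(payload: bytes, data_urls: List[str], metadata_urls: List[str]) -> PIDRecord:
    """Create a record whose checksum is computed from the payload."""
    digest = hashlib.sha256(payload).hexdigest()
    return PIDRecord(digest, "sha256", list(data_urls), list(metadata_urls))


def verify(record: PIDRecord, payload: bytes) -> bool:
    """Check a (replicated) payload against the checksum stored in the record."""
    return hashlib.new(record.cksm_type, payload).hexdigest() == record.cksm


original = b"seismic trace 42"
record = make_record(original, ["https://example.org/data/42"], ["https://example.org/md/42"])
replica_ok = verify(record, original)
replica_corrupt = verify(record, b"seismic trace 4?")
```

Storing the checksum type next to the checksum lets a replica be verified without knowing how the originating community computed it.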
Use Case: Curate Gappy Sensor Data in Seismology
The data are:
• real-time data, owing to the limited delay
• gappy data, since the referred data has gaps
• dynamic data, since gaps are being filled at random times
[Figure: stages labelled sensing, receiving, packaging, completing, and analyzing/referring. Sensing produces p(t-5), p(t-4), p(t-3), p(t-2), p(t-1), p(t); the real-time row shows p(t-5), p(t-3), p(t-1), p(t), with p(t-4) and p(t-2) shown separately. Window labels: window (t1,k), window (t1,k+1), window (t2,k), window (t1,k+3).]
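The behaviour of gappy, dynamic data can be illustrated with a small sketch. It reads the label window (t, k) as "the window for time slot t in state k", where k grows each time a gap is filled; that reading is an interpretation of the figure labels, and the `Window` class is hypothetical.

```python
from typing import List, Optional, Tuple


class Window:
    """Fixed-length package of samples; samples not yet received are None."""

    def __init__(self, start: int, length: int) -> None:
        self.start = start
        self.samples: List[Optional[float]] = [None] * length
        self.state = 0  # incremented whenever the content changes

    def put(self, t: int, value: float) -> None:
        """Store the sample for time t, whether it arrives on time or late."""
        index = t - self.start
        if not 0 <= index < len(self.samples):
            raise IndexError(f"time {t} is outside this window")
        if self.samples[index] != value:
            self.samples[index] = value
            self.state += 1

    def gaps(self) -> List[int]:
        """Times for which no sample has been received yet."""
        return [self.start + i for i, s in enumerate(self.samples) if s is None]

    def reference(self) -> Tuple[int, int]:
        """(start, state) pair that pins down exactly which content was analyzed."""
        return (self.start, self.state)


window = Window(start=0, length=6)
for t in (0, 2, 4, 5):          # samples that arrive in real time
    window.put(t, float(t))
gaps_at_first = window.gaps()
first_reference = window.reference()

for t in (1, 3):                # late samples fill the gaps afterwards
    window.put(t, float(t))
gaps_at_end = window.gaps()
final_reference = window.reference()
```

An analysis that cites `first_reference` remains distinguishable from one run on the completed window, which is the kind of referencing the slide asks for.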
Use Case: Crowd Sourcing Data Management
[Figure: stream data is buffered, chunked and annotated in an intermediate store (stream data, metadata database, annotations). A Curation/Structuring step, connected to a PID system, leads to a permanent store (stream data, metadata, annotations) that is used for analysis.]
Curation/Structuring:
- create proper MD with all relations
- register PIDs for all objects
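The two Curation/Structuring tasks above can be sketched as follows, assuming a toy in-memory PID registry and metadata kept as plain dictionaries. `PIDRegistry`, `curate`, the relation names `annotates` and `annotated_by`, and the `hdl:0000` prefix are all hypothetical.

```python
import itertools
from typing import Dict, List


class PIDRegistry:
    """Toy stand-in for a PID system: issues sequential identifiers."""

    def __init__(self, prefix: str = "hdl:0000") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self.targets: Dict[str, str] = {}   # PID -> current location

    def register(self, location: str) -> str:
        pid = f"{self._prefix}/{next(self._counter)}"
        self.targets[pid] = location
        return pid


def curate(chunks: List[str], annotations: Dict[str, str],
           registry: PIDRegistry) -> List[dict]:
    """Register a PID for every object and create MD that records the relations.

    `annotations` maps an annotation's location to the chunk it annotates.
    """
    pid_of = {loc: registry.register(loc) for loc in list(chunks) + list(annotations)}
    metadata = []
    for chunk in chunks:
        related = [pid_of[a] for a, target in annotations.items() if target == chunk]
        metadata.append({"pid": pid_of[chunk], "type": "stream data",
                         "annotated_by": related})
    for annotation, target in annotations.items():
        metadata.append({"pid": pid_of[annotation], "type": "annotation",
                         "annotates": pid_of[target]})
    return metadata


registry = PIDRegistry()
records = curate(["tmp/chunk1", "tmp/chunk2"], {"tmp/ann1": "tmp/chunk1"}, registry)
```

Relations are expressed between PIDs rather than between storage paths, so they stay valid when objects move from the intermediate store to the permanent store.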
Use Case: Language Technology Workflows
[Figure: collections with their metadata pass through processing modules k, l and m; the labels include metadata k and metadata l, and a PID system is attached.]
Processing modules are service objects.
Use Case: Language Technology Workflows
Processing module x:
1. read collection MD
2. read MD
3. interpret MD & get PID
4. get data
5. process data & create new data
6. register PID
7. create MD*
8. update collection
9. if more in collection, go to 2
10. end
Typical processing modules:
• tokenize texts
• tag parts of speech in texts
• parse texts (shallow)
• recognize named entities in texts
• filter speech
• detect voiced segments
• detect speakers of segments
• do rough classification of segments
• etc.
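The loop of processing module x can be sketched with in-memory dictionaries standing in for the metadata store, the data store and the PID system. This is a minimal illustration, not the workflow engine itself: `run_module`, `register_pid`, the `derived_from` field and the way new metadata ids are formed are hypothetical, and "update collection" is read as appending the new MD to the same collection.

```python
import itertools
from typing import Callable, Dict, List

_pid_counter = itertools.count(100)


def register_pid(_obj: object) -> str:
    """Toy PID registration: returns the next made-up identifier."""
    return f"hdl:0000/{next(_pid_counter)}"


def tokenize(text: str) -> List[str]:
    """Example processing module: whitespace tokenizer."""
    return text.split()


def run_module(collection: List[str], metadata: Dict[str, dict],
               store: Dict[str, object], process: Callable) -> None:
    for md_id in list(collection):            # read collection MD; loop while more
        md = metadata[md_id]                  # read MD
        pid = md["pid"]                       # interpret MD & get PID
        data = store[pid]                     # get data
        result = process(data)                # process data & create new data
        new_pid = register_pid(result)        # register PID
        store[new_pid] = result
        new_md_id = f"{md_id}.{process.__name__}"
        metadata[new_md_id] = {"pid": new_pid,            # create MD*
                               "derived_from": pid,
                               "module": process.__name__}
        collection.append(new_md_id)          # update collection


collection = ["md1", "md2"]
metadata = {"md1": {"pid": "hdl:0000/1"}, "md2": {"pid": "hdl:0000/2"}}
store = {"hdl:0000/1": "a b c", "hdl:0000/2": "d e"}
run_module(collection, metadata, store, tokenize)
```

Iterating over a snapshot of the collection keeps the module from reprocessing the objects it has just added.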