http://www.dans.knaw.nl Dirk Roorda, coordinator infrastructure

Presentation Transcript


  1. http://www.dans.knaw.nl Dirk Roorda, coordinator infrastructure

  2. Overview
     Part 1: The rising role of data
     Part 2: The free use of data
     Part 3: The care for data
     Part 4: The re-use of data

  3. Part 1: The rising role of data
     http://en.wikipedia.org/wiki/Exabyte
     Internet size (May 2009):
     • 500 EB
     • 500,000 PB
     • 500 million TB
     • 500 million fat USB disks
     • 500 billion memory cards of 1 GB
     • 70 memory cards per person
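
The figures on this slide follow from decimal (SI) prefixes. A minimal sketch of the arithmetic, assuming that a "fat USB disk" holds 1 TB and that the world population in 2009 was roughly 6.8 billion (both are assumptions; the slide states neither):

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

internet_size = 500 * EB  # figure taken from the slide (May 2009)

# Assumptions, not stated on the slide:
USB_DISK_BYTES = 1 * TB            # capacity of one "fat USB disk"
WORLD_POPULATION = 6_800_000_000   # rough 2009 estimate

petabytes = internet_size // PB
terabytes = internet_size // TB
usb_disks = internet_size // USB_DISK_BYTES
memory_cards = internet_size // GB          # cards of 1 GB each
cards_per_person = memory_cards / WORLD_POPULATION

print(petabytes, terabytes, usb_disks, memory_cards, cards_per_person)
```

Under these assumptions the per-person figure lands in the low seventies, which the slide rounds to 70.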

  4. Data deluge
     http://www.datadeluge.com/
     http://en.wikipedia.org/wiki/File:Tree_of_life_SVG.svg
     http://tolweb.org/tree/

  5. Where does it come from?
     • Instruments: satellites, sensors, DNA sequencing
     • Records: administrations, censuses, surveys
     • Digitisation: the analog legacy
     • Hobby: pictures, movies, genealogy
     • Integration: better interoperability of existing data

  6. The driving force
     Information and Communication Technology
     Babbage Analytical Engine 1870

  7. A datacenter
     Genealogy: 2.5 PB, 5328 servers, 1.12 MW
     http://www.ancestry.com/
     http://blog.familytreemagazine.com/insider/Inside+Ancestrycoms+TopSecret+Data+Center.aspx

  8. A closer look
     • Linguistics: text corpora, automatic translation
     • Philology: how to read a million books?
     • History: historical census data
     • Archaeology: archive law, commercial research

  9. Linguistics and Philology
     • A chronometric approach to Indian alchemical literature
     • Assessing frequency changes in multistage diachronic corpora
     • Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
     • A Corpus Study of the Rigveda
     • Dictionary generation for less-frequent language pairs using WordNet
     • An exercise in non-ideal authorship attribution: the mysterious Maria Ward
     http://llc.oxfordjournals.org/

  10. History http://www.volkstellingen.nl/nl/

  11. http://www.volkstellingen.nl/en/

  12. Archaeology http://edna.itor.org/nl/intern/upload_directory/a00002/downloads/IMG0013.tif

  13. Archaeology (2) http://edna.itor.org/nl/oai/oai_addi/oai_addi/OAI:EVALMA:a00002.xml/

  14. Part 2: The free use of data

  15. Open Access
      Data is information
      Information is knowledge
      Knowledge is power
      Why share it?

  16. Open Access
      Shared knowledge is double knowledge
      Without free sharing of knowledge, scientific progress will halt
      Tensions between sharing and not sharing remain, though

  17. A good example
      http://www.ploscompbiol.org/home.action

  18. Work to do
      • organise your data
      • let your data work together with those of others (colleagues, future scientists, the public)
      • ask new questions of the data, because there is so much of it
      • create new (virtual) data collections

  19. Part 3: The care for data

  20. Research Data Recycling
      • existing data
      • collecting by experiments, surveys
      • primary research data
      • verifying results by others
      • preserving unique data from experiments
      • compilation, aggregation, annotation
      • databanks
      • data mining, analysis, visualisation
      • new data as research input

  21. Challenge: Software
      • Operating systems (DOS, Windows 95, ...)
      • Programming languages (Basic, Pascal)
      • File formats (WordPerfect, dBase)
      • Applications (address books, websites)
      Old data may be locked up in old software.

  22. Meeting the challenge
      To prevent the problem in the future:
      • Backward compatibility
      • Open standards
      • Open source applications
      • Modular software engineering: keep data separated from interface and business logic
      To remedy the problems of the past:
      • Emulation
      • Migration
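
Migration, the second remedy named on this slide, can be illustrated with a small sketch: records from a fixed-width legacy export are rewritten as CSV, an open format. The field layout, the function names, and the two sample records are invented for illustration; they do not come from the presentation or from any real archive.

```python
import csv
import io

# Hypothetical fixed-width layout of a legacy export: (field name, start, end).
LAYOUT = [("name", 0, 20), ("year", 20, 24), ("place", 24, 44)]

def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in layout}

def migrate_to_csv(legacy_lines, out):
    """Write the legacy records to `out` as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[name for name, _, _ in LAYOUT])
    writer.writeheader()
    for line in legacy_lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width(line))

# Invented sample records, padded to the hypothetical column widths.
legacy = [
    "Jansen, Pieter".ljust(20) + "1899" + "Leiden".ljust(20),
    "de Vries, Anna".ljust(20) + "1909" + "Utrecht".ljust(20),
]

buffer = io.StringIO()
migrate_to_csv(legacy, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The parsing step is kept apart from the writing step, in the spirit of the slide's advice to keep data separated from interface and business logic: a different target format would only need a different writer.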

  23. Challenge: Human organisation
      • Forgotten jargon
      • Forgotten knowledge
      • No metadata
      • Websites with broken links

  24. Jargon
      • II.17. Posterior berry aneurysm with subarachnoid bleed.
      • II.18. Subarachnoid bleed with extension into the ventricles.
      • II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture.
      • II.22. Subarachnoid hemorrhage.
      http://www.pathguy.com/morgagni.htm

  25. Meeting the challenge
      • Persistent identifiers
      • Enough metadata
      • Codification of knowledge and practices
      • Wikipedia
      • Data management early on
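
Two of these remedies, persistent identifiers and enough metadata, can be sketched in a few lines. This is an illustration under stated assumptions, not the way DANS or any real resolver works: the `Resolver` class, the `REQUIRED_FIELDS` list, the `example:` identifier, and the example.org URLs are all hypothetical.

```python
# Hypothetical minimum set of descriptive fields; real metadata schemes define their own.
REQUIRED_FIELDS = ("identifier", "title", "creator", "date", "format")

def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]

class Resolver:
    """Maps a persistent identifier to the current location of a dataset."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # Registering the same identifier again updates its location.
        self._locations[pid] = url

    def resolve(self, pid):
        return self._locations[pid]

resolver = Resolver()
resolver.register("example:dataset-1", "http://old.example.org/data/1")
# The file moves; the identifier that people cite stays the same.
resolver.register("example:dataset-1", "http://new.example.org/archive/1")

record = {"identifier": "example:dataset-1", "title": "Census sample", "creator": ""}
current_url = resolver.resolve("example:dataset-1")
gaps = missing_metadata(record)
print(current_url, gaps)
```

Citing the identifier rather than the URL is what guards against the broken links mentioned on the previous slide, and a completeness check of this kind is easiest to run when data management starts early.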

  26. Part 4: The re-use of data

  27. Data management
      • Use common infrastructure rather than private means
      • Use open formats rather than proprietary formats
      • Use open source software rather than closed software
      • Use standard ways of documenting data: taxonomies, ontologies, metadata schemes

  28. Common Infrastructure
      • Local file shares
      • University repository
      • DANS
      • European infrastructures

  29. DANS
      http://easy.dans.knaw.nl/dms

  30. EASY

  31. Dataset

  32. Datafiles

  33. Metadata

  34. Linguists make their technology accessible: resources, algorithms, techniques
      Humanities and social sciences: they are the target users

  35. Geleerdenbrieven = Circulation of Knowledge
      Archiving = circulation of information

  36. Keep imagining