
Weaving the Web of European Social Science





  1. Weaving the Web of European Social Science. Jostein Ryssevik, Director of Technology and Development, Nesstar

  2. The last page of the Internet [links: "Go there", "Next"]

  3. Tools for thought: maximize (time spent on digesting and thinking) / (time spent on finding and accessing). "Much more time went into finding or obtaining information than into digesting it." Dr. J.C.R. Licklider

  4. As We May Think
  • Vannevar Bush and Memex (1945)
  • A piece of "furniture" able to enter and hold large amounts of scientific information on microfilm
  • ...all indexed
  • ...all hyperlinked (or linked together by trails)
  • ...even supporting the idea of user-supplied links as well as user-supplied annotations
  • ...allowing public information to be privatized/customized
  • ...as well as private information to become public

  5. 50 years later
  • ...but even today, comparative social science research in Europe is hampered by the fragmentation of the scientific information space.
  • Data, information and knowledge are scattered in space and divided by technical, linguistic and institutional barriers.
  • As a consequence, too much of the research is based on data from a single nation, carried out by a single-nation team of researchers and communicated to a single-nation audience.

  6. Scenarios from "The Social Science Dream Machine"
  • …a user is looking for data on political trust from three different regions of Europe
  • He uses the geographical interface of a resource location service to circle the relevant regions on the map and enters the additional search criteria.
  • From the returned hit list he is able to browse layers of increasingly detailed metadata describing potential sources.
  • He is even allowed to perform simple statistical analyses and visualizations on-line to make sure that the data fulfills his requirements. Several datasets can be brought to the desktop at the same time to ease the comparison.
  • As soon as a decision has been made, the chosen datasets can be downloaded and automatically converted to the format of his favourite statistical package. All relevant metadata travels along with the data to assist the researcher in his analysis.
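
A minimal Python sketch of the discovery workflow this scenario describes. The DataCatalogue class, its search() method and the dataset fields are hypothetical stand-ins invented for illustration; they do not depict any real Nesstar interface.

```python
# Hypothetical sketch of the "Dream Machine" discovery workflow.
# The DataCatalogue class, its methods and the dataset fields are
# invented for illustration; no real Nesstar API is implied.
from dataclasses import dataclass

@dataclass
class Dataset:
    title: str
    region: str     # geographical coverage
    topic: str      # e.g. "political trust"
    metadata: dict  # layered descriptive metadata

class DataCatalogue:
    """A toy resource-location service over a list of datasets."""

    def __init__(self, datasets):
        self.datasets = datasets

    def search(self, regions, topic):
        # Combine the geographical selection with the topical criteria.
        return [d for d in self.datasets
                if d.region in regions and d.topic == topic]

catalogue = DataCatalogue([
    Dataset("Trust Survey NO", "Scandinavia", "political trust", {"sample": 2400}),
    Dataset("Eurobarometer X", "Southern Europe", "political trust", {"sample": 1000}),
    Dataset("Labour Force UK", "British Isles", "employment", {"sample": 60000}),
])

# "Circle" three regions on the map, then browse the hits.
hits = catalogue.search({"Scandinavia", "Southern Europe", "British Isles"},
                        "political trust")
for d in hits:
    print(d.title, d.metadata)  # drill into increasingly detailed metadata
```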

  7. Another scenario
  • …a user analysing a group of variables in dataset X would like to know if there are similar datasets from other countries that could be used for a comparative study.
  • …by hitting the "get comparable datasets" button, a list of potential datasets is immediately returned.
  • …she would also like to have an overview of knowledge products (papers, articles etc.) based on this study.
  • …as references and links to derived knowledge products are an integrated part of the metadata of each single dataset, a sorted list can be displayed by a single keystroke. Some of these references also include e-mail and website addresses of the relevant researchers.
  • …finding a problem with one of the variables, she writes a note and appends it to the "user experience" section of the metadata to alert future users (she also submits a rating of the resource according to an established quality standard within her community).
  • ...and when the research paper is ready and published in an on-line journal, links to the dataset are added to allow future users to revisit her analysis.

  8. Yet another scenario... (I promise – this is the last one)
  • …a user reading an article in an on-line journal finds a link that connects him to the data that the author used to underpin the argument. The link allows the user to rerun the analysis, and also to dig deeper into the same data source.
  • ...through his "digital research assistant" (an active agent), he is also made aware of several other relevant data sources published after the article was written, and he uses these to challenge the author's conclusion.
  • ...he also leaves the query with his "digital research assistant" to make sure that he is alerted if a new dataset meeting his requirements is published somewhere around the world at a later stage (the agent is instructed to include only datasets from "trusted sources" and, moreover, to exclude any dataset below a certain sample size).

  9. The Web revolution – a global Memex
  • The very first really global information system
  • The very first "many to many" communication technology
  • From many local to a single global hypertext space
  • True multi-media: the Web has taken all existing media as its content
  • Decentralized
  • Scalable

  10. Local versus global information systems
  • Local: limited amount of information (bounded) | Global: unlimited amount of information (unbounded)
  • Local: the complete information space is known, categorized and indexed | Global: only parts of the information space are known, categorized and indexed
  • Local: a single unified system of concepts securing unambiguous categorization of resources as well as semantic interoperability across resources ("complete understanding") | Global: several partially competing systems of concepts producing alternative categorizations of resources as well as low levels of semantic interoperability across resources ("partial understanding")
  • Local: deep standardization | Global: shallow standardization
  • Local: centralized, a single publisher and registration authority | Global: decentralized, millions of publishers and no central authority
  • Local: link integrity | Global: broken links (the famous 404)
  • Local: limited scalability | Global: unlimited scalability
  The original Web:
  • a naming convention (the URL)
  • a simplistic hypertext format (HTML)
  • a simple transport protocol (HTTP)
  "The design of the Web fundamentally differed from traditional hypertext systems in sacrificing link integrity for scalability." Tim Berners-Lee: "Web Architecture: Describing and Exchanging Data", http://www.w3.org/1999/04/WebData
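
The three primitives of the original Web are easy to demonstrate with nothing but the Python standard library: the URL names a resource, HTTP transports it, and the payload that comes back is HTML.

```python
# The original Web in three primitives: URL (naming), HTTP (transport),
# HTML (format). Standard library only; requires network access.
from urllib.parse import urlparse
from urllib.request import urlopen

url = "http://www.w3.org/1999/04/WebData"  # the naming convention (URL)
print(urlparse(url).netloc)                # -> www.w3.org

with urlopen(url) as response:             # the transport protocol (HTTP)
    payload = response.read(200)           # the hypertext format (HTML)
print(payload[:80])
```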

  11. Towards the Semantic Web
  [Diagram: from the current Web to the Semantic Web]
  "The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." Tim Berners-Lee
  • From documents to data
  • From brainware to software
  • From machine-readable to machine-"understandable" information
  • Metadata – the glue of the Semantic Web
  • A framework for "knowledge representation" – RDF
  • The introduction of namespaces (allowing different systems of terms and concepts to cohabit in a single information system)
  • Partial understanding/agreement
  • The vision: the creation of a dynamic framework facilitating cooperation/interoperability across domains and communities, gradually expanding the "web of understanding".
  "Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community. A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication. The world works across the spectrum between these extremes, with a tendency to start small—from the personal idea—and move toward a wider understanding over time." Tim Berners-Lee, James Hendler and Ora Lassila: "The Semantic Web", Scientific American, May 2001
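
To make the RDF and namespace points concrete, here is a small sketch using the open-source rdflib library. The dataset URI and the "ex" vocabulary are invented for illustration; Dublin Core is a real, widely shared vocabulary, which is exactly what lets a consumer that knows nothing about the invented terms still achieve the "partial understanding" described above.

```python
# RDF with namespaces via rdflib (pip install rdflib).
# The example.org URIs and the "ex" vocabulary are invented.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DC = Namespace("http://purl.org/dc/elements/1.1/")   # Dublin Core (real vocabulary)
EX = Namespace("http://example.org/socialscience#")  # invented local vocabulary

g = Graph()
g.bind("dc", DC)
g.bind("ex", EX)

dataset = URIRef("http://example.org/datasets/trust-survey-2001")
g.add((dataset, RDF.type, EX.Dataset))
g.add((dataset, DC.title, Literal("European Political Trust Survey 2001")))
g.add((dataset, EX.sampleSize, Literal(2400)))

# Two vocabularies cohabit in one graph; a consumer that only knows
# Dublin Core can still extract the title ("partial understanding").
print(g.serialize(format="turtle"))
```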

  12. The Semantic Web at work
  [Diagram: raw data (bit streams) and knowledge products, mediated by software and brainware]

  13. Hype or reality.....?
  • Not many Semantic Web applications out there so far
  • RDF has not really taken off
  • ...and the other layers of the Semantic Web are still at the planning stage
  • ...."it will never work"
  • ...."it will be overshadowed and overtaken by the other buzzword of today's Web community – 'Web services'"
  • ...on the other hand:
  • ...Berners-Lee and his crew have been right before
  • ...the amount of RDF-coded information is increasing
  • ...the number of tools and platforms is increasing
  • ...the "big guys" are gradually entering the scene, most notably Adobe's new eXtensible Metadata Platform (XMP), which uses RDF to model and represent metadata within documents and other information objects

  14. Metadata – data about.....
  [Image: labeled versus unlabeled jars of beans. The bean example is taken from: A Manager's Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf]

  15. The functions of metadata: Finding, Understanding, Assessing, Sharing

  16. Users of metadata
  • Metadata for human consumption: ....any human-readable information that allows a user to find, assess and understand information objects in order to use them in a sound way
  • Metadata for machine consumption: ....any machine-processable information that allows a piece of software to find, assess and "understand" information objects in order to manipulate them in a sound way on behalf of a human user
  • Metadata meant for humans versus machines are often just different representations of the same type of information.
  An example from the world of statistics:
  • Human-readable information about data quality: information about sampling methods and procedures allowing a human reader to assess whether a statistical resource has been created according to a certain standard
  • Machine-understandable information about data quality: sample size; response rate; sampling methodology (according to a specific classification of methodologies – a controlled vocabulary identified by a namespace where the meaning of the different terms is well defined); the number of publications produced on the basis of the resource; ratings (given by persons and organizations according to a well-defined rating system, allowing a user to instruct a software agent to skip resources where the quality score given by a defined set of trusted organizations is below a certain limit – see the sketch below)
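
A minimal sketch of the software agent hinted at in the last bullet: filtering resources on machine-understandable quality metadata. The record fields, thresholds and the list of trusted organisations are assumptions made up for illustration.

```python
# A toy agent that skips resources whose quality score, as given by
# trusted organisations, falls below a limit. All fields are invented.
TRUSTED_RATERS = {"NSD", "UK Data Archive"}  # hypothetical trusted organisations

records = [
    {"title": "Survey A", "sample_size": 2400, "response_rate": 0.71,
     "ratings": {"NSD": 4, "AnonymousBlog": 1}},
    {"title": "Survey B", "sample_size": 300, "response_rate": 0.55,
     "ratings": {"UK Data Archive": 5}},
]

def acceptable(record, min_sample=1000, min_score=3):
    """Keep a resource only if it is large enough and every trusted rating clears the bar."""
    trusted = [score for org, score in record["ratings"].items()
               if org in TRUSTED_RATERS]
    return (record["sample_size"] >= min_sample
            and trusted                      # at least one trusted rating exists
            and min(trusted) >= min_score)

print([r["title"] for r in records if acceptable(r)])  # -> ['Survey A']
```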

  17. Metadata on the Web – ...all hyperlinked
  [Diagram: a dataset at the centre, linked to catalogue info (finding), the data dictionary (understanding), data quality information (assessing), articles/reports (sharing 1) and notes/discussions (sharing 2)]

  18. An extended metadata concept
  • Not only descriptions of data but also knowledge products derived from the use of data
  • A dynamic concept – metadata are constantly developing through the lifetime of a dataset
  • A variety of authors – many of whom are not data producers
  • ...as opposed to what we might call the "Cathedral View" on metadata:
  • ....metadata seen as a coherent and centralised collection of information with clearly defined boundaries, provided by a single authority for a defined community of users.

  19. The Data Documentation Initiative (DDI)
  • Established in 1995 to create a universally supported metadata standard for the social science community
  • Initiated and organised by the Inter-University Consortium for Political and Social Research (ICPSR), Michigan, USA
  • Members coming from social science data archives and libraries in the USA, Canada and Europe and from major producers of statistical data
  • First version of the standard expressed as an SGML DTD
  • Translated to XML in 1997
  • Extensive testing carried out spring–summer 1999
  • DDI 1.0 published spring 2000
  • DDI 1.1 with minor revisions and some additions published autumn 2001
  • The DDI 2.0 process???
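
For a flavour of what the XML expression of the standard looks like, here is a schematic fragment loosely modelled on DDI Codebook element names (codeBook, stdyDscr, titl, var, labl), parsed with the Python standard library. It is deliberately simplified and is not a complete or valid DDI instance.

```python
# A schematic, simplified DDI-style fragment; not a valid DDI instance.
import xml.etree.ElementTree as ET

ddi = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>European Political Trust Survey 2001</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="TRUSTGOV"><labl>Trust in national government</labl></var>
    <var name="AGE"><labl>Age of respondent</labl></var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(ddi)
print(root.findtext("stdyDscr/citation/titlStmt/titl"))  # study title
for var in root.iter("var"):                             # variable-level metadata
    print(var.get("name"), "-", var.findtext("labl"))
```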

  20. Achievements • Acceptance • fast take-up in the community of data archives and data libraries world-wide • Community building • revitalised the co-operation and sharing of know-how and technologies among the archives and libraries • Strengthening of the ties to the data producers • Software development

  21. The DDI in action – what do we know?
  • The costs of migrating a data archive to the DDI are high (much higher than the cost of any DDI-compliant archiving software)
  • The DDI is a "cathedral standard" (no data provider is buying the full package – they are all using the DDI building blocks to build their own more modest "parish churches")
  • The DDI is a very loose structure loaded with alternatives and ambiguities (a single study described by two different archives will probably look quite different)
  • The DDI only tells half the story (data providers have to add their own local guidelines on top of the DDI – controlled vocabularies, mandatory elements etc. – to secure internal standardization; see the sketch below)
  • The DDI is inflexible (there is no extension mechanism that allows a data provider to add local elements without breaking the standard)
  • A pure "bottom-up" approach: the DDI is used to describe concrete files or products coming out of the statistical process. It has no level of abstraction above or beyond a physical statistical product
  • Machine-understandable versus human-understandable: using XML does not automatically create metadata that is complete and logical enough to drive software processes
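
A sketch of what such local guidelines could look like in code: an archive-specific checker that enforces mandatory elements and a controlled sampling vocabulary on top of the base standard. The element paths and the vocabulary are illustrative assumptions, not prescriptions from the DDI itself.

```python
# A hypothetical "parish church" checker layered on top of DDI-style XML.
# The mandatory paths and the sampling vocabulary are local, invented choices.
import xml.etree.ElementTree as ET

MANDATORY = ["stdyDscr/citation/titlStmt/titl",
             "stdyDscr/method/dataColl/sampProc"]
SAMPLING_VOCAB = {"simple random", "stratified", "cluster", "quota"}

def check(codebook_xml):
    root = ET.fromstring(codebook_xml)
    problems = []
    for path in MANDATORY:                    # local mandatory-element rule
        if root.findtext(path) is None:
            problems.append(f"missing mandatory element: {path}")
    samp = root.findtext("stdyDscr/method/dataColl/sampProc")
    if samp is not None and samp.strip().lower() not in SAMPLING_VOCAB:
        problems.append(f"sampling term not in controlled vocabulary: {samp!r}")
    return problems

print(check("<codeBook><stdyDscr><citation><titlStmt>"
            "<titl>Survey A</titl></titlStmt></citation></stdyDscr></codeBook>"))
# -> ['missing mandatory element: stdyDscr/method/dataColl/sampProc']
```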

  22. Nesstar – vision
  To develop a truly distributed platform for electronic publishing of statistical data, building on object technology, open (metadata) standards and lightweight Internet protocols.
  ....or simply:
  To bring the models, technologies and collective energy of the Web to the world of statistics.

  23. Nesstar – an overview
  • An architecture for a totally distributed global data library
  • The ability to locate multiple data sources across national boundaries
  • The ability to browse detailed information about these data sources
  • ..and to do data analysis and visualisation over the net
  • ..or to download the appropriate subset of data in one of a number of formats
  • Supporting standard micro-data as well as aggregated tables/cubes
  • Allowing the user to bookmark/hyperlink resources in the data and metadata repositories (searches, datasets, analyses – tables, models etc.)
  • ..and to hyperlink these resources from external Web objects (like texts)
  • Support for metadata indexing using a multilingual thesaurus
  • Powerful data preparation tools, including a system for remote publishing of data to NESSTAR servers
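
The federated idea behind this architecture can be sketched as one client fanning a query out to several independent servers. The server URLs, the /search endpoint and the JSON response shape below are all hypothetical; the point is the pattern, not a concrete Nesstar API.

```python
# Hypothetical federated search across independent data servers.
# Endpoints and response format are invented for illustration.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERVERS = [
    "http://nesstar.archive-one.example",  # hypothetical server
    "http://nesstar.archive-two.example",  # hypothetical server
]

def search_all(keyword):
    hits = []
    for server in SERVERS:
        url = f"{server}/search?{urlencode({'q': keyword})}"
        try:
            with urlopen(url, timeout=5) as resp:
                hits.extend(json.load(resp))  # assume a JSON list of hits
        except OSError:
            pass  # an unreachable server must not break the federation
    return hits

for hit in search_all("political trust"):
    print(hit)
```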

  24. Nesstar – background
  • Originally funded by the EC under the Electronic Publishing Programme: NESSTAR (1998–1999), FASTER (2000–2001)
  • Nesstar Limited established in June 2001
  • Owned jointly by the Norwegian Social Science Data Services (NSD) and the UK Data Archive
  • Offices in Colchester (UK) and Bergen (Norway)

  25. Nesstar – a fully distributed Data Web
  [Diagram: data publishers/owners, Nesstar servers and end-user clients]

  26. MADIERA (Multilingual Access to Data Infrastructures of the European Research Area)
  • An EC-funded project to commence October 2002
  • Purpose: to develop and build a Europe-wide virtual social science data library based on Nesstar and Semantic Web technology
  • Building blocks:
  • A shared metadata standard (the DDI)
  • A shared technological platform (Nesstar)
  • A multilingual thesaurus (ELSST)
  • A classification system for social science data resources
  • A georeferencing system for social science data resources
  • A methodology for identification of comparable data
  • A feedback system for user-supplied information
  • A system for creating links between data and knowledge products

  27. Where are we heading....? From data graveyards to knowledge greenhouses?
