The International Data Service Center of IZA. Nikos Askitas, May 2009. IZA Program Directors' Meeting, Bonn, Germany.
Finnish Snippets
• Määränpäähän = to the destination
• Lentokorkeus = altitude
• Maanopeus = ground speed
• Huom = Caution
• Lapsille = for children
• Yliopistonkatu = Yli + opiston + katu = High + School + Street = University Street
…and the question:
Some Stats
• Finns just surpassed the Danes as the heaviest drinkers in Europe.
• 50% of all alcohol is drunk by 10% of Finnish drinkers, while 90% is drunk by 50% of them.
• Most of the alcohol is drunk Friday to Sunday!
• 80% of households are online!
• They have a low Gini and recycle heavily.
• They are hard to find: 15 Finns/km²!
History I
The story begins (I am told) back in the 90s:
• As a result of a large-scale research project on the anonymization of micro data (Hauser/Müller), the concept of "factual anonymization" becomes accepted.
• Memorandum by Hauser, Wagner, Zimmermann: Erfolgsbedingungen empirischer Wirtschaftsforschung und empirisch gestützter wirtschafts- und sozialpolitischer Beratung (Conditions for the success of empirical economic research and of empirically based economic and social policy advice), IZA DP No. 14.
This seeds an intensive debate on improving the data infrastructure in Germany. The "Commission to improve the informational infrastructure by co-operation of the scientific community and official statistics" (KVI) is put together by the Federal Ministry of Education and Research.
History II
The KVI made several recommendations on how to improve the informational data infrastructure in Germany. Several research data centers and data service centers have been organized based on these recommendations. Of these, the only one representing labor economics was IdZA (IDSC's predecessor).
Research data centers
• The RDC of the Federal Statistical Office
• The RDC of the Statutory Pension Insurance
• The RDC of the Federal Employment Agency
Data service centers
• German Microdata Lab (GML) at GESIS
• IDSC of IZA
After a successful evaluation of the first three-year pilot phase, which started in 2003, a second phase of financing was awarded to IZA (2008-2010). IDSC in its new form represents IZA's reinforced commitment to the field.
About IDSC in a nutshell
• IDSC is International. No national borders or other artificial frontiers. We draw on knowledge and experience internationally and cater to the international research community, in line with the Institute's vision of a virtual institute.
• IDSC is about Data. A large component of our work is about inventing, developing, integrating and applying solutions for computing with data, in particular with "difficult", i.e. highly sensitive or confidential, data.
• IDSC is about Service. IDSC services the data needs of the IZA resident research community, the various global and virtual IZA research communities (Fellow and Affiliate networks etc.) and the research community at large, in that order. The order has a dual meaning. On the one hand, it expresses priority: IDSC serves the local community first and the remote ones afterwards. On the other hand, it expresses deployment order: local deployment is a preparation step for large-scale deployment.
• IDSC is a Center. Besides being an organizational unit of IZA, we are an entity in our own right, with relations to the other IZA units and the world. As a center, we also aim at becoming the focal point of an international, data-related, service-oriented network of economic-minded technologists and technologically savvy economists. We also aim at becoming the place for scientists to look for data support, data access support and data services, with an emphasis on labor economics.
• IDSC is about Cooperation. We seek to establish long-lasting, mutually beneficial cooperation and partnerships in order to leverage complementary competence and create data products which would otherwise be impossible (cross-disciplinary products).
Boards and Committees: Composition
The IDSC calls upon the Institute's resources and the collective expertise of its interdisciplinary Data Committee, as well as the guidance of an international Scientific Advisory Board.
Scientific Advisory Board
• Prof. Dr. Rainer Winkelmann, Univ. of Zurich, Head of the Advisory Board
• Prof. John M. Abowd, Ph.D., Cornell University
• Prof. Dr. Ulrike Rockmann, Statistisches Landesamt Berlin-Brandenburg
• Prof. Dr. Joachim Wagner, Univ. of Lüneburg
• Prof. Christian Zimmermann, Ph.D., University of Connecticut
IZA Data Committee
• Dr. Douglas J. Krupka, Senior Research Associate
• Dr. Nikos Askitas, Head of IDSC and Deputy Head of IT
• Dr. Marco Caliendo, Director of Research
• PD Dr. Hilmar Schneider, Director of Labor Policy
• Georgios Tassoukis, Database Manager
Boards and Committees: Role
• The Scientific Advisory Board: The board is new and will have its first meeting on May 29. Meetings will most likely be annual. The role of the board is advisory in nature and aims at keeping the Center close to the needs of the scientific community by adding the "external perspective".
• The Data Committee: The IDSC leverages the expertise of this interdisciplinary committee in order to develop a comprehensive approach to working with data, one which covers technical, technological, legal, ethical and educational aspects.
The core "business areas" of the IDSC
• Events: We run and jointly organize a variety of high-profile events in Bonn and other locations in Germany and abroad, with topics ranging from new developments in metadata standards to aspects of open access to primary research data.
• Data Documentation (idsc.iza.org/metadata)
• Remote Processing (idsc.iza.org/josua)
• Data Enclave
New business areas: data visualization, web-based teaching tools, news in labor economics, web-based collaboration tools, internet measurements, geocoding, documentation of qualitative datasets etc.
Events
• A new highlight in the series of IDSC events is the Annual European DDI User Group Meeting (EDDI). EDDI was conceived and proposed by this speaker and is endorsed by the DDI Alliance. We now run the event jointly with GESIS. EDDI 09 is planned for December 4, 2009 at the IDSC of IZA in Bonn. On December 3 the DDI Alliance will sponsor a one-day course on DDI at the IDSC. Kevin Schurer, Director of the UKDA, accepted my invitation to open the event.
• IDSC hosted the Open Data Foundation Europe 09 Meeting in April 2009.
• The German Stata User Group Meeting 2009 is coming up on June 26, with in-depth presentations from experienced Stata users and experts.
• In September David M. Drukker of StataCorp will teach a one-day course on Stata/Mata at the IDSC of IZA. Tentative topics: programming statistical methods in Stata/Mata, discrete choice analysis, statistical simulations.
The Red Cube Seminar
The Red Cube Seminar is organized by the IDSC of IZA. Its aim is to provide a forum for high-quality technology presentations related to the institute's research context. With international lecturers and an audience from Germany and neighboring European countries, it aims at becoming a focal point for data technologists and data analysts. Presentations of broad interest take place biweekly and are open to the public, while the internal working seminars are targeted at members of the IZA technology group and may have a narrower, more application-oriented focus.
• www.iza.org/redcube
• http://www.google.de/search?q=Red+Cube+Seminar&btnG=Suche&hl=de&sa=2
The (original) motivation
"Germany, as the third export economy in the world, should have an appropriate portion of publications in international Economics journals based on data about Germany. This is NOT the case, and to the extent that this is due to the lack of proper, translated, accessible documentation, we have work to do."
idsc.iza.org/metadata
• Ever-expanding number of datasets covered
• PDF book for every dataset
• DDI source code for every dataset
• Searchable web presentation for every dataset
• Search API, so remote sites can integrate a search
• Concept-hierarchy-assisted search
• Analytics (which dataset, which variable, how many hits, how much time)
• A Stata command, metadata (ssc install metadata), which uses the API to search the metadata
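The search API item above says that remote sites can integrate a search. A minimal sketch of what the client side of such an integration might look like follows. The endpoint URL, the parameter names (`q`, `limit`), the JSON response shape and the variable names in the canned response are all assumptions made for illustration; the slides do not specify any of them.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the real API address.
SEARCH_ENDPOINT = "https://example.org/metadata/search"


def build_search_url(query, limit=10, endpoint=SEARCH_ENDPOINT):
    """Return a GET URL for a metadata search (assumed parameter names)."""
    return endpoint + "?" + urlencode({"q": query, "limit": limit})


def parse_hits(response_text):
    """Extract (dataset, variable) pairs from an assumed JSON response body."""
    payload = json.loads(response_text)
    return [(hit["dataset"], hit["variable"]) for hit in payload.get("hits", [])]


# Canned, made-up response standing in for a real HTTP reply,
# so that the sketch runs without network access.
sample_response = json.dumps({
    "hits": [
        {"dataset": "IAB Employment Sample", "variable": "wage"},
        {"dataset": "British Labor Force Survey", "variable": "hourpay"},
    ]
})

url = build_search_url("hourly wage", limit=5)
hits = parse_hits(sample_response)
print(url)
for dataset, variable in hits:
    print(dataset, variable)
```

A remote site would fetch the URL and render the parsed hits in its own page layout; the parsing step is kept separate from the HTTP call so that either can be swapped out once the real interface is known.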
Google: “British Labor Force Survey”
Google: “IAB Employment Sample”
Google: “Sample Survey of Income and Expenditures Germany”
Data
Now, provided you have the metadata and the research idea, you would like to be able to compute against the corresponding data. This data is often sensitive and closed (often for good reason). How can the data be unlocked for research without violating the non-disclosure and other regulations of the data provider or data producer? JoSuA!
The IDSC data enclave
The enclave conforms to the strictest data security standards while striving for the highest possible degree of scientific freedom. To achieve this, scientists interface with the enclave in a properly stratified way:
• Locally through an ultra-thin network segment
• Remotely via JoSuA and other tools
Several research projects based on highly sensitive datasets are currently hosted within the IDSC Data Enclave, some of which are listed below (sample publications):
• “The Long-term Effects of Start-up Subsidies”
• “Hartz1b: Evaluation of Further Promotion of Education and Training-Programs”
• “Eval5hi: IZA Evaluation Dataset”
• “Schuleingangsuntersuchung – Einkommensentwicklung und die Gesundheit von Kindern” (School entry examination: income development and children's health), based on data from the Bayerisches Landesamt für Gesundheit und Lebensmittelsicherheit
• PISA 2003 2004
Cooperations and Partners
• IQB: The Research Data Center of the IQB (the Institute of Quality Development in Education) and IDSC completed a plausibility study in 2008 on using JoSuA to offer datasets from the IQB inventory for computing from afar. The two centers have just formalized this cooperation into a concrete one-year pilot project. Besides catering to the traditional research customers of IQB, the purpose of this cooperation is to investigate the possibility of sharing information with other research areas, including labor economics. Abstracts for the datasets involved can be found here.
• ROA (Research Centre for Education and the Labour Market) and IDSC have begun a cooperation centered around two datasets from ROA. IDSC will host and promote the datasets and work together with ROA on documenting them in a standard way.
• IAB: There are several layers of cooperation between IZA, IDSC and the Research Data Center of IAB (Institut für Arbeitsmarkt- und Berufsforschung, the Institute for Employment Research). These include dataset documentation and its translation into English, and the use of JoSuA at IAB (installed there a couple of weeks ago).
• DANS: The new DDI 3.0 standard allows reuse of already described concepts, questions, etc. by referencing them. This requires that all these items be uniquely identified. Existing technology for retrieving the identified items will not be able to deal with such a large number of identifiers. IZA and DANS (Data Archiving and Networked Services) helped specify a solution that uses DNS to deal with this scaling problem. Together they plan to build a proof of concept that demonstrates this solution.
• GESIS and the IDSC jointly organize EDDI, the Annual European DDI User Group Meeting. EDDI 09 takes place on December 4 at IDSC. The DDI Alliance is to sponsor a one-day DDI workshop taught by Wendy Thomas on December 3 at IDSC.
• FDZ RV: The cooperation with the RDC of the RV has so far involved documentation streamlining: translations from German to English and production of DDI metadata containers.
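The DANS item above describes using DNS to resolve DDI 3.0 identifiers at scale. The sketch below illustrates the general idea only: it maps the agency part of an identifier onto a DNS name, so that lookups can be delegated to whichever agency owns the identifier. The simplified URN layout (`urn:ddi:agency:id:version`), the `ddi.urn.arpa` suffix, the example agency `de.iza` and the function names are assumptions for illustration, not the solution that IZA and DANS specified.

```python
from typing import NamedTuple


class DdiUrn(NamedTuple):
    agency: str   # e.g. "de.iza" (hypothetical agency identifier)
    item_id: str  # identifier of the concept, question, variable, ...
    version: str


def parse_urn(urn: str) -> DdiUrn:
    """Split a simplified DDI URN of the assumed form urn:ddi:agency:id:version."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "ddi":
        raise ValueError(f"not a simplified DDI URN: {urn!r}")
    return DdiUrn(agency=parts[2], item_id=parts[3], version=parts[4])


def dns_name_for(urn: str, suffix: str = "ddi.urn.arpa") -> str:
    """Build the DNS name under which the owning agency's resolver could be found.

    The agency labels are reversed so that the most specific label comes
    first, matching the way DNS zones are delegated. The suffix is an
    assumption made for this sketch.
    """
    agency_labels = parse_urn(urn).agency.split(".")
    return ".".join(reversed(agency_labels)) + "." + suffix


example = "urn:ddi:de.iza:var_0001:1.0.0"
print(dns_name_for(example))
```

Under these assumptions, each agency only has to answer for its own DNS zone, so no single registry needs to hold every identifier; an actual lookup of the resulting name is left out so the sketch runs offline.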
Contact
• Where/who/how
• Helpdesk
N. Askitas, Head of the IDSC @ IZA
IZA, P.O. Box 7240, 53072 Bonn, Germany
Phone: +49 (0) 228 - 38 94 - 525
Fax: +49 (0) 228 - 38 94 - 180
E-mail: nikos@iza.org
http://www.iza.org