Methodology for transliteracy research (the case of Concours Castor): Sharing research data inside the Translit project and beyond
Christophe Reffay, UMR STEF, ENS Cachan - IFÉ, Christophe.Reffay@ens-cachan.fr
27 June 2013 – AIERI – Dublin, Ireland
Introduction: Towards a world with (open) data
• Check (transparency)
  • Open Notebook Science (J.-C. Bradley, 2006)
• Make data tangible [replication possible]
  • Dataverse (G. King, 2007)
  • Benchmarks: compare algorithms on datasets in computer science (e.g. MLcomp, IPOL, …)
• Many others…
Rationales for data sharing in the humanities (S. Duchesne, 2013)
• Patrimonial: historical documents
• Economic: data are hard to build
• Scientific: make the proof
[Diagram labels: Publication, Data]
Data sharing initiatives in the humanities
• France
  • Calico (Bruillard)
  • Mulce (Reffay & Chanier, 2007)
  • Huma-Num (TGE Adonis): beQuali (Duchesne), DataPublication (Chanier), …
• United Kingdom
  • ESDS Qualidata in UK Data Service
• Datacite: a list of 650 repositories
Making data digitally available makes them technically manageable.
What makes datasets…
• Sharable?
  • Visibility: standard metadata (OAI)
  • Access: public or not, long term, curation
  • Ethics: consent / anonymisation
• Reusable? (readable and computable)
  • Documentation (data, process, context)
  • Structure (transparent, manageable)
  • Format (interoperable)
• Interesting? / Re-analysable?
“Translit” project: Trans-literacy on Media – Information – Computer
• Researchers from 3 different cultures
  • Looking at their common concepts
  • Sharing some vocabulary
  • Analysing common or transferable skills
• Having shared or separate experiments
  • Observation of information search tasks
  • Computer Science: Beaver contest
Broadcast? Or tuned sharing path?
• Whole documented package (all at once)
  • Huge effort for the provider
  • Is it adapted to potential re-users?
• A path to be built by both parties
  • General description of the context and global description of data (as a “Data paper”)
  • Declaration of interest – Start reuse
  • Tuning data towards new research questions
Beaver contest: Introduction (fr: Concours Castor Informatique)
• Goal: discover some principles
  • Specific to / useful in informatics
• Directly available from any connected classroom (duration 45 min.)
• With fun, interactive tasks
• Without any prerequisite
• Teams: 1 or 2 pupils
  • 1 registration per group (by the teacher)
• For 2012: 90 794 pupils, 721 schools
A task example: The Text Machine
[Figure: screenshot of the task, showing "Paste" buttons and the question]
Some statistics publicly available
Source: http://castor-informatique.fr/resultats.php (June 2013)
Data and documentation available for research
• The data collected during the contest
  • More than 90 000 participants (single/pair)
  • 721 schools
  • All results for each task/team in a database
  • All interactions registered in a database
• Some observations in classrooms
• Contest rules (web site documentation)
• All the tasks (web site: try it)
• Questionnaires (2011; 2012)
• Interviews (coming soon…)
The current format: SQL databases

Tables and their attributes:
• Contest: #Contest_id, Name, Level, Year, Status, NbMin, Folder
• Question: #Question_id, Key, Folder, Name, AnswerType, ExpectedAnswer
• C-Q (Contest-Question): MinScore, MaxScore, Order
• T-Q (Team-Question): Answer, Score, Date
• School: #School_id, Name, Region, NbStudents, Validated
• Group: #Group_id, #School_Id, Grade, GradeDetails, #Access, Name, NbStudents, Nb_Team, #Contest_Id, StartTime, IsUnofficial
• Team: #Team_id, #group_id, StartTime, EndTime, Score, IsUnofficial
• Contestant: #Cont_id, Lastname, Firstname, Gender, #Team_id, GlobalRank, RankInSchool

Example: list of scores and times for all official teams in the 2012 contest for grade 6 in the “Bordeaux” region.

SQL request:
    SELECT DISTINCT T.ID, T.`startTime`, Q.key, TQ.`score`, TQ.`date`
    FROM `group` G
    JOIN `school` S ON (G.schoolID = S.ID)
    JOIN `team` T ON (T.groupID = G.ID)
    JOIN `team_question` TQ ON (T.ID = TQ.teamID)
    JOIN `question` Q ON (TQ.questionID = Q.ID)
    WHERE T.isUnofficial = 2
      AND S.`region` = "bordeaux"
      AND G.`grade` = 6
    ORDER BY T.ID
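The request above can be tried on a toy database. The following is only a sketch: it uses Python's built-in sqlite3 module rather than the contest's actual database server, the table and column names are those appearing in the request, and every row (team IDs, times, the `text_machine` question key) is invented for illustration. It also assumes, as the slide states, that `isUnofficial = 2` selects official teams.

```python
import sqlite3

def build_toy_db():
    """In-memory database with the five tables used by the request.
    Column names follow the request; all rows are invented."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE school (ID INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE "group" (ID INTEGER PRIMARY KEY, schoolID INTEGER, grade INTEGER);
        CREATE TABLE team (ID INTEGER PRIMARY KEY, groupID INTEGER,
                           startTime TEXT, isUnofficial INTEGER);
        CREATE TABLE question (ID INTEGER PRIMARY KEY, "key" TEXT);
        CREATE TABLE team_question (teamID INTEGER, questionID INTEGER,
                                    score INTEGER, date TEXT);
        INSERT INTO school VALUES (1, 'bordeaux'), (2, 'paris');
        INSERT INTO "group" VALUES (10, 1, 6), (20, 2, 6);
        INSERT INTO team VALUES (100, 10, '2012-11-12 10:00:00', 2),
                                (101, 10, '2012-11-12 10:00:00', 1),
                                (200, 20, '2012-11-12 11:00:00', 2);
        INSERT INTO question VALUES (1, 'text_machine');
        INSERT INTO team_question VALUES (100, 1, 9, '2012-11-12 10:05:00'),
                                         (101, 1, 3, '2012-11-12 10:06:00'),
                                         (200, 1, 6, '2012-11-12 11:04:00');
    """)
    return con

# Same joins and filters as the slide's request; region and grade are
# passed as parameters instead of being written into the SQL text.
REQUEST_1 = """
    SELECT DISTINCT T.ID, T.startTime, Q."key", TQ.score, TQ.date
    FROM "group" G
    JOIN school S ON G.schoolID = S.ID
    JOIN team T ON T.groupID = G.ID
    JOIN team_question TQ ON T.ID = TQ.teamID
    JOIN question Q ON TQ.questionID = Q.ID
    WHERE T.isUnofficial = 2 AND S.region = ? AND G.grade = ?
    ORDER BY T.ID
"""

def scores_for(con, region, grade):
    """Return (team, start time, question key, score, answer date) rows."""
    return con.execute(REQUEST_1, (region, grade)).fetchall()

rows = scores_for(build_toy_db(), "bordeaux", 6)
for row in rows:
    print(row)
```

With the invented rows, only team 100 is returned: team 101 is filtered out by the `isUnofficial` test and team 200 by the region.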
Result for Request 1 (3096 lines)
Example No. 2: number of teams per grade and region

SQL request:
    SELECT G.grade, S.region, COUNT(*)
    FROM `group` G
    JOIN `team` T ON (T.groupID = G.ID)
    JOIN `school` S ON (G.schoolID = S.ID)
    WHERE G.contestID = 6
      AND T.isUnofficial = 2
    GROUP BY G.grade, S.region
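A minimal sketch of this second request, again on an invented in-memory SQLite database rather than the real contest data. Table and column names come from the request; the rows are made up, and an `ORDER BY` clause (not in the original request) is added only so that the output order is deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE school (ID INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE "group" (ID INTEGER PRIMARY KEY, schoolID INTEGER,
                          contestID INTEGER, grade INTEGER);
    CREATE TABLE team (ID INTEGER PRIMARY KEY, groupID INTEGER,
                       isUnofficial INTEGER);
    -- Invented rows: two schools, four groups, six teams.
    INSERT INTO school VALUES (1, 'bordeaux'), (2, 'paris');
    INSERT INTO "group" VALUES (10, 1, 6, 6), (11, 1, 6, 7),
                               (20, 2, 6, 6), (21, 2, 5, 6);
    INSERT INTO team VALUES (100, 10, 2), (101, 10, 2), (102, 10, 1),
                            (110, 11, 2), (200, 20, 2), (210, 21, 2);
""")

# Count official teams (isUnofficial = 2) per grade and region for one contest.
counts = con.execute("""
    SELECT G.grade, S.region, COUNT(*)
    FROM "group" G
    JOIN team T ON T.groupID = G.ID
    JOIN school S ON G.schoolID = S.ID
    WHERE G.contestID = ? AND T.isUnofficial = 2
    GROUP BY G.grade, S.region
    ORDER BY G.grade, S.region
""", (6,)).fetchall()

for grade, region, nb_teams in counts:
    print(grade, region, nb_teams)
```

Team 102 is excluded by the `isUnofficial` filter and team 210 because its group belongs to another contest.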
Example No. 2: number of teams per grade and region
Resulting table:
Location of the 721 schools (2012)
First steps of the sharing path?
For each participant/team define:
• Level (grade 6 to 13), Group, Region
• Gender (M / F / MM / FF / MF)
• For each task: Right / Wrong / No answer => Statistics gender/level/tasks
• For each task: Content of answers => Didactics
• For each task: Time/sequence => Behaviour
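The per-team variables listed above could be derived along the following lines. This is a sketch under stated assumptions, not the project's actual processing: the helper names `team_gender` and `task_outcome` are hypothetical, an answer is assumed to be right when it equals the expected answer exactly, and the sample teams are invented.

```python
from collections import Counter

def team_gender(genders):
    """Composition code of a team of 1 or 2 pupils: M, F, MM, FF or MF."""
    if not 1 <= len(genders) <= 2 or any(g not in ("M", "F") for g in genders):
        raise ValueError("expected 1 or 2 pupils, each coded 'M' or 'F'")
    # Reverse sort puts 'M' before 'F', so mixed pairs are always 'MF'.
    return "".join(sorted(genders, reverse=True))

def task_outcome(answer, expected):
    """Classify one answer as 'Right', 'Wrong' or 'No answer'.
    Assumption: exact equality with the expected answer means 'Right'."""
    if answer is None or answer == "":
        return "No answer"
    return "Right" if answer == expected else "Wrong"

# Invented sample: four teams answering one task whose expected answer is "42".
teams = [
    {"genders": ["M", "F"], "answer": "42"},
    {"genders": ["F", "F"], "answer": ""},
    {"genders": ["M"], "answer": "41"},
    {"genders": ["F", "M"], "answer": "42"},
]

# Cross-tabulate gender composition against outcome for this task.
stats = Counter(
    (team_gender(t["genders"]), task_outcome(t["answer"], "42")) for t in teams
)
for (gender, outcome), n in sorted(stats.items()):
    print(gender, outcome, n)
```

Normalising mixed pairs to a single code keeps "MF" and "FM" teams in the same category, which matches the five categories on the slide.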
Gender/Tasks Statistics (grade=6)
An interesting comparison with Social Network Analysis: Jacob L. Moreno (1943) measured affinity networks and showed that pupils aged 11-13 prefer same-gender peers.
Documenting the research process on the fly => “Roadmap” (I. Quentin)
• Intermediary hypotheses & objectives
• The rationale for giving up
• The needs for new data
• The origin of the data (who/when/where)
• All needed access information
• How data were collected
• Analysis methods and their accuracy
Suggestions - Proposal
• Clean up your data ASAP
  • Reference version in open formats
• Document your investigation process
  • For yourself, your team/project
  • For forthcoming partners
=> Make your data visible & accessible
• Make your data interesting for someone else: communicate/publish a "Data Paper"
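As one possible reading of "reference version in open formats", a database table can be exported to plain CSV. The sketch below is an assumption about how this might be done, not a procedure described in the talk: the function name `export_table_csv` is hypothetical, and the `school` rows are invented.

```python
import csv
import io
import sqlite3

def export_table_csv(con, table, out):
    """Write one table to `out` as CSV, with a header row of column names."""
    known = {r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        # Table names cannot be bound as parameters, so check them explicitly.
        raise ValueError(f"unknown table: {table}")
    cur = con.execute(f'SELECT * FROM "{table}" ORDER BY 1')
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)

# Invented example table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE school (ID INTEGER PRIMARY KEY, region TEXT);
    INSERT INTO school VALUES (1, 'bordeaux'), (2, 'paris');
""")

buffer = io.StringIO()
export_table_csv(con, "school", buffer)
print(buffer.getvalue())
```

In practice `out` would be a file opened with `newline=""`; an in-memory buffer is used here only to keep the example self-contained.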
Circles for access to data
[Diagram labels, from the outer circle inwards:]
• Any Internet user
• Any researcher (under contract? authenticated?)
• Data Paper?
• Translit project participants
  • Information search task observation
  • Concours Castor Informatique
Documenting a dataset
• Define guidelines for each kind of data
  • Questionnaires, interviews, observations
  • Databases (traces, results, …)
• Define a manageable way to build this
  • Only useful information (upon request)
  • Written by requesters?
  • Capitalize on requests and documentation
Thank you. Merci "Go raibh maith agat" Christophe Reffay UMR STEF, ENS Cachan - IFÉ Christophe.Reffay@ens-cachan.fr