Importing and Exporting DataShop Data http://pslcdatashop.org Slides current to DataShop version 4.1.8 • Brett Leber • Interaction Designer
Is your data right for DataShop? It might be if it… • was produced by an intelligent tutoring system • follows a student-action/tutor-response sequence (untutored actions are OK) • is primarily textual • encodes some notion of “steps” What kind of data do you have?
Benefits of importing your data DataShop offers: • Web-based visualization and analysis tools for exploring your data • Secure storage and backup • A web location where anyone you choose can access your data • Web services for programmatic access
How do I get data in? • Directly/Real-time • Some tutors log directly to the PSLC logging database • CTAT-based tutors, when configured correctly, can log to disk or over the internet to the logging database • Indirectly • Other tutors log to their own file formats or their own databases • These data require a conversion process • Many studies are in this category
XML vs. tab-delimited format XML • Richer description than tab-delimited • More fields • Problem start time • Problem description • Problem tutor flag • More verbose • Requires some familiarity with XML • Not especially readable Tab-delimited • More concise • Can edit in Excel • More easily shareable • Less rich than XML • Missing problem start time, description, and tutor flag
Tutor Message Format
<context_message context_message_id="02CE3AE5-F6D5-9177-913F-C34730F1096C" name="START_PROBLEM">
  <meta>
    <user_id>student01</user_id>
    <session_id>08xz013</session_id>
    <time>2010/02/22 06:43:47.002</time>
    <time_zone>US/Eastern</time_zone>
  </meta>
  <dataset>
    <name>Learn a Language Fall 2007</name>
    <level type="unit">
      <name>Learning Logging</name>
      <problem><name>Translating Tech Talk</name></problem>
    </level>
  </dataset>
</context_message>
Tutor Message Format
<tool_message context_message_id="02CE3AE5-F6D5-9177-913F-C34730F1096C">
  <meta>
    <user_id>student01</user_id>
    <session_id>08xz013</session_id>
    <time>2010/02/22 06:45:48.014</time>
    <time_zone>US/Eastern</time_zone>
  </meta>
  <semantic_event transaction_id="B503948-9164-DD83-EBB2-1589FD38D435" name="ATTEMPT" />
  <event_descriptor>
    <selection>_level0.VideoPlayerInstance1.sliderButtonName</selection>
    <selection type="media_file">mymovie.flv</selection>
    <selection type="clip_length">00:08:00.0</selection>
    <action>cue</action>
    <input type="start_cue">00:04:34.8</input>
    <input type="stop_cue">00:05:42.2</input>
  </event_descriptor>
</tool_message>
Tutor Message Format
<tutor_message context_message_id="02CE3AE5-F6D5-9177-913F-C34730F1096C">
  <meta>
    <user_id>student01</user_id>
    <session_id>08xz013</session_id>
    <time>2010/02/22 06:43:56.367</time>
    <time_zone>US/Eastern</time_zone>
  </meta>
  <semantic_event transaction_id="B503948-9164-DD83-EBB2-1589FD38D435" name="RESULT" />
  <event_descriptor>
    <selection>_level0.VideoPlayerInstance1.sliderButtonName</selection>
    <selection type="media_file">mymovie.flv</selection>
    <selection type="clip_length">00:08:00.0</selection>
    <action>cue</action>
    <input type="start_cue">00:04:34.8</input>
    <input type="stop_cue">00:05:42.2</input>
  </event_descriptor>
  <action_evaluation>INCORRECT</action_evaluation>
  <tutor_advice>Your answer is not correct. Select only the portion of the video where the man is talking about his family.</tutor_advice>
  <skill>
    <name>family_words</name>
    <category>video_portion_selection</category>
  </skill>
</tutor_message>
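The three messages above are linked by IDs: all of them carry the same context_message_id, and the tool and tutor messages share one transaction_id, which pairs the student's attempt with the tutor's result. A minimal Python sketch of generating such messages from your own program, using the standard library's xml.etree.ElementTree, might look like the following. It covers only a simplified subset (one selection, action, and input per event); the function names are hypothetical, and the complete element list, plus the root element that wraps messages in a file, should be taken from the Guide to the Tutor Message Format.

```python
import uuid
import xml.etree.ElementTree as ET


def new_id():
    """Make an upper-case GUID string shaped like the IDs in the examples."""
    return str(uuid.uuid4()).upper()


def add_meta(parent, user_id, session_id, time, time_zone="US/Eastern"):
    """Append the <meta> block shared by all three message types."""
    meta = ET.SubElement(parent, "meta")
    for tag, text in (("user_id", user_id), ("session_id", session_id),
                      ("time", time), ("time_zone", time_zone)):
        ET.SubElement(meta, tag).text = text


def context_message(ctx_id, user_id, session_id, time,
                    dataset, unit, problem):
    """Build a START_PROBLEM context message for one unit and problem."""
    msg = ET.Element("context_message",
                     {"context_message_id": ctx_id, "name": "START_PROBLEM"})
    add_meta(msg, user_id, session_id, time)
    ds = ET.SubElement(msg, "dataset")
    ET.SubElement(ds, "name").text = dataset
    level = ET.SubElement(ds, "level", {"type": "unit"})
    ET.SubElement(level, "name").text = unit
    prob = ET.SubElement(level, "problem")
    ET.SubElement(prob, "name").text = problem
    return msg


def add_event(parent, selection, action, input_):
    """Append an <event_descriptor> with one selection, action, and input."""
    desc = ET.SubElement(parent, "event_descriptor")
    ET.SubElement(desc, "selection").text = selection
    ET.SubElement(desc, "action").text = action
    ET.SubElement(desc, "input").text = input_


def attempt_and_result(ctx_id, user_id, session_id, tool_time, tutor_time,
                       selection, action, input_, evaluation, advice, skill):
    """Build a tool_message (ATTEMPT) and tutor_message (RESULT) pair.

    Both reference the same context message and share a freshly
    generated transaction_id, as in the example messages.
    """
    tx_id = new_id()
    tool = ET.Element("tool_message", {"context_message_id": ctx_id})
    add_meta(tool, user_id, session_id, tool_time)
    ET.SubElement(tool, "semantic_event",
                  {"transaction_id": tx_id, "name": "ATTEMPT"})
    add_event(tool, selection, action, input_)

    tutor = ET.Element("tutor_message", {"context_message_id": ctx_id})
    add_meta(tutor, user_id, session_id, tutor_time)
    ET.SubElement(tutor, "semantic_event",
                  {"transaction_id": tx_id, "name": "RESULT"})
    add_event(tutor, selection, action, input_)
    ET.SubElement(tutor, "action_evaluation").text = evaluation
    ET.SubElement(tutor, "tutor_advice").text = advice
    skill_el = ET.SubElement(tutor, "skill")
    ET.SubElement(skill_el, "name").text = skill
    return tool, tutor
```

To serialize any returned element as a string, call `ET.tostring(element, encoding="unicode")`.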
Same thing in tab-delimited format
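A tab-delimited file is a header row plus one row per transaction. The Python sketch below writes such rows. The headers in COLUMNS are illustrative assumptions, not a confirmed specification; check the required names and order against the tab-delimited format documentation and the Verification Tool before use. The sketch rejects values containing tabs or line breaks rather than quoting them, so that every line splits cleanly on tabs.

```python
# Illustrative column headers only (ASSUMED); confirm the required names
# and their order in the import format documentation before relying on them.
COLUMNS = [
    "Anon Student Id", "Session Id", "Time", "Level (Unit)", "Problem Name",
    "Step Name", "Selection", "Action", "Input", "Outcome", "KC (Default)",
]


def to_tab_delimited(rows, columns=COLUMNS):
    """Serialize dict rows as a header line plus one tab-separated line each.

    Missing keys become empty cells. Values containing a tab or line break
    are rejected instead of quoted, so each line splits cleanly on tabs.
    """
    lines = ["\t".join(columns)]
    for row in rows:
        values = [str(row.get(col, "")) for col in columns]
        for value in values:
            if any(ch in value for ch in "\t\r\n"):
                raise ValueError(f"tab or line break in value: {value!r}")
        lines.append("\t".join(values))
    return "\n".join(lines) + "\n"
```

The Step Name value has to come from whatever step definition you adopt; neither the XML messages nor this sketch supply it automatically.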
Tools: XML vs. tab-delimited format XML • Java Logging Library • Log in XML to disk or to a logging server • http://pslcdatashop.org/about/libraries.html • Flash Logging Library • Log to a logging server • http://ctat.pact.cs.cmu.edu/index.php?id=logging-flash • Build a tutor with CTAT without programming • Can log to disk or to a logging server • http://ctat.pact.cs.cmu.edu • Convert to XML via your own program • Transform existing log data into valid Tutor Message Format • Validate your XML with a tool we’ve created • http://pslcdatashop.web.cmu.edu/xmlvalidator.html Tab-delimited • DataShop Import Tool • Verify your import file with our Verification Tool • http://pslcdatashop.web.cmu.edu/importverify.html
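Before submitting converted XML to the online validator, a quick local check can catch malformed markup such as mismatched tags. This Python sketch uses the standard library's xml.etree.ElementTree, which checks only that the text is well-formed. It does not validate against the Tutor Message Format DTD, so the validator is still needed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's message.

    Only well-formedness is checked (matched tags, quoted attributes, etc.);
    element names and required Tutor Message Format fields are NOT checked.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # str(err) names the problem and its line and column.
        return str(err)
    return None
```

For a log file on disk, the same pattern works with `ET.parse(path)` in place of `ET.fromstring(xml_text)`.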
Documentation For XML: • Guide to the Tutor Message Format: http://pslcdatashop.org/dtd/guide/ For tab-delimited format: • http://pslcdatashop.org/about/importverify.html To learn about terminology: • http://pslcdatashop.org/help?page=terms To learn about existing DataShop output formats: • http://pslcdatashop.org/help?page=export
Case Study: Chinese Writing Study Fall 2009 http://www.learnlab.org/research/wiki/index.php/Perfetti_-_Read_Write_Integration • Researchers presented the DataShop team with their data, which was in a tabular format unlike the DataShop format. • The DataShop team consulted with the research team to determine which DataShop-required fields were missing and which of the researchers' fields had no DataShop counterpart. • The DataShop team and the researchers agreed on definitions of problems, steps, and knowledge components. • DataShop requires each attempt to be tagged correct or incorrect, so correctness was determined by a threshold (e.g., 0.5). • A DataShop consultant (Alida) wrote a converter from this tabular format to XML, and the result was imported into DataShop.
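The threshold rule in this case study can be sketched in Python as follows. The slide gives only the threshold itself; the score column name, the assumption that higher scores are better, and the choice that a score exactly at the threshold counts as correct are all illustrative assumptions. The output labels follow the CORRECT/INCORRECT values used in action_evaluation.

```python
def outcome(score, threshold=0.5):
    """Tag one attempt from a numeric score.

    ASSUMPTIONS: higher is better, and score == threshold counts as correct.
    """
    return "CORRECT" if score >= threshold else "INCORRECT"


def tag_attempts(rows, score_key="score", threshold=0.5):
    """Return copies of tabular rows with an added "Outcome" field.

    score_key is a hypothetical column name. Scores read from a text file
    arrive as strings, so they are converted with float() first.
    """
    return [
        {**row, "Outcome": outcome(float(row[score_key]), threshold)}
        for row in rows
    ]
```

For example, `tag_attempts([{"score": "0.72"}, {"score": "0.31"}])` tags the first row CORRECT and the second INCORRECT.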
Future of importing and the format • Push-button import • Richer, more flexible format • Multimedia (audio) • Dialogue data
Exporting from DataShop • From the website: • By transaction • By student-step • By student-problem • From web services: • By transaction • By student-step
Exporting from DataShop • Log in to the web application. • Choose a dataset. • Click the “Export” tab. • Choose a level of granularity (transaction, step, or problem). • Choose a sample. • Click the export button. Tip: The “All Data” sample is cached for transaction export, so choosing that sample gives the fastest export.
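An exported transaction file can also be summarized locally. The Python sketch below groups transactions by student, problem, and step, and records the first attempt's outcome plus a count of each outcome, roughly the kind of summary a student-step export provides. It is not DataShop's own student-step computation. The column names are assumptions to check against your export's header row, and the sketch assumes rows appear in chronological order within each step.

```python
import csv
import io
from collections import Counter


def student_steps(tsv_text):
    """Group tab-delimited transaction rows into student-step summaries.

    ASSUMED columns: "Anon Student Id", "Problem Name", "Step Name", "Outcome".
    Rows are assumed to be in time order, so the first row seen for a
    step is treated as the first attempt.
    """
    # QUOTE_NONE: treat quote characters as ordinary text, not delimiters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    steps = {}
    for row in reader:
        key = (row["Anon Student Id"], row["Problem Name"], row["Step Name"])
        entry = steps.setdefault(
            key, {"first_outcome": row["Outcome"], "outcomes": Counter()})
        entry["outcomes"][row["Outcome"]] += 1
    return steps
```

To summarize a downloaded file, read it first, e.g. `with open(path, encoding="utf-8") as f: summary = student_steps(f.read())`.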