
PSLC DataShop Introduction pslcdatashop Slides current to DataShop version 4.1.8

PSLC DataShop Introduction, http://pslcdatashop.org. Slides current to DataShop version 4.1.8. John Stamper, DataShop Technical Director.

Presentation Transcript


  1. PSLC DataShop Introduction http://pslcdatashop.org Slides current to DataShop version 4.1.8 • John Stamper • DataShop Technical Director

  2. The DataShop Team • John Stamper, DataShop Technical Director • Alida Skogsholm, DataShop Manager, Developer • Brett Leber, Interaction Designer • Duncan Spencer, DataShop Developer • Shanwen Yu, DataShop Developer • Sandy Demi, QA (Quality Assurance, Testing)

  3. What is DataShop? • Central Repository • Secure place to store & access research data • Every LearnLab and every study • Supports various kinds of research: primary analysis of study data, exploratory analysis of course data, secondary analysis of any data set • Analysis & Reporting Tools • Focus on student-tutor interaction data • Learning curves & error reports provide summary and low-level views of student performance • Performance Profiler aggregates across various levels of granularity (problem, dataset levels, knowledge components, etc.) • Data Export • Tab-delimited tables you can open with your favorite spreadsheet program or statistical package • New tools created to meet highest demands

  4. Repository

  5. Web Application • Knowledge component model analysis with learning curves • Learning curve point decomposition

  6. Web Application • Performance Profiler tool for exploring the data • Easy knowledge component model creation

  7. What does the data look like? • Transaction • A transaction is an interaction between the student and the tutoring system. • Students may make incorrect entries or ask for hints before getting a step correct. Each hint request, incorrect attempt, or correct attempt is a transaction; and a step can involve one or more transactions. • Step • A step is an observable part of the solution to a problem. Because steps are observable, they are partly determined by the user interface available to the student for solving the problem.
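
The relationship between transactions and steps can be sketched in code. The following Java sketch is illustrative only: the class name, the `Transaction` fields, and the outcome labels are assumptions made for this example and are not taken from DataShop's schema. It groups a log of transactions by student and step, so that one step ends up holding one or more transactions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: class, field, and outcome names are assumptions, not DataShop's schema.
final class StepRollup {

    /** One student-tutor interaction: a hint request, an incorrect attempt, or a correct attempt. */
    static final class Transaction {
        final String student;
        final String step;
        final String outcome; // "HINT", "INCORRECT", or "CORRECT" (hypothetical labels)

        Transaction(String student, String step, String outcome) {
            this.student = student;
            this.step = step;
            this.outcome = outcome;
        }
    }

    /** Groups transactions by (student, step), preserving the order in which they were logged. */
    static Map<String, List<Transaction>> groupByStep(List<Transaction> log) {
        Map<String, List<Transaction>> steps = new LinkedHashMap<>();
        for (Transaction t : log) {
            String key = t.student + "|" + t.step;
            steps.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Transaction> log = new ArrayList<>();
        log.add(new Transaction("s1", "distribute", "HINT"));
        log.add(new Transaction("s1", "distribute", "INCORRECT"));
        log.add(new Transaction("s1", "distribute", "CORRECT"));
        log.add(new Transaction("s1", "divide", "CORRECT"));

        // The "distribute" step holds three transactions; the "divide" step holds one.
        for (Map.Entry<String, List<Transaction>> e : groupByStep(log).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " transaction(s)");
        }
    }
}
```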

  8. How do I get data in? • Directly • Some tutors are logging directly to the PSLC logging database • CTAT-based tutors (when configured correctly) • Indirectly • Other tutors are logging to their own file formats or their own databases • These data require a conversion process • Many studies are in this category

  9. Improving learning by improving the cognitive model: A data-driven approach • Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement. 8th International Conference on Intelligent Tutoring Systems. 2006. • Cen, H., Koedinger, K., Junker, B. Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor. 13th International Conference on Artificial Intelligence in Education. 2007. • Koedinger, K., Stamper, J. A Data Driven Approach to the Discovery of Better Cognitive Models. 3rd International Conference on Educational Data Mining. 2010. • Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (in press) A Data Repository for the EDM community: The PSLC DataShop. To appear in Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.

  10. Why we need better expert & student models in ITS • Two key premises • Expert & student models drive instruction • The cognitive model in Cognitive Tutors determines much of ITS behavior; same for constraints… • These models are sometimes wrong & almost always imperfect • ITS developers often build models rationally • But such models may not be empirically accurate • A correct cognitive model should predict task difficulty and transfer => generate smooth learning curves => Huge opportunity for ITS researchers to improve their tutors

  11. Cognitive Model Determines Instruction

  12. Cognitive Tutor Technology • Cognitive Model: A system that can solve problems in the various ways students can • Example problem: 3(2x - 5) = 9 • If goal is solve a(bx+c) = d, then rewrite as abx + ac = d (gives 6x - 15 = 9) • If goal is solve a(bx+c) = d, then rewrite as abx + c = d (gives 6x - 5 = 9) • If goal is solve a(bx+c) = d, then rewrite as bx+c = d/a (gives 2x - 5 = 3) • Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction

  13. Cognitive Tutor Technology • Cognitive Model: A system that can solve problems in the various ways students can • Example problem: 3(2x - 5) = 9 • If goal is solve a(bx+c) = d, then rewrite as abx + ac = d • If goal is solve a(bx+c) = d, then rewrite as abx + c = d • Possible next steps: 6x - 15 = 9, 2x - 5 = 3, 6x - 5 = 9 • Hint message: “Distribute a across the parentheses.” • Bug message: “You need to multiply c by a also.” • Known? = 85% chance • Known? = 45% • Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction • Knowledge Tracing: Assesses student's knowledge growth -> individualized activity selection and pacing
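
Model tracing on the example from these two slides can be sketched as matching a student's rewrite against the result of each rule. This is a minimal Java sketch, not the Cognitive Tutor implementation: the class and method names are hypothetical, equations are reduced to integer coefficients, and only the three rules shown on slide 12 are covered.

```java
// Hypothetical sketch of model tracing for a(bx + c) = d; not the Cognitive Tutor's actual code.
final class ModelTracingSketch {

    /**
     * The problem is a(bx + c) = d. The student's next line is coefX * x + constant = rhs.
     * Returns a label for the rule whose result matches that line, or "no match".
     */
    static String trace(int a, int b, int c, int d, int coefX, int constant, int rhs) {
        // Correct rule: rewrite as abx + ac = d.
        if (coefX == a * b && constant == a * c && rhs == d) {
            return "distribute (correct)";
        }
        // Buggy rule: rewrite as abx + c = d (c was not multiplied by a).
        if (coefX == a * b && constant == c && rhs == d) {
            return "distribute (bug: c not multiplied by a)";
        }
        // Correct rule: rewrite as bx + c = d/a (only checked when d/a is an integer).
        if (a != 0 && d % a == 0 && coefX == b && constant == c && rhs == d / a) {
            return "divide both sides by a (correct)";
        }
        return "no match";
    }

    public static void main(String[] args) {
        // Problem from the slides: 3(2x - 5) = 9, so a = 3, b = 2, c = -5, d = 9.
        System.out.println(trace(3, 2, -5, 9, 6, -15, 9)); // student wrote 6x - 15 = 9
        System.out.println(trace(3, 2, -5, 9, 6, -5, 9));  // student wrote 6x - 5 = 9
        System.out.println(trace(3, 2, -5, 9, 2, -5, 3));  // student wrote 2x - 5 = 3
    }
}
```

In a tutor, a match on the buggy rule would be the point at which a bug message such as the one on slide 13 is shown.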

  14. If you change the cognitive model, you change instruction • Problem creation, selection, & sequencing • New skills or concepts (= “knowledge components” or “KCs”) require: • New kinds of problems & instructional activities • Changes to student modeling: skillometer, knowledge tracing • Feedback and hint message content • One skill becomes two => need new hint messages for new skill • New bug rules may be needed • Even interface design: “make thinking visible” • If multiple skills per step => break down by adding new intermediate steps to interface

  15. Expert & student models are imperfect in most ITS • How can we tell? • Don’t get learning curves • If we know tutor works (get pre to post gains), but “learning curves don’t curve”, then the model is wrong • Don’t get smooth learning curves • Even when every KC has a good learning curve (error rate goes down as student gets more opportunities to practice), the model may still be imperfect when it has significant deviations from student data

  16. PSLC DataShop Tools http://pslcdatashop.org Slides current to DataShop version 4.1.8 Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (in press) A Data Repository for the EDM community: The PSLC DataShop. To appear in Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.

  17. Analysis Tools • Dataset Info • Performance Profiler • Error Report • Learning Curve • KC Model Export/Import

  18. Getting to DataShop • Explore data through the DataShop tools • Where is DataShop? • http://pslcdatashop.org • Linked from DataShop homepage and learnlab.org • http://pslcdatashop.web.cmu.edu/about/ • http://learnlab.org/technologies/datashop/index.php

  19. Creating an account • On DataShop's home page, click "Sign up now". Complete the form to create your DataShop account. • If you’re a CMU student/staff/faculty, click “Log in with WebISO” to create your account.

  20. Getting access to datasets • By default, you will have access to the public datasets. • Of these, we recommend three for getting started: • Geometry Area (1996-1997) • Joint Explanation - Electric Fields - Pitt - Spring 2007 • Chinese Vocabulary Fall 2006 • For access to other datasets, contact us: datashop-help@lists.andrew.cmu.edu

  21. DataShop – Dataset selection • Private datasets you can’t view. Email us and the PI to get access. • Datasets you can view or edit. You have to be a project member or PI for the dataset to appear here. • Public datasets that you can view only.

  22. Dataset Info • Metadata for a given dataset • PIs get ‘edit’ privilege, others must request it • Papers and Files storage • Problem Breakdown table • Dataset Metrics

  23. Performance Profiler Multipurpose tool to help identify areas that are too hard or easy • View measures of • Error Rate • Assistance Score • Avg # Hints • Avg # Incorrect • Residual Error Rate View multiple samples side by side • Aggregate by • Step • Problem • Student • KC • Dataset Level Mouse over a row to reveal uniqueness

  24. Error Report • Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior • Attempts are categorized by evaluation View by Problem or KC

  25. Learning Curves • Visualizes changes in student performance over time • Hover over the y-axis to change the type of Learning Curve. • Types include: • Error Rate • Assistance Score • Number of Incorrects • Number of Hints • Step Duration • Correct Step Duration • Error Step Duration • Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC
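
An error-rate learning curve of the kind described on this slide can be sketched as follows. This is a hedged Java illustration rather than DataShop's code: the class name and input layout are hypothetical, and it assumes that the error rate at an opportunity is the fraction of students whose first attempt at that opportunity was not correct.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: the input layout and the error-rate definition are assumptions.
final class LearningCurveSketch {

    /**
     * Each array holds one student's first-attempt results for a single KC,
     * ordered by opportunity (true = correct on the first attempt).
     * Returns the error rate at each opportunity, computed over the students
     * who reached that opportunity.
     */
    static double[] errorRateByOpportunity(List<boolean[]> students) {
        int maxOpportunities = 0;
        for (boolean[] s : students) {
            maxOpportunities = Math.max(maxOpportunities, s.length);
        }
        double[] errorRate = new double[maxOpportunities];
        for (int opp = 0; opp < maxOpportunities; opp++) {
            int observed = 0;
            int errors = 0;
            for (boolean[] s : students) {
                if (opp < s.length) {
                    observed++;
                    if (!s[opp]) {
                        errors++;
                    }
                }
            }
            // At least one student reached every opportunity below maxOpportunities.
            errorRate[opp] = (double) errors / observed;
        }
        return errorRate;
    }

    public static void main(String[] args) {
        List<boolean[]> students = Arrays.asList(
                new boolean[] {false, false, true},
                new boolean[] {false, true},
                new boolean[] {true, true, true});
        double[] curve = errorRateByOpportunity(students);
        for (int i = 0; i < curve.length; i++) {
            System.out.printf("opportunity %d: error rate %.2f%n", i + 1, curve[i]);
        }
    }
}
```

Note that later opportunities are averaged over fewer students, which is one reason the tail of a learning curve can look noisy.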

  26. Learning Curves: Drill Down • Click on a data point to view point information • Click on the number link to view details for a particular drill-down item. • Details include: • Name • Value • Number of Observations • Four types of information for a data point: • KCs • Problems • Steps • Students

  27. Learning Curve: Latency Curves For latency curves, a standard deviation cutoff of 2.5 is applied by default. The number of included and dropped observations due to the cutoff is shown in the observation table. Step Duration = the total length of time spent on a step. It is calculated by adding all of the durations for transactions that were attributed to a given step. Error Step Duration = step duration when first attempt is an error Correct Step Duration = step duration when the first attempt is correct
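
The two calculations on this slide can be sketched in Java. The slide does not say how the cutoff is applied, so this sketch makes assumptions that are labeled in the code: it uses the sample standard deviation and drops observations on either side of the mean. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; DataShop's exact cutoff rule is not given on the slide.
final class LatencySketch {

    /** Step duration: the sum of the durations of all transactions attributed to the step. */
    static double stepDuration(double[] transactionDurations) {
        double total = 0.0;
        for (double d : transactionDurations) {
            total += d;
        }
        return total;
    }

    /**
     * Keeps observations within `cutoff` standard deviations of the mean.
     * Assumptions: sample standard deviation (n - 1), two-sided cutoff.
     */
    static List<Double> withinCutoff(double[] stepDurations, double cutoff) {
        int n = stepDurations.length;
        List<Double> kept = new ArrayList<>();
        if (n < 2) { // too few observations to estimate a standard deviation
            for (double d : stepDurations) {
                kept.add(d);
            }
            return kept;
        }
        double mean = 0.0;
        for (double d : stepDurations) {
            mean += d;
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double d : stepDurations) {
            sumSquares += (d - mean) * (d - mean);
        }
        double sd = Math.sqrt(sumSquares / (n - 1));
        for (double d : stepDurations) {
            if (Math.abs(d - mean) <= cutoff * sd) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] durations = {10, 10, 10, 10, 10, 10, 10, 10, 10, 1000};
        List<Double> kept = withinCutoff(durations, 2.5);
        System.out.println("included: " + kept.size()
                + ", dropped: " + (durations.length - kept.size()));
    }
}
```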

  28. Dataset Info: KC Models • Toolbox allows you to export one or more KC models, work with them, then reimport into the Dataset. • Handy information displayed for each KC Model: • Name • # of KCs in the model • Created By • Mapping Type • AIC & BIC Values • DataShop generates two KC models for free: • Single-KC • Unique-step • These provide upper and lower bounds for AIC/BIC. • Click to view the list of KCs for this model.

  29. Dataset Info: Export a KC Model • Select the models you wish to export and click the “Export” button. Model information as well as other useful information is provided in a tab-delimited text file. • Selecting the “export” option next to a KC Model will auto-select the model for you in the export toolbox. • Export multiple models at once.
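
For readers unfamiliar with the AIC and BIC values listed for each KC model, the standard textbook definitions are given below. The slides do not state how DataShop computes them, so treat this as background rather than a description of the tool. Here $k$ is the number of estimated model parameters, $n$ is the number of observations, and $\hat{L}$ is the maximized likelihood of the model; lower values indicate a better trade-off between fit and complexity.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```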

  30. Dataset Info: Import a KC Model When you are ready to import, upload your file to DataShop for verification. Once verification is successful, click the “Import” button. Your new or updated model will be available shortly (depending on the size of the dataset).

  31. Web Services • Why Web Services? • Get Web Services Download • Getting Credentials • Authentication & DatashopClient • What is an ID? • How to get a dataset ID • How to see some transaction data • Add a little Swing… • Web Services URL

  32. Why Web Services? • To access the data from a program • New visualization • Data mining • or other application

  33. Get Web Services Download

  34. Getting Credentials

  35. Authentication & DatashopClient • Put your token and secret access key in a file named ‘webservices.properties’

  36. What is an ID? The DataShop API expects you to reference various objects by “ID”, a unique identifier for each dataset, sample, custom field, or transaction in the repository. The ID of any of these can be determined by performing a request to list the various items, which lists the IDs in the response. For example, a request for datasets will list the ID of each dataset in the “id” attribute of each dataset element.

  37. How to get a dataset ID • Use DatashopClient class provided in datashop-webservices.jar • Pass in a URL to form the request • Results include datasets that you have access to

      java -jar dist/datashop-webservices.jar "https://pslcdatashop.web.cmu.edu/services/datasets"

      <?xml version="1.0" encoding="UTF-8"?>
      <pslc_datashop_message result_code="0" result_message="Success. 255 datasets found.">
        <dataset id="145">
          <name>Handwriting/Examples Dec 2006</name>
          …
        </dataset>
      </pslc_datashop_message>
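
Once a response like the one above has been saved, the dataset IDs can be read out of the “id” attributes programmatically. The sketch below uses only the JDK's built-in XML parser; the class and method names are hypothetical, and the embedded XML is a shortened excerpt of the response shown on the slide rather than real service output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; element and attribute names follow the response shown on the slide.
final class DatasetIdLister {

    /** Maps each dataset's "id" attribute to the text of its <name> child element. */
    static Map<String, String> datasetIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList datasets = doc.getElementsByTagName("dataset");
            Map<String, String> ids = new LinkedHashMap<>();
            for (int i = 0; i < datasets.getLength(); i++) {
                Element dataset = (Element) datasets.item(i);
                NodeList names = dataset.getElementsByTagName("name");
                String name = names.getLength() > 0 ? names.item(0).getTextContent() : "";
                ids.put(dataset.getAttribute("id"), name);
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened excerpt of the slide's example response.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<pslc_datashop_message result_code=\"0\" result_message=\"Success.\">"
                + "<dataset id=\"145\"><name>Handwriting/Examples Dec 2006</name></dataset>"
                + "</pslc_datashop_message>";
        for (Map.Entry<String, String> e : datasetIds(xml).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```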

  38. How to get a dataset ID

      java -jar dist/datashop-webservices.jar "https://pslcdatashop.web.cmu.edu/services/datasets?access=edit" > datasets.xml

  39. Open XML in browser and search

  40. Back to command line

  41. How to get a sample ID

      java -jar dist/datashop-webservices.jar "https://pslcdatashop.web.cmu.edu/services/datasets/313/samples"

      <?xml version="1.0" encoding="UTF-8"?>
      <pslc_datashop_message result_code="0" result_message="Success. 2 samples found.">
        <sample id="933">
          <name>All Data</name>
          <description>Default Sample that contains all transactions.</description>
          <owner>%</owner>
          <number_of_transactions>11394</number_of_transactions>
        </sample>
        <sample id="936">
          <name>articleTutor-B</name>
          <description>Default Sample that contains all transactions.</description>
          <owner>liuliu@ANDREW.CMU.EDU</owner>
          <number_of_transactions>2707</number_of_transactions>
        </sample>
      </pslc_datashop_message>

  42. How to see some transaction data • Request a subset of columns for a given dataset and the ‘All Data’ sample, which is the default

      java edu.cmu.pslc.datashop.webservices.DatashopClient "https://pslcdatashop.web.cmu.edu/services/datasets/313/transactions?limit=10&cols=problem_hierarchy,problem_name,step_name,outcome,input"

      Output columns: Problem Hierarchy, Problem Name, Step Name, Outcome, Input (rows are cut off on the slide partway through the Step Name column):

      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    "The wo
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    "The wo
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    "The wo
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    "The wo
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    ___ oxy
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    ___ oxy
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    ___ oxy
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    ___ oxy
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    She too
      Unit IWT_S09articleTutorB-A, Section IWT Tests and Tutors    articleTutor-B    ___ big…
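
The request URL on this slide follows a regular pattern: a dataset ID in the path, then `limit` and `cols` query parameters. A small Java sketch of assembling it is shown below. The class and method names are hypothetical; the root URL, path, and parameter names are copied from the slide, and no other parameters are assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only assembles the URL string shown on the slide.
final class TransactionUrlBuilder {

    static final String ROOT = "https://pslcdatashop.web.cmu.edu/services";

    /** Builds a transactions request for one dataset, limited to `limit` rows and the given columns. */
    static String transactionsUrl(int datasetId, int limit, List<String> cols) {
        return ROOT + "/datasets/" + datasetId + "/transactions"
                + "?limit=" + limit
                + "&cols=" + String.join(",", cols);
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList(
                "problem_hierarchy", "problem_name", "step_name", "outcome", "input");
        System.out.println(transactionsUrl(313, 10, cols));
    }
}
```

When the resulting URL is passed on a command line, it should be wrapped in straight double quotes so that the shell does not interpret the `&`.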

  43. import java.io.BufferedReader;
      import java.io.InputStream;
      import java.io.InputStreamReader;
      import java.net.HttpURLConnection;
      import java.util.TreeMap;

      import edu.cmu.pslc.datashop.webservices.DatashopClient;

      public class WebServicesDemoClient extends DatashopClient {
          …
          private static final String DATASETS_PATH = "/datasets/";
          // Request rows without a header line, restricted to five columns.
          private static final String TXS_PATH = "/transactions?headers=false"
                  + "&cols=problem_hierarchy,problem_name,step_name,outcome,input";

          private WebServicesDemoClient(String root, String apiToken, String secret) {
              super(root, apiToken, secret);
          }

          public TreeMap<TransactionDataSubset, Integer> runReport(String datasetId) {
              String path = DATASETS_PATH + datasetId + TXS_PATH;
              HttpURLConnection conn = serviceGetConnection(path);
              conn.setRequestProperty("accept", "text/xml");
              TreeMap<TransactionDataSubset, Integer> map = new TreeMap<>();
              try {
                  InputStream is = conn.getInputStream();
                  BufferedReader reader = new BufferedReader(new InputStreamReader(is));
                  String row = null;
                  // Each line of the response is one transaction row.
                  while ((row = reader.readLine()) != null) {
                      TransactionDataSubset t = TransactionDataSubset.createTransaction(row);
                      …

  44. Add a little Swing…

      java -classpath "../dist/datashop-webservices.jar;." WebServicesDemoClientUI dataset 313

  45. To get more details… http://pslcdatashop.org/about/webservices.html http://pslcdatashop.org/downloads/WebServicesDemoClient_src.zip

  46. KDD Cup 2010 EDM Challenge • http://pslcdatashop.org/KDDCup • Awarded to the PSLC and DataShop • First time the challenge used education data • This year’s challenge asked participants to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems. • The competition addressed questions of both scientific and practical importance. • Improved models could save millions of hours of students' time (and effort) in learning algebra. • These models should both increase achievement levels and reduce time needed to learn.
