220 likes | 516 Views
Automated Scoring of Open-ended Ethics Questions. Kelly Laas Alan D. Mead Illinois Institute of Technology. Agenda. Importance of ethics and ethics assessment Use of scenario-based assessment and open-ended responses Research Goal Method Results Future Directions.
E N D
Automated Scoring of Open-ended Ethics Questions Kelly Laas Alan D. Mead Illinois Institute of Technology
Agenda • Importance of ethics and ethics assessment • Use of scenario-based assessment and open-ended responses • Research Goal • Method • Results • Future Directions
Interprofessional Projects Program • Brings together students from all IIT disciplines (engineering, business, architecture, psychology, the sciences, social sciences and the humanities) to solve a real-world problem. • All students take a mandatory 2 semesters of IPRO. • Examples of projects are finding new uses for Chicago-area abandoned buildings, testing and developing products for local businesses, and designing a portable operating room for use in disaster areas.
Importance of ethics • Ethical problems arise at all stages in life, and are part of all professions, disciplines and jobs. • Teaches students critical thinking, how to deal with “grey areas” when the best course of action is unclear. • ABET – and other accreditors-require that all students develop an understanding of professional and ethical responsibility.
Goals for IPRO Ethics Component • Raise Ethical Awareness – Students gain a stronger awareness of and sensitivity to ethical issues as they arise in the course of research. • Improve Ethical Knowledge - Students gain a basic understanding of relevant ethical topics in their IPRO project, and how these issues should be addressed as their project work progresses. • Improve Knowledge of Resources - Students get a basic idea of resources, people and policies they can turn to when ethical questions arise. • Understand Importance of Ethics - Students begin to appreciate the importance of ethics in research and practice.
Ethics Module Samples • Ethics Roundtable – students give a short presentation and then ask questions from a panel of “experts” drawn from IIT and Chicago’s professional community. • Professional code of ethics discussion • Case study discussion • Faculty or student developed ideas, which often lead to new modules being developed
Importance of ethics assessment • Are teaching methods working? Are students growing in the following areas from pre to post test? • Try to measure: • Ethical Sensitivity -Do students recognize ethical issues when they arise? • Ethical Knowledge -Do students know about applicable codes, guidelines, laws, developed skills, e.g. discussing issues in clear way? • Ethical judgment - Can they take a plausible course of action using relevant knowledge? • Increased Ethical Commitment -Will they act more ethically in the future? Hardest to Measure Davis, Michael. “Instructional Assessment in the Classroom: Objectives, Methods and Outcomes.” in Practical Guidance on Science and Engineering Ethics Education for Instructors and Administrators. Washington D.C.: National Academies Press, 2013. p. 30.
Scenario-based Assessment “At the beginning of the semester, IPRO 111 went on a tour of Gelco Inc., a company who is one of the largest widget producers in North America. The goal of IPRO 111 was to assist Gelco in developing a more efficient way of shipping widgets …” • What are the ethical issues facing Marco in this case? • What factors does Marco need to consider in reaching a decision about what to do?
Why open-ended responses? • Unlike existing multiple-choice assessment methods, no limit to number of answers to scenario. “Ethical problems are “ill-structured: because there is often no clearly specific goal, only incomplete information, several possible solutions, and several parts to reach each. Since a single, simple response is not an option (or at least not a good one), students must investigate the problem, seek relevant information, consider alternative possible solutions, and evaluation short and long term consequences.” Keefer, M.W. and Davis, M. (2012).“Curricular Design and Assessment in Professional Ethics.” Teaching Ethics, 13(1): 89.
Research Goal • Open-ended responses are great, but scoring them by hand is time-consuming • This research seeks to provide an automated scoring mechanism • Ideally, to replace human grading entirely • But it would also be helpful to provide a second, independent score to complement human grading or as a first analysis in a computerized assessment process
Automated Essay Scoring Methods • PEG = Regression using surface characteristics as predictors • Text categorization (Naive Bayes and k-nearest neighbor) • Semantic similarity • Natural Language Processing • Or a combination of these methods
Classification • “Essay” scoring systems are oriented to extracting “quality of writing” metrics • We’re interested in similarity to exemplars • Content is key, not its expression • Therefore, we want a supervised method • That leaves naïve Bayesian classifier and LSA-based similarity • LSA is relatively hard and requires a lot of data
Naïve Bayesian Classification • Goal: Compute P(domain|item) for each domain and classify as domain of maximum probability • Predicted domain = argmax P(domain|item) = argmax P(item|domain)P(domain)/P(item) = argmax P(item|domain)P(domain) = argmax P(w1,w2,…,wn|domain) ≈ argmaxP(w1|d.)P(w2|d.) …P(wn|d.) • “Naïve” refers to the assumption of independence of the predictors; • P(w1,w2|domain)= P(w1|d.)P(w2|d.) • Calculating P(w1|domain) is easy, count how many items have this word and how many items in total for this domain
Method • 45 students completed Marco Gelco scenario • 2 cases deleted due to missing data • 3 exemplar answers • Responses to four our open-ended questions (we’ll focus on Q1) • Leave-one-out-crossvalidation (LOOCV) • Hold out response 1, train NB classifier, classify 1 • Hold out response 2, train NB classifier, classify 2 • Etc…
Using NB to score responses • Pre-process text • Normalize text, eliminate symbols, etc. • Stem words (using Porter algorithm) • Train NB classifier on each “score class” • Train classifier for all score=3 responses, then all score=2 responses, etc. • Training involves counting conditional presense • Classify response in class with maximum probability given words used
Porter Stemming and Lemmatization • Words change form as they are used • Plan, plans, planned, planful, etc. • This could cause classifier to be trained incorrectly • A “lemma” is “the canonical form of an inflected word” • But lemmatization is difficult • Porter stemming is an easier heuristic method • Series of rules for removing word suffixes • As a heuristic, some mistakes happen
Confusion Matrix for Q1 • What are the ethical issues facing Marco in this case? • 40% strictly correct; 87% if we count the 21 responses actually 2 but graded as a 3
What happened to score=0? • We’re working on that • A score of zero represents either a non-response or a completely unaware answer • It is hard (impossible) to gather a representative sample of non-responses or completely wrong responses to train a classifier on score class 0 • Instead, we need to develop heuristics to tell us when classification cannot be performed
Unknown Words • 131 words were unknown: able; accident; actually; after; against; allegiance; allowed; any; assist; assuming; because; been; benefit; both; build; carefully; chose; comes; commercial; committing; communications; competition; conscious; consider; contract; courses; create; derived; despite; details; didnt; directly; does; doesnt; doing; due; effectively; execution; experience; explained; facing; former; friend; gained; gathered; given; good; graduation; granted; how; implications; inc; inevitably; infringing; initial; inside; inspired; instead; intended; lack; leave; legal; likely; limit; main; make; many; most; must; needs; nonuse; observed; obtained; offered; old; opportunities; piece; plans; position; potentially; prevent; previous; privilege; probably; problem; procedure; production; profits; proposition; proprietary; protected; purposes; pursuit; question; read; recalling; regarding; relationships; research; resources; rights; save; say; secret; sell; shouldnt; similar; skills; solution; some; someone; somewhere; stay; such; supposed; team; tell; theft; their; there; think; ultimately; understanding; value; waiver; wants; was; which; while; who; would
Future Directions • Better coverage of unknown words (and mis-spellings) • Phrase parsing using n-gram analysis • not good ≠ good • Requires larger sample size • Develop heuristics for score class=0 • Analyze reliability of human scores