Michael Curtotti and Eric McCreath
A Right of Access Implies a Right to Know: An Open Online Readability Research Platform
A Right of Access Implies a Right to Know - Readability Research Platform
Motivation: Is there a problem with the readability of legislation?
- Legislation has a significant new audience: the general public
- Historical audience: lawyers, judges, government officers
Conclusions that can be drawn from existing research
- Plain language drafting does improve the readability of legislation
- Researchers often conclude that legislation is very hard for large proportions of the population, even in those cases where plain language drafting is used
Outline
- Full paper: available via LVI2013 & SSRN
- Existing readability research is extensive & covers fields under discussion (refer to paper)
- The Readability Research Platform (RRP) & approaches for evaluating readability
- Initial findings on legislation and graded readers using the RRP and the Weka machine learning package
- Workshop using IPython and the RRP to extract readability/linguistic data
- Readability research possibilities
Readability Research Platform
Platform Features
- Traditional readability metrics
- Cloze tests
- Subjective user evaluations
- Natural language processing and machine learning
- Command line tools for remote data extraction
RRP Performance
Approaches to Assessing Readability
- Traditional readability metrics
- Human evaluation
- Comprehension testing
- Cloze testing
- Crowd sourcing
- Natural language processing and machine learning
Readability Metrics
- Indirectly measure vocabulary and syntactic complexity
- Over 200 measures developed
- Primarily designed for grading reading materials for learner readers, typically passages of 100 words or more
- Not designed for measuring the difficulty of single sentences
- Not designed for measuring the readability of legislation
- Coleman-Liau Index = 0.0588 * L - 0.296 * S - 15.8 (L = average letters per 100 words, S = average sentences per 100 words)
- SMOG index = 1.0430 * sqrt(polysyllabic words * (30 / sentences)) + 3.1291
- Dale-Chall uses a list of 3000 'easy' words and their cognates, together with average sentence length
- ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
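As an illustration of how the Coleman-Liau and ARI formulas operate on raw text, here is a minimal Python sketch. It is not the RRP's implementation: the function names and the regex-based counting of words and sentences are assumptions made for this example, and real tools use more careful tokenisation.

```python
import re


def count_units(text):
    """Count letters, words and sentences with simple regex heuristics (an assumption)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Count only alphanumeric characters, so apostrophes are ignored.
    letters = sum(ch.isalnum() for word in words for ch in word)
    # Each run of '.', '!' or '?' is treated as one sentence boundary.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return letters, len(words), sentences


def coleman_liau(text):
    """Coleman-Liau Index: 0.0588*L - 0.296*S - 15.8."""
    letters, words, sentences = count_units(text)
    L = letters / words * 100    # average letters per 100 words
    S = sentences / words * 100  # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8


def ari(text):
    """Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    letters, words, sentences = count_units(text)
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
```

Both scores approximate a school grade level and can go negative on very short, simple passages, which is one reason these metrics are unreliable on single sentences.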
Cloze Test
Cloze Test results
- 0-35%: reader frustration
- 35-49%: instructional (the reader needs assistance to understand the material)
- 50% +: independent reader
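The cloze procedure and the score bands can be sketched in a few lines of Python. This is an illustrative sketch only, not the RRP's code. Assumptions made here: every fifth word is blanked (a common cloze convention that the slides do not state), responses are scored by exact case-insensitive match, and a score of exactly 35% is placed in the instructional band because the slide's ranges overlap at that value.

```python
def make_cloze(text, n=5, blank="_____"):
    """Blank out every n-th word; return the cloze text and the removed words."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Percentage of blanks filled with the exact original word (case-insensitive)."""
    if not answers:
        raise ValueError("no blanks to score")
    correct = sum(
        a.strip().lower() == r.strip().lower()
        for a, r in zip(answers, responses)
    )
    return 100.0 * correct / len(answers)


def cloze_band(percent):
    """Map a cloze score to the bands listed on the slide."""
    if percent < 35:
        return "frustration"
    if percent < 50:
        return "instructional"
    return "independent"
```

Because the text is split on whitespace only, punctuation stays attached to the removed word; a production version would tokenise more carefully.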
Crowd Sourcing & Subjective Evaluation
Natural Language Processing
Scope of NLP
Current scope of RRP
- Characters
- Vocabulary
- Syllables / Morphemes
- Syntax
- Lemmas / Words / Parts of Speech
- Phrases / Chunks / ngrams
- Clauses
- Trees
- Named Entities
- Sentences
- …..
- Relations
- Discourse Features
Machine learning for readability
- Labelled or unlabelled data
Research Questions & Initial Findings
1. Do traditional readability metrics or surface features of a sentence assist us in assessing the readability of the sentence?
2. Do parts of speech or chunk data from a sentence assist in assessing its readability?
3. Do features such as the above provide us with a measure of whether legislative 'sentences' are 'normal' English?
Question 1: Very little
Question 2: It helps, but accuracy is low
Visualization produced using Weka software
Question 3: Yes, legislative English is very different (within sample)
Visualizations produced using Weka software
Question 3: Yes, legislative English is very different (parts of speech data)
PCA on the Brown Corpus & the legislative corpus
Visualizations produced using Weka software
Machine learning can use POS to distinguish legislative sentences from a wide range of other English sentences.
Using the RRP for research: sending GET requests using the browser address bar
Using the RRP for Research
Sending POST or GET requests using IPython
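The general shape of such a request can be sketched with Python's standard library. The slides do not give the RRP's address or parameter names, so `BASE_URL` and the `text` parameter below are placeholders invented for this example and must be replaced with the platform's real values. The sketch only builds the requests; it does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: NOT the real RRP endpoint (an assumption for illustration).
BASE_URL = "http://example.org/rrp/api"


def build_get(text):
    """Build a GET request with the text passed in the query string."""
    query = urlencode({"text": text})  # hypothetical parameter name
    return Request(f"{BASE_URL}?{query}", method="GET")


def build_post(text):
    """Build a POST request with the text passed as form data in the body."""
    body = urlencode({"text": text}).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

In an IPython session, a request built this way could be sent with `urllib.request.urlopen(req)` and the response body read with `.read()`. POST is the natural choice for longer passages, since query strings in a GET request are limited in practical length.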
Crowd sourced data collection
Research possibilities