WP1.3 - Quality Assessment Dr Lee Gillam; Neil Newbold L.Gillam@surrey.ac.uk; N.Newbold@surrey.ac.uk University of Surrey 23rd May 2006
Overview • Task: to introduce, complementary to the ISO process of standards production, specific quality assessment steps • Vision: Longer term, a content management system for developing standards is envisaged, although this is beyond the scope of LIRICS • Methodology: evaluate the efficacy of UniS’ content analysis applications (System Quirk), developed in prior research, including EU co-funded projects, to analyse language consistency and document coherence, identify unexplained terminology, and generate “understandability” metrics • Definition: Quality (ISO 9000): degree to which a set of inherent characteristics fulfils requirements
Overview • Structure of an ISO standard • ISO boilerplate, contents, foreword • Introduction • Scope • Normative References: • “The following referenced documents are indispensable for the application of this document.” - cross-document understanding • Terms and Definitions • 3.2 code table • table of code elements (3.4) as part of a code • may inherit from other documents • [Content]
Overview • ISO Comments
Lines of Inquiry • Readability metrics: avg. word length; avg. sentence length; num. multiword expressions; reading age; … • Clarity of message: Plain English Campaign; Simplified English (AECMA) • Terminology use: num. known terms; num. unknown terms • Automation? • Hypermedia?
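The first two readability metrics listed above can be approximated in a few lines. This is a minimal sketch, not the System Quirk implementation: the function name `readability_metrics` and the regex-based tokenisation are assumptions made for illustration, and a real pipeline would use a proper sentence splitter and tokeniser.

```python
import re


def readability_metrics(text: str) -> dict:
    """Return average word length (characters) and average sentence length (words).

    Assumption: sentences end at '.', '!' or '?', and a word is a run of
    letters or apostrophes. Both rules are deliberately crude.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_word_length": 0.0, "avg_sentence_length": 0.0}
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(readability_metrics("The cat sat. It ran."))
```

Metrics such as reading age are usually derived from these two averages (plus syllable counts), so they would build on the same word and sentence counts.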
Plain English Campaign • Advantages of plain English • Faster to write • Faster to read • You get your message across more often and in a friendlier way • Main ways to make writing clearer • Keep your sentences short • Prefer active verbs • Choose words appropriate for the reader • Use positive language • The A to Z of alternative words
Plain English Campaign • Golden Bull Awards • Australian Taxation Office for its Goods and Services legislation ‘For the purpose of making a declaration under this Subdivision, the Commissioner may: a) treat a particular event that actually happened as not having happened; and b) treat a particular event that did not actually happen as having happened and, if appropriate, treat the event as: i) having happened at a particular time; and ii) having involved particular action by a particular entity; and c) treat a particular event that actually happened as: i) having happened at a time different from the time it actually happened; or ii) having involved particular action by a particular entity (whether or not the event actually involved any action by that entity).’
Plain English Campaign • Golden Bull Awards • Wanadoo for 'Wireless and Talk' terms and conditions • ‘The failure to exercise or delay in exercising a right or remedy under this Agreement shall not constitute a waiver of the right or remedy or a waiver of any other rights or remedies and no single or partial exercise of any right or remedy under this Agreement shall prevent any further exercise of the right or remedy or the exercise of any other right or remedy. The rights and remedies contained in this Agreement are cumulative and not exclusive of any rights or remedies provided by law.’ • A reorganisation announcement by Marconi's EMEA (Europe, Middle East, Africa and Australasia) division • 'The benefit of having dedicated subject matter experts who are able to evangelise the attributes and business imperatives of their products is starting to bear fruit.'
Take Sheffield’s GATE • Implement Plain English lookup - prior efforts: MHA • Integrate with POS analysis • New draft of ISO 704 • 1715 possible Plain English replacements identified • Manual evaluation required • Move towards automation
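A Plain English lookup of the kind described above can be sketched as a dictionary match over the text. This is an illustration only and does not use GATE: the `ALTERNATIVES` table holds a few hypothetical entries in the spirit of an A to Z of alternative words, and `suggest_replacements` is an assumed name. It only proposes candidates, which fits the slide's point that manual evaluation is still required.

```python
import re

# Hypothetical sample entries; a real lookup list would be far larger.
ALTERNATIVES = {
    "in order to": "to",
    "prior to": "before",
    "commence": "start",
    "utilise": "use",
}


def suggest_replacements(text: str) -> list:
    """Return (offset, matched phrase, suggested alternative) for each hit."""
    # Try longer phrases first so multiword expressions are not split up.
    phrases = sorted(ALTERNATIVES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.group(0), ALTERNATIVES[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Prior to testing, utilise the tool in order to commence."
    for offset, phrase, alternative in suggest_replacements(sample):
        print(offset, phrase, "->", alternative)
```

Integrating part-of-speech analysis, as the slide proposes, would help filter out matches where the word is used in a sense the replacement does not fit.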
Automation • From words to patterns - “learning”
Incorporate known terms - ISO 16642; ISO 12620 • “preferred term”? “deprecated”? • Overlaps between known terms and Plain English? • Explore “principle of substitutability” and hyperlinking
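Checking a draft against known terms and their status could look roughly like the sketch below. The `TERMBASE` structure, its field names and the "code list" entry are hypothetical and are not taken from ISO 16642 or ISO 12620; only "code table" comes from the example standard shown earlier. A real system would load entries and status values from a terminology database.

```python
import re

# Hypothetical termbase: each term carries a status and, if deprecated,
# the preferred term that should replace it.
TERMBASE = {
    "code table": {"status": "preferred"},
    "code list": {"status": "deprecated", "use_instead": "code table"},
}


def check_terms(text: str) -> dict:
    """Report which known terms occur in the text, grouped by status."""
    lowered = text.lower()
    report = {"preferred": [], "deprecated": []}
    for term, entry in TERMBASE.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            if entry["status"] == "deprecated":
                report["deprecated"].append((term, entry["use_instead"]))
            else:
                report["preferred"].append(term)
    return report


if __name__ == "__main__":
    print(check_terms("Each code list maps to a Code Table."))
```

The deprecated hits pair each term with its preferred replacement, which is one simple reading of the "principle of substitutability": the preferred term should be able to stand in wherever the deprecated one appears.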
Summary • Vision: Longer term, a content management system for developing standards is envisaged, although this is beyond the scope of LIRICS …. or is it? • Expanded methodology: Integration of existing systems and standards - GATE, System Quirk components, ISO 16642, ISO 12620 … linguistic annotation / lexical markup? - “Eat our own dog food”; “Drink our own Champagne” • Degree of automation: planned integration and evaluation of System Quirk components for automatic keyword analysis, ontology learning and indicative text summarisation. Significant evaluation and further development required. • Exploitation: Results useful to the standards community at large? To document authoring in general?
Recent Dissemination • Selection: • Lee Gillam, Debbie Garside, Chris Cox. (2006) "Information volumes and linguistic diversity: meeting the challenges for content management". 3rd International Conference on Terminology, Standardization and Technology Transfer, 25-26 August, Beijing, PRC. Accepted. • Khurshid Ahmad, Lee Gillam and David Cheng. (2006) Sentiments on a Grid: Analysis of Streaming News and Views. Proc. of 5th Intl. Conf. on Language Resources and Evaluation (LREC). • Lee Gillam and Khurshid Ahmad. (2006) Financial data tombs and nurseries: A grid-based text and ontological analysis. Proc. of 1st Intl. Workshop on Grid Technology for Financial Modeling and Simulation (Grid in Finance 2006). See http://www.gridinfinance.org/ for details • Lee Gillam. Sentiment Analysis and Financial Grids: presentation at the UK National Centre for Text Mining’s workshop on Bridging quantitative and qualitative methods for social sciences using text mining techniques, 28 April 2006. • Lee Gillam. No Place for Sentiments?: forthcoming Access Grid Seminar presentation for the UK’s National Centre for e-Social Science, 8 June 2006 • Related activities: • ISO 639-6: Committee Draft (CD) accepted; move towards Draft International Standard (DIS) • ISO 639-4: Description of the Language Documentation Interchange Format (LDIF)