Global User-Generated Content: The Final Localization Frontier Merle Tenney
Agenda • Dimensions of Content • Dimensions of Translation • Global Language Tools • UGC Translation Practices • Best Current Practices • Global UGC Desiderata • Call to Action
UGC Pre–Web 2.0 • 1965 Mainframe-based email, instant messaging • 1969 ARPANET • 1978–79 Bulletin board systems, discussion forums (Usenet) • 1983 Internet (TCP/IP) • 1991 World Wide Web (HTTP) • 1993–2003 Blogging, social network services, user classifieds, user auctions, wikis, social bookmarking, photo sharing
UGC Post–Web 2.0 • 2004 Tim O’Reilly, John Battelle, Dale Dougherty define Web 2.0 • Leaders: Yahoo! Groups (1998), MySpace (2003), LinkedIn (2003) • 2004 Facebook • 2005 YouTube • 2006 Twitter • 2009 Foursquare
Content Types • Managed content (MC) • Semi-managed content (SMC) • User-generated content (UGC): individual content, community content • Computer-mediated communication (CMC)
Managed Content • Authors: professional communicators (information developers) • Examples: user interfaces, user assistance, technical documentation; marcom materials, newsletters; web pages, institutional blogs • Requirements: institutional voice, subject matter expertise, polished writing • Tools: content management systems, publishing systems, office applications, blogging software
Semi-Managed Content • Authors: information workers • Examples: technical reports, design documents; technotes, knowledge base articles, technical blogs, industry discussion lists • Requirements: technical expertise, effectual writing • Tools: content management systems, office applications, social network services, blogging software
User-Generated Content • Authors: users and communities • Examples: user profiles, blogs, discussion lists, wikis, reviews, ratings, tags, classifieds, auction listings, user documents, user multimedia • Requirements: informed opinion, interesting content, effectual writing • Tools: office applications, wiki software, social network services, blogging software, classified ad systems, customer feedback forums
Computer-Mediated Communication • Authors: everyone • Examples: emails, microblogs (tweets), direct messages, status updates, SMS messages (texting), instant messages, chat sessions • Requirements: interesting message, succinct, comprehensible writing • Tools: email, blogging, microblogging, instant messaging, chat rooms, social network services, e-commerce, virtual worlds, online games
Content Structure • Structured content • Semi-structured content • Unstructured content
Structured Content • Description: content taken from a closed set of values specified by developers, such as list values, numbers, and related data types • Examples: numerical data, structured keywords, taxonomies, values, lists (ratings, dates, gender, marital status, language, country, etc.) • Translation: no translation per se; language-neutral data, multilingual textual expressions of underlying data handled by UI localization or locale-based data formatting
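A minimal sketch of the structured-content case, assuming the stored value is a language-neutral code and each UI language has its own label table. The table contents and the name display_value are hypothetical, chosen only for illustration.

```python
# Hypothetical label tables; a real product would load these from its
# UI localization resources rather than hard-coding them.
MARITAL_STATUS_LABELS = {
    "en": {"S": "Single", "M": "Married"},
    "de": {"S": "Ledig", "M": "Verheiratet"},
}


def display_value(code, lang, labels=MARITAL_STATUS_LABELS, fallback="en"):
    """Return the localized label for a language-neutral code.

    Falls back to the fallback language when the requested language has
    no table, and to the raw code when no label exists at all.
    """
    table = labels.get(lang, labels[fallback])
    return table.get(code, labels[fallback].get(code, code))


print(display_value("M", "de"))  # label taken from the German table
print(display_value("M", "fr"))  # no French table, so the English label is used
```

Because only the code is stored, nothing in the user's data is translated; the language-specific text lives entirely in the UI resources.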
Semi-Structured Content • Description: content taken from a constrained and self-organizing but not closed set of values developed by users • Examples: user classifications, common search terms, user keywords, tag clouds, folksonomies • Translation: specialized bilingual terminology, with fallback to machine translation as needed
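The terminology-first, MT-fallback approach for semi-structured content might look like the sketch below. The terminology dictionary, the translate_tag helper, and the stand-in machine_translate callable are all assumptions; no specific MT service is implied.

```python
def translate_tag(tag, terminology, machine_translate):
    """Translate a user tag, preferring curated bilingual terminology.

    Returns (translation, origin), where origin is "terminology" or "mt"
    so that callers can tell curated results from machine output.
    """
    key = tag.strip().lower()
    if key in terminology:
        return terminology[key], "terminology"
    # Fallback: the tag is not in the curated terminology.
    return machine_translate(tag), "mt"


# Hypothetical English-to-German terminology and a stand-in MT function.
terminology = {"road bike": "Rennrad", "second hand": "gebraucht"}
fake_mt = lambda text: "[MT] " + text

print(translate_tag("Road Bike", terminology, fake_mt))
print(translate_tag("gravel grinder", terminology, fake_mt))
```

Returning the origin alongside the translation lets a community review the machine-translated tags and promote good ones into the terminology.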
Unstructured Content • Description: open, unconstrained user text • Examples: wikis, articles, blogs, discussions, reviews, chats, instant messages, emails • Translation: machine translation in pull contexts, including cross-language search; computer-aided translation in push contexts
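Taken together, the three content-structure slides amount to a small routing rule. The sketch below encodes it; the function name and the strategy labels are hypothetical identifiers, not terms from the deck.

```python
def translation_strategy(structure, framework):
    """Pick a translation approach from content structure and push/pull mode.

    structure: "structured", "semi-structured", or "unstructured"
    framework: "push" or "pull"
    """
    if framework not in ("push", "pull"):
        raise ValueError("unknown framework: " + repr(framework))
    if structure == "structured":
        # Language-neutral data: handled by UI localization and locale formatting.
        return "ui-localization"
    if structure == "semi-structured":
        # Specialized bilingual terminology, with machine translation as fallback.
        return "terminology-with-mt-fallback"
    if structure == "unstructured":
        # Pull contexts use MT; push contexts use computer-aided translation.
        return "machine-translation" if framework == "pull" else "computer-aided-translation"
    raise ValueError("unknown structure: " + repr(structure))


print(translation_strategy("unstructured", "pull"))
print(translation_strategy("unstructured", "push"))
```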

Content Forms • Text • Graphics • Audio • Video • Virtual reality • Location-based services
Nontextual Content Forms • Integrated text: titles, legends, labels, callouts, subtitles, transcriptions, text layers, text tracks • Associated text: metadata, tags, comments • Accessibility text: alt, longdesc attributes
Global Content Creation • Zero translation (ZT) • Machine translation (MT) • Human translation (HT) • Transcreation (TC) • Original content (OC)
Translation Modes • Machine translation: unedited MT, translation wiki • Human translation: volunteer translators (users & friends, community), paid translators (semi-professional, professional)
Individual UGC Translation Modes
Community UGC Translation Modes
Push and Pull Translation Frameworks • Differences in applications and translation requirements • Push mode content translation • Proactive, for anticipated demand • Reactive, for attested demand
Push & Pull Translation Comparison (10/15/2008, Web 2.0 Globalization – Merle Tenney)
Push & Pull Translation Comparison
Global Content Creation and Translation • Authoring and editing • Automatic translation • Computer-aided translation
Authoring and Editing • Spelling checkers • Style and grammar checkers • Language compliance checkers • Intelligent content reuse/authoring memory • Electronic references: explanatory dictionaries, thesauri, bilingual dictionaries, style guides
Automatic Translation (AT) • AT > MT (machine translation) • AT ≥ MTM (machine translation + translation memory) • Translation pre-editing tools (language compliance checker + authoring memory) • Automatic text categorization (for selection of terminologies and translation memories) • Translation memory (TM) • Machine translation (MT)
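The "MTM" idea (translation memory first, machine translation for the rest) can be sketched as follows. This is an illustration under assumptions, not a description of any particular product: the 0.85 threshold, the dictionary-based memory, and the machine_translate callable are hypothetical, and difflib's similarity ratio stands in for a real fuzzy-match score.

```python
from difflib import SequenceMatcher


def translate_segment(source, memory, machine_translate, threshold=0.85):
    """Translate one segment using translation memory, then MT.

    memory maps previously translated source segments to their targets.
    Returns (translation, origin, best_score), origin being "tm" or "mt".
    """
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        return best_target, "tm", best_score
    # No sufficiently close memory entry: fall back to machine translation.
    return machine_translate(source), "mt", best_score


memory = {"Add to cart": "In den Warenkorb"}
fake_mt = lambda text: "[MT] " + text
print(translate_segment("Add to cart", memory, fake_mt))
print(translate_segment("Delete my account", memory, fake_mt))
```

A fuzzy match below 1.0 returns the translation of a similar but different segment, so a real workflow would flag such results for post-editing rather than publish them unchanged.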
Computer-Aided Translation (CAT) • SL & TL text fields • Translation tools: machine translation, translation memory, translation search, terminology access • TL authoring and editing tools: general authoring and editing tools, translation QA and translation post-editing tools • Translation leveraging updates: terminology updates, translation memory updates
Problems with UGC — Low Quality • Terse, ungrammatical constructions • nonstandard CAPITALIZATION • Missing, creative punctuation • Accidental, intentional misspellings • Nonstandard diction—colloquial abbreviations & acronyms, leetspeak, emoticons
Problems with UGC — Intrinsic Characteristics • Cryptic, clipped style (chats, IMs, tweets) • Conversational style • Diverse term variants • Wide range of lexicon • Frequent neologisms
Solutions for Problematic UGC — Low Quality • Better writing, self-editing • Editing by others (designated content agents) • Authoring and editing tools • Translation pre-editing tools • Dialect translation tools
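A translation pre-editing step for low-quality UGC could be sketched like this. The abbreviation table and the pre_edit function are hypothetical, and the rules are deliberately crude: they cover only repeated punctuation, shouted capitals, and a few colloquial abbreviations.

```python
import re

# Hypothetical abbreviation table; a real tool would use a much larger,
# language-specific resource.
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great", "thx": "thanks", "pls": "please"}


def pre_edit(text, abbreviations=ABBREVIATIONS):
    """Normalize informal text before sending it to machine translation."""
    # Collapse runs of a repeated "!" or "?" into a single mark.
    text = re.sub(r"([!?])\1+", r"\1", text)
    words = []
    for token in text.split():
        match = re.fullmatch(r"(\w+)(\W*)", token)
        if match:
            word, trailing = match.groups()
            expanded = abbreviations.get(word.lower())
            if expanded is not None:
                word = expanded
            elif word.isupper() and len(word) > 1:
                # Treat all-caps words as shouting and lowercase them.
                word = word.lower()
            token = word + trailing
        words.append(token)
    result = " ".join(words)
    # Restore a sentence-initial capital.
    return result[:1].upper() + result[1:]


print(pre_edit("thx u r GR8!!!"))
print(pre_edit("THIS is SO cool??"))
```

Lowercasing every all-caps word would also damage legitimate acronyms, so a production tool would need an exception list or a language model rather than this blanket rule.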
Solutions for Problematic UGC — Intrinsic Characteristics • MT based on leveraged resources produced as by-product of CAT translations • Terminologies and translation memories based on automatic text categorization • Continued improvement in pull (MT) translation environments dependent on quantity and quality of effort in related push (CAT) translation environments • Ergo, need to support push translation environments and aggregated, quality-controlled user, community, and professional translation resources
UGC Translation • Push translation implementations: Google Translator Toolkit • Pull translation implementations: Outlook email translation (PROMT), Mojofiti blog translation (Google), eBay listing translation (SYSTRAN) • Translation viewers: unedited MT (Microsoft), Translation Wiki (Microsoft)
Translation Viewers • Bilingual text display in web browser or document editor • Translation views: different strokes for different folks • Single-language view (SL or TL): original or translated content, with rollover display of corresponding sentence from translated or original content • Dual-language view (SL and TL): side-by-side or over-and-under display of original and translated content, with synchronized scrolling and sentence highlighting
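Both viewer styles rest on sentence-level alignment between source and target text. A minimal sketch, assuming a one-to-one sentence correspondence and a naive punctuation-based splitter (real viewers need proper segmentation and alignment; the function names are hypothetical):

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def rollover_pairs(source_text, target_text, shown="target"):
    """Pair each displayed sentence with the sentence to show on rollover.

    shown="target" displays the translation and reveals the original;
    shown="source" does the reverse. The same pairs can drive a
    side-by-side or over-and-under view with sentence highlighting.
    """
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        raise ValueError("sentence counts differ; one-to-one alignment assumed")
    if shown == "target":
        return list(zip(target, source))
    return list(zip(source, target))


for displayed, hidden in rollover_pairs("Hello. How are you?", "Hallo. Wie geht es dir?"):
    print(displayed, "| rollover:", hidden)
```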
Global UGC Infrastructure • Bing Translator • Source Text Rollover Mode
Global UGC Infrastructure • Bing Translator • Target Text Rollover Mode
Global UGC Infrastructure • Bing Translator • Side-by-Side Mode
Global UGC Infrastructure • Bing Translator • Over-and-Under Mode