Visualizing Translation Variation: Shakespeare's Othello • Zhao Geng¹, Robert S. Laramee¹, Tom Cheesman², Alison Ehrmann², David M. Berry² • ¹Visual Computing Group, Computer Science Department, Swansea University, UK, {cszg,r.s.laramee}@swansea.ac.uk • ²College of Arts and Humanities, Swansea University, UK, {T.Cheesman,d.m.berry}@swansea.ac.uk, alison.ehrmann@t-online.de
Overview • Introduction and Motivation • Related Work • Background Data • Text Pre-processing • Structure-aware Treemap • Parallel Coordinates For Multi-Document Comparison • Conclusion • Acknowledgement
Introduction • Shakespeare's plays have been translated into dozens of languages over about 300 years • Every translation: • Offers a different interpretation of the play • Reflects a changing culture or expresses the translator's individual thinking • Connects different regions and offers a retrospective view of their histories • Researchers in Modern Languages are currently collecting a large number of German translations of Shakespeare's play Othello
Motivation • Goals of Visualization • Present different facets of the data • Analyze the data in detail • Explore relationships and patterns to form new hypotheses • Complex Multi-Dimensional Data Set (translation, author, place, year, popularity) • Exploratory Questions • Where, when, and into which languages has Othello been translated? • How have translators influenced one another? • How do versions vary globally / locally? • Which translations are most similar to the original play?
Related Work • Multiple Document Visualization prototypes: • ThemeRiver [HHWN02] • Parallel Tag Clouds [CBW09] • DocuBurst [CCP09] • SparkClouds [LRKC10]
Related Work (Cont.) • Ben Fry, “On the Origin of Species: The Preservation of Favoured Traces” (2009), http://benfry.com/traces • Lev Manovich, “Cultural Analytics: Visualizing Cultural Patterns in the Era of ‘More Media’”, http://manovich.net/articles/ • Stephan Thiel, “Understanding Shakespeare: Towards a Visual Form for Dramatic Texts and Language” (2010), http://www.understanding-shakepseare.com/
Background Data (Cont.) • 57 translations of Othello from 7 countries, dating from 1766 to 2006
Text Pre-Processing • Document Collection • Document Standardization • Stored as ASCII text, readable in a standard text editor • Tokenization • Break the stream of characters into words or tokens • Remove common words (stop words) • Language dependent • Lemmatization • Convert each word to a standard form • Stem each word to its root • Concordance • Tokens + Frequency • Vector Generation (LSI Model)
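The pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the stop-word list and the suffix-stripping `naive_stem` helper are hypothetical stand-ins for a proper language-dependent stop list and a real German stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, very short German stop-word list (assumption for illustration).
STOP_WORDS = {"der", "die", "das", "und", "ich", "ist"}


def tokenize(text):
    """Lower-case the text and break it into runs of letters (digits are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop common words; the list is language dependent."""
    return [t for t in tokens if t not in stop_words]


def naive_stem(token):
    """Toy stemmer: strip one common German suffix, keeping at least 3 letters."""
    for suffix in ("en", "er", "es", "e", "n", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def concordance(text):
    """Map each stemmed token to its frequency in the text."""
    tokens = remove_stop_words(tokenize(text))
    return Counter(naive_stem(t) for t in tokens)
```

Calling `concordance` on each translation yields the "tokens + frequency" table that the vector-generation step consumes.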
LSI Model • LSI (Latent Semantic Indexing) model: • Tf (Term Frequency): the number of times a term Θ occurs in a document. • Idf (Inverse Document Frequency): the inverse of the number of documents in the corpus in which the term Θ occurs. • The weight of a term Θ is defined as: W(Θ) = Tf(Θ) × Idf(Θ) • Each document D then becomes a vector: D = (W(Θ₁), W(Θ₂), …, W(Θₙ))ᵀ • Similarity between two documents D₁ and D₂ is measured by the cosine of the angle between their vectors: cosSim(D₁, D₂) = (D₁ · D₂) / (|D₁| × |D₂|)
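A minimal sketch of the weighting and similarity formulas on this slide, assuming each document has already been reduced to a list of tokens. The function names `tfidf_vectors` and `cos_sim` are illustrative. Idf is taken literally as 1 / (number of documents containing the term), as the slide states; many implementations use a logarithmic variant instead, and no dimensionality-reduction step is shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one weight vector per document)."""
    vocab = sorted({t for doc in docs for t in doc})
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # W(term) = Tf(term) * Idf(term), with Idf = 1 / df as on the slide.
        vectors.append([tf[t] * (1.0 / df[t]) for t in vocab])
    return vocab, vectors


def cos_sim(d1, d2):
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm = math.sqrt(sum(a * a for a in d1)) * math.sqrt(sum(b * b for b in d2))
    return dot / norm if norm else 0.0
```

Two documents with identical token counts get a similarity of 1, and documents that share no terms get 0, so the value can be used directly as an axis or ordering criterion when comparing translations.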
Structure-aware Treemap • Meta Data Hierarchy • Century -> Decade -> Country -> Author -> Title • Visualization • Treemap • Aggregation of numerical values • Re-ordering of hierarchies • DOI-Tree • Structural clarity • Initiate a searching task
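The metadata hierarchy behind the treemap can be sketched as a nested dictionary in which every node stores an aggregated count. This is an illustration under assumptions, not the authors' data structure: the record fields (`year`, `country`, `author`, `title`) and the helper names are hypothetical. Passing a different `levels` tuple corresponds to re-ordering the hierarchy.

```python
LEVELS = ("century", "decade", "country", "author", "title")


def add_periods(record):
    """Derive century and decade from the publication year (e.g. 1766 -> 18, 1760)."""
    year = record["year"]
    return {**record, "century": (year - 1) // 100 + 1, "decade": year // 10 * 10}


def build_hierarchy(records, levels=LEVELS):
    """Nest records level by level; each node is {'count': int, 'children': dict}."""
    root = {"count": 0, "children": {}}
    for rec in records:
        node = root
        node["count"] += 1  # aggregate at the root
        for level in levels:
            key = rec[level]
            node = node["children"].setdefault(key, {"count": 0, "children": {}})
            node["count"] += 1  # aggregate at every level on the path
    return root
```

The `count` stored at each node is the kind of aggregated numerical value a treemap maps to rectangle area; other measures (for example a popularity score) could be summed the same way.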
Conclusion • Interactive system for exploring variation among different German translations of Othello • A Structure-Aware Treemap is developed for metadata analysis • Focus+Context Parallel Coordinates incorporate an objective similarity measure and allow multi-document comparison • In the future, we will expand the analysis to the whole play of Othello • We will use more methods from computational linguistics to summarize more semantic features
Acknowledgement • Thanks for listening! Any questions?