
Scientific Visualization of Language Data Chris Culy Winter 2011




Presentation Transcript


  1. LInfoVis Winter 2011 Chris Culy Scientific Visualization of Language Data Chris Culy Winter 2011

  2. What are we doing? LInfoVis* (< Language Information Visualization, cf. InfoVis): the visualization of language-related information, especially on computer displays. * Not a standard term (not yet, anyway).

  3. “Visualization has to be more than pretty pictures. It has to inform. It has to challenge. It has to further our understanding. Visualizing data is not about pretty pictures.” Robert Kosara on www.eagereyes.org

  4. What are we not doing? (Only language, no other data.) Source: Lewis Carroll. Alice's Adventures in Wonderland. Ch. 3.

  5. Gray area: numeric information derived from language data, e.g. frequencies, statistical measures, etc. There are lots of chart/graphing packages, e.g. with spreadsheets, in R, etc. But if there is an interesting and useful way to incorporate the language data, we'll do that.

  6. Corpus Clouds: http://www.eurac.edu/en/research/institutes/multilingualism/Projects/LInfoVis/CorpusClouds.html

  7. Presentation vs. Analysis. Presentation: convey information known to the author, to an audience other than the author; typically static (e.g. charts in a paper). Analysis: present information that is not (well) known to the user; help the user understand (“make sense of”) the information; often interactive, though not necessarily. Different goals, different techniques.

  8. Why visualization? The human visual system is very efficient at discovering certain patterns in large amounts of information. The eye has on average 92 million rods (for light level) and 4.6 million cones (for color) (Curcio, C. A., Sloan, K. R., Kalina, R. E. and Hendrickson, A. E. (1990), Human photoreceptor topography. The Journal of Comparative Neurology, 292: 497–523. doi: 10.1002/cne.902920402), updated 10-12 times per second. Things are much more complicated than those basic numbers, but still ... Preattentive processing: recognition of features before conscious processing. We can take advantage of this capacity to help linguists analyze language, especially in finding patterns.

  9. What makes LInfoVis special? Textual elements are categorical, not numeric: in general, there is no scale of comparison (Hearst, M. 2009. Search User Interfaces. Cambridge University Press). NB: we will (almost?) always have non-textual data, but we will always need to show the textual elements as well.

  10. What makes LInfoVis special? Language is not mappable: there is in general no more compact way to visualize language (that is humanly comprehensible), i.e. unlike numbers, we can't map a word to size, shape, color, etc. Cf. Culy, C., Lyding, V., and Dittmann, H. 2011. "xLDD: Extended Linguistic Dependency Diagrams" in Proceedings of the 15th International Conference on Information Visualisation IV2011, 12, 13 - 15 July 2011, University of London, UK. 164-169.

  11. What makes LInfoVis special? Linguistics has particular data structures (like any field): standard ones used in different ways, e.g. trees, feature structures, KWIC; with particular (conventional) visual representations, e.g. dependency structures as arcs.

  12. What makes LInfoVis special? Linguists often want to examine the original data, not just the measurements/summary, more than some (most?) fields. E.g. with word frequencies in a text/corpus, linguists want to be able to examine the source data, to see the words in context.

  13. Goethe on seeing. Goethe: Man sieht nur das, was man weiß. (You only see what you know.) Culy: You can only visualize what you have.

  14. The real Goethe on seeing. Man erblickt nur, was man schon weiß und versteht. (You glimpse only what you already know and understand.) Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate. Was man weiß, sieht man erst! (You see first what you know!) In: Einleitung in die Propyläen. That's more optimistic!

  15. Some challenges in LInfoVis: dealing with the categorical/non-mappable nature of language. How can we show textual data in an effective way? Exploit the capabilities of the human visual system; cater to our general cognitive capabilities; interaction is key.

  16. Some challenges in LInfoVis: dealing with large amounts of data. E.g. a 2560x1440 monitor has 3,686,400 pixels, but one pixel is pretty small, and 3.7M is a lot smaller than the amount of information in a small corpus: the Penn Treebank has 4.5M words, plus POS, parses, etc. Particular subsets of interest will be smaller, but they often (usually?) contain more information than can fit on a screen. What are effective strategies for dealing with large amounts of data, from a visualization perspective and from an architectural/programming perspective?

  17. Some challenges in LInfoVis: what are the most useful levels of abstraction for LInfoVis tools? I.e. what functionalities should LInfoVis components contain?

  18. Other practical challenges. How to integrate LInfoVis into workflows: of people (how can LInfoVis be made useful to people doing linguistic analysis?) and of programs (how can LInfoVis programs be integrated with other tools, e.g. Weblicht?). What are the roles of LInfoVis components? Producer/consumer; read only vs. read/write (i.e. using LInfoVis tools to modify/create data). What's the division of labor between LInfoVis components and others? How do we maintain the connection with the original data?

  19. Where do LInfoVis visualizations come from? Use existing visualizations as is; modify and adapt existing visualizations; add InfoVis techniques to standard linguistic diagrams; new approaches.

  20. Why components? In many applications, the visualizations are custom-designed for the application and tightly integrated with it. But reinventing the wheel is not very interesting or productive. LInfoVis visualizations could be more like graphs/charts and parsers: components that can be used with a variety of data of the same type. Line graphs can be used with data from any field; parsers can be used with grammars for any language. Claims (Culy): (a) Linguistic data of the same “type” can be visualized meaningfully by the same visualization(s). (b) There are enough data sets with the same “type” to make (a) interesting, and hence components worth creating.

  21. Structure of the course: a mix of theory and practice. Survey of visualization theory and general techniques (CuC). Presentation of particular techniques and applications (everyone): read articles, with one person responsible for presenting them. Programming exercises: introduction to Javascript (as necessary); basic drawing (with Java, Javascript); some higher-level visualization toolkits (e.g. Processing, Protovis/D3). Project.

  22. The project. Goal: develop a scientific visualization of some kind of linguistic data. Start thinking about what kind of data you want to visualize, and where you'll get it. Who: small groups. If you are inexperienced in programming, work with someone who is more experienced. What you'll need to provide me at the end of the term: 1. A functioning visualization, with some sample data to visualize. 2. Technical documentation of how the visualization works and how to use it, e.g. Javadoc and help/readme/tutorial. 3. A short (~15 pages) paper describing the visualization: background, its goals, how it works, and future directions. 4. If you have gotten feedback from real or potential users, include that in the paper.

  23. Practical information: http://www.sfs.uni-tuebingen.de/~cculy/courses/W2011/vis/ cculy@sfs.uni-tuebingen.de Office: 1.07 Tel: 07071/29-7 3966 Sprechstunden (Office hours): T 14-15, Th 16-17

  24. For next time: read the tutorial (link on the web site) through “Principles: visual variables (2)”.
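
The screen-versus-corpus comparison on slide 16 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two figures (a 2560x1440 monitor, 4.5M words in the Penn Treebank) come from the slide, while the function and constant names are made up for illustration.

```python
# Figures from slide 16; all names here are illustrative, not from the course.
SCREEN_WIDTH_PX = 2560
SCREEN_HEIGHT_PX = 1440
PENN_TREEBANK_WORDS = 4_500_000  # "4.5M words", as stated on the slide


def pixel_count(width: int, height: int) -> int:
    """Total number of pixels on a screen of the given resolution."""
    return width * height


def words_per_pixel(words: int, pixels: int) -> float:
    """How many corpus words would have to share each pixel."""
    return words / pixels


if __name__ == "__main__":
    pixels = pixel_count(SCREEN_WIDTH_PX, SCREEN_HEIGHT_PX)
    ratio = words_per_pixel(PENN_TREEBANK_WORDS, pixels)
    print(f"{pixels:,} pixels on screen")
    print(f"{ratio:.2f} words per pixel, before counting POS tags or parses")
```

Even at one pixel per word the corpus would not fit, and a readable word needs far more than one pixel, which is the slide's point about needing strategies for large data.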
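
Slides 11 and 12 mention KWIC (keyword in context) and the wish to get from a word frequency back to the source text. A minimal sketch of that idea follows, assuming a naive regular-expression tokenizer and an invented sample sentence; the function names are hypothetical and this is not the course's own tooling.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    """The summary view: how often each token occurs."""
    return Counter(tokens)


def kwic(tokens: list[str], keyword: str, window: int = 3) -> list[tuple[str, str, str]]:
    """The source view: each occurrence of keyword with its left and right context."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


if __name__ == "__main__":
    # Invented sample text, used only to exercise the functions.
    sample = "The cat sat on the mat and the dog sat on the cat"
    tokens = tokenize(sample)
    print(word_frequencies(tokens).most_common(3))
    for left, key, right in kwic(tokens, "sat", window=2):
        # Right-align the left context so the keyword lines up in one column.
        print(f"{left:>12} | {key} | {right}")
```

Keeping the token positions is what lets a frequency count lead back to the words in context; a real tool would also keep character offsets into the original text.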
