Representing Linguistic Data
Maha Shouman
TextArc
• Data target:
  • Raw text
  • Medium-sized
• Traditional techniques:
  • Structured word lists (indices, concordances)
  • Automatic summary generation
• Exclude original linearity!
Concordance: http://www.opensourceshakespeare.com
Index: http://www.i75online.com/FLAIndexPage1.html
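The two structured word lists named above can be sketched in a few lines of Python. This is a minimal illustration, not how the linked sites are built: the function names `tokenize`, `build_index`, and `concordance`, the sample sentence, and the three-word context window are all assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(tokens):
    """Index: map each word to the token positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)
    return dict(index)

def concordance(tokens, index, word, width=3):
    """Concordance: each occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    for pos in index.get(word, []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append((left, word, right))
    return lines

# Hypothetical sample text, chosen only for illustration.
sample = "To be, or not to be, that is the question"
tokens = tokenize(sample)
index = build_index(tokens)
for left, keyword, right in concordance(tokens, index, "be"):
    print(f"{left:>12} [{keyword}] {right}")
```

Both structures are keyed by word, so the reader sees occurrences grouped together rather than in reading order, which is the loss of linearity noted on the TextArc slide.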
ThemeRiver
• Data target:
  • Large text collections
  • Temporal patterns
  • Thematic changes
• Traditional techniques:
  • Histogram
  • Other visualizations focus on documents
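Whether drawn as a histogram or as a river, the underlying data is a count per theme per time slice. The sketch below assumes that a theme's strength is the number of documents in a slice that mention the theme word; the function name `theme_strengths`, the years, and the toy documents are made up for illustration.

```python
from collections import Counter, defaultdict

def theme_strengths(documents, themes):
    """For each time slice, count how many documents mention each theme word.

    `documents` is a list of (time_slice, text) pairs.
    Returns {time_slice: [count for each theme, in the order of `themes`]}.
    """
    counts = defaultdict(Counter)
    for time_slice, text in documents:
        words = set(text.lower().split())
        for theme in themes:
            if theme in words:
                counts[time_slice][theme] += 1
    # Keep every time slice, even those where no theme occurs (all zeros).
    slices = sorted({time_slice for time_slice, _ in documents})
    return {t: [counts[t][theme] for theme in themes] for t in slices}

# Hypothetical toy collection: (year, document text).
documents = [
    (1990, "oil prices rise"),
    (1990, "oil embargo talks"),
    (1991, "war and oil"),
    (1991, "war ends"),
    (1991, "war reporting"),
]
themes = ["oil", "war"]
for year, row in theme_strengths(documents, themes).items():
    print(year, dict(zip(themes, row)))
```

Each row is one column of a histogram; stacking the rows and smoothing between slices gives the continuous bands that show thematic change over time.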
3D ThemeRiver? www.cs.sunysb.edu/~vislab/papers/3DThemeriver.pdf
The Word Tree
• Visualization + information retrieval
• Graphical Key Word In Context (KWIC)
  • Format for concordance
• KWIC + suffix tree
Interactions: Click, Shift-Click
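The "KWIC + suffix tree" idea can be sketched as follows: collect the words that follow each occurrence of a keyword and merge shared prefixes into one tree. This is a simplified stand-in, a depth-limited trie of following words rather than a full suffix tree, and the names `build_word_tree` and `render`, the `depth` parameter, and the sample text are assumptions for the example.

```python
import re

def build_word_tree(text, keyword, depth=3):
    """Build a nested-dict trie of the words following each occurrence of `keyword`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    root = {"count": 0, "children": {}}
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        root["count"] += 1
        node = root
        # Walk down the trie along the next `depth` words, sharing common prefixes.
        for word in tokens[i + 1:i + 1 + depth]:
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def render(node, label, indent=0):
    """Return the tree as indented lines, most frequent branches first."""
    lines = [f"{'  ' * indent}{label} ({node['count']})"]
    ordered = sorted(node["children"].items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    for word, child in ordered:
        lines.extend(render(child, word, indent + 1))
    return lines

# Sample text used only for illustration.
text = ("if love be rough with you, be rough with love; "
        "if love be blind, love cannot hit the mark")
tree = build_word_tree(text, "love")
print("\n".join(render(tree, "love")))
```

Branch counts play the role that font size plays in a graphical word tree: the more contexts share a continuation, the more prominent that branch.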