Automatically Building Concept Structures and Displaying Concept Trails for the Use in Brainstorming Sessions and Content Management Systems
Chris Biemann, Karsten Böhm, Gerhard Heyer, Ronny Melz
University of Leipzig
I2CS 2004, Guadalajara, Mexico, 06-23-2004
Support for Creativity
• Acquisition of knowledge: gathering information from structured and unstructured texts, databases, document collections, the web etc.
• Processing knowledge: generation of semantic maps and associations in cooperative teamwork meetings
• Using knowledge: visualization of terms and relations. Filters define views on semantically relevant contents and structures.
Goal 1: Computer-aided Associating
Software realizes:
• Protocol function by displaying identified keywords
• Adding associations from the database
• Displaying keywords in a way that reflects semantic similarity
Desired effects:
• Users can easily recall the session later
• During the session, associations remind users of terms they might otherwise have forgotten
• The weight and relatedness of different topics in a session become visible
Goal 2: Semantic Map and Red Thread
Software realizes:
• Calculation and visualization of large document collections by means of important terms (keywords)
• Positioning of terms reflects semantic closeness
• Small documents can be drawn into the semantic map: red thread functionality
Desired effects:
• A fixed map helps users orient themselves in the contents of the document collection
• Important terms can be surveyed quickly
• Red thread functionality can be used for "fast reading"
Data Sources
• Projekt Deutscher Wortschatz: word list and co-occurrences
  - for associations
  - as a reference corpus for the semantic map
• Manual annotation: typed (coloured) edges and nodes
  - Semantic primitives
  - Semantic relations
Calculating Associations: Statistical Co-occurrences
• Co-occurrence: occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbours)
• Significant co-occurrences reflect relations between words
• Significance measure: log-likelihood
• This measure defines the association degree between any two words. High degrees result in edges in the semantic map
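As a rough illustration of the significance measure, the sketch below computes a Dunning-style log-likelihood ratio from sentence counts. The slides name only "log-likelihood", so the exact formula, the function name `log_likelihood` and its count parameters are assumptions, not the project's actual implementation.

```python
import math


def log_likelihood(n_ab, n_a, n_b, n):
    """Log-likelihood ratio (G^2) for words A and B co-occurring in sentences.

    n_ab: sentences containing both words, n_a / n_b: sentences containing
    A / B, n: total number of sentences. All names are illustrative.
    """
    # Observed 2x2 table, row-major: (A, B), (A, not B), (not A, B), (neither)
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    rows = [n_a, n - n_a]
    cols = [n_b, n - n_b]
    # Expected counts if A and B occurred independently of each other
    expected = [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]
    # Cells with zero observations contribute nothing to the sum
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


independent = log_likelihood(n_ab=1, n_a=10, n_b=10, n=100)
associated = log_likelihood(n_ab=10, n_a=10, n_b=10, n=100)
```

A pair whose score exceeds a chosen threshold would become an edge; the slides do not state the threshold. Note that G^2 is two-sided: it is also large when two words co-occur less often than expected.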
Example of Co-occurrences
Significant co-occurrences of Guadalajara:
Camarena (194), Mexico (104), Mexican (58), kidnapped (43), Zavala (40), ranch (40), Avelar (37), abducted (35), Alvarez (33), drug (33), Camarena's (32), pilot (32), Caro (30), Enrique (29), agent (27), Enforcement (25), Quintero (25), gynecologist (23), tortured (23), Jalisco (22), DEA (21), Drug (21), miles (21), torture (21), Alfredo (20), Machain (18), Feb (16), bodies (16), southeast (16), Monterrey (15), Rafael (15), found (15), Radelat (14), Paso (13), consulate (13), Administration (12), Salazar (12), body (12), killed (12), outside (12), Vasquez (11), Verdugo (11), bullet-riddled (11), murder (11), El (10), Humberto (10), Lopez (10), lord (10), Felix (9), Gallardo (9), Hernandez (9), Mexico's (9), arrested (9), cartel (9), Alberto (8), City (8), March (8), Zuno (8), city (8), homicide (8), indictment (8), kidnapping (8), Caro-Quintero (7), February (7), Tijuana (7), Zuno-Arce (7), buried (7), marijuana (7), racketeering (7), slayings (7), 31-year-old (6), April (6), Consulate (6), Culiacan (6), Javier (6), Machain's (6), agents (6), office (6)
Significant left neighbours of Guadalajara:
outside (12), near (5)
Significant right neighbours of Guadalajara:
gynecologist (27), office (8), street (8), home (6), Haggadah (5), drug (5)
Calculating Semantic Maps
Requires: a document collection
• Calculate co-occurrences and keywords by differential frequency analysis: important words are much more frequent in the document collection than in a large reference corpus
• Take the highest-ranked words from the differential frequency analysis as nodes
• Take words that co-occur highly significantly with existing nodes as further nodes
• Remove stopwords (function words, determiners...)
• Insert edges between nodes that have a high association degree, as measured by co-occurrence significance
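The keyword step above can be sketched as a ratio of relative frequencies. This is a minimal illustration: the function name `differential_keywords`, the add-one smoothing and the `min_count` cutoff are assumptions, and the toy token lists stand in for a real document collection and reference corpus.

```python
from collections import Counter


def differential_keywords(doc_tokens, ref_tokens, stopwords, top_k=10, min_count=1):
    """Rank words by how much more frequent they are in the documents
    than in the reference corpus (hypothetical scoring)."""
    doc = Counter(t.lower() for t in doc_tokens)
    ref = Counter(t.lower() for t in ref_tokens)
    n_doc, n_ref = sum(doc.values()), sum(ref.values())
    scores = {}
    for word, count in doc.items():
        if word in stopwords or count < min_count:
            continue
        rel_doc = count / n_doc
        # Add-one smoothing so words missing from the reference get a finite score
        rel_ref = (ref[word] + 1) / (n_ref + 1)
        scores[word] = rel_doc / rel_ref
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


documents = "the cartel the agent cartel the drug cartel".split()
reference = "the cat sat on the mat the dog saw the agent".split() * 10
keywords = differential_keywords(documents, reference, stopwords={"the"}, top_k=2)
```

Here `keywords` would serve as the initial nodes; further nodes and edges would then come from the co-occurrence significance described earlier.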
Positioning in Semantic Maps
Force-directed: nodes and edges are first scattered on a plane and then driven to equilibrium by minimizing the energy of the layout
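To make the idea concrete, here is a small spring-embedder sketch in which all nodes repel each other and edges pull their endpoints together. The force laws (repulsion k^2/d, attraction d^2/k), the parameter names and the deterministic starting circle are assumptions chosen for illustration; the slides do not specify the model in this detail.

```python
import math


def force_layout(nodes, edges, iterations=200, k=1.0, step=0.05):
    """Spring-embedder sketch: k is the preferred edge length,
    step caps how far a node moves per iteration."""
    n = len(nodes)
    # Deterministic start on a unit circle (a real system might start randomly)
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion k^2/d between every pair of nodes
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[a][0] += dx / d * f
                force[a][1] += dy / d * f
                force[b][0] -= dx / d * f
                force[b][1] -= dy / d * f
        # Attraction d^2/k along every edge
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[a][0] -= dx / d * f
            force[a][1] -= dy / d * f
            force[b][0] += dx / d * f
            force[b][1] += dy / d * f
        # Move each node along its net force, capped to keep the system stable
        for v in nodes:
            length = math.hypot(force[v][0], force[v][1]) or 1e-9
            scale = step * min(length, 1.0) / length
            pos[v][0] += force[v][0] * scale
            pos[v][1] += force[v][1] * scale
    return pos


def distance(pos, a, b):
    """Euclidean distance between two placed nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


layout = force_layout(["Mexico", "Guadalajara", "Leipzig"], [("Mexico", "Guadalajara")])
```

In this toy run the two connected terms settle close together while the unconnected term drifts away, which is the property the semantic map relies on: positions reflect association.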
Domain Adjustment
• Session knowledge / project knowledge: enrichment of the database by incorporating task-specific knowledge and know-how.
• Community knowledge / domain knowledge: generation of a semantic map by processing domain-relevant documents and incorporating existing ontologies.
• Wortschatz database (very large corpus)
[Figure: the three knowledge layers, with two areas labelled "partial overlap"]
Visualization
Extension of Touchgraph (www.touchgraph.com):
• Force-directed model for positioning
• Label fill colours indicate the runtime type (keyword, associated, red thread)
• Label border colours indicate semantic primitives
• Edge colours indicate semantic relations
• Nodes can be displayed as labels or dots
Legend from the figure:
- white: keyword by user
- grey: association from DB
- relations: Is-A, co-hyponymy
- primitives: noun, organisation
Zooms
• Conceptual zoom: lexicalize nodes or display them as dots
• Granularity: reduce the number of visible nodes
• Optical zoom: size of the window compared to the total size of the map
Adding Nodes in Association Mode
• User keywords are added to the graph. They fade if they do not become connected within a certain time.
• Grey words are added if they are associated with at least two user keywords
Example input: "Let's talk about Mexico. Mexicans wear sombreros, which are hats for sun protection. I would like a hat like that too! It is a country in Central America."
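The two rules above can be sketched as follows. The function names, the `timeout` value and the toy association table are hypothetical; only the "at least two user keywords" rule and the fading of unconnected keywords come from the slide.

```python
def associated_words(user_keywords, associations, min_links=2):
    """Return database words linked to at least min_links user keywords.

    associations maps a word to the set of words associated with it.
    """
    counts = {}
    for keyword in user_keywords:
        for neighbour in associations.get(keyword, ()):
            if neighbour not in user_keywords:
                counts[neighbour] = counts.get(neighbour, 0) + 1
    return {word for word, count in counts.items() if count >= min_links}


def still_visible(added_at, edges, now, timeout=60.0):
    """Keep keywords that are connected, or younger than timeout seconds."""
    connected = {node for edge in edges for node in edge}
    return {kw for kw, t in added_at.items() if kw in connected or now - t < timeout}


user = {"Mexico", "sombrero", "hat"}
# Toy association table standing in for the co-occurrence database
table = {
    "Mexico": {"Guadalajara", "sombrero", "tequila"},
    "sombrero": {"hat", "Mexico", "straw"},
    "hat": {"straw", "cap"},
}
grey = associated_words(user, table)
visible = still_visible(
    {"Mexico": 0.0, "sombrero": 10.0, "gladly": 20.0},
    edges=[("Mexico", "sombrero")],
    now=100.0,
)
```

Requiring two supporting keywords keeps the map from filling up with every neighbour of every spoken word; only terms that tie several user keywords together are shown.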
Red Thread Functionality
• Given: semantic map, additional input
• Terms from the additional input that are found in the semantic map are coloured in red and connected in the sequence of their occurrence
  - red connection: the edge already existed in the semantic map
  - yellow connection: the edge is new
• Long-range yellow edges visualize topic shifts
[Figure: example map containing the nodes Afghanistan, Georgia, Iraq]
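A minimal sketch of the red thread, assuming the input has already been tokenized and that map edges are undirected. The function name `red_thread` and the toy map are illustrative only.

```python
def red_thread(tokens, map_nodes, map_edges):
    """Connect map terms in order of occurrence and colour each connection."""
    existing = {frozenset(edge) for edge in map_edges}
    # Keep only the input terms that appear in the semantic map, in order
    hits = [t for t in tokens if t in map_nodes]
    thread = []
    for a, b in zip(hits, hits[1:]):
        if a == b:
            continue  # an immediately repeated term adds no connection
        colour = "red" if frozenset((a, b)) in existing else "yellow"
        thread.append((a, b, colour))
    return thread


nodes = {"troops", "Afghanistan", "Iraq", "Georgia"}
edges = [("Afghanistan", "Iraq"), ("troops", "Afghanistan")]
text = ["troops", "in", "Afghanistan", "and", "Iraq", "then", "Georgia"]
thread = red_thread(text, nodes, edges)
```

In this toy input the last connection is yellow because the map has no edge between Iraq and Georgia, which is how a topic shift would show up.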
SemanticTalk GUI
[Screenshot with labelled parts: topic survey window, zoom rulers, local context window]
Embedding in the System
• Implemented as a Java servlet running on a Tomcat web server
• MySQL database for graphs and associations
• Linguatec VoicePro 10 as the interface for speech recognition
• Several speech recognition clients can be connected via LAN
Interfaces
The results obtained with SemanticTalk can be saved, loaded and exported to other tools for further processing.
• Import/Export
  - various formats for text files
  - XML/RDF/RDB for maps
  - PNG for maps
• Retrieval
  - words (nodes) with links to occurrences in the document collection
  - associations (edges) with links to occurrences in the document collection
  - explicit links, e.g. pictures for words
Further Processing of Net Topology Structures
[Diagram: the semantic map is passed through an exchange format (e.g. RDF) and transformed into application models: product model, variants, process model, resource model]
Questions? THANK YOU!