Neurocognitive Approach to Creativity in the Domain of Word-invention
Maciej Pilichowski¹, Włodzisław Duch²
¹Faculty of Mathematics and Computer Science, ²Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
Contact: macias@mat.umk.pl, Google: W.Duch
Introduction • Creativity: "the capacity to create a solution that is both novel and appropriate". • Creative brains are well trained in a given domain, have great imagination, combine basic primitives faster, and recognize interesting combinations of these primitives through emotional and associative filtering.
Computational creativity • To understand the creative use of words, go to the lower level: construct words from combinations of phonemes, paying attention to morphemes, flexion, etc. • Creativity = space + imagination (fluctuations) + filtering (competition). • Space: neural tissue providing room for an infinite number of activation patterns. • Imagination: many chains of phonemes activate in parallel both word and non-word representations, depending on the strength of synaptic connections. • Filtering: associations, emotions, phonological/semantic density.
General idea • Start from keywords priming phonological representations in the auditory cortex; spread the activation to concepts that are strongly related. • Use inhibition in a winner-takes-most competition to avoid false associations. • Find fragments that are highly probable and estimate their phonological probability. • Combine them, search for good morphemes, and estimate semantic probability. A sketch of the priming and filtering step follows below.
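The slides describe this step only in prose. The minimal Python sketch below illustrates spreading activation with winner-takes-most inhibition; the association matrix, parameter values, and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def spread_activation(assoc, primed, steps=3, k=5, decay=0.8):
    """Spread activation from primed units over an association matrix,
    keeping only the k strongest units (winner-takes-most inhibition)."""
    a = primed.astype(float)
    for _ in range(steps):
        a = decay * a + assoc @ a        # spread to strongly related concepts
        winners = np.argsort(a)[-k:]     # indices of the k most active units
        mask = np.zeros_like(a)
        mask[winners] = 1.0
        a *= mask                        # inhibit weaker (false) associations
    return a
```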
Autoassociative networks Simplest networks: • binary correlation matrix, • probabilistic p(ai, bj|w). Major issue: representation of symbols, morphemes, phonology … A correlation-matrix sketch is given below.
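As shown below, a binary correlation-matrix autoassociator can be sketched in a few lines. The encoding of words as binary vectors is an assumption, since the representation of symbols is exactly the open issue the slide names.

```python
import numpy as np

def train_correlation_matrix(patterns):
    """Hebbian outer-product learning for a binary correlation matrix.
    patterns: (m, n) array of {0, 1} vectors."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)              # correlate co-active units
    np.fill_diagonal(W, 0)               # no self-connections
    return W

def recall(W, cue, theta=1.0):
    """Complete a (possibly partial) cue pattern by thresholding."""
    return (W @ cue >= theta).astype(int)
```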
Objective Invention of new words that capture some characteristics of objects or processes. For example: • industrial or software products, • activity of companies, • the main topic of web pages. Understanding creative processes in the brain requires network simulations, but here only a formal, probabilistic model is considered.
Data • The linguistic source for the Mambo algorithm is the Google Web 1T 5-gram dictionary. • Spell-checking is based on the LRAGR and SCOWL dictionaries. • To avoid over-representation of the most common words, a logarithmic scale of word occurrences has been used (see the sketch below).
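The slide does not give the exact scaling, so the damping below (base-10 logarithm) is an assumption illustrating why the log scale helps:

```python
import math

def log_weight(count):
    """Logarithmic damping of raw occurrence counts (assumed base 10)."""
    return math.log10(count) if count > 0 else 0.0

# A word seen 1,000,000 times gets weight 6.0, one seen 1,000 times gets 3.0:
# the most common words no longer drown out the rest of the statistics.
```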
Word representation • As letters ("the" → "t", "h", "e") – not well suited to phonological filters; the resulting words may be hard to pronounce. • As phonemes ("the" → "ð", "ə") – not easy, because most dictionaries do not contain phonological transcriptions. • As a semi-letter form ("the" → "th", "e"), for English only (a splitter sketch follows below). • A mixed form of any of the above.
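A possible semi-letter splitter is sketched below; the digraph inventory is an assumption, since the slides do not list which English letter groups are treated as single units:

```python
# Hypothetical digraph inventory; the actual set used by Mambo is not given.
DIGRAPHS = {"th", "sh", "ch", "ph", "wh", "gh", "ck", "ng", "qu"}

def semi_letters(word):
    """Split a word into semi-letters: digraphs kept as single symbols."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2].lower() in DIGRAPHS:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

# semi_letters("the") -> ["th", "e"]
```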
Semantics • "Light" — is it meant as "small weight" or as "daylight"? • Enforcing the required association is crucial: pairing "possibilities" with "great" (positive association) rather than with "problems" (negative association). • In ambiguous cases that the algorithm cannot resolve, the user has to select the proper set of synonyms (synset).
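Since the data-flow slide lists WordNet as the association source, the sense-selection step might be presented to the user as below. Querying WordNet through NLTK is an assumption, not necessarily how the Mambo system does it:

```python
# Requires: pip install nltk, then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

# List every sense (synset) of the ambiguous keyword for the user to choose.
for s in wn.synsets("light"):
    print(s.name(), "-", s.definition())
```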
Similarities • Real language tolerates near-identical words: "borrow" coexists with "sorrow", "barrow", and "burrow". • In the artificial system such near-duplicates are rejected, to avoid transitions like "borrow" → "borr" or "borrow" → "borrom" (see the distance filter sketched below).
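The slides do not name the similarity measure; Levenshtein edit distance, sketched below, is one natural choice for rejecting near-duplicates:

```python
def levenshtein(a, b):
    """Minimum number of single-symbol edits turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# levenshtein("borrow", "borr") == 2: too close to an existing word, reject.
```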
Genuineness • Examples of compound words: "bodyguard", "brainstorm" or "airmail". • Such compounds are forbidden, to avoid hijacking of existing words: priming word "jet" + "●●●mail" from the dictionary → "jetmail". A filter sketch follows below.
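A hedged sketch of such a genuineness filter; the function and its decomposition rule are assumptions extrapolated from the "jetmail" example:

```python
def hijacks(candidate, priming_words, dictionary):
    """Reject a candidate that merely glues a priming word onto a
    dictionary word, e.g. "jet" + "mail" -> "jetmail"."""
    for p in priming_words:
        if candidate.startswith(p) and candidate[len(p):] in dictionary:
            return True
        if candidate.endswith(p) and candidate[:-len(p)] in dictionary:
            return True
    return False

# hijacks("jetmail", {"jet"}, {"mail", "air"}) -> True, so it is rejected.
```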
ngrams Function ng(w) returns a sequence of strings (ngrams):

ng(w) = (w[1 : 1+Nng], w[1+Sng : 1+Sng+Nng], …, w[1+n·Sng : 1+n·Sng+Nng]),

where w[i:j] denotes the string of symbols at positions i to j (inclusive) in the word w, and n·Sng = |w| − Nng − 1. In most cases: Nng = 2, Sng = 1. Example: "world" → "wor", "orl", "rld".
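A direct implementation of ng(w) as defined above (0-based Python indexing):

```python
def ng(w, n_ng=2, s_ng=1):
    """Return overlapping ngrams of length n_ng + 1 taken with stride s_ng."""
    return [w[i:i + n_ng + 1] for i in range(0, len(w) - n_ng, s_ng)]

# ng("world") -> ["wor", "orl", "rld"]
```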
Word rank The word rank function Q'(w) is built from: • q, a dictionary function, • T(w), a composition of transformations of the word w, • ng, the function partitioning the symbols of w into overlapping ngrams. The total word rank function is the product of Q'(w) over all models (the explicit formulas are given as figures in the original slides); a hedged sketch follows below.
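Since the explicit formulas are not reproduced here, the sketch below only illustrates the stated structure: per-model scores built from q, T, and ng (reusing ng from the previous sketch), multiplied over models. The per-ngram product is an assumption, not the authors' formula.

```python
def word_rank(w, models):
    """models: list of (q, T) pairs, where q scores an ngram and T
    transforms a word; returns the product of per-model ranks."""
    total = 1.0
    for q, T in models:
        score = 1.0
        for g in ng(T(w)):      # partition the transformed word into ngrams
            score *= q(g)       # q: dictionary-based probability of the ngram
        total *= score
    return total
```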
Transformations • Transformation examples for Nng=2, Sng=1 (shown as a table in the original slides).
Data flow (diagram in the original slides) Components: topic, WordNet associations, dictionary, priming set, word representation, probability matrix, word rank, similarity, results.
Amazon’s Kindle — the core priming set The exclusion list: aird, airin, airs, bookie, collectic, collectiv, globali, globed, papere, papering, pocketf, travelog.
Mambo system — the core priming set The exclusion word: cookie.
Computational efficiency • With no priming dictionary, Nng=2, Sng=1, producing the 100 best words for English requires: (timing data shown as a table in the original slides).