BIO-INSPIRED AND COGNITIVE COMPUTING for data mining, tracking, fusion, financial prediction, language understanding, web search engines, and diagnostic modeling of cultures. IEEE 2007 Fall Short Course, Holiday Inn, Woburn MA, 7:00 – 9:00 pm, Nov. 8, 15, 22, Dec. 6. Leonid Perlovsky, Visiting Scholar, Harvard University; Technical Advisor, AFRL
OUTLINE 1. Cognition, Complexity, and Logic 2. The Knowledge Instinct - Neural Modeling Fields and Dynamic Logic 3. Language 4. Integration of Cognition and Language 5. Higher Cognitive Functions 6. Evolution of Cultures 7. Future Directions
DETAILED OUTLINE 1. Cognition – integration of real-time signals and a priori knowledge 1.1. physics and mathematics of the mind 1.2. genetic argument for the first principles 1.3. the nature of understanding 1.3.1. concepts: “chair” 1.3.2. hierarchy 1.4. combinatorial complexity (CC) – a fundamental problem? 1.5. CC since the 1950s 1.6. CC vs. logic 1.6.1. formal, multivalued, and fuzzy logics 1.6.2. dynamic logic 1.6.3. Aristotle vs. Gödel + Alexander the Great 1.7. mathematics vs. mind 1.8. structure of the mind: concepts, instincts, emotions, behavior 1.9. the knowledge instinct 1.9.1. need for learning 1.9.2. “knowledge emotion” = aesthetic emotion 2. Neural Modeling Fields (NMF) theory of cognition 2.1. the knowledge instinct = max similarity 2.2. similarity as likelihood, as information 2.3. dynamic logic (DL) 2.4. applications, examples, exercises 2.4.1. clustering 2.4.2. tracking and CRB - example of tracking below clutter - complexity NMF vs. MHT - models 2.4.3. recognition - example of a pattern in an image below clutter - complexity NMF vs. MHT - models 2.4.4. fusion - example of fusion, navigation, and detection below clutter - models 2.4.5. prediction - financial prediction - models 2.5. block-diagrams 2.6. hierarchical structure 3. Language 3.1. language acquisition and complexity 3.1.2. language separate from cognition 3.1.3. hierarchy of language 3.1.4. application - a search engine based on understanding 3.2. NMF of language 3.2.1. differentiating non-differentiable qualitative functions
DETAILED OUTLINE (CONTINUED) 4. Integration of cognition and language 4.1. language vs. cognition 4.2. past: AI and Chomskyan linguistics 4.3. integrated models 4.4. integrated hierarchies 4.5. Humboldt’s inner linguistic form 5. Prolegomena to a theory of the mind 5.1. higher cognitive functions 5.2. from Plato to Locke 5.3. from Kant to Grossberg 5.4. NMF vs. Buddhism 5.5. NMF vs. neuro-biology 5.6. NMF dynamics: elementary thought process 5.7. consciousness and the unconscious 5.8. aesthetics and beauty 5.9. intuition 5.10. symbols and signs 5.10.1. NMF theory of symbols 5.10.2. symbolic AI: “bewitchment” 5.11. why Adam was expelled from paradise 6. Evolution of Culture 6.1. culture and language 6.2. KI: differentiation and synthesis 6.3. “spiritual” cultural measurements 6.4. mathematical modeling and predictions 6.4.1. dynamic culture 6.4.2. traditional culture 6.4.3. terrorist’s consciousness 6.4.4. interacting cultures 6.5. evolution of concepts and emotions 6.6. creativity 6.7. disintegration of cultures 6.8. emotions in language 6.9. English vs. Arabic 6.10. synthesis 6.11. differentiation of emotions 6.12. role of music in evolution of the mind and culture 7. Future directions & publications 7.1. science and religion 7.2. predictions and testing 7.3. future directions 7.4. publications
INTRODUCTION Nature of the mind
PHYSICS AND MATHEMATICS OF THE MIND: RANGE OF CONCEPTS • Logic is sufficient to explain the mind • [Newell, “Artificial Intelligence”, 1980s] • No new specific mathematical concepts are needed • The mind is a collection of ad-hoc principles [Minsky, 1990s] • Specific mathematical constructs describe the multiplicity of mind phenomena • “first physical principles of the mind” • [Grossberg, Zadeh, Perlovsky,…] • Quantum computation • [Hameroff, Penrose, Perlovsky,…] • New, yet unknown physical phenomena • [Josephson, Penrose]
GENETIC ARGUMENTS FOR THE “FIRST PRINCIPLES” • Only 30,000 genes in the human genome • Only about a 2% difference between humans and apes • Say, a 1% difference between human and ape minds • 1% of 30,000 genes is only about 300 proteins • Therefore, the mind has to utilize few inborn principles • If we count “a protein per concept”: only ~300 concepts • If we count combinations: 300^300 ~ 10^743, practically unlimited => all concepts and languages could have been genetically hard-wired (!?!) • But languages and concepts are not genetically hardwired • Because they have to be flexible and adaptive
COGNITION • Understanding the world around • Perception • Simple objects • Complex situations • Integration of real-time signals and existing (a priori) knowledge • From signals to concepts • From less knowledge to more knowledge
EXAMPLE • Example: “this is a chair, it is for sitting” • Identify objects • signals -> concepts • What in the mind helps us do this? Representations, models, ontologies? • What is the nature of representations in the mind? • There are wooden chairs in the world, but no wood in the brain
VISUAL PERCEPTION • Neural mechanisms are well studied • Projection from retina to visual cortex (geometrically accurate) • Projection of memories-models • from memory to visual cortex • Matching: sensory signals and models • The visual nerve carries more feedback connections than feedforward ones • matching involves complicated adaptation of models and signals • Difficulty • Associate signals with models • A lot of models (expected objects and scenes) • Many more combinations: models <-> pixels • Association + adaptation • To adapt, signals and models should be associated • To associate, they should be adapted
ALGORITHMIC DIFFICULTIES: A FUNDAMENTAL PROBLEM? • Cognition and language involve evaluating large numbers of combinations • Pixels -> objects -> scenes • Combinatorial Complexity (CC) • A general problem (since the 1950s) • Detection, recognition, tracking, fusion, situational awareness, language… • Pattern recognition, neural networks, rule systems… • Combinations of 100 elements: 100^100 = 10^200 • This number ~ the size of the Universe • > all the events in the Universe during its entire life
COMBINATORIAL COMPLEXITY SINCE THE 1950s • CC has been encountered for over 50 years • Statistical pattern recognition and neural networks: CC of learning requirements • Rule systems and AI, in the presence of variability: CC of rules • Minsky, 1960s: Artificial Intelligence • Chomsky, 1957: language mechanisms are rule systems • Model-based systems with adaptive models: CC of computations • Chomsky, 1981: language mechanisms are model-based (rules and parameters) • Current ontologies and the “semantic web” are rule systems • Evolvable ontologies: a present challenge
CC AND TYPES OF LOGIC • CC is related to formal logic • Law of excluded middle (or excluded third) • every logical statement is either true or false • Gödel proved that logic is “illogical,” “inconsistent” (1930s) • CC is Gödel’s “incompleteness” in a finite system • Multivalued logic eliminated the “law of excluded third” • Still, it is the mathematics of formal logic • Excluded 3rd -> excluded (n+1) • Fuzzy logic eliminated the “law of excluded third” • Fuzzy logic systems are either too fuzzy or too crisp • Fitting the degree of fuzziness for every statement at every step => CC • Logic pervades all algorithms and neural networks • rule systems, fuzzy systems (degree of fuzziness), pattern recognition, neural networks (training uses logical statements)
LOGIC VS. GRADIENT ASCENT • Gradient ascent maximizes without CC • Requires continuous parameters • How to take gradients along “association”? • Data X(n) belongs (or not) to object m • This is a logical statement: discrete, non-differentiable • Models / ontologies require logic => CC • Multivalued logic does not lead to gradient ascent • Fuzzy logic uses continuous association variables, but no parameters to differentiate • A new principle is needed to specify gradient ascent along fuzzy associations: dynamic logic
DYNAMIC LOGIC • Dynamic Logic unifies formal and fuzzy logic • initial “vague or fuzzy concepts” dynamically evolve into “formal-logic or crisp concepts” • Dynamic logic • based on a similarity between models and signals • Overcomes CC of model-based recognition • fast algorithms
ARISTOTLE VS. GÖDEL: logic, forms, and language • Aristotle • Logic: a supreme way of argument • Forms: representations in the mind • Form-as-potentiality evolves into form-as-actuality • Logic is valid for actualities, not for potentialities (Dynamic Logic) • Language and thinking are closely linked • Language contains the necessary uncertainty • From Boole to Russell: formalization of logic • Logicians eliminated the uncertainty of language from logic • Hilbert: formalize the rules of mathematical proofs forever • Gödel (1930s) • Logic is not consistent • Any statement can be proved true and false • Aristotle and Alexander the Great
OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions
STRUCTURE OF THE MIND • Concepts • Models of objects, their relations, and situations • Evolved to satisfy instincts • Instincts • Internal sensors (e.g. sugar level in blood) • Emotions • Neural signals connecting instincts and concepts • e.g. a hungry person sees food all around • Behavior • Models of goals (desires) and muscle-movement… • Hierarchy • Concept-models and behavior-models are organized in a “loose” hierarchy
THE KNOWLEDGE INSTINCT • Model-concepts always have to be adapted • lighting, surroundings, new objects and situations • even when there are no concrete “bodily” needs • Instinct for knowledge and understanding • Increase similarity between models and the world • Emotions related to the knowledge instinct • Satisfaction or dissatisfaction • change in similarity between models and the world • Not related to bodily instincts • harmony or disharmony (knowledge-world): aesthetic emotion
REASONS FOR PAST LIMITATIONS • Human intelligence combines conceptual understanding with emotional evaluation • A long-standing cultural belief that emotions are opposite to thinking and intellect • “Stay cool to be smart” • Socrates, Plato, Aristotle • Reiterated by founders of Artificial Intelligence [Newell, Minsky]
Neural Modeling Fields (NMF) • A mathematical construct modeling the mind • Neural synaptic fields represent model-concepts • A loose hierarchy of more and more general concepts • At every level: bottom-up signals, top-down signals • At every level: concepts, emotions, models, behavior • Concepts become input signals to the next level
NEURAL MODELING FIELDS: basic two-layer mechanism, from signals to concepts • Signals • Pixels or samples (from sensor or retina): x(n), n = 1, …, N • Concept-models (objects or situations): Mm(Sm, n), parameters Sm, m = 1, …, M • Models predict expected signals from objects • Goal: learn object-models and understand signals (knowledge instinct)
THE KNOWLEDGE INSTINCT • The knowledge instinct = maximization of similarity between signals and models • Similarity between signals and models, L • L = l({x}) = Π_n l(x(n)) • l(x(n)) = Σ_m r(m) l(x(n) | Mm(Sm, n)) • l(x(n) | Mm(Sm, n)) is a conditional similarity for x(n) given m • the {x(n)} are not independent, Mm(n) may depend on n' • CC: expanded, L contains M^N items: all associations of pixels and models (LOGIC)
SIMILARITY • Similarity as likelihood • l(x(n) | Mm(Sm, n)) = pdf(x(n) | Mm(Sm, n)), • a conditional pdf for x(n) given m • e.g., Gaussian: pdf(X(n)|m) = G(X(n) | Mm, Cm) = (2π)^(–d/2) (det Cm)^(–1/2) exp(–Dmn^T Cm^(–1) Dmn / 2); Dmn = X(n) – Mm(n) • Note: this is NOT the usual “Gaussian assumption” • the deviations from the models, Dmn, are random, not the data X • multiple models {m} can model any pdf, not one Gaussian model • Use for sets of data points • Similarity as information • l(x(n) | Mm(Sm, n)) = abs(x(n)) * pdf(x(n) | Mm(Sm, n)), • a mutual information in model m on data x(n) • L is the mutual information in all models about all data • e.g., Gaussian pdf(X(n)|m) = G(X(n) | Mm, Cm) • Use for continuous data (signals, images)
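For concreteness, here is a minimal numerical sketch of the likelihood-type similarity in Python/NumPy (the course does not prescribe an implementation language, and the function names are illustrative): it computes the Gaussian conditional similarity l(x(n)|m) and the total similarity L in log form.

```python
import numpy as np

def gaussian_similarity(X, M_m, C_m):
    """Conditional similarity l(x(n)|m) = G(X(n) | Mm, Cm), for all n.
    X: (N, d) data; M_m: (d,) model mean; C_m: (d, d) model covariance."""
    d = X.shape[1]
    D = X - M_m                                   # deviations Dmn = X(n) - Mm
    C_inv = np.linalg.inv(C_m)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(C_m) ** (-0.5)
    return norm * np.exp(-0.5 * np.einsum('nd,de,ne->n', D, C_inv, D))

def log_total_similarity(X, r, Ms, Cs):
    """ln L = sum_n ln( sum_m r(m) l(x(n)|m) ): log of the product over n."""
    mix = sum(r_m * gaussian_similarity(X, M_m, C_m)
              for r_m, M_m, C_m in zip(r, Ms, Cs))
    return np.sum(np.log(mix))
```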
DYNAMIC LOGIC (DL): non-combinatorial solution • Start with a set of signals and unknown object-models • any parameter values Sm • associate object-models with their contents (signal composition) • (1) f(m|n) = r(m) l(n|m) / Σ_m' r(m') l(n|m') • Improve parameter estimation • (2) Sm = Sm + a Σ_n f(m|n) [∂ ln l(n|m)/∂Mm] [∂Mm/∂Sm] • (a determines the speed of convergence) • learn the signal-contents of objects • Continue iterations (1)-(2). Theorem: NMF is a converging system - similarity increases on each iteration - aesthetic emotion is positive during learning
OUTLINE • Cognition, complexity, and logic • Logic does not work, but the mind does • The Mind and Knowledge Instinct • Neural Modeling Fields and Dynamic Logic • Application examples • Language • Integration of cognition and language • Higher Cognitive Functions • Future directions
APPLICATIONS • Many applications have been developed • Government • Medical • Commercial (about 25 companies use this technology) • Sensor signals processing and object recognition • Variety of sensors • Financial market predictions • Market crash on 9/11 predicted a week ahead • Internet search engines • Based on text understanding • Evolving ontologies for Semantic Web • Every application needs models • Future self-evolving models: integrated cognition and language
APPLICATION 1 - CLUSTERING • Find “natural” groups or clusters in data • Use Gaussian pdf and simple models: l(n|m) = (2π)^(–d/2) (det Cm)^(–1/2) exp(–Dmn^T Cm^(–1) Dmn / 2); Dmn = X(n) – Mm(n); Mm(n) = Mm; each model has just one parameter, Sm = Mm • This is clustering with a Gaussian Mixture Model • For complex l(n|m), derivatives can be taken numerically • For simple l(n|m), derivatives can be taken manually • Simplification, not essential • Simplified parameter estimation for Gaussian pdf and simple models: ∂ ln l(n|m)/∂Mm = ∂(–Dmn^T Cm^(–1) Dmn / 2)/∂Mm = Cm^(–1) Dmn (C is symmetric); Mm = Mm + a Σ_n f(m|n) Cm^(–1) Dmn … • In this case, even simpler equations can be derived: samples in class m: Nm = Σ_n f(m|n); N = Σ_m Nm; rates (priors): rm = Nm / N; means: Mm = Σ_n f(m|n) X(n) / Nm; covariances: Cm = Σ_n f(m|n) Dmn Dmn^T / Nm • Simple interpretation: Nm, Mm, Cm are weighted averages; the only difference from standard mean and covariance estimation is the weights f(m|n), the probabilities of class m • These are iterative equations, f(m|n) depends on the parameters; theorem: the iterations converge
EXERCISE (1) – HOMEWORK • Code a simple NMF/DL for Gaussian mixture clustering (1) Simulate data • Specify true parameters, say: • Dimensionality, d = 2 • Number of classes, 3: m = 1, 2, 3 • Rates rm = 0.2, 0.3, 0.5 • Number of samples, N = 100; Nm = N*rm = 20, 30, 50 • Means, Mm: 2-d vectors (0.1, 0.1), (0.2, 0.8), (0.8, 0.2) • Covariances, Cm = unit matrices • Call a function that generates Gaussian data (2) Run NMF/DL code to estimate the parameters (rm, Mm, Cm) and the association probabilities f(m|n) • Initialize parameters to any values, say each r = 1/3; means should not be the same, say M = (0,0), (0,1), (1,1); covariances should be initialized to larger values than the uncertainties in the means (squared), say each C = diag(2,2) • Run iterations: estimate f(m|n), estimate parameters… until, say, (changes in M) < 0.01 (3) Plot results (2-d plots) • Plot data for each class in different colors/symbols • Plot means • Plot covariances • 2-σ ellipses around the means: (X–M)^T C^(–1) (X–M) = 4
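One possible solution sketch in Python/NumPy, under the assumptions stated in the exercise (true means, rates, and unit covariances as above; all names are illustrative, and plotting is left out). Note that with unit covariances the three classes overlap heavily, so the estimates will be rough.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, N = 2, 3, 100

# (1) Simulate data with the true parameters from the exercise
true_r = np.array([0.2, 0.3, 0.5])
true_M = np.array([[0.1, 0.1], [0.2, 0.8], [0.8, 0.2]])
labels = rng.choice(K, size=N, p=true_r)
X = np.array([rng.multivariate_normal(true_M[m], np.eye(d)) for m in labels])

# (2) Initialize vague models: equal rates, distinct means, wide covariances
r = np.full(K, 1.0 / K)
M = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C = np.array([np.diag([2.0, 2.0]) for _ in range(K)])

def l_cond(X, M_m, C_m):
    """Conditional similarity l(n|m): Gaussian pdf of X(n) given model m."""
    D = X - M_m
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(C_m) ** (-0.5)
    return norm * np.exp(
        -0.5 * np.einsum('nd,de,ne->n', D, np.linalg.inv(C_m), D))

for it in range(500):
    # association step (1): f(m|n) = r(m) l(n|m) / sum_m' r(m') l(n|m')
    l = np.stack([l_cond(X, M[m], C[m]) for m in range(K)])      # (K, N)
    f = r[:, None] * l
    f /= f.sum(axis=0, keepdims=True)
    # parameter step (2): weighted averages from the clustering slide
    Nm = f.sum(axis=1)
    r = Nm / N
    M_new = (f @ X) / Nm[:, None]
    for m in range(K):
        D = X - M_new[m]
        C[m] = (f[m][:, None] * D).T @ D / Nm[m]
    converged = np.max(np.abs(M_new - M)) < 0.01   # stop when means settle
    M = M_new
    if converged:
        break

print("rates:", r, "\nmeans:\n", M)   # compare with true_r and true_M
```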
APPLICATION 2 - TRACKING • Example • Complexity of computations • Exercise • Cramer-Rao Bounds
EXAMPLE 2: GMTI TRACKING AND DETECTION BELOW CLUTTER [figure: (a) true tracks on a 1 km range x 1 km cross-range window; (b) initial state of the model; then the models after 2, 5, 9, and 12 iterations and the converged state] DL starts with uncertain knowledge and converges rapidly on the exact solution; 18 dB improvement
EXAMPLE (2) note page • Detection and tracking of targets below clutter: (a) true track positions in a 0.5 km x 0.5 km data set; (b) actual data available for detection and tracking (the signal is below clutter; the signal-to-clutter ratio is about –2 dB for amplitude and –3 dB for Doppler; 6 scans are shown on top of each other, each containing 500 data points). Dynamic logic operation: (c) an initial fuzzy model, the fuzziness corresponding to the uncertainty of knowledge; (d) to (h) show increasingly improved models at various iterations (20 iterations in total). Between (c) and (d) the algorithm fits the data with one model; uncertainty is somewhat reduced. There are two types of models: one uniform model describing clutter (not shown) and linear track models with large uncertainty; the number of track models, their locations, and their velocities are estimated from the data. Between (d) and (e) the algorithm tried to fit the data with more than one track-model and decided that it needs two models to ‘understand’ the content of the data. Fitting with 2 tracks continues until (f); between (f) and (g) a third track is added. Iterations stopped at (h), when similarity stopped increasing. The detected tracks closely correspond to the truth (a). The complexity of this solution is low, about 10^6 operations. Solving this problem by MHT (template matching with evaluation of combinations of various associations, the standard state of the art) would take about M^N = 10^1700 operations: unsolvable.
TRACKING AND DETECTION BELOW CLUTTER (movie, same as above) • DL starts with uncertain knowledge and, similar to the human mind, does not sort through all possibilities, but converges rapidly on the exact solution • 3 targets, 6 scans, signal-to-clutter S/C ~ –3.0 dB
TRACKING EXAMPLE: complexity and improvement • Technical difficulty • Signal/Clutter = –3 dB; standard tracking requires 15 dB • Computations, standard hypothesis testing: ~ 10^1700, unsolvable • Solved by Dynamic Logic • Computations: 2x10^7 • Improvement: 18 dB
EXERCISE (2) • Develop NMF/DL for target tracking • Step 1 (and the only one) • Develop models for tracking • Model: where do you expect to see the target • Parameters, Sm: position xm and velocity vm • Mm(xm, vm, n) = xm + vm * tn • Time tn is not a parameter of the model, but data • More complex: Keplerian trajectories • Note: typical data are 2-d (Θ, Φ) or 3-d (R, Θ, Φ) • Models are 3-d (or 2-d, if sufficient) • 3-d models might be “not estimatable” from 2-d data • a linear track in (x,y,z) over a short time appears linear in (Θ, Φ) • then, use 2-d models, or additional data, or a longer observation period
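A minimal sketch of such a track model (Python/NumPy as in the earlier exercises; names are illustrative):

```python
import numpy as np

def track_model(x_m, v_m, t):
    """Linear track model Mm(xm, vm, n) = xm + vm * tn.
    x_m, v_m: (d,) position and velocity parameters; t: (N,) scan times.
    Returns the (N, d) positions where the target is expected to be seen."""
    return x_m[None, :] + np.outer(t, v_m)

# The model is linear in its parameters, so the gradient dMm/dSm needed in
# the DL parameter-update equation (2) is constant: 1 for x_m, tn for v_m.
```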
CRAMER-RAO BOUND (CRB) • Can a particular set of models be estimated from a particular (limited) set of data? • The question is not trivial • A simple rule-of-thumb: N(data points) > 10*S(parameters) • In addition, use your mind: is there enough information in the data? • CRB: minimal estimation error (best possible estimation) for any algorithm, neural network, or… • When there are many data points, CRB is a good measure (=ML=NMF) • When there are few data points (e.g. financial prediction) it might be difficult to assess performance • Actual errors >> CRB • Simple, well-known CRB for averaging several measurements: st.dev(n) = st.dev(1)/√n • Complex CRB for tracking: • Perlovsky, L.I. (1997a). Cramer-Rao Bound for Tracking in Clutter and Tracking Multiple Objects. Pattern Recognition Letters, 18(3), pp. 283-288.
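The averaging rule is easy to verify by simulation; a quick Monte Carlo check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, trials = 1.0, 25, 100_000

# st.dev of the average of n measurements, estimated over many trials
means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)

print(means.std())          # empirical: ~0.2
print(sigma / np.sqrt(n))   # CRB for averaging: st.dev(1)/sqrt(n) = 0.2
```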
EXERCISE (2 cont.) HOMEWORK • Homework: code NMF/DL for tracking, simulate data, run the code, and plot results • When simulating data: add sensor error and clutter • track + sensor errors: X(n,m) = xm + vm * tn + e(m,n) • sensor error: use a sensor error model (or a simple Gaussian) for e(m,n) • clutter: X(n) = e(c,n); use a clutter model for e(c,n), or a simple uniform or Gaussian; simulate the required number of clutter data points • Do not forget to add the clutter model to your NMF/DL code • l(n|m=clutter) = const; the only parameter, rc = expected proportion of clutter • Note: the resulting system is “track-before-detect” or, more accurately, “concurrent detection and tracking” • it does not require the standard procedure • (1) detect, (2) associate, (3) track • association and detection are obtained together with tracking • DL requires no combinatorial searches, which often limit “track-before-detect” performance
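A toy end-to-end sketch of this homework, assuming a uniform clutter model, isotropic Gaussian track similarities, and weighted least-squares parameter updates (one reasonable instantiation of equation (2) for linear models, not the course's prescribed code; real NMF/DL implementations control how fast the fuzziness shrinks, and convergence of this toy version may require tuning):

```python
import numpy as np

rng = np.random.default_rng(2)

# --- simulate: 2 linear targets over 6 scans, plus uniform clutter ---
scans = np.arange(6) * 0.1
true = [(np.array([0.1, 0.2]), np.array([1.0, 0.5])),
        (np.array([0.8, 0.1]), np.array([-0.8, 1.0]))]
pts, times = [], []
for tn in scans:
    for p0, vel in true:                        # one detection per target
        pts.append(p0 + vel * tn + rng.normal(0.0, 0.02, 2))
        times.append(tn)
    for _ in range(50):                         # clutter on the unit square
        pts.append(rng.uniform(0.0, 1.0, 2))
        times.append(tn)
X, t = np.array(pts), np.array(times)
N = len(X)

# --- DL: one uniform clutter model + 2 linear track models, vague start ---
r = np.full(3, 1.0 / 3.0)                       # rates: clutter, track 1, 2
x0 = np.array([[0.3, 0.3], [0.7, 0.4]])         # initial position guesses
v = np.zeros((2, 2))                            # initial velocities
sig = np.array([0.5, 0.5])                      # large initial fuzziness
A = np.column_stack([np.ones(N), t])            # regressors: [1, tn]

for it in range(100):
    l = [np.ones(N)]                            # l(n|clutter) = const
    for m in range(2):
        D = X - (x0[m] + np.outer(t, v[m]))
        l.append(np.exp(-0.5 * (D ** 2).sum(1) / sig[m] ** 2)
                 / (2 * np.pi * sig[m] ** 2))
    l = np.stack(l)
    f = r[:, None] * l                          # association step (1)
    f /= f.sum(axis=0, keepdims=True)
    r = f.sum(axis=1) / N
    for m in range(2):                          # parameter step (2): weighted LS
        W = f[m + 1][:, None]
        G = np.linalg.solve((A * W).T @ A, (A * W).T @ X)
        x0[m], v[m] = G[0], G[1]
        D = X - (x0[m] + np.outer(t, v[m]))
        res = (f[m + 1] * (D ** 2).sum(1)).sum() / (2.0 * f[m + 1].sum())
        sig[m] = max(np.sqrt(res), 1e-3)        # fuzziness shrinks as models fit

print("positions:\n", x0, "\nvelocities:\n", v)  # compare with `true`
```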
APPLICATION 3 FINDING PATTERNS IN IMAGES
IMAGE PATTERN BELOW NOISE [figure: two panels with x-y pixel axes, “Object Image” and “Object Image + Clutter”]
PRIOR STATE-OF-THE-ART: computational complexity. Multiple Hypothesis Testing (MHT) approach: try all possible ways of fitting the model to the data. For a 100 x 100 pixel image:
Number of Objects | Number of Computations
1 | 10^10
2 | 10^20
3 | 10^30
NMF MODELS • Information similarity measure: ln l(x(n) | Mm(Sm, n)) = abs(x(n)) * ln pdf(x(n) | Mm(Sm, n)), n = (nx, ny) • Clutter concept-model (m=1): pdf(X(n)|1) = r1 • Object concept-models (m = 2, …): pdf(x(n) | Mm(Sm, n)) = r2 G(X(n) | Mm(n,k), Cm); Mm(n,k) = n0 + a*(k^2, k) (note: k, K require no estimation)
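A short sketch of the parabolic object model in the same notation (n0, a, and k as above; the specific values are illustrative):

```python
import numpy as np

def parabola_model(n0, a, k):
    """Parabolic object model Mm(n, k) = n0 + a * (k^2, k).
    n0: (2,) vertex position in pixels; a: curvature scale; k: (K,) arc
    parameter. Returns (K, 2) predicted pixel coordinates along the curve."""
    return n0[None, :] + a * np.column_stack([k ** 2, k])

# Example: a 'smile' (a > 0) vs. a 'frown' (a < 0) on a 100 x 100 pixel image
k = np.linspace(-1.0, 1.0, 64)
smile = parabola_model(np.array([30.0, 50.0]), 20.0, k)
frown = parabola_model(np.array([70.0, 50.0]), -20.0, k)
```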
ONE PATTERN BELOW CLUTTER [figure: single image, x-y axes; SNR = –2.0 dB]
DYNAMIC LOGIC WORKING • DL starts with uncertain knowledge and, similar to the human mind, converges rapidly on the exact solution • The object is invisible to the human eye • By integrating data with the knowledge-model, DL finds an object below noise [figure: range y (m) vs. cross-range x (m)]
MULTIPLE PATTERNS BELOW CLUTTER • Three objects in noise • SCR: object 1: –0.70 dB, object 2: –1.98 dB, object 3: –0.73 dB [figure: two panels with x-y pixel axes, “3 Object Image” and “3 Object Image + Clutter”]
IMAGE PATTERNS BELOW CLUTTER (dynamic logic iterations; see note text) [figure: panels (a) through (h)] • Logical complexity = M^N ~ 10^5000, unsolvable; DL complexity ~ 10^7 • S/C improvement ~ 16 dB
APPLICATION (3) note page • ‘Smile’ and ‘frown’ patterns: (a) true ‘smile’ and ‘frown’ patterns shown without clutter; (b) actual image available for recognition (the signal is below clutter; the signal-to-clutter ratio is between –2 dB and –0.7 dB); (c) an initial fuzzy model, the fuzziness corresponding to the uncertainty of knowledge; (d) to (h) show increasingly improved models at various iterations (22 iterations in total). Between (d) and (e) the algorithm tried to fit the data with more than one model and decided that it needs three models to ‘understand’ the content of the data. There are several types of models: one uniform model describing the clutter (not shown) and a variable number of blob models and parabolic models, whose number, locations, and curvatures are estimated from the data. Until about (g) the algorithm ‘thought’ in terms of simple blob models; at (g) and beyond, the algorithm decided that it needs more complex parabolic models to describe the data. Iterations stopped at (h), when similarity stopped increasing. The complexity of this solution is moderate, about 10^10 operations. Solving this problem by template matching would take a prohibitive 10^30 to 10^40 operations. (This example is discussed in more detail in Linnehan et al., 2003.) Linnehan, R., Mutz, C., Perlovsky, L.I., Weijers, B., Schindler, J., & Brockett, R. (2003). Detection of Patterns Below Clutter in Images. Int. Conf. on Integration of Knowledge Intensive Multi-Agent Systems, Cambridge, MA, Oct. 1-3, 2003.
MULTIPLE TARGET DETECTION: DL WORKING EXAMPLE • DL starts with uncertain knowledge and, similar to the human mind, does not sort through all possibilities like MHT, but converges rapidly on the exact solution [figure: x-y detection plane]
COMPUTATIONAL REQUIREMENTS COMPARED: Dynamic Logic (DL) vs. classical state-of-the-art Multiple Hypothesis Testing (MHT), based on a 100 x 100 pixel image
Number of Objects | DL | MHT
1 | 10^8 | 10^10
2 | 2x10^8 | 10^20
3 | 3x10^8 | 10^30
• Previously un-computable problems (10^30) can now be computed (3x10^8) • This pertains to many complex information-finding problems
APPLICATION 4 SENSOR FUSION Concurrent fusion, navigation, and detection below clutter