Neural Networks, Fuzzy Logic, and Statistical Methods CITS4404 AI and Adaptive Systems
Neural Networks (NNs) Reading: S. Russell and P. Norvig, Section 20.5, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002. G. McNeil and D. Anderson, “Artificial Neural Networks Technology”, The Data & Analysis Center for Software Technical Report, 1992.
The Nature-Inspired Metaphor
• Inspired by the brain:
  • neurons are structurally simple cells that aggregate and disseminate electrical signals
  • computational power and intelligence emerge from the vast interconnected network of neurons
• NNs act mainly as:
  • function approximators
  • pattern recognisers
• They learn from observed data
Diagrams taken from a report on neural networks by C. Stergiou and D. Siganos
An Overview of Core CI Technologies
The Neuron Model
[Figure: a neuron, with input links, bias weight, input function, activation function, output, and output links]
• A neuron combines values via its input function and its activation function
• The bias determines the threshold needed for a “positive” response
• Single-layer neural networks (perceptrons) can represent only linearly-separable functions
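The neuron model above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names, the step and sigmoid activations, and the AND-gate weights are all assumptions chosen for the example.

```python
import math


def step(x):
    """Threshold activation: a "positive" response (1) when x >= 0."""
    return 1 if x >= 0 else 0


def sigmoid(x):
    """Smooth alternative to the step activation."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias_weight, activation=step):
    # Input function: weighted sum of the input links plus the bias weight.
    total = bias_weight + sum(w * a for w, a in zip(weights, inputs))
    # Activation function: turns the weighted sum into the neuron's output.
    return activation(total)


def and_gate(a, b):
    # Hypothetical weights: the bias of -1.5 sets the threshold so that
    # the neuron only fires when both inputs are 1.
    return neuron([a, b], weights=[1.0, 1.0], bias_weight=-1.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b))
```

AND is linearly separable, so a single neuron is enough; changing only the bias weight to -0.5 would turn the same unit into an OR gate.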
Multi-Layered Neural Networks
• A network is formed by the connections (links) of many nodes
  • inputs map to outputs through one or more hidden layers
• Link-weights control the behaviour of the function represented by the NN
  • adjusting the weights changes the encoded function
Multi-Layered Neural Networks
• Hidden layers increase the “power” of the NN at the cost of extra complexity and training time:
  • perceptrons capture only linearly-separable functions
  • an NN with a single (sufficiently large) hidden layer can represent any continuous function with arbitrary accuracy
  • with two hidden layers, even discontinuous functions can be represented
• There are two main types of multi-layered NNs:
  • feed-forward: simple acyclic structure; the network is stateless, so its output is a function of the current input only
  • recurrent: cyclic feedback loops are allowed; the network is stateful, which supports short-term memory
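To illustrate why hidden layers add power, the sketch below hand-wires a small feed-forward network for XOR, a function that is not linearly separable and so cannot be represented by a single perceptron. The weights and the names `unit` and `xor_net` are assumptions picked for the example.

```python
def step(x):
    """Threshold activation."""
    return 1 if x >= 0 else 0


def unit(inputs, weights, bias):
    """One threshold neuron: weighted sum plus bias, then step."""
    return step(bias + sum(w * a for w, a in zip(weights, inputs)))


def xor_net(a, b):
    # Hidden layer: two neurons computing OR and AND of the inputs.
    h_or = unit([a, b], [1.0, 1.0], -0.5)
    h_and = unit([a, b], [1.0, 1.0], -1.5)
    # Output neuron: fires when OR is true but AND is not.
    return unit([h_or, h_and], [1.0, -1.0], -0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

The network is feed-forward and stateless: calling `xor_net` twice with the same inputs always gives the same output.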
Training Neural Networks
• Training means adjusting link-weights to minimise some measure of error (the cost function)
  • i.e. learning is an optimisation search in weight-space
• Any search algorithm can be used; the most common is gradient descent, with the gradients computed by back-propagation
• Common learning paradigms:
  • supervised learning: training is by comparison with known input/output examples (a training set)
  • unsupervised learning: no a priori training set is provided; the system discovers patterns in the input
  • reinforcement learning: training uses environmental feedback to assess the quality of actions
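As a small supervised-learning illustration, the sketch below trains a single sigmoid neuron on the OR function by batch gradient descent on a squared-error cost. The learning rate, epoch count, zero initial weights, and function names are assumptions for the example; with only one neuron, back-propagation reduces to a single application of the chain rule.

```python
import math

# Training set: known input/output examples for OR.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, inputs):
    # weights[0] is the bias weight; the rest are the input link-weights.
    return sigmoid(weights[0] + sum(w * a for w, a in zip(weights[1:], inputs)))


def cost(weights, examples):
    """Half the sum of squared errors over the training set."""
    return sum((predict(weights, x) - y) ** 2 for x, y in examples) / 2.0


def train(examples, n_inputs, rate=0.5, epochs=5000):
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in examples:
            out = predict(weights, x)
            # Chain rule: d(cost)/d(sum) = (out - y) * sigmoid'(sum).
            delta = (out - y) * out * (1.0 - out)
            grad[0] += delta
            for i, a in enumerate(x):
                grad[i + 1] += delta * a
        # Move downhill in weight-space.
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    w = train(OR_EXAMPLES, n_inputs=2)
    for x, y in OR_EXAMPLES:
        print(x, y, round(predict(w, x), 3))
```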
Neuro-Evolution and Reinforcement Learning
• Neuro-evolution uses a neural network as the phenotype of a solution; the genome encodes the weights on the edges (or even the topology of the network)
• Methods such as PSO or EAs are then used to optimise the network weights, given feedback
• These techniques are particularly useful for reinforcement learning, where fitness is easy to calculate but input-output pairs are hard to generate
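A minimal sketch of the idea, assuming a fixed 2-2-1 topology and a (1+1) evolution strategy, which is one of the simplest EAs (the slide does not prescribe a particular algorithm). The genome is just the list of nine weights; only a fitness score is fed back, and no gradients are used. XOR stands in for the task purely to keep the fitness function short.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
GENOME_LENGTH = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def sigmoid(x):
    # tanh form of the logistic function; avoids overflow for large |x|.
    return 0.5 * (1.0 + math.tanh(x / 2.0))


def network(genome, inputs):
    """Phenotype: a 2-2-1 feed-forward network whose weights come from the genome."""
    a, b = inputs
    h1 = sigmoid(genome[0] * a + genome[1] * b + genome[2])
    h2 = sigmoid(genome[3] * a + genome[4] * b + genome[5])
    return sigmoid(genome[6] * h1 + genome[7] * h2 + genome[8])


def fitness(genome):
    """Higher is better: negative squared error over the four cases."""
    return -sum((network(genome, x) - y) ** 2 for x, y in XOR)


def evolve(generations=2000, sigma=0.5, seed=0):
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(GENOME_LENGTH)]
    initial = best = fitness(parent)
    for _ in range(generations):
        # Mutation: add Gaussian noise to every weight.
        child = [g + rng.gauss(0.0, sigma) for g in parent]
        score = fitness(child)
        # Selection: keep the child only if it is at least as fit.
        if score >= best:
            parent, best = child, score
    return parent, initial, best


if __name__ == "__main__":
    genome, initial, best = evolve()
    print("fitness before:", initial, "after:", best)
```

Because selection is elitist, fitness can never get worse from one generation to the next; how close it gets to zero depends on the seed and the mutation size.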
Neural Nets in Unsupervised Learning
• Neural networks can also be used for unsupervised learning
• A smaller hidden layer sits between large input and output layers of equal size; the network is trained to reproduce its input, so the error is the difference between the input and output layers
• The distance between two elements can then be measured as the distance between their hidden-layer activations
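The sketch below shows this arrangement with a tiny linear network (4 inputs, 2 hidden units, 4 outputs) trained by gradient descent to reproduce its input. The linear activations, the toy data, the learning rate, and all names are assumptions made to keep the example short; practical networks of this kind usually use non-linear units.

```python
import random

# Toy data: two loose groups of 4-dimensional points (made up for illustration).
DATA = [
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 1.0, 0.9],
]


def encode(enc, x):
    """Input layer -> smaller hidden layer."""
    return [sum(w * a for w, a in zip(row, x)) for row in enc]


def decode(dec, h):
    """Hidden layer -> output layer (same size as the input)."""
    return [sum(w * a for w, a in zip(row, h)) for row in dec]


def reconstruction_error(enc, dec, data):
    """Sum of squared differences between output and input layers."""
    total = 0.0
    for x in data:
        out = decode(dec, encode(enc, x))
        total += sum((o - a) ** 2 for o, a in zip(out, x))
    return total


def train(data, n_hidden=2, rate=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            h = encode(enc, x)
            out = decode(dec, h)
            err = [o - a for o, a in zip(out, x)]  # output minus input
            # Propagate the error back to the hidden layer before updating dec.
            h_err = [sum(err[i] * dec[i][j] for i in range(n)) for j in range(n_hidden)]
            for i in range(n):
                for j in range(n_hidden):
                    dec[i][j] -= rate * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n):
                    enc[j][k] -= rate * h_err[j] * x[k]
    return enc, dec


def hidden_distance(enc, x, y):
    """Euclidean distance between the hidden-layer activations of x and y."""
    hx, hy = encode(enc, x), encode(enc, y)
    return sum((a - b) ** 2 for a, b in zip(hx, hy)) ** 0.5


if __name__ == "__main__":
    enc, dec = train(DATA)
    print("reconstruction error:", reconstruction_error(enc, dec, DATA))
    print("distance 0-1:", hidden_distance(enc, DATA[0], DATA[1]))
    print("distance 0-2:", hidden_distance(enc, DATA[0], DATA[2]))
```

No labels are supplied at any point: the only training signal is how well the output layer matches the input layer.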
Fuzzy Systems Reading: Lotfi Zadeh, “Fuzzy logic”, IEEE Computer, 1988:4, 83-93. G. Gerla, “Fuzzy logic programming and fuzzy control”, Studia Logica, 79 (2005): 231-254. Jan Jantzen, “Design of fuzzy controllers”, Technical Report.
Fuzzy systems
• Fuzzy logic facilitates the definition of control systems that can make good decisions from noisy, imprecise, or partial information
• There are two key concepts:
  • Graduation: everything is a matter of degree, e.g. it can be “not cold”, or “a bit cold”, or “a lot cold”, or …
  • Granulation: everything is “clumped”, e.g. age is young, middle-aged, or old
[Figure: membership functions for young, middle-aged, and old, plotted as degree of membership (0 to 1) against age]
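Graduation and granulation can be sketched as overlapping membership functions over age. The piecewise-linear shapes and the breakpoints (25, 45, 55, 75) below are assumptions chosen for illustration; the slide does not give numeric values.

```python
def falling(x, start, end):
    """Membership 1 up to `start`, then a linear fall to 0 at `end`."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    """Membership 0 up to `start`, then a linear rise to 1 at `end`."""
    return 1.0 - falling(x, start, end)


def age_memberships(age):
    """Degree (0 to 1) to which an age belongs to each granule."""
    return {
        "young": falling(age, 25, 45),
        "middle-aged": min(rising(age, 25, 45), falling(age, 55, 75)),
        "old": rising(age, 55, 75),
    }


if __name__ == "__main__":
    for age in (20, 35, 50, 65, 80):
        print(age, age_memberships(age))
```

An age of 35 is, under these assumed breakpoints, half “young” and half “middle-aged”: membership is a matter of degree rather than a hard boundary.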
Fuzzy Logic
• The syntax of fuzzy logic typically includes
  • propositions (“It is raining”, “CITS4404 is difficult”, etc.)
  • Boolean connectives (and, not, etc.)
• The semantics of fuzzy logic differs from propositional logic: rather than assigning a True/False value to a proposition, we assign a degree of truth between 0 and 1, e.g. v(“CITS4404 is difficult”) = 0.8
• Typical interpretations of the operators and and not are
  • v(not p) = 1 - v(p)
  • v(p and q) = min{v(p), v(q)} (Gödel-Dummett norm)
• Different semantics may be given by varying the interpretation of and (the T-norm). Anything commutative, associative, monotonic, continuous, and with 1 as an identity can be a T-norm. Other common T-norms are:
  • v(p and q) = v(p) * v(q) (product norm)
  • v(p and q) = max{v(p) + v(q) - 1, 0} (Łukasiewicz norm)
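The operators above translate directly into code. The function names are assumptions; the formulas are the ones given on the slide.

```python
def fuzzy_not(p):
    """v(not p) = 1 - v(p)."""
    return 1.0 - p


def and_godel(p, q):
    """Gödel-Dummett norm: min{v(p), v(q)}."""
    return min(p, q)


def and_product(p, q):
    """Product norm: v(p) * v(q)."""
    return p * q


def and_lukasiewicz(p, q):
    """Łukasiewicz norm: max{v(p) + v(q) - 1, 0}."""
    return max(p + q - 1.0, 0.0)


T_NORMS = {
    "Gödel-Dummett": and_godel,
    "product": and_product,
    "Łukasiewicz": and_lukasiewicz,
}

if __name__ == "__main__":
    p, q = 0.5, 0.75
    print("not q:", fuzzy_not(q))
    for name, t_norm in T_NORMS.items():
        print(name, t_norm(p, q))
```

For any pair of truth values the three norms are ordered Łukasiewicz ≤ product ≤ Gödel-Dummett, and all three agree with Boolean and when the inputs are exactly 0 or 1.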
Vagueness and Uncertainty
• The product norm captures our understanding of probability or uncertainty with a strong independence assumption
  • prob(Rain and Wind) = prob(Rain) * prob(Wind)
• The Gödel-Dummett norm is a fair representation of vagueness:
  • if it’s a bit windy and very rainy, it’s a bit windy and rainy
• Fuzzy logic provides a unifying logical framework for all CI techniques, as CI techniques are inherently vague
  • whether or not it is actually implemented is another question
Fuzzy Controllers
• A fuzzy control system is a collection of rules
  • IF X [AND Y] THEN Z
  • e.g. IF cold AND ¬warming-up THEN increase heating slightly
• Such rules are usually derived empirically from experience, rather than from the system itself
  • they attempt to mimic human-style logic
  • granulation means that the exact values of any constants (e.g. where does cold start/end?) are less important
• The fuzzy rules typically take observations and, according to these observations’ membership of fuzzy sets, produce a fuzzy action
• The fuzzy action then needs to be defuzzified to become a precise output
Fuzzy Control
• Applying fuzzy rules: the action for each combination of temperature and d(temperature)/dt

                          temperature
d(temperature)/dt    Cold         Right        Hot
-ve                  heat         heat         no change
zero                 heat         no change    cool
+ve                  no change    cool         cool

Image from http://www.faqs.org/docs/fuzzy/
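The fuzzify, apply rules, defuzzify pipeline can be sketched as follows. Everything numeric here is an assumption for illustration: the membership breakpoints (15, 20, 25 degrees and a rate of ±1 degree per minute), the singleton action values, and the weighted-average defuzzification. The rule table is one reading of the matrix on this slide (heat when cold or cooling, cool when hot or warming), with and interpreted as the Gödel-Dummett norm.

```python
def falling(x, start, end):
    """Membership 1 up to `start`, linear fall to 0 at `end`."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def rising(x, start, end):
    return 1.0 - falling(x, start, end)


def triangle(x, left, peak, right):
    return min(rising(x, left, peak), falling(x, peak, right))


def fuzzify_temperature(t):
    # Hypothetical breakpoints in degrees Celsius.
    return {"cold": falling(t, 15, 20), "right": triangle(t, 15, 20, 25), "hot": rising(t, 20, 25)}


def fuzzify_rate(r):
    # Hypothetical breakpoints in degrees per minute.
    return {"-ve": falling(r, -1, 0), "zero": triangle(r, -1, 0, 1), "+ve": rising(r, 0, 1)}


# IF temperature is X AND rate is Y THEN action Z.
RULES = {
    ("cold", "-ve"): "heat", ("cold", "zero"): "heat", ("cold", "+ve"): "no change",
    ("right", "-ve"): "heat", ("right", "zero"): "no change", ("right", "+ve"): "cool",
    ("hot", "-ve"): "no change", ("hot", "zero"): "cool", ("hot", "+ve"): "cool",
}

# Assumed crisp value for each fuzzy action (positive means heating power).
ACTION_VALUE = {"heat": 1.0, "no change": 0.0, "cool": -1.0}


def control(temperature, rate):
    temp = fuzzify_temperature(temperature)
    change = fuzzify_rate(rate)
    strength = {action: 0.0 for action in ACTION_VALUE}
    for (temp_label, rate_label), action in RULES.items():
        fire = min(temp[temp_label], change[rate_label])  # fuzzy "and"
        strength[action] = max(strength[action], fire)
    total = sum(strength.values())
    if total == 0.0:
        return 0.0
    # Defuzzify: weighted average of the action values.
    return sum(ACTION_VALUE[a] * s for a, s in strength.items()) / total


if __name__ == "__main__":
    for t, r in [(10, 0), (17.5, 0), (20, 0), (22, 0.5), (30, 0)]:
        print(t, r, control(t, r))
```

The output varies smoothly between -1 (full cooling) and +1 (full heating) as the observations move between granules, which is why the exact breakpoints matter less than they would in a crisp rule set.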
Statistical Methods Reading: S. Russell and P. Norvig, Section 20.1, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002. R. Barros, M. Basgalupp, A. de Carvalho, A. Freitas, “A Survey of Evolutionary Algorithms for Decision Tree Induction”, IEEE Transactions on Systems, Man and Cybernetics.
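Naïve Bayes Classifiers
• Naïve Bayes classifiers use a strong independence assumption when determining the class of an entity, given observations of that entity
• Bayes’ rule: P(class | observations) = P(observations | class) * P(class) / P(observations)
  • under the independence assumption, P(observations | class) is the product of P(observation_i | class) over the individual observations
• Probabilities are easy to maintain from observations, and calculations are cheap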
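A minimal sketch of such a classifier for categorical observations. The toy weather data, the function names, and the use of add-one (Laplace) smoothing to avoid zero probabilities are assumptions for the example, not part of the lecture.

```python
from collections import Counter, defaultdict

# Toy training data (made up): (outlook, wind) -> class.
EXAMPLES = [
    (("sunny", "calm"), "play"),
    (("sunny", "windy"), "play"),
    (("rainy", "calm"), "play"),
    (("rainy", "windy"), "stay"),
    (("rainy", "windy"), "stay"),
    (("sunny", "windy"), "stay"),
]


def train(examples):
    """Maintain the counts from which the probabilities are estimated."""
    class_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)  # (class, position) -> value counts
    values = defaultdict(set)              # position -> values seen
    for features, label in examples:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def posterior(model, features):
    """P(class | observations) for every class, via Bayes' rule."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(features):
            # Independence assumption: multiply per-observation likelihoods.
            # Add-one smoothing keeps unseen values from zeroing the product.
            score *= (feature_counts[(label, i)][value] + 1) / (n + len(values[i]))
        scores[label] = score
    evidence = sum(scores.values())  # P(observations), the normaliser
    return {label: s / evidence for label, s in scores.items()}


def classify(model, features):
    post = posterior(model, features)
    return max(post, key=post.get)


if __name__ == "__main__":
    model = train(EXAMPLES)
    print(posterior(model, ("sunny", "calm")))
    print(classify(model, ("rainy", "windy")))
```

Training is a single counting pass and classification is a handful of multiplications per class, which is what makes the method cheap.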
Decision Tree Analysis
• Decision trees are used for classification problems, where leaves represent classes and branches represent the feature values that lead to those classes
• Decision trees are easy to use and quite powerful
• There are many statistical methods to build decision trees from observations
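One such statistical method is to split on the feature with the highest information gain, in the style of ID3. The sketch below is a minimal illustration under that assumption; the toy data and all names are made up, and it handles only categorical features with string class labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on `feature`."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [label for row, label in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    # Leaf: all observations agree, or there is nothing left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep], remaining
        )
    return (best, branches)  # internal node: feature tested, one branch per value


def classify(tree, row):
    # Follow branches until a leaf (a class label) is reached.
    # A feature value never seen in training raises KeyError.
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


ROWS = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
LABELS = ["play", "play", "play", "stay"]

if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS, ["outlook", "windy"])
    print(tree)
    print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```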