Patterson: Chap 1 A Review of Machine Learning Dr. Charles Tappert The information here, although greatly condensed, comes almost entirely from the chapter content.
This Chapter • Because the focus of this book is on Deep Learning, this first chapter presents only a rough review of the classical methods employed in machine learning • These classical methods are covered in more detail in the Duda textbook
The Learning Machines • Definition: Machine Learning is using algorithms to extract information from raw data and represent it in some type of model • Deep learning emerged around 2006, and deep learning systems are now winning the important machine learning competitions
The Learning Machines Biological Inspiration • Biological neural networks (brains) contain • Roughly 86 billion neurons • Over 500 trillion connections between neurons • Biological neural networks are much more complex than artificial neural networks (ANN) • Main properties of ANNs • Basic unit is the artificial neuron (node) • We can train ANNs to pass along only useful signals
The Learning Machines What is Deep Learning? • For the purposes of this book we define deep learning as neural networks with a large number of parameters and layers in one of four fundamental network architectures • Unsupervised pretrained networks • Convolutional neural networks • Recurrent neural networks • Recursive neural networks
The Learning Machines Going Down the Rabbit Hole • Deep learning has penetrated the computer science consciousness more deeply than most techniques in recent history • Top-flight accuracy with deep learning models • This initiates many philosophical discussions • Can machines be creative? What is creativity? • Can machines be as intelligent as humans?
Framing the Questions • The basics of machine learning are best understood by asking the correct questions • What is the input data? • What kind of model is best for the data? • What kind of answer would we like to elicit from new data based on this model?
Math Behind Machine Learning • Linear Algebra • Scalars, vectors, matrices, tensors, hyperplanes, solving systems of equations • Probability and Statistics • Conditional probabilities, Bayes Theorem, probability distributions • Students are expected to have the math background for this course
How Does Machine Learning Work? • Fundamentally, machine learning is based on algorithmic techniques to minimize the error in Ax = b through optimization where • A is a matrix of input row vectors • x is the weight vector • b is a column vector of output labels • Essentially, we want to determine x = A⁻¹b, but it is usually not possible to invert A
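Since A usually cannot be inverted, a standard alternative is to minimize the squared error by solving the normal equations (AᵀA)x = Aᵀb. A minimal sketch for a two-column A (bias column plus one feature column), with illustrative data:

```python
# Least-squares sketch: find x minimizing ||Ax - b|| for a tiny system
# by solving the 2x2 normal equations (A^T A) x = A^T b directly.
# The data rows are illustrative: b = 1 + 2 * feature exactly.

def lstsq_2col(A, b):
    """Closed-form least squares for a two-column matrix A."""
    m00 = sum(r[0] * r[0] for r in A)          # entries of A^T A
    m01 = sum(r[0] * r[1] for r in A)
    m11 = sum(r[1] * r[1] for r in A)
    v0 = sum(r[0] * y for r, y in zip(A, b))   # entries of A^T b
    v1 = sum(r[1] * y for r, y in zip(A, b))
    det = m00 * m11 - m01 * m01
    x0 = (m11 * v0 - m01 * v1) / det           # 2x2 inverse applied to A^T b
    x1 = (m00 * v1 - m01 * v0) / det
    return x0, x1

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
print(lstsq_2col(A, b))  # recovers the weights (1.0, 2.0)
```

In practice a library routine (e.g. a general least-squares solver) replaces the hand-rolled 2x2 algebra, but the objective being minimized is the same.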
How Does Machine Learning Work? Regression, especially Linear • Attempts to find a function that describes the relationship between input x and output y • For linear regression, y = a + Bx
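For a single feature, the coefficients of y = a + Bx have well-known closed forms: B = cov(x, y) / var(x) and a = mean(y) − B·mean(x). A minimal sketch with illustrative, slightly noisy data:

```python
# Linear-regression sketch: fit y = a + B*x by the closed-form
# least-squares formulas. The data points are illustrative.

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    B = cov / var          # slope
    a = my - B * mx        # intercept
    return a, B

a, B = fit_line([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.1, 6.9])
print(a, B)  # intercept near 1, slope near 2
```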
How Does Machine Learning Work? Classification • The model attempts to find classes based on a set of input features • The dependent variable y is categorical rather than numerical • Binary classifier is the most basic • For example, someone has a disease or not
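About the simplest possible binary classifier on a single numeric feature is a learned threshold rule; the sketch below places the threshold midway between the two class means. The feature values and labels are illustrative, not from the text:

```python
# Binary-classifier sketch for "disease or not" from one numeric feature.
# Training picks a threshold midway between the two class means;
# prediction compares a new value against it. Data are illustrative.

def train_threshold(values, labels):
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, value):
    return 1 if value >= threshold else 0

t = train_threshold([1.0, 2.0, 6.0, 7.0], [0, 0, 1, 1])
print(t)                 # threshold 4.0
print(predict(t, 5.5))   # 1 (categorical output, not numerical)
print(predict(t, 2.5))   # 0
```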
How Does Machine Learning Work? Clustering • Clustering is unsupervised learning that usually involves a distance measure and iteratively moves similar items more closely together • At the end of the process, the items are clustered densely around n centroids
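A tiny k-means run makes the loop concrete: assign each item to its nearest centroid, then move each centroid to the mean of its group, and repeat. This sketch uses 1-D points, absolute distance, and k = 2, all illustrative:

```python
# Clustering sketch: 1-D k-means with k = 2. Each iteration assigns
# points to the nearest centroid, then recomputes each centroid as
# the mean of its assigned points. Data and starts are illustrative.

def kmeans_1d(points, c0, c1, iters=10):
    for _ in range(iters):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return c0, c1

pts = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]
print(kmeans_1d(pts, 0.0, 10.0))  # centroids settle near 1.0 and 8.0
```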
How Does Machine Learning Work? Optimization • Parameter optimization is the process of adjusting weights to produce accurate estimates of the data • Convergence of an optimization algorithm finds the parameters providing the smallest error across the training samples • The optimization function guides the learning toward a solution of least error
How Does Machine Learning Work? Convex Optimization • Convex optimization deals with convex cost functions
How Does Machine Learning Work? Gradient Descent • The gradient is a vector of n partial derivatives of the function f, a generalization of the 1-D derivative • Problems: local minima and non-normalized features
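In one dimension the gradient reduces to the ordinary derivative, and the update rule is simply "step against the slope." A minimal sketch on the convex function f(x) = (x − 3)², with an illustrative step size and starting point:

```python
# Gradient-descent sketch on the convex function f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3). Because f is convex,
# there is a single minimum and no local-minima problem here.

def gradient_descent(x=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # derivative of (x - 3)^2
        x -= lr * grad          # move against the gradient
    return x

print(gradient_descent())  # converges to the minimum at x = 3
```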
How Does Machine Learning Work? Stochastic Gradient Descent (SGD) • Stochastic gradient descent calculates the gradient and updates the parameter vector after each training sample • Whereas gradient descent calculates the gradient and updates the parameter vector over all training samples • The SGD method speeds up learning • A variant of SGD, called mini-batch, uses more than a single training sample per iteration and leads to smoother convergence
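The per-sample update is the whole difference from batch gradient descent: the weight moves after every training example rather than after a full pass. A sketch fitting a one-weight model y = w·x on illustrative data (true w = 2):

```python
# SGD sketch: fit y = w*x by updating w after each training sample,
# in a shuffled order each epoch, rather than once per full pass
# as in batch gradient descent. Data are illustrative (true w = 2).
import random

def sgd(samples, w=0.0, lr=0.05, epochs=50, seed=0):
    rng = random.Random(seed)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                  # stochastic sample order
        for x, y in data:
            grad = 2.0 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad                 # update after THIS sample
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(sgd(data))  # w converges to 2.0
```

A mini-batch variant would average the gradient over a small group of samples before each update, trading some of SGD's speed for smoother convergence.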
How Does Machine Learning Work? Generative vs Discriminative Models • Two major model types – generative & discriminative • Generative models understand how the data were created in order to generate an output • These models generate likely output, such as art similar to that of a well-known artist • Discriminative models simply give us a classification or category for a given input • These models are typically used for classification in machine learning
Logistic Regression • Logistic regression is a well-known linear classification model • Handles binary classification as well as multiple labels • The dependent variable is categorical (e.g., classification) • We have three components to solve for our parameter vector x • A hypothesis about the data • A cost function, maximum likelihood estimation • An update function, derivative of the cost function
Logistic Regression The Logistic Function • The logistic function is defined as f(x) = 1 / (1 + e^(−x)) • This function is useful because it maps the input range of -infinity to +infinity into the output range 0 to 1, which can be interpreted as a probability
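The squashing behavior is easy to check numerically: the output is exactly 0.5 at x = 0 and approaches 1 and 0 for large positive and negative inputs:

```python
# Logistic-function sketch: maps any real input into (0, 1),
# so the output can be read as a probability.
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0.0))   # 0.5, the midpoint
print(logistic(6.0))   # close to 1
print(logistic(-6.0))  # close to 0
```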
Logistic Regression Understanding Logistic Regression Output • The logistic function is often denoted with the Greek letter sigma because the graph representation resembles an elongated “s” whose max and min asymptotically approach 1 and 0, respectively • f(x) represents the probability that y equals 1 (i.e., true)
Evaluating Models The Confusion Matrix • Various measures: e.g., Accuracy = (TP + TN) / (TP + TN + FP + FN)
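The accuracy formula above, along with two other standard confusion-matrix measures (precision and recall, included here for context), is a direct computation on the four cell counts. The counts below are illustrative:

```python
# Confusion-matrix sketch: accuracy from the slide's formula, plus the
# standard precision and recall measures. Counts are illustrative.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)   # of predicted positives, how many are right

def recall(tp, fn):
    return tp / (tp + fn)   # of actual positives, how many are found

print(accuracy(40, 45, 5, 10))  # 0.85
print(precision(40, 5))         # about 0.889
print(recall(40, 10))           # 0.8
```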
Building an Understanding of Machine Learning • In this chapter, we introduced the core concepts needed for practicing machine learning • The core mathematical concept of modeling is based around the equation Ax = b • We looked at the core ideas of getting features into the matrix A, ways to change the parameter vector x, and setting the outcomes in the vector b