This chapter provides a condensed overview of classical machine learning methods, neural network architectures, and optimization techniques. It covers deep learning, the biological inspiration for artificial neural networks, model types, and mathematical foundations, with a focus on framing questions, linear algebra, regression, classification, clustering, underfitting, overfitting, and optimization processes such as gradient descent and stochastic gradient descent. It also contrasts generative and discriminative models, both key to understanding machine learning fundamentals.
Patterson, Chap. 1: A Review of Machine Learning. Dr. Charles Tappert. The information here, although greatly condensed, comes almost entirely from the chapter content.
This Chapter • Because the focus of this book is on Deep Learning, this first chapter presents only a rough review of the classical methods employed in machine learning • These classical methods are covered in more detail in the Duda textbook
The Learning Machines • Definition: Machine learning is the use of algorithms to extract information from raw data and represent it in some type of model • Deep learning emerged around 2006, and deep learning systems now win the major machine learning competitions
The Learning Machines Biological Inspiration • Biological neural networks (brains) contain • Roughly 86 billion neurons • Over 500 trillion connections between neurons • Biological neural networks are much more complex than artificial neural networks (ANN) • Main properties of ANNs • Basic unit is the artificial neuron (node) • We can train ANNs to pass along only useful signals
The Learning Machines What is Deep Learning? • For the purposes of this book we define deep learning as neural networks with a large number of parameters and layers in one of four fundamental network architectures • Unsupervised pretrained networks • Convolutional neural networks • Recurrent neural networks • Recursive neural networks
The Learning Machines Going Down the Rabbit Hole • Deep learning has penetrated the computer science consciousness beyond most techniques in recent history • Top-flight accuracy with deep learning models • This initiates many philosophical discussions • Can machines be creative? What is creativity? • Can machines be as intelligent as humans?
Framing the Questions • The basics of machine learning are best understood by asking the correct questions • What is the input data? • What kind of model is best for the data? • What kind of answer would we like to elicit from new data based on this model?
Math Behind Machine Learning • Linear Algebra • Scalars, vectors, matrices, tensors, hyperplanes, solving systems of equations • Probability and Statistics • Conditional probabilities, Bayes Theorem, probability distributions • Students are expected to have the math background for this course
How Does Machine Learning Work? • Fundamentally, machine learning is based on algorithmic techniques to minimize the error in Ax = b through optimization where • A is a matrix of input row vectors • x is the weight vector • b is a column vector of output labels • Essentially, we want to determine x = A⁻¹b, but it is usually not possible to invert A
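As an illustration of minimizing the error in Ax = b when A cannot be inverted, the sketch below (with made-up data) uses NumPy's least-squares solver, which minimizes ||Ax − b||² rather than computing A⁻¹:

```python
import numpy as np

# Hypothetical data: 4 training samples (rows of A), 2 features each.
A = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
b = np.array([5.0, 4.0, 11.0, 10.0])  # column vector of output labels

# A is not square, so x = A^-1 b is undefined; least squares
# finds the x that minimizes the squared error ||Ax - b||^2.
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # the fitted weight vector
```

For this invented data the system happens to be exactly solvable, so the residual is zero; with real, noisy data the solver returns the best approximate x instead.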
How Does Machine Learning Work? Regression, Especially Linear • Attempts to find a function that describes the relationship between input x and output y • For linear regression, y = a + bx (intercept a, slope b)
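A minimal sketch of fitting y = a + bx to hypothetical 1D data with NumPy (the data values are invented for illustration, roughly following y = 3 + 2x):

```python
import numpy as np

# Invented 1D data roughly following y = 3 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# polyfit with degree 1 performs linear least squares and returns
# the coefficients highest degree first: [slope b, intercept a].
b_hat, a_hat = np.polyfit(x, y, 1)
print(a_hat, b_hat)  # fitted intercept and slope
```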
How Does Machine Learning Work? Classification • The model attempts to find classes based on a set of input features • The dependent variable y is categorical rather than numerical • Binary classifier is the most basic • For example, someone has a disease or not
How Does Machine Learning Work? Clustering • Clustering is unsupervised learning that usually involves a distance measure and iteratively moves similar items more closely together • At the end of the process, the items are clustered densely around n centroids
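The iterative move-toward-centroids process described above can be sketched as a minimal k-means implementation in NumPy; the two well-separated "blobs" of points below are invented for illustration:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # distance of every point to every centroid, shape (n_points, k)
        d = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated invented blobs of points
pts = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(pts, 2)
```

On this toy data the two centroids settle near the centers of the two blobs; real uses would also need a convergence check and better initialization (e.g., k-means++).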
How Does Machine Learning Work? Optimization • Parameter optimization is the process of adjusting weights to produce accurate estimates of the data • Convergence of an optimization algorithm finds the parameters providing the smallest error across the training samples • The optimization function guides the learning toward a solution of least error
How Does Machine Learning Work? Convex Optimization • Convex optimization deals with convex cost functions, for which any local minimum is also the global minimum
How Does Machine Learning Work? Gradient Descent • The gradient is a vector of the n partial derivatives of the function f, a generalization of the 1D derivative • Problems: local minima and non-normalized features
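The descent along the negative gradient can be sketched on a simple convex cost; the function, starting point, and learning rate here are arbitrary choices for illustration:

```python
import numpy as np

def grad_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Convex cost f(x, y) = (x - 3)^2 + (y + 1)^2, minimum at (3, -1).
grad = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
x_min = grad_descent(grad, [0.0, 0.0])
print(x_min)  # converges toward (3, -1)
```

Because this cost is convex, the fixed learning rate is enough; on non-convex costs the same loop can stall in a local minimum, which is one of the problems noted above.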
How Does Machine Learning Work? Stochastic Gradient Descent (SGD) • Stochastic gradient descent calculates the gradient and updates the parameter vector after each training sample, whereas gradient descent calculates the gradient and updates the parameter vector over all training samples • The SGD method speeds up learning • A variant of SGD, called mini-batch, uses more than a single training sample per iteration and leads to smoother convergence
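A minimal mini-batch SGD sketch for linear regression; the data are invented, generated noise-free from known weights so the loop has a clear target:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented data generated from y = 2*x1 + 3*x2 (no noise).
A = rng.normal(size=(200, 2))
b = A @ np.array([2.0, 3.0])

x = np.zeros(2)          # parameter vector
lr, batch = 0.1, 16      # learning rate and mini-batch size
for epoch in range(50):
    idx = rng.permutation(len(A))      # shuffle samples each epoch
    for start in range(0, len(A), batch):
        i = idx[start:start + batch]
        err = A[i] @ x - b[i]               # residual on this mini-batch
        x -= lr * A[i].T @ err / len(i)     # gradient of the mean squared error
print(x)
```

Each inner iteration updates x from only 16 samples, so updates are cheap and frequent; averaging over the mini-batch is what smooths the convergence relative to single-sample SGD.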
How Does Machine Learning Work? Generative vs Discriminative Models • Two major model types – generative & discriminative • Generative models understand how the data were created in order to generate an output • These models generate likely output, such as art similar to that of a well-known artist • Discriminative models simply give us a classification or category for a given input • These models are typically used for classification in machine learning
Logistic Regression • Logistic regression is a well-known linear classification model • Handles binary classification as well as multiple labels • The dependent variable is categorical (e.g., classification) • We have three components to solve for our parameter vector x • A hypothesis about the data • A cost function, maximum likelihood estimation • An update function, derivative of the cost function
Logistic Regression: The Logistic Function • The logistic function is defined as f(x) = 1 / (1 + e^(−x)) • This function is useful because it maps the input range of −infinity to +infinity into the output range 0 to 1, which can be interpreted as a probability
Logistic Regression Understanding Logistic Regression Output • The logistic function is often denoted with the Greek letter sigma because the graph representation resembles an elongated “s” whose max and min asymptotically approach 1 and 0, respectively • f(x) represents the probability that y equals 1 (i.e., true)
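The logistic function described above can be sketched in a few lines:

```python
import math

def logistic(x):
    """Map any real input into (0, 1): f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0))    # 0.5: the decision boundary between the two classes
print(logistic(6))    # close to 1, interpreted as P(y = 1) near certainty
print(logistic(-6))   # close to 0
```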
Evaluating Models: The Confusion Matrix • Various measures: e.g., Accuracy = (TP + TN) / (TP + TN + FP + FN)
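The accuracy formula above can be sketched with a small helper that tallies the four confusion-matrix counts; the label lists are invented for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for a binary classifier with 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Invented true labels and predictions for six samples.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)
```

The same four counts also give the other common measures, e.g. precision = TP / (TP + FP) and recall = TP / (TP + FN).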
Building an Understanding of Machine Learning • In this chapter, we introduced the core concepts needed for practicing machine learning • The core mathematical concept of modeling is based around the equation Ax = b • We looked at the core ideas of getting features into the matrix A, ways to change the parameter vector x, and setting the outcomes in the vector b