6. Understanding Big Data Analysis with Machine Learning

2015.4.30, University of Seoul, School of Electrical and Computer Engineering, Artificial Intelligence Lab, Kim Yu-sang (김유상)

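Introduction to machine learning • What is machine learning? • Example applications • Spam email detection • Autonomous driving • Speech recognition • Face recognition • Detection of anomalous online activity, etc. • Related applications/frameworks • R • Python • Apache Mahout • Weka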

Supervised machine-learning algorithms • Linear regression • Logistic regression

Linear regression • Regression can be formulated as follows • The slope of the regression line • The intercept point of regression (y-intercept) • Applications of linear regression • Sales forecasting • Product price optimization • Predicting the next online purchase from various data sources and events
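For reference, simple linear regression and its least-squares slope and intercept are commonly written as shown below. The symbol names (a for the intercept, b for the slope, e for the error term, N for the number of observations) are assumptions chosen for this sketch and may differ from the notation on the original slide.

```latex
y = a + b\,x + e, \qquad
b = \frac{N\sum xy - \sum x \sum y}{N\sum x^{2} - \left(\sum x\right)^{2}}, \qquad
a = \frac{\sum y - b\sum x}{N}
```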

Linear regression with R • train_data

Linear regression with R

Linear regression with R and Hadoop

Linear regression with R and Hadoop • Calculating the Xtx value with MapReduce job1. • Calculating the Xty value with MapReduce job2. • Deriving the coefficient values with Solve (Xtx, Xty).
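The three steps above can be sketched outside Hadoop as follows. This is a minimal illustration in plain Python rather than the R and Hadoop code used in the presentation: the function names (`xtx_map`, `xty_map`, `solve`) and the toy data are hypothetical, and the "jobs" are simulated with `map` and `reduce` over in-memory chunks. Each chunk yields a partial X^T X and X^T y, the partial results are summed, and the small final system is solved locally.

```python
from functools import reduce


def xtx_map(chunk):
    """Job 1 map step: partial X^T X for one chunk of (features, target) rows."""
    p = len(chunk[0][0])
    out = [[0.0] * p for _ in range(p)]
    for x, _y in chunk:
        for i in range(p):
            for j in range(p):
                out[i][j] += x[i] * x[j]
    return out


def xty_map(chunk):
    """Job 2 map step: partial X^T y for one chunk of (features, target) rows."""
    p = len(chunk[0][0])
    out = [0.0] * p
    for x, y in chunk:
        for i in range(p):
            out[i] += x[i] * y
    return out


def add_matrices(a, b):
    """Reduce step for job 1: element-wise sum of two partial matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def add_vectors(a, b):
    """Reduce step for job 2: element-wise sum of two partial vectors."""
    return [u + v for u, v in zip(a, b)]


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (aug[r][n] - known) / aug[r][r]
    return beta


# Toy data generated from y = 1 + 2x; each row is ([1, x], y) so that
# the first coefficient plays the role of the intercept.
rows = [([1.0, float(x)], 1.0 + 2.0 * x) for x in range(6)]
chunks = [rows[0:2], rows[2:4], rows[4:6]]  # stand-in for input splits

xtx = reduce(add_matrices, map(xtx_map, chunks))
xty = reduce(add_vectors, map(xty_map, chunks))
beta = solve(xtx, xty)  # [intercept, slope]
```

The point of the split is that X^T X and X^T y are small (p by p and p by 1) no matter how many rows the data has, so only the summation needs to be distributed.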

Calculating the Xtx value with MapReduce job1

Calculating the Xty value with MapReduce job2

Deriving the coefficient values with Solve (Xtx, Xty)

Logistic regression • To predict the log odds ratios, use the following formula: • The probability formula is as follows • Applications of logistic regression • Predicting the likelihood of an online purchase • Diagnosing whether a patient has diabetes
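For reference, the log odds and the corresponding probability in logistic regression are commonly written as shown below. The coefficient names (β₀ … βₙ) and predictor names (x₁ … xₙ) are assumptions for this sketch and may not match the notation on the original slide.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n,
\qquad
p = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}}
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```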

Logistic regression with R

Logistic regression with R and Hadoop • Defining the lr.map Mapper function • Defining the lr.reducer Reducer function • Defining the logistic.regressionMapReduce function
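A mapper/reducer/driver split of this kind can be sketched as follows. This is an illustrative plain-Python version, not the R code from the presentation: the names `lr_map`, `lr_reduce`, and `logistic_regression` only echo the slide's function names, the labels are assumed to be coded as -1 and +1, and the learning rate, iteration count, and toy data are arbitrary. The mapper emits one record's contribution to the log-likelihood gradient, the reducer sums the contributions, and the driver repeats the job and updates the weights by gradient ascent.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lr_map(record, weights):
    """Mapper: gradient contribution of one record (label y is -1 or +1)."""
    y, x = record
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    scale = y * sigmoid(-margin)
    return [scale * xi for xi in x]


def lr_reduce(a, b):
    """Reducer: sum two partial gradients element-wise."""
    return [u + v for u, v in zip(a, b)]


def logistic_regression(records, iterations, alpha):
    """Driver: run one simulated map/reduce pass per gradient-ascent step."""
    weights = [0.0] * len(records[0][1])
    for _ in range(iterations):
        gradient = reduce(lr_reduce, (lr_map(r, weights) for r in records))
        weights = [w + alpha * g for w, g in zip(weights, gradient)]
    return weights


def predict_probability(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Toy data: (label, [1, feature]); the leading 1 gives an intercept term.
records = [
    (-1, [1.0, 0.0]),
    (-1, [1.0, 1.0]),
    (1, [1.0, 3.0]),
    (1, [1.0, 4.0]),
]
weights = logistic_regression(records, iterations=200, alpha=0.1)
p_low = predict_probability(weights, [1.0, 0.0])
p_high = predict_probability(weights, [1.0, 4.0])
```

Because the weights change after every pass, each iteration corresponds to a separate MapReduce job, which is why iterative methods like this are comparatively expensive on Hadoop.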

Logistic regression with R and Hadoop • foodstamp : Food-Stamp Program

Logistic regression with R and Hadoop

Unsupervised machine learning algorithms • Clustering • Artificial neural networks • Vector quantization

Clustering • Clustering is the task of grouping a set of objects so that objects with similar characteristics fall into the same category. • Clustering techniques available in R • K-means • K-medoids • Hierarchical • Density-based • Applications of clustering • Market segmentation • Social network analysis • Organizing computer networks • Astronomical data analysis

Clustering with R

Performing clustering with R and Hadoop • Defining the dist.fun distance function • Defining the k-means.map k-means Mapper function • Defining the k-means.reduce k-means Reducer function • Defining the k-means.mr k-means MapReduce function • Defining input data points to be provided to the clustering algorithms
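The roles of these functions can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the R code from the presentation: the names `dist_fun`, `kmeans_map`, `kmeans_reduce`, and `kmeans_mr` only mirror the slide's function names, Euclidean distance is assumed, and the shuffle phase is simulated with a dictionary. The mapper assigns each point to its nearest center, the reducer averages the points that share a key, and the driver repeats the job with the updated centers.

```python
import math
from collections import defaultdict


def dist_fun(center, point):
    """Euclidean distance between a center and a point."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(center, point)))


def kmeans_map(point, centers):
    """Mapper: emit (index of nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist_fun(centers[i], point))
    return nearest, point


def kmeans_reduce(points):
    """Reducer: new center is the coordinate-wise mean of the assigned points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def kmeans_mr(points, centers, iterations):
    """Driver: one simulated map/shuffle/reduce pass per iteration."""
    for _ in range(iterations):
        groups = defaultdict(list)  # stand-in for the shuffle phase
        for point in points:
            key, value = kmeans_map(point, centers)
            groups[key].append(value)
        # A center that attracted no points is kept unchanged.
        centers = [
            kmeans_reduce(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))
        ]
    return centers


# Input data points: two well-separated groups.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
centers = kmeans_mr(points, centers=[[0.0, 0.0], [10.0, 10.0]], iterations=5)
```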

Performing clustering with R and Hadoop

Performing clustering with R and Hadoop • An error occurred while running kmeans.mr • Checked the Hadoop log • The apply function could not find dim(X) • Presumed cause: colSums returns a vector, whereas apply requires a matrix

Performing clustering with R and Hadoop • Result (as given in the book)

Recommendation algorithms • User-based recommendations • Recommends items to a user based on the preferences of similar users • Item-based recommendations • Recommends items that are similar to the items the user already prefers

Steps to generate recommendations in R • Computing the co-occurrence matrix. • Establishing the user-scoring matrix. • Generating recommendations.
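The three steps above can be sketched as follows. This is a plain-Python illustration rather than the R code from the presentation: the function names, the (user, item, preference) record layout, and the toy ratings are all hypothetical. The co-occurrence count for a pair of items is the number of users who rated both, a user's scoring vector holds that user's preferences, and the recommendation score of an unseen item is the co-occurrence-weighted sum of the user's preferences.

```python
from collections import defaultdict
from itertools import product


def cooccurrence(ratings):
    """Step 1: count, for every item pair, how many users rated both items."""
    items_by_user = defaultdict(set)
    for user, item, _pref in ratings:
        items_by_user[user].add(item)
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in product(items, repeat=2):
            counts[(a, b)] += 1
    return counts


def user_scores(ratings, user):
    """Step 2: the user's scoring vector as an item -> preference mapping."""
    return {item: pref for u, item, pref in ratings if u == user}


def recommend(ratings, user, top_n=2):
    """Step 3: score unseen items by co-occurrence times preference."""
    counts = cooccurrence(ratings)
    scores = user_scores(ratings, user)
    all_items = {item for _u, item, _p in ratings}
    totals = {}
    for candidate in all_items - set(scores):
        totals[candidate] = sum(
            counts[(candidate, seen)] * pref for seen, pref in scores.items()
        )
    # Highest score first; ties broken by item name for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical ratings in (user, item, preference) form.
ratings = [
    ("u1", "A", 5.0), ("u1", "B", 3.0),
    ("u2", "A", 4.0), ("u2", "B", 2.0), ("u2", "C", 5.0),
    ("u3", "B", 4.0), ("u3", "C", 3.0), ("u3", "D", 2.0),
]
recommendations = recommend(ratings, "u1")
```

In this toy data, item C co-occurs with both of u1's rated items while D co-occurs only with B, so C is ranked first.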

Steps to generate recommendations in R • small.csv

Computing the co-occurrence matrix

Establishing the user-scoring matrix

Generating recommendations

Generating recommendations with R and Hadoop • Establishing the co-occurrence matrix items. • Establishing the user scoring matrix to articles. • Generating recommendations.

Establishing the co-occurrence matrix items

Establishing the user scoring matrix to articles

Generating recommendations

Performing clustering with R and Hadoop • An error occurred while running cal.mr • Checked the Hadoop log
