
6. Understanding Big Data Analysis with Machine Learning

Explore the fundamentals of machine learning in analyzing big data. Learn linear and logistic regression, clustering, and recommendations using R and Hadoop. Practical examples and application frameworks included.

byrdwilliam


Presentation Transcript


  1. 2015.4.30 • University of Seoul, School of Electrical and Computer Engineering, Artificial Intelligence Laboratory • Yusang Kim • 6. Understanding Big Data Analysis with Machine Learning

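  2. Introduction to machine learning • What is machine learning? • Example applications • Spam email detection • Autonomous driving • Speech recognition • Face recognition • Detection of anomalous online activity, etc. • Related applications/frameworks • R • Python • Apache Mahout • Weka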

  3. Supervised machine-learning algorithms • Linear regression • Logistic regression

  4. Linear regression • Regression can be formulated as follows • The slope of the regression line • The intercept point of regression (the y-intercept) • Applications of linear regression • Sales forecasting • Product price optimization • Predicting the next online purchase from various data and campaigns
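
The formulas on this slide did not survive in the transcript. For reference, the standard least-squares forms for a single predictor are sketched below. The symbols $a$ (slope), $b$ (intercept), $\varepsilon$ (error term) and $N$ (number of observations) are assumed notation and may differ from what the slide used.

```latex
\begin{aligned}
y &= a x + b + \varepsilon \\
a &= \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \\
b &= \frac{\sum_i y_i - a \sum_i x_i}{N}
\end{aligned}
```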

  5. Linear regression with R • train_data

  6. Linear regression with R

  7. Linear regression with R

  8. Linear regression with R and Hadoop

  9. Linear regression with R and Hadoop • Calculating the Xtx value with MapReduce job1. • Calculating the Xty value with MapReduce job2. • Deriving the coefficient values with Solve (Xtx, Xty).
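
The three steps above can be sketched in a few lines. The R and rmr2 code from the slides is not preserved in the transcript, so this is a plain-Python stand-in rather than the presenter's implementation: the helper names (`map_xtx`, `map_xty`, `solve`) and the toy data are hypothetical. The point it illustrates is that both X^T X and X^T y are sums over rows, so each chunk of data can be processed independently and the partial results added together before one small linear system is solved.

```python
from functools import reduce

def map_xtx(chunk_x):
    """Job 1 mapper: partial X^T X for one chunk of rows."""
    p = len(chunk_x[0])
    return [[sum(row[i] * row[j] for row in chunk_x) for j in range(p)]
            for i in range(p)]

def map_xty(chunk_x, chunk_y):
    """Job 2 mapper: partial X^T y for one chunk of rows."""
    p = len(chunk_x[0])
    return [sum(row[i] * y for row, y in zip(chunk_x, chunk_y)) for i in range(p)]

def add_matrices(a, b):
    """Reducer for job 1: element-wise sum of two partial matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def add_vectors(a, b):
    """Reducer for job 2: element-wise sum of two partial vectors."""
    return [u + v for u, v in zip(a, b)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Hypothetical toy data generated from y = 2 + 3x; each row is [1, x].
X = [[1.0, float(x)] for x in range(6)]
y = [2.0 + 3.0 * x for x in range(6)]
chunks = [(X[:3], y[:3]), (X[3:], y[3:])]  # two "input splits"

xtx = reduce(add_matrices, (map_xtx(cx) for cx, _ in chunks))
xty = reduce(add_vectors, (map_xty(cx, cy) for cx, cy in chunks))
coefficients = solve(xtx, xty)  # [intercept, slope]
```

Only the small p-by-p matrix and length-p vector leave the cluster, which is why the final solve can run on a single machine.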

  10. Calculating the Xtx value with MapReduce job1

  11. Calculating the Xty value with MapReduce job2

  12. Deriving the coefficient values with Solve (Xtx, Xty)

  13. Logistic regression • To predict the log odds ratios, use the following formula: • The probability formula is as follows • Applications of logistic regression • Predicting the likelihood of an online purchase • Diagnosing whether a patient has diabetes
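
The two formulas on this slide are missing from the transcript. The standard forms are sketched below for reference. The coefficient names $\beta_0, \dots, \beta_n$ and predictors $x_1, \dots, x_n$ are assumed notation and may not match the slide.

```latex
\begin{aligned}
\operatorname{logit}(p) &= \ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n \\
p &= \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}}
   = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
\end{aligned}
```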

  14. Logistic regression with R

  15. Logistic regression with R and Hadoop • Defining the lr.map Mapper function • Defining the lr.reducer Reducer function • Defining the logistic.regressionMapReduce function
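
The division of work among the three functions named above might look roughly like the following. This is a plain-Python sketch, not the rmr2 code from the slides (which the transcript does not preserve): the function names only echo the slide titles, and the 0/1 label encoding, learning rate `alpha`, iteration count and toy data are all assumptions. The mapper computes a gradient contribution for its chunk, the reducer sums the contributions, and the driver repeats the job while updating the weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_map(chunk, weights):
    """Mapper: gradient of the log-likelihood over one chunk of (features, label) rows."""
    grad = [0.0] * len(weights)
    for features, label in chunk:
        p = sigmoid(sum(w * f for w, f in zip(weights, features)))
        for j, f in enumerate(features):
            grad[j] += (label - p) * f
    return grad

def lr_reduce(partial_gradients):
    """Reducer: add up the per-chunk gradients component by component."""
    return [sum(column) for column in zip(*partial_gradients)]

def logistic_regression_mapreduce(chunks, n_features, iterations=2000, alpha=0.05):
    """Driver: one map/reduce pass per gradient-ascent step."""
    weights = [0.0] * n_features
    for _ in range(iterations):
        grad = lr_reduce([lr_map(chunk, weights) for chunk in chunks])
        weights = [w + alpha * g for w, g in zip(weights, grad)]
    return weights

def predict(weights, features):
    """Predicted probability of the positive class."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

# Hypothetical toy data: features are [1, x] (intercept term plus one predictor).
rows = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1),
        ([1.0, 3.0], 0), ([1.0, 4.0], 1), ([1.0, 5.0], 1)]
chunks = [rows[:3], rows[3:]]  # two "input splits"

weights = logistic_regression_mapreduce(chunks, n_features=2)
```

Because the gradient is a sum over rows, the per-chunk results can be combined in any order, which is what makes the reducer a simple addition.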

  16. Logistic regression with R and Hadoop • foodstamp : Food-Stamp Program

  17. Logistic regression with R and Hadoop

  18. Logistic regression with R and Hadoop

  19. Logistic regression with R and Hadoop

  20. Unsupervised machine learning algorithms • Clustering • Artificial neural networks • Vector quantization

  21. Clustering • Clustering is the task of grouping a set of objects so that objects with similar characteristics fall into the same category. • Clustering techniques available in R • K-means • K-medoids • Hierarchical • Density-based • Applications of clustering • Market segmentation • Social network analysis • Organizing computer networks • Astronomical data analysis

  22. Clustering with R

  23. Performing clustering with R and Hadoop • Defining the dist.fun distance function • Defining the k-means.map k-means Mapper function • Defining the k-means.reduce k-means Reducer function • Defining the k-means.mr k-means MapReduce function • Defining input data points to be provided to the clustering algorithms
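
The roles of the functions listed above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the R code from the slides: the names mirror the slide titles, while the Euclidean distance, the fixed iteration count, the initial centers and the sample points are hypothetical choices.

```python
import math
from collections import defaultdict

def dist_fun(point, center):
    """Euclidean distance between a point and a cluster center."""
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))

def kmeans_map(points, centers):
    """Mapper: emit (index of nearest center, point) for every point."""
    return [(min(range(len(centers)), key=lambda i: dist_fun(p, centers[i])), p)
            for p in points]

def kmeans_reduce(pairs):
    """Reducer: average the points that share a center index."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    return {key: [sum(col) / len(pts) for col in zip(*pts)]
            for key, pts in groups.items()}

def kmeans_mr(points, centers, iterations=10):
    """Driver: repeat the map/reduce pass, keeping a center if no point chose it."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        updated = kmeans_reduce(kmeans_map(points, centers))
        centers = [updated.get(i, centers[i]) for i in range(len(centers))]
    return centers

# Hypothetical input: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers = kmeans_mr(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each pass needs only the current centers as shared state, so the mapper can run on every data split in parallel.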

  24. Performing clustering with R and Hadoop

  25. Performing clustering with R and Hadoop

  26. Performing clustering with R and Hadoop

  27. Performing clustering with R and Hadoop

  28. Performing clustering with R and Hadoop • An error occurred while running kmeans.mr • Checked the Hadoop logs • The apply function could not find dim(X) • Presumed cause: colSums returns a vector, whereas apply requires a matrix

  29. Performing clustering with R and Hadoop • Result (as shown in the book)

  30. Recommendation algorithms • User-based recommendations • Recommend items based on the preferences of similar users • Item-based recommendations • Recommend items similar to those the user already prefers

  31. Steps to generate recommendations in R • Computing the co-occurrence matrix. • Establishing the user-scoring matrix. • Generating recommendations.
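
The three steps can be sketched as below. This is a plain-Python illustration rather than the R code from the slides, which the transcript does not preserve. The (user, item, preference) layout, the sample ratings and the function names are hypothetical, and scoring a candidate as "co-occurrence count times the user's preference, summed over the user's items" is one common formulation that is assumed here.

```python
from collections import defaultdict
from itertools import product

# Hypothetical ratings: (user, item, preference).
ratings = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "B", 2.0), (2, "C", 5.0),
    (3, "B", 4.0), (3, "C", 3.0),
]

def cooccurrence(ratings):
    """Step 1: count how many users have both item a and item b."""
    items_by_user = defaultdict(set)
    for user, item, _pref in ratings:
        items_by_user[user].add(item)
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in product(items, repeat=2):
            counts[(a, b)] += 1
    return counts

def user_scores(ratings):
    """Step 2: user-scoring matrix stored as {user: {item: preference}}."""
    scores = defaultdict(dict)
    for user, item, pref in ratings:
        scores[user][item] = pref
    return scores

def recommend(user, counts, scores, top_n=2):
    """Step 3: score unseen items by co-occurrence weighted by the user's preferences."""
    seen = scores[user]
    all_items = {a for a, _b in counts}
    totals = {
        cand: sum(counts.get((cand, item), 0) * pref for item, pref in seen.items())
        for cand in all_items if cand not in seen
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

counts = cooccurrence(ratings)
scores = user_scores(ratings)
recommendations = recommend(1, counts, scores)
```

For user 1, item C co-occurs once with A and twice with B, so its score is 1 × 5.0 + 2 × 3.0 = 11.0.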

  32. Steps to generate recommendations in R • small.csv

  33. Computing the co-occurrence matrix

  34. Computing the co-occurrence matrix

  35. Establishing the user-scoring matrix

  36. Generating recommendations

  37. Generating recommendations with R and Hadoop • Establishing the co-occurrence matrix items. • Establishing the user scoring matrix to articles. • Generating recommendations.
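
On Hadoop, the co-occurrence step is typically expressed as chained jobs over key-value pairs. The sketch below simulates that in memory with plain Python. It is not the rmr2 code from the slides: `run_mapreduce` is a hypothetical stand-in for a real MapReduce runner, and the mapper and reducer names and the sample ratings are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return [reducer(key, values) for key, values in shuffled.items()]

# Job 1: collect the items each user has rated.
def by_user_map(record):
    user, item, _pref = record
    yield user, item

def by_user_reduce(user, items):
    return user, sorted(set(items))

# Job 2: emit every item pair within a user's list, then count the pairs.
def pair_map(record):
    _user, items = record
    for a, b in product(items, repeat=2):
        yield (a, b), 1

def pair_reduce(pair, ones):
    return pair, sum(ones)

# Hypothetical ratings: (user, item, preference).
ratings = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "B", 2.0), (2, "C", 5.0),
    (3, "B", 4.0), (3, "C", 3.0),
]

user_items = run_mapreduce(ratings, by_user_map, by_user_reduce)
cooc = dict(run_mapreduce(user_items, pair_map, pair_reduce))
```

The output of the first job is the input of the second, which is the usual way to express a multi-stage computation when each stage needs a different grouping key.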

  38. Establishing the co-occurrence matrix items

  39. Establishing the co-occurrence matrix items

  40. Establishing the user scoring matrix to articles

  41. Generating recommendations

  42. Performing clustering with R and Hadoop • An error occurred while running cal.mr • Checked the Hadoop logs
