
COMP4332/RMBI4310

COMP4332/RMBI4310: Summary. Prepared by Raymond Wong. Presented by Raymond Wong. This course was originally proposed to involve several projects on real data, which is why you need to deal with real data. In the past, there was only one deadline for each project.


Presentation Transcript


  1. COMP4332/RMBI4310: Summary. Prepared by Raymond Wong. Presented by Raymond Wong.

  2. This course was originally proposed to involve several projects on real data. This is the reason why you need to deal with real data.

  3. In the past, there was only one deadline for each project. This year, we have 6 phases for one project, which can help distribute your workload (if you strictly follow our suggested guideline for each phase).

  4. In this course, we have gone through the following processes for “Data Analytics”:

  5. [Diagram] Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results

  6. [Diagram] Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results

  7. Data Collection (Raw Data → Collected Data)
     • Relational data (e.g., purchase records and transaction records) is stored using “traditional” relational database management systems. Such a system can be manipulated with a database programming language called SQL (Structured Query Language).
     • Non-relational data (e.g., webpages and social network data) is stored using “new” non-relational database management systems, commonly called NoSQL (Not Only SQL) systems, each of which is manipulated through its own query interface.
     • We know how to “access” a text file (e.g., file reading). We can also “access” webpages (i.e., data crawling).
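
To make the relational side of this slide concrete, here is a minimal sketch of manipulating purchase records with SQL from Python. It assumes the standard-library `sqlite3` module and an in-memory database. The table name `purchase`, its columns, the helper `total_spent_per_customer`, and the sample records are all hypothetical and are not taken from the course material.

```python
import sqlite3


def total_spent_per_customer(rows):
    """Load (customer, item, amount) records into an in-memory SQLite
    database and total the amount per customer using SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, chosen only for this illustration.
        conn.execute(
            "CREATE TABLE purchase (customer TEXT, item TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM purchase "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up purchase records for illustration.
records = [("alice", "book", 12.5), ("bob", "pen", 1.5), ("alice", "bag", 30.0)]
totals = total_spent_per_customer(records)
print(totals)
```

The same `SELECT ... GROUP BY` statement would work against a server-based relational database. Only the connection step would differ.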

  8. [Diagram] Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results

  9. Data Processing (Collected Data → Processed Data): We have to transform the collected data and extract it into the “correct” form so that it can be used by the data mining models in the next process.
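
One possible reading of putting data into the “correct” form is sketched below: parse the collected text, drop incomplete records, and rescale each numeric column to the range [0, 1]. This is an illustration under assumptions, not the course's prescribed procedure. The column names `age` and `income`, the helper `process`, and the sample data are hypothetical.

```python
import csv
import io


def process(raw_text):
    """Parse CSV text, skip rows with missing fields, and min-max scale
    each column to [0, 1]. Returns a list of feature rows."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for record in reader:
        values = [record["age"], record["income"]]
        # DictReader yields None for absent fields and "" for empty ones.
        if any(v is None or v.strip() == "" for v in values):
            continue
        rows.append([float(v) for v in values])
    if not rows:
        return []
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append(
            [(x - low) / span if span else 0.0 for x in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


# Made-up collected data; the second record is missing its income.
raw = "age,income\n20,1000\n30,\n40,3000\n"
features = process(raw)
print(features)
```

Whether to drop or fill in incomplete records depends on the data mining task. Dropping is used here only because it is the simplest choice to show.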

  10. [Diagram] Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results

  11. Data Mining (Processed Data → Data Mining Results): We have to define some “data mining” models to perform some “data mining” tasks. We can call many existing libraries to complete these tasks.

  12. [Diagram] Raw Data → Data Collection → Collected Data → Data Processing → Processed Data → Data Mining → Data Mining Results → Result Presenting → Presentable Forms of Data Mining Results

  13. Result Presenting (Data Mining Results → Presentable Forms of Data Mining Results): We have to present the data mining results in a “readable” and “presentable” form. Some data mining results can be presented directly. Others are better presented using existing visualization libraries.

  14. In this course, we will learn the following.
     • Data Collection
       • Data Crawling
       • SQL
       • NoSQL
     • Data Processing
       • Python Libraries
     • Data Mining
       • Data Mining Models
       • Keras (on TensorFlow)
     • Result Presenting
       • Matplotlib
     • Distributed Data Management
       • Databricks (using Spark)
     Other courses shown on this slide:
     • COMP 5211: Advanced Artificial Intelligence
     • COMP 5331: Knowledge Discovery in Database
     • COMP 5212: Machine Learning
     • COMP 5213: Introduction to Bayesian Networks
     • COMP 5221: Natural Language Processing
     • COMP4462: Data Visualization
     • COMP4651: Cloud Computing and Big Data Systems

  15. Benefits of Studying this Course
     Python with Data Analytical Libraries
     • Data Scientist
     • Data Analyst
     • Machine Learning Expert (or Deep Learning Expert)
     • FinTech Analyst

  16. This is not only relevant to “Computer Science”-related jobs. It is also relevant to credit risk units/departments in some well-known companies in Hong Kong. Many people in the industry (including the Big Four accounting firms) also need these skills now. Many interviews ask this kind of question.

  17. For RMBI students,
     • I attended an RMBI Alumni Reunion last Friday (4 May).
     • Many RMBI alumni who declared the RM option regret not taking more BI-related courses during their undergraduate RMBI program.
     • This is because there is a lot of hands-on experience that can be learnt “effectively” in the university with a professor’s teaching.
     • They did not see the “importance” when they were studying in the undergraduate program.
     • Currently, they have to learn the skills by themselves, with difficulty.

  18. Final Issues
     • 2 Assignments
       • Enough? With a lot of in-class exercises.
     • 1 Project
       • Difficult? A “simplified” version of a real-life problem.

  19. Final Issues
     • In-class Participation
       • Enough? Please continue!

  20. One final issue
     • Work hard for your exam!
