Data Science

Data Science. Dianna Xu, Bryn Mawr College. The Course: 300-level Computer Science elective; CS majors and minors; Pre-reqs: CS1, CS2 (Data Structures), Discrete Math and Linear Algebra; Unstructured data and explorative data analysis.

Presentation Transcript


  1. Data Science Dianna Xu Bryn Mawr College

  2. The Course • 300-level Computer Science elective • CS majors and minors • Pre-reqs: CS1, CS2 (Data Structures), Discrete Math and Linear Algebra • Unstructured data and explorative data analysis • Iterative processes that require programming skills and knowledge of algorithms

  3. 200-level Data Visualization Course • Taught Spring 2014 at Haverford • Stats basics, linear regression • Clustering • Baby network analysis – PageRank • Visualization with D3 • 50% overlap of students
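The slides do not show how PageRank was presented in that course. As a rough illustration only, the following is a minimal power-iteration sketch in plain Python. The function name `pagerank`, the damping factor of 0.85, and the four-node toy graph are all assumptions made for this example, not material from the course.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping node -> list of out-links."""
    # Collect every node that appears as a source or as a link target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its damped rank on in equal shares to its out-links.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)

        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: D links to C, but nothing links to D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph, C collects links from three pages and ends up with the largest score, while D, which has no incoming links, keeps only the baseline share.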

  4. Topics • Statistical methods (2-3 weeks) • basics, regression analysis, Bayesian methods • Machine learning background (2-3 weeks) • multivariate regression and logistic regression • Dimensionality reduction (3 weeks) • PCA, SVD, Kernel PCA, other non-linear methods • Topological data analysis (2-3 weeks) • manifold learning, intro to TDA • Network analysis (2-3 weeks) • collaborative filtering, community detection
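The dimensionality reduction unit lists PCA and SVD together. One common way to connect the two is to compute PCA from the SVD of the mean-centered data matrix. The sketch below assumes NumPy is available; the function name `pca_via_svd` and the synthetic three-column data set are made up for illustration and are not taken from the course.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project X onto its top principal components using the SVD."""
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values give the variance captured along each direction.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    projected = X_centered @ components.T
    return projected, components, ratio


# Synthetic data: the second column is nearly a multiple of the first,
# and the third column is small independent noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

projected, components, ratio = pca_via_svd(X, n_components=2)
print("explained variance ratio:", np.round(ratio, 4))
```

Because two of the three columns are almost perfectly correlated, the first component should capture nearly all of the variance in this example.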

  5. Project-Oriented • Students will team up for a semester-long project on data analysis for local "data clients" • Faculty members who have "real life" data sets and research questions • Library, registrar and institutional research

  6. Data Sets • Digital Du Chemin • repertory of polyphonic songs from 16th-century France • Dark Reactions • chemical experiments with associated reactants and results • Maine athletes • Anil's bio data?

  7. Machine Learning Modules • Focused on the process of doing (good) machine learning, i.e.: • (step 1) Pose a problem in the language of machine learning • (step 2) Gather data • (step 3) Choose a potential method for solving the problem • (step 4) Set up an experiment to properly evaluate your method • (step 5) Evaluate the experiment and possibly go back to step 2 or 3 • Possible toolsets: • iPython notebook • Scikit-learn
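The five steps can be walked through on a toy problem. The sketch below uses only the Python standard library; the synthetic two-class data, the nearest-centroid classifier, the 70/30 split, and every function name are assumptions chosen for illustration, not the modules' actual content.

```python
import random


def make_data(n_per_class, rng):
    """Step 2: gather data (here, two synthetic Gaussian clusters in 2D)."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    return data


def fit_centroids(train):
    """Step 3: the chosen method, a nearest-centroid classifier."""
    sums = {}
    for (x, y), label in train:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(
        centroids,
        key=lambda label: (point[0] - centroids[label][0]) ** 2
        + (point[1] - centroids[label][1]) ** 2,
    )


def accuracy(centroids, examples):
    correct = sum(predict(centroids, point) == label for point, label in examples)
    return correct / len(examples)


# Step 1: pose the problem as binary classification of 2D points.
rng = random.Random(42)
data = make_data(100, rng)

# Step 4: set up the experiment by holding out data the model never sees.
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

# Step 5: evaluate on the held-out set and compare against a trivial baseline.
centroids = fit_centroids(train)
test_accuracy = accuracy(centroids, test)
majority_label = max((0, 1), key=lambda l: sum(label == l for _, label in train))
baseline_accuracy = sum(label == majority_label for _, label in test) / len(test)
print(f"test accuracy: {test_accuracy:.3f}, baseline: {baseline_accuracy:.3f}")
```

If the held-out accuracy were no better than the majority-class baseline, that would be the signal to return to step 2 or step 3.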

  8. iPython Notebook Modules http://occam.olin.edu/sites/default/files/DataScienceMaterials/machine_learning_lecture_1/Machine%20Learning%20Lecture%201.html http://occam.olin.edu/sites/default/files/DataScienceMaterials/machine_learning_lecture_2/Machine%20Learning%20Lecture%202.html

  9. Scikit-learn • Open-source, Python-based package for machine learning • Principal strength is a consistent API and enforcement of "good" machine learning practices
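Assuming this slide describes scikit-learn, the second toolset listed on slide 7, the "consistent API" point can be illustrated by training two unrelated models through the same `fit` and `score` calls. The synthetic data set and the particular pair of models below are choices made for this sketch, not something the slides specify.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data, used only for illustration.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Hold out 30% of the data so the score reflects unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Two very different models, driven through the same estimator interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # same call for every estimator
    scores[name] = model.score(X_test, y_test)    # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Because every estimator exposes the same methods, swapping one method for another (step 3 of the process on slide 7) only changes the entry in `models`, while the evaluation code stays the same.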
