
Relevant software and getting it installed.

Relevant software and getting it installed. Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 1b, January 24, 2014. Admin info (keep/ print this slide). Class: ITWS-4963/ITWS 6965 Hours: 12:00pm-1:50pm Tuesday/ Friday Location: SAGE 3101 Instructor: Peter Fox


Presentation Transcript


  1. Relevant software and getting it installed. Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 1b, January 24, 2014

  2. Admin info (keep/print this slide) • Class: ITWS-4963/ITWS-6965 • Hours: 12:00pm-1:50pm Tuesday/Friday • Location: SAGE 3101 • Instructor: Peter Fox • Instructor contact: pfox@cs.rpi.edu, 518.276.4862 (do not leave a msg) • Contact hours: Monday** 3:00-4:00pm (or by email appt) • Contact location: Winslow 2120 (sometimes Lally 207A, announced by email) • TA: Lakshmi Chenicheri chenil@rpi.edu • Web site: http://tw.rpi.edu/web/courses/DataAnalytics/2014 • Schedule, lectures, syllabus, reading, assignments, etc.

  3. Today • Install application software • Get some data and read, explore, etc. • Install data technology and related software

  4. GNU R • RStudio – http://www.rstudio.com/ide/download/ (see R-intro.html in the manuals) • Manuals – http://cran.r-project.org/doc/manuals/ • Libraries – at the command line use library(), or select the Packages tab and check/uncheck as needed • http://cran.r-project.org/doc/manuals/R-lang.html

  5. SciPy/NumPy/IPython (NB) • Windows/Linux • http://scipy.org/install.html • If you have a Mac • Anaconda – http://continuum.io/downloads (preferred) • Use Launcher to install Spyder (and iPQt) • Do you have MacPorts installed? '$ which port' • No? (sorry – ask me for details…) • Install Xcode (from http://developer.apple.com/download - you will need to register - academic) • http://www.macports.org/install.php • Also see individual packages on the install page. • http://scipy.org/getting-started.html

  6. Matlab • http://dotcio.rpi.edu/services/software-labs • Student version • License works within the RPI network, so you may have to use VPN if outside • R for Matlab users: http://mathesaurus.sourceforge.net/octave-r.html

  7. Files • http://escience.rpi.edu/data/DA • This is where the files for assignments and exercises will be placed

  8. Exercises – getting data in • RStudio • Read in a csv file (two ways to do this) - GPW3_GRUMP_SummaryInformation_2010.csv • Read in an Excel file (directly or by converting to csv) - 2010EPI_data.xls (2010EPI_data tab) • See if you can plot some variables • Anything in common between them?

  9. Exercises • SciPy • In Spyder read in a Matlab file: • import scipy.io as sio • mat_contents = sio.loadmat('Williams40.mat') • mat_contents • Explore – plot, etc. • Read in a csv file (your choice) • Write it out as a Matlab file, i.e. sio.savemat (see File I/O help http://docs.scipy.org/doc/scipy/reference/tutorial/io.html ) • http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html - start looking
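The SciPy exercise above could be sketched roughly as follows. This is a minimal illustration, not the course solution. It does not assume Williams40.mat is available, so it first writes a small stand-in file. The file names example.mat, example.csv and from_csv.mat, and the variable names depth, temp, x and y, are hypothetical.

```python
import csv
import os
import tempfile

import numpy as np
import scipy.io as sio

workdir = tempfile.mkdtemp()

# Stand-in for a course file such as Williams40.mat (values are made up).
mat_path = os.path.join(workdir, "example.mat")
sio.savemat(mat_path, {"depth": np.array([0.0, 10.0, 20.0]),
                       "temp": np.array([15.2, 14.8, 13.9])})

# loadmat returns a dict; keys starting with "__" hold file metadata.
mat_contents = sio.loadmat(mat_path)
variables = sorted(k for k in mat_contents if not k.startswith("__"))
print(variables)
# 1-D arrays are stored as Matlab row vectors, so they load as 2-D.
print(mat_contents["temp"].shape)

# Write a small csv file (hypothetical data), then read it back by column name.
csv_path = os.path.join(workdir, "example.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["x", "y"])
    writer.writerows([[1, 2.5], [2, 3.5], [3, 4.5]])
data = np.genfromtxt(csv_path, delimiter=",", names=True)

# Save each csv column as a Matlab variable and check the round trip.
out_path = os.path.join(workdir, "from_csv.mat")
sio.savemat(out_path, {name: data[name] for name in data.dtype.names})
round_trip = sio.loadmat(out_path)
print(round_trip["y"])
```

With a real course file, only the path passed to sio.loadmat needs to change; listing the non-metadata keys is a quick way to see which variables the file contains before plotting.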

  10. Exercises • Matlab • Read in two different datasets: • sw40_30s.mat or sw29adcp.mat • UChicago30.mat or Williams40.mat • Explore them… • Read in the csv files

  11. If time or for fun… • se_eqs.xls • Plot it • Fit it • PRESSURE.xls • Plot it • Smooth it • Fit it …
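As a rough sketch of the "smooth it, fit it" steps in Python: the contents of se_eqs.xls and PRESSURE.xls are not shown here, so the data below are synthetic (a noisy straight line). The moving-average window and the straight-line model are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic stand-in data: a noisy straight line (not the course data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Smooth it: centered moving average over 5 points.
window = 5
kernel = np.ones(window) / window
y_smooth = np.convolve(y, kernel, mode="valid")
# "valid" mode drops (window - 1) points in total; trim x to match.
x_smooth = x[(window - 1) // 2 : -(window // 2)]

# Fit it: least-squares straight line, y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)

# Plot it (needs matplotlib, left as comments to keep the sketch dependency-light):
# import matplotlib.pyplot as plt
# plt.plot(x, y, ".", x_smooth, y_smooth, "-", x, slope * x + intercept, "--")
# plt.show()
```

For real data a different model (polynomial, exponential, etc.) may be more appropriate; plotting first usually suggests which.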

  12. Install-fest… continues • http://projects.apache.org/indexes/category.html#database • Hadoop (MapReduce) • Pig (http://wiki.apache.org/pig/RunPig ) • Hive (http://hive.apache.org/releases.html ) • https://cwiki.apache.org/confluence/display/Hive/GettingStarted • https://cwiki.apache.org/confluence/display/Hive/Tutorial • https://cwiki.apache.org/confluence/display/Hive/LanguageManual • Cassandra (binaries from DataStax) • And MongoDB - http://www.mongodb.org/

  13. Objective • Get a good feel for the complexity and maturity of the data and tools environments • See some real data and start to consider what it will take to work with it • Big and complex data means time and memory, and laptops can only do so much • We’ll soon look at intersections such as RHadoop: https://github.com/RevolutionAnalytics/RHadoop/wiki

  14. No more reading this week • Complete the installs as best you can • Pick your preferred application and data software and read up on them, try some examples
