
Cloud-Based Software Installation Workshop for Big Data Analysis

Join us for a workshop on March 10th, 2012, to learn about installing cloud-based software for big data analysis. The outline includes local installations, Python, word count code, R and R-Studio, Hadoop, cloud access, demos, and virtualization software. Detailed steps for Mac, Linux, and Windows installations are provided, along with information on cloud accounts and cloud-based software packages. Sessions on R and Hadoop, Cloudera Virtual Manager, and virtualization software will be conducted. Don't miss out on this informative workshop!


Presentation Transcript

  1. Software Installation Deck Big Data Workshop Saturday March 10th, 2012

  2. Outline • Local Installation • Python • Word Count Code and Files • R and R-Studio • Hadoop Local Installation • Cloud Access • Amazon Web Services Account • Cloud-Based Software Demos • R and R-Studio in the Cloud • Cloudera Virtual Manager • Virtualization Software • R and Hadoop: ‘rmr’

  3. Local

  4. Python Installation • Mac/Linux: Python comes preinstalled and should run as-is. • Windows: download and install Python from the following website: • http://www.python.org/getit/windows

  5. Python Wikipedia Word Count Files Vipin created four files of different sizes to test how long the word count takes to run on each one locally.
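
The timing test described on this slide can be sketched roughly as follows. The workshop's own word count script is not reproduced in the deck, so this is only an assumed stand-in: the function names `count_words` and `time_word_count` and the word-splitting rule are illustrative choices, not the workshop's code.

```python
import re
import time
from collections import Counter


def count_words(text):
    """Count occurrences of each lowercase word in a string."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def time_word_count(path):
    """Read one text file, count its words, and return (counts, seconds)."""
    start = time.perf_counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        counts = count_words(handle.read())
    return counts, time.perf_counter() - start


# Small in-memory demonstration; real runs would pass file paths instead.
print(count_words("Big data, big cloud.").most_common(1))
```

Calling `time_word_count` once per file and comparing the returned seconds gives the kind of local size-versus-time comparison the slide mentions.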

  6. R and R-Studio Local Installation LOCAL INSTALLATION: • R: http://lib.stat.cmu.edu/R/CRAN/ • R-Studio: http://rstudio.org/

  7. Hadoop Installation Mac/Linux Please note that the local installation is for test and debug, and that ‘production’ jobs will be run on the cloud. • Macbook – • Install MacPorts (www.macports.org) to get Hadoop. sudo port install hadoop (DONE!) • Linux – • Use the yum or apt-get package manager to get Hadoop. sudo yum install hadoop (your mirror should have Hadoop binaries)
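
After either install command, one quick way to confirm that the `hadoop` launcher is reachable is a check like the sketch below. It assumes the package placed a `hadoop` executable on the PATH and that `hadoop version` prints a version banner on its first output line; the helper name `hadoop_status` is made up for this example.

```python
import shutil
import subprocess


def hadoop_status():
    """Report whether a 'hadoop' launcher is on PATH and what it says it is."""
    path = shutil.which("hadoop")
    if path is None:
        return "hadoop not found on PATH"
    # 'hadoop version' is assumed to print its version on the first line.
    result = subprocess.run([path, "version"], capture_output=True, text=True)
    first = result.stdout.splitlines()[:1]
    if result.returncode != 0 or not first:
        return f"found {path}, but 'hadoop version' did not report a version"
    return f"found {path}: {first[0]}"


print(hadoop_status())
```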

  8. Hadoop Installation Windows Please note that the local installation is for test and debug, and that ‘production’ jobs will be run on the cloud. • Microsoft is working with Hortonworks on contributing to the Apache Hadoop project for Windows. Microsoft is working on a Community Technology Preview for Hadoop on Windows Azure (http://hadooponazure.com), and the release for on-premises installation is forthcoming. Those interested in running Hadoop on their own Windows hardware can follow http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx to sign up for the preview when it’s available. • TODAY, it is possible to install Hadoop on Windows, but those distributions require Cygwin, whereas the upcoming release will not. There are some instructions for Windows (see for instance http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html) that people can try.

  9. Cloud

  10. Cloud Account • http://aws.amazon.com/ • The first example will be through Amazon's Elastic Map/Reduce.  Similar in nature to: • http://www.youtube.com/watch?v=kNsS9aDf6uE
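
For readers new to the map/reduce style that Elastic Map/Reduce is built around, the word count can be pictured with the small local simulation below. It is only a sketch under stated assumptions: it runs in a single Python process, it does not call any Amazon or Hadoop API, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical rather than taken from the workshop's job.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word on the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, values):
    """Reduce step: sum the values collected for one word."""
    return word, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(w, v) for w, v in shuffle(pairs).items())


print(word_count(["big data big cloud", "cloud data"]))
```

On a real cluster the map and reduce steps run on different machines and the framework handles the grouping; the simulation only shows how the three stages fit together.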

  11. Cloud-Based Software Packages (Demos) Cloud Numerics • http://blogs.msdn.com/b/cloudnumerics/archive/2012/02/07/cloud-numerics-example-analyzing-demographics-data-from-windows-azure-marketplace.aspx MortarData • http://mortardata.com/

  12. R and R-Studio Cloud Access (No VM) R-Studio in the Cloud: • http://www.r-bloggers.com/rstudio-in-the-cloud-for-dummies/ R or R-Studio in the Cloud: • http://toreopsahl.com/2011/10/17/securely-using-r-and-rstudio-on-amazons-ec2/

  13. Virtual Manager with Hadoop Please note that these are 64-bit versions, and that the Virtualization Software will require a laptop that supports virtualization. If you are unsure, one way to check is to look at your BIOS and see whether Virtualization is Enabled. Most chips support virtualization; however, a handful of manufacturer-installed BIOSes do not enable it. Cloudera Hadoop Package • https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM • There are 3 options, each relating to a different Virtualization Software package, one of which also needs to be installed (next slide) • SSH Software (Windows) http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
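
On a Linux laptop, a rough complement to the BIOS check is to look for the CPU flags `vmx` (Intel VT-x) or `svm` (AMD-V) in `/proc/cpuinfo`. The sketch below assumes an x86 Linux system; it reports only what the CPU advertises, so it does not replace confirming the BIOS setting, and it returns `None` on systems without that file (Mac, Windows). The function names are illustrative.

```python
def supports_virtualization(cpuinfo_text):
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and set(value.split()) & {"vmx", "svm"}:
            return True
    return False


def check_this_machine(path="/proc/cpuinfo"):
    """Read the Linux cpuinfo file; return None where it cannot be read."""
    try:
        with open(path) as handle:
            return supports_virtualization(handle.read())
    except OSError:
        return None


print(check_this_machine())
```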

  14. Virtual Manager with Hadoop Jeffrey will be walking through this process. • VMware Player: Jeffrey uses this one in his session. http://downloads.vmware.com/d/info/desktop_end_user_computing/vmware_player/4_0 • KVM: http://www.linux-kvm.org/page/Main_Page • VirtualBox: Jim uses this one. https://www.virtualbox.org/

  15. Session 6: R and Hadoop: rmr Jeffrey will be walking through this process. • https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr We realize the VM and R and Hadoop parts are very detailed, and that there may be questions on other workshop parts. Following the last session we will try to have a post-workshop help session.
