Software Installation Deck
Big Data Workshop
Saturday, March 10th, 2012
Outline
• Local Installation
• Python
• Word Count Code and Files
• R and R-Studio
• Hadoop Local Installation
• Cloud Access
• Amazon Web Services Account
• Cloud-Based Software Demos
• R and R-Studio in the Cloud
• Cloudera Virtual Manager
• Virtualization Software
• R and Hadoop: ‘rmr’
Python Installation
• Mac and Linux come with Python, so it should already run from a terminal.
• On Windows, download and install Python from:
  http://www.python.org/getit/windows
Python Wikipedia Word Count Files
Vipin created four files of different sizes to test how long each one takes to run locally.
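The workshop's own word count code is not reproduced in this deck, so the following is only a minimal sketch of what timing a local word count on each file might look like. It assumes Python 3.6 or newer and plain-text input; the function names `count_words` and `time_word_count` are hypothetical, and the definition of a "word" (runs of letters and apostrophes) is an assumption.

```python
import re
import sys
import time
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def time_word_count(path):
    """Count the words in one file and return (counts, elapsed seconds)."""
    start = time.perf_counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        counts = count_words(handle.read())
    return counts, time.perf_counter() - start


if __name__ == "__main__":
    # Pass the file paths on the command line, one per test file.
    for path in sys.argv[1:]:
        counts, elapsed = time_word_count(path)
        print(f"{path}: {sum(counts.values())} words, "
              f"{len(counts)} distinct, {elapsed:.3f} s")
```

Running the script once with all four files as arguments prints one timing line per file, which makes the effect of file size easy to compare.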
R and R-Studio Local Installation
• R: http://lib.stat.cmu.edu/R/CRAN/
• R-Studio: http://rstudio.org/
Hadoop Installation Mac/Linux
Please note that the local installation is for test and debug; ‘production’ jobs will be run on the cloud.
• MacBook: install MacPorts (www.macports.org), then use it to get Hadoop:
  sudo port install hadoop
  (Done!)
• Linux: use your package manager (yum or apt-get) to get Hadoop:
  sudo yum install hadoop
  (Your mirror should have the Hadoop binaries.)
Hadoop Installation Windows
Please note that the local installation is for test and debug; ‘production’ jobs will be run on the cloud.
• Microsoft is working with Hortonworks on contributing to the Apache Hadoop project for Windows. Microsoft is also working on a Community Technology Preview for Hadoop on Windows Azure (http://hadooponazure.com), and the release for on-premises installation is forthcoming. Those interested in running Hadoop on their own Windows hardware can sign up for the preview, when it is available, at http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx
• Today it is possible to install Hadoop on Windows, but the current distributions require Cygwin, whereas the upcoming release will not. Some instructions for Windows that people can try are at http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html
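Since the local installation is meant for test and debug, it can help to check the map and reduce logic in plain Python before any cluster is involved. The sketch below assumes a Hadoop Streaming style job, where records are tab-separated key/value lines and the framework sorts mapper output by key before the reducer sees it. The deck does not say which Hadoop interface the workshop uses, so that choice and the function names (`mapper`, `reducer`, `run_locally`) are assumptions.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts for each run of identical keys."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


def run_locally(lines):
    """Stand in for the shuffle phase by sorting mapper output first."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data workshop", "big data in the cloud"]
    for word, total in run_locally(sample).items():
        print(word, total)
```

Sorting the mapper output is what lets the reducer use `groupby`, which only merges adjacent records with the same key.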
Cloud Account
• Amazon Web Services account: http://aws.amazon.com/
• The first example will run on Amazon's Elastic MapReduce. It is similar in nature to this video: http://www.youtube.com/watch?v=kNsS9aDf6uE
Cloud-Based Software Packages (Demos)
• Cloud Numerics: http://blogs.msdn.com/b/cloudnumerics/archive/2012/02/07/cloud-numerics-example-analyzing-demographics-data-from-windows-azure-marketplace.aspx
• MortarData: http://mortardata.com/
R and R-Studio Cloud Access (No VM)
• R-Studio in the Cloud: http://www.r-bloggers.com/rstudio-in-the-cloud-for-dummies/
• R or R-Studio in the Cloud: http://toreopsahl.com/2011/10/17/securely-using-r-and-rstudio-on-amazons-ec2/
Virtual Manager with Hadoop
Please note that these are 64-bit versions, and that the virtualization software requires a laptop that supports virtualization. If you are unsure, one way to check is to look in your BIOS and see whether virtualization is enabled. Most chips support virtualization; however, a handful of manufacturer-installed BIOSes do not enable it.
• Cloudera Hadoop package: https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM
• There are three download options, one for each virtualization package; one of those packages also needs to be installed (next slide).
• SSH software (Windows): http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
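On a Linux laptop, a quick first check is possible before rebooting into the BIOS: the `flags` lines in `/proc/cpuinfo` list `vmx` for Intel VT-x or `svm` for AMD-V when the CPU has the feature. The sketch below is a hedged illustration only. It is Linux-specific, the function name `supports_virtualization` is hypothetical, and the flag reports CPU capability, so whether the feature is switched on still has to be confirmed in the BIOS.

```python
def supports_virtualization(cpuinfo_text):
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Each flags line looks like "flags : fpu vme ... vmx ...".
            _, _, value = line.partition(":")
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("No /proc/cpuinfo here; check the BIOS settings instead.")
    else:
        print("CPU virtualization flag found:", supports_virtualization(text))
```

On Mac and Windows the file does not exist, so the script falls back to the message about checking the BIOS.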
Virtual Manager with Hadoop
Jeffrey will be walking through this process.
• VMware Player (Jeffrey uses this one in his session): http://downloads.vmware.com/d/info/desktop_end_user_computing/vmware_player/4_0
• KVM: http://www.linux-kvm.org/page/Main_Page
• VirtualBox (Jim uses this one): https://www.virtualbox.org/
Session 6: R and Hadoop: rmr
Jeffrey will be walking through this process.
• https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr
We realize that the VM and the R and Hadoop parts are very detailed, and that there may be questions on other parts of the workshop. Following the last session, we will try to hold a post-workshop help session.