Tutorial for Web Mining Project
Introduction • In the mis510 project, your team is required to create a web business, with a complete website and business functionality for specific customers, using either the Google App Engine or the Amazon EC2 platform. • Since Google App Engine and Amazon EC2 have distinct interfaces, service features and pricing policies, this tutorial explains how to use each platform in turn. • It also estimates the cost of a typical project on each platform.
Google App Engine Overview • A cloud platform for publishing web applications. • A simple, web-based application management console. • Developers can focus on application logic without worrying about hardware, system administration, scalability, etc. • Supports Java, Python and Go.
Guideline • 0. Preparation • 1. Create a Google Web Application Project • 2. Debug, Run and Deploy • 3. Interaction with User • 4. Use Cloud Database • 5. Pricing
0. Preparation • 0.1. Sign up for a Google App Engine account: • https://appengine.google.com • 0.2. Download the App Engine SDK: • http://code.google.com/appengine/downloads.html • 0.3. Java/Eclipse users should download the Google Plugin for Eclipse to build, debug and deploy their application: • http://code.google.com/eclipse/docs/download.html
1.1 File Structure of the Web Application • src/ contains all source files for your application (Java source code). • src/META-INF/ contains other configuration files. • war/ contains all files that are deployed and actually used on the server; images, data, HTML and JSP files go directly under war/. • war/WEB-INF/ contains the libraries used, compiled classes and configuration files.
1.1 File Structure of the Web Application • The WEB-INF folder contains two configuration files. • appengine-web.xml holds the App Engine settings; don't forget to add your registered application ID between the <application> tags. • web.xml is especially important: it maps URLs to your servlet classes and web pages. (An example follows in 3.1.)
1.2 Create a New Project • The Eclipse plugin creates this structure for you automatically. • Choose New -> Web Application Project. • Type the project name, then choose the Google SDKs you want to use. Typically only 'Use Google App Engine' needs to be checked. • If you don't use Eclipse, create the file structure described above and use Apache Ant to build and deploy it.
2.1 Debug and Run • The Eclipse plugin has already created a Hello World example, so you can run the project right away to check that it works. • Right-click the project folder -> Debug As -> Web Application. • In debug mode, Google App Engine starts a server on your local machine and runs your project on it. • If it starts successfully, the console displays a line like: INFO: The server is running at http://localhost:8888/ • Open this link in a browser to test your project.
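To confirm from a script rather than a browser that the local server is up, a small Java sketch can request the URL printed in the console. It assumes Java 11 or later (java.net.http); the class name LocalServerCheck is hypothetical, and it only checks that something answers with an HTTP status code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical smoke check for the local development server (assumes Java 11+).
class LocalServerCheck {
    // The address printed in the console when the local server starts.
    static final String DEFAULT_URL = "http://localhost:8888/";

    // Returns the HTTP status code, or -1 if nothing answered at that URL.
    static int status(String url) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode();
        } catch (IOException e) {
            // Connection refused, timeout, etc.: the server is not reachable.
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : DEFAULT_URL;
        int code = status(url);
        if (code == -1) {
            System.out.println("No server reachable at " + url);
        } else {
            System.out.println("HTTP " + code + " from " + url);
        }
    }
}
```

Run it with no arguments while the server is in debug mode; passing another URL, such as your deployed appspot.com address, checks the cloud copy instead.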
2.1 Debug and Run • While the server is running in debug mode, changes to your project files should be detected automatically by Google App Engine, so you don't have to rebuild the project (you still need to refresh the browser to see the changes). • *Don't rely on this completely: if the same error keeps appearing, simply rebuilding the project will very likely fix it. • The exception is web.xml: if you change it, you must rebuild your project.
2.2 Deploy • When you are satisfied with your application, you can deploy it to Google's cloud environment so that users all over the world can access it. • Click the 'Deploy' icon and enter the email address and password of your registered App Engine account. • Your application is then available at • http://<application-ID>.appspot.com
3. Interaction with Users • Often you want your application not only to present static information but also to interact with users. • Your system needs to pass user input from web pages to your Java or Python program. • Here we provide a JSP/Java example of a movie-related web mining application: it returns a movie's plot given the movie name entered by the user. • Figure: user input -> interface (web pages/API) -> web mining component (server-side logic) -> output.
3.1 Receive User Input • In form_input.jsp, add the form lines between the <body> and </body> tags. • When the user visits form_input.jsp, it shows an input field. • The input should be passed to SampleServlet.java.
3.1 Receive User Input • You need to configure web.xml so the system knows which Java class handles the form submission URL. For example, web.xml can map • http://<application-ID>.appspot.com/processinput -> SampleServlet.java
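In web.xml, a <servlet> entry (servlet name plus class) is paired with a <servlet-mapping> whose <url-pattern> is the path, here /processinput. The container then resolves each request path in a fixed order: exact match, longest path prefix (patterns ending in /*), extension (such as *.jsp), then the default servlet /. The Java sketch below is a simplified model of that order, useful for reasoning about which class a URL reaches; the class name and the extra patterns (/api/*, *.jsp, /) are hypothetical, and real containers also handle context paths and welcome files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of how a servlet container picks the servlet for a request path,
// given the <url-pattern> entries from web.xml. Not part of App Engine's API.
class ServletMappingResolver {
    private final Map<String, String> patterns = new LinkedHashMap<>();

    // Registers a url-pattern -> servlet-name pair, like one <servlet-mapping>.
    ServletMappingResolver map(String urlPattern, String servletName) {
        patterns.put(urlPattern, servletName);
        return this;
    }

    Optional<String> resolve(String path) {
        // 1. Exact match, e.g. "/processinput".
        if (patterns.containsKey(path)) {
            return Optional.of(patterns.get(path));
        }
        // 2. Longest path-prefix match, e.g. "/api/*".
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            String pattern = e.getKey();
            if (!pattern.endsWith("/*")) continue;
            String prefix = pattern.substring(0, pattern.length() - 2);
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = e.getValue();
                bestLength = prefix.length();
            }
        }
        if (best != null) {
            return Optional.of(best);
        }
        // 3. Extension match on the last path segment, e.g. "*.jsp".
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        int dot = lastSegment.lastIndexOf('.');
        if (dot >= 0) {
            String byExtension = patterns.get("*" + lastSegment.substring(dot));
            if (byExtension != null) {
                return Optional.of(byExtension);
            }
        }
        // 4. Default servlet, if "/" is mapped.
        return Optional.ofNullable(patterns.get("/"));
    }
}
```

With only /processinput mapped, any other path resolves to nothing, which is one reason a mistyped form action produces a "not found" error.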
3.2 Process User Input • In SampleServlet.java, call req.getParameter() to obtain the user input (the movie name) and process it. An external API is used to retrieve the movie's plot from the web.
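For a GET form, the browser sends the input as a URL-encoded query string (for example /processinput?movie=Toy+Story), and req.getParameter("movie") returns the decoded value. The sketch below reproduces that decoding with the JDK only, to show what the servlet receives; the parameter name movie is hypothetical (use the name attribute of the input field in form_input.jsp), and inside a real servlet you would simply call req.getParameter().

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates the decoding behind req.getParameter() for a GET query string (Java 10+).
class QueryParams {
    // Parses "a=1&b=x+y" into a map. Like getParameter(), the first value wins
    // when a name is repeated; a missing name maps to null.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            // "+" becomes a space and %XX sequences are decoded as UTF-8.
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            params.putIfAbsent(name, value);
        }
        return params;
    }
}
```

Because a missing parameter comes back as null, the servlet should check for null or an empty string before calling the movie API.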
3.2 Process User Input • The complete code that calls the API is given in 'samplecode.rar'.
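The slides do not name the API used in samplecode.rar, so the endpoint below is a placeholder. The sketch shows two things the servlet needs whatever the API: percent-encoding the user's movie name into the request URL, and checking the HTTP status before using the response body. It assumes Java 11 or later; on an older Java runtime, java.net.URL with HttpURLConnection plays the same role. MOVIE_API_BASE and the t query parameter are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Sketch of calling an external movie API (endpoint and parameter are placeholders).
class MovieApiClient {
    static final String MOVIE_API_BASE = "https://movie-api.example.com/search"; // hypothetical

    // Builds the request URI, percent-encoding the title (spaces become "+", "&" becomes "%26").
    static URI buildUri(String base, String title) {
        String encoded = URLEncoder.encode(title, StandardCharsets.UTF_8);
        return URI.create(base + "?t=" + encoded);
    }

    // Fetches the raw response body; the caller parses it (e.g. JSON) to extract the plot.
    static String fetch(String title) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(buildUri(MOVIE_API_BASE, title))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Movie API returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Encoding matters because titles routinely contain spaces, ampersands and accented letters; concatenating the raw input into the URL would break or alter the query.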
3.3 Return the Output to the User • You can now display the result to the user by adding a line to the designated JSP page. This example uses the same JSP page as for user input, form_input.jsp.
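Whatever line is added to the JSP, the plot (and the movie name the user typed) should be HTML-escaped before it is printed; otherwise input such as <script> would be rendered as markup. A minimal sketch of that escaping in plain Java, assuming the servlet passes the movie name and plot on to the page; the class and method names are hypothetical.

```java
// Minimal HTML escaping for text shown in form_input.jsp (hypothetical helper).
class HtmlEscape {
    // Replaces the five characters that are significant in HTML text and attributes.
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the fragment the page would print under the form.
    static String renderPlot(String movie, String plot) {
        return "<p><b>" + escape(movie) + "</b>: " + escape(plot) + "</p>";
    }
}
```

Inside the JSP itself, JSTL's <c:out value="${plot}"/> escapes these characters by default, which achieves the same result (the attribute name plot is again hypothetical).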
4. Use Cloud Database • Situations where a cloud database may help: • Remembering user activities. • Storing the results of the web mining process to speed up the next inquiry. • Uploading a large file that is a component of your application. • etc. • The next slides show an example of using Google Datastore to save and retrieve users' comments on movies.
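Reusing web-mining results can be sketched as a cache in front of the expensive lookup: the first request for a title calls the API, and later requests are answered from storage. In this sketch the storage is an in-memory ConcurrentHashMap and the class name is hypothetical; a deployed app would keep the results in Datastore so they survive restarts and are shared across instances.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-memory cache for web-mining results (movie title -> plot).
// On App Engine the map would be replaced by a persistent store such as Datastore.
class PlotCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup; // e.g. the external movie API call
    private final AtomicInteger misses = new AtomicInteger();

    PlotCache(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached plot, running the lookup only the first time a title is seen.
    String plotFor(String title) {
        String key = title.trim().toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, k -> {
            misses.incrementAndGet();
            return lookup.apply(title.trim());
        });
    }

    // Number of times the expensive lookup actually ran.
    int misses() {
        return misses.get();
    }
}
```

Normalizing the key (trimmed, lower case) lets "Up" and " up " share one entry, so each distinct movie costs at most one API call.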
4. Use Cloud Database • Update form_input.jsp to receive user comments.
4.1 Google Datastore • 4.1.1 Store Comments • Add the comment-storing code to SampleServlet.java. • (For the complete sample, please refer to samplecode.rar.)
4.1 Google Datastore • 4.1.2 Retrieve Comments • Add the comment-retrieving code to form_input.jsp. • (For the complete sample, please refer to samplecode.rar.)
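The real code in samplecode.rar uses the App Engine Datastore API (in the Java SDK, for example DatastoreServiceFactory.getDatastoreService(), Entity and put()), which needs the SDK to run. The sketch below is an in-memory stand-in that only mirrors the pattern of the two slides above: store each comment as a record with movie, text and date properties, then retrieve one movie's comments, newest first. The class, the property names and the "newest first, limited" query are assumptions for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for the Datastore comment example (not the App Engine API).
// Each Comment plays the role of an entity with hypothetical properties movie, text, date.
class CommentStore {
    static final class Comment {
        final String movie;
        final String text;
        final Instant date;

        Comment(String movie, String text, Instant date) {
            this.movie = movie;
            this.text = text;
            this.date = date;
        }
    }

    private final List<Comment> comments = new ArrayList<>();

    // Store: corresponds to saving one entity from SampleServlet.java.
    synchronized void add(String movie, String text, Instant date) {
        comments.add(new Comment(movie, text, date));
    }

    // Retrieve: equality filter on movie, sort by date descending, keep at most `limit`,
    // corresponding to the query form_input.jsp runs to list comments.
    synchronized List<String> commentsFor(String movie, int limit) {
        return comments.stream()
                .filter(c -> c.movie.equals(movie))
                .sorted(Comparator.comparing((Comment c) -> c.date).reversed())
                .limit(limit)
                .map(c -> c.text)
                .collect(Collectors.toList());
    }
}
```

Keeping the filter, sort order and limit explicit makes it easy to map each step onto the corresponding Datastore query settings when moving to the real API.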
4. Use Cloud Database • Advantages of Google Datastore: • Google manages the data storage capacity for you. • Very flexible (schemaless). • Data can be viewed and managed online: • log in to Google App Engine at https://appengine.google.com/ and choose your application -> Datastore Viewer. • Disadvantages: • The free data storage quota is 1 GB, compared with 10 GB on Amazon EC2. • Datastore only holds small data objects (entities). • Larger data can be stored in Google Blobstore: • http://code.google.com/appengine/docs/java/blobstore/overview.html
5. Pricing • Since 11/07/2011, Google App Engine has used a new pricing policy that sets a resource usage quota for free applications. • Table: Free Quota for Major Resources
5. Pricing • For resource usage beyond the free quota, Google charges a rate per resource. • Table: Billing Rates for Major Resources
5. Pricing • Costs vary greatly with resource usage. • Table: rough estimate of daily costs for typical apps
5. Pricing • Suggestions for reducing cost: • Log in to the App Engine console and set a daily budget. This is the safest way to control your cost, but resource usage beyond the budget will be refused, so your app will throw errors. • Reduce instance hours: • Save the web mining results in Datastore or Blobstore. • Don't advertise your app yet. • Debug on your local server most of the time (completely free!), and deploy the full version of your app only during the last weeks of mis510. • Applying these suggestions should keep the total cost of the project under $60.
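The advice above amounts to simple arithmetic: for each resource, daily cost = max(0, usage - free quota) x rate, summed over resources, while a daily budget caps what is billed (extra usage is refused instead). Because the quota and rate tables are not reproduced here, every figure passed to this sketch is a hypothetical input, not Google's price; the point is how the pieces combine, for example that deploying for fewer days shrinks the total proportionally.

```java
import java.util.List;

// Rough daily/total cost model for the budget advice above.
// All quotas, rates and usage figures are hypothetical inputs; take real numbers
// from the App Engine billing information and your console's usage reports.
class CostEstimate {
    static final class Resource {
        final String name;
        final double dailyUsage;  // units used per day
        final double freeQuota;   // free units per day
        final double ratePerUnit; // dollars per unit above the free quota

        Resource(String name, double dailyUsage, double freeQuota, double ratePerUnit) {
            this.name = name;
            this.dailyUsage = dailyUsage;
            this.freeQuota = freeQuota;
            this.ratePerUnit = ratePerUnit;
        }

        // Only usage above the free quota is billed.
        double billable() {
            return Math.max(0.0, dailyUsage - freeQuota) * ratePerUnit;
        }
    }

    // Uncapped daily cost: sum of billable usage over all resources.
    static double dailyCost(List<Resource> resources) {
        double total = 0.0;
        for (Resource r : resources) {
            total += r.billable();
        }
        return total;
    }

    // With a daily budget, the charge never exceeds the budget;
    // usage beyond it is refused (the app returns errors) instead of billed.
    static double dailyCharge(List<Resource> resources, double dailyBudget) {
        return Math.min(dailyCost(resources), dailyBudget);
    }

    static boolean exceedsBudget(List<Resource> resources, double dailyBudget) {
        return dailyCost(resources) > dailyBudget;
    }

    // Project total, e.g. counting only the days the full app is deployed.
    static double projectTotal(List<Resource> resources, double dailyBudget, int deployedDays) {
        return dailyCharge(resources, dailyBudget) * deployedDays;
    }
}
```

When exceedsBudget() is true, the model predicts that part of the traffic will be refused that day, which is the trade-off the daily budget makes for a guaranteed ceiling.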
Amazon Elastic Compute Cloud (Amazon EC2) • Amazon EC2 is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. • Simple web service interface • Complete control of your computing resources • Fast: obtain and boot new server instances quickly • Scalable: change capacity as your computing requirements change • Pay only for the capacity that you actually use
Tutorial Guideline • 1. Sign Up for EC2 • 2. Launch an Instance • 3. Connect to Windows Instance • 4. Connect to Unix/Linux Instance • 5. Application Example • 6. Pricing • 7. Resources
1. Sign Up for EC2 • Sign up for an Amazon EC2 account: http://aws.amazon.com/ec2/ • If you already have an Amazon shopping account, you can use that account.
2. Launch an Instance • Sign in to the AWS Management Console (choose EC2): http://aws.amazon.com/console/
Create and Download a Key Pair • A key pair is a security credential, similar to a password, that you use to connect securely to your instance once it is running.
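If you connect from Linux or macOS with OpenSSH rather than PuTTY, the downloaded .pem file must be accessible only to you; OpenSSH refuses a private key whose permissions are too open. The sketch below expresses that check and the fix (the equivalent of chmod 400 hello.pem) with java.nio.file. The helper name is hypothetical, and it only works on file systems with POSIX permissions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper for the downloaded key pair file (e.g. hello.pem) on Linux/macOS.
class KeyFilePermissions {
    private static final Set<PosixFilePermission> GROUP_OR_OTHERS = EnumSet.of(
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.GROUP_EXECUTE, PosixFilePermission.OTHERS_READ,
            PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE);

    // True if any group/others permission is set, i.e. the key is "too open" for OpenSSH.
    static boolean isTooOpen(Set<PosixFilePermission> perms) {
        for (PosixFilePermission p : perms) {
            if (GROUP_OR_OTHERS.contains(p)) {
                return true;
            }
        }
        return false;
    }

    // Same effect as "chmod 400 hello.pem". Throws UnsupportedOperationException on
    // file systems without POSIX permissions (e.g. on Windows, where PuTTY is used instead).
    static void restrictToOwnerRead(Path pem) throws IOException {
        Files.setPosixFilePermissions(pem, PosixFilePermissions.fromString("r--------"));
    }
}
```

Call restrictToOwnerRead() once after downloading the key; isTooOpen(Files.getPosixFilePermissions(path)) can then confirm that ssh will accept the file.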
Choose an Amazon Machine Image (AMI) • Choosing an AMI is just like choosing a virtual machine, for example: • Amazon Linux • Windows Server 2008 with SQL Server • Red Hat/Ubuntu/Debian Linux • You can choose 64-bit or 32-bit machines. • Different machines have different prices.
Configure the Firewall (create a security group) • Create rules that allow access to the instance. • For a Windows server, open HTTP port 80, MS SQL port 1433, Remote Desktop port 3389 and HTTP port 8080 (for Tomcat). • For Linux, open SSH (port 22) to log in with PuTTY and WinSCP.
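The port lists above can be written down as data and checked before the security group is created, for example to catch that a Linux instance reached with PuTTY and WinSCP needs port 22. This is a hypothetical planning sketch, not the EC2 API; the class and map names are illustrative, and the rules themselves are still created in the AWS Management Console.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical planning aid for the inbound rules listed above (port -> purpose).
class SecurityGroupPlan {
    static final Map<Integer, String> WINDOWS_RULES = Map.of(
            80, "HTTP",
            1433, "MS SQL",
            3389, "Remote Desktop",
            8080, "HTTP (Tomcat)");

    static final Map<Integer, String> LINUX_RULES = Map.of(
            22, "SSH (PuTTY, WinSCP)");

    // Ports a client needs that the planned security group does not open, ascending, no repeats.
    static List<Integer> missingPorts(Map<Integer, String> rules, List<Integer> needed) {
        return needed.stream()
                .filter(port -> !rules.containsKey(port))
                .sorted()
                .distinct()
                .collect(Collectors.toList());
    }
}
```

Listing what each tool needs (browser, SQL client, Remote Desktop, SSH) and diffing it against the plan is a quick way to avoid a "connection timed out" caused by a missing rule.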
3. Connect to Windows Instance • Go to the AWS Management Console and locate the instance on the Instances page. • Right-click the instance and select Get Windows Password.
Get an Elastic IP (static IP) • Click "Elastic IP" in the Navigation pane. • Click "Allocate New Address". • Associate the address with your instance. • Elastic IP addresses are a scarce resource: release any address you don't associate with an instance, otherwise Amazon will charge you for it!
Manage and Control the Server • Stop = shut down the computer • Reboot = restart the computer • Terminate = sell your computer! • You can monitor your instance in the AWS Management Console.
4. Connect to Unix/Linux Instance • Install PuTTY on your Windows machine. • Start PuTTYgen (e.g., from the Start menu, click All Programs > PuTTY > PuTTYgen). • Click Load and browse to the private key file you want to convert (e.g., hello.pem). • Save the converted key as hello.ppk in a location you will remember.
Use PuTTY to Connect • Open PuTTY. • Use the instance's Public DNS as the host name. • Use root (Red Hat), bitnami (Ubuntu) or ec2-user (Amazon Linux) as the user name. • Under Connection > SSH > Auth, load the .ppk file.
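PuTTY needs the .ppk conversion, but OpenSSH clients read the original .pem directly. The sketch below builds the equivalent ssh command line from the same three pieces (key file, user name for the AMI, public DNS). The image keys and class name are hypothetical; the user names are the ones listed on this slide.

```java
import java.util.List;
import java.util.Map;

// Builds the OpenSSH equivalent of the PuTTY session: ssh -i <key.pem> <user>@<public-dns>.
class SshCommand {
    // Default login user per image, as listed on the slide (keys are hypothetical labels).
    static final Map<String, String> DEFAULT_USER = Map.of(
            "redhat", "root",
            "ubuntu-bitnami", "bitnami",
            "amazon-linux", "ec2-user");

    static List<String> build(String pemPath, String image, String publicDns) {
        String user = DEFAULT_USER.get(image);
        if (user == null) {
            throw new IllegalArgumentException("Unknown image: " + image);
        }
        return List.of("ssh", "-i", pemPath, user + "@" + publicDns);
    }
}
```

The resulting list can be passed straight to new ProcessBuilder(cmd).inheritIO().start() to open the session, or joined with spaces and typed into a terminal.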