Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics
Loretta Auvil

  1. Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics
     Loretta Auvil

  2. Outline

  3. SW Silos
     We continue to build silos. Why?
     • I'm only creating a prototype for my paper…
     • I want to have control…
     • I want to write my own code…
     • I can do it faster…
     • I'm not funded to integrate with…
     • …
     (Images from Google Search)

  4. From Silos to Mashups
     • Definition: a mashup is a web page or application that uses and combines data, presentation, or functionality from two or more sources to create new services.
     • Why do we want this?
        • Enable our services in many applications and on a variety of devices (laptop, high-resolution display wall, iPad, iPhone, and others)
        • Sharing and reuse are good things
        • Reach communities with our tools and their data!
     • What can we do to change this?
        • We can design data-driven solutions so that they can be mashed up with other tools.
        • We can build web services that can be deployed or accessed.
        • We can create APIs for others to use.
     • How can we do this?
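To make the definition concrete, here is a minimal Python sketch of a mashup that combines two independent sources, a list of documents with extracted place names and a gazetteer of coordinates, into map markers, roughly what a "location entity to map" service might hand to a map widget. The data, field names, and the `mash_up` function are hypothetical, chosen only for illustration.

```python
# Hypothetical mashup sketch: join two independent data sources into a new view.
# All data, field names, and coordinates here are illustrative only.
from collections import defaultdict

# Source A: documents with extracted location entities (hypothetical).
documents = [
    {"id": "doc1", "locations": ["Urbana", "Chicago"]},
    {"id": "doc2", "locations": ["Chicago"]},
]

# Source B: a gazetteer mapping place names to (lat, lon) (hypothetical).
gazetteer = {
    "Urbana": (40.11, -88.21),
    "Chicago": (41.88, -87.63),
}


def mash_up(docs, places):
    """Join the sources on place name, producing one marker per known place
    that lists the documents mentioning it."""
    mentions = defaultdict(list)
    for doc in docs:
        for loc in doc["locations"]:
            if loc in places:  # skip names the gazetteer does not know
                mentions[loc].append(doc["id"])
    return [
        {"place": loc, "lat": places[loc][0], "lon": places[loc][1], "docs": ids}
        for loc, ids in sorted(mentions.items())
    ]


if __name__ == "__main__":
    for marker in mash_up(documents, gazetteer):
        print(marker)
```

Neither source needs to know about the other; the new service exists only in the join, which is the reuse argument the slide makes.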

  5. Mashup Framework
     Diagram of the framework stack. Elements shown: Visualizations, User Interfaces, Apps, Plugins, Web Apps, Services; Meandre Workbench; Repositories; Data Analysis Components and Flows; Meandre Data-Intensive Flows; Components, Developer Tools, Data Analytics, Visualization, Component Repository, Component Discovery; Meandre Infrastructure; Virtualization Infrastructure; Computational Resources.

  6. Scientific Workflows
     Diagram of workflow systems: Ptolemy II, Kepler, Triana, BPEL, Trident, VisTrails, Meandre, Taverna
     (David De Roure slide, slightly modified)

  7. Meandre for Mashups
     • Major capabilities
        • Dataflow execution
        • Semantic technology (RDF for storing metadata)
        • Web-oriented
        • Supports publishing services for data, analytics, and visualization
        • Modular components
        • Encapsulation and execution mechanism
        • Promotes reuse, sharing, and collaboration
        • Cloud-friendly infrastructure
     • Note (for Tom): we trade off some performance for reuse, flexibility, and modular components, with the option to parallelize components to improve performance.
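Dataflow execution with modular components can be illustrated, very loosely, in a few lines of Python: each component runs in its own thread and talks to its neighbours only through FIFO queues, so components agree on what flows between them and nothing else. This is not Meandre's (Java-based) API; `source`, `transform`, `sink`, `run_flow`, and the `END` marker are hypothetical names for this sketch.

```python
# Minimal dataflow sketch (not Meandre's API): each component runs in its own
# thread and is wired to its neighbours by FIFO queues. A sentinel object
# marks the end of the stream so every component can shut down cleanly.
import queue
import threading

END = object()  # end-of-stream marker passed along the flow


def source(items, out_q):
    """Emit each item, then the end marker."""
    for item in items:
        out_q.put(item)
    out_q.put(END)


def transform(fn, in_q, out_q):
    """Apply fn to every item until the end marker arrives, then forward it."""
    while True:
        item = in_q.get()
        if item is END:
            out_q.put(END)
            return
        out_q.put(fn(item))


def sink(in_q, results):
    """Collect items into a list until the end marker arrives."""
    while True:
        item = in_q.get()
        if item is END:
            return
        results.append(item)


def run_flow(items, fn):
    """Wire source -> transform -> sink and run all three concurrently."""
    q1, q2 = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=source, args=(items, q1)),
        threading.Thread(target=transform, args=(fn, q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_flow(["hello", "world"], str.upper))  # ['HELLO', 'WORLD']
```

Because there is a single `transform` thread and the queues are FIFO, output order matches input order; swapping `transform` for another function changes the flow without touching the other components.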

  8. Components
     • Analytics
        • Unsupervised learning
           • Clustering
           • Frequent pattern analysis (rule association)
        • Supervised learning
           • Naïve Bayesian
           • Support Vector Machines (Weka)
           • Decision Trees (C4.5)
        • Optimization approaches
           • Genetic Algorithm
        • Text analysis (POS tagging, entity extraction)
           • OpenNLP
           • Stanford NER
     • Visualization
        • Geographic (Google Maps)
        • Temporal (Simile)
        • Network graphs: linked nodes and arcs (Protovis)
        • Parallel Coordinates (Protovis)
        • Stacked Area Chart (Flare)
        • Tag Cloud Maker
        • Decision Tree (D2K applet)
        • Naïve Bayes (D2K applet)
        • Rule Association (applet)
        • Dendrogram (GWT)

  9. Meandre Services from a Firefox Plugin (Example: Zotero and SEASR)
     • Tag Cloud Analysis
     • Readability Analysis
     • Network Analysis
     • Date Entity to Simile Timeline
     • Automatic Summarization
     • Location Entity to Google Map
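A service such as Tag Cloud Analysis essentially counts terms and maps counts to font sizes. The sketch below assumes simple lowercase tokenization, a tiny illustrative stop-word list, and a linear 10 to 40 point size scale; the `tag_cloud` function and its parameters are hypothetical, and the SEASR service may work quite differently.

```python
# Hypothetical tag-cloud sketch: count word frequencies, drop a small
# illustrative stop-word list, and scale counts linearly to font sizes.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def tag_cloud(text, top_n=10, min_size=10, max_size=40):
    """Return (word, font_size) pairs for the top_n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return [
        (word, min_size + (count - lo) * (max_size - min_size) // span)
        for word, count in top
    ]


if __name__ == "__main__":
    print(tag_cloud("data data data cloud cloud mashup the the the the"))
```

A linear scale is the simplest choice; logarithmic scaling is a common alternative when a few terms dominate the counts.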

  10. An Ideological Metaphor & Definition
     • Cloud metaphor
        • The term cloud is used as a metaphor for the Internet, based on how the Internet is depicted in computer network diagrams; it is an abstraction for the complex infrastructure it conceals.
     • Cloud computing: definition
        • The first academic use of this term appears to define it as a computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits.
        • Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
     http://en.wikipedia.org/wiki/Cloud_computing

  11. Cloud Computing
     • How can we leverage these computation environments?
     • Known issues
        • Cloud mechanics have a steep learning curve
        • Data movement to the cloud
        • Security
     • Next-generation data-intensive applications will:
        • Use cloud computing technologies and conduits
        • Require adaptation of programming paradigms
        • Leverage a flexible and modular architecture
        • Promote processing and resources at scale
        • Use distributed data flow designs that allow processing to be co-located with data sources and enable transparent scalability

  12. Meandre in the Clouds
     • Meandre
        • Data-intensive execution engine
        • Component-based programming architecture
        • Orchestrates cloud deployments
        • Leverages cloud conduits
     • NCSA Virtual Machines & Enterprise Cloud
        • VMware, Xen, & Eucalyptus
        • ElasticFox & AMS Web Application

  13. Components for Amazon & Eucalyptus
     Components can be created to:
     • List images
     • Launch/terminate instances
     • Transfer data or programs to running instances
     • Trigger process computation
     • Monitor processes and/or persistent services
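Three of these component roles (listing images, launching/terminating instances, and monitoring) can be sketched as plain Python functions over an EC2-style client. The method names and response fields are those of the boto3 EC2 client (`describe_images`, `run_instances`, `describe_instances`, `terminate_instances`); the wrapper function names, the default owner, and the `t2.micro` instance type are assumptions for illustration. Eucalyptus exposes an EC2-compatible API, so pointing the client at a Eucalyptus endpoint may work, but that is an assumption to check against your installation. Data transfer and triggering computation usually go over SSH or similar and are left out.

```python
# Sketch of cloud-control "components" as plain functions over an EC2-style
# client (e.g. boto3.client("ec2")). Wrapper names, the default owner, and
# the instance type are illustrative assumptions.


def list_images(ec2, owner="self"):
    """Return the IDs of machine images owned by `owner`."""
    resp = ec2.describe_images(Owners=[owner])
    return [img["ImageId"] for img in resp["Images"]]


def launch_instances(ec2, image_id, count=1, instance_type="t2.micro"):
    """Start `count` instances of an image and return their instance IDs."""
    resp = ec2.run_instances(
        ImageId=image_id, MinCount=count, MaxCount=count, InstanceType=instance_type
    )
    return [inst["InstanceId"] for inst in resp["Instances"]]


def instance_states(ec2, instance_ids):
    """Map each instance ID to its state name (e.g. 'pending', 'running')."""
    resp = ec2.describe_instances(InstanceIds=instance_ids)
    return {
        inst["InstanceId"]: inst["State"]["Name"]
        for res in resp["Reservations"]
        for inst in res["Instances"]
    }


def terminate_instances(ec2, instance_ids):
    """Ask the cloud to terminate the given instances."""
    ec2.terminate_instances(InstanceIds=instance_ids)
```

Passing the client in, rather than creating it inside each function, keeps each component small and lets the same flow target Amazon or a compatible private cloud; it also makes the functions easy to exercise against a fake client.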

  14. Cloud Orchestration Data Flow

  15. Parallelism
     • Writing parallel code can be hard, and debugging it even harder…
     • But we need it because our data sets are growing…
     • And software tools can help
     • And hardware is also available
     • MapReduce model
        • A powerful abstraction (software framework) developed by Google to support distributed computing on large data sets on clusters of computers
        • Hadoop is an open-source implementation
     • GPUs
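The MapReduce model can be shown on a single machine with the classic word-count example: a map phase emits (key, value) pairs, a shuffle groups values by key, and a reduce phase folds each group. The sketch below runs map tasks in a thread pool purely to show where parallelism enters; a framework such as Hadoop distributes the same phases across a cluster. Function names are hypothetical.

```python
# Single-machine sketch of the MapReduce model applied to word counting.
# Map tasks run in a thread pool here; Hadoop would run them across a cluster.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))  # map tasks run concurrently
    groups = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in groups.items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the mashup the data"]))
```

The programmer writes only `map_words` and `reduce_counts`; partitioning, scheduling, and the shuffle are the framework's job, which is what makes the abstraction attractive for large data sets.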

  16. Meandre for Parallelism
     • Implemented a script language (ZigZag)
     • Implemented MapReduce in Meandre
     • Automatic parallelization for stateless components
        • Adding the operator [+4] or [+4!] to a component would result in a directed graph in which that component is replicated four times:

          # Describes the data-intensive flow
          #
          @pu = push()
          @pt = pass( string:pu.string ) [+4!]
          print( object:pt.string )
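Why replication is safe only for stateless components can be illustrated in Python: because a component like `pass` keeps no state between inputs, several copies can process different inputs at the same time. This is not Meandre's implementation, and the slide does not spell out the difference between [+4] and [+4!]; the `ordered` flag below is an illustrative knob of this sketch, not a claim about either operator. `pass_component` and `run_replicated` are hypothetical names.

```python
# Sketch: run N copies of a stateless component over a stream of inputs.
# Illustration only; not Meandre's implementation of [+N].
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_component(s):
    """A stateless component: its output depends only on its input."""
    return s


def run_replicated(component, inputs, copies=4, ordered=True):
    """Run `copies` instances of `component` over `inputs`.

    ordered=True returns results in input order; ordered=False returns them
    in completion order, which may differ from run to run.
    """
    with ThreadPoolExecutor(max_workers=copies) as pool:
        if ordered:
            return list(pool.map(component, inputs))
        futures = [pool.submit(component, x) for x in inputs]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    print(run_replicated(pass_component, ["a", "b", "c", "d", "e"]))
```

A component that accumulated state (a running total, say) could not be split this way without merging its copies' partial states, which is why the slide restricts automatic parallelization to stateless components.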

  17. Scaling Genetic Algorithms in Meandre
     Results chart. Hardware: Intel 2.8 GHz quad-core, 4 GB RAM; average of 20 runs.
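The part of a genetic algorithm that usually parallelizes best is fitness evaluation, since each individual is scored independently. The sketch below is a minimal generational GA on the OneMax toy problem (maximize the number of 1 bits) with tournament selection, uniform crossover, bit-flip mutation, and elitism, with fitness evaluation handed to a worker pool. Problem, operators, and parameters are assumptions for illustration, not the configuration behind the measurements above; threads stand in for parallel component copies, and a CPU-bound Python fitness function would need processes for real speedup.

```python
# Minimal generational GA on OneMax with pooled fitness evaluation.
# Problem, operators, and parameters are illustrative assumptions.
import random
from concurrent.futures import ThreadPoolExecutor


def onemax(bits):
    """Fitness: count of 1 bits."""
    return sum(bits)


def tournament(pop, fits, rng, k=2):
    """Pick the fittest of k randomly chosen individuals."""
    best = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[best]


def evolve(n_bits=32, pop_size=40, generations=30, p_mut=0.02, workers=4, seed=0):
    """Return the best fitness of each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            fits = list(pool.map(onemax, pop))  # fitness evaluated by the pool
            elite = pop[max(range(pop_size), key=lambda i: fits[i])]
            history.append(max(fits))
            children = [elite[:]]  # elitism: keep the best individual unchanged
            while len(children) < pop_size:
                a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
                # uniform crossover, then bit-flip mutation
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
                child = [1 - g if rng.random() < p_mut else g for g in child]
                children.append(child)
            pop = children
    return history


if __name__ == "__main__":
    print(evolve())
```

Elitism guarantees the per-generation best fitness never decreases, and because all random choices come from one seeded generator in the main thread, runs are reproducible regardless of how the pool schedules evaluations.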

  18. And With Hadoop
     Results chart. Cluster: 60 dual quad-core Xeon machines with 8 GB RAM, Gigabit Ethernet. Chart annotation: resource exhaustion.

  19. Summary