Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics Loretta Auvil
SW Silos
We continue to build silos.
• Why?
  • I’m only creating a prototype for my paper…
  • I want to have control…
  • I want to write my own code…
  • I can do it faster…
  • I’m not funded to integrate with…
  • …
(Images from Google Search)
From Silos to Mashups
• Definition: A mashup is a web page or application that uses and combines data, presentation, or functionality from two or more sources to create new services
• Why do we want this?
  • Enable our services in many applications and on a variety of devices (laptop, high-res display wall, iPad, iPhone, or others)
  • Sharing and reuse are good things
  • Reach communities with our tools and their data!
• What can we do to change this?
  • We can design data-driven solutions so that they can be mashed up with other tools.
  • We can build web services that can be deployed or accessed.
  • We can create APIs to be used.
• How can we do this?
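A minimal sketch of the definition above: two sources are combined into something a third tool (here, a map widget) could consume. Both sources are hard-coded, hypothetical stand-ins (an entity extractor's output and a small gazetteer with approximate coordinates); a real mashup would fetch them from web services.

```python
import json

# Hypothetical source 1: place names an entity extractor found in a text.
extracted_places = [
    {"name": "Urbana", "mentions": 3},
    {"name": "Chicago", "mentions": 1},
    {"name": "Atlantis", "mentions": 2},  # deliberately missing from the gazetteer
]

# Hypothetical source 2: a gazetteer mapping names to approximate (lat, lon).
gazetteer = {
    "Urbana": (40.11, -88.21),
    "Chicago": (41.88, -87.63),
}


def mash_up(places, coords):
    """Join both sources into GeoJSON-style features a map widget could draw."""
    features = []
    for place in places:
        if place["name"] not in coords:
            continue  # skip names the second source cannot locate
        lat, lon = coords[place["name"]]
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": place["name"], "mentions": place["mentions"]},
        })
    return {"type": "FeatureCollection", "features": features}


result = mash_up(extracted_places, gazetteer)
print(json.dumps(result, indent=2))
```

Because the output is plain data rather than a rendered page, the same result could feed a laptop browser, a display wall, or a phone app.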
Mashup Framework
[Diagram labels: Visualizations, User Interfaces, Apps, Plugins, Web Apps, Services, Meandre Workbench, Repositories, Data, Analysis, Components, Flows, Meandre Data-Intensive Flows, Components, Developer Tools, Data, Analytics, Visualization, Component Repository, Component Discovery, Meandre Infrastructure, Virtualization Infrastructure, Computational Resources]
Scientific Workflows
[Diagram labels: Ptolemy II, Kepler, Triana, BPEL, Trident, VisTrails, Meandre, Taverna]
David De Roure slide (slightly modified)
Meandre for Mashups
• Major Capabilities
  • Dataflow execution
  • Semantic technology (using RDF for storing meta info)
  • Web-oriented
  • Supports publishing services for data, analytics, and visualization
  • Modular components
  • Encapsulation and execution mechanism
  • Promotes reuse, sharing, and collaboration
  • Cloud-friendly infrastructure
• Note: (for Tom) Trading off some performance for reuse, flexibility, and modular components… with the option to parallelize components to improve performance
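To make "dataflow execution" with "modular components" concrete, here is a minimal sketch in plain Python. It is not the Meandre API: the component names (`tokenize`, `drop_short`, `count`) and the `run_flow` helper are hypothetical, and each component is simply a function from a stream of items to a stream of items.

```python
from collections import Counter


def tokenize(lines):
    """Component: split each incoming line into lowercase tokens."""
    for line in lines:
        yield from line.lower().split()


def drop_short(tokens, min_len=4):
    """Component: pass along only tokens of at least min_len characters."""
    for token in tokens:
        if len(token) >= min_len:
            yield token


def count(tokens):
    """Component: consume the whole stream and emit a single frequency table."""
    yield dict(Counter(tokens))


def run_flow(source, components):
    """Wire components in order; each one consumes the previous one's output."""
    stream = source
    for component in components:
        stream = component(stream)
    return list(stream)


lines = ["Mashups reuse the data", "Data flows reuse components"]
(frequencies,) = run_flow(lines, [tokenize, drop_short, count])
print(frequencies)
```

Each component knows nothing about its neighbors, so the same `count` step could be reused behind a different source or filter. That independence is the reuse the slide trades some performance for.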
Components
• Analytics
  • Unsupervised Learning
    • Clustering
    • Frequent Pattern Analysis (Rule Association)
  • Supervised Learning
    • Naïve Bayesian
    • Support Vector Machines (Weka)
    • Decision Trees (C4.5)
  • Optimization Approaches
    • Genetic Algorithm
  • Text Analysis (POS, Entity Extraction)
    • OpenNLP
    • Stanford NER
• Visualization
  • Geographic (Google Maps)
  • Temporal (Simile)
  • Network Graphs – Link Nodes and Arcs (Protovis)
  • Parallel Coordinates (Protovis)
  • Stacked Area Chart (Flare)
  • Tag Cloud Maker
  • Decision Tree (Applet D2K)
  • Naïve Bayes (Applet D2K)
  • Rule Association (Applet)
  • Dendrogram (GWT)
Meandre Services from Firefox Plugin
• Tag Cloud Analysis
• Readability Analysis
• Network Analysis
• Date Entity to Simile Timeline
• Automatic Summarization
• Location Entity to Google Map
Example: Zotero and SEASR
An Ideological Metaphor & Definition
• Cloud Metaphor
  • The term cloud is used as a metaphor for the Internet, based on how it is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals
• Cloud Computing – Definition
  • The first academic use of this term appears to define it as a computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits.
  • Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them
http://en.wikipedia.org/wiki/Cloud_computing
Cloud Computing
• How can we leverage these computation environments?
• Known issues
  • Cloud mechanics have a steep learning curve
  • Data movement to the cloud
  • Security
• Next generation data-intensive applications will:
  • Use cloud computing technologies and conduits
  • Require adaptation of programming paradigms
  • Leverage a flexible and modular architecture
  • Promote processing and resources at scale
  • Use distributed data flow designs so that processing can be co-located with data sources, enabling transparent scalability
Meandre in the Clouds
• Meandre
  • Data-intensive execution engine
  • Component-based programming architecture
  • Orchestrate cloud deployments
  • Leverage cloud conduits
• NCSA Virtual Machines & Enterprise Cloud
  • VMware, Xen, & Eucalyptus
  • ElasticFox & AMS Web Application
Components for Amazon & Eucalyptus
Components can be created to:
• List images
• Launch/terminate instances
• Transfer data or programs to running instances
• Trigger process computation
• Monitor processes and/or persistent services
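The five operations above can be sketched as one small interface. This is only an illustration: `FakeCloud` is a hypothetical in-memory stand-in, not the Amazon or Eucalyptus API, and every method name and ID format in it is an assumption made for the example.

```python
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider (illustration only)."""

    def __init__(self, images):
        self.images = list(images)
        self.instances = {}
        self._next_id = 1

    def list_images(self):
        return list(self.images)

    def launch(self, image):
        if image not in self.images:
            raise ValueError(f"unknown image: {image}")
        instance_id = f"i-{self._next_id:04d}"  # made-up ID format
        self._next_id += 1
        self.instances[instance_id] = {
            "image": image, "state": "running", "files": {}, "jobs": [],
        }
        return instance_id

    def transfer(self, instance_id, name, payload):
        """Copy data or a program onto a running instance."""
        self.instances[instance_id]["files"][name] = payload

    def trigger(self, instance_id, command):
        """Record a computation to start on the instance."""
        self.instances[instance_id]["jobs"].append(command)

    def monitor(self, instance_id):
        instance = self.instances[instance_id]
        return {"state": instance["state"], "jobs": list(instance["jobs"])}

    def terminate(self, instance_id):
        self.instances[instance_id]["state"] = "terminated"


# Walk through the lifecycle in the order the slide lists it.
cloud = FakeCloud(images=["analytics-image"])
image = cloud.list_images()[0]
instance_id = cloud.launch(image)
cloud.transfer(instance_id, "input.txt", "some text to analyze")
cloud.trigger(instance_id, "run-analysis input.txt")
status_while_running = cloud.monitor(instance_id)
cloud.terminate(instance_id)
status_after = cloud.monitor(instance_id)
print(status_while_running, status_after)
```

Wrapping each operation as a separate component would let a flow launch machines, ship data, and start work as ordinary steps in the same graph that does the analysis.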
Parallelism
• Writing parallel code can be hard, and debugging it even harder…
• But we need it because our data sets are growing…
• And software tools can help
• And hardware is also available
• MapReduce model
  • A powerful abstraction (software framework) developed by Google to support distributed computing on large data sets on clusters of computers
  • Hadoop is an open source implementation
• GPUs
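For readers new to the MapReduce model, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using word counting. It only illustrates the programming model; it is not Hadoop or Google's implementation, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["the cloud", "the data in the cloud"])
print(counts)
```

Because each `map_phase` call sees only one document and each `reduce_phase` call sees only one key, a framework is free to run those calls on different machines. That is where the scalability comes from.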
Meandre for Parallelism
• Implemented a script language (ZigZag)
• Implemented MapReduce in Meandre
• Automatic parallelization for stateless components
• Adding the operator [+4] or [+4!] would result in a directed graph

    #
    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4!]
    print( object:pt.string )
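A rough Python analogue of the idea behind the script above: a stateless middle step is replicated across four workers while the source and sink stay single. This is not how Meandre implements it. The sketch assumes the `!` variant asks for results in input order (the slide does not spell this out), and `pass_through` upper-cases its input only so the effect is visible.

```python
from concurrent.futures import ThreadPoolExecutor


def push():
    """Source component: produce the items to process."""
    return [f"item-{i}" for i in range(8)]


def pass_through(text):
    """Stateless component: output depends only on the current input."""
    return text.upper()


def run_flow(workers=4):
    # Four workers share the stateless step. Executor.map returns results
    # in input order even when the workers finish out of order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pass_through, push()))


results = run_flow()
for item in results:  # sink component: print each result
    print(item)
```

Statelessness is what makes the replication safe: no copy of `pass_through` needs to know what another copy has seen.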
Scaling Genetic Algorithms in Meandre
Intel 2.8 GHz quad-core, 4 GB RAM. Average of 20 runs.
And With Hadoop
60 dual quad-core Xeons with 8 GB RAM, Gigabit Ethernet.
[Chart annotation: resource exhaustion]