
High Productivity Computing


Presentation Transcript


  1. High Productivity Computing. Large-scale Knowledge Discovery: Co-evolving Algorithms and Mechanisms. Steve Reinhardt, Principal Architect, Microsoft, steve.reinhardt@microsoft.com. Prof. John Gilbert, UCSB; Dr. Viral Shah, UCSB; Dr. Aydin Buluc, LBL (formerly UCSB)

  2. Context for Knowledge Discovery. From Debbie Gracio and Ian Gorton, PNNL Data Intensive Computing Initiative.

  3. Knowledge Discovery (KD) Definition
  • Data-intensive computing: when the acquisition and movement of input data is a primary limitation on feasibility or performance
  • Simple data mining: searching for exceptional values of elemental measures (e.g., heat, number of transactions)
  • Knowledge discovery: searching for exceptional values of associative/social measures (e.g., most "between", belonging to the greatest number of valuable reactions); see the sketch below
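  To make the distinction concrete, here is a minimal sketch (not from the slides; it uses networkx and toy data purely for illustration). The data-mining query needs only per-element values, while the knowledge-discovery query needs the structure of the whole graph.

      import networkx as nx

      # Elemental measure (simple data mining): the account with the most transactions.
      transactions = {"alice": 412, "bob": 97, "carol": 1530}
      busiest = max(transactions, key=transactions.get)

      # Associative measure (knowledge discovery): the most "between" vertex,
      # i.e., the one lying on the most shortest paths between other vertices.
      G = nx.karate_club_graph()
      bc = nx.betweenness_centrality(G)
      most_between = max(bc, key=bc.get)
      print(busiest, most_between)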

  4. Today's Biggest Obstacle in the KD Field
  • Lack of fast feedback between domain experts and infrastructure/tool developers about good, usable, scalable KD software platforms
  • Need to accelerate the rate of learning about both good KD algorithms and good KD infrastructure
  • Domain experts want:
    • Good infrastructure that works
    • … and scales very well and runs fast
    • Appropriate abstractions
    • Flexibility to develop/tweak algorithms to suit their needs
    • Algorithms with a strong mathematical basis
    • But don't know the best approach or algorithms
  • Infrastructure developers want:
    • A clear audience and requirements for what they develop
    • An architecture that copes with client, cluster, cloud, GPU, and huge data
    • But don't know the best approach
  • Need to get good (not perfect) scalable platforms in use to co-evolve approaches and algorithms

  5. Candidate Approaches

  6. KDT Layers: Enable overloading with various technologies …
  [Layer diagram:]
  • Algorithm layer (kdt): community detection, elementary mode analysis, betweenness centrality, barycentric clustering, …, betweenness centrality (Cray XMT)
  • Parallel/distributed operations (constructors, SpGEMM, SpMV, SpAdd, SpGEMM on semi-rings, I/O), either in-memory (Star-P) or out-of-memory (DryadLINQ-based); a small SpMV sketch follows below
  • Local operations (scipy): local constructors, SpGEMM, SpRef/SpAsgn, SpGEMV, SpAdd, SpGEMM on semi-rings, I/O; GPU variants of SpGEMV and SpGEMM
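  As a concrete (if tiny) illustration of the primitives these layers overload, the sketch below performs one breadth-first-search expansion step as a sparse matrix-vector product, using scipy.sparse directly. The boolean OR/AND semiring is emulated with ordinary (+, *) arithmetic followed by a cast to bool; the kdt module names above belong to the prototype and are not used here.

      import numpy as np
      import scipy.sparse as sp

      # Adjacency matrix of a small directed graph: edges 0->1, 0->2, 1->2.
      A = sp.csr_matrix(np.array([[0, 1, 1],
                                  [0, 0, 1],
                                  [0, 0, 0]]))

      # One BFS expansion step as SpMV: the frontier vector times the transposed
      # adjacency matrix gives the vertices reachable in one step.
      frontier = np.array([1, 0, 0], dtype=bool)
      next_frontier = (A.T @ frontier).astype(bool)
      print(next_frontier)   # [False  True  True]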

  7. DryadLINQ: Query + Plan + Parallel Execution
  • Dryad
    • Distributed-memory, coarse-grain run-time
    • Generalized MapReduce
    • Uses computational vertices and communication channels to form a dataflow execution graph
  • LINQ (Language INtegrated Query)
    • A query-style language interface to Dryad
    • Typical relational operators (e.g., Select, Join, GroupBy; a toy sketch of the query shape follows below)
  • Scaling for a histogram example: input data of 10.2 TB, using 1,800 cluster nodes; 43,171 execution-graph vertices spawning 11,072 processes; 33 GB of output data created in 11.5 minutes of execution
  [Diagram: the job manager on the control plane schedules vertices onto cluster nodes; the data plane moves data between vertices via files, TCP, FIFOs, and the network]
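  The actual histogram program is a C# DryadLINQ query executed across the cluster; the toy Python sketch below only mirrors its Select / GroupBy shape on an in-memory list, to show what the relational operators compute.

      from collections import Counter

      records = ["Gene", "Email", "gene", "Web", "email", "gene"]
      # "Select" a key from each record, then "GroupBy" the key and count each group.
      histogram = Counter(record.lower() for record in records)
      print(histogram.most_common())   # [('gene', 3), ('email', 2), ('web', 1)]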

  8. MATLAB Star-P Bridges Scientists to HPCs
  • Star-P enables domain experts to use parallel, big-memory systems via productivity languages (e.g., the M language of MATLAB)
  • Knowledge discovery scaling with Star-P:
    • Kernels to 55B edges between 5B vertices, on 128 cores (consuming 4 TB of memory)
    • Compact applications to 1B edges on 256 cores

  9. Example: Kernel 4F from SSCA#2
  • Layered on NumPy/SciPy (the slide's MATLAB-style max and size calls are rendered below as NumPy-style argmax and an assumed kdt.nvertices accessor)

      def kernel4f(G):                     # graph clustering
          import kdt                       # or kdt.disk, kdt.gpu, or kdt.xmt
          # Find a maximal independent set (IS) in G
          (IS, misrounds) = kdt.mis(G)
          # Find the neighbors of each node that lie in the IS
          neighFromIS = kdt.neighbors(IS, 1)
          # Pick one of the neighboring IS nodes as a leader (row-wise argmax)
          leader = neighFromIS.argmax(axis=1)
          # Collect votes from neighbors
          (I, J) = kdt.edges(G)
          n = kdt.nvertices(G)             # vertex count; accessor name assumed
          S = kdt.graph(I, leader[J], 1, n, n)
          # Pick the most popular leader among neighbors and join that cluster
          leader = S.argmax(axis=1)
          return leader

  10. Next Steps
  • Get prototypes† available quickly
    • in-memory and out-of-memory targets of KDT
    • with a graph layer
    • likely exposed via a Python library interface
  • Work with early customers/researchers
  • Iterate
  † == "not a product"

  11. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista, Windows 7, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

  12. Boundary between old and new

  13. Agenda
  • Uses of knowledge discovery
    • Distinction from data mining
  • Infrastructure
  • Algorithms
  • Practicalities

  14. Homeland security
  • Problem: need to detect concerted (concealed) actions
  • Data is huge, from many media types
  • Many types of possible attacks
  • Detection sometimes needed on tactical time-scales
  Coffman, Greenblatt, and Marcus, "Graph-based Technologies for Intelligence Analysis", CACM

  15. Cheminformatics: Identifying molecular precursors of negative trial outcomes
  • Problem: modest numbers (O(1K)) of patients in clinical trials are insufficient to detect rare negative effects
  • Possible solution: identify the molecular precursors at sub-injurious concentrations
    • Typically multiple precursors are needed to identify a negative effect
    • Causal network models generate clinically testable hypotheses for avoidance
  [Figure: the combined "systems profile" for EGF inhibition]
  Elliston et al., "Systems Pharmacology: An Application of Systems Biology"

  16. Manufacturing: Aircraft engine failure detection
  • Problem: actual or borderline failures are expensive to fix and potentially disruptive of operations
  • Recognizing failure signatures before they become real is much better (repair cost, operational disruption)
  • Two uses:
    • Initial detection of signatures
    • Operational detection
  Courtesy of Rolls-Royce PLC

  17. Telecommunications: Wireless traffic categorization
  • Nonnegative matrix factorization identifies the essential components of traffic (see the sketch below)
  • An analyst labels the different types of external behavior
  Karpinski, Gilbert, and Belding, "Non-parametric discrete mixture model recovery via non-negative matrix factorization"
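  As a hedged illustration of the factorization step (not the authors' code; it uses scikit-learn on synthetic per-window packet counts), the sketch below factors a traffic matrix into a small number of nonnegative components that an analyst could then label.

      import numpy as np
      from sklearn.decomposition import NMF

      rng = np.random.default_rng(0)
      # Rows: observation windows; columns: per-protocol packet counts (synthetic).
      traffic = rng.poisson(lam=5.0, size=(200, 12)).astype(float)

      model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
      W = model.fit_transform(traffic)   # per-window mixture weights
      H = model.components_              # the essential traffic components to label
      print(W.shape, H.shape)            # (200, 3) (3, 12)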

  18. Knowledge Discovery Workflow
  1. Cull relevant data (from sources such as gene, email, Twitter, video, sensor, and web data)
  2. Build input graph
  3. Analyze input graph (in memory)
  4. Visualize result graph (the composition of these steps is sketched below)
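  The schematic below shows how the four steps compose; it is a toy sketch with hypothetical record fields and networkx standing in for the analysis layer, whereas the real cull step would run on disk-scale infrastructure.

      import networkx as nx

      def cull(raw_records):
          # 1. Cull relevant data: keep only records that name two endpoints.
          return [(r["src"], r["dst"]) for r in raw_records if "src" in r and "dst" in r]

      def build_graph(edges):
          # 2. Build the input graph from the culled edge list.
          return nx.Graph(edges)

      def analyze(G):
          # 3. Analyze the input graph; betweenness centrality stands in for the analysis.
          return nx.betweenness_centrality(G)

      records = [{"src": "a", "dst": "b"}, {"src": "b", "dst": "c"}, {"note": "noise"}]
      scores = analyze(build_graph(cull(records)))
      print(scores)
      # 4. Visualize the result graph, e.g., nx.draw() with node sizes scaled by score.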

  19. Agenda • Uses of knowledge discovery • Infrastructure • Algorithms • Practicalities

  20. Infrastructure: Microsoft Approach
  • Economic history is on the side of mass-consumption tool-builders, not artisans
    • Ancient scribes -> quill pens -> mass-produced pencils (Dixon Ticonderoga)
    • Early expert-only automobiles -> automobiles usable by anyone (Ford)
    • Refrigerator-sized motion-picture cameras -> hand-helds (Sony, …)
  • HPC's impact on society is low because it is hard to use
  • Disruptive change: parallelism everywhere (intra-chip, intra-node, inter-node, cloud) and the need for applications to respond
  • Microsoft is investing to enable developers and domain experts to move to this parallel world

  21. A Cross-section of Today's Tools: Tools / Programming Models / Runtimes
  [Stack diagram, spanning Visual Studio 2010 / .NET 4 and Research/Incubation:]
  • Tools: Visual Studio 2010, Parallel Debugger, Profiler, Concurrency Analysis
  • Managed languages: Visual F#, Axum
  • Native libraries: Async Agents Library, Parallel Pattern Library, data structures
  • Managed libraries: DryadLINQ, Parallel LINQ, Rx, Task Parallel Library, data structures
  • Runtimes: Native Concurrency Runtime (Task Scheduler, Resource Manager), Managed Concurrency Runtime (ThreadPool); Microsoft Research: Race Detection, Fuzzing
  • Operating system: Windows 7 / Server 2008 R2, HPC Server, Threads, UMS Threads

  22. Cluster-Aware Microsoft TC Technologies
  [Diagram: technologies arranged by audience, from memory-centric to disk-centric]
  • Domain specialists: Star-P (memory-centric; product), DryadLINQ (data-parallel, disk-centric; research/incubation)
  • Professional developers: MPI (loop-parallel), HPC SOA, Dryad
  • Windows HPC Server across the cluster nodes: job scheduling, diagnostics, system monitoring

  23. Microsoft TC Tools for the Knowledge Discovery Workflow
  1. Cull relevant data (gene data, email, Twitter, video, web data, …): DryadLINQ
  2. Build input graph: Star-P / KDT (in memory)
  3. Analyze input graph: Star-P / KDT (in memory)
  4. Visualize result graph

  24. KDT: "Knowledge Discovery Toolbox"
  [Component diagram (graph / sparse matrix / GPU-accelerator layers):]
  • Data sources: input files, Dryad streams, OPeNDAP, Hadoop
  • Graph layer: graph primitives (connected components, maximal independent sets, ...), graph abstractions and patterns (vertices, visitors, breadth-first search); connected components is sketched below
  • Analysis: clustering (betweenness centrality, barycentric, K-means), classification (support vector machines, Markov models, Bayesian), dimensionality reduction / factorization (eigenvalues/vectors, singular values, nonnegative matrix factorization, …), optimization
  • Visualization: IN-SPIRE, ...
  • Star-P core: linear algebra (sparse mat*vec, mat*mat, ...), solvers (MUMPS, SuperLU, ...), data structures (sparse matrices, ...), parallel constructs (data- and loop-parallel), parallel I/O (HDF5, POSIX fopen/fread/…), utility (sort, indexing)
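  As a small illustration of one listed graph primitive expressed on a sparse adjacency matrix (the representation the Star-P core holds), the sketch below uses scipy.sparse rather than KDT itself.

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.csgraph import connected_components

      # Sparse adjacency matrix of a 5-vertex graph with edges 0-1, 1-2, and 3-4.
      rows = np.array([0, 1, 3])
      cols = np.array([1, 2, 4])
      A = sp.coo_matrix((np.ones(3), (rows, cols)), shape=(5, 5))

      n_comp, labels = connected_components(A, directed=False)
      print(n_comp, labels)   # 2 components: {0, 1, 2} and {3, 4}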

  25. Agenda • Uses of knowledge discovery • Infrastructure • Algorithms • Practicalities

  26. Algorithms
  • Must be usable by a non-graph-expert on very large data
    • E.g., automatically detecting convergence
  • Must have practical computational complexity
    • O(|V|^2) or O(|E|^2) is not practical
    • Core Star-P sparse matrix algorithms are O(|E| + |V|), not O(|V|^3) or O(|E|^1.5)
  • State of the art (e.g., for clustering):
    • Some agreement on the best current algorithms
      • Girvan-Newman community detection: repeated recalculation of betweenness centrality is too expensive (see the sketch below)
      • Non-negative matrix factorization: can give good results, but difficult to calibrate
    • Broad agreement that current algorithms are not good enough
      • E.g., they only work for a given number of clusters and don't support multi-membership
    • Intense work on new algorithms
  • Better algorithms will arise from domain specialists working with at-scale data interactively
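  For illustration only (networkx on a small benchmark graph, not KDT): Girvan-Newman removes the highest-betweenness edge and recomputes betweenness on every round, which is exactly the repeated recalculation that makes it too expensive at scale.

      import networkx as nx
      from networkx.algorithms.community import girvan_newman

      G = nx.karate_club_graph()
      # Take the first split produced by the algorithm: two communities.
      communities = next(girvan_newman(G))
      print([sorted(c) for c in communities])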

  27. Agenda • Uses of knowledge discovery • Infrastructure • Algorithms • Practicalities

  28. Practicalities
  • Voltaire: "The perfect is the enemy of the good."
  • Today's best algorithms are valuable in themselves, so they are worth propagating to a wider audience
  • And we need to foster rapid development of new, better algorithms
  • => Provide robust, scalable infrastructure (Star-P) for algorithm development
  • => Seed development with an open-source library of the best current algorithms

  29. Summary
  • Knowledge discovery is a high-value technique relevant to many disciplines
  • Data sizes require cluster technologies for both the Cull (disk) and Analyze (memory) steps
  • Rapid algorithm development is essential
  • DryadLINQ (disk) and Star-P (memory) robustly implement key infrastructure at scale
  • <<Watch this space>>
