
ICDE2009 Keynotes Summary

ICDE2009 Keynotes Summary. Shanghai, China, 3.29-4.2 Li Yukun. Outline. Keynotes Search Computing( Stefano Ceri ) Data Management in the Cloud( Raghu Ramakrishnan) Why Can't I Find My Data the Way I Find My Dinner? David Carlson. Keynote 1. Search Computing Stefano Ceri


Presentation Transcript


  1. ICDE2009 Keynotes Summary Shanghai, China, 3.29-4.2 Li Yukun

  2. Outline • Keynotes • Search Computing (Stefano Ceri) • Data Management in the Cloud (Raghu Ramakrishnan) • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

  3. Keynote 1 Search Computing Stefano Ceri Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci 32, 20133 Milano, Italy Stefano.Ceri@polimi.it

  4. Motivation • “Who are the strongest European competitors on software ideas? • Who is the best doctor to cure insomnia in a nearby hospital? • Where can I attend an interesting conference in my field close to a sunny beach?” This information is available on the Web, but no software system can accept such queries nor compute the answer.

  5. Core model for search computing • Conventional services • Are abstracted as systems producing sets of equal-weight answers. • Service computing • A cross-discipline that covers the science and technology of bridging the gap between Business Services and IT Services. • The goal of Services Computing is to enable IT services and computing technology to perform business services more efficiently and effectively. • Search services • Can be abstracted as systems producing ranked lists of answers. • Search computing • A new paradigm in which ranking is the dominant factor for composing services. • Multi-domain queries are answered by a constellation of cooperating search services, possibly dynamically selected.
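The contrast between equal-weight answer sets and ranked lists can be sketched in a few lines. This is only an illustration, assuming two hypothetical search services (conferences and beaches) that each return scored results, and assuming the product of the two scores as the combination function; the names, cities, and scores are made up for the example and do not come from the talk.

```python
from itertools import product

def rank_join(left, right, join_key, top_k=3):
    """Join two ranked result lists on join_key and re-rank by combined score.

    Each item is a dict with a 'score' in [0, 1], higher is better.
    Multiplying scores is an assumption; other combination functions are possible.
    """
    joined = []
    for l, r in product(left, right):
        if l[join_key] == r[join_key]:
            joined.append({
                "left": l["name"],
                "right": r["name"],
                join_key: l[join_key],
                "score": l["score"] * r["score"],
            })
    # Ranking, not mere set membership, decides what the user sees first.
    joined.sort(key=lambda item: item["score"], reverse=True)
    return joined[:top_k]

# Hypothetical outputs of two independent search services.
conferences = [
    {"name": "ConfA", "city": "Nice", "score": 0.9},
    {"name": "ConfB", "city": "Oslo", "score": 0.8},
    {"name": "ConfC", "city": "Rimini", "score": 0.6},
]
beaches = [
    {"name": "BeachX", "city": "Rimini", "score": 0.95},
    {"name": "BeachY", "city": "Nice", "score": 0.5},
]

results = rank_join(conferences, beaches, "city")
for row in results:
    print(row["left"], row["right"], row["city"])
```

A production system would avoid the full cross product and pull results from each service incrementally; the nested loop is kept here only for readability.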

  6. CHAPTERS OF SEARCH COMPUTING • Theory for search computing • Select the best abstractions covering the concepts • Design basic operations on services and algorithms • Compute time and space complexity • Statistical models for search services • Build statistical estimators of the number and quality of the results • Optimization methods for search computing • Description abstractions for search services • Expose ranking-specific properties of search services • Language abstractions for search computing • by incorporating the ranking aspects and strategies for dealing with rankings

  7. CHAPTERS OF SEARCH COMPUTING • Human-computer interfaces • Expressing ranking preferences. • Light-weight user interaction • Semantics • Merging the results of heterogeneous search services • Semantic “join” of search services. • Higher-order ranking • “Ranking of rankings” is essential for selecting and prioritizing search services. • A multi-level one, • Managing individual and social searching • Adapting search strategies to user profiling or to past user interactions • Societal recommendation and evaluation • Thus, individual and societal aspects are key ingredients for search computing

  8. CHAPTERS OF SEARCH COMPUTING • Search computing engineering • designing, assembling and deploying search computing software applications. • Economy of search computing • Suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on. • Security and privacy of search computing • control of how data is used. • For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.

  9. PROJECT ORGANIZATION • Funded by the European Research Council in the framework of the IDEAS Advanced Grants; • It started on Nov. 1, 2008 and will last five years.

  10. PROJECT ORGANIZATION • The project involves about 30 researchers at Politecnico • Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi. • SeCo has an advisory board • Edoardo Amaldi (Operations Research), • Fabio Casati (Service Computing), • Georg Gottlob (Theory), • Ioana Manolescu (Systems and Performance), • Roberto Verganti (Business Models), • Gerhard Weikum (Information Retrieval for the Web), • Jennifer Widom (Languages and Paradigms)

  11. seven teams • Concept team • Theory and methods • Service registration and management • Query processing • Interaction design • Tools and prototypes • Business models and technology watch

  12. More information on SeCo is available on the project’s Web site: • http://home.dei.polimi.it/ceri/seco/index.html

  13. Outline • Keynotes • Search Computing (Stefano Ceri) • Data Management in the Cloud (Raghu Ramakrishnan) • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

  14. Keynote 2: Data Management in the Cloud • Yahoo! Research: Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Nick Puz, Rodrigo Fonseca • CCDI: Chuck Neerdaels, P.P.S. Narayan, Kevin Athey, Toby Negrin • Plus Dev/QA teams

  15. SCENARIOS Pie-in-the-sky

  16. Living in the Clouds • We want to start a new website, FredsList.com • Our site will provide listings of items for sale, jobs, etc. • As time goes on, we’ll add more features • This illustrates how more cloud capabilities are used as needed • The list of capabilities/components is illustrative, not exhaustive

  17. Step 1: Listings • FredsList wants to store listings as (key, category, description) • Statement: DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text ) • Sample records: (5523442, childcare, Nanny available in San Jose); (1234323, transportation, For sale: one bicycle, barely used); (215534, wanted, Looking for issue 1 of Superman comic book) • Diagram: FredsList.com application → Simple Web Service APIs → Database (Sherpa)
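To make the declared dataset concrete, here is a minimal in-memory sketch of a table keyed by the primary key ID, loaded with the three sample records from this slide. It is not Sherpa's API; the `Listing` and `ListingsTable` names and the `put`/`get` methods are hypothetical stand-ins for a hosted record store.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    id: str           # primary key
    category: str
    description: str

class ListingsTable:
    """Toy stand-in for a hosted record store; not an actual Sherpa client."""

    def __init__(self):
        self._rows = {}

    def put(self, listing):
        # Insert or overwrite by primary key.
        self._rows[listing.id] = listing

    def get(self, listing_id):
        # Returns None when the key is absent.
        return self._rows.get(listing_id)

table = ListingsTable()
table.put(Listing("5523442", "childcare", "Nanny available in San Jose"))
table.put(Listing("1234323", "transportation", "For sale: one bicycle, barely used"))
table.put(Listing("215534", "wanted", "Looking for issue 1 of Superman comic book"))

print(table.get("1234323").description)
```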

  18. Step 2: Search • FredsList’s customers quickly ask for keyword search • Statement: ALTER Listings SET Description SEARCHABLE • Example keywords: “dvd’s”, “bicycle”, “nanny” • Diagram: FredsList.com application → Simple Web Service APIs → Database (Sherpa), Search (Vespa), Messaging (YMB)
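Making a text column searchable usually means maintaining an inverted index over it. The sketch below assumes a simple word-level index built in memory; it says nothing about how Vespa actually works, and `build_index` and `search` are hypothetical helper names. The descriptions are the sample listings from the Step 1 slide.

```python
import re
from collections import defaultdict

def build_index(descriptions):
    """Map each lowercase word to the set of listing IDs whose description contains it."""
    index = defaultdict(set)
    for listing_id, text in descriptions.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(listing_id)
    return index

def search(index, keyword):
    """Return the sorted IDs of listings containing the keyword (case-insensitive)."""
    return sorted(index.get(keyword.lower(), set()))

descriptions = {
    "5523442": "Nanny available in San Jose",
    "1234323": "For sale: one bicycle, barely used",
    "215534": "Looking for issue 1 of Superman comic book",
}
index = build_index(descriptions)

print(search(index, "bicycle"))
print(search(index, "nanny"))
print(search(index, "dvd's"))
```

None of the three sample listings mentions DVDs, so the last query comes back empty.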

  19. Step 3: Photos • FredsList decides to add photos to listings • Statement: ALTER Listings ADD Photo BLOB • Foreign key: photo → listing • Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Database (Sherpa), Search (Vespa), Messaging (YMB)

  20. Step 4: Data Analysis • FredsList wants to analyze its listings to get statistics about category, do geocoding, etc. • Statement: ALTER Listings MAKE ANALYZABLE • Jobs: Hadoop program to generate fancy pages for listings; Hadoop program to geocode data; Pig query to analyze categories • Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Compute (Grid), Database (Sherpa), Search (Vespa), Messaging (YMB); foreign key photo → listing; batch export

  21. Step 5: Performance • FredsList wants to reduce its data access latency • Statement: ALTER Listings MAKE CACHEABLE • Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Compute (Grid), Database (Sherpa), Caching (memcached), Search (Vespa), Messaging (YMB); foreign key photo → listing; batch export

  22. EYES TO THE SKIES Motherhood-and-Apple-Pie

  23. Requirements for Cloud Services • Multitenant: A cloud service must support multiple, organizationally distant customers. • Elasticity: Tenants should be able to negotiate and receive resources/QoS on demand. • Resource sharing: Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient. • Horizontal scaling: It should be possible to add cloud capacity in small increments; this should be transparent to the tenants. • Metering: A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service. • Security: A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud. • Availability: A cloud service should be highly available. • Operability: A cloud service should be easy to operate.

  24. Types of Cloud Services • Two kinds of cloud services: • Horizontal cloud services: functionality enabling tenants to build applications or new services on top of the cloud • Functional cloud services: functionality that is useful in and of itself to tenants, e.g., various SaaS instances such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., flickr, Groups, Mail, News, Shopping • Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)

  25. SHERPA To Help You Scale Your Mountains of Data

  26. The Sherpa Solution The next generation global-scale record store • Record-orientation: Routing, data storage optimized for low-latency record access • Scale out: Add machines to scale throughput (while keeping latency low) • Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay • Consistency model: Reduce complexity of asynchrony for the application programmer • Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on demand scale and elasticity

  27. QUERY PROCESSING

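  28. Accessing Data • Diagram: four numbered steps; “Get key k” travels toward one of three storage units (SU) and “Record for key k” travels back

  29. Bulk Read • Diagram: (1) the key set {k1, k2, … kn} is sent to a scatter/gather server; (2) the server issues Get k1, Get k2, Get k3 to the storage units (SU)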
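The scatter/gather pattern named on the Bulk Read slide can be sketched as follows: fan the individual gets out in parallel, then collect the answers into one response. The three in-memory "storage units", the `locate` lookup, and the thread pool are all assumptions made for the sketch, not details from the talk.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage units, each holding a few records.
STORAGE_UNITS = [
    {"k1": "record-1", "k4": "record-4"},
    {"k2": "record-2"},
    {"k3": "record-3"},
]

def locate(key):
    """Hypothetical routing step: index of the storage unit holding key, or None."""
    for i, unit in enumerate(STORAGE_UNITS):
        if key in unit:
            return i
    return None

def get(key):
    """Single-key read; returns (key, record) with record None if the key is unknown."""
    unit = locate(key)
    return key, (STORAGE_UNITS[unit][key] if unit is not None else None)

def bulk_read(keys):
    """Scatter one get per key across worker threads, then gather into a dict."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(get, keys))

print(bulk_read(["k1", "k2", "k3"]))
```

The point of the dedicated server is that the client issues one request and waits once, instead of paying a round trip per key.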

  30. Range Queries in YDOT • Clustered, ordered retrieval of records • Router map (boundary keys between storage units): Storage unit 1 | Canteloupe | Storage unit 3 | Lime | Storage unit 2 | Strawberry | Storage unit 1 • Query “Grapefruit…Pear?” is split by the router into “Grapefruit…Lime?” and “Lime…Pear?” • Storage unit 1: Apple, Avocado, Banana, Blueberry; Strawberry, Tomato, Watermelon • Storage unit 2: Lime, Mango, Orange • Storage unit 3: Canteloupe, Grape, Kiwi, Lemon
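The range-query slide's fruit example can be reproduced with a sorted list of boundary keys and a binary search. This is a sketch under two assumptions: each interval is half-open (a boundary key belongs to the unit on its right), and the router holds the whole map in memory. The constant and function names are hypothetical; the keys, spelled as on the slide, and the unit assignments come from the slide.

```python
from bisect import bisect_right

# Router map: boundary keys and the storage unit serving each interval.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
UNITS = [1, 3, 2, 1]   # one more unit entry than boundaries

STORAGE = {
    1: ["Apple", "Avocado", "Banana", "Blueberry", "Strawberry", "Tomato", "Watermelon"],
    2: ["Lime", "Mango", "Orange"],
    3: ["Canteloupe", "Grape", "Kiwi", "Lemon"],
}

def split_range(lo, hi):
    """Split the key range [lo, hi) into (unit, start, end) pieces in key order."""
    pieces = []
    i = bisect_right(BOUNDARIES, lo)   # interval containing lo
    start = lo
    while start < hi:
        # The piece ends at the next boundary, or at hi if that comes first.
        end = BOUNDARIES[i] if i < len(BOUNDARIES) and BOUNDARIES[i] < hi else hi
        pieces.append((UNITS[i], start, end))
        start = end
        i += 1
    return pieces

def range_query(lo, hi):
    """Visit only the storage units that overlap [lo, hi) and return keys in order."""
    out = []
    for unit, start, end in split_range(lo, hi):
        out.extend(k for k in STORAGE[unit] if start <= k < end)
    return out

print(split_range("Grapefruit", "Pear"))
print(range_query("Grapefruit", "Pear"))
```

Because records are clustered by key, the query touches two storage units rather than all three, and the results arrive already ordered.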

  31. Updates • Diagram: eight numbered steps; “Write key k” passes through the routers to a storage unit (SU) and on to the message brokers; “Sequence # for key k” and “SUCCESS” are returned

  32. ASYNCHRONOUS REPLICATION AND CONSISTENCY

  33. Asynchronous Replication
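The update path hands out a sequence number per key and involves the message brokers. A minimal sketch of that idea follows; the exact ordering of steps is an assumption (the transcript preserves only the diagram labels), and the `MessageBroker` and `StorageUnit` classes are hypothetical.

```python
from collections import defaultdict

class MessageBroker:
    """Toy broker: an append-only log that replicas could consume later."""

    def __init__(self):
        self.log = []

    def publish(self, key, seq, value):
        self.log.append((key, seq, value))

class StorageUnit:
    """Toy storage unit that orders writes to each key with a sequence number."""

    def __init__(self, broker):
        self.broker = broker
        self.records = {}                  # key -> (sequence number, value)
        self.next_seq = defaultdict(int)   # key -> last sequence number issued

    def write(self, key, value):
        self.next_seq[key] += 1
        seq = self.next_seq[key]
        # Assumed order: publish first, then apply locally, then acknowledge.
        self.broker.publish(key, seq, value)
        self.records[key] = (seq, value)
        return seq

broker = MessageBroker()
unit = StorageUnit(broker)
first = unit.write("k", "a")
second = unit.write("k", "b")
print(first, second, broker.log)
```

Per-key sequence numbers give every replica the same order of updates for a record, which is what the consistency slides that follow rely on.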

  34. Consistency Model • Goal: make it easier for applications to reason about updates and cope with asynchrony • What happens to a record with primary key “Brian”? • Timeline (Generation 1): record inserted, then seven updates, then a delete; versions v. 1 to v. 8 along the time axis

  35. Consistency Model • Operation: Read • Timeline (Generation 1): versions v. 1 to v. 8, with stale versions and the current version marked

  36. Consistency Model • Operation: Read up-to-date • Timeline (Generation 1): versions v. 1 to v. 8, with stale versions and the current version marked

  37. Consistency Model • Operation: Read ≥ v.6 • Timeline (Generation 1): versions v. 1 to v. 8, with stale versions and the current version marked

  38. Consistency Model • Operation: Write • Timeline (Generation 1): versions v. 1 to v. 8, with stale versions and the current version marked

  39. Consistency Model • Operation: Write if = v.7 • Result: ERROR • Timeline (Generation 1): versions v. 1 to v. 8, with stale versions and the current version marked
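The five operations on these slides (Read, Read up-to-date, Read ≥ v.6, Write, Write if = v.7) can be modelled on a single record with one master copy and one lagging replica. This is a toy sketch: the method names are my own labels for the slide operations, the replica lag is simulated by hand, and the assumption that v. 8 is the current version is inferred from the ERROR on "Write if = v.7".

```python
class VersionMismatch(Exception):
    """Raised when a conditional write names a version that is not current."""

class Record:
    """One record: a master version history plus a replica that may lag behind."""

    def __init__(self, value):
        self.versions = [value]    # versions[i] holds version i + 1
        self.replica_version = 1   # newest version the replica has received

    @property
    def current_version(self):
        return len(self.versions)

    def replicate(self, up_to):
        """Simulate asynchronous replication catching up to version up_to."""
        target = min(up_to, self.current_version)
        self.replica_version = max(self.replica_version, target)

    def _at(self, version):
        return version, self.versions[version - 1]

    def read(self):
        """Read: cheap, served by the replica, may return a stale version."""
        return self._at(self.replica_version)

    def read_up_to_date(self):
        """Read up-to-date: always returns the current version."""
        return self._at(self.current_version)

    def read_at_least(self, min_version):
        """Read >= v: the replica answers only if it is fresh enough."""
        if self.replica_version >= min_version:
            return self.read()
        return self.read_up_to_date()

    def write(self, value):
        """Write: appends a new version and returns its number."""
        self.versions.append(value)
        return self.current_version

    def write_if_version(self, expected, value):
        """Write if = v: succeeds only when expected is still the current version."""
        if expected != self.current_version:
            raise VersionMismatch(f"expected v.{expected}, current is v.{self.current_version}")
        return self.write(value)

brian = Record("v1")
for n in range(2, 9):          # versions 2 through 8
    brian.write(f"v{n}")
brian.replicate(up_to=5)       # the replica has only seen v. 5

print(brian.read(), brian.read_up_to_date(), brian.read_at_least(6))
```

The conditional write is what lets an application implement read-modify-write safely without locks: if another writer got in first, the call fails and the application retries.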

  40. Index Maintenance • How to have lots of interesting indexes, without killing performance? • Solution: Asynchrony! • Indexes updated asynchronously when base table updated • Planned functionality
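Asynchronous index maintenance means a write touches only the base table and queues the index work for later. The sketch below assumes a single secondary index on category and an explicit `apply_pending` step standing in for a background worker; the class and method names are hypothetical, and the slide itself marks the feature as planned functionality.

```python
from collections import deque

class Table:
    """Base table with one secondary index that is updated lazily."""

    def __init__(self):
        self.rows = {}           # primary key -> category
        self.by_category = {}    # secondary index: category -> set of keys
        self.pending = deque()   # index updates not yet applied

    def put(self, key, category):
        old = self.rows.get(key)
        self.rows[key] = category
        # The write returns without touching the index.
        self.pending.append((key, old, category))

    def apply_pending(self):
        """Stand-in for a background worker draining the update queue in order."""
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.by_category.get(old, set()).discard(key)
            self.by_category.setdefault(new, set()).add(key)

    def lookup(self, category):
        """Index lookup; may miss recent writes until apply_pending has run."""
        return sorted(self.by_category.get(category, set()))

listings = Table()
listings.put("215534", "wanted")
before = listings.lookup("wanted")   # index has not caught up yet
listings.apply_pending()
after = listings.lookup("wanted")
print(before, after)
```

The cost of this design is visible in `before`: an index read can briefly lag the base table, which is the trade made to keep write latency low.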

  41. SHERPA IN CONTEXT

  42. MObStor • Yahoo!’s next-generation globally replicated, virtualized media object storage service • Better provisioning, easy migration, replication, better BCP, and performance • New features (Evergreen URLs, CDN integration, REST API, …) • The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.

  43. Storage & Delivery Stack

  44. The World Has Changed • Web applications need: • Scalability! • Geographic distribution • High availability • Reliable storage • Web applications are unfit for: • Complicated queries • Strong transactions

  45. Web Data Management • Structured record storage (PNUTS): • CRUD • Point lookups and short scans • Index organized table and random I/Os • $ per latency • Large data analysis (Hadoop): • Scan oriented workloads • Focus on sequential disk I/O • $ per cpu cycle • Blob storage (SAN/NAS): • Object retrieval and streaming • Scalable file storage • $ per GB

  46. Application Design Space • Axes: get a few things vs. scan everything; files vs. records • Systems plotted: Sherpa, MObStor, YMDB, MySQL, Oracle, Filer, BigTable, Hadoop, Everest

  47. Further Reading • Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008): Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan • PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008): Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni

  48. Outline • Keynotes • Search Computing (Stefano Ceri) • Data Management in the Cloud (Raghu Ramakrishnan) • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)

  49. Keynote 3 • Why Can’t I Find My Data the Way I Find My Dinner? • David Carlson • Director, International Polar Year International Programme Office • Cambridge, UK • ipy.djc@gmail.com

  50. International Polar Year (IPY) • One can find almost every discipline represented in the IPY projects, and funding has come from geophysical, biological and social agencies and programs.
