ICDE2009 Keynotes Summary. Shanghai, China, 3.29-4.2 Li Yukun. Outline. Keynotes Search Computing( Stefano Ceri ) Data Management in the Cloud( Raghu Ramakrishnan) Why Can't I Find My Data the Way I Find My Dinner? David Carlson. Keynote 1. Search Computing Stefano Ceri
ICDE2009 Keynotes Summary Shanghai, China, 3.29-4.2 Li Yukun
Outline • Keynotes • Search Computing (Stefano Ceri) • Data Management in the Cloud (Raghu Ramakrishnan) • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)
Keynote 1 Search Computing Stefano Ceri Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci 32, 20133 Milano, Italy Stefano.Ceri@polimi.it
Motivation • “Who are the strongest European competitors on software ideas?” • “Who is the best doctor to cure insomnia in a nearby hospital?” • “Where can I attend an interesting conference in my field close to a sunny beach?” • This information is available on the Web, but no software system can accept such queries or compute the answers.
Core model for search computing • Conventional services • Can be abstracted as systems producing sets of equal-weight answers. • Service computing • A cross-discipline that covers the science and technology of bridging the gap between business services and IT services. • Its goal is to enable IT services and computing technology to perform business services more efficiently and effectively. • Search services • Can be abstracted as systems producing ranked lists of answers. • Search computing • A new paradigm in which ranking is the dominant factor for composing services. • A multi-domain query is answered by a constellation of cooperating search services, possibly dynamically selected.
CHAPTERS OF SEARCH COMPUTING • Theory for search computing • Select the best abstractions covering the concepts • Design basic operations on services and algorithms • Compute time and space complexity • Statistical models for search services • Build statistical estimators of the number and quality of the results • Optimization methods for search computing • Description abstractions for search services • Expose ranking-specific properties of search services • Language abstractions for search computing • Incorporate ranking aspects and strategies for dealing with rankings
CHAPTERS OF SEARCH COMPUTING • Human-computer interfaces • Expressing ranking preferences • Light-weight user interaction • Semantics • Merging the results of heterogeneous search services • Semantic “join” of search services • Higher-order ranking • “Ranking of rankings” is essential for selecting and prioritizing search services • Ranking thus becomes multi-level • Managing individual and social searching • Adapting search strategies to user profiling or to past user interactions • Societal recommendation and evaluation • Thus, individual and societal aspects are key ingredients for search computing
CHAPTERS OF SEARCH COMPUTING • Search computing engineering • designing, assembling and deploying search computing software applications. • Economy of search computing • Suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on. • Security and privacy of search computing • control of how data is used. • For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.
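The semantic “join” of ranked search services listed among the chapters above can be pictured with a small example. This is only a minimal sketch, assuming each service returns (join key, answer, score) triples with scores in [0, 1] and that the combined rank is a weighted sum of the two scores. The function name, the sample data, and the scoring rule are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Tuple

# Assumed shape of one ranked answer: (join key, answer, score in [0, 1]).
Ranked = List[Tuple[str, str, float]]


def ranked_join(left: Ranked, right: Ranked,
                w_left: float = 0.5) -> List[Tuple[str, str, str, float]]:
    """Join two ranked lists on their key and re-rank by a weighted score."""
    # Index the right-hand service's answers by join key.
    by_key: Dict[str, List[Tuple[str, float]]] = {}
    for key, answer, score in right:
        by_key.setdefault(key, []).append((answer, score))

    joined = []
    for key, l_answer, l_score in left:
        for r_answer, r_score in by_key.get(key, []):
            combined = w_left * l_score + (1.0 - w_left) * r_score
            joined.append((key, l_answer, r_answer, combined))

    # Highest combined score first: ranking drives the composition.
    joined.sort(key=lambda row: row[3], reverse=True)
    return joined


# Hypothetical answers from a "conference" service and a "beach" service,
# joined on the city they share.
conferences = [("Nice", "Conf A", 0.9), ("Oslo", "Conf B", 0.8), ("Bari", "Conf C", 0.6)]
beaches = [("Bari", "Beach X", 0.9), ("Nice", "Beach Y", 0.7)]
results = ranked_join(conferences, beaches)
```

A real search computing system would avoid materializing the full join and would instead pull answers from each service incrementally; the sketch only shows how two rankings combine into one.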
PROJECT ORGANIZATION • Funded by the European Research Council in the framework of the IDEAS Advanced Grants; • It started on Nov. 1, 2008 and will last five years.
PROJECT ORGANIZATION • The project involves about 30 researchers at Politecnico • Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi. • SeCo has an advisory board • Edoardo Amaldi (Operations Research), • Fabio Casati (Service Computing), • Georg Gottlob (Theory), • Ioana Manolescu (Systems and Performance), • Roberto Verganti (Business Models), • Gerhard Weikum (Information Retrieval for the Web), • Jennifer Widom (Languages and Paradigms)
Seven teams • Concept team • Theory and methods • Service registration and management • Query processing • Interaction design • Tools and prototypes • Business models and technology watch
More information on SeCo is available on the project’s Web site: • http://home.dei.polimi.it/ceri/seco/index.html
Keynote 2: Data Management in the Cloud • Yahoo! Research: Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Nick Puz, Rodrigo Fonseca • CCDI: Chuck Neerdaels, P.P.S. Narayan, Kevin Athey, Toby Negrin • Plus Dev/QA teams
SCENARIOS Pie-in-the-sky
Living in the Clouds • We want to start a new website, FredsList.com • Our site will provide listings of items for sale, jobs, etc. • As time goes on, we’ll add more features • This illustrates how more cloud capabilities are used as needed • The list of capabilities/components is illustrative, not exhaustive
Step 1: Listings • FredsList wants to store listings as (key, category, description) • FredsList.com application: DECLARE DATASET Listings AS ( ID String PRIMARY KEY, Category String, Description Text ) • Sample records: (5523442, childcare, Nanny available in San Jose); (1234323, transportation, For sale: one bicycle, barely used); (215534, wanted, Looking for issue 1 of Superman comic book) • [Diagram: simple Web service APIs in front of Database (Sherpa)]
Step 2: Search • FredsList’s customers quickly ask for keyword search • FredsList.com application: ALTER Listings SET Description SEARCHABLE • Sample queries: “dvd’s”, “bicycle”, “nanny” • [Diagram: simple Web service APIs in front of Database (Sherpa) and Search (Vespa), with Messaging (YMB)]
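To make the effect of marking Description as searchable concrete, here is a minimal sketch of keyword search over the three sample listings using an in-memory inverted index. It assumes simple lowercase word matching; it is not Vespa's API or indexing scheme, and the helper names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# The three sample listings from Step 1: key -> (category, description).
listings = {
    "5523442": ("childcare", "Nanny available in San Jose"),
    "1234323": ("transportation", "For sale: one bicycle, barely used"),
    "215534": ("wanted", "Looking for issue 1 of Superman comic book"),
}


def build_index(records):
    """Map each lowercase word of a description to the keys that contain it."""
    index = defaultdict(set)
    for key, (_category, description) in records.items():
        for word in re.findall(r"[a-z0-9']+", description.lower()):
            index[word].add(key)
    return index


def search(index, keyword):
    """Return the sorted listing keys whose description contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(listings)
bicycle_hits = search(index, "bicycle")
nanny_hits = search(index, "nanny")
dvd_hits = search(index, "dvd's")
```

With the sample data, the queries for “bicycle” and “nanny” each match one listing, and “dvd’s” matches none.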
Step 3: Photos • FredsList decides to add photos to listings • FredsList.com application: ALTER Listings ADD Photo BLOB • Foreign key: photo → listing • [Diagram: simple Web service APIs in front of Storage (MObStor), Database (Sherpa), and Search (Vespa), with Messaging (YMB)]
Step 4: Data Analysis • FredsList wants to analyze its listings to get statistics about categories, do geocoding, etc. • FredsList.com application: ALTER Listings MAKE ANALYZABLE • Hadoop program to generate fancy pages for listings • Hadoop program to geocode data • Pig query to analyze categories • Foreign key: photo → listing • [Diagram: simple Web service APIs in front of Storage (MObStor), Compute (Grid), Database (Sherpa), and Search (Vespa), with Messaging (YMB) and batch export]
Step 5: Performance • FredsList wants to reduce its data access latency • FredsList.com application: ALTER Listings MAKE CACHEABLE • Foreign key: photo → listing • [Diagram: simple Web service APIs in front of Storage (MObStor), Compute (Grid), Database (Sherpa), Caching (memcached), and Search (Vespa), with Messaging (YMB) and batch export]
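One way to picture what a cacheable dataset buys the application is a read-through cache in front of the record store. The sketch below is an assumption-laden illustration: a plain dictionary stands in for both the database and the cache, and the class `ReadThroughCache` is hypothetical rather than the memcached client API or Sherpa's actual caching layer.

```python
class ReadThroughCache:
    """Serve reads from a cache, falling back to the backing store on a miss."""

    def __init__(self, backing_store):
        self._store = backing_store   # stands in for the database
        self._cache = {}              # stands in for the cache tier
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Miss: pay the slower store lookup once, then remember the value.
        self.misses += 1
        value = self._store[key]
        self._cache[key] = value
        return value

    def put(self, key, value):
        # Write to the store and invalidate the cached copy so that the
        # next read observes the new value.
        self._store[key] = value
        self._cache.pop(key, None)


database = {"215534": "Looking for issue 1 of Superman comic book"}
cache = ReadThroughCache(database)
first = cache.get("215534")    # miss: fetched from the store
second = cache.get("215534")   # hit: served from the cache
```

Invalidating on write is only one possible policy; updating the cached value in place or expiring entries after a time limit are common alternatives.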
EYES TO THE SKIES Motherhood-and-Apple-Pie
Requirements for Cloud Services • Multitenant: A cloud service must support multiple, organizationally distant customers. • Elasticity: Tenants should be able to negotiate and receive resources/QoS on demand. • Resource sharing: Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient. • Horizontal scaling: It should be possible to add cloud capacity in small increments; this should be transparent to the tenants. • Metering: A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service. • Security: A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud. • Availability: A cloud service should be highly available. • Operability: A cloud service should be easy to operate.
Types of Cloud Services • Two kinds of cloud services: • Horizontal cloud services: functionality enabling tenants to build applications or new services on top of the cloud • Functional cloud services: functionality that is useful in and of itself to tenants, e.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping • Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)
SHERPA To Help You Scale Your Mountains of Data
The Sherpa Solution • The next-generation global-scale record store • Record-orientation: Routing, data storage optimized for low-latency record access • Scale out: Add machines to scale throughput (while keeping latency low) • Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay • Consistency model: Reduce complexity of asynchrony for the application programmer • Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on-demand scale and elasticity
Accessing Data [Figure: a four-step read path; the request “Get key k” is forwarded to one of the storage units (SU), and the reply “Record for key k” travels back to the client]
Bulk Read [Figure: (1) the client sends the key set {k1, k2, … kn} to a scatter/gather server; (2) the server issues Get k1, Get k2, Get k3, … to the storage units (SU)]
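The bulk read slide above describes a scatter/gather pattern, which can be sketched as follows. This is a minimal illustration, assuming three in-memory storage units and a fixed key placement table; the names `storage_units`, `placement`, and `bulk_read` are hypothetical and do not reflect Sherpa's real interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical storage units, each holding some of the records.
storage_units = [
    {"k1": "record 1", "k4": "record 4"},
    {"k2": "record 2"},
    {"k3": "record 3"},
]

# Assumed placement table: which storage unit owns which key.
placement = {"k1": 0, "k2": 1, "k3": 2, "k4": 0}


def bulk_read(keys):
    """Scatter the keys to their storage units, then gather the records."""
    # Scatter: group the requested keys by owning storage unit.
    by_unit = {}
    for key in keys:
        by_unit.setdefault(placement[key], []).append(key)

    def fetch(item):
        unit, unit_keys = item
        return {k: storage_units[unit][k] for k in unit_keys}

    # Contact the storage units in parallel, one task per unit.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(fetch, by_unit.items()))

    # Gather: merge the partial answers, preserving the request order.
    merged = {}
    for part in partials:
        merged.update(part)
    return {k: merged[k] for k in keys}


records = bulk_read(["k1", "k2", "k3"])
```

Grouping keys by storage unit before fanning out means each unit is contacted once per bulk read rather than once per key.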
Range Queries in YDOT • Clustered, ordered retrieval of records • [Figure: the router maps key ranges to storage units: keys before Cantaloupe → storage unit 1; Cantaloupe up to Lime → storage unit 3; Lime up to Strawberry → storage unit 2; Strawberry onward → storage unit 1. The range query “Grapefruit…Pear?” is split into “Grapefruit…Lime?” for storage unit 3 and “Lime…Pear?” for storage unit 2. Storage unit 1 holds Apple, Avocado, Banana, Blueberry, Strawberry, Tomato, Watermelon; storage unit 2 holds Lime, Mango, Orange; storage unit 3 holds Cantaloupe, Grape, Kiwi, Lemon.]
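The router's behaviour in the range query slide above can be reproduced with a sorted list of split points and a binary search. The sketch below uses the fruit boundaries shown on the slide; the half-open range convention and the function names `unit_for` and `split_range` are assumptions made for illustration.

```python
from bisect import bisect_right

# Split points from the slide's router table, in key order.
BOUNDARIES = ["Cantaloupe", "Lime", "Strawberry"]
# UNITS[i] serves the keys between BOUNDARIES[i - 1] and BOUNDARIES[i].
UNITS = [1, 3, 2, 1]


def unit_for(key):
    """Return the storage unit that owns a single key."""
    return UNITS[bisect_right(BOUNDARIES, key)]


def split_range(lo, hi):
    """Split the half-open key range [lo, hi) into per-storage-unit subranges."""
    if lo >= hi:
        return []
    pieces = []
    start = lo
    i = bisect_right(BOUNDARIES, lo)
    # Cut the range at every split point that falls strictly inside it.
    while i < len(BOUNDARIES) and BOUNDARIES[i] < hi:
        pieces.append((UNITS[i], start, BOUNDARIES[i]))
        start = BOUNDARIES[i]
        i += 1
    pieces.append((UNITS[i], start, hi))
    return pieces


plan = split_range("Grapefruit", "Pear")
```

For the slide's query, `plan` contains two pieces: “Grapefruit” up to “Lime” on storage unit 3 and “Lime” up to “Pear” on storage unit 2, matching the split drawn in the figure.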
Updates [Figure: an eight-step update path among the client, routers, storage units (SU), and message brokers; arrow labels: “Write key k”, “Sequence # for key k”, “SUCCESS”]
Consistency Model • Goal: make it easier for applications to reason about updates and cope with asynchrony • What happens to a record with primary key “Brian”? • [Figure: timeline of Generation 1 of the record: record inserted (v. 1), seven updates (v. 2 to v. 8), then delete]
Consistency Model: Read [Figure: version timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked; a plain read may return a stale version or the current one]
Consistency Model: Read up-to-date [Figure: same timeline; the read returns the current version]
Consistency Model: Read ≥ v.6 [Figure: same timeline; the read returns version 6 or a later one]
Consistency Model: Write [Figure: same timeline; the write is applied to the current version]
Consistency Model: Write if = v.7 [Figure: same timeline; the conditional write returns ERROR because the current version is no longer v. 7]
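The five operations shown on the consistency model slides above can be mimicked with a small per-record version history. This is a sketch under stated assumptions: one record, versions numbered from 1, and a replica's staleness modelled as the version number it has caught up to. The class and method names are hypothetical labels for the slide captions, not Sherpa's API.

```python
class VersionedRecord:
    """Single record whose updates form one timeline of versions."""

    def __init__(self, value):
        self.versions = [value]           # versions[i] holds version i + 1

    @property
    def current_version(self):
        return len(self.versions)

    def read_any(self, replica_version):
        """'Read': return whatever the replica has, possibly a stale version."""
        return replica_version, self.versions[replica_version - 1]

    def read_latest(self):
        """'Read up-to-date': always return the current version."""
        return self.current_version, self.versions[-1]

    def read_critical(self, required_version, replica_version):
        """'Read >= v.N': return a version at least as new as required."""
        if replica_version >= required_version:
            return self.read_any(replica_version)
        return self.read_latest()         # replica too stale: go to the source

    def write(self, value):
        """'Write': append a new current version."""
        self.versions.append(value)
        return self.current_version

    def test_and_set_write(self, expected_version, value):
        """'Write if = v.N': succeed only if N is still the current version."""
        if expected_version != self.current_version:
            raise ValueError(
                f"expected v.{expected_version}, current is v.{self.current_version}"
            )
        return self.write(value)


# Build the slide's timeline: the record is inserted (v. 1), then updated to v. 8.
record = VersionedRecord("v1 data")
for n in range(2, 9):
    record.write(f"v{n} data")

try:
    record.test_and_set_write(7, "based on v7")   # current is v. 8, so this fails
    conditional_write_failed = False
except ValueError:
    conditional_write_failed = True
```

The conditional write is what lets an application implement read-modify-write safely without locks: if another writer got in first, the write is rejected and the application can re-read and retry.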
Index Maintenance • How to have lots of interesting indexes without killing performance? • Solution: asynchrony! Indexes are updated asynchronously when the base table is updated • Planned functionality
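Since the slide above marks asynchronous index maintenance as planned functionality, the following is only a sketch of the general idea and not a description of how Sherpa does or will do it. It assumes a single secondary index on a category column and a pending-work queue that a background worker would drain; all names are hypothetical.

```python
from collections import deque


class Table:
    """Base table with one secondary index that is maintained lazily."""

    def __init__(self):
        self.rows = {}              # key -> category (the base table)
        self.category_index = {}    # category -> set of keys (the index)
        self._pending = deque()     # index updates not yet applied

    def write(self, key, category):
        # The base-table write returns at once; the index change is queued.
        old = self.rows.get(key)
        self.rows[key] = category
        self._pending.append((key, old, category))

    def apply_pending(self):
        """Drain the queue; in a real system a background worker would do this."""
        while self._pending:
            key, old, new = self._pending.popleft()
            if old is not None:
                self.category_index.get(old, set()).discard(key)
            self.category_index.setdefault(new, set()).add(key)


table = Table()
table.write("1234323", "transportation")
index_before = dict(table.category_index)   # still empty: the index lags the write
table.apply_pending()
index_after = {c: set(keys) for c, keys in table.category_index.items()}
```

The trade-off is visible in `index_before`: writes stay fast, but an index lookup issued before the queue is drained can miss recently written rows.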
MObStor • Yahoo!’s next-generation globally replicated, virtualized media object storage service • Better provisioning, easy migration, replication, better BCP, and performance • New features (Evergreen URLs, CDN integration, REST API, …) • The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.
The World Has Changed • Web applications need: • Scalability! • Geographic distribution • High availability • Reliable storage • Web applications are a poor fit for: • Complicated queries • Strong transactions
Web Data Management • Structured record storage (PNUTS): CRUD; point lookups and short scans; index-organized table and random I/Os; $ per latency • Large data analysis (Hadoop): scan-oriented workloads; focus on sequential disk I/O; $ per CPU cycle • Blob storage (SAN/NAS): object retrieval and streaming; scalable file storage; $ per GB
Application Design Space [Figure: systems placed along two axes, “Get a few things” versus “Scan everything” and “Records” versus “Files”; systems shown: Sherpa, MObStor, YMDB, MySQL, Oracle, Filer, BigTable, Hadoop, Everest]
Further Reading • Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008. • Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
Keynote 3 • Why Can’t I Find My Data the Way I Find My Dinner? • David Carlson • Director, International Polar Year International Programme Office • Cambridge, UK • ipy.djc@gmail.com
International Polar Year(IPY) • One can find almost every discipline represented in the IPY projects, and funding has come from geophysical, biological and social agencies and programs.