1 / 5

User Patterns from SkyServer

User Patterns from SkyServer. 1/f power law of session times and request sizes No discrete classes of users!! Users are willing to learn SQL for advantage Quickly adopted to server-side MyDB environment 1600 active users of MyDB Start with small ad-hoc requests

blythe
Download Presentation

User Patterns from SkyServer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. User Patterns from SkyServer • 1/f power law of session times and request sizes • No discrete classes of users!! • Users are willing to learn SQL for advantage • Quickly adopted to server-side MyDB environment • 1600 active users of MyDB • Start with small ad-hoc requests • Once template finalized, run on whole dataset • Share results with others Through the web services interface • Frequent use of crawlers • Often bucketize requests (checkpointing)

  2. MyDB/CasJobs Nolan Li, JHU • Server side user databases • Can perform joins with main databases • Users can create their own UDFs etc • Every user action is recorded • Can share tables, do simple graphics server side MyDB DR4 DR5 DR6 S4

  3. JHU Petascale Archive • We are building a distributed SQL Server cluster exceeding 1 Petabyte • Just becoming operational • 40x8-core servers with 22TB each, 6x16-core servers with 33TB each, connected with Infiniband • 10Gbit lambda uplink to StarTap • Funded by Moore Foundation, Microsoft and the Pan-STARRS project • Dedicated to eScience,will provide public access

  4. Amdahl’s Laws Gene Amdahl (1965) Laws for a balanced system • Parallelism: max speedup is S/(S+P) • One bit of IO/sec per instruction/sec (BW) • One byte of memory per one instruction/sec (MEM) • One IO per 50,000 instructions (IO) Modern multi-core systems move farther away from Amdahl’s Laws (Bell, Gray and Szalay 2006) For a Blue Gene the BW=0.013, MEM=0.471. For the JHU cluster BW=0.664, MEM=1.099

  5. Components • Data must be heavily partitioned • It must be simple to manage • Distributed SQL Server cluster • Management tools • Configuration tools • Workflow environment for loading/system jobs • Workflow environment for user requests • Provide crawler framework • Both SQL and procedural languages • User workspace environment (MyDB)

More Related