
ORBMeeting

ORBMeeting. July 11, 2001. Outline: SAM overview and station description; resource management (station cache, station prioritized fair-share job control, file storage server); setup and administration (station server, file storage server).


Presentation Transcript


  1. ORBMeeting July 11, 2001

  2. Outline
     • SAM overview and station description
     • Resource management
        • Station cache
        • Station prioritized fair-share job control
        • File storage server
     • Setup and administration
        • Station server
        • File storage server
     • Current most active stations, viewing statistics, station specialization

  3. Overview of SAM
     [Diagram; arrows indicate control and data flow]
     • Shared globally: Name Server, Database Server, Global Resource Manager(s), Log server
     • Local: Station 1 Servers, Station 2 Servers, Station 3 Servers, ... Station n Servers
     • Shared locally: Mass Storage System(s)

  4. Components of a SAM Station
     [Diagram; legend distinguishes data flow from control]
     • Components shown: Producers/Consumers, Project Managers, Temp Disk, Cache Disk, File Storage Server, Station & Cache Manager, File Storage Clients, File Stager(s), eworkers
     • External endpoints: MSS or Other Station (shown twice)

  5. Resource Management: Cache
     • Level: station; established for particular groups
     • Parameters include:
        • Size of cache
        • Size of cache allowed to be “locked”
        • Refresh algorithm: LRU, FIFO, MRU, etc.
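
The three cache parameters can be illustrated with a small sketch. This is not SAM's implementation: the class `GroupCache` and its method names are hypothetical, and only the LRU policy is shown. It assumes that locked files are never evicted and that the lock limit caps the total size of locked files.

```python
from collections import OrderedDict


class GroupCache:
    """Toy per-group cache: LRU eviction plus a cap on locked kilobytes."""

    def __init__(self, max_size_kb, max_locked_kb):
        self.max_size_kb = max_size_kb
        self.max_locked_kb = max_locked_kb
        self.files = OrderedDict()  # name -> size in KB, least recently used first
        self.locked = set()

    def used_kb(self):
        return sum(self.files.values())

    def locked_kb(self):
        return sum(self.files[name] for name in self.locked)

    def touch(self, name):
        """Mark a cached file as most recently used."""
        self.files.move_to_end(name)

    def lock(self, name):
        """Pin a cached file; refuse if that would exceed the lock limit."""
        if name in self.locked:
            return True
        if self.locked_kb() + self.files[name] > self.max_locked_kb:
            return False
        self.locked.add(name)
        return True

    def add(self, name, size_kb):
        """Cache a file, evicting least recently used unlocked files if needed."""
        if name in self.files:
            self.touch(name)
            return True
        evictable = [n for n in self.files if n not in self.locked]
        freeable = sum(self.files[n] for n in evictable)
        if self.used_kb() - freeable + size_kb > self.max_size_kb:
            return False  # does not fit even after evicting every unlocked file
        while self.used_kb() + size_kb > self.max_size_kb:
            victim = evictable.pop(0)  # oldest unlocked file goes first
            del self.files[victim]
        self.files[name] = size_kb
        return True


cache = GroupCache(max_size_kb=100, max_locked_kb=50)
cache.add("a", 40)
cache.add("b", 40)
cache.touch("a")      # "b" is now the least recently used file
cache.add("c", 40)    # evicts "b" to make room
print(list(cache.files))
```

Swapping the eviction order (for example, popping the newest unlocked file instead of the oldest) would give an MRU-style policy under the same limits.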

  6. Resource Management: Jobs
     • Level: station; by group and access mode
     • Number of concurrent projects
     • Fair share algorithm
        • Each group is assigned a number; the numbers are normalized so that they sum to 1.
        • Determines each job's queue assignment: sam_hi or sam_lo.
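
The normalization step can be sketched as follows. The slide does not say how the normalized share is turned into a queue assignment, so `pick_queue` below is purely an assumed rule for illustration (a group that has used less than its share goes to `sam_hi`); the group numbers and usage figures are made up.

```python
def normalize_shares(raw_shares):
    """Scale the per-group numbers so that they sum to 1."""
    total = sum(raw_shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive number")
    return {group: value / total for group, value in raw_shares.items()}


def pick_queue(group, shares, recent_usage):
    """ASSUMED rule, not from the slides: under-served groups get sam_hi."""
    total_usage = sum(recent_usage.values())
    used_fraction = recent_usage.get(group, 0) / total_usage if total_usage else 0.0
    return "sam_hi" if used_fraction < shares.get(group, 0.0) else "sam_lo"


# Hypothetical inputs: raw fair-share numbers and recent usage per group.
raw = {"demo": 6, "dzero": 3, "test": 1}
usage = {"demo": 10, "dzero": 80, "test": 10}

shares = normalize_shares(raw)
print(shares)
print(pick_queue("demo", shares, usage), pick_queue("dzero", shares, usage))
```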

  7. Station Setup
     • Configurable on startup
        • Min-delivery (Kbytes)
        • Preferred locations
        • Honor optimizer order
        • File release timeout
        • Max project file usage
        • Default batch system
     • Configurable by command
        • Adding disks
        • Adding caches
        • Configuring group allocations: caches, max size, locks, refresh algorithm, max projects, admins

  8. Station Administration: Dump (1)
     lueking@d0mino:~ % sam dump station --groups
     *** BEGIN DUMP STATION central-analysis, id=21 running at d0mino 5 days 22 hours 24 minutes 20 seconds, admins: lueking
     Known batch systems: lsf
     Default batch system: lsf
     No Source location is preferred
     There are 1 authorized transfer groups
     Full delivery unit is enforced; external deliveries are unconstrained

  9. Station Administration: Dump (2)
     AUTHORIZED GROUPS:
     group algo: admins: cope lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 5/50, disk: 72838247KB/100000000KB, locks:0B/30000000KB
     group cal: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 1/10, disk: 11856085KB/78125MB, locks:0B/78125MB
     group demo: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.608163, quotas (cur/max): projects = 2/50, disk: 4867877KB/5000000KB, locks:0B/0KB
     group dzero: admins: lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0.142857, quotas (cur/max): projects = 10/100, disk: 499860527KB/500000000KB, locks:0B/100000000KB
     group emid: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 0/10, disk: 6396015KB/10000000KB, locks:0B/10000000KB
     group test: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.11512, quotas (cur/max): projects = 1/20, disk: 21381359KB/26000000KB, locks:237179KB/20000000KB
     group thumbnail: admins: lueking melanson schellma , swap policy: LRU, fair share: 0.13386, quotas (cur/max): projects = 0/5, disk: 20687259KB/50000000KB, locks:0B/0KB
     *** END OF STATION DUMP ***
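
As a reading aid, the group lines of this dump can be parsed to check that the fair shares are normalized and to see which group's cache is closest to its quota. This is a sketch, not a SAM tool: the regular expression is fitted to the lines shown here only, and the conversion 1 MB = 1024 KB is an assumption.

```python
import re

# Group lines copied from the station dump above.
DUMP = """\
group algo: admins: cope lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 5/50, disk: 72838247KB/100000000KB, locks:0B/30000000KB
group cal: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 1/10, disk: 11856085KB/78125MB, locks:0B/78125MB
group demo: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.608163, quotas (cur/max): projects = 2/50, disk: 4867877KB/5000000KB, locks:0B/0KB
group dzero: admins: lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0.142857, quotas (cur/max): projects = 10/100, disk: 499860527KB/500000000KB, locks:0B/100000000KB
group emid: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 0/10, disk: 6396015KB/10000000KB, locks:0B/10000000KB
group test: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.11512, quotas (cur/max): projects = 1/20, disk: 21381359KB/26000000KB, locks:237179KB/20000000KB
group thumbnail: admins: lueking melanson schellma , swap policy: LRU, fair share: 0.13386, quotas (cur/max): projects = 0/5, disk: 20687259KB/50000000KB, locks:0B/0KB
"""

# Pattern fitted to the dump format shown on the slide (an assumption).
GROUP_RE = re.compile(
    r"group (?P<name>\w+):.*?fair share: (?P<share>[\d.]+),"
    r".*?disk: (?P<cur>\d+)(?P<cur_unit>KB|MB)/(?P<max>\d+)(?P<max_unit>KB|MB)"
)

# Assumption: the dump uses binary units, 1 MB = 1024 KB.
KB_PER_UNIT = {"KB": 1, "MB": 1024}


def parse_groups(text):
    """Return {group: fair share and disk usage/quota in KB} for each group line."""
    groups = {}
    for m in GROUP_RE.finditer(text):
        groups[m["name"]] = {
            "fair_share": float(m["share"]),
            "disk_cur_kb": int(m["cur"]) * KB_PER_UNIT[m["cur_unit"]],
            "disk_max_kb": int(m["max"]) * KB_PER_UNIT[m["max_unit"]],
        }
    return groups


groups = parse_groups(DUMP)
total_share = sum(g["fair_share"] for g in groups.values())
fullest = max(groups, key=lambda n: groups[n]["disk_cur_kb"] / groups[n]["disk_max_kb"])
print(f"fair shares sum to {total_share:.6f}")
print(f"group closest to its disk quota: {fullest}")
```

The four non-zero fair shares (0.608163, 0.142857, 0.11512, 0.13386) add up to 1, as the normalization requires. Under the 1024 assumption, the 78125MB quota of group cal corresponds to a round 80000000 KB.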

  10. Resource Management: File Storage • Cache routing table • Retry parameters • Auto-destination • File family • File family width • Other storage parameters: library manager, storage group, cpio wrapper, permissions

  11. File Storage Server: Setup
      • Configurable on startup
         • Default-route
         • Route: used when sending through a remote station
            • Route =enstore,central-analysis:d0mino.fnal.gov:/sam/cache21/nikhef
         • Retrial options
            • --opter-retrial-count= , --opter-retrial-interval=
            • --auth-retrial-count= , --auth-timeout=
            • --stager-retrial-count= , --stager-retrial-interval=
            • --xfer-retrial-count= , --xfer-retrial-interval=
            • --relay-retrial-count= , --relay-retrial-interval=
            • --dbs-retrial-count= , --dbs-retrial-interval=
      • Configurable by command
         • Get_encp_priority.py: changes the priority sent to enstore
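
Each retrial option pairs a count with an interval. A generic loop of that shape is sketched below; it is not the FSS code. The function `with_retrials` is hypothetical, and it assumes that the count is the number of additional attempts after the first failure, which the slides do not specify.

```python
import time


def with_retrials(operation, retrial_count, retrial_interval_s, sleep=time.sleep):
    """Call operation(); on failure wait and retry, up to retrial_count more times."""
    attempts = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempts >= retrial_count:
                raise  # retrials exhausted: let the last error propagate
            attempts += 1
            sleep(retrial_interval_s)


# Hypothetical transfer that fails twice before succeeding.
calls = []
waits = []


def flaky_transfer():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transfer failed")
    return "stored"


# The sleep function is replaced so that the example does not actually wait.
result = with_retrials(flaky_transfer, retrial_count=3, retrial_interval_s=3600,
                       sleep=waits.append)
print(result, len(calls), waits)
```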

  12. File Storage Server: Dump
      lueking@d0mino:~ % sam dump fss
      Next Generation FSS at station central-analysis running on d0mino.fnal.gov 1 days 17 hours 53 minutes 15 seconds
      No routing (all transfers are direct)
      Configuration for operation retrial (count, interval/timeout)
        DBS contact: 3, 1 hours
        Opter contact: 1, 1 hours
        Authorization receipt:1, 1 hours
        Stager contact: 1, 1 hours
        Transfer (retrials upon timeout and upon failure): 3, 6 hours
        Relay (multi-stage routing only): 3, 1 hours
      File Storage Server Dump:
      Stagers are known at nodes: d0mino.fnal.gov
      932 requests submitted, 0 rejected, 931 complete
      File Store requests:
      reco_mcp06_p08.10.00_prague_pythia_qcd-incl-PtGt80.0_mb-poisson-2.5_179132943_2001:reqID 932) sam d0mino.fnal.gov:/sam/cache17/import/prague -> enstore:/pnfs/sam/m2/copy1/monte_carlo/phase6/mcc99/reco/all
        subm time 08 Jul 12:21:59
        auth req time 08 Jul 12:21:59
        auth time 08 Jul 12:21:59
        stager contacted 08 Jul 12:21:59

  13. Autodestination: Map configuration
      destList = [
          {   # map entry number 0:
              'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)',
              'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all',
              'library' : 'sammam',
              'file_family' : 'mc_phase3_gen',
              'file_family_wrapper' : 'cpio_odc',
              'storage_group' : 'D0',
              'file_family_width' : 1,
              'permissions' : 'rwxr-xr-x',
          },
          {   # map entry number 1:
              'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp05/)([^/]+)(/generated/)([^/]+)',
              'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/gen/all',
              'library' : 'sammam',
              'file_family' : 'mc_phase3_gen',
              'file_family_wrapper' : 'cpio_odc',
              'storage_group' : 'D0',
              'file_family_width' : 1,
              'permissions' : 'rwxr-xr-x',
          },
          (and so on)
          {   # map entry number 29:
              'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/)([^/]+)(/digitized/)([^/]+)',
              'destinationPath' : '/pnfs/sam/m2/copy1/monte_carlo/bphysmcp08/mcc99/sim/all',
              'library' : 'samm2',
              'file_family' : 'bphysmcp08',
              'file_family_wrapper' : 'cpio_odc',
              'storage_group' : 'D0',
              'file_family_width' : 1,
              'permissions' : 'rwxr-xr-x',
          },
      ]
      Currently 30 map entries in production
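
A lookup against such a map might work as sketched below. The slides do not show how SAM evaluates the map, so both the function `find_destination` and the "first entry whose pattern matches the whole path wins" rule are assumptions; the two entries are abbreviated copies of entries 0 and 1, and the input paths are invented.

```python
import re

# Entries 0 and 1 from the map above, reduced to the fields used here.
dest_list = [
    {
        "pathPattern": "(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)",
        "destinationPath": "/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all",
        "library": "sammam",
    },
    {
        "pathPattern": "(/pnfs/sam/mammoth/copy1/monte_carlo/mcp05/)([^/]+)(/generated/)([^/]+)",
        "destinationPath": "/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/gen/all",
        "library": "sammam",
    },
]


def find_destination(path, entries):
    """Return the first entry whose pathPattern matches the whole path, else None.

    ASSUMPTION: first match wins and the pattern must cover the full path.
    """
    for entry in entries:
        if re.fullmatch(entry["pathPattern"], path):
            return entry
    return None


# Hypothetical requested path: site "somesite", file "file001".
requested = "/pnfs/sam/mammoth/copy1/monte_carlo/mcp05/somesite/generated/file001"
entry = find_destination(requested, dest_list)
print(entry["destinationPath"] if entry else "no map entry matches")
```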

  14. Most Active Station List
      Station               Description
      protofarm             Hiedi's protofarm
      imperial-test         Initial test station to get production data to Imperial College
      lancs                 Lancaster
      ccin2p3-analysis      Lyon comp center for In2P3, France
      central-analysis      Main D0 Analysis server
      d0_main_analysis      Main D0 analysis server (d02ka)
      hoeve                 Nikhef Farm
      clued0                Roger Moore
      datalogger            Station for d0online
      msu                   Station running at Michigan State University
      d0-demo-station       d0 demo station
      central-compute       d0lxcs cluster station
      prague-test-station   first installation at prague
      lac-1                 linux analysis cluster station
      d0nevis-station       nevis labs/columbia
      d0small-01            small linux test station
      central-archive       station to archive a second copy of all online data
      d0-test-station       test station
      d02ka                 test station on d02ka
      pctestfarm            to test the fbs/SAM stuff for the farms

  15. Viewing Statistics • Queries • Plots • Enstore summaries

  16. SAM Stats (6/19 – 6/26)
      • The number of users and overall usage are picking up
      • Data sets created: 242
      • Projects run: 607
      • Files processed: 4586
      • Files cached: 2675
      • Files stored: 8291
      • Data volume stored: 1.8 TB
      • reco_mcp06_p08.10.00_nikhef_pythia_ttbar-incl_mb-poisson-2.5_144170834_2001 was delivered 57 times

  17. Cache stats: available from the SAM page, under plots and statistics

  18. Enstore stats: available at www-d0en.fnal.gov/enstore, under “plots”

  19. Data added

  20. Decisions, decisions… • Station deployment and configuration issues • Station operational tuning • Disk assignment and Cache allocations • Fair share numbers • File family, File family widths • Tape storage resources • FSS priorities • Cache routing issues
