ORBMeeting. July 11, 2001. Outline. SAM Overview and Station description Resource Management Station Cache Station Prioritized Fair Share Job Control File Storage Server Setup and administration Station server File storage server
ORBMeeting July 11, 2001
Outline
• SAM Overview and Station description
• Resource Management
  • Station Cache
  • Station Prioritized Fair Share Job Control
  • File Storage Server
• Setup and administration
  • Station server
  • File storage server
• Current most active stations, viewing statistics, station specialization
Overview of SAM
[Diagram. Components: Name Server, Database Server, Global Resource Manager(s), Log Server, Station 1 Servers, Station 2 Servers, Station 3 Servers, Station n Servers, Mass Storage System(s). Legend labels: Shared Globally, Shared Locally, Local. Arrows indicate control and data flow.]
Components of a SAM Station
[Diagram. Components: Producers/Consumers, Project Managers, Temp Disk, Cache Disk, MSS or Other Station (shown twice), File Storage Server, Station & Cache Manager, File Storage Clients, File Stager(s), eworkers. Arrows distinguish data flow from control.]
Resource Management: Cache
• Level: Station; established for particular groups
• Parameters include:
  • Size of cache
  • Size of cache allowed to be “locked”
  • Refresh algorithm: LRU, FIFO, MRU, etc.
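The cache parameters above (total size, lockable size, refresh algorithm) can be illustrated with a small sketch. This is not the SAM station's implementation: the class `GroupCache`, its method names, and the rule that locked files are never evicted are assumptions made for illustration, and only the LRU policy is shown.

```python
from collections import OrderedDict


class GroupCache:
    """Toy per-group cache with LRU replacement and a lock quota (hypothetical)."""

    def __init__(self, max_size_kb, max_locked_kb):
        self.max_size_kb = max_size_kb        # "size of cache"
        self.max_locked_kb = max_locked_kb    # "size of cache allowed to be locked"
        self.files = OrderedDict()            # name -> size in KB, least recent first
        self.locked = set()

    def used_kb(self):
        return sum(self.files.values())

    def locked_kb(self):
        return sum(self.files[name] for name in self.locked)

    def touch(self, name):
        """Mark a cached file as most recently used."""
        self.files.move_to_end(name)

    def lock(self, name):
        """Pin a cached file if the lock quota allows it; return success."""
        if name in self.locked:
            return True
        if self.locked_kb() + self.files[name] > self.max_locked_kb:
            return False
        self.locked.add(name)
        return True

    def add(self, name, size_kb):
        """Cache a file, evicting least recently used unlocked files if needed.

        Returns the list of evicted names, or None if not enough space
        can be freed (assumption: locked files are never evicted).
        """
        if name in self.files:
            self.touch(name)
            return []
        need = self.used_kb() + size_kb - self.max_size_kb
        victims = []
        for candidate, candidate_size in self.files.items():
            if need <= 0:
                break
            if candidate in self.locked:
                continue
            victims.append(candidate)
            need -= candidate_size
        if need > 0:
            return None
        for victim in victims:
            del self.files[victim]
        self.files[name] = size_kb
        return victims
```

Swapping the eviction order (for example, iterating from the most recent end) would give an MRU variant; FIFO would simply skip the `touch` step.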
Resource Management: Jobs
• Level: Station; by group and access mode
• Number of concurrent projects
• Fair share algorithm
  • Each group is assigned a number. The numbers are normalized so that their sum is 1.
  • Determines each job's queue assignment: sam_hi or sam_lo.
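The normalization step can be sketched in a few lines. The slides do not say how the normalized share is turned into a queue assignment, so `pick_queue` below uses a purely hypothetical rule (a group using less than its share goes to `sam_hi`, otherwise `sam_lo`); the function names and the usage measure are also assumptions.

```python
def normalize_shares(raw_numbers):
    """Scale per-group numbers so that they sum to 1."""
    total = sum(raw_numbers.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive number")
    return {group: value / total for group, value in raw_numbers.items()}


def pick_queue(group, shares, usage):
    """Hypothetical rule, not taken from the slides:
    groups below their fair share get sam_hi, all others sam_lo."""
    total_usage = sum(usage.values())
    used_fraction = usage.get(group, 0) / total_usage if total_usage else 0.0
    return "sam_hi" if used_fraction < shares.get(group, 0.0) else "sam_lo"


shares = normalize_shares({"dzero": 3, "demo": 1})   # illustrative numbers
recent_usage = {"dzero": 10, "demo": 90}             # illustrative usage units
queue_for_dzero = pick_queue("dzero", shares, recent_usage)
queue_for_demo = pick_queue("demo", shares, recent_usage)
```

The nonzero fair share values in the station dump shown in this deck (0.608163, 0.142857, 0.11512, 0.13386) add up to 1, which is consistent with this kind of normalization.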
Station Setup
• Configurable on startup
  • Min-delivery (Kbytes)
  • Preferred locations
  • Honor optimizer order
  • File release timeout
  • Max project file usage
  • Default batch system
• Configurable by command
  • Adding disks
  • Adding caches
  • Configuring group allocations: caches, max size, locks, refresh algorithm, max projects, admins
Station Administration: Dump (1)
lueking@d0mino:~ % sam dump station –groups
*** BEGIN DUMP STATION central-analysis, id=21 running at d0mino 5 days 22 hours 24 minutes 20 seconds, admins: lueking
Known batch systems: lsf
Default batch system: lsf
No Source location is preferred
There are 1 authorized transfer groups
Full delivery unit is enforced; external deliveries are unconstrained
Station Administration: Dump (2)
AUTHORIZED GROUPS:
  group algo: admins: cope lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 5/50, disk: 72838247KB/100000000KB, locks:0B/30000000KB
  group cal: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 1/10, disk: 11856085KB/78125MB, locks:0B/78125MB
  group demo: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.608163, quotas (cur/max): projects = 2/50, disk: 4867877KB/5000000KB, locks:0B/0KB
  group dzero: admins: lueking melanson terekhov veseli white , swap policy: LRU, fair share: 0.142857, quotas (cur/max): projects = 10/100, disk: 499860527KB/500000000KB, locks:0B/100000000KB
  group emid: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0, quotas (cur/max): projects = 0/10, disk: 6396015KB/10000000KB, locks:0B/10000000KB
  group test: admins: lueking terekhov veseli white , swap policy: LRU, fair share: 0.11512, quotas (cur/max): projects = 1/20, disk: 21381359KB/26000000KB, locks:237179KB/20000000KB
  group thumbnail: admins: lueking melanson schellma , swap policy: LRU, fair share: 0.13386, quotas (cur/max): projects = 0/5, disk: 20687259KB/50000000KB, locks:0B/0KB
*** END OF STATION DUMP ***
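Because the dump mixes units (KB, MB, B) in its cur/max quota fields, a small parser helps when comparing groups. The following is a sketch, not a SAM tool: the helper names are hypothetical, the field layout is taken from the dump text above, and binary units (1 MB = 1024 KB) are an assumption.

```python
import re

# Assumption: binary units, so 78125MB corresponds to 80000000KB.
_KB_PER_UNIT = {"B": 1 / 1024, "KB": 1, "MB": 1024, "GB": 1024 * 1024}
_AMOUNT = r"(\d+)(B|KB|MB|GB)"


def to_kb(value, unit):
    """Convert a number with a unit suffix to kilobytes."""
    return int(value) * _KB_PER_UNIT[unit]


def quota_fraction(line, field):
    """Return cur/max for a quota field such as 'disk' or 'locks'."""
    pattern = re.escape(field) + r":\s*" + _AMOUNT + "/" + _AMOUNT
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("no %r quota found in line" % field)
    current = to_kb(match.group(1), match.group(2))
    maximum = to_kb(match.group(3), match.group(4))
    # A zero maximum (e.g. locks:0B/0KB) is reported as 0.0 rather than an error.
    return current / maximum if maximum else 0.0


cal_line = ("group cal: admins: lueking terekhov veseli white , swap policy: LRU, "
            "fair share: 0, quotas (cur/max): projects = 1/10, "
            "disk: 11856085KB/78125MB, locks:0B/78125MB")
cal_disk_used = quota_fraction(cal_line, "disk")
```

For the `cal` group this gives a disk usage of roughly 15% of its quota.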
Resource Management: File Storage
• Cache Routing table
• Retry parameters
• Auto-destination
• File Family
• File Family Width
• Other storage parameters: library manager, storage group, cpio wrapper, permissions
File Storage Server: Setup
• Configurable on startup
  • Default-route
  • Route: used when sending through a remote station
    • Route =enstore,central-analysis:d0mino.fnal.gov:/sam/cache21/nikhef
  • Retrial options
    • --opter-retrial-count= , --opter-retrial-interval=
    • --auth-retrial-count= , --auth-timeout=
    • --stager-retrial-count= , --stager-retrial-interval=
    • --xfer-retrial-count= , --xfer-retrial-interval=
    • --relay-retrial-count= , --relay-retrial-interval=
    • --dbs-retrial-count= , --dbs-retrial-interval=
• Configurable by command
  • Get_encp_priority.py: changes the priority sent to enstore
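Each retrial option pairs a count with an interval. A generic count-and-interval retry loop might look like the sketch below. It does not reproduce the FSS code: the function name is hypothetical, and treating the count as the number of additional attempts after the first failure is an assumption.

```python
import time


def with_retrials(operation, retrial_count, retrial_interval, sleep=time.sleep):
    """Run operation(); on failure, wait and retry up to retrial_count more times.

    retrial_interval is in seconds. The sleep function is injectable so the
    loop can be exercised without real delays.
    """
    retrials_used = 0
    while True:
        try:
            return operation()
        except Exception:
            if retrials_used >= retrial_count:
                raise  # retrials exhausted: surface the last error
            retrials_used += 1
            sleep(retrial_interval)
```

With a count of 3 and an interval of one hour, for example, a failing contact would be attempted four times in total before the error is reported.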
File Storage Server: Dump
lueking@d0mino:~ % sam dump fss
Next Generation FSS at station central-analysis running on d0mino.fnal.gov 1 days 17 hours 53 minutes 15 seconds
No routing (all transfers are direct)
Configuration for operation retrial (count, interval/timeout)
  DBS contact: 3, 1 hours
  Opter contact: 1, 1 hours
  Authorization receipt:1, 1 hours
  Stager contact: 1, 1 hours
  Transfer (retrials upon timeout and upon failure): 3, 6 hours
  Relay (multi-stage routing only): 3, 1 hours
File Storage Server Dump:
Stagers are known at nodes: d0mino.fnal.gov
932 requests submitted, 0 rejected, 931 complete
File Store requests:
reco_mcp06_p08.10.00_prague_pythia_qcd-incl-PtGt80.0_mb-poisson-2.5_179132943_2001:reqID 932) sam d0mino.fnal.gov:/sam/cache17/import/prague -> enstore:/pnfs/sam/m2/copy1/monte_carlo/phase6/mcc99/reco/all
  subm time 08 Jul 12:21:59
  auth req time 08 Jul 12:21:59
  auth time 08 Jul 12:21:59
  stager contacted 08 Jul 12:21:59
Autodestination: Map configuration
destList = [
    {   # map entry number 0:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)',
        'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all',
        'library' : 'sammam',
        'file_family' : 'mc_phase3_gen',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
    {   # map entry number 1:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp05/)([^/]+)(/generated/)([^/]+)',
        'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/gen/all',
        'library' : 'sammam',
        'file_family' : 'mc_phase3_gen',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
    # (and so on)
    {   # map entry number 29:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/)([^/]+)(/digitized/)([^/]+)',
        'destinationPath' : '/pnfs/sam/m2/copy1/monte_carlo/bphysmcp08/mcc99/sim/all',
        'library' : 'samm2',
        'file_family' : 'bphysmcp08',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
]
Currently 30 map entries in production
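To show how such a map could be consulted, here is a minimal sketch that looks up the storage parameters for a file path. The lookup function, the first-match-wins order, and the use of a full-path match are assumptions, not taken from the FSS code; the sample path is made up, and the map is shortened to entry 0 with only some of its keys.

```python
import re

# Shortened copy of map entry 0 from the slide; the real map has 30 entries.
dest_list = [
    {
        'pathPattern': '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)',
        'destinationPath': '/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all',
        'library': 'sammam',
        'file_family': 'mc_phase3_gen',
        'file_family_width': 1,
    },
]


def find_destination(path, entries):
    """Return (index, entry) of the first entry whose pattern matches path.

    Assumption: entries are tried in list order and the whole path must match.
    Returns None when no entry applies.
    """
    for index, entry in enumerate(entries):
        if re.fullmatch(entry['pathPattern'], path):
            return index, entry
    return None


# Hypothetical input path, for illustration only.
sample = '/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/farmA/generated/file1.root'
found = find_destination(sample, dest_list)
```

Here `found` holds index 0 and the entry whose `destinationPath` ends in `phase3/mcc99/gen/all`; a path outside the listed patterns yields `None`.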
Most Active Station List
Station               Description
protofarm             Hiedi's protofarm
imperial-test         Initial test station to get production data to Imperial College
lancs                 Lancaster
ccin2p3-analysis      Lyon comp center for In2P3, France
central-analysis      Main D0 Analysis server
d0_main_analysis      Main D0 analysis server (d02ka)
hoeve                 Nikhef Farm
clued0                Roger Moore
datalogger            Station for d0online
msu                   Station running at Michigan State University
d0-demo-station       d0 demo station
central-compute       d0lxcs cluster station
prague-test-station   first installation at prague
lac-1                 linux analysis cluster station
d0nevis-station       nevis labs/columbia
d0small-01            small linux test station
central-archive       station to archive a second copy of all online data
d0-test-station       test station
d02ka                 test station on d02ka
pctestfarm            to test the fbs/SAM stuff for the farms
Viewing Statistics
• Queries
• Plots
• Enstore summaries
SAM Stats (6/19 – 6/26)
• Users and usage are picking up
• Data sets created: 242
• Projects run: 607
• Files processed: 4586
• Files cached: 2675
• Files stored: 8291
• Data stored: 1.8 TB
• reco_mcp06_p08.10.00_nikhef_pythia_ttbar-incl_mb-poisson-2.5_144170834_2001 was delivered 57 times
Cache stats
Available from the SAM page, under plots and statistics
Enstore stats
Available at www-d0en.fnal.gov/enstore, under “plots”
Decisions, decisions…
• Station deployment and configuration issues
• Station operational tuning
• Disk assignment and cache allocations
• Fair share numbers
• File family, file family widths
• Tape storage resources
• FSS priorities
• Cache routing issues