Experiences & directions of SHIFT/CORE at CERN, October 13, 1994
SHIFT
Designed in 1990:
• fast access to large amounts of data
• good tape support
• cheap & easy to expand
• vendor independent
First implementation in operation in January 1991.
SHIFT Model (diagram): CPU servers, disk servers, and tape servers connected by a high-speed network.
CORE
• Centrally operated RISC environment
• Groups CSF, SHIFT, HOPE, PIAF, TAPES, network, racks ...
• CERNVM vs. CORE
1993-1994 Strategy
• Decision to run down the Cray X-MP
• Decision to run down the IBM 9000
• Migrate batch by end of 1994
• Migrate interactive by end of 1995
• Decision to acquire a 64-node SP2
SHIFT in 1992 (diagram): Data Analysis Facility — 4× SGI 340, 2× SGI Crimson, Cray X-MP/48, ~250 GBytes SCSI disk, CHEOPS cassette tapes and robot — on UltraNet and FDDI, linked to the CERN site LAN via a cisco AGS+ IP router. Simulation Facilities on Ethernet — CSF: 25 H-P 9000-720s with 6× STK 4280 drives and an Exabyte jukebox; HOPE: H-P Apollo DN10040, SUN 4/630, 2× SUN 4/330, 10 GBytes SCSI disk, Exabyte.
CORE in 1994 (placeholder: CORE overview picture)
CORE in 1994

Role                         Model               #CPUs  #Systems
CPU servers                  HP 9000-735/755        43     43
                             SGI Challenge          44      4
                             SGI Power Series        4      1
                             DEC 3000/400,500       11     11
                             IBM RS/6000-370        10      3
Disk servers                 SGI Power Series        4      1
                             SGI Crimson             2      2
                             DEC 3000-600            3      3
                             IBM RS/6000-970         1      1
                             SUN SPARC 10            1      1
Tape servers                 IBM RS/6000-370         8      8
                             Sun 4/330               3      3
Monitors, consoles, YP, ...  Sun SPARCstations       8      8
TOTAL                                              143     90
CORE operation issues
• Reliability
• Manpower per machine decreases
• Automation
• Monitoring
• "Standard" tools
• Centralized management
CORE 1994 (placeholder: MTBIs per manufacturer)
CORE 1995
• CORE is the infrastructure: operation, network, racks, cables, ...
• Public services: CSF, PIAF, SP2, CS/2
• Public staging pool
• Tape servers
SHIFT 1995 (placeholder: LEP Z0 + requirements for 95 + NA48/49 + LHC, plus data flows for LEP and LHC)
DISKS 07/94

Vendor    Model     Formatted capacity (MB)  Number  Total size (MB)
IBM OEM   0664CSH          -                    -           -
IBM OEM   0664M1H        1920                  12        23040
IBM OEM   0664N1D        1920                 228       437760
SGI       0664N1D        1920                   1         1920
MICROP    2112           1001                   1         1001
HP        C2247          1001                  10        10010
DEC       DSP5350        3406                  31       105586
DEC       RZ26-VA        1075                  10        10750
DEC       RZ28-VA        2150                   6        12900
DEC       RZ74           3406                  63       214578
SEAGATE   ST12400        2048                  52       106496
SEAGATE   ST2383N         317                   2          634
SEAGATE   ST41200         990                  44        43560
SEAGATE   ST41600        1331                   1         1331
SEAGATE   ST41650        1351                 162       218862
SEAGATE   ST41651        1350                   6         8100
SEAGATE   ST42100        1812                  12        21744
SEAGATE   ST43400        2778                  61       169458
SEAGATE   ST4766N         669                   1          669
Total                                          704      1392239
TAPES 1995
• NTP robots
• 3494 & central data recording
• DLT robot
• Any "reasonable" media support
Network
• UltraNet still the most used
• FDDI used more and more
• 1995: HIPPI, FCS ...
• Spectrum
UltraNet bandwidth, Oct 92 (chart: MB/s per day, days 293-301)
Network (placeholder: Jacek's foils)
UltraNet Software (layered stack, diagram)
• Software: user application → socket compatibility library → NFS / sockets → UltraNet driver → UDP/TCP (transport) → IP (network) → data-link driver
• Hardware: Ethernet / FDDI data links; UltraNet physical link with hardware-assisted protocol engine
SHIFT Software
• Unix tape subsystem: multi-user, labels, multi-file, operation
• Fast remote file access system
• Remote tape copy system
• Disk pool manager
• Tape stager
• Clustered NQS batch system
• Integration with standard I/O packages: FATMEN, Zebra, EPIO
• Network operation
• Monitoring
Remote Tape Copy System
• selects a suitable tape server
• initiates the tape-disk copy

  tpread -v I29127 -g SMCF -q 4,6 pathname
  tpread -v CUT322 `sfget -p opaldst filename`
RFIO
• High performance, reliability (improve on NFS)
• C I/O compatibility, Fortran subroutine interface
• rfio daemon started by open on remote machine
• optimized for specific networks
• asynchronous operation (read ahead)
• optional vector preseek
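The "asynchronous operation (read ahead)" idea can be illustrated with a minimal double-buffered sketch: a helper thread prefetches the next block while the caller consumes the current one, so transfer latency overlaps with processing. This is not the real RFIO code or API — the function names, block size, and threading scheme here are assumptions for illustration only.

```c
/* Hypothetical double-buffered read-ahead sketch (NOT the RFIO
 * implementation): overlap fetching block N+1 with consuming block N. */
#include <pthread.h>
#include <stdio.h>

#define BLK 4096

static char   buf[2][BLK];   /* two buffers: one consumed, one prefetched */
static size_t len[2];
static FILE  *src;

/* Prefetch thread: read the next block into the given buffer slot. */
static void *prefetch(void *arg)
{
    int slot = *(int *)arg;
    len[slot] = fread(buf[slot], 1, BLK, src);
    return NULL;
}

/* Copy `in` to `out` with read-ahead; returns bytes copied. */
size_t readahead_copy(FILE *in, FILE *out)
{
    pthread_t t;
    int cur = 0, nxt = 1, slots[2] = {0, 1};
    size_t total = 0;

    src = in;
    len[cur] = fread(buf[cur], 1, BLK, src);   /* prime the first block */
    while (len[cur] > 0) {
        pthread_create(&t, NULL, prefetch, &slots[nxt]); /* fetch ahead */
        fwrite(buf[cur], 1, len[cur], out);              /* consume     */
        total += len[cur];
        pthread_join(&t, NULL);
        cur ^= 1; nxt ^= 1;                              /* swap buffers */
    }
    return total;
}
```

With a real network file as the source, the fwrite (or analysis) of block N proceeds while block N+1 is in flight, which is the point of RFIO's read-ahead.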
AFS (placeholder: what has changed in the software since 1992)
Stager
Up to now: dpm + tpread
• lack of robustness in the face of system errors
• concurrency not handled
• staging space not handled
Stager architecture (diagram: stgcat catalogue coordinating CPU servers, tape servers, and disk servers)
Stager functions
• process stage requests
• manage space: by default, purge the oldest file when space is needed
    if (size < 1024) w = max(atime, mtime)
    else             w = max(atime, mtime) - 86400 * log(size/1024)
• a different algorithm may be provided
Stager commands
• stagein
• stageout/stageput
• stagewrt
• stageclr
• stageqry
Stager future extensions
• staging of disk files
• transfer of files already staged on CERNVM
• access control lists
• prioritization of requests
• tape-to-tape copy
• automatic file migration to robots
SHIFT SW Futures
• SHIFT library in CERNLIB
• RFIO review for high-speed protocols
• RFIO wide area
• RFIO "gateway" for SP/2
• RFIO per-platform tuning
• VMS final port
• RFIO checkpoint/restart
• Ports to Irix 6, AIX 4, HP-UX 10