Report from HEPiX 2012: Network, Security and Storage
david.gutierrez@cern.ch
Geneva, November 16th
Network and Security
• Network traffic analysis
• Updates on DC Networks
• IPv6
• Cyber-security updates
• Federated Identity Management for HEP
Network Traffic Analysis (i)
• At IHEP: developed a solution for large-scale network flow data analysis that benefits from the parallel processing power of Hadoop (a sketch of the aggregation step follows below).
• Probes analyze network traffic at the border routers for the HEP experiments and produce flow information: who talks to whom, what protocols, how long, bytes exchanged… Around 20 GB of netflow data per day.
• Data processing, analysis and storage based on Hadoop. Data representation based on RRD and Highstock.
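The talk does not include IHEP's code, but the aggregation step maps naturally onto Hadoop Streaming. A minimal sketch, assuming whitespace-separated flow records of the form "src dst proto bytes" (a hypothetical layout, not necessarily IHEP's format):

    #!/usr/bin/env python
    # Hadoop Streaming sketch: aggregate bytes per (src, dst, proto)
    # from whitespace-separated flow records.
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip malformed records
            src, dst, proto, nbytes = fields[:4]
            print("%s,%s,%s\t%s" % (src, dst, proto, nbytes))

    def reducer():
        # Streaming sorts mapper output by key, so equal keys arrive together.
        current_key, total = None, 0
        for line in sys.stdin:
            key, _, value = line.rstrip("\n").partition("\t")
            if key != current_key:
                if current_key is not None:
                    print("%s\t%d" % (current_key, total))
                current_key, total = key, 0
            total += int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, total))

    if __name__ == "__main__":
        # Run as "flows.py map" for the mapper and "flows.py reduce" for
        # the reducer via the standard hadoop-streaming jar.
        mapper() if sys.argv[1] == "map" else reducer()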
Network Traffic Analysis (ii)
• At CNRS/IN2P3: developed ZNeTS to record all network flow information in a DB and analyze it to detect and report resource misuse (p2p), scans, attacks, etc. (a toy detection heuristic is sketched below). Flow data is used to produce traffic statistics and views that can zoom from global network usage down to host detail.
• Why? French legislation: connectivity providers must keep data that identifies the user for one year.
• ZNeTS is free for French institutions (21 instances inside IN2P3, 50 outside).
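ZNeTS's internals are not described here; purely as an illustration of the kind of flow-based detection it performs, a toy scan heuristic that flags sources contacting many distinct destination ports. The threshold and record format are assumptions:

    from collections import defaultdict

    SCAN_PORT_THRESHOLD = 100  # assumed: distinct dst ports before alerting

    def detect_scans(flows):
        """flows: iterable of (src_ip, dst_ip, dst_port) tuples."""
        targets_per_src = defaultdict(set)
        for src, dst, dport in flows:
            targets_per_src[src].add((dst, dport))
        # A source touching many distinct ports is a likely port scanner.
        return [src for src, targets in targets_per_src.items()
                if len({port for _, port in targets}) > SCAN_PORT_THRESHOLD]

    # Example: detect_scans([("10.0.0.5", "10.0.1.1", p) for p in range(200)])
    # reports "10.0.0.5".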
Network Traffic Analysis (iii)
• At DESY: uses Open Source tools to help system administrators monitor, troubleshoot and detect anomalies.
• nfdump tools to capture and store network flow information; nfsen and nfsight to process, analyze and visualize flow data (a wrapper sketch follows below).
• Netdisco: extracts information from network devices about hosts, their connections to switches, ARP tables, etc., and uses it for troubleshooting and host inventory.
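For a DESY-style workflow, a small wrapper around the nfdump CLI can pull a top-talkers report out of a directory of nfcapd capture files. The flags used (-R read a directory, -s top-N statistic, -n count) are standard nfdump options; the capture path is a placeholder:

    import subprocess

    def top_talkers(capture_dir="/data/nfsen/profiles/live", n=10):
        # Top n source addresses by bytes, computed by nfdump itself.
        out = subprocess.check_output(
            ["nfdump", "-R", capture_dir, "-s", "srcip/bytes", "-n", str(n)],
            universal_newlines=True)
        print(out)

    if __name__ == "__main__":
        top_talkers()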
Updates on DC Networks
• At CERN: changes to the DC network to accommodate the expected growth in bandwidth demand and new services for Wigner and the LHC upgrade.
• Migration from Force10 to Brocade with an upgrade to a 5.2 Tbps switching fabric for the LCG.
• Upgrade of the LCG router interconnects to 100 Gbps (60 ports).
• Firewall capacity doubled: 60 Gbps total, 16 Gbps stateful.
• Géant access upgraded from 20 Gbps to 40 Gbps.
• Network architecture for Wigner.
HEPiX IPv6 Working Group
• HEP Site IPv6 Survey (WLCG and other HEP sites, 42 replies):
• 15 sites IPv6 enabled, providing DNS, web, email, CAs, Windows domain.
• 10 sites plan deployment within the next 12 months.
• Other sites: one proposed a new, simpler architecture for IPv6! “So far, there have been no reported requirements or requests from experiments or collaborations for IPv6.”
• In general, end systems and core networks are ready; applications, tools and sites are not.
• HEPiX IPv6 Testing: IPv6 readiness will be documented and maintained for all “assets” used by sites and the LHC experiments. Results of this summer’s tests cover GridFTP, globus_url_copy, FTS and DPM, OpenAFS, dCache, UberFTP, SLURM, Torque… (a minimal readiness probe is sketched below).
• Future plans: tests on the production infrastructure involving Tier 1 centres; plan HEP IPv6 days.
• Observations: MUCH work to be done, and effort (volunteers) is difficult to find.
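A minimal probe in the spirit of these readiness tests: check whether a service resolves to an AAAA record and accepts a TCP connection over IPv6. The host and port below are placeholders, not the working group's testbed endpoints:

    import socket

    def ipv6_reachable(host, port, timeout=5.0):
        try:
            infos = socket.getaddrinfo(host, port, socket.AF_INET6,
                                       socket.SOCK_STREAM)
        except socket.gaierror:
            return False  # no AAAA record: not IPv6 ready
        for family, socktype, proto, _, addr in infos:
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            try:
                s.connect(addr)
                return True
            except OSError:
                continue  # try the next address, if any
            finally:
                s.close()
        return False

    # e.g. ipv6_reachable("gridftp.example.org", 2811) for a GridFTP endpoint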
IPv6 Updates (i)
• Deployment at IHEP: strong forces driving the IPv6 deployment. In China IPv6 has better available bandwidth and is free. One example: tunneling IPv4 over IPv6 with USTC for HEP traffic.
• Dual-stack campus network and 10 Gbps to CNGI.
• Infrastructure monitoring with Cacti and a patched Nagios.
• Address assignment: DHCPv6 for the DC, SLAAC for users.
• Open Source firewall and IDS ready. Working on traffic analysis and anomaly detection.
• Observations: IPv6 traffic is mostly video/IPTV; data transfers are IPv4. Moving HEP traffic to IPv6 is in the work plan.
IPv6 Updates (ii)
• Deployment at CERN:
• Testing of network devices: completed (2011Q2)
• IPv6 testbed for CERN users: available (2011Q3)
• New LANDB schema: in production (2012Q1)
• Addressing plan in LANDB: in production (2012Q1)
• Provisioning tools: ongoing (today)
• Network configuration: ongoing
• User interface (network.cern.ch): ongoing
• Network services (DNS, DHCPv6, Radius, NTP): ongoing
• User training
• IPv6 service ready for production: 2013Q2
IPv6 Updates (iii)
• Testbed at FZU:
• Monitoring of the dual-stack infrastructure using two Nagios instances: one IPv4-only and one IPv6-only (see the check sketch below).
• Smokeping used to measure GridFTP latency and RTT between FZU and the HEPiX testbed, with similar results for IPv4 and IPv6.
• PXE over IPv6 is not supported by manufacturers.
• Network equipment supports IPv6 hardware switching, but very few devices support management via IPv6.
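A sketch of how such a single-family check might look as a Nagios-style plugin, forcing exactly one address family so the IPv4-only and IPv6-only instances report independently. Exit codes follow the usual Nagios convention (0 = OK, 2 = CRITICAL); host and port arguments are illustrative:

    #!/usr/bin/env python
    import socket
    import sys

    def check(host, port, family):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []  # no address of the requested family
        for fam, stype, proto, _, addr in infos:
            s = socket.socket(fam, stype, proto)
            s.settimeout(10.0)
            try:
                s.connect(addr)
                print("OK - %s reachable on %s" % (host, addr[0]))
                return 0
            except OSError:
                continue
            finally:
                s.close()
        print("CRITICAL - %s unreachable over this address family" % host)
        return 2

    if __name__ == "__main__":
        # usage: check_service.py <host> <port> <4|6>
        fam = socket.AF_INET if sys.argv[3] == "4" else socket.AF_INET6
        sys.exit(check(sys.argv[1], int(sys.argv[2]), fam))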
Cyber-security update (i)
[Cartoon from http://www.bizarrocomics.com]
Cyber-security update (ii)
• Our full dependence on digital services, and the fact that we use interconnected accounts (Apple, Google, Amazon, …), make the security of our data depend on the weakest account.
• Vulnerability market shift: it is more profitable to sell vulnerabilities on the black market than to publish them or sell them to vendors. And you can get an offer from a government.
• Windows, Linux or Mac OS? All of them are more or less equally affected by malware.
• Latest vulnerabilities:
• Java ‘0-day’ in 1.6 and 1.7 on various OSes (CVE-2012-4681, patched). Disable Java in your browser if you don’t need it.
• Internet Explorer 6 to 9 (CVE-2012-4969, patched). Ummm, write your own browser?
Federated Identity Management for HEP
“A framework to provide researchers with unique electronic identities, authenticated in multiple administrative domains and across national boundaries, that can be used together with community-defined attributes to authorize access to digital resources.”
• A collaboration called Federated IdM for Research (FIM4R) was started in 2011. Requirements have been documented and prioritized.
• Plan: establish a pilot as a proof of concept for the architecture design and integration with WLCG.
• The WLCG FIM pilot project started in October 2012, led by Romain Wartel (CERN), to build a service enabling access to WLCG resources using federated credentials issued by home institutes.
Storage
• The Lustre File System
• Cloud Storage and S3
• Tier1 Storage
The Lustre File System (i)
• At IHEP:
• 3 PB Lustre FS for detector raw data, reconstruction data, analysis results, public group data and user personal storage.
• 50 OSSs, 500 OSTs, 8k-core cluster.
• 10 Gbit Ethernet.
• 1k clients, 0.2 billion files.
• At GSI:
• Phasing out the old cluster in favour of a new 1.4 PB system for HPC.
• 50 OSSs, 200 OSTs, cluster of 8.6k cores.
• QDR InfiniBand.
• 500 clients, 110M files.
• Teralink project: outside institutes connect to the storage via IB-to-Ethernet LNET gateways.
The Lustre File System (ii)
Pros:
• HEP jobs follow I/O patterns preferred by Lustre.
• I/O performance and linear scalability with the number of OSSs (see the striping sketch below).
• Stability.
Cons:
• Central metadata server limits scalability.
• Difficult to back up and recover metadata and data.
• Performance suffers with lots of small files.
• Some recurring bugs require upgrades.
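That linear scaling comes from striping files across OSTs. A minimal sketch driving the standard lfs utility from Python: stripe a directory's new files over 4 OSTs with 1 MB stripes. The path and stripe parameters are illustrative, and older Lustre releases spell the stripe-size flag -s rather than -S:

    import subprocess

    # Set the default layout for new files in a directory:
    # -c stripe count (number of OSTs), -S stripe size.
    subprocess.check_call(["lfs", "setstripe", "-c", "4", "-S", "1M",
                           "/lustre/experiment/rawdata"])

    # Inspect the resulting layout.
    print(subprocess.check_output(
        ["lfs", "getstripe", "/lustre/experiment/rawdata"],
        universal_newlines=True))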
Cloud Storage and S3 (i)
• CERN Cloud Storage Evaluation. Points of interest:
• Can we run cloud storage systems to complement or consolidate existing storage services?
• Are the price, performance and scalability comparable to current CERN services?
• The S3 (Simple Storage Service) protocol could be a standard interface for access, placement or federation of data, allowing storage services to be provided without changes to user applications (illustrated below).
• Focus on two S3 implementations at PB scale: OpenStack/Swift and an Openlab collaboration with Huawei.
• Preliminary results:
• Client performance of local S3-based storage solutions looks comparable with current production solutions.
• Achieved the expected stability and aggregate performance (Huawei).
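To illustrate the "standard interface" point: an S3 client only needs an endpoint swap to move between implementations. This sketch uses boto3, which postdates the 2012 evaluation; the endpoint URL, bucket name and credentials are placeholders:

    import boto3

    # The same client code talks to OpenStack/Swift (via its S3 API),
    # the Huawei system, or Amazon itself - only endpoint_url changes.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.example.cern.ch",  # placeholder gateway
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY")

    s3.create_bucket(Bucket="analysis-output")
    s3.put_object(Bucket="analysis-output", Key="run42/histos.root",
                  Body=open("histos.root", "rb"))
    data = s3.get_object(Bucket="analysis-output",
                         Key="run42/histos.root")["Body"].read()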
Cloud Storage and S3 (ii)
• Huawei Massive Storage:
• Nano-scale servers: cell-phone (ARM) processors with one disk each.
• Spread the data to scale performance linearly.
• Map S3 onto a Distributed Hash Table of disk keys stored in the nodes (a toy placement sketch follows below).
• Data is chunked into MB-sized pieces, protected with erasure coding and stored at pseudo-random locations.
• 1 EB design goal. Current status: a 384-node, 0.8 PB system at CERN.
• Mucura: bringing cloud storage to your desk.
• Exploratory project by IN2P3 and IHEP to develop an open-source software system that provides personal storage space in a cloud.
• Interaction with remote files is the same as with local files.
• The system provides significantly more storage than is locally available on your personal computer (starting with a few hundred GB).
• Targeted at the HEP user community, excluding I/O-intensive applications.
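A toy version of key-to-node placement by hashing, in the spirit of the DHT scheme described above. The node count matches the CERN prototype, but the hash function and chunk width are made up for illustration; the real system's placement algorithm is not public here:

    import hashlib

    NODES = ["node%03d" % i for i in range(384)]  # 384 nodes, as at CERN

    def placement(object_key, k=3):
        # Hash the S3 key onto the node ring, then take the next k nodes
        # for the erasure-coded chunks.
        h = int(hashlib.sha1(object_key.encode()).hexdigest(), 16)
        start = h % len(NODES)
        return [NODES[(start + i) % len(NODES)] for i in range(k)]

    # placement("bucket/run42/histos.root") is pseudo-random but
    # deterministic, so any front-end can locate chunks without a
    # central catalogue.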
Next Generation T1 Storage at RAL
• Today CASTOR is used for tape and disk-only storage.
• Evaluating alternatives for disk-only storage, aiming for production for the 2014 data run:
• RAL is the only disk-only use case and depends on CASTOR development; CERN is moving towards EOS.
• Dependence on (expensive) Oracle.
• The nameserver is a SPoF.
• IPv6 is not on the roadmap.
• Based on a long list of MUSTs and SHOULDs, selected for evaluation: dCache, Ceph, HDFS, OrangeFS and Lustre.
• Tests include IOZone, R/W throughput (file/gridFTP/xroot), deletion, draining and fault tolerance (a minimal harness is sketched below).
• Tests are ongoing; so far CASTOR is the most performant in some of the tests and not far off in others (well tuned!).
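A bare-bones timing harness in the spirit of the raw file tests (the real evaluation used IOZone and gridFTP/xroot clients); the path and sizes are illustrative:

    import os
    import time

    def rw_delete_test(path, size_mb=1024, block=1 << 20):
        buf = os.urandom(block)
        t0 = time.time()
        with open(path, "wb") as f:           # sequential write
            for _ in range(size_mb):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
        t1 = time.time()
        with open(path, "rb") as f:           # sequential read (note: the
            while f.read(block):              # page cache inflates this unless
                pass                          # the file is larger than RAM)
        t2 = time.time()
        os.remove(path)                       # deletion cost
        t3 = time.time()
        print("write %.1f MB/s  read %.1f MB/s  delete %.2f s"
              % (size_mb / (t1 - t0), size_mb / (t2 - t1), t3 - t2))

    # rw_delete_test("/mnt/candidate-fs/testfile", size_mb=4096)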