Container Practices at IHEP
Wei Zheng
Computer Center, IHEP, CAS
2019-4-3

Contents
• IHEP Introduction
• Container Practices
• Container Orchestration with Kubernetes
• Container Security
• Next Work
• Summary
ISGC 2019

Introduction to IHEP Computing Platform
• IHEP: Institute of High Energy Physics, Chinese Academy of Sciences, the largest fundamental research center in China
• IHEP Computing Center: provides network and computing services to HEP experiments
• Experiments served: BEPCII/BESIII, CEPC, LHAASO, CSNS, JUNO, DYB

IHEP Local Cluster
• Computing
  • HTC computing
    • ~10,000 CPU cores
    • Job slot utilization: >85%, 11.7 million jobs (2018.12-2019.3)
  • HPC computing
    • 125 worker nodes: 2,808 CPU cores
    • 10 GPU worker nodes, 0.5 PFLOPS
    • 80 GPU cards: NVIDIA Tesla V100 NVLink, 32 GB

IHEP Local Cluster
• Login nodes
  • 30+ login nodes shared by all users
  • More than 200 active users
• Storage
  • Lustre: 10 instances, 10 PB in total, 67% used
  • EOS: 2 instances, 3.5 PB capacity

Remote site clusters
• Remote sites of BESIII distributed computing
• Small-scale clusters
• IHEP staff are responsible for unified operation and maintenance

Motivation
• More software and services are released in containerized versions
• Preserve the runtime environments of legacy physics software
• Improve resource utilization by scheduling container jobs to run on remote sites or in the cloud
• Automate the deployment and scaling of services through container orchestration
• …

Container performance - disk I/O
• Disk performance of the Lustre junofs file system inside a container
• IOzone benchmark: container I/O performance loss is less than 2%

Container performance - job running
• JUNO data processing on bare metal, in a VM and in Docker
• Compared with VMs, Docker containers utilize resources better: Docker performance loss (PL) is 0.5%~2%, VM PL is 3%~5%

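The performance loss figures above can be read as a relative slowdown against bare metal. The slides do not state the formula, so the sketch below assumes PL = (t_env - t_bare) / t_bare on wall-clock times. The function name and the sample timings are hypothetical and were chosen only to fall inside the reported ranges.

```python
def performance_loss(t_bare: float, t_env: float) -> float:
    """Relative slowdown of an environment versus bare metal, in percent.

    Assumed definition (not given in the slides): (t_env - t_bare) / t_bare.
    """
    if t_bare <= 0:
        raise ValueError("bare-metal time must be positive")
    return (t_env - t_bare) / t_bare * 100.0


# Hypothetical wall-clock times in seconds for the same job.
bare_metal = 1000.0
docker = 1012.0
vm = 1040.0

print(f"Docker PL: {performance_loss(bare_metal, docker):.1f}%")
print(f"VM PL:     {performance_loss(bare_metal, vm):.1f}%")
```

Averaging several runs per environment before computing PL would make the comparison less sensitive to noise on shared storage.
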
Container image creation/build
• Docker images
  • Create an image with the mkimage.sh script
  • Build an image from a custom Dockerfile
  • Download from Docker Hub/Docker Cloud
  • Use the officially provided image
• Singularity images
  • Download from Singularity Hub (shub) or Docker Hub
  • Build from custom Singularity recipe files
  • Every image is built in both a writable and a read-only version

Container image creation/build

Image script:
    [root@bws0780 ~]# bash mkimage.sh -p yum -g "Base" SL75

Dockerfile:
    FROM slc65-base
    MAINTAINER zhengwei
    RUN yum install -y make gcc-c++ gcc binutils \
     && yum install -y libX11-devel libXpm-devel libXft-devel libXext-devel \
     && yum install -y mesa-libGL-devel ftgl-devel mysql-devel \
     && yum install -y fftw-devel graphviz-devel \
     && yum install -y avahi-compat-libdns_sd-devel python-devel \
     && yum install -y libxml2-devel gsl-static gsl-devel \
     && yum install -y qt-devel && yum clean all
    CMD /bin/bash

Singularity recipe file:
    Singularity> cat Singularity-SL55Base
    BootStrap: yum
    OSVersion: 5.5
    MirrorURL: http://mirror.ihep.ac.cn/slc/slc55/x86_64/SL
    UpdateURL: http://mirror.ihep.ac.cn/slc/slc55/x86_64/updates/RPMS
    Include: yum
    %setup…………

Container image types
• Operating system
  • SL7.X, SL6.X, SL5.5
• Login/worker node
  • LoginNode: SL75/65/69/55
  • WorkerNode: SL65/69/55/58
• Physics software
  • BES, JUNO, LHAASO, …
• Services
  • MySQL, MonitorAgent, Apache, Grafana, …

Container image storage
• Docker images
  • IHEP private Docker registry
  • AFS/CVMFS
• Singularity images
  • AFS/CVMFS

    [root@mirror SL55]# ll -th
    -rwxr-xr-x 1 root root 8.2G Feb 28 15:30 WorkNode55-writable-20190227.img
    -rwxr-xr-x 1 root root 2.0G Feb 27 09:39 WorkNode55-onlyread-20190227.img

Hep_container tool
• Hep_container
  • A container tool developed for users of the IHEP computing platform
  • Based on Singularity; Docker support is planned
  • Satisfies users' various container requirements
  • The location and type of container images are transparent to the user
  • Container images are easy to update
  • Directories are mounted automatically according to the user's group name
    • Lustre/EOS/AFS/CVMFS
    • Besfs/AFS/CVMFS/…
  • Supports the IHEP and PKU sites

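The group-based automatic mounting described above could look roughly like the following. This is a minimal sketch, not the Hep_container implementation: the group names, the mount paths, the image path and the function name are all hypothetical. Only `singularity exec` and its `--bind src:dest` option are taken from the Singularity command line.

```python
import shlex
from typing import List

# Hypothetical mapping from a user's group to the file systems to bind.
# Group names and paths are illustrative assumptions, not IHEP's real layout.
GROUP_MOUNTS = {
    "juno": ["/junofs", "/afs", "/cvmfs"],
    "bes": ["/besfs", "/afs", "/cvmfs"],
}
DEFAULT_MOUNTS = ["/afs", "/cvmfs"]


def build_command(group: str, image: str, command: List[str]) -> List[str]:
    """Build a `singularity exec` argv that binds the group's directories."""
    argv = ["singularity", "exec"]
    for path in GROUP_MOUNTS.get(group, DEFAULT_MOUNTS):
        # --bind src:dest makes a host path visible at dest inside the container.
        argv += ["--bind", f"{path}:{path}"]
    return argv + [image] + command


# Hypothetical image path; the real tool hides image locations from the user.
cmd = build_command("juno", "/cvmfs/example/WorkNode65.img", ["ls", "/junofs"])
print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps user-supplied commands from being reinterpreted by a shell when the list is passed to `subprocess.run`.
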
Container with job scheduler
• HTCondor Docker universe
  • A docker universe job runs a Docker container from an image
  • HTCondor starts the container on an execute host
  • The running container can then be managed like any other HTCondor job

    universe = docker
    docker_image = Juno-worknode65
    executable = /bin/cat
    arguments = /etc/hosts
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    output = out.$(Process)
    error = err.$(Process)
    log = log.$(Process)
    request_memory = 100M
    queue 1

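When many jobs differ only in image or command, a submit description like the one above can be generated instead of written by hand. The sketch below is an assumption about how that might be done; the helper name `docker_submit` is hypothetical, and it only reproduces the submit commands shown on the slide.

```python
def docker_submit(image: str, executable: str, arguments: str, jobs: int = 1) -> str:
    """Return the text of an HTCondor docker universe submit description."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    lines = [
        "universe = docker",
        f"docker_image = {image}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # $(Process) is expanded by HTCondor to the job's index in the cluster.
        "output = out.$(Process)",
        "error = err.$(Process)",
        "log = log.$(Process)",
        "request_memory = 100M",
        f"queue {jobs}",
    ]
    return "\n".join(lines) + "\n"


text = docker_submit("Juno-worknode65", "/bin/cat", "/etc/hosts")
print(text, end="")
```

The returned text can be written to a file and passed to `condor_submit`.
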
Building a container login farm with Kubernetes
• LHAASO experiment computing login farm
  • Login nodes are managed by OpenStack + Kubernetes
  • Kubernetes starts SL7 login-node containers on the OpenStack VM platform
  • Automatic dynamic scaling
  • kube-proxy provides load balancing
  • The clustered login farm is more stable and highly available

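The slides do not show the manifests behind the login farm. As a rough illustration only, a set of replicated login containers behind a load-balanced SSH port could be described with a Deployment and a Service. In the sketch below the resource names, the image reference, the replica count and the use of the `kube-login` namespace for these objects are all assumptions. The manifests are emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def login_deployment(name: str, image: str, replicas: int, namespace: str) -> dict:
    """Deployment that keeps `replicas` login containers running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": 22}]}
                    ]
                },
            },
        },
    }


def login_service(name: str, namespace: str) -> dict:
    """NodePort Service; kube-proxy spreads SSH connections over the pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "NodePort",
            "selector": {"app": name},
            "ports": [{"port": 22, "targetPort": 22}],
        },
    }


# Hypothetical name and image reference.
deployment = login_deployment("sl7-login", "registry.example/sl75-login:latest", 3, "kube-login")
service = login_service("sl7-login", "kube-login")
manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
print(json.dumps(manifest, indent=2))
```

Scaling the farm then amounts to changing `replicas`, either by reapplying the manifest or with `kubectl scale`.
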
Login farm of LHAASO Kubernetes
• OpenStack Queens (RDO)
• Docker
  • v18.06
• Kubernetes nodes
  • 1 master and 1 HA
  • 3 worker nodes
  • v1.12.0
• Host OS
  • CentOS 7.6
• VM instance OS
  • Scientific Linux release 7.6
• Container OS
  • Scientific Linux release 7.5

Dashboard of LHAASO Kubernetes

Container security practice
• Image security
  • Build our own OS base images
  • Pull official images that are Docker Certified and come from certified publishers
  • Offer read-only Singularity images to users
  • Scan images to detect and block containers with known vulnerabilities or malicious packages
• Host security
  • Run containers as non-root users
  • Least privilege
  • Use --privileged=true or --cap-add only when needed

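The host-security points above map onto a few `docker run` options. The following sketch assumes one possible policy: run as a non-root UID/GID, drop all capabilities, and add back only those explicitly requested. The helper name, image name and IDs are hypothetical, while `--user`, `--cap-drop` and `--cap-add` are existing `docker run` flags.

```python
from typing import List, Sequence


def docker_run_argv(image: str, command: Sequence[str], uid: int, gid: int,
                    extra_caps: Sequence[str] = ()) -> List[str]:
    """Build a least-privilege `docker run` argv (never uses --privileged)."""
    if uid == 0:
        raise ValueError("refusing to run the container as root")
    argv = [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",   # run the process as a non-root user
        "--cap-drop", "ALL",        # start with no Linux capabilities
    ]
    for cap in extra_caps:
        # Add back only the capabilities this job really needs.
        argv += ["--cap-add", cap]
    return argv + [image] + list(command)


# Hypothetical image name and user IDs.
argv = docker_run_argv("sl65-worknode", ["/bin/cat", "/etc/hosts"], uid=1000, gid=1000)
print(" ".join(argv))
```

A job that needs one extra capability would pass it explicitly, for example `extra_caps=["NET_BIND_SERVICE"]`, instead of falling back to `--privileged`.
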
Container security practice
• Kubernetes security
  • Use namespaces to establish security boundaries
  • Updated Kubernetes from 1.11.0 to the stable 1.12.0 because of the privilege escalation vulnerability CVE-2018-1002105
  • Updated Docker from v17.03 to v18.06 to address CVE-2019-5736

    [root@lhmtk8s01 ~]# kubectl get namespace
    NAME            STATUS   AGE
    default         Active   102d
    ingress-nginx   Active   7d23h
    kube-login      Active   90d
    kube-public     Active   102d
    kube-system     Active   102d

Next Work
• The cluster OS will be upgraded from SL6 to SL7 this year; SL6 and SL5 jobs will then run only in containers
• BESIII software containers are being built and will be used on Tianhe-2, a fast national supercomputer in Guangzhou
• Automatically schedule jobs to remote sites or the cloud to use their idle resources
• More remote sites, such as USTC, SDU and BUAA, will join
• A Jupyter notebook platform will be offered

Summary
• Test comparisons show that containers incur very little performance loss and perform better than VMs
• Various customized container images are supported for IHEP experiments
• The Hep_container tool gives users a unified container portal that meets their customization needs and the needs of multiple sites
• LHAASO login nodes have been containerized with Kubernetes, with simple load balancing and scaling
• More jobs will run in containers

Thanks for your attention!