Introduction to PSC Computing Systems
Alex Ropelewski
ropelews@psc.edu
Pittsburgh Supercomputing Center
National Resource for Biomedical Supercomputing
These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh Supercomputing Center

Dedicated Biomedical Use Computers
• Salk
  • SGI Altix 4700 shared-memory NUMA system
  • Comprises 36 blades. Each blade holds 2 Itanium2 Montvale 9130M dual-core processors (144 cores total).
  • The four cores on each blade share 8 Gbytes of local memory (288 Gbytes of shared memory total)
• Axon
  • Intel Xeon 2.5 GHz quad-core processors
  • 32 compute nodes (256 cores total)
  • 8 GB/node (256 GB+ total)
  • InfiniBand interconnect
  • 2 front ends
  • 40 TB storage system

Dedicated Biomedical Use Computers
• Opteron Cluster
  • Twenty-node, dual-processor 1.4 GHz AMD Opteron cluster
  • 4 GB of memory/node
  • Quadrics interconnect

Computers Available for General Use (Including Biomedical)
• bigben, a Cray XT3 MPP machine with 4136 processors
• pople, an SGI Altix SMP machine; 786 processors, 1.5 Tbytes of memory
• ben, an HP Alphaserver cluster comprising 64 4-processor, 4-Gbyte compute nodes
• Front end machines running Linux and VMS
• A file archiver

Consulting
• All active PSC users have access to PSC consulting resources:
  • 800-221-1641
    • Phones are staffed Monday - Friday, 9 a.m. to 8 p.m. and Saturday, 9 a.m. to 4 p.m. (EST).
    • For best service, call for critical problems.
  • remarks@psc.edu
• There is also documentation available at www.psc.edu

General Policies
• The PSC has policies on computing-related topics such as:
  • Passwords
  • File retention after grant expiration
  • Email addresses
• To review these policies please see:
  • http://www.psc.edu/general/policies/policyoverview.html

Passwords
• Computer security depends heavily on keeping passwords secret
• Most machines use a common Kerberos password:
  • It must be at least 6 characters long.
  • Passwords longer than 8 characters can prevent you from logging in to certain machines.

Selecting Secure Passwords
• Do NOT
  • simply add numbers to words that can be found in a dictionary, such as "helper01", "amoeba1", "1license"
  • simply substitute "1" for "L" or "0" for "o" or "1" for "I" in common words to get passwords like "he1per" or "am0eba" or "11cense"
• Creating good passwords:
  • use the first letter of each word in an uncommon sentence or phrase that you can easily remember:
    • I married Sandie on July 2nd in Greentree (ImSoJ2iG)
    • My 4th grade teacher was Sister Cyrilla (M4gtwSC)

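The first-letter technique can be sketched in a POSIX shell with awk. This is only an illustration of the idea, using the example sentence from the slide; the variable names are hypothetical, and a real password should not be typed on a command line or stored in a script.

```shell
# Build an acronym from the first character of each word in a phrase.
# "phrase" and "acronym" are illustrative names, not part of any PSC tool.
phrase="I married Sandie on July 2nd in Greentree"
acronym=$(printf '%s\n' "$phrase" |
  awk '{ for (i = 1; i <= NF; i++) out = out substr($i, 1, 1); print out }')
echo "$acronym"    # prints ImSoJ2iG
```

Words that begin with a digit, such as "2nd", contribute that digit, which is how the slide's example ends up mixing letters and numbers.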
Connecting and Transferring Files
• Connect to the PSC machines using ssh
  • http://www.psc.edu/general/net/ssh/ssh.html
• Transfer files between PSC and your home institution using kftp, scp or sftp
• PuTTY: free ssh client for Windows
  • http://www.chiark.greenend.org.uk/~sgtatham/putty/
• WinSCP: free scp/sftp client for Windows
  • http://winscp.net/eng/index.php

Opteron Cluster
• Contains bioinformatics software and databases
• To log into the cluster, ssh to:
  • bioinformatics.psc.edu
  • codon.psc.edu
• The cluster uses a UNIX operating system
• SLURM is used to run serial and parallel programs on the cluster's nodes

UNIX Operating System
• What is UNIX:
  • Command-line oriented operating system
  • Available on almost any computing platform
  • Flavors of UNIX include:
    • HP-UX, CentOS, Ubuntu, Red Hat, BSD, Solaris, many others
  • Low overhead
  • Lots of tools and programming languages
• Why Learn UNIX:
  • Many bioinformatics tools only run on UNIX
  • Many bioinformatics jobs require UNIX skills

Bioinformatics Analyst Job Posting
Responsibilities:
• The Bioinformatics Analyst will process sequence data and apply quality control measures for generating high quality raw sequence and assembled data from next generation sequencing technologies.
• Will perform whole genome alignments using existing alignment tools, including BLAST, mummer and patternhunter
• Perform mapping and post-mapping analysis with short reads using third-party and internally developed tools.
• Responsible for receiving, processing and managing sequence data.
• Evaluate new methodologies and tools and improve data processing and quality control protocols.
• Develop suitable metrics for reporting the completeness and quality of the sequence delivered to the customers.
Requirements:
• B.S. in biology, computer science, bioinformatics or related field, or equivalent combination of education and experience
• A minimum of 2 years experience in genomics and bioinformatics-related work.
• Proficiency in Unix and experience in one or more of these programming languages (Perl, SQL, Jython and Java) is required.
• Familiar with the use of commonly-used sequence analysis tools and genomic databases
• Willing to multi-task and respond to new challenges as required.
• Excellent communication skills.
• Hands-on experience in a research or production environment
http://jobview.monster.com/getjob.aspx?JobID=78527133&JobTitle=Bioinformatics+Analyst&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-09+12%3a56%3a00&pg=1&seq=11&fseo=1&isjs=1&re=1000

Bioinformatics Job Posting
Position Requirements:
· Proven critical-thinking skills and demonstrated ability to manage and interpret large biological data sets
· Demonstrated knowledge of genetics and/or molecular biology (e.g., successful college-level coursework)
· Proficiency in computer-based DNA and protein sequence analysis using public biological databases
· Experience using the Macintosh OS X and Windows operating system
· Working knowledge of the Linux or Unix operating system
· Basic knowledge of relational database structure and terminology
· Proficiency with Microsoft Excel or other spreadsheet applications
· Strong interpersonal and communication skills with a confident and cooperative service-oriented attitude
· Fluency in both written and spoken English
Highly qualified candidates will possess:
· Demonstrated experience in analyzing gene expression (microarray) data
· Experience in analyzing data from whole-genome association studies (e.g., SNP data)
· Laboratory research experience in genetics, molecular biology, or biochemistry
· The ability to be self-motivated and work independently, with minimal supervision
Education Requirements:
· Bachelor’s or Master’s Degree in biology, bioinformatics, or a related field
http://jobview.monster.com/getjob.aspx?JobID=78680407&JobTitle=Bioinformatics+Data+Analyst+at+NIH+%2f+NHGRI&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-15+14%3a31%3a00&pg=1&seq=2&fseo=1&isjs=1&re=1000

UNIX
• To use UNIX for sequence analysis, one needs to become familiar with three basic areas:
  • General information on UNIX
  • UNIX commands and syntax
  • A text editor (such as vi, emacs, pico)
• This talk presents the minimum that one needs to know in those areas

General Information
• Commands are interpreted by programs called “shells”:
  • sh, csh, ksh, tcsh
• Shells can have different built-in commands and different command syntax
• Core UNIX commands work the same regardless of shell
• Commands are case sensitive
• General command syntax is: command -options parameters
• Commands can be placed in special startup files, such as .login, .cshrc and .profile, which the shell executes automatically (for example, at login or when a new shell starts)

UNIX File and Directory Structure
• Hierarchical (absolute)
• No special filename format
• Filenames are case sensitive
• A single dot . refers to the current directory
• Double dots .. refer to the parent directory
• $HOME refers to the login directory

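A minimal sketch of how "." and ".." resolve, written for a POSIX shell (sh or bash) rather than csh. The scratch directory from mktemp and the names project and data are assumptions made only for this illustration.

```shell
# Work in a throwaway directory so nothing in $HOME is touched.
top=$(mktemp -d)
mkdir -p "$top/project/data"
cd "$top/project/data"

here=$(cd . && pwd)      # "." is the current directory
parent=$(cd .. && pwd)   # ".." is the parent directory
echo "current: $here"
echo "parent:  $parent"
echo "login:   $HOME"    # $HOME holds the path of the login directory
```

The subshells in $( ... ) change directory only for the duration of the command, so the working directory is still project/data afterwards.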
Special Characters
• Wildcard characters: * ? [letters]
• Home/user directory: ~ ~user
• IO redirection: < (stdin), > (stdout), >& (stdout and stderr)
• Append output to a file: >>
• Place job in background: &
• Redirect output from a command as input into another command (pipe): |
• Suspend a job: [control] z
• Interrupt a running command: [control] c

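The wildcard, redirection and pipe characters can be tried safely in a scratch directory. The sketch below assumes a POSIX shell (sh or bash); the file names are hypothetical. Note that the csh form ">&" for combining stdout and stderr is written "> file 2>&1" in sh.

```shell
# Scratch directory with a few small files to experiment on.
work=$(mktemp -d)
cd "$work"
printf 'ACGT\n' > seq1.fa         # ">" sends stdout to a file (overwrites)
printf 'TTGA\n' > seq2.fa
printf 'notes\n' > readme.txt

fasta_files=$(echo *.fa)          # "*" matches any run of characters
numbered=$(echo seq?.fa)          # "?" matches exactly one character
first_only=$(echo seq[1].fa)      # "[...]" matches one character from the set

cat seq1.fa >> all.fa             # ">>" appends instead of overwriting
cat seq2.fa >> all.fa
line_count=$(wc -l < all.fa | tr -d ' ')   # "<" reads stdin from a file
fa_count=$(ls | grep -c '\.fa$')  # "|" pipes one command's output into the next

# stdout and stderr together; ls fails here on purpose to produce an error message
ls missing.file > out.log 2>&1 || true

echo "$fasta_files / $numbered / $first_only / $line_count / $fa_count"
```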
Basic UNIX Commands
• kpasswd (passwd) - Change your password
• ls - List files in a directory
• more - Display contents of a file
• cp - Duplicate files
• rm, rmdir - Remove a file or directory
• mkdir - Create a directory
• cd - Change directory
• pwd - Show directory
• man - Find Unix command usage information

Basic UNIX commands - kpasswd
• kpasswd (passwd) - Change Kerberos password

    % kpasswd
    ropelews@PSC.EDU's Password:
    New password:
    Verifying password - New password:
    %

Basic UNIX commands - ls
• ls - List files in a directory
  • -l Long format
  • -a Show hidden files
  • -F Tag files with "/", "*", or "@"

    % ls
    a.doc a.cpr a.out FILE

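A short sketch of what the three ls options change, assuming a POSIX shell and a scratch directory; the file and directory names are made up for the example.

```shell
work=$(mktemp -d)
cd "$work"
touch a.doc .hidden          # a regular file and a hidden "dot" file
mkdir sub                    # a directory

plain=$(ls)                  # dot files are normally left out
with_hidden=$(ls -a)         # -a also lists ".", ".." and dot files
tagged=$(ls -F)              # -F appends "/" to directory names
ls -l                        # -l shows permissions, owner, size and date

echo "$with_hidden"
echo "$tagged"
```

Options can be combined, for example ls -alF.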
Basic UNIX commands - more
• more - View contents of a file one page at a time

    % more file.f
    program intro
    integer I, J, K
    real rr,vv,cc
    parameter (I = 5)
    :
    :

Basic UNIX commands - cp
• cp - Duplicate files

    % ls
    a.dat x.dat
    % cp x.dat xcopy.dat
    % ls
    a.dat x.dat xcopy.dat

Basic UNIX commands - rm
• rm, rmdir - Remove a file or a directory
  • -i Ask for confirmation before removing
  • -r Remove recursively

    % ls
    x.dat xcopy.dat z.file
    % rm *.dat
    % ls
    z.file

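The -i and -r options can be tried in a scratch directory, as in the sketch below (POSIX shell assumed; the results/run1 directory is hypothetical). The answer to the -i prompt is piped in here only so the example runs unattended; interactively you would type y or n.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p results/run1
touch x.dat xcopy.dat z.file results/run1/out.txt

rm *.dat                 # the wildcard removes x.dat and xcopy.dat in one step
echo n | rm -i z.file    # -i asks first; answering "n" keeps the file
rm -r results            # -r removes a directory and everything below it
remaining=$(ls)
echo "$remaining"        # prints z.file
```

Because rm does not move files to a trash folder, combining wildcards with -i is a reasonable habit while learning.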
Basic UNIX commands - directory
• Directory navigation commands
  • mkdir - Create a directory
  • cd - Change directory
  • pwd - Show directory

    % mkdir sub1
    % mkdir $HOME/sub2
    % cd sub1
    % pwd
    /usr/ue/2/ropelews/sub1
    % cd $HOME/sub2
    % pwd
    /usr/ue/2/ropelews/sub2

Basic UNIX commands - man
• man - Find Unix command information
  • man -k <keyword> - Find topics available
  • man <command> - Show command information

    % man -k directory
    mkdir (1) - make directories
    rm (1) - remove files or directories
    rmdir (1) - remove empty directories
    % man rmdir
    RMDIR(1)          User Commands          RMDIR(1)
    NAME
        rmdir - remove empty directories
    SYNOPSIS
        rmdir [OPTION]... DIRECTORY...
    DESCRIPTION
        Remove the DIRECTORY(ies), if they are empty.
    :

UNIX Text Editors
• emacs - GNU UNIX editor
• vi - Traditional UNIX editor
• pico - A simple editor
• To use full-screen capabilities, the terminal type usually needs to be set to “vt100”
  • setenv TERM vt100; tset vt100

Which Editor Should You Use?
• Use the editor that you are most familiar with!
• emacs
  • Powerful, works on Unix and some non-Unix systems
  • Moderately easy to master
• vi
  • Powerful, will be on every Unix system
  • Not intuitive, fairly difficult to master
• pico
  • Simple, intuitive, easy to learn

emacs
• To edit a file named <filename> enter:
  • emacs <filename>
• To navigate:
  • <arrow keys> - Move cursor one space
  • <delete> - Delete character
• To quit with or without saving:
  • <cntrl> X <cntrl> C
  • Then answer Y or N
• For more information see:
  • http://www.gnu.org/software/emacs/

vi
• To edit a file named <filename> enter:
  • vi <filename>
• vi has two modes: “navigation” mode (the default) and “insertion” mode
• To insert text, one must be in “insertion” mode. Several keys (i, a, o) will place you into insertion mode.
• To leave insertion mode, press the [esc] key.

vi (continued)
• Commonly used vi keys:

    [arrows] - Move cursor          dd    - delete line
    h - Move cursor left            dl    - delete letter
    l - Move cursor right           dw    - delete word
    k - Move cursor up              [esc] - stop insertion
    j - Move cursor down            :wq   - write then quit
    i - insert at cursor            :q!   - quit without saving
    a - insert after cursor
    o - insert below line

pico
• Based on the editor in the Pine email program
• To edit a file named <filename> enter:
  • pico <filename>
• To navigate:
  • <arrow keys> - Move cursor one space
  • <delete> - Delete character
• To quit with or without saving:
  • <cntrl> X
  • Then answer Y or N

SLURM scripts
• A SLURM script is a file containing a series of instructions for the computer
• SLURM scripts are submitted by the user and run when the system has resources available to run the script
• SLURM scripts can run parallel programs or serial programs
• A SLURM script will be created for you for sequence analysis codes when you run the program makseq

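As a rough sketch of what such a script file can look like, the example below writes a small shell script and runs it locally as a dry run. The file name test.d mirrors the name used in the srun example in these slides, but its contents are hypothetical; the scripts that makseq generates are not shown in these slides and may look different.

```shell
work=$(mktemp -d)
cd "$work"

# Write a small script file; the contents are illustrative only.
cat > test.d <<'EOF'
#!/bin/sh
# Instructions run in order, exactly as if typed at the prompt.
echo "job started"
uname -n            # name of the node the script is running on
echo "job finished"
EOF

chmod +x test.d
# Dry run on the local machine before handing the file to the scheduler.
output=$(sh test.d)
echo "$output"
```

Running the script by hand first catches simple typos before the job waits in the queue.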
SLURM commands
• srun - submit a script file to the SLURM scheduling queue
• squeue - show status of the SLURM scheduling queue
• scancel - remove a running job

SLURM - srun

    % srun -b -o test.log test.d
    srun: jobid 3197 submitted
    % squeue
    JOBID PARTITION NAME     USER     ST TIME       NODES NODELIST(REASON)
    2773  all       pgy347_t jshen3   R  3-02:13:43 1     operon20
    3194  all       test.a   ropelews R  2:21       1     operon11
    3195  all       test.b   ropelews R  2:21       1     operon13
    3196  all       test.c   ropelews R  2:21       1     operon14
    3197  all       test.d   ropelews R  2:10       1     operon16

SLURM - scancel

    % squeue
    JOBID PARTITION NAME     USER     ST TIME       NODES NODELIST(REASON)
    2773  all       pgy347_t jshen3   R  3-02:13:43 1     operon20
    3194  all       test.a   ropelews R  2:21       1     operon11
    3195  all       test.b   ropelews R  2:21       1     operon13
    3196  all       test.c   ropelews R  2:21       1     operon14
    3197  all       test.d   ropelews R  2:10       1     operon16
    % scancel 3195
    % squeue
    JOBID PARTITION NAME     USER     ST TIME       NODES NODELIST(REASON)
    2773  all       pgy347_t jshen3   R  3-02:14:35 1     operon20
    3194  all       test.a   ropelews R  3:13       1     operon11
    3196  all       test.c   ropelews R  3:13       1     operon14
    3197  all       test.d   ropelews R  3:02       1     operon16