Basics of Linux I: The Linux Command Line Interface. [web] portal.biohpc.swmed.edu [email] biohpc-help@utsouthwestern.edu. Updated for 2019-02-13
Study Resources: A Free Book. 500+ pages. * Some of the material covered in today's training is from this book.
Study Resources: Cheat Sheets. On the portal: Training -> Slides & Handouts
Study Resources: Follow Along… You can follow along using: • the Nucleus Web terminal on the BioHPC portal: https://portal.biohpc.swmed.edu/terminal/ssh/ • PuTTY or SSH from your PC • Terminal on your MacBook: ssh <username>@nucleus.biohpc.swmed.edu
Study Resources: Man pages. Get a command's help page: man <command>. Press q to exit the man page.
The Terminal • Not too long ago... • Computers were primarily found in research centers, businesses, educational institutions, and libraries. • Access points to these computers were called terminals. • Simple keyboard and monitor interface • Computer may be a small, single unit or part of a larger network • Many of these computers ran a licensed UNIX operating system developed by AT&T
Modern Unix Descendants • Android. Created by: Google. Forked from Linux. 2008-today • GNU/Linux. Created by: Linus Torvalds and thousands of community developers. 1991-today • macOS. Created by: Apple, Inc. 2001-today
What operating system do BioHPC machines primarily run? • Red Hat Enterprise Linux 7.4 • Created by Red Hat, Inc. • GNU/Linux distribution • Linux Kernel 3.10 • GNOME 3 Desktop Environment • Bourne-Again Shell (bash)
SSH – Secure Shell. The majority of your interactions with the Nucleus cluster will likely be through SSH. Most modern GNU/Linux distributions have an OpenSSH client installed by default. Mac OS X also has SSH. PuTTY is recommended for MS Windows. ssh s178337@nucleus.biohpc.swmed.edu (the traffic between your computer and Nucleus is encrypted)
The Text (Command-Line Interface) Shell. The interaction between the user and the operating system is provided by a shell. The shell accepts keyboard commands and hands them off to the operating system. The BioHPC default shell is Bash, the Bourne-Again Shell.
[s178337@Nucleus005 ~]$ echo -n Hello, world!
Anatomy of this line: s178337 is the user name, Nucleus005 is the host name and ~ is the working directory; echo is the name of the command, -n is an option (switch) and Hello, world! are the arguments.
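To see what the -n option changes, one can count the bytes each form of the command writes with wc -c. This is a small sketch; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Same command (echo) and same arguments (Hello, world!), with and without -n.
with_newline=$(echo Hello, world! | wc -c)       # echo ends its output with a newline
without_newline=$(echo -n Hello, world! | wc -c) # -n suppresses that newline

# $(( )) strips the padding some wc implementations put around the number
echo "echo:    $(( with_newline )) bytes"     # 14
echo "echo -n: $(( without_newline )) bytes"  # 13
```

Both commands receive the same two arguments, "Hello," and "world!", which echo joins with a single space; only the option differs.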
Logging into Nucleus – Where Am I?
pwd – print working directory
[s178337@Nucleus005 ~]$ pwd
/home2/s178337
ls – list contents of a directory
[s178337@Nucleus005 ~]$ ls
[s178337@Nucleus005 ~]$ ls /home2/s178337
Linux Basics: The File System. Files on a Linux system are arranged in a hierarchical directory structure.
[Figure: directory tree rooted at /, with top-level folders tmp, work, project, home2, apps, bin, …, and lower levels labeled {department}, {PI_lab}, {uid}, {uid or project_name} and shared]
Navigating the file system How does one change their working directory? cd – change directory Let’s try accessing our personal work directory. [s178337@Nucleus005 ~]$ cd /work/biohpcadmin/s178337
Linux Command Line: Files and Directories. Files and directories may be referenced by an absolute or relative path. • Absolute path: specifies the location of a file or directory starting from / (the root directory) • cd /work/biohpcadmin • Pros: You know exactly where you are going! • Cons: Tedious if there are many nested folders • Relative path: a path relative to your working directory • Working directory: . • Up one folder: .. • Relative path of subfolder s178337 (from /work/biohpcadmin): s178337
[Figure: directory tree / → work → biohpcadmin → s178337, with an "I am here" marker on the working directory]
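Since /work may not exist where you try this, the sketch below walks through absolute and relative paths inside a throwaway directory created with mktemp -d. The sandbox layout only mimics /work/biohpcadmin/s178337 and is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox that mimics /work/biohpcadmin/s178337
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/work/biohpcadmin/s178337"

cd "$root/work" || exit 1           # absolute path: starts with /
pwd
cd biohpcadmin/s178337 || exit 1    # relative path: resolved from the working directory
after_relative=$(pwd)

cd .. || exit 1                     # .. = one folder up (biohpcadmin)
after_up=$(pwd)

cd ./s178337 || exit 1              # . = the current directory
pwd
```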
Determining your storage quota
[s178337@Nucleus005 ~]$ quota -s
Disk quotas for user s178337 (uid 178337):
Filesystem            space   quota   limit  grace   files  quota  limit  grace
lysosomehome:/home2  42872M  51200M  71680M           468k      0      0
[s178337@Nucleus005 ~]$ biohpc_quota
Current BioHPC Storage Quotas:
 FILE    |        SPACE USAGE         |        NUMBER OF FILES
 SYSTEM  |    USED      SOFT    HARD  |      USED     SOFT    HARD
---------|----------------------------|---------------------------------
 User quotas for s178337
---------|----------------------------|---------------------------------
 home2   |  42872M    51200M  71680M  |      468k        0       0
 work    |  343.6G        5T      7T  |     69544        0       0
---------|----------------------------|---------------------------------
 Group quotas for biohpc_admin
---------|----------------------------|---------------------------------
 project |  22.93T        0k      0k  |  13515770        0       0
 archive |  3.234T       30T     40T  |   2053323        0       0
How much storage is a directory occupying? Example: cd /work/biohpcadmin/s178337/gutenberg. What's inside?
[s178337@Nucleus005 gutenberg]$ ls
0   17  8                                     inefficient_reader.py~
1   18  9                                     large_txt.bin
10  19  cat.txt                               multithreaded_efficient_reader.py
11  2   cat_txt.py                            multithreaded_efficient_reader.py~
12  3   cat_txt.py~                           multithreaded_inefficient_reader.py
13  4   efficient_reader.py                   multithreaded_inefficient_reader.py~
14  5   efficient_reader.py~                  readme.txt
15  6   inefficient_efficient_readers.tar.gz  README.txt~
16  7   inefficient_reader.py                 slurm_cat.sh
How much space do this directory and all of its contents use?
[s178337@Nucleus005 digits_jobs]$ du -hs .
23G .
(du: disk usage; -h: human-readable; -s: summarize)
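A quick way to convince yourself that du -s also counts everything in subfolders: the sketch builds a hypothetical 2 MiB directory (1 MiB of random data at the top level, 1 MiB one level down) and measures it. du -k is used next to -h only so the total is easy to compare in a script; exact numbers depend on the filesystem's block size.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the gutenberg directory: 2 MiB of data split
# between the folder itself and a subfolder.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/gutenberg/sub"
head -c 1048576 /dev/urandom > "$demo/gutenberg/a.bin"       # 1 MiB
head -c 1048576 /dev/urandom > "$demo/gutenberg/sub/b.bin"   # 1 MiB, one level down

cd "$demo/gutenberg" || exit 1
du -hs .                          # -h human-readable, -s one summary line for . and below
total_kib=$(du -sk . | cut -f1)   # same total in KiB (du separates fields with a tab)
```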
Exploring the file system
[s178337@Nucleus005 ~]$ cd /project/shared/biohpc_training
Print the contents of the directory using the long format:
[s178337@Nucleus005 biohpc_training]$ ls -l
total 10360628
-rwxr-xr-x 1 root root          23 Feb  5 11:50 c475_r0ck_4m_1_r16h7.txt
-rwxr-xr-x 1 root root    22643931 Feb  5 13:48 HD728.R1.fastq.gz
-rwxr-xr-x 1 root root 10476699185 Oct 31 11:34 large_txt.bin
-rwxr-xr-x 1 root root        4716 Feb  5 13:49 regions.txt
Columns, left to right: permissions, number of hard links, owner, group ownership, size (bytes), last modified time, file name.
Exploring the file system. What type of file is c475_r0ck_4m_1_r16h7.txt?
[s178337@Nucleus005 biohpc_training]$ file c475_r0ck_4m_1_r16h7.txt
c475_r0ck_4m_1_r16h7.txt: ASCII text
Let's concatenate the contents of this file to the standard output of the terminal (this is just one way to print a file to the terminal):
[s178337@Nucleus006 biohpc_training]$ cat c475_r0ck_4m_1_r16h7.txt
Cats rock! Am I right?
While file extensions help with organization, we can still determine what a file is without an extension. What type of file is RJ_WS?
[s178337@Nucleus006 biohpc_training]$ file RJ_WS
RJ_WS: ASCII text
Bash has a very useful auto-completion shortcut for typing commands more quickly: press Tab to complete a command, file or directory name. Give it a try! Using bash auto-completion, try to type: cat /project/shared/biohpc_training/c475_r0ck_4m_1_r16h7.txt
Viewing large text files. A file does not have to be very large before printing it to the standard output with cat becomes unhelpful. To illustrate, let's examine HD728.R1.fastq.gz. The file has a .fastq.gz extension, but what does file report?
[s178337@Nucleus006 biohpc_training]$ file HD728.R1.fastq.gz
HD728.R1.fastq.gz: gzip compressed data, was "HD728.R1.fastq", from Unix, last modified: Sun May 13 07:25:19 2018
The file is compressed, so its contents would be unreadable to us. Let's decompress it first using gzip (-c writes to standard output, -d decompresses):
$ gzip -cd HD728.R1.fastq.gz > HD728.R1.fastq
Exercise: Using the program wc, count how many lines of text are inside HD728.R1.fastq. How would one access information on how to use this program?
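The decompress-and-count steps can be tried on a tiny, made-up FASTQ file (sample.R1.fastq is hypothetical). In the usual four-line FASTQ layout, dividing the line count by four gives the number of reads, which is one reason wc -l is handy here.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for HD728.R1.fastq.gz: two reads, four lines each.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'TTGA' '+' 'IIII' > sample.R1.fastq
gzip sample.R1.fastq                      # leaves only sample.R1.fastq.gz

# -c: write to standard output (the .gz file is kept); -d: decompress
gzip -cd sample.R1.fastq.gz > sample.R1.fastq

lines=$(wc -l < sample.R1.fastq)          # reading stdin, so no file name is printed
echo "$(( lines )) lines, $(( lines / 4 )) reads"
```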
Viewing large text files [s178337@Nucleus006 biohpc_training]$ wc -l HD728.R1.fastq 2019172 HD728.R1.fastq Considering that most terminal emulators default to a size of 80 x 24 characters, trying to cat 2M+ lines to the standard output is going to be a bit problematic. Let’s try it anyway…
Interrupting a Running Program. What happens if I need to kill a program that is running? Pressing CTRL + C sends an interrupt signal (SIGINT) to the program, which usually terminates it. If it does not, consult man kill.
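CTRL + C only reaches the program running in the foreground of your terminal. For anything else, kill sends a signal by process ID (see man kill). The sketch uses a background sleep as a stand-in for a stuck program. It sends kill's default signal, SIGTERM, rather than SIGINT, because bash scripts start background jobs with SIGINT ignored.

```shell
#!/usr/bin/env bash
# A background sleep stands in for a program that will not stop on its own.
sleep 300 &
pid=$!                  # $! holds the process ID of the most recent background job

kill "$pid"             # with no signal given, kill sends SIGTERM (signal 15)
wait "$pid"             # wait for the job and pick up its exit status
status=$?
echo "exit status: $status"   # 128 + 15 = 143: terminated by SIGTERM
```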
Head, Tail, More, Less. It is not always practical to print an entire file to the shell. Commands: head – print the first 10 lines of each FILE to standard output. tail – print the last 10 lines of each FILE to standard output. • Exercise: • Print the first 50 lines of HD728.R1.fastq! Hint: man head. What if one just wanted to page through the file without using cat? Commands: more – a filter for paging through text one screenful at a time. less – like more, but allows both backward and forward movement.
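A minimal sketch of head and tail on a hypothetical 200-line file of numbers, where it is easy to check which lines come back. It also covers the exercise: head -n 50 prints the first 50 lines.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
seq 1 200 > "$demo/numbers.txt"            # hypothetical 200-line file: 1, 2, ..., 200

first50=$(head -n 50 "$demo/numbers.txt")  # lines 1-50
last10=$(tail "$demo/numbers.txt")         # no -n given: last 10 lines (191-200)

echo "$first50" | tail -n 1                # 50
echo "$last10" | head -n 1                 # 191
```

Chaining head and tail like this is also a quick way to pull a range out of the middle of a file. For interactive browsing, less opens a file one screen at a time; press q to quit.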
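Redirection. What if one wanted to take the stdout of a command and save it to a file? Every process has three standard streams: stdin (input), stdout (normal output) and stderr (error messages).
">" redirects stdout to a file (creating it, or overwriting it if it exists): $ ls /usr/bin > ls_out.txt
">>" appends stdout to a file: $ echo "EOF" >> ls_out.txt
Why do we still see output on the terminal for the following command? $ ls /bin/usr > ls_out.txt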
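The two redirections can be tried in a scratch directory. This is a sketch; the second file name, ls_out2.txt, is made up so the two experiments do not overwrite each other.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

ls /usr/bin > ls_out.txt     # > creates the file, or empties it if it exists
echo "EOF" >> ls_out.txt     # >> adds to the end instead of overwriting
tail -n 1 ls_out.txt         # EOF

# /bin/usr does not exist. The error message goes to stderr, which ">" does
# not touch, so it still reaches the terminal and ls_out2.txt stays empty.
ls /bin/usr > ls_out2.txt
```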
Redirection. What if one wanted to redirect both stdout and stderr to a file? Numeric file descriptors: stdout is 1, stderr is 2.
Method 1: Two redirections: stdout is redirected to the file, then stderr (2) is redirected to wherever stdout (1) now points:
$ ls -l /bin/usr > out.txt 2>&1
Method 2: Single redirection (bash shorthand):
$ ls -l /bin/usr &> out.txt
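A sketch that runs both methods side by side in a scratch directory (out1.txt and out2.txt are made-up names) and shows that the error message lands in each file rather than on the terminal.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Method 1: redirections are processed left to right, so stdout goes to the
# file first and then stderr is pointed at the same place.
# Writing "2>&1 > out1.txt" instead would leave stderr on the terminal.
ls -l /bin/usr > out1.txt 2>&1

# Method 2: bash shorthand for the same thing
ls -l /bin/usr &> out2.txt

cat out1.txt out2.txt        # the "cannot access" error, captured twice
```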
Redirection. What if I DON'T CARE AT ALL about stdout and stderr?!? We can simply redirect both of them to /dev/null, a special file that discards everything written to it: things can go in, but nothing ever comes out. $ ls -l /bin &> /dev/null
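A small sketch of both halves of that description: a command whose output and error both vanish, and a read from /dev/null that returns nothing.

```shell
#!/usr/bin/env bash
# The listing of /bin (stdout) and the error for /bin/usr (stderr)
# are both discarded, so nothing appears on the terminal.
ls -l /bin /bin/usr &> /dev/null

# "Nothing comes out": reading /dev/null hits end-of-file immediately.
bytes=$(wc -c < /dev/null)
echo "bytes read from /dev/null: $(( bytes ))"   # 0
```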
Text Editors • vi / vim: cryptic commands! Cheat sheet on the portal. Quick tutorial: http://www.washington.edu/computing/unix/vi.html • Emacs: an extensible, customizable text editor. Quick tutorial: http://www.gnu.org/software/emacs/tour/ • nano: easier to use. Quick tutorial: http://mintaka.sdsu.edu/reu/nano.html • Any text editor on your PC or Mac: mount your directories as network drives, see https://portal.biohpc.swmed.edu/content/guides/biohpc-cloud-storage/
Permissions
$ ls -l
drwxr-xr-x  4 dtrudgian biohpc_admin   58 Feb 16 15:13 all_training
drwxr-xr-x  7 dtrudgian biohpc_admin  140 Feb 12 10:36 Apps
drwxr-xr-x  2 dtrudgian biohpc_admin   26 Feb 16 15:12 cli_training
drwxr-xr-x  8 dtrudgian biohpc_admin 4.0K Feb 16 14:25 Cluster_Installs
drwxr-xr-x  3 dtrudgian biohpc_admin 4.0K Feb 16 11:49 Desktop
drwxr-xr-x  2 dtrudgian biohpc_admin   10 Feb 16 14:10 Documents
drwxr-xr-x  9 dtrudgian biohpc_admin  135 Feb 16 14:32 Downloads
-rw-r--r--  1 dtrudgian biohpc_admin  336 Feb 16 15:16 error.txt
drwxr-xr-x 10 dtrudgian biohpc_admin 4.0K Feb  9 12:45 Git
drwxr-xr-x 17 dtrudgian biohpc_admin 4.0K Feb 16 15:17 ownCloud
drwxr-xr-x  2 dtrudgian biohpc_admin   10 Feb 16 14:18 Pictures
drwxr-xr-x  5 dtrudgian biohpc_admin  102 Feb  4 11:19 portal_jobs
(Permissions: first column; Owner: third column; Group: fourth column)
Octal Permissions. Each permission has a value: r = 4, w = 2, x = 1. Add up the permissions you need for each class, e.g. rx = 5, rw = 6, rwx = 7.
-rw-r--r-- 1 dtrudgian biohpc_admin 336 Feb 16 15:16 error.txt
Owner can read+write (6), group can read (4), others can read (4): 644
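The adding-up rule is mechanical enough to write as a tiny function. perm_to_octal is a hypothetical helper written for this handout, not a standard command: it reads a 9-character permission string three characters at a time (owner, group, others) and adds 4, 2 and 1 for r, w and x.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a 9-character symbolic string (e.g. rw-r--r--)
# into octal by adding r=4, w=2, x=1 within each class.
perm_to_octal() {
  local perms=$1 result="" class i digit
  for class in 0 1 2; do                 # 0 = owner, 1 = group, 2 = others
    digit=0
    for i in 0 1 2; do                   # the three characters of this class
      case ${perms:class*3+i:1} in
        r) digit=$(( digit + 4 )) ;;
        w) digit=$(( digit + 2 )) ;;
        x) digit=$(( digit + 1 )) ;;
      esac
    done
    result+=$digit
  done
  echo "$result"
}

perm_to_octal rw-r--r--   # error.txt in the example: 644
perm_to_octal rwxr-xr-x   # the directories in the listing: 755
```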
Permissions. chmod u/g/o/a +/- r/w/x filename. Class: u = user (owner), g = group, o = others, a = all. Permission: r = read, w = write, x = execute. + adds a permission, - removes a permission.
chmod g+rw test.txt: add read/write permissions for the group
chmod a+x script.sh: add execute permission for everyone
chmod g-x script.sh: remove execute permission for the group
chmod 700 script.sh gives -rwx------
chmod 640 script.sh gives -rw-r-----
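Octal and symbolic modes can be mixed freely. A sketch on an empty, throwaway script.sh that starts from 700 and reaches the same result as chmod 640 using symbolic changes only:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch script.sh

chmod 700 script.sh     # octal: -rwx------
chmod g+r script.sh     # symbolic: add read for the group     -> -rwxr-----
chmod u-x script.sh     # symbolic: remove execute for the owner -> -rw-r-----

# The first 10 characters of ls -l are the file type and permission bits
mode=$(ls -l script.sh | cut -c1-10)
echo "$mode"            # -rw-r----- (the same result as chmod 640)
```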
Demo - Creating a shared folder. Public sequencing data: /archive/shared/biohpc_training/sample_data/vc_calling_session2. Create a copy that is readable and writeable only by your primary group in /project/<department>/<lab>/shared/sequencing_data. Move all bam files into /project/<department>/<lab>/shared/sequencing_data/bam. Move all fastq.gz files into /project/<department>/<lab>/shared/sequencing_data/fastq. Remove any additional files from your copy in /project/<department>/<lab>/shared/sequencing_data.
Copying data. Create an empty directory:
$ mkdir -p /project/biohpcadmin/shared/sequencing_data
(-p creates any missing parent folders and does not complain if the directory already exists)
Copy everything recursively (-r) from the source to the destination:
$ cp -r /archive/shared/biohpc_training/sample_data/vc_calling_session2/* /project/biohpcadmin/shared/sequencing_data/
OR copy the entire folder recursively (because the destination already exists, this creates sequencing_data/vc_calling_session2):
$ cp -r /archive/shared/biohpc_training/sample_data/vc_calling_session2 /project/biohpcadmin/shared/sequencing_data
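The difference between copying a folder's contents and copying the folder itself can be seen in a sandbox. In this sketch, src and dest are hypothetical stand-ins for the archive folder and the project folder, and the three files are empty placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox: src stands in for .../vc_calling_session2,
# dest for /project/<department>/<lab>/shared/sequencing_data
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
src="$demo/archive/vc_calling_session2"
dest="$demo/project/lab/shared/sequencing_data"

mkdir -p "$src/sub"
touch "$src/HD728.nanocourse.bam" "$src/HD728.R1.fastq.gz" "$src/sub/notes.txt"

mkdir -p "$dest"            # -p creates project/, lab/ and shared/ as needed
cp -r "$src"/* "$dest"/     # copy the contents of src into dest
cp -r "$src" "$dest"        # copy the folder itself: dest/vc_calling_session2
ls -R "$dest"
```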
Moving data. To move files, one can use the mv command. Note: mv will attempt to preserve original permissions.
Moving all .bam files:
$ cd /project/biohpcadmin/shared/sequencing_data
$ mkdir bam
$ mv *.bam bam/
Moving all .fastq.gz files:
$ mkdir fastq
$ mv *.fastq.gz fastq/
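A sketch of the sorting step in a scratch directory with empty placeholder files. HD728.R2.fastq.gz is a made-up second file. The .bai index is included to show that *.bam matches only names that end in .bam.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch HD728.nanocourse.bam HD728.nanocourse.bam.bai \
      HD728.R1.fastq.gz HD728.R2.fastq.gz germline.design.txt

mkdir bam fastq
mv *.bam bam/           # HD728.nanocourse.bam.bai does not end in .bam, so it stays
mv *.fastq.gz fastq/    # *.fastq alone would not match names ending in .gz
ls bam fastq
```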
Deleting Files. Be very cautious: it is easy to destroy files! There is no Recycle Bin to restore your files from. Once files are deleted from the CLI, it is generally difficult to recover them. Make sure important data is backed up! rm is used similarly to cp and mv.
To delete everything in a folder (non-recursively; subfolders are left alone):
$ rm /project/biohpcadmin/shared/sequencing_data/*
To delete a folder recursively:
$ rm -r /project/biohpcadmin/shared/sequencing_data
To delete objects interactively (slowest, but safest):
$ rm -i /project/biohpcadmin/shared/sequencing_data/*
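Practicing in a throwaway directory is the safest way to get a feel for rm. In this sketch, the "n" piped into rm -i stands in for typing n at the prompt. The folder and file names are hypothetical.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
target="$demo/sequencing_data"
mkdir -p "$target/bam"
touch "$target/a.txt" "$target/bam/b.bam"

echo n | rm -i "$target/a.txt"     # -i asks first; answering n keeps the file
[ -e "$target/a.txt" ] && echo "a.txt survived rm -i"

rm "$target"/*                     # a.txt is removed; bam/ is refused without -r
left=$(ls "$target")               # only bam is left

rm -r "$target"                    # the folder and everything inside it
```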
Wildcards
* Match any number of characters
ls *            Any file
ls notes*       Any file beginning with notes
ls *.txt        Any file ending in .txt
ls *2019*       Any file with 2019 somewhere in its name
? Match a single character
ls data_00?.txt     Matches data_001.txt, data_002.txt, data_00A.txt, etc.
[] Match a set of characters (bracket expression)
ls data_00[0123456789].txt
ls data_00[0-9].txt     Matches data_000.txt – data_009.txt, but not data_00A.txt
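The shell, not ls, expands these patterns before the command runs, so echo is a handy way to preview what a pattern will match. A sketch with a few empty, made-up files:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch data_000.txt data_001.txt data_009.txt data_00A.txt notes_2019.txt

echo data_00?.txt        # ? = exactly one character, so data_00A.txt is included
echo data_00[0-9].txt    # one digit only: data_000.txt data_001.txt data_009.txt
echo *2019*              # 2019 anywhere in the name: notes_2019.txt
```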
Applying new Permissions
$ ls -la vc_calling_session2/
total 163875
drwxr-xr-x 2 s178337 biohpc_admin     4096 Feb 12 13:29 .
drwxrwxrwx 3 root    root             4096 Feb 12 13:29 ..
-rwxr-xr-x 1 s178337 biohpc_admin      130 Feb 12 13:29 germline.design.txt
-rwxr-xr-x 1 s178337 biohpc_admin 41003480 Feb 12 13:29 HD728.nanocourse.bam
-rwxr-xr-x 1 s178337 biohpc_admin  1581120 Feb 12 13:29 HD728.nanocourse.bam.bai
-rwxr-xr-x 1 s178337 biohpc_admin 22643931 Feb 12 13:29 HD728.R1.fastq.gz
… … …
By default, cp will apply your ownership and primary group to the copied files. Are these the permissions we want for group-only read and write? If not, what command will apply the appropriate permissions?
$ chmod -R 770 /project/biohpcadmin/shared/sequencing_data
(770: rwx for the owner and the group, nothing for others; -R applies it to the folder and everything inside it)
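To check that -R really reaches everything inside the folder, here is a sketch on a hypothetical stand-in for the shared folder, reading the resulting modes back with ls.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
data="$demo/sequencing_data"          # hypothetical stand-in for the shared folder
mkdir -p "$data/vc_calling_session2"
touch "$data/vc_calling_session2/HD728.nanocourse.bam"

chmod -R 770 "$data"                  # -R: the folder and everything inside it

dir_mode=$(ls -ld "$data/vc_calling_session2" | cut -c1-10)
file_mode=$(ls -l "$data/vc_calling_session2/HD728.nanocourse.bam" | cut -c1-10)
echo "$dir_mode $file_mode"           # drwxrwx--- -rwxrwx---
```

Note that 770 also sets the execute bit on plain files; on directories, x is what allows entering them.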
Environment Variables – Controlling the behavior of the Shell. Several variables control the behavior of the shell. You can print all of these variables with:
$ env
Or print them individually:
$ echo $SHELL
/bin/bash
$ echo $HOME
/home2/s178337
$ echo $USER
s178337
The $PATH variable is one of the most important: it tells the shell where to look for your programs:
$ echo $PATH
/cm/shared/apps/slurm/16.05.8/sbin:/cm/shared/apps/slurm/16.05.8/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin
The module system on BioHPC modifies this $PATH so that programs are made available to the user. One can also manually edit their $PATH:
$ export PATH=/home2/s178337/bin:$PATH
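A sketch of what prepending a directory to $PATH buys you. Here $mybin plays the role of /home2/s178337/bin, and hello_biohpc is a made-up script name. Because the new directory comes first, it is searched before the system directories.

```shell
#!/usr/bin/env bash
# Hypothetical personal bin directory (stands in for /home2/s178337/bin)
mybin=$(mktemp -d)
trap 'rm -rf "$mybin"' EXIT

printf '#!/usr/bin/env bash\necho "hello from my bin"\n' > "$mybin/hello_biohpc"
chmod +x "$mybin/hello_biohpc"

export PATH="$mybin:$PATH"      # prepend: searched before the system directories
command -v hello_biohpc         # prints the full path the shell will run
hello_biohpc
```

The export only affects the current shell session and the programs started from it.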