CSC's Unix environment
corona.csc.fi and sepeli.csc.fi
ssh connection to Corona • text-based connection • does not need much bandwidth • no graphics
X-connection to Corona • possibility to use graphical interfaces • requires locally installed X-emulator • needs more bandwidth
Can I log into Corona if I don’t know unix well? • you can only delete your own files • you need to be an expert to cause big damage • Security is important! Keep your password fresh and safe.
Directories:
• $HOME (/mnt/mds/univX/group/username)
  - permanent, size limit 200 MB
• $METAWRK (/mnt/mds/metawrk/username)
  - storage time 1 month, no size limit
• $WRKDIR (/wrk/username)
  - storage time 1 week, no size limit
• $TMPDIR (/tmp/username)
  - storage time 1 day, no size limit
• $ARCHIVE (/mnt/fs/archive/univX/group/username)
  - permanent, no size limit, only for storage
• Project directory
  - a special large area of permanent disk space for the common use of a group (requires an application)
Directories of Kalle Käyttäjä (kkayttaj)
[Figure: directory tree of the example user kkayttaj, showing the locations of $ARCHIVE, $METAWRK, $HOME, $WRKDIR and $TMPDIR under /, with example files such as proj04.tar, Gradu.tar, project1, Fasta_results, run.tmp, own_programs, report.txt, data.dat and test.rubbish]
Unix commands
  ls
  ls -l
  ls -l myDirectory
Commands for directories:
  cd      change directory
  ls      list the contents of a directory
  pwd     print (= show) working directory
  mkdir   make directory
  rmdir   remove directory
Commands for files:
  cat     print file to screen
  cp      copy
  less    view text file
  rm      remove
  mv      move/rename a file
  head    show beginning of a file
  tail    show end of a file
  grep    find lines containing given text
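The file commands above can be tried safely in a scratch directory. The following is a minimal sketch, not part of the original slides; the file names sample.txt, copy.txt and renamed.txt and the scratch directory are hypothetical.

```shell
# Work in a throwaway directory so nothing in $HOME is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a small five-line example file (hypothetical name).
printf 'alpha\nbeta\ngamma\ndelta\ngamma ray\n' > sample.txt

cp sample.txt copy.txt      # copy a file
mv copy.txt renamed.txt     # rename the copy

first_two=$(head -n 2 renamed.txt)        # first two lines of the file
last_one=$(tail -n 1 renamed.txt)         # last line of the file
gamma_count=$(grep -c gamma renamed.txt)  # number of lines containing "gamma"

cat renamed.txt             # print the whole file to the screen
rm renamed.txt              # remove the renamed copy

echo "last line: $last_one"
echo "lines with gamma: $gamma_count"
```

Use less instead of cat when the file is longer than one screen.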
Examples of using files and directories
[Figure: example directory tree under $HOME containing thesis.txt, directory1, directory2 and structures, with the files casein.fasta, bunnies.txt and casein.phy]
  cd
      Go back to the home directory from anywhere.
  cd ..
      Move one level up in the directory hierarchy (cd .. in the "structures" directory moves you to the directory "directory1").
  cp thesis.txt directory1/structures
      Copies the file "thesis.txt" to the subdirectory "structures".
  cp casein.phy ../directory1/
      Copies the file "casein.phy" to the directory "directory1" located next to the current directory.
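The examples above can be replayed in a scratch directory. This is a sketch under assumptions: the exact layout of the tree (structures inside directory1, casein.phy inside directory2) is inferred from the commands on the slide, and the scratch directory stands in for $HOME.

```shell
# Build a scratch copy of the example tree (layout is an assumption).
demo_home=$(mktemp -d)
cd "$demo_home" || exit 1
mkdir -p directory1/structures directory2
touch thesis.txt directory1/casein.fasta directory2/bunnies.txt directory2/casein.phy

# From the top level: copy thesis.txt into the "structures" subdirectory.
cp thesis.txt directory1/structures

# From directory2: copy casein.phy into the neighbouring directory1.
cd directory2 || exit 1
cp casein.phy ../directory1/

# "cd .." inside structures moves one level up, into directory1.
cd ../directory1/structures || exit 1
cd ..
where_am_i=$(basename "$(pwd)")
echo "now in: $where_am_i"
```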
Use the command "less" to view text files
  less filename
  return     (next line)
  space      (next screen)
  b          (previous screen)
  h          (show help for less)
  q          (quit)
  /string    (find string in the file)
  ls -la | less    (pipe ls output to less)
Nano (or pico) text editor
  nano filename
  ctrl-c   (line number)
  ctrl-g   (help menu)
  ctrl-k   (cut a line)
  ctrl-o   (save)
  ctrl-r   (read a file)
  ctrl-v   (next page)
  ctrl-w   (find a word)
  ctrl-x   (exit)
  ctrl-y   (previous page)
Use eog and ggv for displaying images
• eog can display e.g. jpg, tiff, gif and png files:
    eog filename.png
• ggv can display ps and pdf files:
    ggv filename.ps
• ps2pdf converts a PostScript file into a pdf file:
    ps2pdf filename.ps
• Note: eog and ggv require an X connection
• You can use Scientist's Interface too (Settings: show)
General features:
  arrow keys     browse previous commands
  Tab key        auto-completes commands or file names
  man command    shows the manual page of a command
  control-c      stops the currently running program (or process)
Special characters:
  * (asterisk)   wildcard, means any text
                   ls *.fasta
  | (pipe)       passes the output of one command to the input of another command
                   ls *.fasta | less
  >              writes output to a new file
                   ls > files_of_the_directory.txt
  ~ (tilde)      means your home directory, as does $HOME
                   cp test.txt ~/file.txt
                   cp test.txt $HOME
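A short sketch of the special characters in action, run in a scratch directory. The file names a.fasta, b.fasta and notes.txt are hypothetical, and wc -l is used in place of less so that the result can be captured in a variable.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.fasta b.fasta notes.txt

# * matches any text: list only the .fasta files.
ls *.fasta

# | passes the output of ls to wc -l, which counts the lines.
# tr removes the padding that some versions of wc add.
fasta_count=$(ls *.fasta | wc -l | tr -d ' ')

# > writes the output to a new file instead of the screen.
ls > files_of_the_directory.txt

# ~ and $HOME normally print the same directory.
echo ~
echo "$HOME"

echo "fasta files: $fasta_count"
```

Note that files_of_the_directory.txt lists itself as well, because the shell creates the output file before ls runs.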
Batch queue jobs at CSC
• Batch queues in Corona and Sepeli
  - the maximum time limit for interactive jobs is 2 h (CPU hours)
  - longer jobs must be submitted through the batch queue system
  - even rather small jobs can overload the front node of Sepeli
• Queue systems aim to optimize the usage of the computing resources
  - the customer defines how much computing time, memory and how many processors the job needs
  - the queue system starts the job when suitable resources are available
  - during execution the job can use the reserved resources effectively
N1 grid engine in Corona and Sepeli
• Both Corona and Sepeli use the N1 grid engine queue system
• The maximum time and memory limits are different in Sepeli and Corona:
            Max. time          Max. mem       Max. proc
  Corona    168 h (7 days)     192 GB         32
  Sepeli    240 h (10 days)    4 GB/subjob    128
N1 grid engine in Corona and Sepeli
• At minimum, a batch job script must include a computing time estimate and all the commands needed to run the program:
    #!/bin/tcsh
    #$ -l h_rt=24:00:00
    raxml -n test1 -s ratite.phy -m HKY85
• The script file is submitted with the command:
    qsub batch_job.file
• The job can be followed with the commands:
    qstat
    qstat -u username
N1 grid engine in Corona and Sepeli
• Structure of a batch queue file
  - #!/bin/tcsh: the "shebang" tells which command shell to use
  - the lines containing the batch queue definitions start with #$
• Most common definitions
    #$ -l h_rt=h:min:sec        reserved time
    #$ -l v_mem=max_mem(M,G)    maximum memory size
    #$ -pe cre n_proc           number of processors
    #$ -o run.log               output file
    #$ -e error.log             error file
    #$ -cwd                     run the job in the directory where it was submitted (works only in Corona)
N1 grid engine in Corona and Sepeli
• Note that batch jobs start from the home directory, with the same settings that the user has just after login.
• In the batch job file you must take care of:
  - moving to the right directory (cd $METAWRK/ or -cwd)
  - setting up the program environment (use emboss etc.)
  - giving all the parameters that the commands need to run
    #!/bin/tcsh
    #$ -l h_rt=24:00:00
    #$ -o ratite_run.log
    #$ -e ratite_run.log
    cd $METAWRK/birds/
    raxml -n test1 -s ratite.phy -m HKY85
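One way to prepare such a file without opening an editor is to write it from the command line. The sketch below only creates the file and prints the submit command; it does not call qsub. The file name ratite_job.sh and the scratch directory are hypothetical, and the snippet is written for an sh-compatible shell even though the job file itself is a tcsh script.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
job_file=ratite_job.sh   # hypothetical file name

# The quoted 'EOF' keeps $METAWRK unexpanded, so the variable is
# resolved later, when the job runs on the computing node.
cat > "$job_file" <<'EOF'
#!/bin/tcsh
#$ -l h_rt=24:00:00
#$ -o ratite_run.log
#$ -e ratite_run.log
cd $METAWRK/birds/
raxml -n test1 -s ratite.phy -m HKY85
EOF

# Count the queue definition lines (those starting with "#$ ").
directive_count=$(grep -c '^#\$ ' "$job_file")
echo "queue definitions: $directive_count"
echo "submit with: qsub $job_file"
```

Checking the definition lines before submitting helps catch a missing time estimate, which the slide above names as the minimum requirement.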
N1 grid engine in Corona and Sepeli
• For "interactive" programs you can use the <<EOF structure:
    #!/bin/tcsh
    #$ -l h_rt=24:00:00
    #$ -o mrbayes_run.log
    #$ -e mrbayes_run.log
    cd $METAWRK/birds/
    mrbayes64 <<EOF
    log start filename=data.log
    execute rat1.nxs
    mcmc
    no
    sump
    sumt
    quit
    EOF
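The <<EOF structure (a "here-document") feeds the lines that follow to the program's standard input, as if they had been typed at its prompt. The sketch below illustrates the mechanism with a hypothetical stand-in function, fake_interactive, rather than with mrbayes64 itself; it is written for an sh-compatible shell.

```shell
# Hypothetical stand-in for an interactive program: it reads commands
# from standard input one line at a time and stops at "quit".
fake_interactive() {
    while IFS= read -r line; do
        [ "$line" = "quit" ] && break
        echo "got command: $line"
    done
}

# Everything between <<EOF and the closing EOF is sent to the program.
transcript=$(fake_interactive <<EOF
execute rat1.nxs
mcmc
no
quit
EOF
)

echo "$transcript"
```

The answers must appear in exactly the order in which the program asks for them, because nobody is there to react to an unexpected prompt.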
N1 grid engine in Corona and Sepeli
• Note that in Sepeli, batch jobs can only use files that are located in the $WRKDIR directory ($WRKDIR is the "home directory" on the computing nodes)
• For short or interactive jobs you can use interactive batch jobs:
    qrsh -l h_rt=4:00:00
• qrsh opens an interactive session on one computing node
• The maximum length of the session is defined by -l h_rt
More information about the CSC Unix environment
Unix operating system:
  http://www.csc.fi/metacomputer/neuvonta.html.en
  http://www.csc.fi/oppaat/metakone/
Text editors:
  http://www.csc.fi/cschelp/kaytto/editorit.html.en
Advantages of unix EMBOSS
  - more programs (e.g. Vienna, hmmer, meme)
  - possibility to use list files
  - big analysis tasks
  - you can analyze the same data with other unix programs (Clustal, Phylip, BLAST, FASTA, etc.)
EMBOSS in Corona
• use emboss - initializes EMBOSS
• showdb - displays the databases linked to EMBOSS
• wossname term - finds programs related to a given term
• wossname - lists descriptions of all EMBOSS programs
EMBOSS in Corona
• you can start a program by typing its name
• you can give parameters interactively:
    corona ~> seqret
    Reads and writes (returns) sequences
    Input sequence(s): swiss:P12067
    Output sequence [lyc1_pig.fasta]:
• or you can give parameters on the command line (you can often supply more parameters this way):
    corona ~> seqret swiss:P12067
    Reads and writes (returns) sequences
    Output sequence [lyc1_pig.fasta]:
EMBOSS file formats
• EMBOSS uses the USA (Uniform Sequence Address) notation to refer to sequence files:
    format::database:name (e.g. fasta::swiss:CAS1_human)
• EMBOSS reads and writes several sequence formats, including fasta, gcg, staden, swiss, text and clustal. The default format is fasta. One file can include several sequences.
• EMBOSS can use list files, which contain sequence names in USA format. A list file has to be marked with the @ character when it is given to a program:
    seqret @list.txt
• short sequences can be given on the command line using asis::sequence:
    seqret asis::TGCAGCTGCTGCAGCTGCTGC
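A list file is an ordinary text file with one USA per line. The sketch below builds one and passes it to seqret only if EMBOSS is available in the current shell. The entries reuse the identifiers mentioned on these slides; the file names list.txt and combined.fasta are hypothetical, and the snippet is written for an sh-compatible shell.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
list_file=list.txt   # hypothetical file name

# One USA (database:name) per line.
printf '%s\n' 'swiss:P12067' 'swiss:CAS1_human' > "$list_file"
entry_count=$(wc -l < "$list_file" | tr -d ' ')
echo "entries in list file: $entry_count"

# Run seqret only when it is installed; -auto suppresses the prompts.
if command -v seqret >/dev/null 2>&1; then
    seqret "@$list_file" -outseq combined.fasta -auto
else
    echo "seqret not found (on Corona, initialize EMBOSS with 'use emboss' first)"
fi
```

With a list file, one command processes every listed sequence, which is what makes large analysis tasks practical from the command line.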
EMBOSS results
• results are stored in a new file (either a text file or an image)
• text files can be viewed with the less and pico programs
• images can be viewed through an X connection or stored as a PostScript file
• Use Scientist's Interface to transfer data between your machine and Corona
EMBOSS command options
  -help    short command help
  -opt     ask more parameters interactively
  -auto    use default parameters
corona ~> seqret -help
   Mandatory qualifiers:
  [-sequence]   seqall      Sequence database USA
  [-outseq]     seqoutall   Output sequence(s) USA
   Optional qualifiers: (none)
   Advanced qualifiers:
   -firstonly   bool        Read one sequence and stop
   General qualifiers:
   -help        bool        report command line options. More information on associated and general qualifiers can be found with -help -verbos
EMBOSS general options
Many EMBOSS programs use general options that are not included in the help information. For example:
  -sbegin      starting point in the sequence
  -send        ending point in the sequence
  -sreverse    use the reverse sequence
  -sask        ask for the -sbegin, -send and -sreverse parameters interactively
  -osname      name of the output file
  -ossingle    write sequences into separate files
Image output
• The EMBOSS program asks for the image format:
    Graphics device[x11]:
• x11 = show on the screen (requires an X connection)
• ps = write the image into a PostScript file
• data = write a data file instead of an image
How to find the right EMBOSS program?
• manuals grouped by program function: http://www.csc.fi/molbio/progs/emboss/Apps/groups.html
• wossname program: text search of EMBOSS manuals
• EMBOSS - GCG "dictionary": http://www.csc.fi/molbio/progs/emboss/comparison.html