Functions and Separate Compilation. Dr. Nancy Warter-Perez May 7, 2002. Outline. Discuss solution to homework 6 Introduction of workshop 11 Overview of File I/O - workshop 11.1 Switch statement - workshop 11.2 Functions - workshop 11.3 Separate compilation - workshop 11.4
Functions and Separate Compilation Dr. Nancy Warter-Perez May 7, 2002
Outline
• Discuss solution to homework 6
• Introduction of workshop 11
• Overview of
  • File I/O - workshop 11.1
  • Switch statement - workshop 11.2
  • Functions - workshop 11.3
  • Separate compilation - workshop 11.4
    • projects, header files, etc.
• Homework 11
Bioinformatics Programming
Homework 6 - Solution
// This program will compute the average hydrophobicity of a given sequence
// for a specified sliding window size.
// Written By: Prof. Warter-Perez
// Date Created: April 23, 2002
// Last Modified:
//   April 23, 2002 - Modified the program to write data to an output file.
//   April 24, 2002 - Modified the program to compute hydrophobicity using Kyte-Doolittle scale
//   May 7, 2002 - Modified the program to read sequence from an input file. The input and
//                 output files are user defined.
#include <cctype>    // toupper
#include <stdlib.h>  // EXIT_SUCCESS, EXIT_FAILURE
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
Homework 6 - Solution
int main ()
{
    string seq, file_in, file_out;
    float count;
    int i, j, window_size;
    // Kyte-Doolittle hydrophobicity indexed by (letter - 'A'), A through Z;
    // letters that are not standard amino acids are set to 0.
    float hydro[26] = {1.8, 0, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, 0, -3.9, 3.8, 1.9,
                       -3.5, 0, -1.6, -3.5, -4.5, -0.8, -0.7, 0, 4.2, -0.9, 0, -1.3, 0};
    fstream fout, fin;
Homework 6 - Solution
    cout << "This program will compute the hydrophobicity of a given sequence \nfor a specified sliding window size.\n" << endl;
    // Open the input file.
    cout << "Please enter the input filename:\t" << flush;
    cin >> file_in;
    fin.open(file_in.c_str(), ios::in);
    if(fin.fail()) {
        cout << "Error: input file does not exist. Program Terminating." << endl;
        return EXIT_FAILURE;
    }
Homework 6 - Solution
    // Open the output file.
    cout << "Please enter the output filename:\t" << flush;
    cin >> file_out;
    fout.open(file_out.c_str(), ios::out);
    // Read in the sequence.
    fin >> seq;
    // Read in the window size.
    cout << "Enter the window size: " << flush;
    cin >> window_size;
Homework 6 - Solution
    // Compute the average hydrophobicity for specified window
    // size and write to output file.
    // (The test i + window_size <= size avoids an unsigned underflow
    // when the window is longer than the sequence.)
    for (i = 0; i + window_size <= (int)seq.size(); i++) {
        count = 0;
        for (j = i; j < i + window_size; j++) {
            count = count + hydro[toupper(seq[j]) - 'A'];
        }
        fout << i << "\t" << count/window_size << endl;
    }
    return EXIT_SUCCESS;
}
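In keeping with the topic of this lecture, the sliding-window loop could also be packaged as a function. The following is only a sketch under stated assumptions: the names `kHydro` and `window_averages` are hypothetical (they do not appear in the homework solution), the table is extended to 26 entries so that 'Z' is a valid index, and non-letter characters are simply skipped.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Kyte-Doolittle hydrophobicity indexed by (letter - 'A'). Letters that
// are not standard amino acids (B, J, O, U, X, Z) are assumed to score 0.
const float kHydro[26] = {1.8f, 0, 2.5f, -3.5f, -3.5f, 2.8f, -0.4f, -3.2f, 4.5f,
                          0, -3.9f, 3.8f, 1.9f, -3.5f, 0, -1.6f, -3.5f, -4.5f,
                          -0.8f, -0.7f, 0, 4.2f, -0.9f, 0, -1.3f, 0};

// Returns the average hydrophobicity of every window of length `window`.
// Returns an empty vector if `window` is not positive or is longer than seq.
std::vector<float> window_averages(const std::string& seq, int window) {
    std::vector<float> result;
    const int n = static_cast<int>(seq.size());
    if (window <= 0 || window > n) return result;
    for (int i = 0; i + window <= n; i++) {
        float sum = 0;
        for (int j = i; j < i + window; j++) {
            unsigned char ch = static_cast<unsigned char>(seq[j]);
            // Skip anything that is not a letter (hypothetical policy).
            if (std::isalpha(ch)) sum += kHydro[std::toupper(ch) - 'A'];
        }
        result.push_back(sum / window);
    }
    return result;
}
```

A caller would then only need to open the files, call `window_averages(seq, window_size)`, and write each element of the result to the output file.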
Bacteriorhodopsin Sequence (short)
The protein sequence in FASTA format:
>gi|461612|sp|P33972|BACR_HALHS BACTERIORHODOPSIN (BR)
LWLGTAGMFLGMLYFIARGWGETDGRRQKFYIATILITAIAFVNYLAMALGFGLTFIEFGGEQHPIYWAR
YTDWLFTTPLLLYDLGLLAGADRNTIYSLVSLDVLMIGTGVVATLSAGSGVLSAGAERLVWWGISTAFLL
VLLYFLFSSLSGRVANLPSDTRSTFKTLRNLVTVVWLVYPVWWLVGSEGLGLVGIGIETAGFMVIDLVA
Window size = 14
Bacteriorhodopsin Sequence (long)
• Bacteriorhodopsin precursor (BR) (number P02945)
• www.ncbi.nlm.nih.gov/entrez
• FASTA format
• (Thanks to Edain Velazquez)
• MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATSD
Hydrophobicity – Bacteriorhodopsin (Window size = 10)
Lecture 4 - Plot
Workshop #11
• Workshop 11.1 Write a program to read in a PAM matrix into a 2-dimensional array. To test, print the 2-D array to stdout. Assume the 2-D array is a global array.
• Workshop 11.2 Convert the program of 11.1 into a function without the prints to stdout. Test with dummy programs that display output to stdout.
Workshop #11
• Workshop 11.3 Write a function that takes index I and index J and returns the PAM score for row I, column J. Assume the PAM matrix is a global array.
• Workshop 11.4 Test each function separately and then combine them into one program that prompts the user for 2 amino acids and returns their PAM score. Place the support functions developed in workshops 11.2 and 11.3 in a separate file from the main function. Use a header file to link them.
File I/O (C++)
• #include <fstream>
• fstream fin, fout; // fin and fout are object names
• fin.open("infilename", ios::in); // open a file to read
• int x; fin >> x; // read an integer from input file into x
• char c; fin >> c; // read a character from input file into c
• string s; fin >> s; // read a string from input file into s
• fout.open("outfilename", ios::out); // open a file to write
• fout << x; // will write x into output file
2-D Arrays
• int nums[2][3] = {{2,4,6},{-9,-7,-5}};
    nums[0][0] == 2    nums[0][1] == 4    nums[0][2] == 6
    nums[1][0] == -9   nums[1][1] == -7   nums[1][2] == -5

          [0]  [1]  [2]
    [0]    2    4    6
    [1]   -9   -7   -5
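A 2-D array is usually traversed with two nested loops, rows on the outside and columns on the inside. The sketch below uses hypothetical helper names (`row_sum`, `total`) and a hypothetical constant `COLS`; note that the column count has to be part of the parameter type when a 2-D array is passed to a function.

```cpp
const int COLS = 3;  // number of columns, fixed at compile time

// Sums the elements of one row.
int row_sum(const int m[][COLS], int row) {
    int sum = 0;
    for (int col = 0; col < COLS; col++) sum += m[row][col];
    return sum;
}

// Sums every element by visiting each row, then each column in that row.
int total(const int m[][COLS], int rows) {
    int sum = 0;
    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < COLS; col++) sum += m[row][col];
    }
    return sum;
}
```

With the `nums` array from the slide, `row_sum(nums, 0)` is 12, `row_sum(nums, 1)` is -21, and `total(nums, 2)` is -9.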
Workshop #11
• Workshop 11.1 Write a function to read in a PAM matrix into a 2-dimensional array.
  • Have to parse the input to ignore file heading information and matrix column and row headings.
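One possible way to handle the parsing is sketched below. It rests on assumptions about the file layout that the slides do not state: heading lines start with '#', the first remaining line holds the column labels, and every later line starts with a row label followed by integer scores. The names `MAX_N` and `read_matrix` are hypothetical, and the matrix is passed as a parameter here rather than kept as a global array.

```cpp
#include <istream>
#include <sstream>
#include <string>

const int MAX_N = 32;  // hypothetical upper bound on the matrix dimension

// Reads a score matrix from `in` into `matrix` and stores the column
// labels in `labels`. Returns the number of matrix rows read.
int read_matrix(std::istream& in, int matrix[MAX_N][MAX_N], std::string& labels) {
    std::string line;
    labels.clear();
    bool have_columns = false;
    int row = 0;
    while (row < MAX_N && std::getline(in, line)) {
        if (!line.empty() && line[0] == '#') continue;  // file heading
        std::istringstream fields(line);
        char first;
        if (!(fields >> first)) continue;               // blank line
        if (!have_columns) {
            // Column heading line: collect the labels, store no scores.
            labels += first;
            char label;
            while (fields >> label) labels += label;
            have_columns = true;
            continue;
        }
        // `first` is the row label; the rest of the line holds the scores.
        int col = 0;
        int score;
        while (col < MAX_N && fields >> score) matrix[row][col++] = score;
        row++;
    }
    return row;
}
```

Because the function takes a `std::istream&`, it can be tested with a short string before being pointed at a real matrix file opened with `fstream`.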
Switch Statement
int x, y;
switch (x) {
    case 0: y = 1; break;
    case 1: y = 2; break;
    case 2: y = 3;          // no break: execution falls through to default
    default: y = 4;
}
x = 0? y = 1
x = 1? y = 2
x = 2? y = 4! (case 2 sets y = 3, then falls through to default)
Else y = 4
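The fall-through behavior can be checked by wrapping the same switch in a small function. This is only an illustration; the name `y_for` is hypothetical.

```cpp
// Mirrors the switch on the slide: case 2 has no break, so control
// falls through into default and y ends up as 4.
int y_for(int x) {
    int y = 0;
    switch (x) {
        case 0: y = 1; break;
        case 1: y = 2; break;
        case 2: y = 3;  // falls through
        default: y = 4;
    }
    return y;
}
```

Adding `break;` after `y = 3;` would make `y_for(2)` return 3 instead of 4.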
Switch Statement
• Works with char and int
char c;
int y;
switch (c) {
    case 'a': y = 1; break;
    case 'b':
    case 'c': y = 2; break;
    case 'z': y = 3;
}
1-D Arrays as Look-Up Tables
int table[3] = {1, 2, 2};
char c;
int y;
// assumes c is 'a', 'b', 'c', or 'z'
if (c != 'z')
    y = table[c - 'a'];
else
    y = 3;
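To see that the look-up table and the char switch agree, both can be written as functions and compared. In this sketch the names `y_switch` and `y_table` are hypothetical, and returning -1 for any other character is an assumption added so that both versions are defined for every input.

```cpp
// Switch version, following the char example; characters that are not
// listed return -1 (an assumption made for this sketch).
int y_switch(char c) {
    switch (c) {
        case 'a': return 1;
        case 'b':
        case 'c': return 2;
        case 'z': return 3;
        default:  return -1;
    }
}

// Look-up table version, with the range check that the slide leaves
// implicit: indexing table[c - 'a'] is only valid for 'a' through 'c'.
int y_table(char c) {
    static const int table[3] = {1, 2, 2};
    if (c == 'z') return 3;
    if (c >= 'a' && c <= 'c') return table[c - 'a'];
    return -1;
}
```

The same `letter - 'a'` indexing idea is what the homework 6 solution uses for its hydrophobicity table.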
Functions
• Break program into modules or functions
• Easier to understand program
• Functions can be reused (e.g., library functions)
• Easier to develop a program step by step
• Can test each function independently
• Program execution always begins in the function "main"
Functions
<return_type> func_name (arg1_typ arg1_name, …, argN_typ argN_name)
{
    function body
}
• func_name – name of the function
  • main – every program begins execution in this function
• return_type – type of value returned by function
• Arguments
  • call-by-value – the function receives copies of its arguments, so changes made inside the function do not affect the caller's variables
• Function prototype (used in header files [*.h])
  <return_type> func_name (arg1_typ, …, argN_typ);
• Library functions – commonly used functions
  • stdlib.h, stdio.h, math.h, string.h (to name a few)
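A small illustration of a prototype, a definition, and call-by-value follows. The function name `add_one` is hypothetical and chosen only for this sketch.

```cpp
// Prototype: gives the return type and argument types, no body.
int add_one(int n);

// Definition: n is a copy of whatever value the caller passed in.
int add_one(int n) {
    n = n + 1;   // changes only the local copy
    return n;    // the result reaches the caller through the return value
}
```

After `int a = 5; int b = add_one(a);` the caller's `a` is still 5 and `b` is 6, because the function only ever modified its own copy.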
Workshop #11
• Workshop 11.2 Convert the programs of 11.1 into a function without the prints to stdout. Test with dummy programs that display output to stdout.
Projects: Separate Compilation
[Diagram] File1.c (or .cpp) … FileN.c (or .cpp)
  → Compile → Object files (*.obj)
  → Link (together with library functions, *.a) → Executable (*.exe)
Separate Compilation
• Break program into different files (can be developed by different people)
• Arrange functions logically into files
• Information is communicated between files using header files
• Projects contain all files that need to be compiled for a given executable
Header Files
• filex.cpp can export information to other files using a header file, filex.h
• usually contains
  • function prototypes
  • constant declarations
  • global variables (extern)
• user defined
  • #include "filex.h" // can specify the path if not in same directory
Header Files
• filex.cpp
    #include <iostream>
    #include "filex.h"
    using namespace std;
    int call_x(int a)
    {
        cout << a << endl;
        return a++;   // post-increment: returns the original value of a
    }
• filex.h
    int call_x(int a);
• filey.cpp
    #include <iostream>
    #include "filex.h"
    using namespace std;
    int main ()
    {
        int y;
        y = call_x(5);
        cout << y << endl;
        return 0;
    }
Workshop #11
• Workshop 11.3 Write a function that takes as input the PAM matrix, index I, and index J and returns the PAM score for row I, column J.
• Workshop 11.4 Test each function and combine them into one program that prompts the user for 2 amino acids and returns their PAM score. Place the support functions developed in workshops 11.2 and 11.3 in a separate file from the main function. Use a header file to link them.
Homework #11 – due 5/14
• Write a function to determine the score of 2 sequences aligned by the Needleman-Wunsch method using the scoring method proposed in Lecture 5. A gap in an aligned sequence will be represented by a period (".").
• Test your function with a program that reads the sequences from standard input and displays the score to standard output. You should test your program with different sequences.
• Modify your program to use PAM scoring rather than Match/Mismatch scores.
Needleman-Wunsch Method
The sequences
    abcdefghajklm
    abbdhijk
are aligned and scored like this

              a  b  c  d  e  f  g  h  a  j  k  l  m
              |  |     |           |     |  |
              a  b  b  d  .  .  .  h  i  j  k
match         4  4     4           4     4  4
mismatch            -3                -3
gap_open                  -2
gap_extend                -1 -1 -1

for a total score of 24 - 6 - 2 - 3 = 13
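The worked example above can be reproduced with a short scoring function. This is a sketch under stated assumptions, not the expected homework solution: the constants are read off the example (match 4, mismatch -3, gap open -2, gap extend -1 per gap position), columns past the end of the shorter aligned sequence are not scored, and adjacent gap columns count as one gap even if the gap switches sequences. The name `alignment_score` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Score values taken from the worked example.
const int MATCH = 4;
const int MISMATCH = -3;
const int GAP_OPEN = -2;
const int GAP_EXTEND = -1;

// Scores two already-aligned sequences column by column; '.' marks a gap.
int alignment_score(const std::string& a, const std::string& b) {
    int score = 0;
    bool in_gap = false;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; i++) {
        if (a[i] == '.' || b[i] == '.') {
            if (!in_gap) score += GAP_OPEN;  // charged once per gap
            score += GAP_EXTEND;             // charged for every gap position
            in_gap = true;
        } else {
            score += (a[i] == b[i]) ? MATCH : MISMATCH;
            in_gap = false;
        }
    }
    return score;
}
```

Calling `alignment_score("abcdefghajklm", "abbd...hijk")` gives 13, matching the total on the slide: six matches (24), two mismatches (-6), one gap opening (-2), and three gap positions (-3).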