CSE-700 Parallel Programming
Assignment 6
박성우
POSTECH
Oct 19, 2007

Species and Sequences
A species contains many sequences: Sequence 1, Sequence 2, ..., Sequence n.

Ortholog
In the last common ancestor, sequence S splits by speciation into S1 in Human and S2 in Dog. S1 and S2 are orthologs.

Paralog
Within Human, sequence S splits by duplication into S1 and S1'. S1 and S1' are paralogs.

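Inparalog
In the last common ancestor, S splits by speciation into S1 in Human and S2 in Chimpanzee. Afterwards, S1 is duplicated within Human, giving S1 and S1'. S1 and S1' are inparalogs: paralogs created by a duplication after the speciation.
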
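Paralog - Outparalog
In the last common ancestor (LCA), S is duplicated into S and S'. Speciation then gives S1 and S1' in Human, and S2 and S2' in Dog. S1 and S2 are orthologs, as are S1' and S2'. Because the duplication happened before the speciation, S1 and S1' are outparalogs.
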
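The ortholog, paralog, inparalog and outparalog slides all classify a pair of sequences by the event at their last common ancestor. The Python sketch below encodes that rule on small gene trees. `Node`, `lca` and `relation` are hypothetical names, and the in/out test assumes no gene loss: two paralogs of one species count as inparalogs when the duplication subtree holds no sequence of the species they are compared against.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based hashing, so nodes can go in a set
class Node:
    """Hypothetical gene-tree node. Leaves carry a sequence name and its
    species; internal nodes carry the event "speciation" or "duplication"."""
    name: str = ""
    species: str = ""
    event: Optional[str] = None
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def lca(a: Node, b: Node) -> Node:
    """Last common ancestor: the first ancestor of b that is also above a."""
    above_a = set()
    while a is not None:
        above_a.add(a)
        a = a.parent
    while b is not None:
        if b in above_a:
            return b
        b = b.parent
    raise ValueError("nodes are not in the same tree")


def relation(a: Node, b: Node, other_species: str) -> str:
    """Classify two sequences by the event at their LCA.

    For two paralogs of one species, other_species is the species they are
    compared against. If the duplication subtree holds no sequence of that
    species, the duplication came after the speciation (inparalogs);
    otherwise it came before (outparalogs). Assumes no gene loss.
    """
    anc = lca(a, b)
    if anc.event == "speciation":
        return "ortholog"
    if a.species != b.species:
        # A duplication that separates two species must predate their split.
        return "outparalog"
    below = {leaf.species for leaf in anc.leaves()}
    return "outparalog" if other_species in below else "inparalog"


# Inparalog slide: speciation first, then S1 duplicates within Human.
root = Node(event="speciation")
dup = root.add(Node(event="duplication"))
s1, s1p = dup.add(Node("S1", "Human")), dup.add(Node("S1'", "Human"))
s2 = root.add(Node("S2", "Chimpanzee"))
print(relation(s1, s2, "Chimpanzee"))   # ortholog
print(relation(s1, s1p, "Chimpanzee"))  # inparalog

# Outparalog slide: S duplicates into S and S' in the LCA, then speciation.
top = Node(event="duplication")
left, right = top.add(Node(event="speciation")), top.add(Node(event="speciation"))
h1, d2 = left.add(Node("S1", "Human")), left.add(Node("S2", "Dog"))
h1p, d2p = right.add(Node("S1'", "Human")), right.add(Node("S2'", "Dog"))
print(relation(h1, d2, "Dog"))    # ortholog
print(relation(h1, h1p, "Dog"))   # outparalog
print(relation(h1, d2p, "Dog"))   # outparalog
```

Both inparalogs and outparalogs are paralogs in the sense of the Paralog slide; the extra `other_species` argument only decides on which side of the speciation the duplication falls.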
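Coortholog
S1 and S1' are inparalogs in Species A, and S2 and S2' are inparalogs in Species B. Each sequence of one group is an ortholog (co-ortholog) of every sequence of the other group.
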
Input
• Assume a total of n species S1, S2, ..., Sn.
• For each pair of distinct species {Si, Sj}, a file gives:
  • the ortholog and paralog relations between Si and Sj.
• Thus there are n(n - 1)/2 ortholog/paralog files.

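The number of input files follows from enumerating unordered species pairs. The small Python check below uses placeholder species names and a hypothetical helper `species_pairs`. With pairs of distinct species {Si, Sj} (i < j) there are n(n - 1)/2 inputs; the count n(n + 1)/2 arises only if self-pairs {Si, Si} are counted too.

```python
from itertools import combinations, combinations_with_replacement


def species_pairs(species):
    """Unordered pairs {Si, Sj} of distinct species, one input file each."""
    return list(combinations(species, 2))


species = ["S1", "S2", "S3", "S4"]  # n = 4, placeholder names
n = len(species)
print(len(species_pairs(species)), n * (n - 1) // 2)  # 6 6

# Also counting self-pairs {Si, Si} gives n(n + 1)/2 instead.
with_self = list(combinations_with_replacement(species, 2))
print(len(with_self), n * (n + 1) // 2)  # 10 10
```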
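Seed Ortholog
A cluster for Species A and Species B is built around a pair of seed orthologs, Si in Species A and Sj in Species B, linked with score 1.0.
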
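Invariant: No Two Seed Orthologs for Any Sequence
Between Species A and Species B, a sequence has at most one seed ortholog. For example, Si cannot be linked with score 1.0 to both Sj and Sk.
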
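The invariant can be checked mechanically. The Python sketch below assumes that one species-pair file has already been reduced to a list of (sequence, sequence) seed-ortholog pairs and that sequence names are unique; `seed_invariant_holds` is a hypothetical helper, not part of the provided modules.

```python
from collections import defaultdict


def seed_invariant_holds(seed_pairs):
    """Return True if no sequence has two different seed orthologs.

    seed_pairs: (seq_a, seq_b) pairs with score 1.0, all taken from the
    file for ONE pair of species (a sequence may still have one seed
    ortholog in each other species).
    """
    partners = defaultdict(set)
    for a, b in seed_pairs:
        partners[a].add(b)
        partners[b].add(a)
    return all(len(p) <= 1 for p in partners.values())


print(seed_invariant_holds([("Si", "Sj")]))                # True
print(seed_invariant_holds([("Si", "Sj"), ("Si", "Sk")]))  # False
```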
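Orthologs and Paralogs
Besides the seed orthologs Si (Species A) and Sj (Species B), linked with score 1.0, a cluster may also contain paralogs such as Si' in Species A.
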
Output
• Assume a total of n species S1, S2, ..., Sn.
• Report ortholog and paralog relations among all these species.
• In each cluster:
  • the seed orthologs from each pair of species;
  • paralogs may also be included.

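The assignment leaves the clustering algorithm open. One simple sequential starting point, shown only as an illustration and not as the intended solution, treats every seed-ortholog pair from the pairwise files as an edge and merges connected sequences with union-find. The sequence names and `cluster_seeds` are hypothetical. Plain connected components can chain unrelated groups together, so a real solution would need extra checks against clusters like the Bad Clusters examples, and paralogs would still have to be attached afterwards.

```python
def find(parent, x):
    """Root of x's set, shortening the path as it goes (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_seeds(seed_pairs):
    """Group sequences into connected components of seed-ortholog edges."""
    parent = {}
    for a, b in seed_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # union the two components
    groups = {}
    for x in parent:
        groups.setdefault(find(parent, x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Seed pairs from the files for {A, B}, {B, C} and {A, C} (made-up names).
pairs = [("A1", "B1"), ("B1", "C1"), ("A1", "C1"), ("A2", "B2")]
print(cluster_seeds(pairs))  # [['A1', 'B1', 'C1'], ['A2', 'B2']]
```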
Example of Cluster [1]
(Figure: a cluster spanning species A, B, C and D, with sequences S1, S1', S2, S2', S3, S3', S4 and S4'.)

Example of Cluster [2]
(Figure: a cluster spanning species A, B, C and D, with sequences S1, S1', S2, S2', S3, S3', S4 and S4'.)

Bad Clusters [1]
(Figure: species A, B, C, D and E, with sequences S1, S1', S2, S2', S3, S3', S4, S4', S5 and S5'.)

Bad Clusters [2]
(Figure: species C, D and E, with sequences S3, S4, S4', S4'', S5, S6 and S6'.)

Input File Format
• Each line consists of:
  • Cluster number
  • Similarity score
  • Species name
  • Seed ortholog
  • Sequence name

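A parser is supplied with the assignment. Purely to illustrate the five-field layout, here is a hypothetical Python reader. It assumes whitespace-separated fields in the order listed, and it keeps the seed-ortholog field as raw text because its exact encoding is not stated; `Record` and `parse_line` are illustrative names, and the example line is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One input line; the field names and types here are assumptions."""
    cluster: int    # cluster number
    score: float    # similarity score
    species: str    # species name
    seed: str       # seed-ortholog field, kept as raw text (encoding unknown)
    sequence: str   # sequence name


def parse_line(line: str) -> Record:
    """Split a whitespace-separated line into the five fields, in slide order."""
    cluster, score, species, seed, sequence = line.split()
    return Record(int(cluster), float(score), species, seed, sequence)


rec = parse_line("1 100 human 1.0 seq42")  # made-up example line
print(rec.cluster, rec.species, rec.sequence)  # 1 human seq42
```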
Goal
• Implement ANY sequential algorithm.
  • There is no definitive answer.
• Then parallelize it.
• A parser and an output module are provided:
  • no string comparison
  • all integer operations
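The provided modules avoid string comparison by working only with integers. One common way to get there, shown as a hypothetical sketch rather than the provided parser's actual method, is to intern each species or sequence name to a dense integer ID once at load time. After that, clustering code compares and indexes plain integers.

```python
class Interner:
    """Map each distinct name to a dense integer ID (0, 1, 2, ...)."""

    def __init__(self):
        self.ids = {}    # name -> ID
        self.names = []  # ID -> name, for writing output

    def intern(self, name: str) -> int:
        idx = self.ids.get(name)
        if idx is None:
            idx = len(self.names)
            self.ids[name] = idx
            self.names.append(name)
        return idx

    def name(self, idx: int) -> str:
        return self.names[idx]


seqs = Interner()
a, b, a2 = seqs.intern("S1"), seqs.intern("S2"), seqs.intern("S1")
print(a, b, a2, a == a2, seqs.name(b))  # 0 1 0 True S2
```

Dense IDs also make it easy to replace dictionaries with flat arrays indexed by sequence ID, which tends to suit a later parallel version.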