Bio::Structure
Why
• Ad hoc scripts
• Code is not reusable
• Nothing available for structure work in Bioperl (Biopython, Biojava)
• It’s fun
Protein Data Bank (PDB)
• ~18000 structures
• column-based format
• entries created by the user, validated on submission
• multiple versions of the PDB format
• not all entries are in the same format
• a lot of information squeezed into the format
• entry / model / chain / residue / atom
• ‘lingua franca’
• mmCIF (successor, but who uses it?)
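To make "column-based format" concrete, here is a minimal sketch of slicing an ATOM record by column position. It is not how Bio::Structure::IO parses files. The helper name parse_atom_line and the sample values are assumptions made up for illustration, and only the column ranges follow the published PDB format.

```perl
use strict;
use warnings;

# Build an illustrative ATOM record with sprintf so every field lands in its
# fixed column. The values are made up for this sketch, not from a PDB entry.
my $line = sprintf("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f",
                   "ATOM", 1, " SG ", " ", "CYS", "A", 5, " ",
                   1.5, -2.25, 10.0, 1.0, 20.0);

# parse_atom_line is a hypothetical helper: it cuts an ATOM/HETATM record
# into fields using 0-based offsets for the 1-based PDB columns.
sub parse_atom_line {
    my ($rec) = @_;
    return {
        record  => substr($rec,  0, 6),      # columns  1-6
        serial  => substr($rec,  6, 5) + 0,  # columns  7-11
        name    => substr($rec, 12, 4),      # columns 13-16, whitespace kept
        altloc  => substr($rec, 16, 1),      # column  17
        resname => substr($rec, 17, 3),      # columns 18-20
        chain   => substr($rec, 21, 1),      # column  22
        resseq  => substr($rec, 22, 4) + 0,  # columns 23-26
        x       => substr($rec, 30, 8) + 0,  # columns 31-38
        y       => substr($rec, 38, 8) + 0,  # columns 39-46
        z       => substr($rec, 46, 8) + 0,  # columns 47-54
    };
}

my $atom = parse_atom_line($line);
printf("%s-%d '%s' at (%.3f, %.3f, %.3f)\n",
       $atom->{resname}, $atom->{resseq}, $atom->{name},
       $atom->{x}, $atom->{y}, $atom->{z});
```

Because fields are identified by position rather than by a separator, a record that deviates from the expected layout silently yields wrong values, which is one reason the multiple format versions matter.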
Example code

    use Bio::Structure::IO;

    my $structio = Bio::Structure::IO->new(-file   => $bpti_file,
                                           -format => 'pdb');
    # read the structure
    my $struc = $structio->next_structure;

    # loop over whole structure and store CYS SG atoms
    my @sgatoms;
    for my $res ($struc->residue) {
        # we only look at CYS residues
        next unless ($res->id =~ /^CYS/);
        # and we only take SG atoms
        for my $atom ($struc->get_atoms($res)) {
            next unless ($atom->id eq "SG");
            push @sgatoms, $atom;
        }
    }
    #
    # loop over all SG atoms and calculate the distance between them
    #
    [ … ]
    # get $atom1 and $atom2
    my $dist = calculate_distance($atom1, $atom2);
    printf("%-6s %s - %-6s %s %-.2f\n",
           $struc->parent($atom1)->id, $atom1->id,
           $struc->parent($atom2)->id, $atom2->id, $dist);
Example code (cont)

    # now have a look at what the annotation was in the PDB file
    print "\nThe annotation in the PDB file\n";
    my ($ann) = $struc->annotation->get_Annotations("ssbond");
    my $txt = $ann->as_text;
    # this text starts with "Value: "
    $txt =~ s/^Value: //;
    # it contains lines of 65 chars each
    for (my $t = 0; $t < length($txt); $t += 65) {
        my $line = substr($txt, $t, 65);
        print "$line\n";
    }

Complete code can be found in the bioperl distribution in examples/structure/struct_example2.pl
Output

    The measured SG-SG distances are
    CYS-5 SG - CYS-14 SG 24.10
    CYS-5 SG - CYS-30 SG 8.36
    …
    CYS-5 SG - CYS-55 SG 2.00
    …
    CYS-14 SG - CYS-38 SG 2.05
    …
    CYS-30 SG - CYS-51 SG 2.05

    The annotation in the PDB file
    1 CYS 5 CYS 55
    2 CYS 14 CYS 38
    3 CYS 30 CYS 51
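The slides call calculate_distance without showing its body. As a rough sketch of what such a helper might compute, the following takes two coordinate triples and returns their Euclidean distance. The name distance_xyz and the coordinates are assumptions for illustration, not the code from struct_example2.pl.

```perl
use strict;
use warnings;

# distance_xyz is a hypothetical helper, not the calculate_distance routine
# from the slides: it takes two array refs [x, y, z] and returns the
# Euclidean distance between the two points.
sub distance_xyz {
    my ($p, $q) = @_;
    my $sum = 0;
    for my $i (0 .. 2) {
        my $d = $p->[$i] - $q->[$i];
        $sum += $d * $d;          # accumulate squared differences
    }
    return sqrt($sum);
}

# Made-up coordinates forming a 3-4-5 right triangle.
printf("%.2f\n", distance_xyz([0, 0, 0], [3, 4, 0]));   # prints 5.00
```

With real atom objects the coordinates would come from the object itself. Bio::Structure::Atom is assumed here to expose them (for example through an xyz method), so the call might look like distance_xyz([$atom1->xyz], [$atom2->xyz]). SG-SG distances around 2.0 to 2.05 in the output are the pairs that match the three SSBOND annotations.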
What’s not perfect – the future
• " CA " vs. "CA  " (*)
• MTRIXn
• segID (*)
• altloc (biopython)
• Writing ‘old format’ PDB files
• BOF ideas

(*) fixed in CVS
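The first item on this slide concerns whitespace in the four-character atom name field. In the PDB format the element symbol is right-justified in the first two columns of that field, so an alpha carbon and a calcium ion differ only in where the blanks sit. The sketch below shows why stripping whitespace loses that distinction. guess_element is a hypothetical helper using a simplified heuristic that ignores hydrogens and other exceptions, and it does not describe how Bio::Structure handles the field.

```perl
use strict;
use warnings;

# guess_element is a hypothetical helper with a deliberately simplified rule:
# a leading blank means a one-letter element in the second column, otherwise
# the first two columns are taken as a two-letter element symbol.
sub guess_element {
    my ($name) = @_;              # raw 4-character atom name field
    return substr($name, 1, 1) if substr($name, 0, 1) eq ' ';
    return substr($name, 0, 2);
}

for my $name (" CA ", "CA  ") {
    (my $stripped = $name) =~ s/\s+//g;   # what naive trimming would keep
    printf("raw '%s' -> stripped '%s', element guess '%s'\n",
           $name, $stripped, guess_element($name));
}
```

Both names collapse to the same stripped string, while the raw fields give different element guesses (C for the alpha carbon, CA for calcium).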
Thanks
• Bioperl mailing list for help in designing objects
• Yan-Yuan Tseng
• Ethan Merritt
• Joe Krahn
• AlgoNomics
kris.boulez@algonomics.com
AlgoNomics NV
Technologiepark 4
B-9052 Gent-Belgium
http://www.algonomics.com/