Bioinformatics: BioPerl

Dr. Giuseppe Pigola – pigola@dmi.unict.it

Useful Links
• http://www.bioperl.org
• Installing with the Perl Package Manager (PPM) tool:
  • http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
• Other packages:
  • http://biojava.org
  • http://biopython.org
  • http://www.biophp.org

BioPerl
• BioPerl is a collection of Perl modules that make it easier to write scripts for bioinformatics applications;
• Perl is an excellent language for text manipulation, and most biological data (sequences, database records, program reports) is plain text, so Perl is very effective in bioinformatics applications;
• BioPerl is object-oriented.

BioPerl Namespaces
• Bio::Seq: sequence object (DNA, RNA, protein);
• Bio::SeqIO: reading and writing sequences (in many formats);
• Bio::SeqFeature: features (gene, exon, promoter, etc.);
• Bio::Annotation: used to store links to DBs, literature references and comments;
• Bio::AlignIO;
• Bio::SimpleAlign;
• Bio::DB;
• Bio::SearchIO;
• …

Manipulating Sequences
• Create a sequence object with the given attributes:

    use strict;
    use Bio::Seq;
    my $seq = Bio::Seq->new('-seq'              => 'actgtggcgtcaact',
                            '-desc'             => 'Sample Bio::Seq object',
                            '-display_id'       => 'something',
                            '-accession_number' => 'accnum',
                            '-alphabet'         => 'dna');  # older BioPerl releases used -moltype
    $seq->display_id();            # common name
    $seq->seq();                   # the sequence as a string
    $seq->length();
    $seq->subseq(5, 10);           # returns a string (1-based, inclusive coordinates)
    $seq->accession_number();
    $seq->alphabet();              # 'dna', 'rna' or 'protein' (formerly moltype())
    $seq->primary_id();            # independent of the IDs used by the various DBs
    $seq->trunc(5, 10);            # subsequence (new object)
    $seq->revcom();                # reverse complement (new object)
    $seq->translate();             # translation of the sequence (new object)
    $seq->translate('*', 'X', 0);  # stop-codon symbol, symbol for unknown aa, frame (0, 1 or 2)
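To make the coordinate conventions concrete, here is a plain-Perl sketch (BioPerl is not needed) of what subseq() and revcom() compute on the example string: subseq() counts positions from 1 and includes both ends, while revcom() reverses the sequence and complements each base. The helpers subseq_1based and revcom_string are hypothetical names used only for illustration, and this simplified reverse complement leaves ambiguity codes untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1-based, inclusive coordinates, like $seq->subseq($start, $end)
sub subseq_1based {
    my ($str, $start, $end) = @_;
    return substr($str, $start - 1, $end - $start + 1);
}

# Hypothetical helper: reverse complement of a DNA string.
# Simplification: only A, C, G, T are complemented; other letters are kept as-is.
sub revcom_string {
    my ($str) = @_;
    my $rc = reverse $str;           # scalar context: reverses the characters
    $rc =~ tr/acgtACGT/tgcaTGCA/;    # complement each base, preserving case
    return $rc;
}

my $dna = 'actgtggcgtcaact';
print subseq_1based($dna, 5, 10), "\n";   # tggcgt
print revcom_string($dna), "\n";          # agttgacgccacagt
```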

Simple Statistics
• Statistics on a sequence:

    use strict;
    use Bio::Seq;
    use Bio::Tools::SeqStats;
    my $seq = Bio::Seq->new('-seq'              => 'actgtggcgtcaact',
                            '-desc'             => 'Sample Bio::Seq object',
                            '-display_id'       => 'something',
                            '-accession_number' => 'accnum',
                            '-alphabet'         => 'dna');
    my $seq_stats   = Bio::Tools::SeqStats->new('-seq' => $seq);
    my $weight      = $seq_stats->get_mol_wt();      # lower and upper bound (array ref)
    my $monomer_ref = $seq_stats->count_monomers();  # count of each residue (hash ref)
    my $codon_ref   = $seq_stats->count_codons();    # nucleic acid sequences only (hash ref)
    print "Molecular weight: $weight->[0] - $weight->[1]\n";
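count_monomers() and count_codons() both return a hash reference of counts. The plain-Perl sketch below (hypothetical helpers count_monomers_plain and count_codons_plain, no BioPerl required) tallies the example sequence in the same spirit, reading codons in frame 1 and ignoring an incomplete trailing codon; it does not attempt to reproduce BioPerl's exact handling of ambiguous residues.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count each residue letter (case-insensitive)
sub count_monomers_plain {
    my ($str) = @_;
    my %count;
    $count{$_}++ for split //, uc $str;
    return \%count;
}

# Hypothetical helper: count codons in frame 1; a trailing partial codon is ignored
sub count_codons_plain {
    my ($str) = @_;
    my %count;
    my $s = uc $str;
    for (my $i = 0; $i + 3 <= length $s; $i += 3) {
        $count{ substr($s, $i, 3) }++;
    }
    return \%count;
}

my $dna = 'actgtggcgtcaact';
my $mono = count_monomers_plain($dna);
print join(' ', map { "$_=$mono->{$_}" } sort keys %$mono), "\n";
# A=3 C=4 G=4 T=4
my $codons = count_codons_plain($dna);
print join(' ', map { "$_=$codons->{$_}" } sort keys %$codons), "\n";
# ACT=2 GCG=1 GTG=1 TCA=1
```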

Local BLAST
• Search the 'ecoli.nt' DB for similar sequences:

    use strict;
    use Bio::Seq;
    use Bio::Tools::Run::StandAloneBlast;
    # requires a local BLAST installation and a formatted ecoli.nt database
    my @params  = ('program' => 'blastn', 'database' => 'ecoli.nt');
    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    my $input   = Bio::Seq->new('-id' => 'test query', '-seq' => 'ACTAAGTGGGGG');
    my $blast_report = $factory->blastall($input);

Smith-Waterman or Blast2Seq
• Requires bioperl-ext (for pSW) and the NCBI BLAST programs (for bl2seq):

    use strict;
    use Bio::Seq;
    use Bio::AlignIO;
    use Bio::Tools::pSW;
    use Bio::Tools::Run::StandAloneBlast;
    my $seq1 = Bio::Seq->new('-seq' => 'actgtggcgtcaact', '-desc' => 'Sample Bio::Seq object',
                             '-display_id' => 'something', '-accession_number' => 'accnum',
                             '-alphabet' => 'dna');
    my $seq2 = Bio::Seq->new('-seq' => 'actgtggcgtcaact', '-desc' => 'Sample Bio::Seq object',
                             '-display_id' => 'something', '-accession_number' => 'accnum',
                             '-alphabet' => 'dna');
    # pSW is intended for protein sequences (note the BLOSUM62 matrix)
    my $factory1 = Bio::Tools::pSW->new('-matrix' => 'blosum62.bla', '-gap' => 12, '-ext' => 2);
    $factory1->align_and_show($seq1, $seq2, \*STDOUT);      # align and print
    my $aln = $factory1->pairwise_alignment($seq1, $seq2);  # align and return an object
    my $factory2 = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
    my $bl2seq_report = $factory2->bl2seq($seq1, $seq2);
    # Use AlignIO.pm to create a SimpleAlign object from the blast2seq report
    my $str  = Bio::AlignIO->new('-file' => 'bl2seq.out', '-format' => 'bl2seq');
    my $aln2 = $str->next_aln();
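pSW is based on the Smith-Waterman local alignment algorithm. Its core recurrence can be sketched in plain Perl as below, with two stated simplifications: a single linear gap penalty instead of pSW's separate gap-open (-gap) and gap-extension (-ext) penalties, and a hypothetical match/mismatch score (+2/-1) instead of a substitution matrix such as BLOSUM62. The helper sw_score is a hypothetical name, and only the best local score is computed, without traceback.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: Smith-Waterman local alignment score, linear gap penalty
sub sw_score {
    my ($s1, $s2, %opt) = @_;
    my $match    = $opt{match}    // 2;    # assumed scoring scheme
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -2;

    my @a = split //, uc $s1;
    my @b = split //, uc $s2;
    my @prev = (0) x (@b + 1);             # previous row of the DP matrix
    my $best = 0;

    for my $i (1 .. @a) {
        my @cur = (0);                     # first column is always 0
        for my $j (1 .. @b) {
            my $diag = $prev[$j - 1] + ($a[$i - 1] eq $b[$j - 1] ? $match : $mismatch);
            my $up   = $prev[$j] + $gap;       # gap in the second sequence
            my $left = $cur[$j - 1] + $gap;    # gap in the first sequence
            $cur[$j] = max(0, $diag, $up, $left);
            $best = $cur[$j] if $cur[$j] > $best;
        }
        @prev = @cur;
    }
    return $best;
}

print sw_score('actgtggcgtcaact', 'actgtggcgtcaact'), "\n";  # 30
print sw_score('ggacgtcc', 'acgt'), "\n";                    # 8
```

The 0 in each cell's maximum is what makes the alignment local: a prefix whose score drops below zero is simply discarded and a new alignment starts.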

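ClustalW – TCoffee
• Requires the ClustalW program and the BioPerl wrappers (bioperl-run):

    use strict;
    use Bio::SeqIO;
    use Bio::Tools::Run::Alignment::Clustalw;
    my @params  = ('ktuple' => 2, 'matrix' => 'BLOSUM');
    my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
    my $ktuple  = 3;
    $factory->ktuple($ktuple);   # change a parameter before execution
    # @seq_array is an array of sequence objects, here read from a FASTA file
    my $in = Bio::SeqIO->new('-file' => 'inputfilename', '-format' => 'fasta');
    my @seq_array;
    while (my $s = $in->next_seq()) { push @seq_array, $s; }
    my $seq_array_ref = \@seq_array;
    my $aln = $factory->align($seq_array_ref);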

GenScan
• Requires the Genscan program to produce the report; Bio::Tools::Genscan parses it:

    use strict;
    use Bio::Tools::Genscan;
    my $genscan = Bio::Tools::Genscan->new(-file => 'result.genscan');
    # $gene is an instance of Bio::Tools::Prediction::Gene
    # $gene->exons() returns an array of Bio::Tools::Prediction::Exon objects
    while (my $gene = $genscan->next_prediction()) {
        my @exon_arr = $gene->exons();
        printf "%d exons\n", scalar @exon_arr;
    }
    $genscan->close();

Example: Converting a Sequence Format
• Reads a sequence in FASTA format from a file and writes it to another file in EMBL format;
• Formats: Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw (plain sequence);

    use strict;
    use Bio::SeqIO;
    my $in  = Bio::SeqIO->new('-file' => 'inputfilename',   '-format' => 'Fasta');
    my $out = Bio::SeqIO->new('-file' => '>outputfilename', '-format' => 'EMBL');
    while (my $seq = $in->next_seq()) {
        $out->write_seq($seq);
    }
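For FASTA input, next_seq() essentially splits the file at each '>' header line: the first word of the header becomes the identifier, the rest the description, and the following lines are joined into the sequence. A toy reader in plain Perl (hypothetical read_fasta helper, for illustration only; Bio::SeqIO also validates input and supports many other formats) shows the idea, reading from an in-memory filehandle:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy FASTA reader returning a list of {id, desc, seq} hashes
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ($line =~ /^>(\S+)\s*(.*)$/) {         # '>id optional description'
            push @records, { id => $1, desc => $2, seq => '' };
        } elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;      # drop embedded whitespace
            $records[-1]{seq} .= $chunk;          # append to the current record
        }
    }
    return @records;
}

my $text = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    printf "%s\t%d\t%s\n", $rec->{id}, length $rec->{seq}, $rec->{seq};
}
close $fh;
```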

Example: Converting an Alignment Format
• Reads an alignment in FASTA format from a file and writes it to another file in PFAM format:

    use strict;
    use Bio::AlignIO;
    my $in  = Bio::AlignIO->new('-file' => 'inputfilename',   '-format' => 'fasta');
    my $out = Bio::AlignIO->new('-file' => '>outputfilename', '-format' => 'pfam');
    while (my $aln = $in->next_aln()) {
        $out->write_aln($aln);
    }
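Once an alignment is loaded, a typical first check is how conserved it is. The plain-Perl sketch below, with a hypothetical column_identity helper, computes the percentage of alignment columns in which all sequences carry the same residue; columns containing a gap ('-' or '.') never count as identical. This is one common definition of identity and is not claimed to match the formula used by Bio::SimpleAlign.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent of fully identical, gap-free columns
sub column_identity {
    my @seqs = map { uc($_) } @_;
    my $len  = length $seqs[0];
    die "sequences must be aligned (equal length)\n"
        if grep { length($_) != $len } @seqs;
    my $identical = 0;
    for my $col (0 .. $len - 1) {
        my %res;
        $res{ substr($_, $col, 1) } = 1 for @seqs;   # distinct residues in this column
        my ($r) = keys %res;
        $identical++ if keys(%res) == 1 && $r !~ /[-.]/;
    }
    return $len ? 100 * $identical / $len : 0;
}

printf "%.1f\n", column_identity('ACGT-A', 'ACGTTA', 'ACCT-A');   # 66.7
```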

Example: Accessing a DB (1)
• Looks up the sequence ROA1_HUMAN in the GenBank DB and prints its accession number, description and sequence (in FASTA format);
• Output formats (via Bio::SeqIO): Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw (plain sequence);

    #!/usr/bin/perl
    use strict;
    use Bio::DB::GenBank;
    use Bio::Seq;
    use Bio::SeqIO;
    my $database = Bio::DB::GenBank->new();
    my $seq = $database->get_Seq_by_id('ROA1_HUMAN');
    print "Seq: ", $seq->accession_number(), " -- ", $seq->desc(), "\n\n";
    my $out = Bio::SeqIO->newFh(-fh => \*STDOUT, -format => 'fasta');  # tied filehandle
    print $out $seq;

Example: Accessing a DB (2)
• Retrieves the sequence ROA1_HUMAN from the GenBank DB with the simplified Bio::Perl interface and writes it in FASTA format to the file roa1.fasta.txt:

    #!/usr/bin/perl
    use strict;
    use Bio::Perl;
    my $seq_object = get_sequence('genbank', 'ROA1_HUMAN');
    write_sequence('>roa1.fasta.txt', 'fasta', $seq_object);

Example: Accessing a DB (3)
• Retrieves the sequence AB077698 from the GenPept DB and prints it to STDOUT:

    #!/usr/bin/perl -w
    use strict;
    use Bio::DB::GenPept;
    use Bio::SeqIO;
    my $db  = Bio::DB::GenPept->new();
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    my $acc = 'AB077698';
    my $seq = $db->get_Seq_by_acc($acc);
    if ($seq) {
        $out->write_seq($seq);
    } else {
        print STDERR "cannot find seq for acc $acc\n";
    }
    $out->close();

Example: Accessing a DB (4)
• Queries the NCBI Taxonomy DB (requires XML::Twig):

    #!/usr/bin/perl -w
    use strict;
    use Bio::DB::Taxonomy;
    my $db = Bio::DB::Taxonomy->new(-source => 'entrez');
    my $node1 = $db->get_Taxonomy_Node(-taxonid => '9606');        # lookup by taxon ID
    my $node2 = $db->get_Taxonomy_Node(-name => 'Homo sapiens');   # lookup by name
    my $pnode    = $node1->get_Parent_Node();
    my $parentid = $node1->parent_id;
    my @class    = $node1->classification;
    my $name     = $node1->name;
    my $sciname  = $node1->scientific_name;
