Bioinformatics Toolbox Yaohang Li Department of Computer Science North Carolina A&T State University
Bioinformatics Toolbox • Extends MATLAB to provide an integrated software environment • Genome Analysis • Proteome Analysis • Applications • Drug Discovery • Genetic Engineering • Biological Research
Functionalities of Bioinformatics Toolbox • Data Analysis Functions • Connecting to Web-accessible databases • Reading and converting between multiple data formats • Determining statistical characteristics of data • Manipulating and aligning sequences • Modeling patterns in biological sequences using Hidden Markov Model (HMM) profiles • Reading, normalizing, and visualizing microarray data • Creating and manipulating phylogenetic tree data • Interfacing with other bioinformatics software
Functionalities of Bioinformatics Toolbox • Prototype and Develop Algorithms • Visualize Data • Sequence alignments • Gene expression data • Phylogenetic trees • Protein structure analysis • Share and Deploy Applications • Create stand-alone applications • Graphical user interfaces (GUIs)
Installation • Required Software • MATLAB • Statistics Toolbox • Additional Software • Signal Processing Toolbox • Image Processing Toolbox • Optimization Toolbox • Neural Network Toolbox • Database Toolbox • MATLAB Compiler
Data Formats and Databases • Web-based databases • GenBank (getgenbank) • GenPept (getgenpept) • European Molecular Biology Laboratory EMBL (getembl) • Protein Sequence Database PIR-PSD (getpir) • Protein Data Bank PDB (getpdb) • Raw Data • Read data generated from gene sequencing instruments • Reading/Writing Data Formats • Sequence data • Multiply Aligned Sequences • Gene Expression Data from Microarrays
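As one illustration of reading and writing sequence data, the sketch below round-trips a toy sequence through a FASTA file with `fastawrite` and `fastaread`. The header text, the sequence, and the temporary file name are made up for the example; the slides do not say which formats were demonstrated.

```matlab
% Round-trip a toy nucleotide sequence through a FASTA file.
% The header, the sequence, and the temporary file are illustrative only.
fastaFile = [tempname '.fasta'];
fastawrite(fastaFile, 'toy_sequence', 'ATGGCTGCTTAA');

% fastaread returns a structure with Header and Sequence fields.
record = fastaread(fastaFile);
disp(record.Header)
disp(record.Sequence)

% Remove the temporary file.
delete(fastaFile);
```

The database functions listed above (for example `getgenbank`) return comparable structures, but they need network access, so they are left out of this sketch.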
Sequence Analysis • Sequence Analysis • Find information about a nucleotide or amino acid sequence • Using computational methods • Tasks • Identify genes • Determine the similarity of two genes • Determine the protein coded by a gene • Determine the function of a gene by finding a similar gene in another organism with a known function • Example • Sequence Statistics • Sequence Alignment
Sequence Statistics • Task • Starting with a DNA sequence, calculate statistics for the nucleotide content • Example: Determining Nucleotide Content • Task • Studying the human mitochondrial genome • While many genes that code for mitochondrial proteins are found in the cell nucleus, the mitochondrion has its own genes that code for proteins used to produce energy • Procedure • Find the nucleotide sequence for the genome • Look at the nucleotide content for the entire sequence • Determine open reading frames and extract specific gene sequences
Determining Nucleotide Content • Step 1: • Use the MATLAB help browser to explore the NCBI website • Step 2: • Search the NCBI website for information • Step 3: • Select a result page
Getting Sequence Information into MATLAB • MATLAB provides an integrated environment for bringing sequence information into MATLAB • Get sequence information from a Web database • You can also load the sequence from a MAT file • Get information about the sequence
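A minimal sketch of these steps follows. To stay runnable offline it uses a short toy string in place of the downloaded genome; the variable name `mitochondria` follows the later slides, while the toy sequence and the temporary MAT-file are assumptions made for illustration.

```matlab
% Toy stand-in for a downloaded genome. A real session might instead call
%   mitochondria = getgenbank(accession, 'SequenceOnly', true);
% where accession is the record chosen on the NCBI result page
% (this needs network access, so it is not executed here).
mitochondria = 'GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGC';

% Save the sequence to a MAT-file and load it back, as in a later session.
matFile = [tempname '.mat'];
save(matFile, 'mitochondria');
clear mitochondria
load(matFile, 'mitochondria');

% Basic information about the sequence variable.
whos mitochondria
seqLength = length(mitochondria)

% Remove the temporary file.
delete(matFile);
```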
Determining Nucleotide Composition • Knowledge • Sections of a DNA sequence with a high percentage of A+T nucleotides usually indicate intergenic regions • Lower A+T and higher G+C percentages indicate possible genes • High CG dinucleotide content often occurs upstream of a gene • Statistics functions of the Bioinformatics Toolbox • Determine whether the sequence has the characteristics of a protein-coding region
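The quantities named on this slide can be computed for a whole sequence with `basecount` and `dimercount`. The sketch below uses a made-up 24-base sequence; on a real genome these statistics are usually examined over a sliding window (the toolbox function `ntdensity` plots such windowed densities).

```matlab
% Made-up 24-base sequence: A/T-rich ends around a CG-rich core.
seq = 'ATATTTAACGCGCGGCCGCGATAT';

% Single-nucleotide counts (structure with fields A, C, G, T).
counts = basecount(seq);
atPercent = 100 * (counts.A + counts.T) / length(seq)
gcPercent = 100 * (counts.G + counts.C) / length(seq)

% Dinucleotide counts over overlapping pairs (fields AA, AC, ..., TT).
dimers = dimercount(seq);
cgCount = dimers.CG
```

For this toy sequence each base occurs six times, so both percentages are 50, and the CG dinucleotide occurs five times.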
Determining Nucleotide Composition (II) • Count the nucleotides • basecount(mitochondria) • In the reverse complement of the sequence • basecount(seqrcomplement(mitochondria)) • Show the pie chart
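The calls on this slide can be tried on a short toy sequence (an assumption made so that the counts are easy to check by hand). Because complementing swaps A with T and C with G, the counts for the reverse complement mirror those of the forward strand.

```matlab
% Toy sequence, used here instead of the mitochondrial genome.
seq = 'AAACCGT';

% Counts on the forward strand and on its reverse complement.
fwd = basecount(seq)                    % A=3, C=2, G=1, T=1
rev = basecount(seqrcomplement(seq))    % A=1, C=1, G=2, T=3

% Pie chart of the forward-strand composition.
figure
basecount(seq, 'chart', 'pie');
```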
Determining Codon Composition • Background • Trinucleotides (codons) code for amino acids • 64 possible codons • Knowing the percentage of each codon in a sequence can be helpful when comparing with tables of expected codon usage • Bioinformatics Toolbox • Count codons in a nucleotide sequence • codoncount(mitochondria)
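A small sketch of `codoncount` on a toy 12-base sequence (chosen for illustration, not taken from the slides) shows how the reading frame changes the result.

```matlab
% Toy sequence read in frame 1 as ATG GCT GCT TAA.
seq = 'ATGGCTGCTTAA';

% Frame 1 is the default; the result has one field per codon (AAA ... TTT).
frame1 = codoncount(seq);
gctInFrame1 = frame1.GCT    % 2

% Frame 2 starts at the second base: TGG CTG CTT (trailing AA is ignored).
frame2 = codoncount(seq, 'frame', 2);
ctgInFrame2 = frame2.CTG    % 1

% A heat map of the codon counts can be requested with
%   codoncount(seq, 'figure', true);
```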
Amino Acid Conversion and Composition • Determining the relative amino acid composition • Characteristic profile for the protein • Amino acid composition • Atomic composition • Molecular weight • Convert a nucleotide sequence to an amino acid sequence
Amino Acid Conversion and Composition (cont.) • Count the amino acids in the protein sequence • aacount(ND2AASeq, 'chart', 'bar') • Determine the atomic composition and molecular weight of the protein
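The conversion and composition steps can be sketched on a toy gene. The sequence below is invented and is translated with the standard genetic code; for a mitochondrial gene such as the ND2 sequence named on the slide, `nt2aa` would normally be called with `'GeneticCode', 'Vertebrate Mitochondrial'`.

```matlab
% Toy coding sequence: ATG GCT GCT TGG TAA.
ntSeq = 'ATGGCTGCTTGGTAA';

% Translate with the standard genetic code; the stop codon becomes '*'.
aaSeq = nt2aa(ntSeq)                    % 'MAAW*'

% Drop the stop symbol before computing protein properties.
proteinSeq = aaSeq(aaSeq ~= '*');

% Amino acid counts, with the bar chart used on the slide.
figure
aaCounts = aacount(proteinSeq, 'chart', 'bar');

% Atomic composition (fields C, H, N, O, S) and molecular weight.
atoms = atomiccomp(proteinSeq)
weight = molweight(proteinSeq)
```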
Sequence Alignment • Task • Determine the similarity between two sequences • Example • Starting with a DNA sequence for a human gene, locate and verify a corresponding gene in a model organism
Comparing Amino Acid Sequences • Convert the DNA sequences to amino acid sequences • Draw a dot plot comparing the human and mouse amino acid sequences
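A dot plot can be drawn with `seqdotplot`. The two peptides below are invented stand-ins for the human and mouse proteins, and the window size (4) and required matches (3) are arbitrary choices for the sketch.

```matlab
% Invented peptides that differ at three positions (not real human/mouse data).
seqA = 'MKTAYIAKQRQISFVKSHFSRQ';
seqB = 'MKTAYLAKQRNISFVKAHFSRQ';

% Place a dot wherever at least 3 of 4 residues in a window match.
% The optional output is the number of dots in the plot.
figure
matches = seqdotplot(seqA, seqB, 4, 3)
xlabel('seqA'); ylabel('seqB');
```

Closely related sequences show up as a strong main diagonal; insertions or deletions shift the diagonal.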
Global Alignment • Align two amino acid sequences • Using the Needleman-Wunsch algorithm
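In the toolbox, Needleman-Wunsch global alignment is available as `nwalign`. The sketch below aligns two invented peptides with the default settings (BLOSUM50 scoring for amino acids); the sequences are not the ones used in the original slides.

```matlab
% Invented peptides that differ at three positions.
seqA = 'MKTAYIAKQRQISFVKSHFSRQ';
seqB = 'MKTAYLAKQRNISFVKAHFSRQ';

% Global alignment: score plus a 3-row character array
% (row 1: seqA with gaps, row 2: match symbols, row 3: seqB with gaps).
[score, alignment] = nwalign(seqA, seqB);

score
disp(alignment)
```

Other scoring matrices and gap penalties can be selected through the `'ScoringMatrix'` and `'GapOpen'` options of `nwalign`.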
DNA Microarray Data Analysis • DNA Microarray • A parallel snapshot of gene activities • Simultaneously measure the activity and interactions of genes • Insights into mechanisms of living systems • Scientific Tasks • Identification of coexpressed genes • Discovery of sample or gene groups with similar expression patterns • Identification of genes whose expression patterns are highly differentiating with respect to a set of discerned biological entities • Study of gene activity patterns under various stress conditions
Microarray Analysis • Microarray Data • Research the function of cells • Compare the differences between healthy and diseased tissue • Observe changes with the application of drugs • Example • Visualizing Microarray Data • Analyzing Gene Expression Profiles
Statistics of Microarray • Look at the distribution of data in each of the blocks
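One way to look at the per-block distribution is a box plot of a single measurement column. The sketch assumes the sample GenePix file `mouse_a1pd.gpr` that ships with the toolbox is on the MATLAB path and that it contains a column named `'F635 Median'`; the slides do not name the data set they used.

```matlab
% Read a GenePix Results (GPR) file into a structure.
% Assumption: the toolbox sample file mouse_a1pd.gpr is on the path.
pd = gprread('mouse_a1pd.gpr');

% Box plot of the median 635 nm foreground intensity, one box per block.
figure
maboxplot(pd, 'F635 Median');

% Pseudocolor image of the same column, laid out as on the slide.
figure
maimage(pd, 'F635 Median');
```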
Other Functions • Phylogenetic Tree Tool • Protein Structure Analysis • Data Visualization