Sajid Khan. Chapter 5. Data Visualization.
Deoxy Human Hemoglobin. PDB entry 1A3N. Image produced with PDB Structure Explorer.
Root Mean Squared Deviation (RMSD)
The degree of similarity between two structures is often expressed as a Root Mean Squared Deviation (RMSD), which summarizes the distances between corresponding atoms in each molecule. Similar structures typically have an RMSD in the 1–3 Angstrom range, with larger RMSD values corresponding to greater structural differences. However, as the size of the protein increases, the RMSD threshold for what is considered a good fit also increases. Whereas an RMSD of 10 Angstroms would be considered a poor fit for a small protein, it might be considered excellent for a larger protein with several hundred amino acids.
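As a minimal sketch of the calculation, assuming the two structures are already superimposed and their atoms are listed in corresponding order, RMSD is the square root of the mean squared distance between paired atoms. The function name `rmsd` and the sample coordinates are illustrative assumptions, not taken from the slides.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that atoms are
    paired by index (atom i in coords_a corresponds to atom i in coords_b).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Angstroms, for illustration only.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

# Every atom in structure_b is displaced by exactly 1 Angstrom along z.
print(rmsd(structure_a, structure_b))
```

Real comparisons first find the rotation and translation that best superimpose the two structures; that step is deliberately left out here.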
The Challenge of Structure Comparison
Each pair of protein backbones has the same RMSD value, but a different relative amount of structural similarity. Visualization, together with the RMSD value, provides the best indicator of structure similarity.
A: Uniformly Distributed Difference
B: Localized Difference
C: Significant Difference with Few Atoms
D: Small Difference with Many Atoms
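The point can be illustrated numerically. The sketch below, which uses made-up per-atom displacements for a four-atom backbone, shows a uniformly distributed difference and a localized difference producing exactly the same RMSD. The function name and the values are assumptions chosen for illustration.

```python
import math


def rmsd_from_deviations(deviations):
    """RMSD computed from per-atom displacement distances (in Angstroms)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical per-atom displacements for two comparisons of a 4-atom backbone.
uniform = [1.0, 1.0, 1.0, 1.0]    # every atom shifted slightly
localized = [0.0, 0.0, 0.0, 2.0]  # one atom shifted a lot, the rest identical

print(rmsd_from_deviations(uniform))
print(rmsd_from_deviations(localized))
```

Both calls return the same value, even though the two cases look very different when the structures are viewed, which is why the RMSD value alone is not a sufficient indicator.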
Sequence Visualization: Sequence Maps
Map Viewer. NCBI's Map Viewer program integrates physical and genetic map information for specific sequences, proteins, and genes. This view shows the position of the gene associated with type 2 neurofibromatosis, located on chromosome 22.
Map Viewer
Map Viewer provides a graphic depiction of nucleotide sequences through a composite of genetic, cytogenetic, physical, and radiation hybrid maps, each of which has its particular uses. Genetic maps show the relative position and order of genes and other sequences on a chromosome, and serve as high-level approximations of relative distances between sequences. Cytogenetic maps provide a gross indication of the position of exons and introns along a chromosome, based on optical microscope techniques. Physical maps show the actual physical location of sequences on a chromosome. Radiation hybrid maps link genetic and physical maps.
Gene Mapping Processes
Sequence mapping involves first breaking up the chromosome at random into large fragments, which are then cloned in bacteria to make bacterial artificial chromosomes (BACs).
Structure Visualization
One of the primary activities in proteomics R&D is determining and visualizing the 3D structure of proteins in order to find where drugs might modulate their activity. In contrast to visualizing the sequence of nucleotides on a strand of DNA, visualizing the primary structure of a protein adds little to the knowledge of protein function. Barring the introduction of some new technology, cataloging, interpreting, and dissecting the proteome will take many years. Unlike a nucleotide sequence, which is a relatively static structure, proteins are dynamic entities that change their shape and association with other molecules as a function of temperature, chemical interactions, pH, and other changes in the environment.
Visualization Tools
Visualization Technologies. Visualization tools leverage the pattern-recognition capabilities of the viewer's visual apparatus, as opposed to the logical, intellectual capabilities, which are more easily saturated.
Rendering Tools
Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB). For example, the entry for glutamine synthetase is 1FPY.
Graphical representation
• Using graphical representations of data provides added meaning and context.
• It aids understanding, as those with little knowledge of the subject may be able to comprehend the results.
• It leaves less room for confusion.
Graphical representation and bioinformatics
• The majority of the data is in an abstract form that needs visualization technologies to enhance user understanding.
• This need is most pronounced in the areas of sequence visualization, user interface development, and protein structure visualization, and as a complement to numerical analyses, especially statistical analysis.
Graphical representation and bioinformatics
• In each application area, the rationale for using graphics instead of tables or strings of data is to shift the user’s mental processing from reading and mathematical, logical interpretation to faster pattern recognition.
• The perceptual cues in graphical displays can enhance immediate understanding of the data being presented.
• Graphics provide context and an indication of relative importance, and bring forward relationships that would otherwise be incomprehensible.
1/29/2009
Sequence Visualization
• Working with strings that represent nucleotide sequences is like programming in machine code: although possible, it is an arduous, error-prone, and time-consuming process that doesn’t lead to efficiency or easy maintenance, and one that requires extensive program documentation.
Sequence Visualization
• A step up from machine code is assembly language, which allows programmers to use mnemonics such as “CLR” to clear a buffer and “ADD” to add two values.
• Programmers are still forced to think in terms of low-level CPU instructions.
• They must constantly switch between a high-level problem, such as how best to rotate a molecule in 3-D space, and a low-level problem, such as whether to use an integer or a float in the rotation algorithm.
Sequence Visualization
• Further up the programming hierarchy are languages such as C++, Java, Perl, and HTML that insulate programmers from the underlying computational hardware infrastructure and allow them to work at a level nearer the application purpose.
Sequence Visualization
• Higher still are flow diagrams or storyboards (maps of sorts) that provide a graphic overview of the application that can be understood and critiqued by non-programmers.
• Returning to nucleotide sequence work, the parallels to these storyboards are gene maps: high-level graphic representations of where specific sequences reside on a chromosome.
Sequence Maps
• When it comes to visualizing nucleotide sequences, the obvious organizing metaphors are amino acids, proteins, chromosome segments, and genes.
• Gene maps provide a high-level view of relative and absolute gene and nucleotide sequence location.
Map Viewer
The Map Viewer program integrates physical and genetic map information for specific sequences, proteins, and genes. It is part of NCBI’s Entrez integrated system and provides a composite interface to several of NCBI’s online databases.
Map Viewer
• Enables users to identify a particular gene location within an organism’s genome, the distance between genes, and the sequence data for a gene in a particular chromosomal region.
• Provides a graphic depiction of nucleotide sequences through a composite of genetic, cytogenetic, physical, and radiation hybrid maps, each of which has its particular uses.
Map Viewer
• It illustrates how the main computational challenge in visualizing linear nucleotide sequences lies in integrating data from multiple databases.
• The sequences represented by the sequence maps are one-dimensional, so there is relatively little computational overhead involved.
• Sequences culled from NCBI’s sequence databases are mapped onto the appropriate graphic, and relevant links are provided to the corresponding databases.
Genetic Maps
• Show the relative position and order of genes and other sequences on a chromosome.
• Serve as high-level approximations of relative distances between sequences.
• Measured in terms of recombination frequency.
• Useful for a researcher who, for example, is interested in the probability that the genes will separate during meiosis.
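As a rough illustration of how a genetic distance is measured, the sketch below computes a recombination frequency from offspring counts and converts it to map units using the common convention that 1% recombination corresponds to about 1 centimorgan (cM). The function names and the counts are hypothetical, and the conversion is only a reasonable approximation for closely linked genes.

```python
def recombination_frequency(recombinant_offspring, total_offspring):
    """Fraction of offspring in which the two genes were separated by recombination."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    return recombinant_offspring / total_offspring


def approx_map_distance_cm(frequency):
    """Approximate genetic distance in centimorgans (1% recombination ~ 1 cM).

    This linear conversion is an approximation that degrades as genes get
    farther apart, because double crossovers go uncounted.
    """
    return frequency * 100.0


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
frequency = recombination_frequency(12, 200)
print(f"recombination frequency: {frequency:.2f}")
print(f"approximate distance: {approx_map_distance_cm(frequency):.1f} cM")
```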
Physical Maps
• Show the actual physical location of sequences on a chromosome.
• Too detailed and difficult to work through.
• Resolution of the map depends on the methodology used to create it.
• Simplest form is cytogenetic mapping.
Cytogenetic Maps
• Provide a gross indication of the position of exons and introns along a chromosome.
• Based on optical microscope techniques.
• Most appropriate, for example, for a researcher interested in quickly estimating the relative amount of DNA on a chromosome that is involved in coding.
Radiation Hybrid Maps
• The most valuable mapping techniques link genetic and physical maps.
• The most common methods involve:
  - Radiation hybrid (RH) mapping
    • Can be used to reveal the distance between genetic markers by exposing DNA to measured doses of radiation, which causes the DNA to break up. By varying the amount of radiation, the average distance between DNA sequence breaks can be modified.
    • Can be used to localize virtually any genetic marker.
  - Simple sequence length polymorphisms (SSLPs)
    • SSLPs are arrays of repeat sequences that display length variations.
    • SSLPs can serve as both a genetic marker and a basis for sequence mapping, a Rosetta Stone of sorts.
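To make the idea of a length polymorphism concrete, the following sketch counts the longest run of back-to-back copies of a repeat motif in two alleles. The sequences, the "CA" motif, and the function name are hypothetical examples, and real SSLP typing is done experimentally (for instance by measuring fragment lengths) rather than by scanning known sequences.

```python
def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    step = len(motif)
    best = 0
    # Try every start position and count how many consecutive copies follow.
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        while sequence[pos:pos + step] == motif:
            run += 1
            pos += step
        best = max(best, run)
    return best


# Two hypothetical alleles that differ only in the number of "CA" repeats.
allele_1 = "GG" + "CA" * 5 + "TT"
allele_2 = "GG" + "CA" * 8 + "TT"

print(longest_repeat_run(allele_1, "CA"))
print(longest_repeat_run(allele_2, "CA"))
```

The difference in repeat count between the two alleles is what makes the locus usable as a marker.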
Accuracy of Mapping
• Dependent on the computational methods used to manipulate the data acquired by experimentation or modeling.
• The typical process involves an integration of several mapping approaches.
Gene Mapping Processes
[Figure: flow diagram; box labels: Cut, Frag, Create BACs, Frag BACs, Sequence, Assign Genetic Markers, Sequence Gene, Physical Map, Genetic Map]
Gene Mapping Processes. A variety of techniques are available for creating physical and genetic maps.
Sequence Mapping - the process
• Break up the chromosome at random into large fragments.
• Clone these in bacteria to make bacterial artificial chromosomes (BACs).
• Order the BACs to maximize the contiguous region while using the minimum number of BACs.
• Break the BACs into fragments of fewer than 500 nucleotides.
• Sequence each fragment. This defines each contiguous region.
• The result is a physical map that may have a few gaps between contiguous regions.
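The BAC-ordering step can be sketched as a greedy interval-covering problem. The code below assumes, for illustration only, that each clone's start and end position on the chromosome is already known; in practice these positions have to be inferred from overlaps between clones. The clone names, coordinates, and function name are hypothetical.

```python
def minimal_tiling_path(bacs):
    """Choose a small set of overlapping clones covering as much as possible.

    bacs: list of (name, start, end) tuples with start < end.
    Returns a list of contigs; each contig is a list of chosen clone names.
    A new contig begins wherever no clone bridges a gap.
    """
    # Sort by start; for equal starts, put the longest clone first.
    ordered = sorted(bacs, key=lambda b: (b[1], -b[2]))
    contigs = []
    i = 0
    n = len(ordered)
    while i < n:
        # Start a new contig with the earliest clone not yet covered.
        name, _, covered_end = ordered[i]
        contig = [name]
        i += 1
        while True:
            # Among clones overlapping the covered region, keep the one
            # that extends coverage the farthest.
            best = None
            while i < n and ordered[i][1] <= covered_end:
                candidate = ordered[i]
                if candidate[2] > covered_end and (best is None or candidate[2] > best[2]):
                    best = candidate
                i += 1
            if best is None:
                break  # nothing extends this contig: a gap follows
            contig.append(best[0])
            covered_end = best[2]
        contigs.append(contig)
    return contigs


# Hypothetical clone positions (arbitrary units along the chromosome).
clones = [
    ("BAC1", 0, 100),
    ("BAC2", 50, 180),
    ("BAC3", 90, 150),   # redundant: fully covered by BAC2
    ("BAC4", 170, 260),
    ("BAC5", 300, 400),  # separated from the others by a gap
]
print(minimal_tiling_path(clones))
```

With these inputs the redundant clone is skipped and the clone beyond the gap forms its own contig, mirroring the "few gaps between contiguous regions" noted above.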
Structure Visualization
• A nucleotide sequence is a relatively static structure.
• Proteins are dynamic entities. They change their shape and association with other molecules as a function of:
  - Temperature
  - Chemical interactions
  - pH
  - Changes in the environment
Visualization Tools
• Hundreds of visualization tools are available.
• Many tools are hardware- or process-specific.
Visualization Tool | Example
Nucleotide Location | Map Viewer
Protein Structure | SWISS-PDBViewer, WebMol, RasMol, Protein Explorer, Cn3D, VMD, MolMol, MidasPlus, PyMol, Chime, Chimera
User Interface | Third-Party Browsers, VRML, Java Applets, C++
General-Purpose Software | Microsoft Excel, Strata Vision 3D, Max3D, 3D-Studio, Ray Dream Studio, StatView, SAS/Insight, Minitab, Matlab
General-Purpose Hardware | Stereo Goggles, Data Gloves, 3D (Stereo) Displays, Haptic Devices
Rendering Tools
• Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB).
• Searching for a structure is typically done by protein name or ID.
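Once an entry such as 1A3N has been located by its ID and obtained in PDB file format, rendering programs read atom coordinates from fixed-width ATOM records. The sketch below parses a few fields from one such line using the column positions defined by the PDB format. The function name is an assumption, and the sample record is built from made-up values rather than copied from a real entry.

```python
def parse_atom_record(line):
    """Extract a few fields from one fixed-width PDB ATOM/HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "atom": line[12:16].strip(),        # columns 13-16: atom name
        "residue": line[17:20].strip(),     # columns 18-20: residue name
        "chain": line[21],                  # column 22: chain identifier
        "residue_number": int(line[22:26]), # columns 23-26: residue sequence number
        "xyz": (
            float(line[30:38]),             # columns 31-38: x in Angstroms
            float(line[38:46]),             # columns 39-46: y in Angstroms
            float(line[46:54]),             # columns 47-54: z in Angstroms
        ),
    }


# Build a sample record with hypothetical values so the column widths are exact:
# serial, atom name, altLoc, residue name, chain, residue number, iCode, x, y, z.
sample = "ATOM  {:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}".format(
    1, " CA", " ", "VAL", "A", 1, " ", 10.0, 20.5, -3.25
)

print(parse_atom_record(sample))
```

Production code would normally rely on an established structure-parsing library instead of hand-written slicing; the point here is only to show what the underlying data looks like.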
Rendering Tools
• Representative protein structure rendering programs available as free downloads from the internet include RasMol, PyMol, SWISS-PDBViewer, and Chimera.
• Following is a summary of the features of these programs:
Comparison
Feature | RasMol | Cn3D | PyMol | SWISS-PDBViewer | Chimera
Architecture | Stand-alone | Plug-in | Web-enabled | Web-enabled | Web-enabled
Manipulation Power | Low | High | High | High | High
Hardware Requirement | Low/Moderate | High | High | Moderate | High
Ease of Use | High; command-line language | Moderate | Moderate; command-line language and GUI | High | Moderate
Special Features | Small size; very easy to install and use; established user base; highly portable | Powerful; GUI | Powerful; GUI; ray-tracing option | Powerful; GUI | Powerful; GUI; built-in extensions for collaboration
Output Quality | Moderate | Very high | High | High | Very high
Documentation | Good | Good | Limited | Good | Very good
Support | Online and user groups | Online and user groups | Online and user groups | Online and user groups | Online and user groups
Speed | High | Moderate | Moderate | Moderate | Moderate/Slow
OpenGL support | Yes | Yes | Yes | Yes | Yes
Extensibility | No | No | Yes; supports Python | No | Highly extensible; supports Python
Operating Systems | Universal | Universal | Universal | Universal | Universal
Rendering Tools
• The selection of a protein structure rendering program should be a function of:
  - Ease of use
  - Power
  - Speed
  - Special features
  - Cost
  - Hardware requirements
  - Documentation and support
  - Overall functionality
Rendering Tools
• The more complex the rendering output, the greater the computational load, and the more time required to render each image.
• Often, time and performance limitations dictate the use of a simple, fast rendering package such as RasMol for day-to-day rendering, and one of the higher-end packages, such as Chimera, for publication-quality output.
User Interface
• Hides the intricacies of the computer hardware and software.
• Presents users with images, sounds, and graphics.
• Allows users to interact on a cognitive level.
• Focuses the attention on what is being presented.
User Interface
• Every computer application and every workstation has a user interface defined by hardware and software.
• A computer can run anything from an operating system to a web browser, but the usability, usefulness, and accessibility of the associated data are defined by the user interface.
User Interface
• The user interface determines the density of the information that can be presented to the user.
• This follows from information theory, in which the user interface is the medium through which the data flow.
User Interface and Information Theory
[Figure: information-theory model of the user interface. Labels: Information Source (Application), Transmitter (Interface Hardware), Medium (User Interface), Noise Source (Irrelevant Data), Receiver (Eyes, Ears & Proprioceptors), Destination (User Awareness). Relevant Data leaves the source; Relevant & Irrelevant Data reach the destination.]
User Interface and Information Theory
• An application, such as a 3D protein visualization tool, is the information source.
• The data created by the application is the message.
• The computer interface hardware, including the video card and monitor, is the transmitter.
• The user interface, including buttons and other graphics rendered on the computer monitor, serves as the medium.
User Interface and Information Theory
• The irrelevant data includes components of the system that interfere with the message generated by the application, such as:
  - Superfluous graphics
  - Distracting colours
  - Other data that serves to confuse users
User Interface and Information Theory
• The receiver is the user’s perceptual apparatus, including:
  - eyes for visual content
  - ears for audio content
  - proprioceptors for tactile or haptic content
• Finally, the message, now containing relevant and irrelevant data, reaches the ultimate destination: the user’s awareness.
User Interface
• As the medium, the user interface is the major bandwidth-limiting element in the delivery of data from the application to the user.
• Everything that affects the effectiveness of the user interface affects the delivery of data.
• Users don’t need to know anything about the complicated underlying processes of the application.
User Interface Components
• Designing an interface involves more than simply deciding on the layout for buttons and check boxes on a display.
• Even the simplest user interface:
  - is complex
  - is multi-tiered
  - supports communication
User Interface Components
• The user interface minimally consists of a physical interface between the user and the computer.
• It may also include:
  - graphical
  - logical
  - emotional and
  - intelligent components.