Introducing a cutting-edge search interface allowing users to dynamically control queries and visualize data for in-depth understanding. This tool simplifies complex database searches, especially for non-experts, across multiple scientific fields.
Simple search interface
• Strengths:
  • simple, easy to use form
  • allows multiple search fields to be combined
  • relatively fast, despite performing quite complex SQL queries
• Weaknesses:
  • not exposing the power of a relational database
  • user can't specify the relationship between search fields:
    • "name" AND "title" AND "keyword"
    • "name" OR "title" OR "keyword"
    • ( "name" OR "title" ) AND NOT "keyword"
  • the search form is defined by the authors of the search system, not the author of a query
Describing complex searches
• We want to allow the user to entirely control their query
• Since HTML forms are inherently static, we'll use an applet to provide a dynamic "form" that will let the user:
  • choose the fields to be searched
  • specify the relationships between search fields
  • choose the result fields and how results are presented
  • perform "complex" sub-queries e.g. SSM, FASTA
Graphical DB search system
• MSDpro uses an applet for constructing queries and a server to execute them
• Avoids the need for the user to understand a complex database schema or know SQL
• The user describes their query entirely graphically, including logical operations such as AND, OR and NOT
• Applet generates an XML description of the user’s query, which is sent to the MSD query server and converted to SQL automatically
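The slides do not show the XML format that the applet produces. As a rough sketch of what such a query description could look like, the following builds a small logical tree for ( "name" OR "title" ) AND NOT "keyword". The element names (`query`, `and`, `or`, `not`, `term`), the attributes and the search values are all assumptions made for illustration, not the MSDpro format.

```python
import xml.etree.ElementTree as ET

def term(field, value):
    """Leaf node: one search field matched against one value (hypothetical format)."""
    return ET.Element("term", {"field": field, "value": value})

def op(name, *children):
    """Logical node ("and", "or" or "not") wrapping its child nodes."""
    node = ET.Element(name)
    node.extend(children)
    return node

# ( "name" OR "title" ) AND NOT "keyword", with made-up search values
query = ET.Element("query")
query.append(
    op("and",
       op("or", term("name", "lysozyme"), term("title", "lysozyme")),
       op("not", term("keyword", "mutant")))
)

# Serialise the tree to the text that would be sent to a query server
xml_text = ET.tostring(query, encoding="unicode")
print(xml_text)
```

A tree like this keeps the relationship between search fields explicit, which is the part a fixed HTML form cannot express.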
Automatic SQL generation
• The query server is a Java servlet:
  • accepts a query description as XML
  • converts the user’s query description into a true SQL query, which is then submitted to the search database
• Searches can include components that are executed outside of the database, e.g. sequence similarity, determined using FASTA, or structural similarity, determined using SSM
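The conversion from an XML query description to SQL can be sketched as a recursive walk over the logical tree. This is only an illustration in Python, not the servlet's Java code: the XML element names, the `COLUMNS` mapping, and the `entry_search` table are hypothetical, and components executed outside the database (FASTA, SSM) are not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist mapping user-facing field names to database columns.
COLUMNS = {"name": "entry_name", "title": "entry_title", "keyword": "keyword"}

def to_where(node, params):
    """Recursively translate one XML node into a SQL boolean expression."""
    if node.tag == "term":
        column = COLUMNS[node.get("field")]  # KeyError for unknown fields
        params.append(node.get("value"))     # values go in as bound parameters
        return f"{column} = ?"
    if node.tag == "not":
        (child,) = list(node)                # NOT takes exactly one child
        return f"NOT ({to_where(child, params)})"
    if node.tag in ("and", "or"):
        parts = [to_where(child, params) for child in node]
        return "(" + f" {node.tag.upper()} ".join(parts) + ")"
    raise ValueError(f"unknown element: {node.tag}")

def xml_to_sql(xml_text):
    """Return (sql, params) for a query description with a single root expression."""
    root = ET.fromstring(xml_text)
    params = []
    (top,) = list(root)
    where = to_where(top, params)
    return f"SELECT entry_id FROM entry_search WHERE {where}", params

example = """<query><and>
  <or><term field="name" value="lysozyme"/><term field="title" value="lysozyme"/></or>
  <not><term field="keyword" value="mutant"/></not>
</and></query>"""
sql, params = xml_to_sql(example)
print(sql)
print(params)
```

Looking field names up in a fixed mapping and passing values as bound parameters means no user-supplied text is pasted into the SQL string.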
Visualisation
The process of representing abstract data to aid in understanding the meaning of the data. Not to be confused with rendering data (drawing pictures). Typically, though, we render data in such a way as to visualise the information within that data.
Introduction
Biological data comes from and is of interest to:
• Chemists: reaction mechanism, drug design
• Biologists: sequence, expression, homology, function
• Structural biologists: atomic structure, fold, classification, function
• Medicine: clinical effect
• Education
• Media
Presentation of diverse information to a diverse audience. Each has their own point of view (context).
• Expert = scientist working within their own field of expertise
• Non-expert = scientist using data/information outside their field
• Novice = non-scientist
Web pages
These are notoriously badly designed, often leaving the information on the site unusable.
• The front page should load quickly
• The main point should appear on the first full screen
• Clutter: not logically laid out
• Too busy: cannot find the salient point
• 8% of men and 0.5% of women are colour blind
• Bad text/fonts
• Too often it doesn’t work
• User will go somewhere else
• The latest whizz-bang stuff only works on the latest browsers
• Only works in one browser: they only tested on one
• Does not conform to standard HTML
• Not just presentation of results
• Google is a good design
Asking questions
• Biological data is very complex
• Chemistry, Biology, Physics, Statistics, Medicine…
• Most users will be from a different field
• Asking the right question is difficult
• The user cannot use the correct terminology
• Too many things to query (2000 attributes in MSD)
• SQL: not suitable for most users
• Interface too complex
• Too many check boxes, widgets etc.
• Trying to be too clever
• The “Go” button is buried somewhere
Result presentation
Biological data is complex
• Chemistry, physics, biology, statistics, medicine…
Expert users want all the detail
• i.e. they want to use a specific method
• They want all the details
• They want (I hope) the statistical validity of the results
The non-expert wants the best-practice answer returned within their own context
• They want comparative analysis with other fields
• They want to know the results are valid
Query design
• Suitable for text queries
• Only one logic: AND or OR
• Predefined
• Easy to use
• Limited scope
• 2000 attributes -> 2000 check-boxes!
• The simple text box design is very common
Query design
• Graphical interface
• Multiple logic: AND/OR/NOT
• Under the user's control
• Slower
• Steep learning curve
• Some users just cannot get it
• Intuitive once mastered
• Pretty
Query design
• Figurative
• 2D sketch for 3D query (active sites)
• Informative: presents meaning for the question
• Slower
• Less error prone

    select distinct entry_id, ligand_id
      from contact_search sel
     where neighbour_code_3_letter in ('SER','HIS')
       and DISTANCE <= 2.0
       and type_id = 1
       and neighbour_substruct_code = 'side'
       and MACROMOL_SEC_STRUCT_TYPE = 1
    intersect
    select distinct entry_id, ligand_id
      from contact_search sel
     where neighbour_code_3_letter = 'HIS'
       and (   NEIGHBOUR_ATOM_NAME = 'NE2' and type_id = 1 and distance <= 2.0
            or NEIGHBOUR_SYMBOL = 'N' and type_id = 1 and distance <= 2.0)
       and TYPE_ID != 0
     group by entry_id, ligand_id
    having count(distinct neighbour_residue_id) >= 2
    intersect
    select distinct entry_id, ligand_id
      from contact_search sel
     where neighbour_code_3_letter = 'HIS'
       and NEIGHBOUR_ATOM_NAME = 'NE2'
       and DISTANCE <= 2.0
       and type_id = 1
       and neighbour_substruct_code = 'side'
       and MACROMOL_SEC_STRUCT_TYPE = 2
    intersect
    select distinct entry_id, ligand_id
      from contact_search sel
     where neighbour_code_3_letter = 'HIS'
       and NEIGHBOUR_SYMBOL = 'N'
       and DISTANCE <= 2.0
       and type_id = 1
       and neighbour_substruct_code = 'side'
       and MACROMOL_SEC_STRUCT_TYPE = 3
    intersect
    select distinct entry_id, ligand_id
      from residue_contact sel
     where neighbour_code_3_letter in ('HIS','SER','HIS')
       and BOND_STRENGTH != 10
     group by entry_id, ligand_id
    having count(*) >= 3;

HIS|SER:S/H>C2.0 HIS.ne2:S/S>C2.0 HIS.[n]:S/T>C2.0
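The generated SQL above combines one sub-query per sketched constraint with `intersect`, so that only entry/ligand pairs satisfying every constraint survive. The pattern can be tried on a toy table with Python's built-in `sqlite3` module. The cut-down `contact_search` table and its rows below are invented for illustration and keep only a few of the columns used in the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the contact_search table.
conn.execute("""CREATE TABLE contact_search (
    entry_id TEXT, ligand_id INTEGER,
    neighbour_code_3_letter TEXT, neighbour_residue_id INTEGER,
    distance REAL)""")
rows = [
    ("1abc", 1, "HIS", 10, 1.9),
    ("1abc", 1, "HIS", 42, 2.0),
    ("1abc", 1, "SER", 77, 1.8),
    ("2xyz", 1, "HIS", 5, 1.9),   # only one HIS contact
    ("2xyz", 1, "SER", 6, 1.7),
    ("3def", 1, "HIS", 3, 1.9),
    ("3def", 1, "HIS", 4, 2.5),   # second HIS is too far away
    ("3def", 1, "SER", 8, 1.9),
]
conn.executemany("INSERT INTO contact_search VALUES (?,?,?,?,?)", rows)

# Constraint 1: a SER within 2.0 of the ligand.
# Constraint 2: at least two distinct HIS residues within 2.0 of the ligand.
sql = """
SELECT entry_id, ligand_id FROM contact_search
 WHERE neighbour_code_3_letter = 'SER' AND distance <= 2.0
INTERSECT
SELECT entry_id, ligand_id FROM contact_search
 WHERE neighbour_code_3_letter = 'HIS' AND distance <= 2.0
 GROUP BY entry_id, ligand_id
HAVING COUNT(DISTINCT neighbour_residue_id) >= 2
"""
hits = conn.execute(sql).fetchall()
print(hits)
```

Each extra constraint in the sketch becomes one more `intersect` branch, which is why the hand-written equivalent grows so quickly.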
YAMGP (yet another molecular graphics program)
Many different programs are available: VMD, AstexViewer@MSD-EBI, LigPlot, Quanta, InsightII, Bobscript, WebMol, Frodo, iMol, Chime, Grasp, Pymol, POVRay, Spock, Rasmol, Mage, Raster3D, Yasara, Molscript, Chimera, O, MolMol, Whatif, XtalView, WebLab-viewer, Swiss-PDBviewer
Result visualisation
Multiple types of biological data
• Textual data
• 3D structure
• 2D chemical sketches
• 1D sequence
• Node linked
• General/derived data
• Web pages
• Errors/Variance
• Data provenance
AstexViewer@MSD-EBI
• Java 1.1 applet
• Should run under most browsers
• Small footprint, high speed
• Structure
  • Line, stick, ball & stick, sphere, schematic, surface + texture map
• Written by Mike Hartshorn (Astex Therapeutics Ltd)
• Multiple structures supported
AstexViewer@MSD-EBI
Sequence
• Multiple sequence alignment
• Editing
• Annotation, colours…
• Consensus alignment
• Pick, Brushing & Magic lens
Chemistry
• 2D flat representation
• Annotation, colours…
• Interaction types
• Placement as a function of contact distance
• Editable
• Pick, Brush & Magic lens
Graphs
• 2D, 2D grid and ND
• Linkage plots
• Annotation, colours…
• Ramachandran, etc…
• Pick, Brush & Magic lens
AstexViewer@MSD-EBI
Visualisation
• Lensing
• Linked views
• Brushing
• Picking
• Flying views
• Hyperbolic distortion
• Animation
• Solid rendering
• Depth cues
• Colour, lighting
• Highlighting
• Etc…
Visualisation: comparative analysis
• Similarity/Difference
• Data superposition
• Attribute display
• Colour, size…
• Correlation
• Attribute mapping
• Sequence colour by structure alignment
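Attribute mapping, such as colouring a sequence by a value derived from a structure alignment, comes down to mapping a per-residue number onto a display property. A minimal sketch, assuming a linear blue-to-red colour ramp and made-up per-residue deviations (neither comes from AstexViewer):

```python
def colour_ramp(value, lo, hi):
    """Map value in [lo, hi] to an (r, g, b) tuple, blue (low) to red (high)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    return (round(255 * t), 0, round(255 * (1 - t)))

# Hypothetical per-residue deviations (in angstroms) after superposition.
deviations = {"A10": 0.0, "A11": 1.0, "A12": 3.5}

# Colour every residue on a 0.0 to 2.0 scale; larger deviations saturate at red.
colours = {res: colour_ramp(d, 0.0, 2.0) for res, d in deviations.items()}
print(colours)
```

The same mapping could drive size or another display attribute instead of colour. Given how common colour blindness is, a ramp that does not rely on red/green contrast is a sensible default.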