KNIME: Visual Programming for Metabolomics. Stephan Beisken.
Visual Programming • “Visual programming languages enable physicians and other computer users with little knowledge of programming to develop computer software. The physician uses a visual paradigm to "draw" the computer interface and then attaches short segments of computer code to buttons, menus, and list boxes.” Ebell, M. H. (1993). Visual programming languages. M.D. Computing : Computers in Medical Practice, 10(5), 305–11.
Motivation • Simplify your (working) life • Data processing and analysis require several tools working together in sequence • Data input and output • Spreadsheets • Data transformation • Transposition, aggregation, string manipulation • ISAcreator • Formatting of tables
Agenda • Introduction • Tutorial • Installation and Extensions • Overview of the Workbench • Nodes and Table Models • Exercises • Introductory Examples • MassCascade • OpenMS • XCMS • Slides, software, workflows, and data for takeaway
Disclaimer • Workflows are great • It does not have to be KNIME; there are many other solutions • Every method that captures information in a consistent manner and enables reproducibility is great • Transparency • Ability to share data and ‘everything’ that was done to the data
Introduction • KNIME: Konstanz Information Miner • http://www.knime.org/ • Developed at the University of Konstanz in Germany • Desktop version available free of charge (open source) • Modular platform for building and executing workflows from predefined components: nodes • Core functionality available for tasks such as data mining, analysis, and manipulation • Extra features and functionality available through extensions from various groups (community) and vendors • Written in Java, based on the Eclipse SDK platform
Workflow Concepts • Workflow execution • Can execute complex, multi-step operations on input data • Can be run by “non-experts” using predefined parameter templates that encode settings chosen by an expert • Can be set up for specific measurement systems • Can be shared across researchers
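The idea of a multi-step workflow driven by a predefined parameter template can be sketched in a few lines of plain Python. This is only an illustration of the concept, not KNIME's API: the step functions, the `TEMPLATE` dictionary, the column names, and the peak values are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Table = List[Row]
Step = Callable[[Table, dict], Table]


def drop_low_intensity(table: Table, params: dict) -> Table:
    """Keep only rows whose intensity reaches the template threshold."""
    return [row for row in table if row["intensity"] >= params["min_intensity"]]


def scale_intensity(table: Table, params: dict) -> Table:
    """Multiply every intensity by the template scale factor."""
    return [{**row, "intensity": row["intensity"] * params["scale"]} for row in table]


# Hypothetical parameter template prepared once by an expert and reused by others.
TEMPLATE = {"min_intensity": 100.0, "scale": 0.5}


def run_workflow(table: Table, steps: List[Step], params: dict) -> Table:
    """Run the steps in sequence, feeding each step's output into the next."""
    for step in steps:
        table = step(table, params)
    return table


# Made-up input data for illustration only.
peaks = [{"mz": 150.1, "intensity": 80.0}, {"mz": 301.2, "intensity": 400.0}]
result = run_workflow(peaks, [drop_low_intensity, scale_intensity], TEMPLATE)
print(result)
```

Sharing the step list together with the template is what makes the run repeatable by someone who did not design it.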
Functionality • Data manipulation and analysis • File & database I/O, sorting, filtering, grouping, joining, pivoting • Data mining and machine learning • R, WEKA, KNIME, interactive plotting • Cheminformatics • Conversions, similarity, clustering, (Q)SAR analysis, etc. • Scripting integration • R, Perl, Python, Matlab, Octave, Groovy • Reporting and much more • Bioinformatics, HTS & image analysis, network & text mining • Marketing, big data and business analytics
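For readers who think in scripts, the table operations listed under data manipulation (filtering, sorting, grouping) correspond roughly to the following plain-Python sketch. The rows, column names, and threshold are made up for illustration; in KNIME each operation would be a configured node rather than a line of code.

```python
from collections import defaultdict

# Made-up table: one row per sample, with a knockout (KO) or wild-type (WT) group.
rows = [
    {"sample": "ko1", "group": "KO", "intensity": 120.0},
    {"sample": "ko2", "group": "KO", "intensity": 80.0},
    {"sample": "wt1", "group": "WT", "intensity": 300.0},
    {"sample": "wt2", "group": "WT", "intensity": 100.0},
]

# Filtering: keep rows at or above a (hypothetical) intensity threshold.
filtered = [row for row in rows if row["intensity"] >= 100.0]

# Sorting: order the remaining rows by descending intensity.
ordered = sorted(filtered, key=lambda row: row["intensity"], reverse=True)

# Grouping: collect intensities per group, then aggregate with the mean.
by_group = defaultdict(list)
for row in rows:
    by_group[row["group"]].append(row["intensity"])
means = {group: sum(values) / len(values) for group, values in by_group.items()}

print([row["sample"] for row in ordered])
print(means)
```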
Modules (Community Extensions) • http://tech.knime.org/community • Chemoinformatics • CDK (EMBL-EBI), RDKit (Novartis), Indigo (GGA), Erl Wood (Eli Lilly), Enalos (NovaMechanics) • ChEMBL and ChEBI (EMBL-EBI) • Bioinformatics • OpenMS (Tübingen, ETH Zurich) • MassCascade (EMBL-EBI) • HCS (MPI), NGS (Konstanz), Image analysis • Integration • Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis), REST and SOAP web service support
Applications (cont.) • Regression • Calibration
Advantages • Intuitive to use • No or little programming experience required • Good for prototyping • Lots of functionality • Very modular and flexible • Active community • Extensible • Visual feedback
Disadvantages • Steep learning curve • Resource greedy • No (free) server edition • Slower execution than standalone scripts
Installation • Download and unzip KNIME • No further setup required • ./knime.ini contains arguments for launch • Install new modules (nodes) from update sites • Explorer and installation wizard provided • Workflows and data are stored in a workspace • ~/<user>/knime/workspace • C:\Users\<user>\knime\workspace • Preferences in: File > Preferences > KNIME
Workbench • Toolbar: auto-layout, execute, execute all nodes • Views: node description, tabs, workflow projects, favorite nodes, public server, workflow editor, node repository, outline, console
Nodes • Node: basic processing unit of a workflow; performs a particular task • Input port(s) on the left of the icon • Output port(s) on the right of the icon • Title, icon, and sequence number • Status display (‘traffic lights’) • Red (not ready) • Amber (ready) • Green (executed) • Blue bar during execution (with percentage or flashing) • Right-click menu: configure and execute the node, display the output views, edit the node, and display data for the ports
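The traffic-light status can be read as a small state machine: a node starts red, turns amber once configured, and turns green after execution. The sketch below models that behaviour in plain Python as an aid to intuition. The `Node` and `Status` classes and the row-filter task are hypothetical and are not part of KNIME's actual node API.

```python
from enum import Enum


class Status(Enum):
    NOT_READY = "red"
    READY = "amber"
    EXECUTED = "green"


class Node:
    """Toy model of a workflow node with a traffic-light status."""

    def __init__(self, title, task):
        self.title = title
        self.task = task          # function applied to the input table
        self.settings = None      # filled in by configure()
        self.status = Status.NOT_READY

    def configure(self, **settings):
        # Storing settings makes the node ready (amber).
        self.settings = settings
        self.status = Status.READY

    def execute(self, table):
        # An unconfigured (red) node cannot run.
        if self.status is Status.NOT_READY:
            raise RuntimeError(f"{self.title}: configure the node before executing it")
        result = self.task(table, **self.settings)
        self.status = Status.EXECUTED
        return result


def filter_rows(table, column, minimum):
    """Hypothetical task: keep rows where `column` is at least `minimum`."""
    return [row for row in table if row[column] >= minimum]


node = Node("Row Filter", filter_rows)
states = [node.status.value]
node.configure(column="intensity", minimum=100.0)
states.append(node.status.value)
output = node.execute([{"intensity": 80.0}, {"intensity": 400.0}])
states.append(node.status.value)
print(states, output)
```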
Dialogs • Double-click opens configuration dialogs • Explicit column types
Tables • Table rows • Column specifications • Column types • Various renderers
Exercises: Preliminaries • Pre-installed KNIME Desktop 2.9.1 • Workflows • starters, xcms, openms, masscascade • Data • FAAH knockout LC/MS data • ESB tomato LC/MS QC data • ChEBI SDFile, KEGG SDFile • Plug-Ins (more in About KNIME > Installation Details) • R (interactive) • Erl Wood, CDK • OpenMS, MassCascade
Exercises: Installation • Open your KNIME directory • ~/Desktop/knime_2.9.1 • ./knime.exe • Memory allocation • ./knime.ini
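Memory allocation is controlled by the JVM maximum heap option (`-Xmx`) in knime.ini, which follows the Eclipse launcher convention of listing JVM options after a `-vmargs` line. Editing the file by hand is usually enough. The helper below is only a sketch of the same edit done programmatically; the function name `set_max_heap` and the sample file content are assumptions for illustration, so check your own knime.ini before changing it.

```python
import re


def set_max_heap(ini_text: str, size: str) -> str:
    """Return ini_text with the JVM -Xmx option set to `size`, e.g. "4g" or "2048m"."""
    if not re.fullmatch(r"\d+[kKmMgG]?", size):
        raise ValueError(f"not a valid JVM heap size: {size!r}")
    lines = ini_text.splitlines()
    new_line = f"-Xmx{size}"
    for index, line in enumerate(lines):
        if line.strip().startswith("-Xmx"):
            lines[index] = new_line  # replace the existing setting
            break
    else:
        # No -Xmx line found: JVM options belong after -vmargs, so add both if needed.
        if "-vmargs" not in [line.strip() for line in lines]:
            lines.append("-vmargs")
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file content; a real knime.ini contains more launcher arguments.
example = "-vmargs\n-Dsome.hypothetical.property=true\n-Xmx1024m\n"
updated = set_max_heap(example, "4g")
print(updated)
```

The function works on text rather than on the file itself, so the result can be inspected before anything is written back.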
Exercises: Starters • More examples available from the Examples repository
Exercises: MassCascade https://bitbucket.org/sbeisken/masscascadeknime/wiki/ExampleWorkflows
Exercises: XCMS http://www.bioconductor.org/packages/devel/data/experiment/manuals/faahKO/man/faahKO.pdf
Exercises: OpenMS http://ftp.mi.fu-berlin.de/OpenMS/release-documentation/OpenMS_tutorial.pdf
Final Remarks • Workflows can make exploratory or repetitive data tasks easier and save time • Extensive data pre-processing functionality • Extensions for statistics, machine learning, bio-, and cheminformatics • Integration of R (XCMS) and spectrometry extensions can help you to build elaborate pipelines and share work • Can help to organize one’s thoughts. • It’s actually quite a bit of fun.
Resources • KNIME Forum • http://www.knime.org/ • KNIME Learning Hub • http://www.knime.org/learning-hub • Quickstart Guide • http://tech.knime.org/files/KNIME_quickstart.pdf • Happy to Help • beisken@ebi.ac.uk