Galaxy (of bioinformatics) Overview
Martin Senger <martin.senger@kaust.edu.sa>
[also using a few slides from presentations at the Galaxy Developers Conference 2011]
Three basic themes...
• What Galaxy can do... (or could do)
• Show me where I can try it for myself
• What we can do to make our Galaxy better
...and what this is not
• a detailed tutorial on how to use Galaxy
• a way to convince you that I understand everything about Galaxy
What is Galaxy
• A web-based interface to command-line tools (of any kind) and their combinations (“workflows”)
• Galaxy performs analysis interactively through the web, on arbitrarily large datasets
• Galaxy remembers what it did: history
• Flexibility to include anybody’s command-line tools
  • by writing wrappers, for which templates are available
• An environment for sharing tools (or their wrappers)
  • the “Tool Shed” repository
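To make the "wrapper" idea concrete: a Galaxy tool wrapper is an XML file that tells Galaxy which command line to run and which datasets go in and out. The sketch below generates a minimal one from Python. The tool id `reverse_lines`, its label and the helper name `build_wrapper` are made up for illustration, and only a small subset of the wrapper elements is shown, so treat it as a rough outline rather than a complete wrapper.

```python
import xml.etree.ElementTree as ET


def build_wrapper(tool_id, name, command, in_format, out_format):
    """Return a minimal Galaxy-style tool wrapper as an XML string.

    Hypothetical helper: it emits only the basic elements
    (command, inputs, outputs, help) of a wrapper file.
    """
    tool = ET.Element("tool", id=tool_id, name=name)
    # $input and $output are placeholders that Galaxy fills in at run time.
    ET.SubElement(tool, "command").text = command
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data",
                  format=in_format, label="Input dataset")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format=out_format)
    ET.SubElement(tool, "help").text = "Reverses the order of lines in a file."
    return ET.tostring(tool, encoding="unicode")


# Wrap the standard Unix command `tac` (illustrative tool id and label).
wrapper_xml = build_wrapper("reverse_lines", "Reverse lines",
                            "tac $input > $output", "txt", "txt")
print(wrapper_xml)
```

The `format` attributes are what later let Galaxy offer this tool only for datasets of a matching type.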
Galaxy has data... (well, “datasets”)
• Locally stored data
  • user-specific
  • shared between users
    • e.g. genome builds
• Origin of data
  • uploaded from your computer
    • using a web interface
    • using an FTP server
  • fetched from external databases (“datasources”)
    • only those that are “aware” of Galaxy
    • internally: two ways to fetch data (asynchronous vs. synchronous)
    • you need to be familiar with these databases and their UIs
Datasource – an example [figure: steps 1, 2, 3]
Galaxy has data... and data have types
• Data have metadata
  • so that data can be used only with tools that recognize their data type
• Data have attributes
  • annotate data
  • convert data to a new format
  • change data type
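The type check described above can be pictured as a simple lookup: each tool declares the formats it accepts, and a dataset is offered only to the tools that list its format. This is a toy model, not Galaxy's actual implementation; the tool names and the `compatible_tools` helper are invented for the example.

```python
# Hypothetical tools and the input formats each one declares.
TOOL_INPUT_FORMATS = {
    "align_reads": {"fastq"},
    "count_intervals": {"bed", "interval"},
    "plot_columns": {"tabular", "bed"},
}


def compatible_tools(dataset_format, tools=TOOL_INPUT_FORMATS):
    """Return the names of tools that accept a dataset of the given format."""
    return sorted(name for name, formats in tools.items()
                  if dataset_format in formats)


print(compatible_tools("bed"))    # ['count_intervals', 'plot_columns']
print(compatible_tools("fasta"))  # []
```

Changing a dataset's type, as listed among the attributes above, amounts to changing which tools this lookup returns.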
Finally, Galaxy can do workflows...
• An automated set of steps, perhaps run each time with different input data (of the same type)
  • reproducibility (usable in publications)
  • reusability (sharing workflows with others)
• created from scratch (using a workflow editor) or from your history
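As a rough analogy only (this is not how Galaxy represents workflows internally), a workflow can be thought of as a fixed list of steps that is replayed on different inputs, with every intermediate result kept in a history. All function names below are illustrative.

```python
def run_workflow(steps, dataset):
    """Apply each step to the previous result; return every intermediate result."""
    history = [dataset]
    for step in steps:
        history.append(step(history[-1]))
    return history


def drop_empty(lines):
    return [line for line in lines if line.strip()]


def to_upper(lines):
    return [line.upper() for line in lines]


def sort_lines(lines):
    return sorted(lines)


# The same workflow, reused on two different inputs of the same type.
workflow = [drop_empty, to_upper, sort_lines]
run_a = run_workflow(workflow, ["b", "", "a"])
run_b = run_workflow(workflow, ["z", "y"])
print(run_a[-1])  # ['A', 'B']
print(run_b[-1])  # ['Y', 'Z']
```

Because the step list never changes between runs, the second run reproduces exactly the same procedure on new data, which is the reproducibility and reusability point made above.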
An example – a workflow editor Thanks to:
Users creating non-trivial workflows: a user would not have done this from the command line on our cluster
If we have time (6 mins), click here:
• Creating a workflow from your history
• http://main.g2.bx.psu.edu/screencast
There are many ways to skin a cat...
• Where are all these galaxies?
  • Public servers
    • available immediately, free of charge
    • http://main.g2.bx.psu.edu/
    • and a few others, such as http://galaxy.nbic.nl/
    • usually limited resources
    • you cannot customize them to your special needs
  • KAUST/CBRC Galaxy
    • http://galaxy.cbrc.kaust.edu.sa/
    • running on an internal cluster with limited resources
    • but we can do with it whatever we need to do
  • Galaxy in the Amazon cloud (CloudMan)
    • when you do not have infrastructure in house
    • when you have particular resource needs (cores, memory...)
    • when you need customization
    • if you have a credit card
    • details in this presentation:
      • http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=CloudManGalaxyOnTheCloud.pdf
• Galaxy also has a RESTful API for programmatic access (beta)
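A minimal sketch of what programmatic access could look like from Python, using only the standard library. It assumes the API is served under `/api/` on the Galaxy server and that you authenticate by passing your personal API key as the `key` query parameter. Since the API is in beta, check your own server before relying on either. `MY_KEY` is a placeholder, and `api_url` and `list_histories` are names invented for this example.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def api_url(base_url, resource, api_key):
    """Build the URL of an API resource, e.g. 'histories' (assumed layout)."""
    return urljoin(base_url, "api/" + resource) + "?" + urlencode({"key": api_key})


def list_histories(base_url, api_key):
    """Fetch the user's histories as parsed JSON (performs a network request)."""
    with urlopen(api_url(base_url, "histories", api_key)) as response:
        return json.load(response)


# Only builds the URL; call list_histories() with a real key to contact a server.
print(api_url("http://main.g2.bx.psu.edu/", "histories", "MY_KEY"))
```

The same URL builder works for any of the servers listed above, including a local installation, by changing `base_url`.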
Our KAUST/CBRC Galaxy
There’s no such thing as a free lunch... ...we need to:
Image courtesy of http://mychinaconnection.com/english-proverb/there-is-no-free-lunch/
How to make better use of our Galaxy
• Data issues
  • add the genome-wise data we (CBRC) need
  • add data usable by others (Core, students...)
• Tools
  • select the subset of tools we really need and test them fully
  • consider wrapping other tools (not yet available by default)
• Logistics
  • provide user-oriented courses
  • create a user group to share experience and promote knowledge
  • monitor Galaxy's stability and usage
• Hardware/sysadmin issues
  • install it on better hardware (in due time)
  • change the current queue priority (a chicken-and-egg problem)
  • add an FTP server
Thank you. Any questions?
More info:
• Galaxy home page: http://galaxy.psu.edu/
• An overview presentation: http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=IntroductionSession.pdf