
why to become a Pyologist

why to become a Pyologist. Perl is for plumbers – Python is for biologists. Stefan Maetschke, Teasdale Group. Why, why, why … Biologists suffer for no good reason: Perl is difficult to write and read, Perl gives weak error feedback, Perl obscures basic concepts.


Presentation Transcript


  1. why to become a Pyologist: Perl is for plumbers – Python is for biologists. Stefan Maetschke, Teasdale Group

  2. why why, why, why … • Biologists suffer for no good reason • Perl is difficult to write and read • Perl gives weak error feedback • Perl obscures basic concepts • Limited understanding of principles • Low productivity • Reduced research scope • Perl is for plumbers - Python is for scientists • I want to have an easy life

  3. plumbers and others: spectrum of tasks, tools and roles • sys admin: plumbing, vi, awk/Perl, grep/diff • scientist: Python • SW developer: designing, Emacs/IDE, C/C++/Java, UML/Unit test

  4. Perl vs. Python • Python: Guido van Rossum, 1991, "There should be one way", easy • Perl: Larry Wall, 1987, "There's more than one way", difficult • Equal in both: cross-platform, open-source, scripting language, multi-paradigm, dynamic typing, statement ratio: 6

  5. you must be joking!

     Nested list, Python:
         list = [['a', 'b', 'c'], [1, 2, 3]]
         print list[0]

     Nested list, Perl:
         @list = (['a', 'b', 'c'], [1, 2, 3]);
         print "@{$list[0]}\n";

     List inside a hash, Perl:
         my @list = ('a', 'b', 'c');
         my %hash;
         $hash{'letters'} = \@list;
         print "@{$hash{'letters'}}\n";

     List inside a hash, Python:
         list = ['a', 'b', 'c']
         hash = {}
         hash['letters'] = list
         print hash['letters']

     Class, Python:
         class Person:
             def __init__(self, age):
                 self.age = age

     Class, Perl:
         package Person;
         use strict;
         sub new {
             my $class = shift;
             my $age = shift or die "Must pass age";
             my $rSelf = {'age' => $age};
             bless ($rSelf, $class);
             return $rSelf;
         }

     http://www.strombergers.com/python/

  6. More Perl bashing…

     Adding two numbers, Python:
         def add(a, b):
             return a + b

     Adding two numbers, Perl (four variants):
         sub add { $_[0] + $_[1]; }

         sub add($, $) {
             local ($a, $b) = @_;
             return $a + $b;
         }

         sub add {
             my $a = shift;
             my $b = shift;
             return $a + $b;
         }

         sub add {
             my ($a, $b) = @_;
             return $a + $b;
         }

     Difference of two list lengths, Perl:
         sub diff {
             my ($aref, $bref) = @_;
             my (@a) = @$aref;
             my (@b) = @$bref;
             return scalar(@a) - scalar(@b);
         }

     Difference of two list lengths, Python:
         def diff(a, b):
             return len(a) - len(b)

     http://www.strombergers.com/python/

  7. complexity wall • Higher-order concepts: data structures, functions, classes • Everything you can do in Python you can do in Perl, but you don't • Simple scripts ≈ 100 lines => fun stops • => Python allows you to break through the complexity wall

  8. googliness

     Search terms, with X replaced by the language name; counts in kilo-hits, May 2008.

     Language   X language   X load file   X bioinformatics
     C              53,000         1,820                572
     Java            7,760         2,890                320
     C++             1,290         3,100                231
     C#              1,020           794                161
     Perl            1,150           685                101
     Python            527           798                199
     Ruby              470           806                186
     Scala             394           354                 69
     Haskell           212           323                 74

  9. and the winner is… <- without Psyco http://shootout.alioth.debian.org/

  10. damn lies and stats • sourceforge projects: Perl declining, Python increasing? • May 2008, keyword search: Perl 3474, Python 4063 • http://rengelink.textdriven.com/blog/

  11. see the light… classify Iris plants • Four attributes: sepal length, sepal width, petal length, petal width • Three species: Iris setosa, Iris versicolor, Iris virginica • http://archive.ics.uci.edu/ml/datasets/Iris • Fisher, R.A. "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7, Part II, 179-188 (1936)

  12. Iris – convert data
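
The transcript keeps only the title of this slide. As a minimal sketch of what converting the Iris data could look like, assuming the comma-separated layout of the UCI `iris.data` file (four measurements followed by the species name), the snippet below parses a few rows with the standard `csv` module. It uses Python 3 syntax, and the names `RAW`, `FIELDS` and `load_iris` are illustrative, not taken from the presentation.

```python
import csv
import io

# A handful of rows in the layout of the UCI iris.data file
# (the full file has 150 rows); used here only for illustration.
RAW = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Hypothetical field names for the four numeric columns.
FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def load_iris(handle):
    """Return a list of (measurements dict, species name) tuples."""
    samples = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row[:4]]
        samples.append((dict(zip(FIELDS, values)), row[4]))
    return samples


samples = load_iris(io.StringIO(RAW))
print(len(samples), "samples, first:", samples[0])
```

With the real file, `load_iris(open("iris.data"))` would be the equivalent call, assuming the file has been downloaded to the working directory.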

  13. Iris – correlation
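
Only the slide title survives here. One plausible reading is a Pearson correlation between two of the four attributes; the sketch below computes it by hand in Python 3 with the standard library only. The helper name `pearson` and the choice of petal length versus petal width are assumptions, and the nine value pairs are a small illustrative subset of the data set.

```python
import math

# (petal length, petal width) pairs: three samples per species,
# an illustrative subset of the Iris data.
petal = [
    (1.4, 0.2), (1.4, 0.2), (1.3, 0.2),   # Iris setosa
    (4.7, 1.4), (4.5, 1.5), (4.9, 1.5),   # Iris versicolor
    (6.0, 2.5), (5.1, 1.9), (5.9, 2.1),   # Iris virginica
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


lengths, widths = zip(*petal)
r = pearson(lengths, widths)
print(f"correlation petal length vs. petal width: r = {r:.3f}")
```

With NumPy installed, `numpy.corrcoef` would give the same coefficient in one call; the hand-written version is used here only to keep the sketch free of dependencies.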

  14. Iris – do stats
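
The statistics shown on this slide are not part of the transcript. As a hedged sketch, assuming "stats" means per-species summary statistics, the Python 3 snippet below groups sepal lengths by species and reports mean and sample standard deviation with the standard `statistics` module. The variable names and the three-samples-per-species subset are illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev

# (sepal length, species): an illustrative subset of the Iris data.
rows = [
    (5.1, "Iris-setosa"), (4.9, "Iris-setosa"), (4.7, "Iris-setosa"),
    (7.0, "Iris-versicolor"), (6.4, "Iris-versicolor"), (6.9, "Iris-versicolor"),
    (6.3, "Iris-virginica"), (5.8, "Iris-virginica"), (7.1, "Iris-virginica"),
]

# Group the measurements by species name.
by_species = defaultdict(list)
for sepal_length, species in rows:
    by_species[species].append(sepal_length)

# Mean and sample standard deviation per species.
summary = {sp: (mean(vals), stdev(vals)) for sp, vals in by_species.items()}

for sp, (m, s) in summary.items():
    print(f"{sp:16s} mean={m:.2f} sd={s:.2f}")
```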

  15. Iris – linear regression
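
The regression code itself is missing from the transcript. A minimal sketch of an ordinary least-squares line fit, written in plain Python 3, might look as follows. The function name `fit_line`, the choice of predicting petal width from petal length, and the small data subset are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Petal length (x) and petal width (y): an illustrative subset of the Iris data.
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]

slope, intercept = fit_line(petal_length, petal_width)
print(f"petal_width ≈ {slope:.3f} * petal_length + {intercept:.3f}")
```

On the full data set a library routine such as `scipy.stats.linregress` would be the more usual choice, since it also reports the correlation coefficient and a p-value.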

  16. Iris – plot data

  17. libs for life science • Scientific computing: SciPy, NumPy, matplotlib • Bioinformatics: BioPython • Phylogenetic trees: Mavric, Plone, P4, Newick • Microarrays: SciGraph, CompClust • Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib • Dynamic systems modeling: PyDSTools • Protein structure visualization: PyMol, UCSF Chimera • Networks/Graphs: NetworkX, PyGraphViz • Symbolic math: SymPy, Sage • Wrapper for C/C++ code: SWIG, Pyrex, Cython • R/SPlus interface: RSPython, RPy • Java interface: Jython • Fortran to Python: F2PY • … Also check out: http://www.scipy.org/Topical_Software and http://pypi.python.org/pypi

  18. last words • Perl perfect for plumbing • Python excellent for scientific programming • Easy to learn, write and maintain • Suited for scripting and mid-size projects • Huge number of scientific libraries • Python is an attractive alternative to Matlab/R • Easy integration of Java, C/C++ or Fortran code

  19. questions isn’t Python lovely… Interest: Python Course?

  20. links • Wikipedia – Python: http://en.wikipedia.org/wiki/Python • Instant Python: http://hetland.org/writing/instant-python.html • How to think like a computer scientist: http://openbookproject.net//thinkCSpy/ • Dive into Python: http://www.diveintopython.org/ • Python course in bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html • Beginning Python for bioinformatics: http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html • SciPy Cookbook: http://www.scipy.org/Cookbook • Matplotlib Cookbook: http://www.scipy.org/Cookbook/Matplotlib • Biopython tutorial and cookbook: http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html • Huge collection of Python tutorials: http://www.awaretek.com/tutorials.html • What's wrong with Perl: http://www.garshol.priv.no/download/text/perl.html • 20 Stages of Perl to Python conversion: http://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993 • Why Python: http://www.linuxjournal.com/article/3882

  21. some papers • Bassi S. (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199 • Mangalam H. (2002) The Bio* toolkits--a brief overview. Brief Bioinform. 3(3):296-302. • Fourment M., Gillings MR. (2008) A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9:82.

  22. to whom it may concern NP = Non-Programmer • NPs who don’t use Perl yet • NPs who want to see the light • NPs who want to give their code away without being rightfully ashamed • Matlab aficionados

  23. one of ten Perl myths
      http://www.perl.com/pub/a/2000/01/10PerlMyths.html

      "…we can happily consign the idea that 'Perl is hard' to mythology."

      Swap two sections of a string: "aaa:bbb" -> "bbb:aaa"

      "…Perl works the way you do…"

      Perl:
          while (<>) {
              s/(.*):(.*)/$2:$1/;
              print;
          }

          while (<>) {
              chomp;
              ($first, $second) = split /:/;
              print $second, ":", $first, "\n";
          }

      "…That's one, fairly natural way to think about it…"

      Python:
          from re import sub
          for line in file:
              print sub('(.*):(.*)', r'\2:\1', line)

          for line in file:
              line = line.strip()
              first, second = line.split(':')
              print second+':'+first
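
The Python variants on this slide use the Python 2 `print` statement and an unspecified `file` object. As a small runnable sketch in Python 3, the two approaches can be wrapped in functions and tried on an in-memory list of lines. The function names `swap_regex` and `swap_split` and the sample input are illustrative, not part of the presentation.

```python
import re


def swap_regex(line):
    """Swap the parts around ':' with a regular expression substitution."""
    return re.sub(r"(.*):(.*)", r"\2:\1", line)


def swap_split(line):
    """Swap the parts around ':' by stripping the line and splitting it."""
    first, second = line.strip().split(":")
    return second + ":" + first


# Stand-in for lines read from a file or from standard input.
lines = ["aaa:bbb\n", "left:right\n"]

for line in lines:
    print(swap_regex(line.rstrip("\n")), swap_split(line))
```

Both functions assume exactly one colon per line; `swap_split` raises a `ValueError` otherwise, which makes malformed input visible instead of silently passing it through.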

  24. camel chaos • does not scale well • complex syntax • cryptic commands • does not encourage clear code • difficult to read/maintain • hard to understand the principles • error prone • no check of subroutine arguments • variables are global by default • …
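
Two of the points above, "no check of subroutine arguments" and "variables are global by default", can be contrasted with how Python behaves. The following Python 3 sketch only demonstrates the Python side: a call with a missing argument raises a `TypeError`, and a name assigned inside a function stays local to it. The function `add` mirrors the earlier slide; `total`, `message` and `leaked` are illustrative names.

```python
def add(a, b):
    total = a + b  # 'total' is local to add() and invisible outside it
    return total


# Calling with too few arguments fails immediately with a clear error.
try:
    add(1)
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# A correct call works as expected.
print(add(1, 2))

# The local name did not leak into the module namespace.
leaked = "total" in globals()
print("total is global:", leaked)
```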

  25. why Python • overcome the complexity wall • many, excellent scientific libraries • clear, easy to learn syntax • hard to do it wrong • does not require prior suffering/experience

  26. my bias • R&D: C/C++ -> applied ML in robotics, image processing, quality control • SW Development: Java -> Speech Processing, Data Mining • Computational Biology: Java, Python • Other languages I played with: Ada, APL, Basic, MatLab, Modula, Pascal, Perl, Prolog, R, Groovy, Forth, Fortran, Scala, Assembly code
