BridgeDb Martijn van Iersel BiGCaT Maastricht
The 7 Virtues of Bioinformatics • Solve a problem • Start small • Modularity • Design for code re-use • Open Source • Attention to detail • Eat your own dog-food
Solve a problem • What problem are you solving?
Problem: Identifier Mapping Entrez Gene 3643 ? Agilent reporter A46_P45789
Problem: Usability • Check for double IDs • Check for missing IDs • Only 1000 at once • Check alignment of Excel columns • Manual • Error-prone
Solution: Built-in Mapping • Generic bioinformatics platforms should have identifier mapping built-in. BioConductor PathVisio Cytoscape ... Batteries Included
Solution: Built-in Mapping Entrez Gene 3643 Mappingservice Agilent reporter A46_P45789
Problem: Which mapping service? • Synergizer • EnsMart • DAVID • CRONOS • AliasServer • MatchMiner • OntoTranslate
Solution: Abstraction Layer • interface IDMapper • class IDMapperRdb: relational database • class IDMapperFile: tab-delimited text • class IDMapperBiomart: web service
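The abstraction layer on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and is not the BridgeDb Java API: the names `IDMapper`, `IDMapperDict`, `IDMapperText` and the method `map_id` are hypothetical, and the two-column file layout with a header row is an assumption made for the example.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Iterable, Set, Tuple

Xref = Tuple[str, str]  # (data source name, identifier)


class IDMapper(ABC):
    """Common interface: callers never need to know where mappings come from."""

    @abstractmethod
    def map_id(self, source: str, identifier: str) -> Set[Xref]:
        """Return every (data source, identifier) pair linked to the input."""


class IDMapperDict(IDMapper):
    """In-memory backend (hypothetical), handy for tests."""

    def __init__(self, links: Iterable[Tuple[Xref, Xref]]):
        self._links = {}
        for a, b in links:
            # Store each link in both directions.
            self._links.setdefault(a, set()).add(b)
            self._links.setdefault(b, set()).add(a)

    def map_id(self, source, identifier):
        return set(self._links.get((source, identifier), ()))


class IDMapperText(IDMapper):
    """Tab-delimited text backend (hypothetical).

    Assumed layout: a header row naming the two data sources,
    then one pair of identifiers per row.
    """

    def __init__(self, handle):
        rows = csv.reader(handle, delimiter="\t")
        left, right = next(rows)
        links = [((left, r[0]), (right, r[1])) for r in rows if len(r) == 2]
        self._delegate = IDMapperDict(links)

    def map_id(self, source, identifier):
        return self._delegate.map_id(source, identifier)


# Usage: the caller only sees the IDMapper interface.
text = "Entrez Gene\tAgilent\n3643\tA46_P45789\n"
mapper = IDMapperText(io.StringIO(text))
print(mapper.map_id("Entrez Gene", "3643"))
```

Because both backends implement the same interface, a tool can swap a text file for a database or a web service without changing any calling code.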
Tools: Cytoscape plugins (CyThesaurus, NetworkMerge), WikiPathways, PathVisio • BridgeDb • Mapping services: internet web services (BioMart, PICR, BridgeDb-REST), local database, tab-delimited text files • BMC Bioinformatics. 2010 Jan 4;11(1):5
BridgeDb interface • 1: Java interface • 2: REST interface
API Overview BridgeDb.connect(...) IDMapper.mapID(...) Xref.getUrl() DataSource.getUrl()
BridgeDb interface • 1: Java interface • 2: REST interface
REST API http://webservice.bridgedb.org/Human/xrefs/L/1234 • ILMN_1713029 Illumina • 3255967 Affy • NP_001025186 RefSeq • IPI00005930 IPI • GO:0042752 GeneOntology • NM_033282 RefSeq • 3255968 Affy • 94233 Entrez Gene • ENSG00000122375 Ensembl Human • 234226_at Affy • A6NEB4 Uniprot/TrEMBL • 0001780601 Illumina • GO:0008020 GeneOntology • 606665 OMIM • A_23_P24234 Agilent • 14449 HUGO
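A response like the one on this slide can be grouped by data source with a few lines of code. The sketch below assumes the body is plain text with one cross-reference per line, the identifier and the data source name separated by a tab; the slide does not show the separator, so that is an assumption. The function name `parse_xrefs` is hypothetical, and the sample reuses rows from the slide rather than calling the web service.

```python
from collections import defaultdict


def parse_xrefs(body: str) -> dict:
    """Group identifiers by data source.

    Assumes one 'identifier<TAB>data source' pair per line.
    """
    by_source = defaultdict(list)
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        identifier, _, source = line.partition("\t")
        by_source[source.strip()].append(identifier.strip())
    return dict(by_source)


# Rows copied from the slide, joined with the assumed tab separator.
sample = (
    "ILMN_1713029\tIllumina\n"
    "3255967\tAffy\n"
    "94233\tEntrez Gene\n"
    "3255968\tAffy\n"
)
result = parse_xrefs(sample)
print(result["Affy"])
```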
REST API http://<Base URL>/<Species>/<function> [ /<argument> ... ] • http://webservice.bridgedb.org/Human/xrefs/L/1234 • http://webservice.bridgedb.org/Human/search/ENSG00000122375 • http://webservice.bridgedb.org/Human/attributeSet • http://webservice.bridgedb.org/Human/properties • http://webservice.bridgedb.org/Human/targetDataSources • http://webservice.bridgedb.org/Human/attributes/L/3643 • http://localhost:8183/Human/xrefs/L/3643
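The URL scheme above is regular enough to build programmatically. A minimal sketch follows, using only the Python standard library; the helper name `bridgedb_url` is hypothetical, and percent-encoding each path segment is a design choice of this example rather than something the slides specify.

```python
from urllib.parse import quote


def bridgedb_url(base_url: str, species: str, function: str, *arguments: str) -> str:
    """Assemble <Base URL>/<Species>/<function>[/<argument> ...]."""
    parts = [species, function, *arguments]
    # Encode each segment so that characters such as ':' or '/' inside an
    # identifier cannot be mistaken for path structure.
    path = "/".join(quote(str(part), safe="") for part in parts)
    return base_url.rstrip("/") + "/" + path


public = "http://webservice.bridgedb.org"
print(bridgedb_url(public, "Human", "xrefs", "L", "1234"))
print(bridgedb_url(public, "Human", "properties"))
print(bridgedb_url("http://localhost:8183/", "Human", "xrefs", "L", "3643"))
```

Only the base URL changes between the public service and a local instance, which is what the last example on the slide shows.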
Problem: Custom Microarrays ? Custom probe #QXZCY!34
Solution: Stacking • EnsMart • Custom table
Entrez • Custom microarray • Ensembl • Relation defined by mapping source A • Relation defined by mapping source B • Inferred, transitive relationship
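Stacking with transitive inference can be sketched as a breadth-first walk over several mapping sources. This is an illustration of the idea on the slide, not BridgeDb's implementation: the classes `DictMapper` and `StackedMapper` are hypothetical, and `ENSG_EXAMPLE` is a placeholder identifier, not a real Ensembl ID.

```python
from collections import deque


class DictMapper:
    """One mapping source, held in memory as bidirectional links."""

    def __init__(self, links):
        self._links = {}
        for a, b in links:
            self._links.setdefault(a, set()).add(b)
            self._links.setdefault(b, set()).add(a)

    def map_id(self, xref):
        return set(self._links.get(xref, ()))


class StackedMapper:
    """Several mapping sources queried as if they were one."""

    def __init__(self, *mappers):
        self._mappers = mappers

    def map_id(self, xref, transitive=True):
        seen = {xref}
        queue = deque([xref])
        while queue:
            current = queue.popleft()
            for mapper in self._mappers:
                for target in mapper.map_id(current):
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
            if not transitive:
                break  # direct links only: stop after the starting xref
        seen.discard(xref)
        return seen


# Source A links Entrez Gene to Ensembl; source B is the custom table.
source_a = DictMapper([(("Entrez Gene", "3643"), ("Ensembl", "ENSG_EXAMPLE"))])
source_b = DictMapper([(("Custom", "#QXZCY!34"), ("Ensembl", "ENSG_EXAMPLE"))])
stacked = StackedMapper(source_a, source_b)

probe = ("Custom", "#QXZCY!34")
print(stacked.map_id(probe, transitive=False))
print(stacked.map_id(probe))
```

Neither source links the custom probe to Entrez Gene directly; the relationship appears only when both are stacked and the walk passes through the shared Ensembl identifier.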
MIRIAM Resources http://www.ebi.ac.uk/miriam/
Solution: MIRIAM Resources Regular expression for autodetection Pattern for generating URLs Link to documentation
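The two ingredients named on this slide, a regular expression per data source and a URL pattern, might be combined as follows. Everything concrete in this sketch is an assumption for illustration: the patterns are simplified, the `$id` placeholder and the `example.org` URLs are made up, and none of it is copied from the MIRIAM registry.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceInfo:
    name: str
    id_pattern: str    # regular expression used for autodetection
    url_template: str  # "$id" is replaced by the identifier

    def matches(self, identifier: str) -> bool:
        return re.fullmatch(self.id_pattern, identifier) is not None

    def url_for(self, identifier: str) -> str:
        return self.url_template.replace("$id", identifier)


# Illustrative entries only; patterns and URLs are hypothetical.
REGISTRY = [
    DataSourceInfo("Ensembl", r"ENS[A-Z]*G\d{11}", "https://example.org/ensembl/$id"),
    DataSourceInfo("Entrez Gene", r"\d+", "https://example.org/entrez/$id"),
    DataSourceInfo("Affy", r"\d+(_[a-z]+)?_at", "https://example.org/affy/$id"),
]


def guess_sources(identifier: str) -> list:
    """Return the names of all data sources whose pattern fits."""
    return [info.name for info in REGISTRY if info.matches(identifier)]


print(guess_sources("ENSG00000122375"))
print(guess_sources("234226_at"))
print(REGISTRY[1].url_for("3643"))
```

Autodetection returns a list because patterns can overlap: a purely numeric identifier fits more than one real-world data source, so a tool should treat the result as a suggestion and let the user confirm.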
The 7 Virtues of Bioinformatics • Solve a problem • Start small • Eat your own dog-food • Attention to detail • Modularity • Design for code re-use • Open Source
A Question to Linus Torvalds Q: “Do you have any tips for people who want to undertake a large open source project?” A: “Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large.… … If it doesn't solve some fairly immediate need, it's almost certainly over-designed.… …You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project”
Also from Linus Torvalds “I'm right and anyone who disagrees is stupid and ugly” “My name is Linus Torvalds and I am your god.”
Code Re-Use BAD Bioinformatician No Twinkie For you! • Reinventing the wheel is one of the 7 Deadly sins of Bioinformatics
Code Re-Use Q: How to design re-usable code? A: Actually use it in more than one project from the start Cytoscape bridgedb PathVisio
Open source • Public money -> Public code • Reproducibility • Academic ideal • Trust • Insurance against vendor lock-in
Open source • Now where are all those free programmers?
Open Source Web site Bug tracker Version control Mailing list
Eat your own dog food • Are you named “alkfdjlkdsf”? • Why not “Hélène O’Brian?” • …or “Bobby Tables”?
Eat your own dog food • Real data has missing values • Real data has commas instead of dots • Real data has duplicate identifiers • Real data starts with “ID” in the first cell* *Which Excel doesn’t like
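Some of these surprises can be handled defensively at import time. The sketch below is one possible approach, not code from BridgeDb or PathVisio: the helper names are hypothetical, the list of missing-value markers is an assumption, and a comma is treated as a decimal separator only (thousands separators are not handled).

```python
from collections import Counter
from typing import Iterable, List, Optional

MISSING_MARKERS = {"", "NA", "N/A", "NAN"}  # assumed spellings of "no value"


def parse_number(cell: str) -> Optional[float]:
    """Parse a measurement, tolerating missing values and decimal commas."""
    text = cell.strip()
    if text.upper() in MISSING_MARKERS:
        return None
    # Assumption: a comma is a decimal separator, never a thousands separator.
    return float(text.replace(",", "."))


def duplicate_ids(identifiers: Iterable[str]) -> List[str]:
    """Return identifiers that occur more than once, ignoring blank cells."""
    counts = Counter(i.strip() for i in identifiers if i.strip())
    return sorted(i for i, n in counts.items() if n > 1)


print(parse_number("3,14"))
print(parse_number("NA"))
print(duplicate_ids(["3643", "94233", "3643", ""]))
```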
Hallway usability testing • Grab a passer-by from the hallway and put them in front of your program • (We usually use students)