Session V: Life Science Identifiers - Use Cases, Future Directions

Explore LSIDs in bioinformatics, their use in GenePattern, sharing samples, and future directions for identifying biological objects structurally. Learn lessons from early implementations and barriers to use.

tfowlkes

Presentation Transcript


  1. Session V: Life Science Identifiers - Use Cases, Future Directions

  2. Recent History • LSIDs are 3 years old • I3C was evaluating AGAVE and BSML • Both encoded IDs as tuples/triples • If we could not agree on a data standard, could we at least agree on how we write the identifiers?

  3. Today • OMG Spec • Google “+LSID +bioinformatics” • 686 results (10/27/04, 2:40pm) • 700 results (10/27/04, 7:20am)

  4. Broad Use Cases

  5. How GenePattern is using LSIDs • Identify analysis tasks and pipelines via LSIDs • Create sharable pipelines referencing tasks via LSIDs • Provide a repository and retrieval for analysis tasks by LSID
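The three uses on this slide (identify tasks, reference them from pipelines, retrieve them from a repository) can be sketched in a few lines. This is only an illustration, not GenePattern's actual API: `Task`, `TaskRepository`, and `resolve` are hypothetical names, and the two LSIDs are the ones shown for Preprocess and SOM Clustering in the ALL/AML example.

```python
from dataclasses import dataclass

# Hypothetical types for illustration; GenePattern's real classes are not shown in the slides.
@dataclass(frozen=True)
class Task:
    lsid: str
    name: str


class TaskRepository:
    """In-memory stand-in for a repository that retrieves tasks by LSID."""

    def __init__(self):
        self._tasks = {}

    def register(self, task):
        self._tasks[task.lsid] = task

    def get(self, lsid):
        if lsid not in self._tasks:
            raise KeyError(f"no task registered for {lsid}")
        return self._tasks[lsid]


ANALYSIS = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis"

repo = TaskRepository()
repo.register(Task(f"{ANALYSIS}:00020:0", "Preprocess"))
repo.register(Task(f"{ANALYSIS}:00029:0", "SOM Clustering"))

# A sharable pipeline stores only LSIDs, never copies of the tasks themselves.
pipeline = [f"{ANALYSIS}:00020:0", f"{ANALYSIS}:00029:0"]


def resolve(pipeline, repo):
    """Look up each referenced task and return the task names in pipeline order."""
    return [repo.get(lsid).name for lsid in pipeline]


print(resolve(pipeline, repo))
```

Because the pipeline holds identifiers rather than task bodies, anyone with access to the same repository can resolve it to the same tasks.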

  6. Example: ALL/AML Analysis • Training Data: all_aml_train, 27 ALL, 11 AML expression samples • Test Data: all_aml_test, 20 ALL, 14 AML expression samples • Preprocess (shown twice): Filter uninformative genes • SOM Clustering: Cluster samples to separate tumor types • Weighted Voting Train-test: Build a classifier and compute its accuracy on a test set • Class Neighbors: Find genes that most closely match a profile • Weighted Voting Cross-Validation: Build a classifier and compute its accuracy using cross-validation • Golub and Slonim et al., 1999

  7. Example: ALL/AML Analysis • Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0 • Training Data: all_aml_train, 27 ALL, 11 AML expression samples • Test Data: all_aml_test, 20 ALL, 14 AML expression samples • Preprocess (shown twice): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0 • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0 • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0 • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0 • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0 • Golub and Slonim et al., 1999
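The identifiers on this slide follow the LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. A minimal parsing sketch follows; `LSID` and `parse_lsid` are hypothetical helper names, and the validation is deliberately loose (it checks only the prefix and the number of colon-separated parts, not the full rules of the specification).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # the revision part is optional


def parse_lsid(text):
    """Split an LSID string into its parts; raise ValueError if it is malformed."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


preprocess = parse_lsid(
    "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0"
)
print(preprocess.authority)   # broad.mit.edu
print(preprocess.object_id)   # 00020
print(preprocess.revision)    # 0
```

Keeping the object ID and revision as strings preserves forms such as `00020`, which would lose their leading zeros if converted to integers.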

  8. LSIDs enable • Reproducible research • exactly repeating an in silico experiment • ‘modernizing’ pipelines to latest • Tracking module provenance • Someday • Data will be available via LSID too…
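Both halves of the reproducibility point rest on the revision part of the LSID: running a pipeline with its stored LSIDs repeats the experiment exactly, while "modernizing" swaps each reference for the newest revision of the same task. A rough sketch of the second case, assuming every LSID carries an integer revision; the function names and revisions 1 and 2 of the Preprocess task are made up for the example.

```python
def split_revision(lsid):
    """Return (LSID without revision, revision as int). Assumes an integer revision is present."""
    base, _, revision = lsid.rpartition(":")
    return base, int(revision)


def latest_revisions(available):
    """Map each revision-less LSID to the highest revision found in `available`."""
    latest = {}
    for lsid in available:
        base, revision = split_revision(lsid)
        if revision > latest.get(base, -1):
            latest[base] = revision
    return latest


def modernize(pipeline, available):
    """Rewrite each task reference to the newest available revision of the same task."""
    latest = latest_revisions(available)
    updated = []
    for lsid in pipeline:
        base, revision = split_revision(lsid)
        newest = max(revision, latest.get(base, revision))
        updated.append(f"{base}:{newest}")
    return updated


ANALYSIS = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis"

# Revisions 1 and 2 of task 00020 are hypothetical, added only to show the upgrade.
available = [
    f"{ANALYSIS}:00020:0",
    f"{ANALYSIS}:00020:1",
    f"{ANALYSIS}:00020:2",
    f"{ANALYSIS}:00029:0",
]
pipeline = [f"{ANALYSIS}:00020:0", f"{ANALYSIS}:00029:0"]

print(modernize(pipeline, available))
```

The original list is left untouched, so the exact-repeat version of the pipeline remains available alongside the modernized one.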

  9. Future… • Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0 • Training Data: urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0 • Test Data: urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0 • Preprocess (shown twice): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0 • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0 • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0 • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0 • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0 • Golub and Slonim et al., 1999

  10. Other LSID use at the Broad • Sample management • Sharing samples (tissues, clones, etc.) between program groups • LSIDs identify samples • Permits scientists to find all experiments done with a sample in any Broad program
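Finding every experiment that used a sample amounts to an index keyed by the sample's LSID. The sketch below is an assumption about how such a lookup could work, not a description of the Broad's system; `ExperimentIndex`, the program and experiment names, and the sample LSID itself are all invented for illustration.

```python
from collections import defaultdict


class ExperimentIndex:
    """Hypothetical cross-program index from sample LSID to experiments."""

    def __init__(self):
        self._by_sample = defaultdict(set)

    def record(self, program, experiment, sample_lsid):
        """Note that `experiment` in `program` used the sample with this LSID."""
        self._by_sample[sample_lsid].add((program, experiment))

    def experiments_for(self, sample_lsid):
        """Return all (program, experiment) pairs that used the sample, sorted."""
        return sorted(self._by_sample.get(sample_lsid, set()))


# Made-up sample LSID; the namespace "samples.tissue" is not from the slides.
SAMPLE = "urn:lsid:broad.mit.edu:samples.tissue:t0001:1"

index = ExperimentIndex()
index.record("Program B", "proteomics-run-3", SAMPLE)
index.record("Program A", "expression-run-12", SAMPLE)

print(index.experiments_for(SAMPLE))
```

The lookup works across programs only because every group records the same identifier for the same physical sample, which is the role the LSID plays here.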

  11. Other LSID use at the Broad • GeneCruiser web service • Annotation web service for microarray probes • Maps probe set identifiers to GO, GenBank, SwissProt, etc. • Interface returns LSIDs to these other sources for their identifiers

  12. Use Cases and Future Directions • What does it actually mean to identify a biological object such as "a gene"? • How does LSID address structural elements of biological and chemical objects? • What are the lessons learned from early implementations of LSID?

  13. Use Cases and Future Directions • What granularity of object do we identify? • Should an LSID be a URI, not a URN? • Should virtual persistent identifiers for derived/calculated properties be used? • What are the barriers to widespread use? • Data/Metadata split: is this a problem? • Phil Lord mentioned this at the end of yesterday, in the MyGrid talk

  14. Best LSID quote… • “LSIDs are in a sense just a sociological con trick, since they are nothing more than cheap and cheerful URNs” – David Shotton