LIMS SOLR Integration
Jake Lin
Shmulevich Lab
jake.lin@systemsbiology.org

LIMS for Systems Genetics
• Systems Genetics: the study of complex traits (phenotypes) that result from interactions among multiple genotypes and the environment
• LIMS web app: content and process management
• Spring MVC with Addama components
• Aid research and improve operations
• Sample and experiment tracking
• Annotations
• Visualization of relationships and results
• Pipelines: bash + python + http
• Data sharing

Resources and Content
• 8 Natural Variant Crossings
• ~3000 progeny
• 67 Sequencing submissions
• 46 Multiplexed ~48 degrees
• 400,000 progeny images
• ~10X more content

Robust Search
Heart of Information Management
• Simple & Fast
• Accurate & Meaningful

XPath Search
• Hierarchy
• file directory
• RESTful - http/ajax
• Domain
• Drawbacks:
• Slow
[Diagram labels: LIMS Web, Search, Addama, JCR]

SOLR + Lucene
• Wraps Lucene - .jar
• Doug Cutting
• Apache
• Matured
• Ported to C++/C#, Python, Perl, ...
• IBM, Apple, ...
• High performance text search engine library
• indexing
• querying
• Simple Configuration
• Web admin
• REST/HTTP APIs
• solr.war
[Diagram labels: LIMS Web, Search]

SOLR schema.xml
$TOMCAT_HOME/webapps/ROOT/solr/conf/schema.xml

    <!-- progeny -->
    <field name="ypgKey" type="string" indexed="true" stored="true"/>
    <field name="ypgMatingType" type="text" indexed="true" stored="true"/>
    <field name="ypgGenotype" type="text" indexed="true" stored="true"/>
    <field name="ypgParentA" type="text" indexed="true" stored="true"/>
    <field name="ypgParentAlpha" type="text" indexed="true" stored="true"/>
    <field name="ypgSiblings" type="text" indexed="true" stored="true"/>
    <field name="ypgCrossingRef" type="text" indexed="true" stored="true"/>
    ...
    <!-- composite -->
    <field name="ypgFields" type="text" indexed="true" stored="true" multiValued="true" />
    <copyField source="*Key" dest="limsKey"/>
    <copyField source="ypg*" dest="ypgFields"/>
    <copyField source="*" dest="allFields" />

• Field types determine tokenizing and indexing
• They affect 'fuzzy' and 'like' search
• http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

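To make the effect of the three copyField rules concrete, here is a small sketch that models which composite fields a given source field feeds. It is a simplified model of the glob matching (exact name, "*", "prefix*" or "*suffix"), not Solr's own implementation; `matchesCopyFieldSource` and `destinationsFor` are hypothetical names introduced only for illustration.

```javascript
// Simplified model of copyField source patterns (assumption: only exact
// names, "*", "prefix*" and "*suffix" patterns are considered).
function matchesCopyFieldSource(pattern, fieldName) {
  if (pattern === "*") {
    return true;
  }
  if (pattern.charAt(0) === "*") {
    var suffix = pattern.slice(1);
    return fieldName.length >= suffix.length &&
      fieldName.slice(-suffix.length) === suffix;
  }
  if (pattern.charAt(pattern.length - 1) === "*") {
    return fieldName.indexOf(pattern.slice(0, -1)) === 0;
  }
  return pattern === fieldName;
}

// The three rules from the schema.xml slide.
var copyFieldRules = [
  { source: "*Key", dest: "limsKey" },
  { source: "ypg*", dest: "ypgFields" },
  { source: "*", dest: "allFields" }
];

// Hypothetical helper: lists every destination a source field is copied to.
function destinationsFor(fieldName) {
  return copyFieldRules
    .filter(function (rule) {
      return matchesCopyFieldSource(rule.source, fieldName);
    })
    .map(function (rule) {
      return rule.dest;
    });
}

console.log("ypgKey ->", destinationsFor("ypgKey").join(", "));
console.log("ypgGenotype ->", destinationsFor("ypgGenotype").join(", "));
```

Under this model, ypgKey feeds all three composite fields, while ypgGenotype feeds only ypgFields and allFields.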
SOLR HTTP Post
# Update/Insert - CSV
    curl 'http://saskatoon:8080/solr/update/csv?commit=true' --data-binary @YCR_YPGAll.csv -H 'Content-type:text/plain; charset=utf-8'
# Update - JSON
    curl 'http://saskatoon:8080/solr/update/json?commit=true' --data-binary @YCR_YPG500.json -H 'Content-type:application/json'
# Delete
    curl 'http://saskatoon:8080/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<delete><query>ypgKey:testKey_001</query></delete>'

Data Migration
• Update/Insert - CSV: use the LIMS built-in function that exports results to CSV
• Import from database: http://wiki.apache.org/solr/DataImportHandler

SOLR HTTP Get
// Find all progeny for YCR6
    http://systemsgenetics.systemsbiology.net:8080/solr/select/?q=ypgCrossingRef:YCR6&wt=json&rows=5000&fl=ypgKey,ypgBoxNumber,ypgCrossingRef,ypgMatingType,ypgGenotype,ypgParentA,ypgParentAlpha,ypgAlias,ypgTetrad,ypgStatus,ypgPosition,ypgComments,ypgDateFrozen,ypgSiblings
QTime: 9 ms
    {"response":{"numFound":400,"docs":[{"ypgKey":"ypgX",...}, {"ypgKey":"ypgX",...}, ...]}}
• Append &hl=true&hl.fl=ypgPosition,ypgStatus to highlight matches in those fields

More get examples
// range
    http://saskatoon:8080/solr/select/?q=yoBoxNumber:[1%20TO%202]&wt=json
// AND + OR
    http://saskatoon:8080/solr/select/?q=(yoBoxNumber:[1%20TO%202]%20AND%20yoFields:wine)&wt=json
...

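The range and boolean examples encode spaces by hand as %20. A minimal sketch of a helper that does the encoding instead, assuming the same /solr/select/ endpoint; `buildSelectUrl` is a hypothetical name and is not part of the LIMS code. Note that encodeURIComponent also escapes ":" and the square brackets, so the output differs textually from the hand-written URLs while decoding to the same query.

```javascript
// Hypothetical helper: builds a Solr select URL from an unencoded query.
// baseUrl and the extra options are assumptions chosen for illustration.
function buildSelectUrl(baseUrl, query, options) {
  var params = { q: query, wt: "json" };
  options = options || {};
  Object.keys(options).forEach(function (key) {
    params[key] = options[key];
  });
  // Encode every name and value so spaces and brackets are safe in a URL.
  var parts = Object.keys(params).map(function (name) {
    return encodeURIComponent(name) + "=" +
      encodeURIComponent(String(params[name]));
  });
  return baseUrl + "/solr/select/?" + parts.join("&");
}

var rangeUrl = buildSelectUrl("http://saskatoon:8080", "yoBoxNumber:[1 TO 2]");
var boolUrl = buildSelectUrl(
  "http://saskatoon:8080",
  "(yoBoxNumber:[1 TO 2] AND yoFields:wine)",
  { rows: 100 }
);

console.log(rangeUrl);
console.log(boolUrl);
```

Keeping the query readable in code and encoding it in one place avoids mistakes such as a forgotten %20 inside a range expression.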
SOLR ExtJs AJAX Get

    // Build the select URL; fl lists the fields to return.
    function getYPGSolrUrl(searchTerm) {
        // Encode the term so spaces and special characters survive the request.
        return "/solr/select/?" +
            "q=" + encodeURIComponent(searchTerm) + "&wt=json&rows=5000&" +
            "fl=ypgKey,ypgBoxNumber,ypgCrossingRef,ypgMatingType,ypgGenotype,ypgParentA,ypgParentAlpha," +
            "ypgAlias,ypgTetrad,ypgStatus,ypgPosition,ypgComments,ypgDateFrozen,ypgSiblings";
    }

    function goSearch(index, ypgSearchInput, ypgSearchOption) {
        // Fall back to the default term when the input is empty.
        if (ypgSearchInput == '') {
            ypgSearchInput = 'YPG';
        }
        ypgSearchInput = checkWildcard(ypgSearchInput);
        var searchWin = getSearchLoadingWindow("yprogeny-");
        var searchUrl = getYPGSolrUrl(ypgSearchOption + ":" + ypgSearchInput);
        // Show a busy indicator while the request is in flight.
        searchWin.on("show", function () {
            var sb = Ext.getCmp("yprogeny-search-statusbar");
            sb.showBusy();
        });
        searchWin.show();
        Ext.Ajax.request({
            url: searchUrl,
            method: "GET",
            success: function (response) {
                var searchResultObj = Ext.util.JSON.decode(response.responseText);
                myYPGData = [];
                loadYPGSearchResult(index, searchResultObj.response, function () {
                    Ext.getDom("sample-search-result-list").innerHTML = "";
                    Ext.getDom("yo-form").innerHTML = "";
                    searchWin.close();
                    renderYPGSearchResult();
                });
            },
            failure: function () {
                // Close the loading window so the UI is not left busy.
                searchWin.close();
                eventManager.fireStatusMessageEvent({
                    text: "Search Results failed for url:" + searchUrl,
                    level: "error"
                });
            }
        });
    }

SOLR ExtJs AJAX Post

    /* jsonObj contains new and existing annotation values from form */
    function postSolrUpdates(jsonObj, callback) {
        Ext.Ajax.request({
            url: "/solr/update/json?commit=true",
            method: "POST",
            // Request body has the shape {"add": {"doc": {...}}}.
            jsonData: { add: { doc: jsonObj } },
            success: function () {
                callback();
            },
            failure: function () {
                // Serialize the record; plain concatenation would print "[object Object]".
                Ext.Msg.alert("Error", "Failed updating/adding record - please contact Infocore with this info:" +
                    Ext.util.JSON.encode(jsonObj));
            }
        });
    }

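The body sent to /solr/update/json wraps the record as {"add": {"doc": {...}}}. A small sketch that builds and serializes that body without ExtJS, so the shape can be inspected on its own; `buildAddPayload` is a hypothetical helper and the field values are made-up sample data.

```javascript
// Hypothetical helper: wraps one document in the add/doc envelope
// used by the JSON update request above.
function buildAddPayload(doc) {
  return { add: { doc: doc } };
}

// Sample record (illustrative values only).
var payload = buildAddPayload({
  ypgKey: "testKey_001",
  ypgCrossingRef: "YCR6"
});

// This string is what would be posted with Content-type application/json.
var body = JSON.stringify(payload);
console.log(body);
```

The same string could be saved to a file and posted with the curl JSON update command from the HTTP Post slide.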
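SOLR Java HttpClient

    // Imports assume Commons HttpClient 3.x, Commons IO, org.json and JUnit 4.
    import java.io.IOException;
    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.methods.StringRequestEntity;
    import org.apache.commons.httpclient.params.HttpMethodParams;
    import org.apache.commons.io.IOUtils;
    import org.json.JSONException;
    import org.json.JSONObject;
    import static org.junit.Assert.fail;

    public void testPost(String url, JSONObject jsonObject) {
        HttpClient client = new HttpClient();
        PostMethod post = new PostMethod(url);
        // Retry up to 3 times on recoverable I/O errors.
        post.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(3, false));
        try {
            // Wrap the document as {"add": {"doc": {...}}}.
            JSONObject postObject = new JSONObject();
            postObject.put("doc", jsonObject);
            JSONObject addObject = new JSONObject();
            addObject.put("add", postObject);
            // Sample stored document:
            //"docs":[{"limsadminKey":"dudley_limsadminkey","limsKey":"dudley_limsadminkey","limsadminYoCount":791,
            // "limsadminYoMaxNum":512,"limsadminYoBoxNum":7,"limsadminYoPosition":"G3",
            // "allFields":["dudley_limsadminkey","791","512","7","G3","420","420","6","B5","1768","1768","20","A1","367","367","531","531","240","240","344","344"]}]}}
            // The request entity carries the JSON body and its content type.
            post.setRequestEntity(new StringRequestEntity(addObject.toString(), "application/json", "UTF-8"));
            int statusCode = client.executeMethod(post);
            System.out.println("Post " + url + "\nStatus code:" + statusCode);
            System.out.println(IOUtils.toString(post.getResponseBodyAsStream(), "UTF-8"));
        } catch (IOException e) {
            e.printStackTrace();
            fail("POST failed: " + e.getMessage());
        } catch (JSONException ej) {
            ej.printStackTrace();
            fail("Could not build JSON: " + ej.getMessage());
        } finally {
            // Always return the connection, even when the request fails.
            post.releaseConnection();
        }
    }
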
Notes and observations
• Updates act as inserts: the existing doc with the same key is deleted and replaced
• Wildcard (*) searches must use lowercase terms
• Keys must be a primitive type
• Index corruption observed with Java 1.7
• Start/stop Tomcat:
    [www@saskatoon ROOT]$ ../../bin/shutdown.sh
    [www@saskatoon ROOT]$ ../../bin/startup.sh

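The lowercase rule for wildcard searches can be applied on the client before the query is sent. A minimal sketch follows; `normalizeWildcardTerm` is a hypothetical helper and not the LIMS checkWildcard function, whose implementation is not shown in these slides. It assumes the searched field is a text type whose analyzer lowercases values at index time; for a string field such as ypgKey, lowercasing the term could prevent a match.

```javascript
// Hypothetical helper: lowercases a search term only when it contains
// a wildcard character, leaving exact-match terms untouched.
function normalizeWildcardTerm(term) {
  var hasWildcard = term.indexOf("*") !== -1 || term.indexOf("?") !== -1;
  return hasWildcard ? term.toLowerCase() : term;
}

console.log(normalizeWildcardTerm("YCR6*")); // ycr6*
console.log(normalizeWildcardTerm("YCR6"));  // YCR6
```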
References
Lucene in Action - Manning Press
http://lucene.apache.org/solr/
http://lucene.apache.org/solr/tutorial.html
http://wiki.apache.org/solr/SchemaXml
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http://www.systemsbiology.org/Scientists_and_Research/Faculty_Groups/Dudley_Group
Science Perspective
http://www.systemsbiology.org/Scientists_and_Research/Faculty_Groups/Shmulevich_Group
In progress
http://code.google.com/p/lims-systemsgenetics/

Thanks
Shmulevich Lab: Andrea Eakin, Hector Rovira, John Boyle, Ilya Shmulevich
Dudley Lab: Gareth Cromie, Cathy Ludlow, Patrick May, Adrian Scott, Aimee Dudley