Knowledge Workshop Questions and Panel caGrid Knowledge Center February 2011
Questions - Security Certificate Duration Question: How long do grid certificates last? Answer: The durations are configurable when Dorian is installed. Typically, user certificates are valid for a maximum of 12 hours and host certificates are valid for several years. Action Item: Consider a feature request to allow users to view certificate lifetime information in GAARDS. Question: Can user credentials be valid for more than 12 hours? Answer: No. 12 hours is the maximum supported user proxy duration using Dorian.
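The action item above asks for a way to see how much lifetime a certificate has left. As a minimal sketch of that calculation, using only the standard Java `X509Certificate` API and not any GAARDS or Dorian code, the remaining validity can be derived from the certificate's `notAfter` date. The class name `CertificateLifetime` and its helper methods are hypothetical, and the 12-hour lifetime in `main` is simply the typical user value quoted in the answer.

```java
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of caGrid: reports how long a certificate remains valid.
public class CertificateLifetime {

    /** Time left before notAfter, or Duration.ZERO if the certificate has already expired. */
    static Duration remaining(Instant notAfter, Instant now) {
        Duration left = Duration.between(now, notAfter);
        return left.isNegative() ? Duration.ZERO : left;
    }

    /** Convenience overload for a certificate that has already been loaded. */
    static Duration remaining(X509Certificate cert, Instant now) {
        return remaining(cert.getNotAfter().toInstant(), now);
    }

    /** Human-readable summary such as "2h 30m remaining" or "expired". */
    static String describe(Duration left) {
        if (left.isZero()) {
            return "expired";
        }
        long hours = left.toHours();
        long minutes = left.toMinutes() % 60;
        return hours + "h " + minutes + "m remaining";
    }

    public static void main(String[] args) {
        // Assumed example: a user credential issued at 08:00 UTC with a 12-hour lifetime.
        Instant issued = Instant.parse("2011-02-01T08:00:00Z");
        Instant notAfter = issued.plus(Duration.ofHours(12));
        Instant now = Instant.parse("2011-02-01T17:30:00Z");
        System.out.println(describe(remaining(notAfter, now))); // prints "2h 30m remaining"
    }
}
```

A client could call `remaining(cert, Instant.now())` on whatever credential it holds and warn the user shortly before the value reaches zero.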
Questions - Discovery Service Discovery Question: Can the Discovery Client perform fine-grained searches using concept codes? Answer: Yes. The Discovery Client provides the following operations: • discoverDataServicesByModelConceptCode • discoverServicesByDataConceptCode • discoverServicesByConceptCode • discoverServicesByOperationConceptCode
Questions - FQP FQP Question: When performing a federated query it is necessary to delegate your credential to the FQP service. This requires a user to know the FQP service identity. Where do I get that? Answer: Correct. Currently, you cannot look up the FQP identity in GAARDS. A new feature in caGrid 1.4 will allow Dorian to be configured so that you can find the identity of any service. Until then: Training: /O=caBIG/OU=caGrid/OU=Training/OU=Services/CN=fqp.training.cagrid.org Production: /O=caBIG/OU=caGrid/OU=LOA1/OU=Services/CN=cagrid-fqp.nci.nih.gov
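Until the identity can be looked up automatically, a client has to carry the two identities above itself. The following is a minimal sketch of one way to do that in plain Java; it does not use any caGrid delegation API. The class name `FqpIdentity`, the map keys `training` and `production`, and the helper methods are hypothetical. Only the two identity strings come from the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of caGrid: keeps the known FQP service identities in one place.
public class FqpIdentity {

    private static final Map<String, String> FQP_IDENTITIES = new LinkedHashMap<>();
    static {
        // Identity strings quoted from the workshop answer; the keys are made up for this sketch.
        FQP_IDENTITIES.put("training",
                "/O=caBIG/OU=caGrid/OU=Training/OU=Services/CN=fqp.training.cagrid.org");
        FQP_IDENTITIES.put("production",
                "/O=caBIG/OU=caGrid/OU=LOA1/OU=Services/CN=cagrid-fqp.nci.nih.gov");
    }

    /** Returns the FQP service identity for the named grid, or fails loudly if it is unknown. */
    static String identityFor(String grid) {
        String identity = FQP_IDENTITIES.get(grid);
        if (identity == null) {
            throw new IllegalArgumentException("Unknown grid: " + grid);
        }
        return identity;
    }

    /**
     * Extracts the CN value from a slash-separated identity.
     * Naive split: assumes no component value itself contains a '/'.
     */
    static String commonName(String identity) {
        for (String part : identity.split("/")) {
            if (part.startsWith("CN=")) {
                return part.substring(3);
            }
        }
        throw new IllegalArgumentException("No CN component in: " + identity);
    }

    public static void main(String[] args) {
        String identity = identityFor("training");
        System.out.println(identity);
        System.out.println(commonName(identity)); // prints "fqp.training.cagrid.org"
    }
}
```

Keeping the strings in one lookup means that only this table needs to change once the identity can be obtained from Dorian instead.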
Questions – FQP (continued) Question: What is the maximum number of concurrent FQP jobs that can be processed by the FQP service? Answer: This depends upon the configuration of your FQP host and the memory allocated to the FQP service container.
Panel Discussion Purpose: To gather deployment and usage information from users who are utilizing caGrid on a daily basis. Panel • UAB – Harsh Taneja, Nive Thota • Utah – Ron Price • Roswell Park Cancer Center – Ken Quinn • Chronic Lymphocytic Leukemia Research Consortium – Bill Stephens • CVRG, ACTSI – Tahsin Kurc and Stephen Granite • TRIAD – David Ervin
Question 1: How can NCI and/or Grid KC better support caGrid 1.x? • Bill: Resurrect Boot Camp training. Anyone who is trying to set up a semantically annotated grid service is somewhat on their own because Boot Camps stopped in 2009. This causes KCs to support users through the process, which is not optimal. Boot Camps used to cover everything from domain modeling through to service creation. • Ken: Agreed on training. We hired a person to do the grid project. There's a wealth of documentation out there, but it's difficult to enter into it without help. • One approach is to make the product easier to use. Or you hire the right people to get it done. • Currently there is a class of individuals / institutions that need a lot of help to get the infrastructure up and running. Maybe the KC supports the more technical class of users.
Question 1: • Would be interesting to get feedback from Service Providers. Are they really able to support the institutions that come to them? What are the kinds of projects that are being supported by the SSPs? From the outside, it appears that there are institutions that just want one of the caBIG tools up and running.
Question 2: List 3 additions / modifications to caGrid that, if carried out, would have made your project easier to accomplish. • Stephen: Thinking from where we originally started, what we did not have was a tool to test the services. Take the GAARDS UI and extend it for testing services. Would be nice if there was a generic test tool that could be extended and used to test what is going on. Would be nice to be able to have it invoke methods and see if they are working properly. (Maybe Taverna can do this?) • Bill / Dave: Konrad Rokicki presented a Grid Health Dashboard recently that tests all services in the Index Service and grades them based upon queries performed against each exposed object. We could create a version of this using the caGrid Client Development Guide software. • Bill: 1) Inability to serialize more than 2 levels of data between grid client and grid services. (may be a bug in our services) 2) Data model can accurately represent the graph but be incorrect for the service. Would be nice to be able to compare old & new models and apply changes to grid services rather than totally recreate. Small model changes require a full regeneration. This can take a tremendous amount of development effort to perform.
Bill: • Inability to serialize more than 2 levels of data between grid client and grid services. (may be a bug in our services) • Data model can accurately represent the object graph but be incorrect for the service. Would be nice to be able to compare old & new models and apply changes to grid services rather than totally recreate the service. Even small model changes require a full regeneration. This can take a tremendous amount of development effort to perform. • Ron Price: • Programmatic federated query tutorials. • If you're doing custom client side development, you need to hunt down the jars. This is difficult.
Question 2: • Tahsin: • Security in caGrid is still the most complicated area. If you want to deploy your own caGrid instance, the most complicated, time-consuming, and frustrating part is setting up the security services. This part should be easier to set up and test. • We have users with their own account management. Large-scale applications tend to be better at integrating with other systems. Would be good to have a mechanism built into caGrid to integrate these other systems and map them. Similar to single sign on. caGrid has done a good job at integrating with other caBIG applications, but not so much with other tools. • Querying the grid. Need the ability to query multiple targets. We support CQL right now. One question we need to think about is whether or not we need to provide facilities for other types of query languages. The others have tooling that can be helpful. • Service Development: Start with the application as if you are going to deploy on your own local machine and then be able to stub out. Start from the libraries and then turn it into a caGrid application.
Question 2: • David • Some of the query capability (CQL) is pretty good, however what would be a huge win would be to be able to run the query all at once. • Serialization framework has turned out to be a major pain. Not really a clean way to plug in other serialization frameworks. • Niveditha • Some of the documentation was available, but it was hard to know whether the functionality that we wanted was available. Was the functionality there? We needed to read a lot of documentation to find answers. • Issues with queries. In caCORE, it would be good to know how the data maps, how the associations translate, and about overflow errors. • It would be nice if CQL translated into cleaner SQL.
Question 2: • Harsh • Would be nice to have a DCQL designer, a tool to be able to graphically design the query. Need to visualize. • Dedicated federated query analyzer. Do some caching at the end of the data. Be able to know how to restructure the query. • Comparative documentation. If you go down one path (secure / non-secure environments), what are the differences that are needed? • Ken • Boot Camp would've helped immensely. • caGrid world needs to be cognizant that there are people in the Windows VM world. • Query robustness, which has already been addressed. • Grid Portal query tool. Doesn't appear to be that useful and does not have an intuitive interface. Was ruled out for their internal grid project b/c it wasn't attractive. • Wasn't anything off the shelf that enabled people to work in other environments.
Question 2: • General comments: • Easy to use CQL & DCQL tool integrated into the portal. • General need for support for other environments • Perl: see http://search.cpan.org/~bosborne/caGRID-CQL1-1.0.2/ • SAS • R: see: https://cabig.nci.nih.gov/tools/RProteomics • Interface with PACS
Question 3: List three additions / modifications to caGrid that, if carried out, would have allowed you to make your project more effective. • CQL translator would be good to have. • We have our information stored in Excel spreadsheets / Access. Why not an Excel / Access Data Service? • Note: Some of this is being prototyped by the .NET Team. • Apache POI could support Excel services, too. • Mechanisms to scale up Access to MSSQL.
Question 4: If you had to redeploy your project without using caGrid, what software would you use and how would this alternate software stack impact your project? • Stephen • RESTful services, using SPARQL queries. • Software wise, much more of a Java shop, but Perl, PHP, .NET, C++ could also be used. • Mobile agents that are able to get the data. • Feeding a data integrator. • Bill • Would have used a web application framework: Groovy & Grails, Spring Roo and JRuby. Because they are JVM-based they can use the caGrid client .jar files to integrate with the grid. We tested this with Grails. • Unfortunately these solutions are not MDA.
Question 4: • Ron • Federated query, why re-invent it? • Recreate as SOAP / REST services. • Use Java EE for security. • Mark • LIFT – a secure web framework using Scala • UAB • We were looking at federated approach. Looked at other frameworks. • Looking at I2B2 and how their queries work and if there are any advantages to that approach. • Looking at having in-house development from scratch, using lighter weight services. Would be a big deal to redesign. • Web 3.0, RDF, SPARQL. • SPARQL to CQL converter.
Question 4: • Ken • If we had taken a different approach and didn't have development staff, we would have looked at GeneTegra. It uses semantic web queries and SPARQL queries. We're evaluating GeneTegra. It is a tool that we can hand off and let others develop queries. • Worried about the semantic infrastructure activities. Where is grid going to take us into semantics in the future?