280 likes | 455 Views
GATE development hints. Reporting bugs Submitting a patch The user guide Continuous integration. Bugs, feature requests. Use the tracker on SourceForge http://sourceforge.net/projects/gate/support Give as much detail as possible
E N D
GATE development hints • Reporting bugs • Submitting a patch • The user guide • Continuous integration
Bugs, feature requests • Use the tracker on SourceForge • http://sourceforge.net/projects/gate/support • Give as much detail as possible • GATE version, build number, platform, Java version (1.5.0_15, 1.6.0_03, etc.) • Steps to reproduce • Full stack trace of any exceptions, including "Caused by…" • Check whether the bug is already fixed in the latest nightly build
Patches • Use the patches tracker on SourceForge • Best format is an svn diff against the latest subversion • Save the diff as a file and attach it, don't paste the diff into the bug report. • We generally don't accept patches against earlier versions
Patches (2) • GATE must compile and run on Java 5 • Not sufficient to set source="1.5" and target="1.5" but compile on Java 6 • This doesn't prevent you calling classes/methods that don't exist in 5 • Test your patch on Java 5 before submitting
The User Guide • Everything in GATE is (theoretically) documented in the GATE User Guide • http://gate.ac.uk/userguide • Every change to the core should be mentioned in the change log • http://gate.ac.uk/userguide/chap:changes • User guide is written in LaTeX
Updating the user guide • Lives in subversion • https://gate.svn.sourceforge.net/svnroot/gate/userguide/trunk • Build requires pdflatex, htlatex (tex4ht package), sed, make, etc. • On Windows, use Cygwin • Download http://gate.ac.uk/sale/big.bib and put in directory above the .tex files
Updating the user guide (2) • Edit the .tex files • Graphics, screenshots, etc. should be .png • Check in changes to .tex files, the PDF and HTML are regenerated automatically by…
Hudson • Continuous integration platform • Automatically rebuilds GATE and user guide (among others) whenever they change • Also does a clean build of GATE every night • Nightly builds published at http://gate.ac.uk/download/snapshots
Hudson • Junit test results available for each build • http://gate.ac.uk/hudson
Running GATE Embedded in Tomcat (or any multithreaded system) Issues and tricks
Introduction • Scenario: • Implementing a web service (or other web application) that uses GATE Embedded to process requests. • Want to support multiple concurrent requests • Long running process - need to be careful to avoid memory leaks, etc. • Example used is a plain HttpServlet • Principles apply to other frameworks (struts, Spring MVC, Metro/CXF, Grails…)
Setting up • GATE libraries in WEB-INF/lib • gate.jar + JARs from lib • Usual GATE Embedded requirements: • A directory to be "gate.home" • Site and user config files • Plugins directory • Call Gate.init() once (and only once) before using any other GATE APIs
Initialisation using a ServletContextListener • ServletContextListener is registered in web.xml • Called when the application starts up <listener> <listener-class>gate.web..example.GateInitListener</listener-class> </listener> public void contextInitialized(ServletContextEvent e) { ServletContext ctx = e.getServletContext(); File gateHome = new File(ctx.getRealPath("/WEB-INF")); Gate.setGateHome(gateHome); File userConfig = new File(ctx.getRealPath("/WEB-INF/user.xml")); Gate.setUserConfigFile(userConfig); // site config is gateHome/gate.xml // plugins dir is gateHome/plugins Gate.init(); }
GATE in a multithreaded environment • GATE PRs are not thread-safe • Due to design of parameter-passing as JavaBean properties • Must ensure that a given PR/Controller instance is only used by one thread at a time
First attempt: one instanceper request • Naïve approach - create new PRs for each request public void doPost(request, response) { ProcessingResource pr = Factory.createResource(...); try { Document doc = Factory.newDocument(getTextFromRequest(request)); try { // do some stuff } finally { Factory.deleteResource(doc); } } finally { Factory.deleteResource(pr); } } Many levels of nested try/finally: ugly but necessary to make sure we clean up even when errors occur. You will get very used to these…
Problems with this approach • Guarantees no interference between threads • But inefficient, particularly with complex PRs (large gazetteers, etc.) • Hidden problem with JAPE: • Parsing a JAPE grammar creates and compiles Java classes • Once created, classes are never unloaded • Even with simple grammars, eventually OutOfMemoryError (PermGen space)
Second attempt: using ThreadLocals • Store the PR/Controller in a thread local variable private ThreadLocal<CorpusController> controller = new ThreadLocal<CorpusController>() { protected CorpusController initialValue() { return loadController(); } }; private CorpusController loadController() { //... } public void doPost(request, response) { CorpusController c = controller.get(); // do stuff with the controller }
Better than attempt 1… • Only initialise resources once per thread • Interacts nicely with typical web server thread pooling • But if a thread dies, no way to clean up its controller • Possibility of memory leaks
A solution: object pooling • Manage your own pool of Controller instances • Take a controller from the pool at the start of a request, return it (in a finally!) at the end • Number of instances in the pool determines maximum concurrency level
Blocks if the pool is empty: use poll() if you want to handle empty pool yourself Simple example private BlockingQueue<CorpusController> pool; public void init() { pool = new LinkedBlockingQueue<CorpusController>(); for(int i = 0; i < POOL_SIZE; i++) { pool.add(loadController()); } } public void doPost(request, response) { CorpusController c = pool.take(); try { // do stuff } finally { pool.add(c); } } public void destroy() { for(CorpusController c : pool) Factory.deleteResource(c); }
Exporting the grunt work -the Spring Framework • Spring Framework • http://www.springsource.org/ • Handles application startup and shutdown • Configure your business objects and connections between them using XML • GATE provides helpers to initialise GATE, load saved applications, etc. • Built-in support for object pooling • Web application framework (Spring MVC) • Used by other frameworks (Grails, CXF, …)
Initialising GATE with Spring <beans xmlns="http://www.springframework.org/schema/beans" xmlns:gate="http://gate.ac.uk/ns/spring"> <gate:init gate-home="/WEB-INF" plugins-home="/WEB-INF/plugins" site-config-file="/WEB-INF/gate.xml" user-config-file="/WEB-INF/user-gate.xml"> <gate:preload-plugins> <value>/WEB-INF/plugins/ANNIE</value> </gate:preload-plugins> </gate:init> </beans>
Loading a saved application • scope="prototype" means create a new instance each time we ask for it • Default is singleton - one and only one instance <gate:saved-application id="myApp" location="/WEB-INF/application.xgapp" scope="prototype" />
Spring servlet example • Spring provides HttpRequestHandler interface to manage servlet-type objects with Spring • Declare an HttpRequestHandlerServlet in web.xml with the same name as the Spring bean
Spring servlet example • Write the handler assuming single-threaded access • Will use Spring to handle pooling for us public class MyHandler implements HttpRequestHandler { public void setApplication(CorpusController app) { ... } public void handleRequest(request, response) { Document doc = Factory.newDocument(getTextFromRequest(request)); try { // do some stuff with the app } finally { Factory.deleteResource(doc); } } }
Tying it together • web.xml <!-- set up Spring --> <listener> <listener-class> org.springframework.web.context.ContextLoaderListener </listener-class> </listener> <!-- servlet --> <servlet> <servlet-name>mainHandler</servlet-name> <servlet-class> org.springframework.web.context.support.HttpRequestHandlerServlet </servlet-class> </servlet>
Tying it together (2) • applicationContext.xml <gate:init ... /> <gate:saved-application id="myApp" location="/WEB-INF/application.xgapp" scope="prototype" /> <bean id="myHandlerTarget" class="my.pkg.MyHandler" scope="prototype"> <property name="application" ref="myApp" /> </bean> <bean id="handlerTargetSource" class="org.springframework.aop.target.CommonsPoolTargetSource"> <property name="targetBeanName" value="myHandlerTarget" /> <property name="minIdle" value="3" /> <property name="maxIdle" value="3" /> <property name="whenExhaustedActionName" value="WHEN_EXHAUSTED_BLOCK" /> </bean> <bean id="mainHandler" class="org.springframework.aop.framework.ProxyFactoryBean"> <property name="targetSource" ref="handlerTargetSource" /> </bean>