This presentation describes the development and deployment of a language detection grid service, covering its interface, its implementation, the problems encountered, and planned future work.
Deployment of a Language Detector Grid Service Introduction to Grid Computing Felix Hageloh Roberto Valenti University of Amsterdam, 02-11-2005
Overview • Introduction • Required Steps • Our Service • Introduction • The basic idea • Use Case • Interface • Implementation • Problems Encountered • Future Work • Conclusions • Questions
Introduction • Our chosen task: Grid Services • Task Goals: • Build a grid service. • Aggregate the service with another to provide additional, higher-level services
Steps • Get access to the systems • Authentication • Security issues • Obtain User Certificate • Obtain Host Certificate • Implement the service • Create required files • WSDL • QNames • WSDD • JNDI • Compile and create GAR file • As Globus user: • Deploy service • Start container
But you all know this… So… we jump to our service.
Our Service: Introduction • We were asked to implement a useful service that could be integrated with other services • We are AI students so… Let’s Merge AI and Grid Computing!!
Our Service: The Basic Idea • Idea: Language detection is a necessary first step in a multitude of applications • Useful Web Service Examples: • Email filtering • Information retrieval • Spell checkers • Can also be a component of an aggregated grid service
Our Service: The Basic Idea • What about creating a Language Detector on the Grid? • Training and testing can be extremely time-consuming when run on a single machine • Data is difficult to obtain -> can be shared on the Grid • Duplicate data for parallel computing
Our Service: Use Case Simple Interface: • Receives a piece of text • Returns a string indicating the language
Our Service: Adding States • Grid services can be stateful (as opposed to plain web services) • State is not necessary for our service, but we added it for the learning experience • Added “dummy” states to our service: • Last Operation • Times Used
Our Service: Interface • Requests and Responses
<xsd:element name="detect" type="xsd:string" />
<xsd:element name="detectResponse" type="xsd:string" />
<xsd:element name="getLanguageRP">
  <xsd:complexType />
</xsd:element>
<xsd:element name="getLanguageRPResponse" type="xsd:string" />
<xsd:element name="getLastOpRP">
  <xsd:complexType />
</xsd:element>
<xsd:element name="getLastOpRPResponse" type="xsd:string" />
<xsd:element name="getTimesUsedRP">
  <xsd:complexType />
</xsd:element>
<xsd:element name="getTimesUsedRPResponse" type="xsd:int" />
Our Service: Interface • Port Types
<portType name="LanguageDetectorPortType" … >
  <operation name="detect">
    <input message="tns:DetectInputMessage" />
    <output message="tns:DetectOutputMessage" />
  </operation>
  <operation name="getLanguageRP">
    <input message="tns:GetLanguageRPInputMessage" />
    <output message="tns:GetLanguageRPOutputMessage" />
  </operation>
  <operation name="getLastOpRP">
    <input message="tns:GetLastOpRPInputMessage" />
    <output message="tns:GetLastOpRPOutputMessage" />
  </operation>
  <operation name="getTimesUsedRP">
    <input message="tns:GetTimesUsedRPInputMessage" />
    <output message="tns:GetTimesUsedRPOutputMessage" />
  </operation>
</portType>
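To make the four operations concrete, here is a minimal plain-Python sketch of the behaviour the interface suggests. It does not use any Globus or WSRF API. The class name, the method names, and the exact semantics of the two "dummy" states are assumptions: the sketch assumes that only detect updates the state, that Last Operation holds an operation name, and that Times Used counts detect calls.

```python
class LanguageDetectorService:
    """Hypothetical stand-in for the stateful service; not the authors' code."""

    def __init__(self, detector):
        # detector: any callable mapping a text to a language name (assumed).
        self._detector = detector
        self._language = ""      # result of the most recent detection
        self._last_op = "NONE"   # assumed initial value of the Last Operation state
        self._times_used = 0     # assumed to count detect() calls only

    def detect(self, text):
        """Mirror of the 'detect' operation: text in, language string out."""
        self._language = self._detector(text)
        self._last_op = "detect"
        self._times_used += 1
        return self._language

    def get_language_rp(self):
        """Mirror of 'getLanguageRP': returns a string."""
        return self._language

    def get_last_op_rp(self):
        """Mirror of 'getLastOpRP': returns a string."""
        return self._last_op

    def get_times_used_rp(self):
        """Mirror of 'getTimesUsedRP': returns an int."""
        return self._times_used


# Usage with a placeholder detector that always answers "dutch".
service = LanguageDetectorService(lambda text: "dutch")
result = service.detect("dit is een test")
```

The return types follow the response elements in the schema: strings for the language and the last operation, an integer for the usage counter.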
Our Service Implementation
Language Detection: Basic Idea • Essentially based on probabilities of character combinations • Every language has typical character combinations that are very frequent in that language • “th” in English • “ij” in Dutch • Easy for humans to detect a language even when we don’t know that specific language
Language Learning: Standard Process • Standard machine learning process
Language Learning: Markov Models • Basic Markov Model • kth order Markov Model
Language Detection: Classification • Transitional probabilities estimated as • Classification
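The formulas on these slides did not survive in the text. As a hedged reconstruction, the standard character-level formulation that matches the worked example on the next slide is given below. The symbols (c_i for characters, h for a context of k characters, N_L for counts in the training text of language L) are notation chosen here, and it is an assumption that the authors used plain maximum-likelihood estimates without smoothing or language priors.

```latex
% Basic (first-order) Markov model over the characters c_1 ... c_n of a text
P(c_1 \dots c_n \mid L) = \prod_{i=1}^{n} P(c_i \mid c_{i-1}, L)

% k-th order Markov model: each character depends on the k preceding ones
P(c_1 \dots c_n \mid L) = \prod_{i=1}^{n} P(c_i \mid c_{i-k} \dots c_{i-1}, L)

% Transition probabilities estimated from counts in the training text of L
\hat{P}(c \mid h, L) = \frac{N_L(hc)}{\sum_{c'} N_L(hc')}

% Classification: choose the language whose model makes the text most probable
L^{*} = \arg\max_{L} P(c_1 \dots c_n \mid L)
```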
Language Detection: Example • The training text for a language consists of the string "test text" • Learned model, as (context, next character, probability), where "^" pads the start of a word and "_" stands for the space that ends it:
( ^^, t, 1.0 ) ( ^t, e, 1.0 ) ( te, s, 0.5 ) ( es, t, 1.0 ) ( st, _, 1.0 ) ( te, x, 0.5 ) ( ex, t, 1.0 ) ( xt, _, 1.0 )
• The probability of the string "test" would be:
P(test|L) = P(t|^^) * P(e|^t) * P(s|te) * P(t|es) * P(_|st) = 1 * 1 * 0.5 * 1 * 1 = 0.5
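The example above can be reproduced with a short Python sketch. It assumes a second-order character model in which every word is trained separately, padded with "^^" at the start and closed with a single space, and it assumes the training words are "test" and "text". The function names and the two toy training strings used for classification are made up for illustration and are not the authors' implementation.

```python
from collections import defaultdict


def train(text, order=2):
    """Estimate P(next char | previous `order` chars) from whitespace-split words."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in text.split():
        padded = "^" * order + word + " "   # start padding, trailing space as end marker
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    model = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        model[context] = {ch: n / total for ch, n in followers.items()}
    return model


def probability(word, model, order=2):
    """Probability of one word under the model; unseen transitions count as 0."""
    padded = "^" * order + word + " "
    p = 1.0
    for i in range(order, len(padded)):
        p *= model.get(padded[i - order:i], {}).get(padded[i], 0.0)
    return p


def classify(text, models, order=2):
    """Return the language whose model gives the words of `text` the highest probability."""
    def score(language):
        p = 1.0
        for word in text.split():
            p *= probability(word, models[language], order)
        return p
    return max(models, key=score)


model = train("test text")
print(probability("test", model))  # 0.5, matching the slide

# Toy training data, invented for this sketch only.
models = {"english": train("the then that"), "dutch": train("zijn mijn wij")}
print(classify("mijn", models))    # dutch
```

Because unseen character combinations get probability zero here, a practical detector would need some form of smoothing and would usually sum log-probabilities to avoid underflow on long texts.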
Problems Encountered • Necessary tools had to be installed (Ant) • Problems on our machine (GRAM) • Conflicts with other team • Buggy shell script to build the GAR file • Sensitive to path lengths and names
Future Work • Connect with other services • Make training and evaluation a grid service • Make it part of a multilingual retrieval engine • Web interface (interactive)
Conclusions • Successfully managed to create and deploy our own web service • Broke loose from the tutorial web service structure • Merged Grid Computing with AI • Got hands-on experience with Grid applications and structure • Many possibilities to integrate and/or extend the implemented service