FLAVIUS Meeting – WP4
June 8, 2010
Giurgiu Bogdan, Wong William
Agenda
• LW contributions
• Keys to successful integration
• Complete integration picture
• Translation REST API
• TrustScore™ and Reporting
• REST API Version 2
• Customization through dictionaries
• Customization through training
• FLAVIUS Language Weaver Roadmap
• Questions & Answers
Keys to a Successful Partner Integration
• Ability to integrate with Language Weaver machine translation for development and testing
• Ability to customize baseline engines with dictionaries
• Ability to customize baseline engines by training domain- or customer-specific vertical systems
Translation REST API
• Simple HTTP-based communication protocol
• Uses standard HTTP methods: POST, GET, DELETE
• The same REST style used by Web 2.0 services such as Amazon and Twitter
• Supported text formats: TXT, HTML, TMX, XLIFF
• Data is encrypted using SSL (via HTTPS)
• Authentication uses a custom HTTP scheme
• Two additional headers are added to every request:
  • LW_Date – a date/time string based on the request time
  • Authorization – a string made up of three colon-separated fields: “LWA:<userid>:<signature>”
• The unique signature is generated with a keyed-hash message authentication code (HMAC) using a SHA-1 (Secure Hash Algorithm) digest
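A minimal C# sketch of the signing scheme above, assuming the string to sign is the upper-cased HTTP method, the LW_Date value and the request path joined by newlines (the layout used by the GenerateSignature sample later in this deck, not a published specification). The helper names, API key, user ID and date value are hypothetical placeholders.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: keyed HMAC-SHA1 over "METHOD\nLW_Date\npath", Base64-encoded.
// The string-to-sign layout is assumed from the deck's sample code.
static string ComputeSignature(string apiKey, string method, string httpDate, string path)
{
    string message = method.Trim().ToUpperInvariant() + "\n" + httpDate.Trim() + "\n" + path.Trim();
    using var hmac = new HMACSHA1(Encoding.UTF8.GetBytes(apiKey));
    return Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(message)));
}

// Authorization value: three colon-separated fields, "LWA:<userid>:<signature>".
// Base64 output never contains ':', so the fields stay unambiguous if the user ID has none.
static string BuildAuthorization(string userId, string sig) => "LWA:" + userId + ":" + sig;

string date = "Wed, 03 Mar 2010 14:55:51 GMT"; // placeholder LW_Date value
string signature = ComputeSignature("my-api-key", "post", date, "/v1/translation/eng.fra/lpid=74/");
Console.WriteLine("LW_Date: " + date);
Console.WriteLine("Authorization: " + BuildAuthorization("user42", signature));
```

Because the server can recompute the same HMAC from its copy of the key, the secret API key itself never travels with the request; only the user ID and the signature do.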
Translation REST API – Resources
• User: /v1/user (HTTP POST)
• Blocking translations: /v1/translation/src.tgt/lpid=<id> (HTTP POST)
• Non-blocking translations: /v1/translation-async/src.tgt/lpid=<id> (HTTP POST); /v1/translation-async/src.tgt/<jobid>/lpid=<id> (HTTP GET/DELETE)
• Language pair info: /v1/lpinfo (HTTP GET)
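The v1 resource paths can be assembled with small string helpers. This is a hedged sketch: the helper names are hypothetical, and the non-blocking paths follow the translation-async URL patterns on the request slides and in the sample response, with the job ID placed before lpid. Whether the trailing slash is required is not stated (the sample retrieval_url omits it).

```csharp
using System;

// Hypothetical v1 path helpers; shapes follow the URL patterns on the request slides.
static string BlockingPath(string src, string tgt, int lpid) =>
    $"/v1/translation/{src}.{tgt}/lpid={lpid}/";

static string AsyncSubmitPath(string src, string tgt, int lpid) =>
    $"/v1/translation-async/{src}.{tgt}/lpid={lpid}/";

// jobId is the identifier from the submit response
// (in the sample response it appears as "<job_id>.<translation_signature>").
static string AsyncJobPath(string src, string tgt, string jobId, int lpid) =>
    $"/v1/translation-async/{src}.{tgt}/{jobId}/lpid={lpid}/";

const string UserPath = "/v1/user";     // HTTP POST
const string LpInfoPath = "/v1/lpinfo"; // HTTP GET

Console.WriteLine("POST       " + UserPath);
Console.WriteLine("POST       " + BlockingPath("eng", "fra", 74));
Console.WriteLine("POST       " + AsyncSubmitPath("eng", "fra", 74));
Console.WriteLine("GET/DELETE " + AsyncJobPath("eng", "fra", "90079.3bccc5e58d50ce7dcaf950f562ec2303", 74));
Console.WriteLine("GET        " + LpInfoPath);
```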
Translation REST API
• Blocking translation request
  • HTTP POST to https://lwaccess.languageweaver.com/v1/translation/[src].[tgt]/lpid=[lpid]/[optional-params]/
  • Appropriate for small chunks of data (less than 640 bytes)
• Mandatory input parameters:
  • [src] – three-letter code for the source language (e.g. “eng” for English)
  • [tgt] – three-letter code for the target language
  • [lpid] – integer identifying the specific language pair system to use
  • “source_text=” [string] – URL-escaped input source text (sent as POST data)
• Optional input parameters:
  • input_format=[value] – string declaring the input format; one of “html”, “plain”, “xliff”
  • input_encoding=[value] – string declaring the input encoding; only “utf8” is supported
• Sample calls:
  • Create blocking translation job for text; get language pair details
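Two details from this slide are easy to get wrong in client code: the size limit and the URL escaping of source_text. The sketch below assumes the 640-byte limit applies to the UTF-8 size of the raw source text (the slide only says "data"); the helper names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical check: blocking calls are meant for chunks under 640 bytes.
// Assumption: the limit is measured on the UTF-8 encoding of the source text.
static bool FitsBlockingCall(string sourceText)
{
    const int limitBytes = 640;
    return Encoding.UTF8.GetByteCount(sourceText) < limitBytes;
}

// POST data for the request: source_text must be URL-escaped.
static string BuildFormBody(string sourceText) =>
    "source_text=" + Uri.EscapeDataString(sourceText);

string text = "Hello World";
Console.WriteLine(FitsBlockingCall(text) ? "use blocking call" : "use non-blocking call");
Console.WriteLine(BuildFormBody(text)); // source_text=Hello%20World
```

Counting bytes rather than characters matters for non-Latin text: 320 accented characters already reach 640 UTF-8 bytes.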
Translation REST API
• Non-blocking translation request
  • HTTP POST to https://lwaccess.languageweaver.com/v1/translation-async/[src].[tgt]/lpid=[lpid]/[optional-params]/
  • Appropriate for large files
  • Mandatory and optional input parameters are the same as for the blocking translation
  • Sample calls: create non-blocking translation job for text/URL/file; get language pair details; get user info
• Followed by HTTP GETs to https://api.languageweaver.com/v1/translation-async/[src].[tgt]/[jobID]/lpid=[lpid]/[optional-params]/
  • [jobID] – identifies the translation submitted with the POST; the POST response returns the full retrieval_url, which embeds it as <job_id>.<translation_signature>
  • Sample calls: GET non-blocking translation job for text/URL/file
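The POST-then-GET flow amounts to polling. The slides do not say how a finished job is recognized (status element, HTTP code, etc.), so this sketch leaves that to a caller-supplied delegate that returns the translation when done and null otherwise. PollUntilDone and the fake fetcher are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical polling helper for non-blocking jobs. tryFetch would issue one HTTP GET
// against the retrieval URL and return the translation, or null if the job is not finished.
static string PollUntilDone(Func<string> tryFetch, int maxAttempts, TimeSpan delay)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        string result = tryFetch();
        if (result != null) return result;
        if (attempt < maxAttempts) Thread.Sleep(delay); // wait before the next GET
    }
    throw new TimeoutException($"Job not finished after {maxAttempts} attempts.");
}

// Usage with a fake fetcher (no network) that reports completion on the third call.
int calls = 0;
string translated = PollUntilDone(() => ++calls < 3 ? null : "Bonjour le monde", 5, TimeSpan.FromMilliseconds(10));
Console.WriteLine($"{translated} after {calls} GET requests");
```

Capping the number of attempts keeps a stuck job from blocking the client forever; a real client might also issue the HTTP DELETE listed for the job resource when it gives up.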
Translation REST API
• Sample code – C# example (blocking translation request)

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Step 1: Construct the path, appending the LPID and/or input_format only if supplied
string szPath = "/v1/translation/" + szSrcLang + "." + szTgtLang + "/";
if (szLPID.Length != 0)
    szPath += "lpid=" + szLPID + "/";
if (szInputFormat.Length != 0)
    szPath += "input_format=" + szInputFormat + "/";

// Step 2: Construct the full URL
string szURI = m_szHostName + szPath;
System.Console.WriteLine(szURI);

// Step 3: Prepare the POST request and add the authentication headers (signed over szPath)
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(szURI);
PrepareHttpRequestHeader("POST", szPath, ref request);
Translation REST API

// Step 4: Attach the POST data; source_text must be URL-escaped
string szPostData = "source_text=" + System.Uri.EscapeDataString(szSourceText);
byte[] postDataBytes = Encoding.UTF8.GetBytes(szPostData);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postDataBytes.Length;
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(postDataBytes, 0, postDataBytes.Length);
}

// Step 5: Read the response
string szResponse;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader responseReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    szResponse = responseReader.ReadToEnd();
}
System.Console.WriteLine(szResponse);

// Step 6: Parse the XML document for the translated text
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(szResponse);
XmlNodeList nodeList = xmlDoc.GetElementsByTagName("translated_text");
szTargetText = nodeList[0].InnerText.Trim();
Translation REST API – Header Generation
• Sample code – C# example
• Generate header

void PrepareHttpRequestHeader(string szRequestType, string szURI, ref HttpWebRequest request)
{
    // Step 1: Get the current HTTP date
    string szHttpDate = GetHttpDate();

    // Step 2: Generate the signature (szURI is the request path passed in by the caller)
    szRequestType = szRequestType.ToUpperInvariant();
    string szSignature = GenerateSignature(szRequestType, szHttpDate, szURI);

    // Step 3: Add the two new headers to the request object
    request.Headers.Add("LW_Date", szHttpDate);
    request.Headers.Add("Authorization", "LWA:" + m_szUserID + ":" + szSignature);
    System.Console.WriteLine(szSignature);
}
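The header sample calls GetHttpDate() without showing it. One possible implementation, assuming LW_Date expects the RFC 1123 HTTP-date format (the slides only say it is "a date/time string based on the request time"). Unlike the original parameterless call, this hypothetical version takes the time as an argument so it can be tested.

```csharp
using System;
using System.Globalization;

// Hypothetical GetHttpDate: RFC 1123 format, e.g. "Wed, 03 Mar 2010 14:55:51 GMT".
// The "r" format specifier does not convert time zones, so convert to UTC first.
static string GetHttpDate(DateTime now) =>
    now.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture);

Console.WriteLine(GetHttpDate(DateTime.UtcNow));
```

Formatting with the invariant culture keeps day and month names in English regardless of the machine's locale, which matters because the same string is part of the signed message.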
Translation REST API – Header Generation
• Generate signature

string GenerateSignature(string szRequestType, string szHttpDate, string szURI)
{
    // String to sign: request type, HTTP date and URI, separated by newlines
    string szMessage = szRequestType.Trim() + "\n" + szHttpDate.Trim() + "\n" + szURI.Trim();

    // Keyed HMAC-SHA1 over the UTF-8 bytes, with the API key as the secret; Base64-encode the digest
    using (HMACSHA1 hmacsha1 = new HMACSHA1(Encoding.UTF8.GetBytes(m_szAPIKey)))
    {
        return Convert.ToBase64String(hmacsha1.ComputeHash(Encoding.UTF8.GetBytes(szMessage)));
    }
}
Translation REST API
• Sample request and response for Create Non-Blocking Translation Job for Text
• e.g. HTTP POST request to https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/lpid=74/

<?xml version='1.0' encoding='UTF-8'?>
<lwresponse>
  <service_version>v1</service_version>
  <requested_url>/v1/translation-async/eng.fra/lpid=74/</requested_url>
  <request_type>POST</request_type>
  <request_time>Wed Mar 3 14:55:51 2010</request_time>
  <source_language>eng</source_language>
  <target_language>fra</target_language>
  <response_data type='translation-async_post'>
    <retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>
    <job_id>90079</job_id>
    <translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>
    <src>eng</src>
    <tgt>fra</tgt>
    <lpid>74</lpid>
    <input_format>text/plain</input_format>
    <input_encoding></input_encoding>
    <dictionary></dictionary>
    <customizer></customizer>
    <source_text><![CDATA[Hello World]]></source_text>
    <server>
      <version>5.1.2 release ENGFRAU20_5.1.x.0</version>
    </server>
  </response_data>
</lwresponse>
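A client needs only a few fields from this response to start polling. The sketch reads them with System.Xml.Linq; the element names come from the sample response, while the helper name and the trimmed test document are illustrative.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: extract the polling fields from a translation-async POST response.
static (string JobId, string Signature, string RetrievalUrl) ParseAsyncResponse(string xml)
{
    XElement data = XDocument.Parse(xml).Root.Element("response_data");
    return ((string)data.Element("job_id"),
            (string)data.Element("translation_signature"),
            (string)data.Element("retrieval_url"));
}

// Trimmed copy of the sample response (only the elements used here).
string sample =
    "<lwresponse><response_data type='translation-async_post'>" +
    "<retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>" +
    "<job_id>90079</job_id>" +
    "<translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>" +
    "</response_data></lwresponse>";

var (jobId, sig, url) = ParseAsyncResponse(sample);
// In the sample, the retrieval URL embeds "<job_id>.<translation_signature>".
Console.WriteLine(url.Contains($"/{jobId}.{sig}/"));
```

Using the returned retrieval_url as-is is simpler than rebuilding it from job_id and translation_signature.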
TrustScore™ and Reporting
• Internal LW milestone
• Migration to version 2 of the REST API
• Reporting:
  • Words per minute
  • Number of documents translated
  • Average document length
  • Details about the TrustScore™
  • Other metrics to be defined
• TrustScore™:
  • Scored from 1 to 5
  • Document-level scoring
  • Segment-level scoring not supported
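The reporting metrics are listed but not defined. As a hedged illustration only, assume a "word" is a whitespace-separated token, document length is measured in words, and words per minute is total words divided by elapsed minutes; Summarize is a hypothetical helper, not part of the LW API.

```csharp
using System;
using System.Linq;

// Hypothetical report: document count, average length in words, words per minute.
static (int Documents, double AverageWords, double WordsPerMinute) Summarize(string[] documents, TimeSpan elapsed)
{
    // Split on whitespace (null separator) and ignore empty tokens.
    int[] wordCounts = documents
        .Select(d => d.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length)
        .ToArray();
    int total = wordCounts.Sum();
    double average = wordCounts.Length == 0 ? 0 : (double)total / wordCounts.Length;
    return (wordCounts.Length, average, total / elapsed.TotalMinutes);
}

var report = Summarize(new[] { "Hello world", "Machine translation at scale" }, TimeSpan.FromSeconds(30));
Console.WriteLine($"{report.Documents} documents, {report.AverageWords} words on average, {report.WordsPerMinute} words/minute");
```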
REST API Version 2
• New format
• Sample: Create Non-Blocking Translation Job for Text
  • https://api.languageweaver.com/v2/language-pair/[lpid]/translation-async/[optional-params]/
• Mandatory and optional parameters are the same as in v1
• Additional calls/functionality related to:
  • TrustScore™
  • Reporting
  • Dictionary
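Compared with v1, the v2 template identifies the language pair by lpid alone, without a src.tgt segment. A small sketch of building that URL, assuming optional parameters are appended as "name=value/" path segments as in v1 (the slide shows only the placeholder); V2AsyncUrl is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical builder for the v2 "create non-blocking translation job" URL.
static string V2AsyncUrl(int lpid, params string[] optionalParams)
{
    var url = new StringBuilder($"https://api.languageweaver.com/v2/language-pair/{lpid}/translation-async/");
    foreach (string p in optionalParams)
        url.Append(p).Append('/'); // assumed v1-style "name=value/" segments
    return url.ToString();
}

Console.WriteLine(V2AsyncUrl(74));
Console.WriteLine(V2AsyncUrl(74, "input_format=html"));
```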
Customization through Dictionaries
• Structure
  • One entry per term, one translation per entry
  • Search & replace mechanism that applies unconditionally
• Size
  • Up to 300,000 entries
• Best practice for building one
  • Use CSV files
• Limitations
  • No limitations on the content
  • Recommended use is phrase replacement rather than single-word replacement
  • Gendered forms are not generated automatically
  • UTF-8 encoding
• Impact on performance
  • No significant impact
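To make "one translation per entry" and "applies unconditionally" concrete, here is an illustrative sketch, not Language Weaver's implementation. It assumes a two-column "source,target" CSV with no quoted fields and applies entries longest first so phrase entries take precedence; all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical loader: one translation per entry, so duplicate sources are rejected.
static Dictionary<string, string> LoadDictionary(TextReader csv)
{
    var entries = new Dictionary<string, string>(StringComparer.Ordinal);
    string line;
    while ((line = csv.ReadLine()) != null)
    {
        int comma = line.IndexOf(',');
        if (comma <= 0) continue; // skip blank or malformed lines
        string source = line.Substring(0, comma).Trim();
        string target = line.Substring(comma + 1).Trim();
        if (!entries.TryAdd(source, target))
            throw new InvalidDataException($"Duplicate entry: {source}");
    }
    return entries;
}

// Unconditional search & replace, longest entries first. Replacements are chained,
// so a later, shorter entry can also match text produced by an earlier one.
static string Apply(string text, Dictionary<string, string> entries)
{
    foreach (var (source, target) in entries.OrderByDescending(e => e.Key.Length))
        text = text.Replace(source, target, StringComparison.Ordinal);
    return text;
}

var dict = LoadDictionary(new StringReader("machine translation,traduction automatique\nengine,moteur\n"));
Console.WriteLine(Apply("the engine does machine translation", dict));
```

Because replacement ignores context, a single-word entry fires everywhere the word appears; multi-word phrase entries carry their own context, which is one reason to prefer them.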
Customization through Training
• Workflow: parallel aligned text (plus optional regression text and optional test text) → LW Training Compute Cloud → evaluation → product delivery via TOD
• Data:
  • Fix noisy text
  • More text
  • Text alignment
  • Text segmentation
Customization through Training
• Structure:
  • Train on any language pair specified in the FLAVIUS agreement
  • Inputs: TMX parallel segments, optional regression text files, optional test sets for evaluation
  • Outputs:
    • Trained engine
    • BLEU scores for the test set
    • Translated output of the regression text files
    • Metrics from the input training corpus
  • Evaluate the customized engine via TOD deployment
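Before sending TMX training data, it can help to check how many usable segment pairs a file actually contains. A sketch using System.Xml.Linq, assuming TMX 1.4-style <tu>/<tuv xml:lang="...">/<seg> elements and exact, case-insensitive language-code matches (TMX files typically use codes such as "en" or "en-US", not the API's three-letter codes). ReadTmxPairs is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: extract (source, target) pairs from TMX text.
// Inline markup inside <seg> is flattened to its text content.
static List<(string Source, string Target)> ReadTmxPairs(string tmx, string srcLang, string tgtLang)
{
    XName langAttr = XNamespace.Xml + "lang";
    var result = new List<(string Source, string Target)>();
    foreach (XElement tu in XDocument.Parse(tmx).Descendants("tu"))
    {
        string SegFor(string code) => tu.Elements("tuv")
            .Where(v => string.Equals((string)v.Attribute(langAttr), code, StringComparison.OrdinalIgnoreCase))
            .Select(v => (string)v.Element("seg"))
            .FirstOrDefault();

        string source = SegFor(srcLang), target = SegFor(tgtLang);
        // Skip translation units missing either side; they are not usable as parallel data.
        if (!string.IsNullOrWhiteSpace(source) && !string.IsNullOrWhiteSpace(target))
            result.Add((source.Trim(), target.Trim()));
    }
    return result;
}

string sample = "<tmx version=\"1.4\"><body>" +
    "<tu><tuv xml:lang=\"en\"><seg>Hello World</seg></tuv><tuv xml:lang=\"fr\"><seg>Bonjour le monde</seg></tuv></tu>" +
    "<tu><tuv xml:lang=\"en\"><seg>Untranslated</seg></tuv></tu>" +
    "</body></tmx>";
var pairs = ReadTmxPairs(sample, "en", "fr");
Console.WriteLine($"{pairs.Count} pair(s): {pairs[0].Source} => {pairs[0].Target}");
```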
Thank you! Accelerating the way the world communicates