Modeling and Integrating Legacy Sites into Web Services with WebXcript Dickson K.W. CHIU Senior Member, IEEE dicksonchiu@ieee.org
Motivation of WebXcript • Most web services available only through HTML web pages (e.g., online ordering) • Need human attention • Long delay impairing E-commerce • Motivated by scripts in terminal emulation programs (Telix/Procomm) • Generic tool for automating web interactions • Also useful for casual end-users – e.g., get stock price into personal db from a web page Dickson Chiu (2004)
Functions of WebXcript • Wrapper language for providing programmatic web services • An environment for integrating heterogeneous web services • A complete set of primitives for responding to HTML forms • Information extraction from web pages based on pattern-matching • Interface to back-end databases or storing data to files • Exceptions and alerts • Full compatibility with XML technologies (WebXcript is in XML syntax)
Project History • Idea originates from the E-ADOME WFMS • HKUST PhD thesis (2000) • Information Systems (1999, 2001) • Distributed and Parallel Databases (2002) • Information Technology & Management (2004) • Webscript • Agent Programming - DEXA INBOSA workshop (2001) • Workflow Automation - ICEC (2002) • WebXcript FYP at CUHK • COMPSAC conference (2003) • International Journal of Cooperative Information Systems (accepted, to appear)
WebXcript Architecture • Redefined in XML syntax (easy parsing!!) • Java 2 platform • Sun JWSDP • Interpreted by server • XSLT frontend customization
Conceptual Model for Legacy Web Site Automation [Figure: Web Service Interface, WebXcript Environment, Target Legacy Site]
Overall WebXcript Language Design • Minimal core language • Based on HTML features • Automate HTTP messages • Simulates a user browsing a target web page, entering information and pressing buttons • Carry out delegated actions and/or extract relevant information from pages • Interfacing to back-end databases or storing data to files • Raising exceptions and alerting
Basic Language Constructs • Variables / Parameters - String type and structured type • Structured type based on db table / class definition • Simple control flow primitives • Java expression and functions • Subroutines
Interfacing & Information Extraction • Connect to db (ODBC, MySQL, Postgres) • Send db statements (SQL) to obtain results (with cursor) • Insert a tuple / object from a structured variable • Download URL for processing • Save to file or as objects in host db • Extract information by matching regular expressions
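The extract-then-store flow on this slide can be illustrated outside WebXcript. The following is a minimal Python sketch, not the WebXcript engine: the page fragment, the `price` table, and the function names are hypothetical, and `sqlite3` merely stands in for the ODBC / MySQL / Postgres back ends named above.

```python
import re
import sqlite3

# Hypothetical page fragment; a real script would first download it from a URL.
PAGE = """
<html><body><table>
<tr><td>eadome.biz</td><td>USD 12.50 / year</td></tr>
<tr><td>eadome.com</td><td>USD 9.95 / year</td></tr>
</table></body></html>
"""

# One named group per field, so every match maps onto a structured record.
ROW = re.compile(
    r"<tr><td>(?P<domain>[\w.-]+)</td>"
    r"<td>USD\s+(?P<price>\d+\.\d{2})\s*/\s*year</td></tr>"
)


def extract_prices(html):
    """Return a list of {domain, price} records matched in the page."""
    return [
        {"domain": m.group("domain"), "price": float(m.group("price"))}
        for m in ROW.finditer(html)
    ]


def store(records, conn):
    """Insert each record as one tuple of a (hypothetical) price table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price (domain TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO price VALUES (:domain, :price)", records
    )
    conn.commit()


records = extract_prices(PAGE)
conn = sqlite3.connect(":memory:")  # in-memory stand-in for the host db
store(records, conn)
rows = conn.execute("SELECT domain, price FROM price ORDER BY domain").fetchall()
```

Named groups keep the pattern and the record layout in one place, which loosely mirrors the idea of a structured variable based on a db table definition.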
HTML Form Dialogue • From script variables and expressions, fill in fields, select check-box / radio-buttons / pop-up list, etc. • Press Buttons
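At the HTTP level, filling in a form and pressing a button amounts to sending the field values as an encoded request body. A minimal Python sketch of that step follows; the URL, the field names, and `build_form_request` are assumptions made for illustration, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(action_url, fields, button=None):
    """Build the POST request a browser would send for a filled-in form.

    `fields` maps form field names to values; `button` is an optional
    (name, value) pair for the submit button that was "pressed".
    """
    data = dict(fields)
    if button is not None:
        name, value = button
        data[name] = value
    body = urlencode(data).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical form: one text field plus a "Search" submit button.
req = build_form_request(
    "https://www.example.com/cgi-bin/search.cgi",
    {"Dname": "eadome.biz"},
    button=("submit", "Search"),
)
```

Check-boxes, radio buttons and pop-up lists reduce to the same name/value pairs, so they would fit into the `fields` mapping without a separate code path.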
Example 1 – Database Driven Script for Checking Registration Price
Example 2 – Online Domain Name Registration
<WebXcript name="regdomainame">
  <declare type="parameter">n</declare>
  <declare type="result">o</declare> <!-- input form number -->
  <db dbtype="MySQL" host="localhost" username="OrderClerk" password="pwd" dbname="services">
    <dbcommand id="0" result="o"> select * from order_form where order_num=$n </dbcommand>
  </db>
  <exceptionretry checkpointid="1" timeout="5000" maxRetry="5">
    <exception>domain_not_available</exception>
    <exception>page_changed</exception>
  </exceptionretry>
  <checkpoint checkpointid="1">
    <url>http://www.dynodomain.com/registrate.html</url>
    <expect> <in_title>*regist*</in_title> </expect>
    <form num="1" method="post">
      <url>https://www.dynodomain.com/cgi-bin/domain_search.cgi</url>
      <fillform name="Dname">o.domain</fillform>
      <button>Search</button>
    </form>
    <expect>
      <in_page>available</in_page>
      <exception>domain_not_available</exception>
    </expect>
    <form num="1" method="post">
      <url>https://www.dynodomain.com/cgi-bin/registrate.cgi</url>
      <fillform name="Dname">default</fillform>
      <button>Register</button>
    </form>
  </checkpoint>
  …
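The checkpoint and retry behaviour in this script can be sketched in Python as follows. This is one plausible reading, not the WebXcript engine: it assumes that a named exception listed under `exceptionretry` sends control back to the checkpoint up to `maxRetry` times, it does not model the `timeout` attribute, and every name in the code is hypothetical.

```python
import time


class ScriptException(Exception):
    """A named exception, as raised by a failed <expect> in the script."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def expect_in_page(page, text, exception_name):
    """Raise the named exception unless `text` occurs in the page."""
    if text not in page:
        raise ScriptException(exception_name)


def run_from_checkpoint(steps, retry_on, max_retry=5, delay_s=0.0):
    """Run `steps`; on a listed exception, restart them from the checkpoint.

    Exceptions not named in `retry_on`, or failures beyond `max_retry`
    retries, propagate to the caller.
    """
    failures = 0
    while True:
        try:
            return steps()
        except ScriptException as exc:
            failures += 1
            if exc.name not in retry_on or failures > max_retry:
                raise
            time.sleep(delay_s)  # hypothetical pause before retrying


# Simulated responses: the first attempt fails, the second succeeds.
pages = iter([
    "<html>server busy, try later</html>",
    "<html>eadome.biz is available</html>",
])


def steps():
    page = next(pages)
    expect_in_page(page, "available", "domain_not_available")
    return "registered"


result = run_from_checkpoint(
    steps, retry_on={"domain_not_available", "page_changed"}
)
```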
Example 3 – XML Messaging Script
<?xml version="1.0"?>
<purchaseOrder Date="11.23.2001"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.niceg.com po.xsd">
  <customerInfo>
    <name>eadome company ltd</name>
    <address>18 Tai Wah Street, Kowloon, Hong Kong</address>
    <phone>85226813744</phone>
    <fax>85226813660</fax>
  </customerInfo>
  <subscription Dname="eadome.biz" Period="1"/>
  …
</purchaseOrder>
…
<!-- send direct xml message with get method -->
<URL>http://www.xmlreg.com/registrate2.xml?
  <?xml version=\"1.0\"?>
  <purchaseOrder Date=\"11.23.2001\" "
      xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
      xsi:schemaLocation=\"http://www.niceg.com po.xsd\">
  <customerInfo>
    <name>order_form.company</name>
    <address>order_form.address</address>
    <phone>order_form.phone</phone>
    <fax>order_form.fax</fax>
  </customerInfo>" +
  <subscription Dname=\"" + order_form.domain + "\" Period=\"" + order_form.period + "\"/>
  ...
  </purchaseOrder>
</URL>
…
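Assembling such a message by string concatenation makes quoting easy to get wrong. As a hedged alternative sketch in Python (not WebXcript syntax), the message can be built with an XML library and then percent-encoded into the query string. The `order_form` record and its `date` key are hypothetical stand-ins for the database row used in the script, and the namespace attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def build_purchase_order(order):
    """Serialize an order record into a purchaseOrder XML message."""
    root = ET.Element("purchaseOrder", {"Date": order["date"]})
    info = ET.SubElement(root, "customerInfo")
    # Element name in the message -> key in the order record.
    for tag, key in (("name", "company"), ("address", "address"),
                     ("phone", "phone"), ("fax", "fax")):
        ET.SubElement(info, tag).text = order[key]
    ET.SubElement(
        root,
        "subscription",
        {"Dname": order["domain"], "Period": str(order["period"])},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, using the values shown in the sample message.
order_form = {
    "date": "11.23.2001",
    "company": "eadome company ltd",
    "address": "18 Tai Wah Street, Kowloon, Hong Kong",
    "phone": "85226813744",
    "fax": "85226813660",
    "domain": "eadome.biz",
    "period": 1,
}

message = build_purchase_order(order_form)
# Percent-encode the message so it can travel in a GET query string.
url = "http://www.xmlreg.com/registrate2.xml?" + quote(message)
```

The library handles attribute quoting and character escaping, so values containing `"` or `<` would not break the message.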
Wrapping into Web Services • Mapping of the input and output parameters of the Web Service into those of the WebXcript • Input parameter mapping – WebXcript parameter, table row, none • Output parameter mapping – WebXcript parameter, SQL, success / fail • WSDL generation
Experiences Learned • Achieve connectivity with as many web sites as possible • Session tracking - cookie support • Secured Internet connection - Java Secure Socket Extension (JSSE) • Web page redirection - HTTP-EQUIV="Refresh" meta-tag • Imitate the behavior of a browser as closely as possible • Web sites try to eliminate the possibility of robot access to (critical) functions • referrer address • HTTP user agent field • Techniques motivated by advanced Internet download managers for web users (e.g., GetRight, Net Transport)
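The browser-imitation points above (cookies, referrer, user agent, meta-refresh redirection) can be sketched with Python's standard library. The WebXcript engine itself runs on Java with JSSE, so this is an illustration of the techniques rather than its implementation; the user-agent string, URLs, and function names are hypothetical, and no request is actually sent.

```python
import re
from http.cookiejar import CookieJar
from urllib.parse import urljoin
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Hypothetical user-agent string; a real deployment would choose its own.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleScriptBot/0.1)"


def make_session():
    """Return an opener that keeps cookies across requests, plus its jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def browser_like_request(url, referer=None):
    """Build a request carrying the headers a browser would normally send."""
    headers = {"User-Agent": USER_AGENT}
    if referer:
        headers["Referer"] = referer  # the HTTP header name is spelled "Referer"
    return Request(url, headers=headers)


# Matches <meta http-equiv="refresh" content="N; url=TARGET"> in any case.
META_REFRESH = re.compile(
    r"""<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*"""
    r"""content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*([^"'>\s]+)""",
    re.IGNORECASE,
)


def meta_refresh_target(html, base_url):
    """Return the absolute URL of a meta-refresh redirect, or None."""
    m = META_REFRESH.search(html)
    return urljoin(base_url, m.group(1)) if m else None


opener, jar = make_session()
req = browser_like_request(
    "http://www.example.com/cgi-bin/registrate.cgi",
    referer="http://www.example.com/registrate.html",
)
target = meta_refresh_target(
    '<META HTTP-EQUIV="Refresh" CONTENT="0; URL=/registrate2.html">',
    "http://www.example.com/a/b.html",
)
```

Passing `req` to `opener.open` would send it with any cookies collected so far, which is how session tracking would carry across the steps of a script.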
Concluding Remarks • Flexible script language WebXcript • Simple, application oriented • Tailor-made primitives for db access, web dialogue and exception handling • Suitable for E-commerce environment • Easier to develop, understand, debug and maintain • WebXcript support can be plugged into other information systems or used as a standalone productivity tool • Wrapper for automatic Web Services provision
Continuing Work • Enhance the quality and scope of the software to become at least publicly available shareware products (FYP with HKU) • Implement full functionality and ensure reliability of the WebXcript engine • Identify other necessary extensions to the language • Script development tools • Recording, monitoring and debugging tools • Displaying form and db fields for drag-and-drop • Script development methodology • Large showcase applications • Mobile extensions • Applicability survey of popular Web sites • Efficiency and performance issues
Q&A For further details, see: • D.K.W. Chiu, Danny Kok, Alex Lee, and S.C. Cheung, “Integrating Heterogeneous Web Services with WebXcript,” Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC 2003), Dallas, Texas, Nov 2003, pp. 272-277. • Accepted paper for International Journal of Cooperative Information Systems (IJCIS).