Central Data Exchange Environmental Information Exchange Network. Exchange Network Enhancements By David Fladung April 19, 2006. Agenda. CDX Overview Open Source Utilization Data Transformation (Mapper) Business Process Execution Language (BPEL) Rich User Interface (RUI) client
Central Data Exchange Environmental Information Exchange Network Exchange Network Enhancements By David Fladung April 19, 2006
Agenda • CDX Overview • Open Source Utilization • Data Transformation (Mapper) • Business Process Execution Language (BPEL) • Rich User Interface (RUI) client • Geographic Data Interaction
Open Source Utilization • CDX utilizes about 50 open source products/frameworks • JBoss (Wind River Node application server) • PostgreSQL (Wind River Node database) • Struts (Model View Controller [MVC]) • Hibernate (Object Relational Mapping [ORM]) • Axis (WS engine and libraries) • Maven (build and release management) • AspectJ (quality of service) • StAX (streaming parsing of large XML) • Velocity (templating/mapping) • Quartz (job scheduling) • ActiveBPEL (business process management)
Open Source Utilization (figure legend) • Yellow – current open source implementation • Grey – potential for open source implementation • White – not applicable
Open Source Utilization • Advantages • Low Total Cost of Ownership (TCO) • Rich user community • Adequate documentation • Proven performance • Promotes rapid development • Easy to integrate • Disadvantages • Potential that product may no longer be supported • Advanced support may require cost
Data Transformation • Convert from one data format to another • XML • Flat file (e.g. delimited) • Database • Handle large file sizes • Use a streaming approach rather than loading data into memory • Provide a robust and reusable interface • Standard configuration files • Standard APIs • Reusable across multiple tiers
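The streaming requirement above can be illustrated with the StAX API that ships with the JDK. This is a minimal sketch, not the CDX mapper itself: the class name `StreamingCountSketch` and the sample XML are assumptions made for illustration. It reads one parser event at a time, so memory use does not grow with document size.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of CDX.
class StreamingCountSketch {

    /** Counts start tags with the given local name without building a document tree. */
    static int countElements(Reader source, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
            int count = 0;
            // Pull one event at a time; only the current event is held in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; a real run would pass a FileReader over a large file.
        String xml = "<sites><site id=\"1\"/><site id=\"2\"/><other/></sites>";
        System.out.println(countElements(new StringReader(xml), "site"));
    }
}
```

The same loop works unchanged over a file or network stream, which is the property that matters for very large submissions.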
Data Transformation • TRI OUT – flat file to XML • NC Node – database to XML for Beaches and NEI data • Puerto Rico Node – flat file to XML for AQS data • Wind River Node – database to XML for AQS • Geo Toolkit for Region 5 – XML to XML for Geo data • EnviroFlash – flat file to unstructured email (text) • TRIME – XML to database • Water Sentinel – database to XML, XML to database • GLNPO – database to Excel, database to XML
Data Transformation (figure legend) • Yellow – current use of mapper implementation • White – not applicable
Data Transformation • Architecture • Mapping engine • Runs the transformation process • Built on the Velocity open source project • Configuration files • Mapping instructions • Location of the data sources and data targets • Conditional logic, custom methods • Custom Java methods – provide custom transformations such as data formatting • Pluggable readers • Pluggable writers
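The reader/engine/writer split described above could look roughly like the following. This is a sketch under stated assumptions: `RecordReader`, `RecordWriter`, and `run` are hypothetical names invented here, and the mapping step is a plain Java function standing in for the Velocity template that the real engine evaluates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration class; not the CDX mapper API.
class MapperSketch {

    /** Assumed reader contract: yields one record at a time from any source type. */
    interface RecordReader extends Iterable<Map<String, String>> {}

    /** Assumed writer contract: consumes one rendered record at a time. */
    interface RecordWriter {
        void write(String rendered);
    }

    /** Pulls each record from the reader, applies the mapping, and hands the result to the writer. */
    static int run(RecordReader reader,
                   Function<Map<String, String>, String> mapping,
                   RecordWriter writer) {
        int count = 0;
        for (Map<String, String> row : reader) {
            writer.write(mapping.apply(row));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // An in-memory list stands in for a database or flat-file reader.
        List<Map<String, String>> rows =
                List.of(Map.of("STATE_CODE", "72"), Map.of("STATE_CODE", "37"));
        List<String> out = new ArrayList<>();
        run(rows::iterator,
            row -> "<StateCode>" + row.get("STATE_CODE") + "</StateCode>",
            out::add);
        out.forEach(System.out::println);
    }
}
```

Because the engine only sees the two interfaces, a database reader can be swapped for a flat-file reader, or an XML writer for an email writer, without touching the mapping.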
Data Transformation • Mapping steps • Logical mapping • The process of analyzing the data source and the data target and creating the document that specifies the relations between the source and target fields. • If the data source is a relational database, this process includes developing the query to extract the data from the database. • Physical mapping – the process of creating the configuration files to implement the logical mapping specifications. • Custom methods (if needed)
Data Transformation • Database to XML (Puerto Rico Node)
    ## Database Query
    #set ($sqlQuery = "select distinct TRANSACTION_TYPE, ACTION_CODE, STATE_CODE, COUNTY_CODE, SITE_ID from ${tableName}RA where ACTION_CODE = 'D' and TRANSACTION_TYPE = 'RA'")
    ## Set Reader properties
    #set ($tmp = $MapperEngine.setMapReaderProperty('SQL_COMMAND', $sqlQuery ) )
    #set ($tmp = $MapperEngine.setMapReaderProperty('ENCODING', 'XML_ENCODING') )
    ## Loop for each record in result set
    #foreach($row in $MapperEngine.getIterator())
        ## Write XML
        <aqs:ActionRawDataDelete>
            <aqs:SiteIdentifierDetails>
                ## Use value from record as a variable
                <aqs:StateCode>$!row.STATE_CODE</aqs:StateCode>
                <aqs:CountyCode>$PRFunctions.getNumberDigitStr($!row.COUNTY_CODE , 3)</aqs:CountyCode>
                <aqs:SiteNumber>$PRFunctions.getNumberDigitStr($!row.SITE_ID , 4)</aqs:SiteNumber>
            </aqs:SiteIdentifierDetails>
            ## Call subsequent execution
            #set( $config = $MapperEngine.createMapperConfiguration() )
            #set ($tmp = $!config.ContextConfig.put( 'SITE_ID', $!row.SITE_ID ))
            #set ($tmp = $!config.ContextConfig.put( 'tableName', $tableName ))
            #set ($tmp = $!config.ContextConfig.put( 'subs', 'PRMonitorDeleteRAMap' ))
            $MapperEngine.subExecute('MapperServices/PR/PRDBReadConfig.vm', 'MapperServices/PR/PRMonitorDeleteRAMap.vm', $config)
        </aqs:ActionRawDataDelete>
    #end
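The template above calls a custom Java method, `$PRFunctions.getNumberDigitStr`, to format county and site codes. Its source is not shown in the slides, so the following is only a guess at what such a helper might do: it assumes the method left-pads a numeric code with zeros to a fixed width. The class and method names (`PadSketch`, `padDigits`) are hypothetical.

```java
// Hypothetical stand-in for a custom formatting method; the real
// PRFunctions.getNumberDigitStr may behave differently.
class PadSketch {

    /** Left-pads a code with zeros to the requested width (assumed behavior). */
    static String padDigits(String value, int width) {
        String trimmed = value == null ? "" : value.trim();
        StringBuilder padded = new StringBuilder();
        // Add only as many zeros as are missing; longer values pass through unchanged.
        for (int i = trimmed.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(trimmed).toString();
    }

    public static void main(String[] args) {
        System.out.println(padDigits("7", 3));   // prints 007
        System.out.println(padDigits("12", 4));  // prints 0012
    }
}
```

Exposing an object with methods like this in the Velocity context is what lets a template call it inline, as the county and site lines do.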
Data Transformation • Flat file to unstructured text through custom Java (EnviroFlash)
    ## Column names for delimited text file
    $MapperEngine.setMapReaderProperty('COL_NAMES_LIST',['CITY','COUNTY','STATE','UV_INDEX','UV_ALERT'])
    ## Delimiter
    $MapperEngine.setMapReaderProperty('DELIMITER','\|')
    ## Loop for all records in text file
    #foreach($row in $MapperEngine.getIterator())
        #if($templateCallback.isCitySubscribedTo($row.STATE, $row.CITY, $row.COUNTY))
            ## Use values from record as variable
            #set( $config = $MapperEngine.createMapperConfiguration() )
            #set ($tmp = $!config.ContextConfig.put( 'CITY', $row.CITY ) )
            #set ($tmp = $!config.ContextConfig.put( 'COUNTY', $row.COUNTY ) )
            #set ($tmp = $!config.ContextConfig.put( 'STATE', $row.STATE ) )
            #set ($tmp = $!config.ContextConfig.put( 'UV_INDEX', $row.UV_INDEX ) )
            #set ($tmp = $!config.ContextConfig.put( 'UV_ALERT', $row.UV_ALERT ) )
            #set ($tmp = $!config.ContextConfig.put( 'subscriberURL', $subscriberURL ) )
            #set ($tmp = $!config.ContextConfig.put( 'environmentName', $environmentName ) )
            #set ($tmp = $MapperEngine.subExecute('gov/epa/cdx/enviroflash/uv/templates/writeUVMailConfig.vm', 'gov/epa/cdx/enviroflash/uv/templates/writeUVMailMap.vm', $config) )
            #set ($outMail = $!MapperEngine.getObjectCacheMap().get('OUT_MAIL') )
            #set ($tmp = $templateCallback.sendEmail($outMail, $row.STATE, $row.CITY, $row.COUNTY, $row.UV_ALERT) )
        #end
    #end
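The template above configures a delimited reader with a column list and a `'\|'` delimiter. The escaped pipe suggests the delimiter is treated as a regular expression, though the slides do not say so. The sketch below shows how one line of such a file could be turned into a named row in plain Java. `DelimitedRowSketch` and `parseRow` are hypothetical names, and the sample record is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration class; not the CDX flat-file reader.
class DelimitedRowSketch {

    /** Splits one line on a literal delimiter and labels each field with its column name. */
    static Map<String, String> parseRow(String line, List<String> columns, String delimiter) {
        // String.split takes a regex, so the delimiter is quoted to be matched literally.
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            // Missing trailing fields become empty strings rather than nulls.
            row.put(columns.get(i), i < parts.length ? parts[i].trim() : "");
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("CITY", "COUNTY", "STATE", "UV_INDEX", "UV_ALERT");
        // Made-up sample record for illustration only.
        Map<String, String> row = parseRow("San Juan|San Juan|PR|11|Y", columns, "|");
        System.out.println(row.get("CITY") + " UV index " + row.get("UV_INDEX"));
    }
}
```

A row shaped like this is what lets the template refer to fields by name, for example `$row.CITY` and `$row.UV_ALERT`.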
Data Transformation • Advantages • Concentrates mapping logic within the configuration file and custom methods. • Handles several data source types. • Decouples readers and writers. • Streams data to handle large files with low memory overhead (tested against 680 MB). • Supports custom Java methods. • Does not require a license fee. • Requires minimal coding. • Superior performance compared to commercial tools (XAware, BEA Liquid Data) – 30 times faster on large data sets.
BPEL • BPEL is a standard for orchestrating Web Services. • XML based description of a business process • Contains references to supporting WSDL files • Portable between BPEL engines • BPEL allows for a formal specification of business processes. • BPEL meshes well with Service Oriented Architectures (SOA). • BPEL provides several useful constructs • Transaction context management • Synchronous and asynchronous web service invocation and response • Conditional branching • Parallel flow activities • Fault handling and exception invocation
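BPEL processes are written in XML, not Java, so the following is only an analogy. It uses the JDK's `CompletableFuture` to mimic three of the constructs listed above: a parallel flow of two service calls, a conditional branch on the result, and a fault handler. The class name `FlowSketch`, the two suppliers, and the result strings are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Plain-Java analogy for BPEL constructs; not BPEL and not CDX code.
class FlowSketch {

    /**
     * Runs two service calls in parallel, branches on the validation result,
     * and maps any failure in either call to a single fault outcome.
     */
    static String process(Supplier<Boolean> validate, Supplier<String> fetchMetadata) {
        // Parallel flow: both calls start immediately on the common pool.
        CompletableFuture<Boolean> valid = CompletableFuture.supplyAsync(validate);
        CompletableFuture<String> metadata = CompletableFuture.supplyAsync(fetchMetadata);

        return valid
                // Conditional branch once both results are available.
                .thenCombine(metadata, (ok, meta) -> ok ? "ACCEPTED:" + meta : "REJECTED:" + meta)
                // Fault handling: an exception in either call lands here.
                .exceptionally(fault -> "FAULT")
                .join();
    }

    public static void main(String[] args) {
        System.out.println(process(() -> true, () -> "doc-1"));
        System.out.println(process(() -> { throw new IllegalStateException("service down"); },
                                   () -> "doc-1"));
    }
}
```

In an actual BPEL process the same shape would be declared rather than coded, which is where the visibility and portability benefits come from.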
BPEL • BPEL within CDX • Motivations • Can it simplify the design of existing dataflows? • Can it reduce the cost of dataflow development? • Can it speed up the process of integrating CDX Web and Node applications? • Can it provide better visibility into existing flows? • Goals • Identify a target platform. • Demonstrate feasibility of deployment/integration. • Demonstrate ability to reuse existing CDX services. • Determine if BPEL allows for quick development of dataflow components.
BPEL • Prototype specifics • Exposed generic CDX services (Java) as Web Services • XML validation • Retrieval of transaction/document metadata • Created a CDX Services project to host the web services • Model existing National Emissions Inventory (NEI) dataflow. • Enhance CDX infrastructure to support use of BPEL orchestration. • Configure a production-like environment to host the services. • Deploy ActiveBPEL engine (deployed within Tomcat) • Set up persistence of processes (Oracle DBMS)
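XML validation is one of the generic services the prototype exposed. The slides do not show how CDX implements it, so this is a minimal sketch using the JDK's `javax.xml.validation` API. The class name `XmlValidationSketch` and the tiny schema are assumptions made for illustration. A web service wrapper would sit on top of a method like `isValid`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not the CDX validation service.
class XmlValidationSketch {

    /** Made-up schema: a single StateCode element holding exactly two digits. */
    static final String STATE_CODE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"StateCode\"><xs:simpleType>"
          + "<xs:restriction base=\"xs:string\"><xs:pattern value=\"[0-9]{2}\"/></xs:restriction>"
          + "</xs:simpleType></xs:element></xs:schema>";

    /** Returns true when the document conforms to the schema, false otherwise. */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // With no custom error handler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers invalid documents, malformed XML, and an unusable schema.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(STATE_CODE_XSD, "<StateCode>72</StateCode>"));
        System.out.println(isValid(STATE_CODE_XSD, "<StateCode>PR</StateCode>"));
    }
}
```

A production service would likely return the validation errors as well as the boolean, so that a submitter can correct the document.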
BPEL • Findings • The BPEL prototype demonstrates feasibility in the EPA environment. • Cost savings appear achievable for future flows as the CDX service suite grows, but the size of the savings is not yet clear. • The learning curve is significant. • Tools have not yet reached full maturity.
RUI Client • Guidelines • Provide more features/capabilities than a web application is capable of delivering. • Provide flexible configuration for interaction with multiple Nodes. • Support all existing Exchange Network Web Services and dataflows. • Provide pluggable transformation/visualization for multiple dataflows (Mapper, XML binding). • Use NAAS for authentication/authorization.
RUI Client • Current capabilities • Supports submit, download, and transaction history search • Supports configurable data transformation • Supports NAAS authentication/authorization • Future capabilities • Support query and data visualization • Add ability to sign/encrypt documents (CROMERR)
Geographic Data Interaction • Some dataflows have geographic data (e.g. FRS) • Provide the capability to visualize data • Provide the capability to update the data • APIs exist for addressing geographic data • Google Maps • ESRI product suite • CDX approach • Integrate Google Maps API into CDX web applications • Provide end-to-end solution for querying and updating data