“An automated tool designed to ease the pain of test creation and maintenance.” Nil Weerasinghe Bryan Robbins Mohamed Ibrahim
About FINRA • Financial Industry Regulatory Authority • Largest independent regulator for all securities firms doing business in the U.S. • ~4,500 brokerage firms • ~163,500 branch offices • ~634,400 registered securities representatives • Our Mission: Investor Protection. Market Integrity. • Providing independent, vigorous regulation • Inviting active industry involvement & input • Educating & informing investors • Actively supporting firms’ compliance efforts • Computerized certification and continued education: Series 7, 63 …etc.
FINRA Open Source Projects • Increase Community Involvement • FINRA Open Source Projects: http://finraos.github.io/ • DataGenerator: http://finraos.github.io/DataGenerator/ • JTAF-ExtWebDriver: http://finraos.github.io/JTAF-ExtWebDriver/
How to get involved • Use it • Extend it • Fork it • Discuss ideas • Open a ticket • Google group discussion • opensource@finra.org • Commit • DCO and Apache v2 • Report bugs • Help document • http://finraos.github.io/DataGenerator/ • https://github.com/FINRAOS/DataGenerator
Agenda • What is the DataGenerator? • Demo • Dependency Modeling • Pairwise Data Generation • Current Limitations • Re-architecture Plan • Questions
Video • http://finraos.github.io/DataGenerator/ • http://www.youtube.com/watch?v=Wxa1T0gp56k
Current Approach • Two ways to describe and generate datasets • Equivalence Classes + Combinations • Dependency Model + Graph Coverage • Both use Apache Velocity to generate output from templates [Diagram labels: DataSpec, Model, Outputs, Datasets]
Demo • Pairwise Combinations • Uses equivalence classes from the DataSpec to populate datasets • All Paths • Uses annotations from the graphical model to populate datasets [Diagram labels: DataSpec, Model]
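The pairwise idea from the demo can be sketched in a few lines. This is a minimal greedy all-pairs selection, not the DataGenerator implementation: the function name `pairwise_rows`, the `classes` dictionary, and the value `Equity` are hypothetical and chosen only for illustration. It enumerates the full product to pick rows, so it only suits small domains.

```python
from itertools import combinations, product


def pairwise_rows(classes):
    """Greedy all-pairs selection over equivalence classes.

    classes: dict mapping a variable name to its list of representative values.
    Returns a list of dict rows in which every pair of values from any two
    variables appears together at least once.
    """
    names = list(classes)
    # Every value pair (across two different variables) that still needs a row.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in classes[a]
        for vb in classes[b]
    }
    # Candidate rows: the full combination space (fine for a small sketch).
    candidates = [dict(zip(names, values)) for values in product(*classes.values())]

    def pairs_of(row):
        return {((a, row[a]), (b, row[b])) for a, b in combinations(names, 2)}

    rows = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best)
    return rows


# Hypothetical equivalence classes, loosely named after the fields on the slides.
classes = {
    "RECORD_TYPE": ["a", "b", "c"],
    "REQUEST_IDENTIFIER": ["1", "2", "3"],
    "PRODUCT_TYPE_CODE": ["Derivatives-Options", "Equity"],
}
rows = pairwise_rows(classes)
```

The full combination space here has 18 rows; a pairwise set needs at least 9 (one per RECORD_TYPE and REQUEST_IDENTIFIER pair) and the greedy pass never needs more than the full product.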
Limitations of Current Approach • Limited set of graph annotations • Can only set variable values within the model • No support for logic or positive/negative equivalence classes in the current version • We need more powerful annotations • Logic is often split across spec, model, and templates • Anything dynamic must be injected into the Velocity template, as the model and spec are both static • We need more dynamic evaluation • Performance considerations • Breadth-first enumeration doesn’t scale well as the domain becomes more complex • We need a more performant implementation
Re-architecting Data Generator • Replacing Visio with SCXML, an open standard to represent the state machine.
<scxml xmlns="http://www.w3.org/2005/07/scxml" xmlns:cs="http://commons.apache.org/scxml" version="1.0" initial="start">
  <state id="start">
    <transition event="RECORD_TYPE" target="RECORD_TYPE"/>
  </state>
  <state id="RECORD_TYPE">
    <!-- Mandatory -->
    <onentry>
      <assign name="var_out_RECORD_TYPE" expr="set:{a,b,c}"/>
    </onentry>
    <transition event="REQUEST_IDENTIFIER" target="REQUEST_IDENTIFIER"/>
  </state>
  . . .
Re-architecting Data Generator • SCXML allows for complex modelling using embedded EL
<state id="PRODUCT_TYPE_CODE">
  <!-- Mandatory -->
  <onentry>
    <assign name="var_out_PRODUCT_TYPE_CODE" expr="#ProductTypeCode_Cycle"/>
  </onentry>
  <transition event="OPTIONS_SYMBOLOGY_IDENTIFIER" target="OPTIONS_SYMBOLOGY_IDENTIFIER" cond="${var_out_PRODUCT_TYPE_CODE=='Derivatives-Options'}"/>
  <transition event="OPTIONAL_SECURITY_SYMBOL" target="OPTIONAL_SECURITY_SYMBOL" cond="${var_out_PRODUCT_TYPE_CODE!='Derivatives-Options'}"/>
</state>
. . .
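The guarded transitions on this slide can be read as "follow an edge only if its condition holds for the variables assigned so far". A minimal sketch of that reading, assuming guards are plain predicates over a variable dictionary; `next_states` and `TRANSITIONS` are hypothetical names, and this does not use or mimic the Apache Commons SCXML API:

```python
def next_states(state, variables, transitions):
    """Return the targets of all transitions out of `state` whose guard holds."""
    return [
        target
        for target, guard in transitions.get(state, [])
        if guard(variables)
    ]


# The two conditional transitions from the PRODUCT_TYPE_CODE state,
# expressed as (target, guard) pairs.
TRANSITIONS = {
    "PRODUCT_TYPE_CODE": [
        (
            "OPTIONS_SYMBOLOGY_IDENTIFIER",
            lambda v: v["var_out_PRODUCT_TYPE_CODE"] == "Derivatives-Options",
        ),
        (
            "OPTIONAL_SECURITY_SYMBOL",
            lambda v: v["var_out_PRODUCT_TYPE_CODE"] != "Derivatives-Options",
        ),
    ],
}

options_branch = next_states(
    "PRODUCT_TYPE_CODE",
    {"var_out_PRODUCT_TYPE_CODE": "Derivatives-Options"},
    TRANSITIONS,
)
```

Because the two guards are complementary, exactly one branch is taken for any value of `var_out_PRODUCT_TYPE_CODE`.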
Re-architecting Data Generator • SCXML allows for complex modelling: a state can itself be written as a state machine • We’re using Apache Commons SCXML in our POC
Re-architecting Data Generator • Overcoming memory issues by enhancing the all-paths algorithm: use depth-first search (DFS), which has minimal memory overhead
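A minimal sketch of why depth-first enumeration keeps memory low: paths are yielded one at a time, and only the current path plus one iterator per level is held, whereas breadth-first enumeration keeps every partial path in its queue. The function `all_paths` and the example graph are hypothetical, and the graph is assumed to be acyclic.

```python
def all_paths(graph, start, end):
    """Lazily yield every path from `start` to `end` in an acyclic graph.

    graph: dict mapping a node to the list of its successor nodes.
    Memory use is proportional to the depth of the current path only.
    """
    if start == end:
        yield [start]
        return
    path = [start]
    stack = [iter(graph.get(start, []))]  # one successor iterator per path node
    while stack:
        child = next(stack[-1], None)
        if child is None:
            # Current node is exhausted: backtrack one level.
            stack.pop()
            path.pop()
        elif child == end:
            yield path + [child]
        else:
            path.append(child)
            stack.append(iter(graph.get(child, [])))


# Hypothetical model with one branch point.
graph = {
    "start": ["A"],
    "A": ["B", "C"],
    "B": ["end"],
    "C": ["end"],
}
paths = list(all_paths(graph, "start", "end"))
```

Since `all_paths` is a generator, a caller can write each dataset row as soon as its path is produced instead of collecting all paths first.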
Re-architecting Data Generator • Short demo:
<scxml xmlns="http://www.w3.org/2005/07/scxml" xmlns:cs="http://commons.apache.org/scxml" version="1.0" initial="start">
  <state id="start">
    <transition event="RECORD_TYPE" target="RECORD_TYPE"/>
  </state>
  <state id="RECORD_TYPE">
    <onentry>
      <assign name="var_out_RECORD_TYPE" expr="set:{a,b,c}"/>
    </onentry>
    <transition event="REQUEST_IDENTIFIER" target="REQUEST_IDENTIFIER"/>
  </state>
  <state id="REQUEST_IDENTIFIER">
    <onentry>
      <assign name="var_out_REQUEST_IDENTIFIER" expr="set:{1,2,3}"/>
    </onentry>
    <transition event="MANIFEST_GENERATION_DATETIME" target="MANIFEST_GENERATION_DATETIME"/>
  </state>
  <state id="MANIFEST_GENERATION_DATETIME">
    <onentry>
      <assign name="var_out_MANIFEST_GENERATION_DATETIME" expr="#{nextint}"/>
    </onentry>
    <transition target="end"/>
  </state>
  <state id="end">
  </state>
</scxml>
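One way to picture what the demo model could produce is sketched below. The slide does not define the semantics of `set:{...}` or `#{nextint}`; this sketch assumes that each value in a set opens its own branch and that `#{nextint}` yields an increasing integer per record. `MODEL`, `choices`, and `generate` are hypothetical names, and the model is hand-copied from the SCXML rather than parsed.

```python
from itertools import count, product

# (variable, expression) pairs in state order, copied by hand from the demo model.
MODEL = [
    ("var_out_RECORD_TYPE", "set:{a,b,c}"),
    ("var_out_REQUEST_IDENTIFIER", "set:{1,2,3}"),
    ("var_out_MANIFEST_GENERATION_DATETIME", "#{nextint}"),
]


def choices(expr):
    """Assumed semantics: set:{x,y} branches per value; anything else is one value."""
    if expr.startswith("set:{") and expr.endswith("}"):
        return expr[len("set:{"):-1].split(",")
    return [expr]


def generate(model):
    """Yield one record per path through the linear model."""
    counter = count(1)  # assumed meaning of #{nextint}: 1, 2, 3, ...
    names = [name for name, _ in model]
    for values in product(*(choices(expr) for _, expr in model)):
        yield {
            name: next(counter) if value == "#{nextint}" else value
            for name, value in zip(names, values)
        }


records = list(generate(MODEL))
```

Under these assumptions the model expands to 3 × 3 = 9 records, one for each combination of RECORD_TYPE and REQUEST_IDENTIFIER.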
Re-architecting Data Generator • Restructure the code to allow Hadoop MapReduce and Giraph to operate on it. • Data Generator won’t itself directly depend on Hadoop or Giraph, but will abstract the following: • Input: allow input from files • Execution: allow execution to start from a middle state, given input variables • Output: allow output in different formats: text files, several files, gz. Users will be able to extend the output to support sequence files, Redshift, and HBase
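The output abstraction could look roughly like the sketch below: the generator writes records through a small interface, and each format is a separate implementation that users can add to. The names `RecordWriter`, `TextWriter`, `GzipWriter`, and `write_all`, as well as the pipe-delimited layout, are assumptions for illustration and not the project's actual design.

```python
import gzip
from abc import ABC, abstractmethod


class RecordWriter(ABC):
    """Hypothetical output interface; new formats subclass this."""

    @abstractmethod
    def write(self, record: dict) -> None:
        ...

    def close(self) -> None:
        pass


class TextWriter(RecordWriter):
    """Writes one delimited line per record to a plain text file."""

    def __init__(self, path, sep="|"):
        self._sep = sep
        self._fh = self._open(path)

    def _open(self, path):
        return open(path, "w", encoding="utf-8")

    def write(self, record):
        self._fh.write(self._sep.join(str(v) for v in record.values()) + "\n")

    def close(self):
        self._fh.close()


class GzipWriter(TextWriter):
    """Same line format, but gzip-compressed on disk."""

    def _open(self, path):
        return gzip.open(path, "wt", encoding="utf-8")


def write_all(records, writer):
    """Stream records to any writer, closing it even if a write fails."""
    try:
        for record in records:
            writer.write(record)
    finally:
        writer.close()
```

A sequence-file, Redshift, or HBase writer would then be another `RecordWriter` subclass supplied by the user, leaving the generator core free of those dependencies.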