1. Project Presentation
Document Optimization
11 May 2007
Team members:
Chris Catalano
Chun-Yu Chang
Chris Joson
David Matthes
Hello… We’re the Forensic Doc Optimization group.
Introduce team members
2. EDD Background
Electronic Data Discovery (EDD) is the systematic collection, processing, and review of electronic files to support the litigation process.
EDD is used in:
Alleged stock-option backdating
Government reviews of mergers and acquisitions
Other dirty deals, e.g., blackmail, fraud, embezzlement
The current processing system was designed for component flexibility and variability.
The marketplace is shifting toward an environment in which speed and automation are paramount.
3. Project Objectives
Evaluate the current EDD system against two alternatives.
Client: Huron Consulting Group
Evaluate SysML as an effective modeling language for systems engineering.
Client: Aerospace Corporation
4. Approach
Modeled and compared three EDD systems in SysML.
Evaluated the EDD systems from a capital budgeting perspective.
Quantitatively evaluated our experience with SysML.
5. Agenda
Approach - SysML Model
Analysis - Trade Study
Evaluation - SysML Usability
Thank you, Chris.
Hi, I’m Chun-Yu, and I will be discussing the approach we took to meet the objectives given by our sponsors.
As Chris mentioned earlier, one of the team’s objectives is to analyze and compare the performance of three different EDD systems.
Because many of the team members had no prior domain knowledge of electronic data discovery systems, we needed an approach that would help us understand, capture, and communicate the intricacies of the EDD systems to each other as well as to the sponsors.
The approach we took was to capture and model the details of the EDD systems using SysML.
6. Approach - SysML Model
SysML is a modeling language that allows system designers to define, analyze, and communicate different viewpoints of a system with various diagrams in a model-driven environment.
The document optimization team used several SysML diagrams to model and gain insights into the EDD systems.
I will be introducing the requirement diagram, the use case diagram, and the block definition diagram.
My teammate Chris Joson will discuss the activity diagrams a little bit later.
7. The Requirements Perspective:
One of the first steps of systems engineering is to understand the requirements.
The requirement diagram captures the requirements of the EDD system. It provides the following information:
Hierarchy of requirements (requirements can be decomposed)
Traceability of requirements (requirements can be allocated to EDD components)
This allows us to verify that we covered all the EDD specifications.
8. Contextual Perspective:
We want to understand how the EDD system is used.
The use case diagram provides us with a basic understanding of the context of EDD and shows how external entities interact with EDD.
It shows the following:
Customer provides data
Processing Team processes the data
Review Team reviews the results of the processed data
9. Structural Perspective:
We want to understand the role of each component in EDD.
The block definition diagram groups and describes the characteristics and behaviors of components in the EDD system. This provides us with an understanding of the role and functionality of each component in the EDD system.
10. Activity Perspective:
This is our activity model, which shows the flow of activities performed in the current EDD process. It starts with getting the native files and creating a source inventory. If there is email, an extraction program extracts all the email from Lotus Notes, Microsoft Outlook, etc. All the email and e-docs are compiled into working data (WD). Then any archived data are unarchived, an inventory is taken, and any duplicate emails, documents, files, and pictures are culled. A worker copies the WD to servers, and another software program searches and indexes all of the data. The searchable, indexed WD is then checked for Excel files, and a Format XLS macro gives all the Excel spreadsheets the same format (i.e., comma-separated). Express then extracts TIFFs and text. Finally, Genjob, Gen Output, Infomatik, and Branded format the TIFFs and text into the final delivery format.
The activity diagram describes the activities of the EDD system.
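The flow described in the activity notes can be sketched as an ordered sequence of steps with three conditional branches. This is only an illustration of the ordering: the `Job` class, its flags, and the step labels are hypothetical names, and the real steps are performed by the tools named in the notes rather than by this code.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical container for one EDD project moving through the flow."""
    has_email: bool
    has_archives: bool
    has_excel: bool
    log: list = field(default_factory=list)


def run_current_process(job: Job) -> list:
    """Record the steps of the current EDD process in the order described."""
    job.log.append("create source inventory")
    if job.has_email:
        job.log.append("extract email")          # Lotus Notes, Outlook, etc.
    job.log.append("compile working data")
    if job.has_archives:
        job.log.append("unarchive")
    job.log.append("take inventory")
    job.log.append("cull duplicates")
    job.log.append("copy working data to servers")
    job.log.append("search and index")
    if job.has_excel:
        job.log.append("format XLS")             # normalize spreadsheets
    job.log.append("extract TIFFs and text")     # Express
    job.log.append("format for delivery")        # Genjob, Gen Output, Infomatik, Branded
    return job.log
```

Calling `run_current_process(Job(has_email=True, has_archives=False, has_excel=True))` returns the step list with the unarchive step skipped, which makes it easy to count how many steps a given job passes through.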
11. Partitions
13. Advantages of Alternative Process
Fewer manual steps
Reduced probability of error
Simpler to maintain
Easier to train
Less rigid process
Shorter time to process documents
14. Agenda
Approach - SysML Model
Analysis - Trade Study
Evaluation - SysML Usability
We’re going to present the following topics.
15. Net Present Value Probability Distribution
The goal was to model the financial impact of each alternative over three years using Net Present Value (NPV).
NPV is a capital budgeting technique used to estimate and compare cash flows for competing systems and projects.
For each system, the net cash flow was decomposed, modeled, and run in a Monte Carlo simulation to generate NPV estimates.
The results are NPV probability distributions for each alternative.
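A minimal sketch of how a Monte Carlo run can turn uncertain cash flows into an NPV distribution is shown here. It is not the team's model: the normal distribution for yearly net cash flow, the 10% discount rate, and the input values are all assumptions chosen for illustration, and `npv` and `simulate_npv` are hypothetical helper names.

```python
import random
import statistics


def npv(cash_flows, rate, initial_cost):
    """Discount yearly net cash flows (years 1..n) and subtract the time-zero outlay."""
    discounted = sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))
    return discounted - initial_cost


def simulate_npv(mean_flow, sd_flow, initial_cost, rate=0.10, years=3,
                 runs=10_000, seed=0):
    """Draw yearly net cash flows from a normal distribution (an assumption)
    and return one NPV sample per simulation run."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        flows = [rng.gauss(mean_flow, sd_flow) for _ in range(years)]
        samples.append(npv(flows, rate, initial_cost))
    return samples


# Placeholder inputs, not figures from the study.
samples = simulate_npv(mean_flow=1_000_000, sd_flow=200_000, initial_cost=2_000_000)
print(round(statistics.mean(samples)), round(statistics.stdev(samples)))
```

Plotting a histogram of `samples` for each alternative would give the kind of NPV probability distribution the slide refers to.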
16. Net Present Value
Compared to the baseline, the alternative systems increase the processing speed and the ability to accept projects. The trade-off is increased costs.
Autonomy:
$2,000,000 initial cost
$250,000 annual maintenance cost
Attenex:
$500 per gigabyte processed operational cost
How do the increased ability to accept new projects and the increased costs affect the profitability of the systems?
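As a rough feel for the cost figures quoted on this slide, the sketch below compares the two cost streams over the three-year horizon. It deliberately ignores discounting, revenue, staffing, and every other input of the full model, so it is a back-of-the-envelope illustration only; the function names are hypothetical.

```python
AUTONOMY_INITIAL = 2_000_000   # dollars, from the slide
AUTONOMY_ANNUAL = 250_000      # dollars per year, from the slide
ATTENEX_PER_GB = 500           # dollars per gigabyte processed, from the slide
YEARS = 3                      # study horizon


def autonomy_cost(years=YEARS):
    """Undiscounted Autonomy software cost over the horizon."""
    return AUTONOMY_INITIAL + AUTONOMY_ANNUAL * years


def attenex_cost(gigabytes):
    """Attenex operational cost for a given processed volume."""
    return ATTENEX_PER_GB * gigabytes


def break_even_gigabytes(years=YEARS):
    """Processed volume at which the two cost streams are equal."""
    return autonomy_cost(years) / ATTENEX_PER_GB


print(autonomy_cost())          # 2750000
print(break_even_gigabytes())   # 5500.0
```

Under these simplifications, Attenex's per-gigabyte charge matches Autonomy's three-year software cost at 5,500 GB processed.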
17. NPV – Results
18. Conclusions & Recommendations
The model shows that, by increasing the opportunity to accept new projects, the alternative systems can overcome the increased costs!
The future system for Huron will be a hybrid of the alternatives.
The process used for a particular project will depend on the client’s requirements.
The baseline system, while slower, provides a reliable and cost-effective solution.
For clients who choose higher speeds at higher costs, Attenex would be an ideal fit. (Huron already owns licenses for the software!)
It is critical to spread the costs of Autonomy across the three EDD groups, in effect distributing the responsibility for recouping the investment!
19. Agenda
Approach - SysML Model
Analysis - Trade Study
Evaluation - SysML Usability
We’re going to present the following topics.
20. Purpose of the SysML Evaluation
Aerospace asked us to determine how effectively SysML and Rational System Developer worked
Evaluate SysML as a modeling language for designing systems
Evaluate SysML maturity
Determine how useful SysML is for systems engineering design and evaluation
Evaluate IBM Rational System Developer
Determine how well it supports SysML usage
21. Approach: Survey
Created a survey based on a Multi-Attribute Utility Assessment evaluation hierarchy
Survey contained 41 questions developed to assess the strengths and weaknesses of SysML and Rational System Developer
Questions were answered on a 1-to-5 Likert scale, with 5 indicating a positive response
Surveyed 8 OR680 students using SysML on two projects:
Electronic Data Discovery (EDD)
Tactical Surveillance Satellite (TSS)
22. Multi-Attribute Utility Assessment Evaluation Hierarchy
Dr. Adelman provided the team with a multi-attribute utility assessment evaluation hierarchy. We are currently refining questions to make a questionnaire with ratings of 1 to 5 for all the SysML users to take so that we can attempt to measure SysML performance and usability.
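One common way to score an evaluation hierarchy is a weighted additive roll-up: leaf ratings are mapped to a 0-to-1 utility and combined upward using weights that sum to 1 at each branch. The notes do not state which aggregation rule or weights were used, so the linear mapping, the tree shape, and the weights below are all assumptions for illustration.

```python
def likert_to_utility(score, low=1, high=5):
    """Map a 1-to-5 rating onto a 0-to-1 utility scale (linear, an assumption)."""
    return (score - low) / (high - low)


def roll_up(node):
    """Return the utility of a hierarchy node.

    A leaf is {"score": rating}; a branch is {"children": [(weight, node), ...]}
    with weights that sum to 1.
    """
    if "score" in node:
        return likert_to_utility(node["score"])
    return sum(weight * roll_up(child) for weight, child in node["children"])


# Hypothetical hierarchy and weights, for illustration only.
hierarchy = {
    "children": [
        (0.5, {"children": [(0.5, {"score": 4}), (0.5, {"score": 2})]}),
        (0.5, {"score": 5}),
    ]
}
print(roll_up(hierarchy))   # 0.75
```

Here the inner branch averages utilities 0.75 and 0.25 to 0.5, and the top level combines 0.5 and 1.0 with equal weights to give 0.75.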
23. Utility Results
24. Survey Analysis
SysML
Strengths
Overall, respondents felt SysML was a good language
Scored well in usability and flexibility
Weaknesses
The main weakness in SysML is that it is difficult to learn
Respondents took 20-40 hours to become a functional user
Rational System Developer
Strengths
Rational System Developer scored highest in usability
Survey indicates that people found Rational System Developer fairly easy to use
Weaknesses
Survey indicated low scores for ease of training
The interface and product quality also scored lower than other areas
25. Recommendation to Aerospace
SysML
SysML is difficult to learn and will require investment in training and time
May not be practical for smaller systems or processes with limited complexity
However, if people are already trained, SysML diagrams ensure consistency and provide effective communication across multiple disciplines
Rational System Developer
Rational supported the creation of models and helped maintain consistency
Process descriptions were created and analysis performed using Rational and SysML
SysML is well suited for complicated systems with significant hierarchical decomposition, systems common in the National Security Space domain
26. Summary
Huron asked us to evaluate their current EDD system and two alternatives
Used SysML and NPV to perform the analysis
Determined that the best solution is a mix of the current system for most clients and Autonomy for clients that require faster processing and can afford the increased cost
Aerospace asked us to evaluate SysML to determine how effectively it can support systems engineering design and analysis
Conducted a survey to help answer this question.
The survey found that SysML is a useful tool, but the learning curve is steep.
27. Acknowledgements
Heather Howard, Shana Lloyd, and Julie Street, Aerospace Corporation
Chris Genter, Huron Consulting Group
Professor Laskey, George Mason University
Sanford Friedenthal, Lockheed Martin
Professor Adelman, George Mason University
The TSS Team
David Alexander, Kevin Sadeghian, Siroos Sekhavat, and Tom Saltysiak
28. Future Work
Optimize Parametric Diagram to make the model executable
Run executable model
Compare executable model results with results obtained from Microsoft Excel
Distribute SysML survey to future students for a larger sample and further analysis
29. Questions?
30. Backup
31. Decomposition of components to provide more detail.
What are the detailed activities in the operations?
How the components in the system perform the operations or activities will be described with the next set of diagrams (by Chris).
What are the attributes used for?
We may choose to optimize some of the attributes in the subcomponents of EDD.
32. Parametric Diagram
Parametric diagrams were created to express constraints between value properties and to support an executable model.
The executable model is used to provide analysis for performance, safety, reliability, throughput, weight, cost, etc.
High Learning Curve
Lack of time (estimated 20+ additional hours to learn SysML limitations)
Inexperience with the simulation toolkit (estimated 30+ hours to execute with the toolkit)
Team inexperienced with Java (estimated 70+ hours to learn Java)
33. Sample Survey Questions
Questions focus on either SysML as a language or IBM Rational System Developer as a tool
Most questions will be rated on a scale of 1 to 5
Responses will be averaged together to determine a score for each category
Sample Questions
Overall, SysML improves the system design process.
Rational System Developer provides feedback when processing user commands.
SysML was easy to learn.
I can easily add model elements to the System model.
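The averaging step mentioned on this slide can be sketched as follows. The category labels and ratings are made up for illustration, and `category_scores` is a hypothetical helper, not the team's spreadsheet logic.

```python
from collections import defaultdict
from statistics import mean


def category_scores(responses):
    """Average 1-to-5 ratings per category.

    `responses` is a list of (category, rating) pairs pooled across respondents.
    """
    grouped = defaultdict(list)
    for category, rating in responses:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        grouped[category].append(rating)
    return {category: mean(ratings) for category, ratings in grouped.items()}


# Made-up ratings, not survey data.
example = [("language", 4), ("language", 5), ("tool", 3), ("tool", 2), ("tool", 4)]
print(category_scores(example))
```

Pooling all respondents' ratings before averaging weights every answer equally; averaging per respondent first would be a different, equally defensible choice when some respondents skip questions.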
34. The survey will have participants answer a series of questions
35. Webpage
mason.gmu.edu/~cchang7
Transition to webpage development.
36. This is a screenshot of our website. We have it running on our PCs. We just have to do the work to post this to a host site for the public. Demo of website to be performed in class from local PC.
37. General Status
38. Schedule
We’re still on schedule.
Attenex model needs more work
39. NPV Backup
40. NPV - Formula
Where:
t – time period
n – total project time
r – discount rate
Ct – net cash flow at time t
C0 – initial capital expenditure at time zero
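With the symbols defined above, the standard net present value expression can be written as follows. This is the textbook form and is assumed, not confirmed, to be the one the team used; the sum is taken to start at t = 1, with the initial capital expenditure treated as a separate time-zero outlay.

```latex
\[
\mathrm{NPV} = \sum_{t=1}^{n} \frac{C_t}{(1 + r)^{t}} - C_0
\]
```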
41. NPV - Assumptions
Number of Projects Limitations: The number of projects entering the system cannot be greater than the maximum level of availability.
Project Start and Completion Time: All projects started in a month are assumed to be completed within that month. In practice, this assumption can be interpreted as larger-scale projects starting early in the month and smaller projects starting later in the month.
Minimum Revenue: $2500 is the minimum amount of revenue accepted for a job.
Autonomy Costs: The Autonomy system has an initial cost of $2 million and an operational cost of $250,000 annually.
Attenex Costs: The Attenex system has an operational cost of $500 per GB processed.
Prospective Projects: The level at which prospective projects are found is consistent for all systems.
Availability Parameter: The availability parameter is being used to model the size and availability of the queue for incoming projects.
Pricing Scheme: The pricing scheme is constant for each system over the three-year period. No adjustments have been made to the pricing schemes of the higher-cost alternatives.
Migration Costs: With the exception of initial software costs, all migration costs are ignored in this model.
42. NPV - Revenue Inputs (1)
Annual Revenue: The annual revenue is the sum of twelve monthly revenue estimates.
Monthly Revenue: The monthly revenue is the sum of the revenue for each job accepted and completed in a month.
Revenue per Project: The revenue per project is the amount of revenue in dollars that is generated by a project.
Projects Accepted: This value is the total number of projects entered into the system each month.
43. NPV - Revenue Inputs (2)
Maximum Level of System Availability: The maximum level of system availability is the largest number of projects that can enter into the system each month.
Number of Prospective Projects: The number of prospective projects describes the number of projects that are available to be entered into the system.
Number of Staff: The number of staff plays a critical role in limiting the number of jobs that can be entered into the system each month.
Processing Speed: Processing speed describes the rate at which projects can be pulled through the system.
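The relationships among the revenue inputs can be sketched as a small roll-up: projects accepted are capped by availability, monthly revenue sums the accepted projects, and annual revenue sums twelve months. Accepting prospective projects in list order and dropping jobs below the minimum revenue before applying the cap are assumptions of this sketch, and the function names are hypothetical.

```python
MINIMUM_REVENUE = 2_500  # dollars per job, from the assumptions slide


def projects_accepted(prospective, max_availability):
    """Number of projects entering the system in a month, capped by availability."""
    return min(prospective, max_availability)


def monthly_revenue(project_revenues, max_availability):
    """Sum revenue over the projects accepted (and assumed completed) in one month."""
    eligible = [r for r in project_revenues if r >= MINIMUM_REVENUE]
    count = projects_accepted(len(eligible), max_availability)
    return sum(eligible[:count])   # first-come acceptance, an assumption


def annual_revenue(months, max_availability):
    """Sum twelve monthly revenue estimates."""
    if len(months) != 12:
        raise ValueError("expected twelve months of prospective projects")
    return sum(monthly_revenue(m, max_availability) for m in months)
```

For example, `monthly_revenue([3000, 1000, 5000, 4000], 2)` drops the $1,000 job, accepts the first two remaining projects, and returns 8000.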
44. NPV - Cost Inputs
Initial Costs: The costs used to procure new software and equipment for the alternative systems at the onset of the migration. The initial costs are incurred once at the beginning of the project.
Maintenance Costs: Monthly costs associated with maintaining the software and hardware systems. The maintenance costs include repairing machines, software upkeep and spare parts.
Salary Costs: Monthly costs related to employee salaries.
Operational Costs: Monthly costs related to procuring additional equipment and software, plus the overhead costs related to the building and facilities.
45. NPV – Parametric Diagram