
CIS 895 – MSE Project

CIS 895 – MSE Project. KDD-Service based Numerical Entity Searcher (KSNES), Presentation 2, March 31st, 2009. Naga Sowjanya Karumuri, sowji@ksu.edu. Outline: Project Data Flow Diagram, Action Items, Architectural Design, Test Plan, Formal Inspection Checklist, Project Plan.


Presentation Transcript


  1. CIS 895 – MSE Project KDD-Service based Numerical Entity Searcher (KSNES) Presentation 2 on March 31st, 2009 Naga Sowjanya Karumuri sowji@ksu.edu

  2. Outline • Project Data Flow Diagram • Action Items • Architectural Design • Test Plan • Formal Inspection Checklist • Project Plan • Prototype Demonstration • Questions / Comments

  3. Project Data Flow Diagram: Numerical Entity Searcher

  4. Modules in the Project • Webpage (JSP): for sending requests to, and receiving information from, the service • POS Tagger (Java): Stanford POS Tagger • Numerical Phrase Extractor (Java): implemented using a shallow parsing technique • Number-Unit/Date Pattern Recognizer (C++): based on the Numerical Quantifier developed by Benjamin Sapp at UIUC

  5. Action Items • Implemented Numerical Phrase Extractor • Detailed Description of Test Plan • Wrote Formal Specification using USE • UML Representation of the System

  6. Architectural Design: Service-Oriented Architecture

  7. Package View • Overall package view • Class descriptions, attributes, and operations are contained in the Architecture Design Document

  8. Sequence Diagram

  9. Class Diagram (npe package)

  10. Class Diagram (ndpr package)

  11. Implementing Numerical Phrase Extractor • Input: Tagged Text • I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD • Regular Expressions are used to determine the numerical patterns in the input. • thirty-three/JJ dollars/NNS • in/IN 1998/CD • Output: Numerical Phrases • thirty-three dollars • in 1998
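The extraction step above (tagged text in, regex matches, tags stripped, phrases out) can be sketched in Java as follows. This is a minimal illustration, not the KSNES source: the class name `PhraseExtractorSketch` and both patterns are hypothetical simplifications chosen to reproduce the slide's example, and the tag-stripping rule assumes Penn Treebank style `/TAG` suffixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex-based numerical phrase extractor (not the KSNES code).
public class PhraseExtractorSketch {

    // Assumed, simplified patterns:
    //  1. a number token tagged JJ or CD followed by a noun token (NN or NNS)
    //  2. a preposition followed by a CD token
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("[A-Za-z0-9-]+/(JJ|CD) [A-Za-z]+/NNS?"),
        Pattern.compile("[A-Za-z]+/IN [0-9]+/CD"));

    // Removes the "/TAG" suffix from every token, e.g. "dollars/NNS" -> "dollars".
    public static String stripTags(String tagged) {
        return tagged.replaceAll("/[A-Z$]+", "");
    }

    // Applies each pattern in turn and collects every match with its tags removed.
    public static List<String> extract(String taggedText) {
        List<String> phrases = new ArrayList<>();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(taggedText);
            while (m.find()) {
                phrases.add(stripTags(m.group()));
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String tagged = "I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD";
        System.out.println(extract(tagged)); // [thirty-three dollars, in 1998]
    }
}
```

Phrases are grouped by pattern rather than by position in the sentence, and the first pattern would also accept ordinary adjective-noun pairs (any `/JJ` word before a noun), so a fuller extractor would need an additional check that the `/JJ` token is really a number word.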

  12. Tagset

  13. Some Patterns • "\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN" parses a hyphenated number pair tagged /JJ or /CD followed by a noun • "(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD (([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?" parses 'between 1987 and 1997', 'in 2007 and 2008'
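The two slide patterns can be exercised directly with `java.util.regex`. The sketch below uses the patterns as transcribed above. The class name, helper method, and tagged example sentences are hypothetical (the tags were assigned by hand, not by the Stanford tagger). As transcribed, the second pattern requires a space after the first `/CD` token, because that space sits outside the optional group, so a lone year at the very end of the input is not matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the slide 13 patterns on hand-tagged input.
public class SlidePatternsDemo {
    // Hyphenated number pair tagged JJ or CD, followed by a noun.
    public static final Pattern RANGE_NOUN =
        Pattern.compile("\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN");

    // Temporal preposition, a 4-character CD token, then optionally a
    // conjunction or "to" and a second 4-character CD token.
    public static final Pattern DATE_SPAN = Pattern.compile(
        "(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD "
        + "(([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?");

    // Returns the first match with the POS tags removed, or null if nothing matches.
    public static String firstMatch(Pattern p, String tagged) {
        Matcher m = p.matcher(tagged);
        return m.find() ? m.group().replaceAll("/[A-Z$]+", "").trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(DATE_SPAN, "prices/NNS rose/VBD between/IN 1987/CD and/CC 1997/CD"));
        System.out.println(firstMatch(DATE_SPAN, "sales/NNS grew/VBD in/IN 2007/CD and/CC 2008/CD"));
        System.out.println(firstMatch(RANGE_NOUN, "a/DT 10-20/CD percent/NN drop/NN"));
        // No trailing space after "1998/CD", so the pattern as transcribed finds nothing.
        System.out.println(firstMatch(DATE_SPAN, "lost/VBD it/PRP in/IN 1998/CD"));
    }
}
```

The four calls print `between 1987 and 1997`, `in 2007 and 2008`, `10-20 percent`, and `null`. The last result shows why a sentence-final year such as 'in 1998' needs either a separate pattern or the separating space moved inside the optional group.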

  14. Assigning Bounds • Cue words in a phrase are detected to set its bound: >, <, ~, or = • "=" is used if no cue word is present
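Bound assignment can be sketched as a lookup from cue words to bound symbols with "=" as the default. The slide does not list the actual cue words, so the table below ("more than", "under", "about", and so on) is entirely hypothetical, as is the class name `BoundAssigner`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the real KSNES cue-word list is not given on the slide.
public class BoundAssigner {
    private static final Map<String, String> CUES = new LinkedHashMap<>();
    static {
        CUES.put("more than", ">");
        CUES.put("over", ">");
        CUES.put("less than", "<");
        CUES.put("under", "<");
        CUES.put("about", "~");
        CUES.put("around", "~");
        CUES.put("approximately", "~");
    }

    // Returns the bound for a numerical phrase, or "=" if no cue word occurs.
    // Padding with spaces makes the check match whole words only ("over" but not "overall").
    public static String boundOf(String phrase) {
        String padded = " " + phrase.toLowerCase(Locale.ROOT) + " ";
        for (Map.Entry<String, String> cue : CUES.entrySet()) {
            if (padded.contains(" " + cue.getKey() + " ")) {
                return cue.getValue();
            }
        }
        return "=";
    }

    public static void main(String[] args) {
        System.out.println(boundOf("more than 500 people")); // >
        System.out.println(boundOf("about 30 dollars"));     // ~
        System.out.println(boundOf("thirty-three dollars")); // =
    }
}
```

A `LinkedHashMap` keeps the cues in insertion order, so if a phrase contains several cue words, the first one listed in the table wins.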

  15. Some Patterns • "[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)" parses 'one of the five people', 'two of the groups', 'one of the rare cases', '89 percent of people', 'five of the seven former employees', '3 out of 5 people'
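The part-of-whole pattern can be checked against the slide's examples with a short Java sketch. The class name and hand-tagged inputs are hypothetical. One change from the slide is labeled as an assumption: as transcribed, the optional second count group `( [a-zA-Z]+/CD)?` accepts only spelled-out numbers, so the digit `5` in '3 out of 5 people' would not match. The sketch widens that group to `[a-zA-Z0-9]+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the slide 15 pattern applied to hand-tagged examples.
public class PartOfWholeSketch {
    // Slide 15 pattern with one assumed change: the optional second CD group
    // accepts digits ([a-zA-Z0-9]+) so that "3 out of 5 people" also matches.
    public static final Pattern PART_OF_WHOLE = Pattern.compile(
        "[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?"
        + "( [a-zA-Z0-9]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)");

    // Returns every match in the tagged text with the POS tags removed.
    public static List<String> extract(String tagged) {
        List<String> out = new ArrayList<>();
        Matcher m = PART_OF_WHOLE.matcher(tagged);
        while (m.find()) {
            out.add(m.group().replaceAll("/[A-Z$]+", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] examples = {
            "one/CD of/IN the/DT five/CD people/NNS",
            "89/CD percent/NN of/IN people/NNS",
            "five/CD of/IN the/DT seven/CD former/JJ employees/NNS",
            "3/CD out/IN of/IN 5/CD people/NNS"
        };
        for (String s : examples) {
            System.out.println(extract(s));
        }
    }
}
```

Because `/NN` is the first alternative in `(/NN|/NNS|/NNP)`, a plural noun is matched only up to `/NN`. The trailing `S` is left outside the match, which does not affect the extracted words.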

  16. Phrases that can be parsed

  17. Phrases that are not Currently Parsed • Future Work: these phrases could also be parsed by adding more patterns to the current system; for now, only the most important and most commonly occurring patterns are handled. • The current goal is to establish a basic approach to numerical phrase extraction.

  18. Formal Specification • Created and validated using USE 2.3.1. • All Classes are specified • All important attributes and methods are specified • Constructor methods are not specified • Contained at the end of the Architectural Design Document

  19. Test Plan • The developer checks the output of each module against manually calculated results • Check that the POS tagger produces the tagged text • Check that the numerical phrases are extracted • Check that each numerical phrase is resolved into Value, Unit, and Unit-Type • UML diagrams and the required specifications will be checked for consistency by two fellow MSE students • User interaction will be tested by the developer and the technical inspectors
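The first test-plan step, comparing each module's output with manually calculated results, can be sketched as a small table-driven check. Everything here is hypothetical: the `Interpretation` record only mirrors the Value, Unit, and Unit-Type named on the slide, the expected values are invented for illustration, and the stub stands in for the real recognizer. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a developer-side check of module output.
public class ModuleCheckSketch {
    // Assumed output type: a phrase resolved into value, unit, and unit type.
    public record Interpretation(String value, String unit, String unitType) {}

    // Runs the module under test on each input and describes every case whose
    // output differs from the manually calculated expectation.
    public static List<String> mismatches(Map<String, Interpretation> expected,
                                          Function<String, Interpretation> module) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Interpretation> e : expected.entrySet()) {
            Interpretation actual = module.apply(e.getKey());
            if (!e.getValue().equals(actual)) {
                failures.add(e.getKey() + ": expected " + e.getValue() + ", got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Interpretation> expected = new LinkedHashMap<>();
        expected.put("thirty-three dollars", new Interpretation("33", "dollars", "currency"));
        expected.put("in 1998", new Interpretation("1998", "year", "date"));

        // Stand-in for the real recognizer; it deliberately gets the unit of the second case wrong.
        Function<String, Interpretation> stub = phrase -> phrase.startsWith("thirty-three")
            ? new Interpretation("33", "dollars", "currency")
            : new Interpretation("1998", "", "date");

        mismatches(expected, stub).forEach(System.out::println);
    }
}
```

Records provide value-based `equals`, so one comparison checks all three fields at once. A `null` result from the module is also reported as a mismatch.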

  20. Formal Inspection Checklist • The following items are to be checked: • The symbols used in the class diagram conform to UML standards • The symbols used in the sequence diagrams conform to UML standards • The classes in the class diagrams have corresponding descriptions provided in the Architecture Document • The descriptions of the classes in the Architecture Document are clear and concise • The classes in the USE model are consistent with those in the Architecture Document • All the requirements in the Software Requirements Specification have been covered in the Architecture Document • The multiplicities in the USE model have been depicted in the class diagram

  21. Project Schedule • Key Dates • Presentation 1: February 24th, 2009 • Complete Numerical Sub-Chunker • Presentation 2: March 31st, 2009 • Complete Numerical Phrase Extractor • Presentation 3: April 10th, 2009 • Patch up the modules • Develop a GUI • Set them up on the server • Submit all documents to the committee by April 13th, 2009 • Final Portfolio submitted by April 15th, 2009

  22. Project Schedule

  23. Prototype Demonstration • POS Tagger: working, currently on the local machine only • Numerical Phrase Extractor: working, currently on the local machine only

  24. Phase 3 Deliverables • Action items • Component Design • Assessment Evaluation • Project Evaluation • User’s Manual • Formal Technical Inspection Checklists • Presentation 3 • Executable Project • Source Code

  25. To-Do List • Revise the Documents • Revise Project Schedule • Work on the Phase 3 deliverables • Final Demo

  26. Questions?? Suggestions!! THANK YOU
