
EventCube

EventCube: Aviation Safety Data Analysis System. Fangbo Tao, Xiao Yu, Jiawei Han. 08/10/13.


Presentation Transcript


  1. EventCube: Aviation Safety Data Analysis System • Fangbo Tao, Xiao Yu, Jiawei Han • 08/10/13

  2. The data we focus on • Each document is a free-text report, for example: “Following a normal approach and landing to runway 4 in roc; aircraft was taxied clear of the end of runway to the gate. Several ground snow removal vehicles were operating to left of aircraft so we moved to the right side of ramp. …” • A huge collection of such logs

  3. Power of Text-Rich Data Cubes • More than 10 dimensions • Contains huge details • Most dimensions have hierarchy • Each report tells a story • Topic Analysis • Keyword Search • Sentiment Analysis • Powerful Summarization! • Real-time for huge data! • Hierarchical Data Cube • Text Analysis

  4. Power of Text-Rich Data Cubes • Data Cube: Efficient Summarization • Rich Text: Powerful Text Mining • More than simple integration! • Data Cube and Rich Text can mutually enhance each other!

  5. Power of Text-Rich Data Cubes • Slice • Roll-up • Drill-down • Dice • …
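The four cube operations named on this slide can be illustrated with a minimal sketch over a handful of in-memory records. Everything here is assumed for illustration: the records, the dimension names (`manufacturer`, `model`, `weather`), and the helper names `count_by`, `slice_`, and `dice` are hypothetical and are not taken from the EventCube system.

```python
from collections import Counter

# Hypothetical reports, each reduced to three dimension values.
REPORTS = [
    {"manufacturer": "Boeing", "model": "B-737", "weather": "snow"},
    {"manufacturer": "Boeing", "model": "B-747", "weather": "rain"},
    {"manufacturer": "Airbus", "model": "A320", "weather": "snow"},
    {"manufacturer": "Boeing", "model": "B-737", "weather": "rain"},
]


def count_by(reports, *dims):
    """Aggregate report counts over the given dimensions (one cube cell per key)."""
    return Counter(tuple(r[d] for d in dims) for r in reports)


def slice_(reports, dim, value):
    """Slice: fix a single dimension to a single value."""
    return [r for r in reports if r[dim] == value]


def dice(reports, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in reports if all(r[d] in vals for d, vals in allowed.items())]


# Roll-up: aggregate at the coarser level of the aircraft hierarchy.
by_manufacturer = count_by(REPORTS, "manufacturer")
# Drill-down: add the finer level back in.
by_model = count_by(REPORTS, "manufacturer", "model")
```

Roll-up and drill-down are the same aggregation at different levels of the hierarchy, which is why a single `count_by` helper serves both in this sketch.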

  6. Other features • Hierarchical Dimension Selection: supports multiple choices • Multi-gram Summarization • Contextual Search • Keyword Frequency Distribution • Similar Document Finding: based on Contextual Search

  7. Contextual Search • Motivation: • Every word/concept may have equivalent words/concepts • “SVM” = “Support Vector Machine”, “Alt” = “Altitude” • Words are connected to each other • “Kernel Method” - “SVM”, “altitude” - “flight level”

  8. Contextual Search • We develop a contextual search framework to build the word-net • Contains 4 different relationships: • A “Use” B: Equivalent terms, B is more common • A “RT” B: Related terms, not hierarchical • A “BT” B: B is the broader word • A “NT” B: B is the narrower word

  9. Contextual Search • Step 1: Generate the word-net when a dataset is uploaded. • Step 2: Return related terms as the user types a query. • Step 3: Automatically include equivalent terms when searching. • Step 4: Support the operators “AND”/“OR”/“NOT”.
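A minimal sketch of how the steps above might fit together, assuming the word-net is held as two small dictionaries. The tables `USE` and `RT`, the function names, and the sample documents are all hypothetical; only the “Use” and “RT” relationships are modeled, and the actual EventCube word-net format is not described in the slides.

```python
import re

# A "Use" B: A is an equivalent term, B is the more common form.
USE = {"alt": "altitude", "svm": "support vector machine"}
# A "RT" B: related, non-hierarchical terms (stored in one direction only here).
RT = {"altitude": {"flight level"}, "support vector machine": {"kernel method"}}


def canonical(term):
    """Map a term to its more common equivalent, if the word-net has one."""
    term = term.lower()
    return USE.get(term, term)


def equivalents(term):
    """All surface forms that should match when searching for `term` (Step 3)."""
    target = canonical(term)
    return {target} | {a for a, b in USE.items() if b == target}


def related(term):
    """Related terms to suggest while the user types (Step 2)."""
    return sorted(RT.get(canonical(term), set()))


def contains(text, term):
    """True if the text mentions the term or any equivalent, as whole words."""
    text = text.lower()
    return any(re.search(r"\b" + re.escape(v) + r"\b", text) for v in equivalents(term))


def search(docs, all_of=(), any_of=(), none_of=()):
    """Boolean search (Step 4): AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for i, text in enumerate(docs):
        if not all(contains(text, t) for t in all_of):
            continue
        if any_of and not any(contains(text, t) for t in any_of):
            continue
        if any(contains(text, t) for t in none_of):
            continue
        hits.append(i)
    return hits


DOCS = [
    "Aircraft climbed above assigned alt",
    "Altitude deviation reported",
    "Snow on the ramp",
]
```

With these tables, a query for “altitude” also matches the first document, which only contains the abbreviation “alt”.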

  10. Hierarchical Dimension Support • Multiple Choice Support • Each Dimension can support several levels • Powerful examples: • “B-737” VS. “B-747” • “Boeing” VS. “Airbus”

  11. Document List Result • Uses the default MySQL “natural language full-text search” • Extracts the title from the most relevant part of the report • Shows tags of the dimension values for the target dimensions • Highlights the keywords

  12. Similar Document • Also based on contextual search • Step 1: Extract meaningful terms from the original report • Step 2: Using these terms as input, conduct a contextual search.
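The slide does not say how “meaningful terms” are chosen in Step 1. One common option, shown here purely as an assumption, is to rank a report's terms by TF-IDF and keep the top few; the function name `meaningful_terms` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def meaningful_terms(doc_tokens, corpus_tokens, k=3):
    """Return the k terms of a report with the highest TF-IDF score.

    doc_tokens: tokens of the report of interest.
    corpus_tokens: list of token lists, one per report in the corpus.
    """
    n_docs = len(corpus_tokens)
    # Document frequency: in how many reports each term occurs.
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    tf = Counter(doc_tokens)
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if t in df}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


CORPUS = [
    ["aircraft", "snow", "ramp", "snow"],
    ["aircraft", "engine", "fire"],
    ["aircraft", "runway", "snow"],
]
top_terms = meaningful_terms(CORPUS[0], CORPUS, k=2)
```

The selected terms would then serve as the query for Step 2. Terms that occur in every report, such as “aircraft” in this toy corpus, score zero and are unlikely to be chosen.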

  13. Top Cells • Search all the cells in the targeted dimensions, find the most relevant cells • A multi-dimensional cell ranking

  14. Single Dimension Distribution Based on Keywords

  15. Single Dimension Distribution Based on Keywords • Uses an offline + online framework to calculate the distribution. • If purely offline: • The number of keyword combinations is exponential • If purely online: • The whole corpus must be retrieved for every query. • Strategy: • Store the single-keyword distributions in the database. [Offline] • Combine the single-keyword distributions into a new distribution at query time. [Online]

  16. Single Dimension Distribution Based on Keywords • Offline process: • Step 1: Map equivalent terms into one. • Step 2: Build both a keyword reverse (inverted) index and a cell reverse index over the reports • Step 3: Compare these two reverse indexes and calculate the single-term distribution. • Online process [given a list of terms and dimensions]: • Step 1: Map each term to its equivalent term. • Step 2: For each dimension, calculate the combined distribution under the independence assumption • $\mathrm{Val}(t_1, \dots, t_n) = 1 - \prod_{i=1}^{n} \bigl(1 - \mathrm{val}(t_i)\bigr)$
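The offline and online steps above can be sketched as follows. This is an illustration under a stated assumption: val(t) for a cell is taken to be the fraction of that cell's reports that contain term t, which the slides do not spell out. The function names and the sample reports are hypothetical, and the equivalent-term mapping step is omitted for brevity.

```python
from collections import defaultdict
from math import prod


def build_indexes(reports, dimension):
    """Offline: build term -> report ids and cell -> report ids reverse indexes."""
    term_index = defaultdict(set)
    cell_index = defaultdict(set)
    for rid, report in enumerate(reports):
        cell_index[report[dimension]].add(rid)
        for term in set(report["text"].lower().split()):
            term_index[term].add(rid)
    return term_index, cell_index


def single_term_distribution(term_index, cell_index):
    """Offline: for each term, the share of each cell's reports containing it."""
    return {
        term: {cell: len(t_ids & c_ids) / len(c_ids) for cell, c_ids in cell_index.items()}
        for term, t_ids in term_index.items()
    }


def combined_distribution(dist, terms, cells):
    """Online: Val(t1..tn) = 1 - prod(1 - val(ti)) per cell, assuming independence."""
    return {
        cell: 1 - prod(1 - dist.get(t, {}).get(cell, 0.0) for t in terms)
        for cell in cells
    }


REPORTS = [
    {"weather": "snow", "text": "runway ice"},
    {"weather": "snow", "text": "runway clear"},
    {"weather": "rain", "text": "ice on wing"},
    {"weather": "rain", "text": "engine"},
]
term_index, cell_index = build_indexes(REPORTS, "weather")
dist = single_term_distribution(term_index, cell_index)
combined = combined_distribution(dist, ["ice", "engine"], cell_index.keys())
```

Under this reading the combined value is the estimated probability that a report in the cell mentions at least one of the query terms. Only the per-term tables need to be stored, so the online step never touches the corpus.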

  17. Topic Distribution • Based on Topic Cube • Applies a topic model • Supports comparison between different cells

  18. Unigram/Multigram description • Based on Qiaozhu Mei’s paper, “Automatic Labeling of Multinomial Topic Models” • Find multi-gram candidates from the whole text • Score each candidate based on its unigrams • Adjust the score based on the candidate’s length
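The three steps on this slide can be sketched roughly as below. This is not the scoring function from the cited paper: the candidate filter (a minimum count), the averaging of unigram weights, and the linear length bonus are all assumptions chosen to keep the example short, and the names `ngram_candidates` and `score` are hypothetical.

```python
from collections import Counter


def ngram_candidates(tokens, max_n=3, min_count=2):
    """Collect multi-grams (length 2..max_n) that occur at least min_count times."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(2, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return [gram for gram, c in counts.items() if c >= min_count]


def score(gram, unigram_weight, length_bonus=0.1):
    """Average the unigram weights, then boost longer candidates slightly."""
    base = sum(unigram_weight.get(w, 0.0) for w in gram) / len(gram)
    return base * (1 + length_bonus * (len(gram) - 1))


TOKENS = "runway incursion at night runway incursion reported".split()
# Hypothetical unigram weights, e.g. word probabilities under one topic.
WEIGHTS = {"runway": 0.4, "incursion": 0.2}

candidates = ngram_candidates(TOKENS)
ranked = sorted(candidates, key=lambda g: score(g, WEIGHTS), reverse=True)
```

The length adjustment matters because averaging alone would never prefer a phrase over its single best word.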

  19. Thinking • Data Cube: • Efficient Summary • Highly Structured Data. • Rich Text: • Topic Analysis, keyword search • Common: ASRS, IMDB, Publication-Net, News… • Network (HIN) • Good at mining, contains structural information. • No information loss

  20. Motivation of EventCube • Combine Data Cube with Rich Text. • Combine Summary with Keyword Search • Build a general search/analysis system for rich text cube data. • 1. Aviation Safety Reporting Data • Time, Weather, Location, Model…Flight logs • 2. Publication Data • Author, Conf, Time, Field, Affiliation…Abstract • 3. IMDB • Time, Country, Style, Director…Description

  21. Thanks
