Querying, Exploration, and Analytics of Entity Data Graphs. Chengkai Li, Dept. of Computer Science and Engineering, University of Texas at Arlington. Nanjing University, Aug. 20th, 2012.
Chengkai Li Assistant Professor Department of Computer Science and Engineering University of Texas at Arlington http://ranger.uta.edu/~cli cli@uta.edu • Research Areas Databases, Web Data Management, Data Mining, Information Retrieval • Specific Topics computational journalism, database exploration, database testing, entity search and query, OLAP and data warehousing, query processing and optimization, ranking and skyline queries, and Web search/mining/integration
Ph.D. Work • University of Illinois at Urbana-Champaign, 2007 (advisor: Kevin Chang) • Ranking and Top-k Queries • RankSQL system and ranking algebra • ranking aggregates • integration of ranking and clustering • Deep Web Data Integration • XML Query Processing
The Innovative Database and Information Systems Research (IDIR) Lab • Faculty Chengkai Li • PhD students Naeemul Hassan, Nandish Jayaram, Afroza Sultana, Ning Yan, Gensheng Zhang • MS students Mahesh Gupta, Jijo Philip • BS students Raju Karki, Feifan Meng, Khuong Nguyen • Alumni Mahbubur Rahman, Xiaonan Li, Avinash Bharadwaj (MS, 2011, Copper Labs), Aditya Telang (Ph.D., 2011, co-advised with Sharma Chakravarthy, IBM Research India), Jared Ashman (MS, 2010, Ambit Energy), Ebrahim Cutlerywala (MS, 2010, Google), Quazi (Sunny) Hasan (MS, 2010, Dematic), Angus Helm (BS, 2010), Rakesh Ramegowda (MS, 2010), Aakash Tuli (BS, 2010), Muhammad Safiullah (MS, 2008, Microsoft) • Collaborators Pankaj K. Agarwal (Duke), Sharma Chakravarthy (UTA), Sarah Cohen (Duke), Christoph Csallner (UTA), Gautam Das (UTA), Chris Ding (UTA), Ramez Elmasri (UTA), Leonidas Fegaras (UTA), Bin He (IBM Almaden), Ping Luo (HP Labs), Min Wang (HP Labs), Xifeng Yan (UCSB), Jun Yang (Duke), Cong Yu (Google), Nan Zhang (George Washington U.)
Projects • WebEQ: Querying and Exploration of Web Text • Entity-Relationship Queries • Faceted Search • Usability of Query Systems over Entity Data Graphs • Graph Query by Example • Faceted Exploration of Data Graphs • Graph Query Algebra • Computational Journalism • Prominent Streak Discovery • One-of-the-Few Objects • Significant Fact Finding • Database Testing • Dynamic Symbolic Testing of Database Applications • Testing MapReduce Programs • Misc. • Skyline Groups • Set Queries • Ranking in Web Databases
Demo http://idir.uta.edu/facetedpedia
• Afroza Sultana, Quazi Hasan, Ashis Biswas, Soumyava Das, Habibur Rahman, Chris Ding, Chengkai Li: Infobox Suggestion for Wikipedia Entities. CIKM 2012, poster paper. • Xiaonan Li, Chengkai Li, Cong Yu: Entity-Relationship Queries over Wikipedia. ACM Transactions on Intelligent Systems and Technology (TIST), 2012. • Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, Gautam Das: Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. WWW 2010: 651-660. • Ning Yan, Chengkai Li, Senjuti B. Roy, Rakesh Ramegowda, Gautam Das: Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia. CIKM 2010: 1925-1926. Demonstration Description. • Xiaonan Li, Chengkai Li, Cong Yu: EntityEngine: Answering Entity-Relationship Queries using Shallow Semantics. CIKM 2010: 1927-1928. Demonstration Description.
Demandata: Data-Driven Computational Investigative Journalism
Demandata: Data-Driven Computational Investigative Journalism • Explore the young field of computational journalism • Build sites/apps/systems with societal impact • Apply and invent techniques from database systems, text/data mining, Web databases, visualization, social computational systems, and cloud computing • Example claim: "When I was mayor of New York City, I encouraged adoptions. Adoptions went up 65 to 70 percent..." We will build systems to automatically check whether such a statement is reliable. • Example fact: "… the Kings became just the third of those 96,000-plus teams to have a game in which they produced both so few points (60) and such a low shooting percentage (25.6%) …" We will build systems to automatically discover such fascinating facts and narrate them.
Prominent Streaks • “This month the Chinese capital has experienced 10 days with a maximum temperature in around 35 degrees Celsius – the most for the month of July in a decade.” • “The Nikkei 225 closed below 10000 for the 12th consecutive week, the longest such streak since June 2009.” • “He (LeBron James) scored 35 or more points in nine consecutive games and joined Michael Jordan and Kobe Bryant as the only players since 1970 to accomplish the feat.” • “Deron Williams was the first player in NBA history that achieved 20+ points 10+ assists in the first 5 games of a series.”
Example Prominent Streaks • “In Melbourne, Australia, during the years between 1981 and 1990, the weather had been pleasant. There had been more than two thousand days with minimum temperature above the zero point, and the streak was not ending. (We do not have data beyond 1990.) The longest streak during which the temperature hit above 35 degrees Celsius is six days. It was in the summer of the year 1981.” • “More than half of the prominent streaks we found in the traffic data of the Lady Gaga Wikipedia page were around September 12th, when she became a big winner in the MTV Video Music Awards (VMA) 2010. During that time, the page had been visited by at least 2000 people in every hour for almost four days.”
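A statement such as "the longest streak during which the temperature hit above 35 degrees Celsius is six days" can be checked with a single pass over the sequence. The sketch below is only an illustration under assumptions: the function name `longest_streak_at_least` and the temperature readings are made up, and this is a plain linear scan for one fixed threshold, not the prominent streak discovery algorithm of the KDD 2011 paper.

```python
from typing import Sequence, Tuple


def longest_streak_at_least(values: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Return (length, start_index) of the longest run of consecutive
    values that are all >= threshold, or (0, -1) if no value qualifies."""
    best_len, best_start = 0, -1
    run_len, run_start = 0, 0
    for i, v in enumerate(values):
        if v >= threshold:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0  # the run is broken
    return best_len, best_start


# Hypothetical daily maximum temperatures in degrees Celsius (made-up data).
daily_max_c = [31.0, 36.2, 35.5, 33.0, 35.1, 36.0, 37.4, 35.0, 29.8]
length, start = longest_streak_at_least(daily_max_c, 35.0)
print(length, start)  # 4 4: four consecutive days, starting at index 4
```

Finding all prominent streaks is harder than this, because neither the threshold nor the interval is fixed in advance.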
“One of the Few” Claims • Sports: Karl Malone is ONE OF THE ONLY TWO players in NBA history with 25,000 points, 12,000 rebounds, and 5,000 assists in a career • Politics: He is ONE OF THE ONLY THREE candidates who have raised more than 25% from PAC contributions and 25% from self-financing • Do these claims really hold water? • How do we find truly interesting claims or individuals?
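Checking whether a "one of the only k" claim holds amounts to counting the objects that meet every threshold in the claim. A minimal sketch, assuming a small in-memory table: the helper `objects_meeting`, the player labels, and all career totals are hypothetical placeholders, not real statistics.

```python
from typing import Dict, List


def objects_meeting(records: Dict[str, Dict[str, float]],
                    thresholds: Dict[str, float]) -> List[str]:
    """Return the names of all records whose value is >= the threshold
    on every attribute named in `thresholds`."""
    return [
        name
        for name, attrs in records.items()
        if all(attrs[a] >= t for a, t in thresholds.items())
    ]


# Made-up career totals for hypothetical players.
careers = {
    "player_a": {"points": 26000, "rebounds": 12500, "assists": 5200},
    "player_b": {"points": 30000, "rebounds": 13000, "assists": 5100},
    "player_c": {"points": 27000, "rebounds": 9000, "assists": 6000},
    "player_d": {"points": 20000, "rebounds": 14000, "assists": 3000},
}
claim = {"points": 25000, "rebounds": 12000, "assists": 5000}

qualifying = objects_meeting(careers, claim)
print(qualifying)       # ['player_a', 'player_b']
print(len(qualifying))  # 2, so "one of the only two" holds on this toy data
```

The count alone does not say whether the claim is interesting: with loosely chosen thresholds, almost any object can be made "one of the few".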
One-of-the-Few => k-Skyband He is ONE OF THE ONLY TWO players with 25,000 points, 12,000 rebounds, and 5,000 assists
One-of-the-Few => k-Skyband [Figure: 1-skyband and 2-skyband; labeled points: Chamberlain, Pettit, Baylor, Abdul-Jabbar, Bird, James, Johnson, Robertson, Jordan, Stockton]
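The link between "one of the few" claims and the k-skyband can be sketched with a brute-force dominance count. The convention assumed here is that the k-skyband contains the objects dominated by fewer than k other objects, so the 1-skyband is the skyline; some papers index the skyband from 0 instead. The function names and the two-dimensional points are made up for illustration, and the quadratic scan is not the algorithm of the KDD 2012 paper.

```python
from typing import Dict, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is >= q on every attribute and > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def k_skyband(points: Dict[str, Point], k: int) -> Dict[str, int]:
    """Return {name: number_of_dominators} for every point dominated by
    fewer than k other points (assumed convention: 1-skyband = skyline)."""
    result = {}
    for name, p in points.items():
        n_dom = sum(dominates(q, p) for other, q in points.items() if other != name)
        if n_dom < k:
            result[name] = n_dom
    return result


# Hypothetical objects with two attributes where larger is better (made-up data).
points = {"a": (10, 2), "b": (8, 8), "c": (2, 10), "d": (7, 7), "e": (1, 1), "f": (6, 6)}

print(k_skyband(points, 1))  # {'a': 0, 'b': 0, 'c': 0}
print(k_skyband(points, 2))  # {'a': 0, 'b': 0, 'c': 0, 'd': 1}
```

Under this convention, an object in the k-skyband is one of at most k objects that meet thresholds set at its own attribute values, which is why "one of the only two" claims correspond to the 2-skyband.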
You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, Cong Yu: On “One of the Few” Objects. KDD 2012. • Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, Yong Yu: Prominent Streak Discovery in Sequence Data. KDD 2011, pages 1280-1288. • Sarah Cohen, Chengkai Li, Jun Yang, Cong Yu: Computational Journalism: A Call to Arms to Database Researchers. CIDR 2011, pages 148-151.
Chengkai Li, Nan Zhang, Naeemul Hassan, Sundaresan Rajasekaran, Gautam Das: On Skyline Groups. CIKM 2012, short paper.