Web Data Management Dr. Daniel Deutch


  1. Web Data Management
     Dr. Daniel Deutch

  2. Web Data
     • The web has revolutionized our world
     • Data is everywhere
     • It holds great potential
     • But it also poses many challenges
     • Web data is huge, unstructured, heterogeneous and partly incorrect
     • Just the ingredients of a fun topic!

  3. Goals
     • Searching for relevant web pages
       • E.g. given keywords
     • Understanding the results
     • Ranking the results
     • Combining results from different sources
       • E.g. social networks + search history
     • Combining rankings
     • Recommendations
       • Movies, restaurants, etc.

  4. Types of Data on the Web
     • Text
     • XML
     • Tables
     • Hyperlinks
     • Semantic tags
     • …

  5. Challenges
     • Scale
       • The web is huge
     • Heterogeneous sources
       • Different models and analysis techniques need to be designed
     • Uncertainty
       • Many errors (intentional or not) in the data
       • Many errors in understanding the data
       • Probabilistic modeling will be needed

  6. Ingredients (Unordered)
     • Web Data Types
       • Semi-structured
       • Structured
       • Unstructured
     • Modeling & Storage
       • XML, text and relational DB representation
       • XML typing & querying
       • Text models
     • Search and Retrieval
       • Crawling
       • Querying
       • Information Retrieval and Extraction (basics)

  7. Text Analysis
     • POS tagging
     • Ranking
       • HITS algorithm
       • Google PageRank
       • Rank Aggregation and Top-K algorithms
     • Recommendations
       • Collaborative Filtering
       • The Netflix Million Dollar Challenge
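Of the ranking algorithms listed above, PageRank is the simplest to sketch. The snippet below is a minimal power-iteration version, assuming the link graph fits in memory as a Python dict from each page to the pages it links to. The function name `pagerank`, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions, not the course's reference implementation. Pages with no outgoing links spread their rank uniformly over all pages, which is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank (illustrative sketch, hypothetical helper).

    links: dict mapping each page to a list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from the uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump term: with probability (1 - damping) jump to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (assumed convention): spread rank over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

On this three-page graph, C collects links from both A and B and ends up ranked highest, followed by A and then B. A fixed iteration count keeps the sketch short; a fuller implementation would usually stop once the scores change by less than some tolerance.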

  8. Semantic Web
     • Ontologies
     • Data Integration
     • Deriving semantic information
       • Wikipedia as an example
     • Web Services and Business Processes
       • BPEL, WSDL standards
       • Orchestration
       • Mashups
       • Analysis

  9. Advanced Topics (time permitting)
     • Querying the deep web
     • Online advertisements
       • Models
       • Algorithms
     • Distributed Data Management
       • MapReduce and Pig Latin
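The MapReduce model listed above is easiest to see on the classic word-count task. The sketch below only simulates the three stages (map, shuffle, reduce) inside a single Python process, assuming documents arrive as a dict from a document id to its text. The function names are hypothetical, and there is no distribution, fault tolerance or real Hadoop or Pig API involved.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map (hypothetical helper): emit a (word, 1) pair per word.

    doc_id is unused here; it mirrors the usual map(key, value) signature.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle and reduce over a dict of doc_id -> text."""
    mapped = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "The web is huge", "d2": "the web is everywhere"}
    print(word_count(docs))
```

Running it prints `{'the': 2, 'web': 2, 'is': 2, 'huge': 1, 'everywhere': 1}`. In an actual MapReduce deployment the map and reduce calls run in parallel on many machines and the framework performs the shuffle; Pig Latin scripts are compiled into chains of such jobs.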

  10. Resources
      • Website
        • Accessible from http://cs.tau.ac.il/~danielde
        • Slides, exercises, links
      • Book
        • http://webdam.inria.fr/Jorge/index.php
        • Free full version available online
      • Papers
        • Links will be made available when relevant

  11. Your Duties
      • 70% Final exam
      • 30% Exercises
        • Including programming tasks
