INF 141 Course summary


Presentation Transcript


  1. INF 141 Course summary Crista Lopes

  2. Lecture Objective Know what you know

  3. Problem Space of this course • “Big Data” • How to • collect it • index it • search it for relevant information

  4. Industry segment of this course • Search engines • Google • MS Bing • nameless others • Web information retrieval is big $$$

  5. Technical content of this course Engineering Math

  6. Lecture 2 • Search engines history • Search & advertising on the Web • Web corpus

  7. Lecture 3 • Characteristics of the Web • duplication, linkage, spam • how big • rate of change • evolution • Characteristics of Web search users • reasons for searching • characteristics of queries used • behavior towards results • the need behind the query

  8. Lecture 4 • Search Engine Optimization

  9. Lectures 5 and 6 • Web crawling • architecture of a crawling infrastructure • algorithms • constraints

  10. Lectures 7, 8 and 9 • Index construction • what an index is • efficient data structures • efficient algorithms for constructing it
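
As a rough illustration of what an index is, here is a minimal in-memory inverted index. It assumes lowercase whitespace tokenization and a corpus small enough to fit in memory; the names `build_inverted_index` and `docs` are hypothetical and not taken from the lectures.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization by lowercasing and splitting on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted postings lists make later merge-style intersections possible.
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "web crawling and indexing",
    2: "index compression",
    3: "web search",
}
index = build_inverted_index(docs)
```

Here `index["web"]` is `[1, 3]`. Real index construction at Web scale has to work in blocks on disk, which this sketch does not attempt.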

  11. Lecture 10 • MapReduce • Index compression

  12. Lecture 11 • Retrieval • boolean • zones • TF metrics
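
A Boolean AND query can be answered by intersecting two sorted postings lists. The sketch below shows the usual two-pointer merge; the function name `intersect` and the sample lists are illustrative assumptions.

```python
def intersect(p1, p2):
    """Return the doc IDs present in both sorted postings lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller current doc ID
        else:
            j += 1
    return result


both = intersect([1, 3, 5, 8], [2, 3, 8, 9])  # [3, 8]
```

The merge runs in time linear in the combined length of the two lists, which is why postings are kept sorted.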

  13. Lecture 12 • Ranked Retrieval • weighting fields

  14. Lecture 13 • Better scoring • TF-IDF • Corpus-wide statistics
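
One common TF-IDF weighting is sketched below, with log-scaled term frequency and log inverse document frequency. The slides do not state which variant the course used, so treat the formula, the function name `tf_idf`, and the toy corpus as assumptions.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Weight = (1 + log10(tf)) * log10(N / df); 0.0 if the term is absent."""
    tf = doc_tokens.count(term)
    # df is a corpus-wide statistic: the number of documents containing the term.
    df = sum(1 for tokens in corpus if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(len(corpus) / df)


corpus = [["web", "search"], ["web", "crawl"], ["index"]]
rare = tf_idf("index", corpus[2], corpus)   # term appears in 1 of 3 documents
common = tf_idf("web", corpus[0], corpus)   # term appears in 2 of 3 documents
```

The rarer term gets the higher weight, which is the point of bringing corpus-wide statistics into the score.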

  15. Lecture 14 • Vector Space model • Score by magnitude (Euclidean distance) • Score by angle (cosine distance)
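
The two scoring options can be compared on a pair of term-weight vectors. This is a minimal sketch using plain Python lists; it computes cosine similarity (the cosine of the angle), and the vectors `d1` and `d2` are made-up examples.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


d1 = [1.0, 2.0, 0.0]
d2 = [2.0, 4.0, 0.0]  # same direction as d1, twice the length
dist = euclidean_distance(d1, d2)
sim = cosine_similarity(d1, d2)
```

`d2` points in the same direction as `d1`, so the angle-based score treats them as identical while the Euclidean distance is nonzero. That difference is why angle-based scoring is less sensitive to document length.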

  16. Lecture 15 • Language statistics • Language processing • tokenizing • stemming • stopping • Link analysis • PageRank
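
PageRank can be approximated by repeated redistribution of rank along links. The sketch below is a simplified power iteration, assuming every link target also appears as a key in `links`; the damping factor 0.85, the iteration count, and the three-page graph are illustrative choices, not values from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this toy graph, page `c` has two inbound links and ends up with the highest rank, while `b`, with only half of `a`'s outgoing rank, ends up lowest.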

  17. Lecture 16 • Hadoop

  18. Lecture 17 • Retrieval performance: precision and recall • Latent Semantic Analysis • Singular Value Decomposition
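
Precision and recall for a single query follow directly from the retrieved set and the relevant set. This is a minimal sketch; the function name `precision_recall` and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 relevant overall, 2 in common.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 6])
```

Here precision is 2/4 and recall is 2/3: retrieving more documents can raise recall but tends to lower precision.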

  19. Lecture 18 • Retrieval on LSI • Use of Latent Dirichlet Allocation (LDA)

  20. All together • Search engines history • Search & advertising on the Web • Web corpus • Characteristics of the Web • Characteristics of Web search users • Search Engine Optimization • Web crawling • Index construction • Index compression • MapReduce • Boolean retrieval • Parametric retrieval • Scored retrieval • TF-IDF and corpus-wide statistics • Language statistics • Language processing • Link analysis (PageRank) • Hadoop • Precision and Recall • LSI (and LDA)

  21. Big Data jobs • plenty… • not just traditional search • making sense of data • search on google

  22. Where to go from here • Data mining • Machine learning
