Notes on Final Project of MIR Course

Part I: Crawling Phase
Modern Information Retrieval Course, Semantic Web Research Laboratory

Crawling Phase
• Crawl the Dmoz directory.
• The directory has a taxonomic (tree-like) structure.
• Each subdirectory is crawled by one group.

Crawling Phase
• This tree-like structure has two important components:
  • Internal nodes (also known as “topics”)
  • Leaves (also known as “pages”)

Crawling Phase
• Each topic has:
  • a list of children (subtopics)
  • a unique path to the root node (supertopics)
  • a description
  • a list of related pages
• Each page has:
  • a topic
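The structure above can be sketched as two small Python classes. This is only an illustration under assumptions: the class names, field names, and helper methods are hypothetical and are not prescribed by the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Page:
    """A leaf of the directory: one page listed under a topic."""
    url: str
    title: str
    topic: "Topic" = field(repr=False)  # each page has a topic


@dataclass(eq=False)
class Topic:
    """An internal node of the directory."""
    name: str                                   # last path component, e.g. "Science"
    description: str = ""                       # may be a zero-length string
    parent: Optional["Topic"] = field(default=None, repr=False)
    children: List["Topic"] = field(default_factory=list)
    pages: List[Page] = field(default_factory=list)

    def add_child(self, name, description=""):
        child = Topic(name, description, parent=self)
        self.children.append(child)
        return child

    def add_page(self, url, title):
        page = Page(url, title, topic=self)
        self.pages.append(page)
        return page

    def supertopics(self):
        """Return the unique path from the root down to this topic."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def full_name(self):
        """Join the path components, e.g. 'Top/Science/Agriculture'."""
        return "/".join(topic.name for topic in self.supertopics())
```

For example, `Topic("Top").add_child("Science").add_child("Agriculture").full_name()` gives `Top/Science/Agriculture`.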

Crawling Phase
• Each topic has some characteristics.
[Figure: the current topic (node), annotated with its description, list of supertopics, list of subtopics, and list of related pages (leaves).]

Crawling Phase
• Deliverables for the first phase:
  • TopicNames.txt
    • Each line contains a topic number and the full name of that topic, separated by a tab character (e.g. 46 Top/Science/Agriculture).
  • TopicDescs.txt
    • Each line contains a topic number and the description of that topic, separated by a tab character. For some topics, the description is a zero-length string.
  • TopicHierarchy.txt
    • Each line contains a pair of topic numbers, separated by a tab character. The first topic is the parent of the second. Each topic has exactly one parent, except for the root (topic 0), which has no parent.
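A minimal sketch of how the three topic files could be produced. The input layout (a list of tuples), the function names, the UTF-8 encoding, and the collapsing of line breaks inside descriptions are all assumptions made for illustration; only the file names and the tab-separated line formats come from the notes above.

```python
import os


def topic_file_lines(topics):
    """Build the lines of the three topic files.

    topics: iterable of (number, full_name, description, parent_number),
    where parent_number is None for the root (topic 0).
    """
    lines = {"TopicNames.txt": [], "TopicDescs.txt": [], "TopicHierarchy.txt": []}
    for number, full_name, description, parent in topics:
        lines["TopicNames.txt"].append(f"{number}\t{full_name}")
        # Assumption: a description must fit on one line, so any run of
        # whitespace (including newlines and tabs) becomes a single space.
        one_line = " ".join(description.split())
        lines["TopicDescs.txt"].append(f"{number}\t{one_line}")
        if parent is not None:
            # Parent first, child second; the root gets no hierarchy line.
            lines["TopicHierarchy.txt"].append(f"{parent}\t{number}")
    return lines


def write_topic_files(topics, out_dir):
    """Write TopicNames.txt, TopicDescs.txt and TopicHierarchy.txt to out_dir."""
    for filename, lines in topic_file_lines(topics).items():
        path = os.path.join(out_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.writelines(line + "\n" for line in lines)
```

Keeping the line formatting separate from the file writing makes it easy to check the output against the sample files before anything is written to disk.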

Crawling Phase
• Deliverables for the first phase:
  • DocUrls.txt
    • Each line contains a document number and its URL, separated by a tab character.
  • DocTitles.txt
    • Each line contains a document number and its title, separated by a tab character.
  • DocTopics.txt
    • Each line contains a document number and a topic number, separated by a tab character. This indicates that the document belongs to the given topic.
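Before submitting, it may help to read the three document files back and confirm that they agree with each other. The sketch below parses the tab-separated lines and reports document numbers that are missing from any file. The function names are hypothetical, and the rule that every document must appear in all three files is an assumption, not a stated requirement.

```python
def parse_pairs(lines):
    """Parse 'number<TAB>value' lines into a dict keyed by the integer number."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the first tab only, so the value may be empty.
        number, _, value = line.partition("\t")
        table[int(number)] = value
    return table


def check_documents(url_lines, title_lines, topic_lines):
    """Return a list of (document number, [files it is missing from])."""
    urls = parse_pairs(url_lines)
    titles = parse_pairs(title_lines)
    topics = {doc: int(topic) for doc, topic in parse_pairs(topic_lines).items()}
    tables = (("DocUrls.txt", urls), ("DocTitles.txt", titles), ("DocTopics.txt", topics))
    problems = []
    for doc in sorted(set(urls) | set(titles) | set(topics)):
        missing = [name for name, table in tables if doc not in table]
        if missing:
            problems.append((doc, missing))
    return problems
```

Each argument can be an open file object, since iterating over a text file yields its lines.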

Crawling Phase
• Deliverables for the first phase:
  • Documents.zip
    • The contents of each document, stored separately.
• Samples of each output file (for the “Science” subdirectory) have been added to the Assignments page.
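One way to package the crawled documents is Python's standard `zipfile` module, with one archive entry per document. The entry naming used here (`<document number>.html`) is purely an assumption for illustration; the sample files on the Assignments page should be treated as the authority on the expected layout.

```python
import io
import zipfile


def build_documents_zip(documents, target):
    """Write one archive entry per document.

    documents: dict mapping document number -> page content (str)
    target: a file path (e.g. "Documents.zip") or a writable binary file object
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for number in sorted(documents):
            # Hypothetical naming scheme: the document number plus ".html".
            # A str payload is encoded as UTF-8 by writestr.
            archive.writestr(f"{number}.html", documents[number])


# Usage with an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
build_documents_zip({2: "<html>second</html>", 1: "<html>first</html>"}, buffer)
```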

Crawling Phase
• Naming convention:
  • Names in each subdirectory start with a special character:

Crawling Phase
• Then, for each subtree, generate numeric names for the children in BFS (breadth-first search) order.
• For example, in the Science subdirectory:
[Figure: sample tree with nodes labeled 1 to 5 and L1 to L8, and a legend distinguishing sample topics from sample pages.]
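The BFS numbering can be sketched as follows. This is an illustration under assumptions: the tree is given as plain dictionaries, the root of the subtree receives number 0, topics and pages use separate counters, and pages are numbered in the order in which their topics are visited. The special-character prefix from the naming convention is left out, and each page URL is assumed to appear under one topic only.

```python
from collections import deque


def bfs_number(children, pages, root):
    """Assign numbers to topics and pages in breadth-first order.

    children: dict mapping a topic name -> list of its subtopic names
    pages: dict mapping a topic name -> list of page URLs under that topic
    root: name of the subtree's root topic
    """
    topic_ids, page_ids = {}, {}
    queue = deque([root])
    while queue:
        topic = queue.popleft()
        topic_ids[topic] = len(topic_ids)        # root gets 0, then 1, 2, ...
        for url in pages.get(topic, []):
            page_ids[url] = len(page_ids) + 1    # pages counted from 1
        # Enqueue subtopics so that a whole level is numbered before the next.
        queue.extend(children.get(topic, []))
    return topic_ids, page_ids
```

With `children = {"Science": ["Agriculture", "Biology"], "Agriculture": ["Soils"]}` and root `"Science"`, both second-level topics are numbered before `"Soils"`, which is the property that distinguishes BFS from a depth-first traversal.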

Crawling Phase
• Assignment of subdirectories to groups:
