1 / 11

Programming HW2

Programming HW2. Web Retrieval and Mining spring 2014. PAGERANK . Until ( Euclidean distance ) In this assignment, use in your program For those nodes with no outgoing links Assume they connect to all other nodes

yelena
Download Presentation

Programming HW2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Programming HW2 Web Retrieval and Mining spring 2014

  2. PAGERANK • Until (Euclidean distance) • In this assignment, use in your program • For those nodes with no outgoing links • Assume they connect to all other nodes • You need to avoid adding edges explicitly to every node for those nodes with zero out degree

  3. DATASET Note that the given datasets are very sparse, you can’t implement the algorithm storing them in dense matrix

  4. Input graph example • #maxnode 6 • 1:3 2 4 6 • 2:3 4 5 6 • 3:1 4 • 4:2 1 6 • 5:2 4 6 • Each line is node_id:out_degreenode_a, node_b .. • The third line means (node 3)->(node 4)

  5. Output Example • 1:1.03486 • 2:0.695335 • 3:0.402125 • 4:1.48879 • 5:0.599137 • 6:1.77971 • Each line is node_number:pagerank_score

  6. Evaluation • We release our answer for data “stanford.edu (Aug. 2003)” • We provide a program to evaluate the L1-norm and Spearman’s rho taking two pagerank vector as inputs. • $ ./eval-pagerankanswer.pagerankstudent.pagerank • L1 norm: 0.332361 • Spearman's rho: -0.561746

  7. Requirement(PageRank) • Implement the PageRank algorithm • Run your programon CS stanford.edu (Oct. 2004), stanford.edu (Aug. 2003), and 03-2006-wk3 and upload results • Damping factor

  8. Report • How do you implement the algorithm on sparse graph • What do you find in this task

  9. Upload Format • Zip all your files with your student id as file name • R00922XXX.zip • +---R00922XXX •  +---REPORT.pdf •  +--- R00922xxx.pagerank1 // CS stanford.edu (Oct. 2004) •  +--- R00922xxx.pagerank2 // stanford.edu (Aug. 2003) •  +--- R00922xxx.pagerank3 // 03-2006-wk3 •  +---compilePR.sh //script file to compile PageRank •  +---executePR.sh //script file to execute PageRank • ./executePR.sh [-d DAMPING FACTOR] [-e EPSILON] [-o OUTPUT] INPUT •  All the other files and source codes

  10. Grading • PageRank – 10% • Report – 5% • Deadline – two weeks

  11. Contact • Ptt2: WebMining • E-mail: • 韓鯤偉(r01944031@csie.ntu.edu.tw) • 林哲毅(r01922027@csie.ntu.edu.tw) • 劉彥均(r01922033@ntu.edu.tw) • Lab 302網路資訊檢索與探勘

More Related