COMS 6998-06 Network Theory Week 4: September 29, 2010

Dragomir R. Radev Wednesdays, 6:10-8 PM 325 Pupin Terrace Fall 2010. (27) Self-similarity. Similarity and self-similarity. Sierpinski Gasket.

  1. COMS 6998-06 Network Theory, Week 4: September 29, 2010. Dragomir R. Radev. Wednesdays, 6:10-8 PM, 325 Pupin Terrace, Fall 2010

  2. (27) Self-similarity

  3. Similarity and self-similarity

  4. Sierpinski Gasket. See also Koch’s snowflake: http://en.wikipedia.org/wiki/Koch_snowflake and http://www.arcytech.org/java/fractals/koch.shtml

  5. The Cantor set
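
The construction behind this slide can be sketched in a few lines. This is a minimal illustration, assuming the usual middle-thirds rule (remove the open middle third of every interval at each step); the function name `cantor_intervals` is made up for this example.

```python
from fractions import Fraction

def cantor_intervals(level):
    """Return the closed intervals left after `level` middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))  # keep the left third
            next_intervals.append((b - third, b))  # keep the right third
        intervals = next_intervals
    return intervals

for level in range(4):
    pieces = cantor_intervals(level)
    # Level n has 2^n pieces, each of length 3^-n.
    print(level, len(pieces), pieces[0][1] - pieces[0][0])
```

Each step doubles the number of pieces while shrinking them by a factor of 3, which is where the ratio ln2/ln3 for the Cantor set comes from.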

  6. Measuring a fractal’s dimension • In the Sierpinski gasket example, we need 1 triangle of side 1 at the first step, 3 triangles of side ½ at the second step, and 9 triangles of side ¼ at the third step. • Let N(ε) be the number of triangles of side ε needed to cover the set. Then the fractal dimension is: D = lim_{ε→0} ln N(ε) / ln(1/ε)

  7. Box counting • N(1) = 1 • N(1/2) = 3 • N(1/4) = N((1/2)^2) = 9 = 3^2 • N(1/8) = N((1/2)^3) = 27 = 3^3 • … • N((1/2)^n) = 3^n • http://classes.yale.edu/fractals/FracAndDim/BoxDim/GasketBoxDim/GasketLogLog.html
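
The log-log fit behind box counting can be sketched as follows. This is a minimal example, assuming the dimension is estimated as the least-squares slope of ln N against ln(1/side); the helper name `box_counting_dimension` is hypothetical, and the counts are the gasket values listed on this slide rather than measurements from an image.

```python
import math

def box_counting_dimension(sides, counts):
    """Least-squares slope of ln(count) against ln(1/side)."""
    xs = [math.log(1.0 / s) for s in sides]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Gasket counts from the slide: 3^n boxes of side (1/2)^n, for n = 0..5.
sides = [0.5 ** n for n in range(6)]
counts = [3 ** n for n in range(6)]
gasket_dim = box_counting_dimension(sides, counts)
print(gasket_dim, math.log(3) / math.log(2))
```

Because these counts lie exactly on a line in log-log space, the fitted slope coincides with ln3/ln2; counts taken from real data would only approximate a line.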

  8. Effective fractal dimension • For a compact triangle: • At the beginning, D = ln4/ln2 = 2 • After one iteration, D = ln16/ln4 = 2 • For the Sierpinski gasket: D = ln3/ln2 = 1.5850 • For the Koch curve: D = ln4/ln3 = 1.2618 • For the Cantor set: D = ln2/ln3 = 0.6309
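
Each value on this slide has the form ln(number of copies) / ln(1 / scale factor). The short sketch below recomputes them under that reading; the function name `similarity_dimension` and the (copies, scale) pairs are stated here as assumptions, inferred from the ratios on the slide.

```python
import math

def similarity_dimension(copies, scale):
    """Dimension of a set made of `copies` pieces, each shrunk by `scale`."""
    return math.log(copies) / math.log(1.0 / scale)

# (copies, scale) pairs inferred from the ratios quoted on the slide.
shapes = {
    "compact triangle": (4, 1 / 2),
    "Sierpinski gasket": (3, 1 / 2),
    "Koch curve": (4, 1 / 3),
    "Cantor set": (2, 1 / 3),
}
dims = {name: similarity_dimension(n, r) for name, (n, r) in shapes.items()}
for name, d in dims.items():
    print(f"{name}: {d:.4f}")
```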

  9. A self-similar fern

  10. (7) Small world networks

  11. The idea of a small world • Milgram’s experiment (1960s) • Send a package to a stockbroker in Boston • 296 senders • 20% reached the target • Chain length (avg) = 6.5 • Recent reenactment by Dodds et al. (2003) with 18 targets, 13 countries, and 60K participants: only 384 reached the target, with a path length of 4.

  12. The Watts-Strogatz model • How to keep the diameter of a growing random graph small? • Simple model: start with a regular lattice. • Two parameters: • Coordination number z: how many neighbors each node has • Shortcut probability p: for each existing edge, the probability of drawing a shortcut between two random nodes • The expected total number of shortcuts is mp = nzp/2, where m = nz/2 is the number of lattice edges
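
The model described on this slide can be sketched as follows. This is a minimal illustration, assuming a one-dimensional ring lattice with even z and the shortcut-adding rule stated above (one trial per lattice edge); the names `ring_lattice` and `add_shortcuts` are made up for this example.

```python
import random

def ring_lattice(n, z):
    """Ring of n nodes, each linked to its z nearest neighbors (z even, n > z)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, z // 2 + 1):
            w = (v + k) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def add_shortcuts(adj, p, rng):
    """For each lattice edge, with probability p link two random nodes.

    Returns the number of shortcut draws; a draw may repeat an existing edge.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # edges before shortcuts
    added = 0
    for _ in range(m):
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return added

n, z, p = 1000, 10, 0.25
graph = ring_lattice(n, z)
drawn = add_shortcuts(graph, p, random.Random(0))
print("shortcuts drawn:", drawn, "expected on average:", n * z * p / 2)
```

The drawn count fluctuates around nzp/2 from run to run, since each lattice edge is an independent trial.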

  13. The Watts-Strogatz model

  14. Diameter • Example (Amaral and Barthelemy, 1999): lattice dimension d=1, N=1000, z=10, p=0.25 gives a diameter of 3.6 • If p=0.016 (≈1/64), the diameter is 7.6
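
The effect of p on distances can be explored with a breadth-first search. The sketch below assumes that the quantity of interest is the mean shortest-path length over all node pairs, and it uses a smaller ring (n=200) than the slide’s N=1000 so that it runs quickly; its output is therefore not expected to reproduce the quoted 3.6 and 7.6. The function names are hypothetical.

```python
import random
from collections import deque

def ring_with_shortcuts(n, z, p, seed=0):
    """Ring lattice (z even) plus random shortcuts, one trial per lattice edge."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, z // 2 + 1):
            adj[v].add((v + k) % n)
            adj[(v + k) % n].add(v)
    for _ in range(n * z // 2):
        if rng.random() < p:
            u, w = rng.sample(range(n), 2)
            adj[u].add(w)
            adj[w].add(u)
    return adj

def mean_path_length(adj):
    """Average shortest-path length over all reachable ordered pairs (BFS)."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

for p in (0.0, 1 / 64, 0.25):
    print(p, round(mean_path_length(ring_with_shortcuts(200, 10, p)), 2))
```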

  15. Clustering coefficient • It mirrors the underlying lattice structure. • According to (Barrat and Weigt, 2000) • In the limit, C=3/4
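
The 3/4 limit can be checked numerically on the bare lattice. The sketch below assumes a ring lattice without shortcuts and compares the measured average clustering with the standard ring-lattice value 3(z-2)/(4(z-1)); that closed form does not appear on the slide and is included here as an assumption. It tends to 3/4 as z grows.

```python
from itertools import combinations

def ring_lattice(n, z):
    """Ring of n nodes, each linked to its z nearest neighbors (z even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, z // 2 + 1):
            adj[v].add((v + k) % n)
            adj[(v + k) % n].add(v)
    return adj

def average_clustering(adj):
    """Mean over nodes of (links among neighbors) / (possible neighbor pairs)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than 2 neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

for z in (4, 10, 40):
    measured = average_clustering(ring_lattice(200, z))
    closed_form = 3 * (z - 2) / (4 * (z - 1))
    print(z, round(measured, 4), round(closed_form, 4))
```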

  16. Properties For lattices For random graphs

  17. Degree distribution From (Barrat and Weigt, 2000)

  18. Kleinberg model • Use geographical distance (e.g., p ~ 1/d^2)
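
A distance-dependent shortcut rule of this kind can be sketched as follows. This is a minimal illustration, assuming a square grid, Manhattan distance d, and a long-range contact chosen with probability proportional to 1/d^2; the function name `long_range_contact` and the grid size are made up for this example.

```python
import random

def long_range_contact(u, side, rng, exponent=2.0):
    """Pick a contact for grid node u with probability proportional to d^-exponent."""
    ux, uy = u
    nodes = []
    weights = []
    for x in range(side):
        for y in range(side):
            d = abs(x - ux) + abs(y - uy)  # Manhattan (lattice) distance
            if d > 0:
                nodes.append((x, y))
                weights.append(d ** -exponent)
    return rng.choices(nodes, weights=weights, k=1)[0]

rng = random.Random(1)
draws = [long_range_contact((5, 5), 11, rng) for _ in range(2000)]
dist = [abs(x - 5) + abs(y - 5) for x, y in draws]
print("contacts at distance 1:", dist.count(1))
print("contacts at distance 10:", dist.count(10))
```

Nearby nodes are chosen far more often than distant ones, yet distant contacts still occur, which is the ingredient that distinguishes this rule from uniformly random shortcuts.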

  19. HW 1 • Analyze a network data set • Submit a PR-style 6-page paper • Check the class home page for examples and instructions • Model papers: • How to become a superhero, P. M. Gleiser, J. Stat. Mech. (2007) P09020, http://arxiv.org/abs/0708.2410 • The Political Blogosphere and the 2004 U.S. Election: Divided They Blog (2005), http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf • Patterns in syntactic dependency networks, Ramon Ferrer Cancho, Ricard V. Solé, and Reinhard Köhler, Physical Review E 69, 051915 (2004), http://complex.upf.es/~ricard/syntaxPRE51915.pdf • Network properties of written human language, A. P. Masucci and G. J. Rodgers, Phys. Rev. E 74, 026102 (2006), http://arxiv.org/abs/physics/0605071 • An evaluation of human protein-protein interaction data in the public domain, BMC Bioinformatics 2006, 7(Suppl 5):S19, http://www.biomedcentral.com/1471-2105/7/S5/S19/abstract • Database: hand-curated, with around 25,000 proteins and 35,000 interactions, http://www.hprd.org/download

  20. Examples • program committees of conferences in NLP/CL, IR, or ML • Skitter (http://www.caida.org/tools/measurement/skitter/) • syntactic dependencies • mentions of named entities in text • Wikipedia • social networking sites such as MySpace, Facebook, LinkedIn, etc. • product recommendations for sites such as Amazon, eBay, clothing sites, etc. • YouTube related videos • adjective/noun network • dictionary network: two words are connected if one appears in the dictionary definition of the other • the AAN author network, collaboration network, or title network (two paper titles are connected if they share a non-stop word) • people or locations that are mentioned in the same news story • collocation networks (Dorogovtsev and Mendes) • co-occurrence or other sentence graphs • concept, thesaurus, and association graphs • citation • Web-related • similarity-based (e.g., cosine) • http://www.nd.edu/~networks/resources.htm • http://deim.urv.cat/~aarenas/data/welcome.htm • http://www-personal.umich.edu/~mejn/netdata/ • http://www.sciencemag.org/cgi/content/full/302/5651/1727

