Scaling in the Geography of US Computer Science Rui Carvalho and Michael Batty

  1. Scaling in the Geography of US Computer Science
     Rui Carvalho and Michael Batty, University College London
     rui.carvalho@ucl.ac.uk m.batty@ucl.ac.uk
     http://www.casa.ucl.ac.uk/
     Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)

  2. Motivation
     • Why Geography?
       • Scientists: who can I collaborate with in my city/country?
       • Funding agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
       • Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL);
     • Preprint server challenges:
       • [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
       • [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication;
     • Data mining challenges:
       • Processing of large databases promises to uncover knowledge hidden behind the mass of available data;
       • It can dramatically speed up achievements formerly reached solely by human effort and provide new results that could not have been reached by humans unaided;
     • Statistical challenges:
       • Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
       • Yet most “real world” phenomena are far from equilibrium.
     PNAS, 6 April 2004

  3. Plan
     • Open Archives Datasets:
       • Citeseer (Computer Science);
       • arXiv.org (mainly Physics, but also Maths and CS)
     • Geographical Datasets:
       • The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database;
     • Plan:
       • Extract ZIP codes from authors’ addresses;
       • Map research centres geographically;
     • Questions about the research centres:
       • How productive are they?
       • Are there non-trivial spatial structures at a geographical scale?

  4. Plan
     • Open Archives Datasets:
       • Citeseer (Computer Science);
       • arXiv.org (mainly Physics, but also Maths and CS)
     • Geographical Datasets:
       • The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database;
     • Plan:
       • Extract ZIP codes from authors’ addresses;
       • Map research centres geographically;
     • Questions about the research centres:
       • How productive are they?
       • Are there non-trivial spatial structures at a geographical scale?
     Can Statistical Physics Help?
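The "extract ZIP codes from authors’ addresses" step could be sketched roughly as below. This is only an illustration and not the authors' parser: the regular expression, the names `ZIP_RE`, `extract_zip` and `count_zips`, and the sample address strings are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed pattern: a two-letter state abbreviation followed by a 5-digit ZIP,
# optionally with a ZIP+4 suffix. Real address strings are messier than this.
ZIP_RE = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")


def extract_zip(address):
    """Return the 5-digit ZIP code found in an address string, or None."""
    match = ZIP_RE.search(address)
    return match.group(2) if match else None


def count_zips(addresses):
    """Count how many addresses fall in each ZIP code, skipping non-matches."""
    zips = (extract_zip(a) for a in addresses)
    return Counter(z for z in zips if z is not None)


# Illustrative address strings (not taken from the Citeseer metadata).
addresses = [
    "Carnegie Mellon University, Pittsburgh, PA 15213",
    "MIT, Cambridge, MA 02139-4307",
    "Stanford University, Stanford, CA 94305",
    "School of Computer Science, Pittsburgh, PA 15213",
    "University College London, Gower Street, London WC1E 6BT",
]
print(count_zips(addresses).most_common(2))
```

Requiring a state abbreviation before the digits is a design choice to avoid picking up street numbers or phone fragments; it will miss addresses that omit the state.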

  5. What is Citeseer?
     • Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
     • Now at Penn State: http://citeseer.ist.psu.edu/
     • Archive of computer science research papers harvested from the web and submitted by users;
     • Currently (Dec 2005) contains over 730,000 documents;
     • Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
     • Can search content in PostScript and PDF files.

  6. Data Collecting and Parsing
     • Citeseer metadata:
       • 525,055 computer science research papers;
       • 399,757 (76.14%) of which are unique;
       • 103,172 (25.81%) of the unique papers have one or more US authors;
       • 2,975 different ZIP codes in the unique papers belong to the US conterminous states (48 states, plus the District of Columbia);
     • 5 most productive ZIP codes:
       • Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;
       • Count: 3403 Zip: 02139 MIT, Cambridge, MA;
       • Count: 2954 Zip: 94305 Stanford Univ, CA;
       • Count: 2691 Zip: 94720 Univ California at Berkeley, CA;
       • Count: 2309 Zip: 61801 Univ Illinois at Urbana-Champaign, IL

  7. Q1: How productive are the research centres?

  8. Q2: Non-trivial spatial structures?

  9. The Geography of Citeseer

  10. Cartograms
      • Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)
      • Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

  11. Cartograms
      • Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)
      • Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

  12. Cartograms Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

  13. Cartograms Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

  14. Spatial Point Processes
      • Moments:
        • First moment: ρ, the expected number of points per unit area;
        • Second moment: Ripley’s K function. ρK(r) is the expected number of further points within distance r of a given point;
        • For a Poisson process, K(r) = πr²;
      • But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
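A minimal sketch of how K(r) might be estimated from a set of points, assuming the standard naive estimator (ordered pairs within distance r, divided by n times the estimated density) and no edge correction. The function name `ripley_k_naive` and the toy square of four points are assumptions for illustration, not part of the talk.

```python
import numpy as np


def ripley_k_naive(points, radii, area):
    """Naive estimate of Ripley's K at each radius, without edge correction.

    K_hat(r) = (1 / (n * density)) * #{ordered pairs (i, j), i != j, d_ij <= r}
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    density = n / area  # estimate of the first moment (points per unit area)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    counts = np.array([(dist <= r).sum() for r in radii])
    return counts / (n * density)


# Toy example: the four corners of a unit square, with an assumed domain area of 4.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(ripley_k_naive(square, radii=[0.5, 1.0, 1.5], area=4.0))
```

For a completely random pattern the estimate should track πr² at radii that are small compared with the domain; near the boundary the naive estimator undercounts neighbours, which is why edge corrections are needed.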

  15. The Two-Point Correlation Function
      • The two-point correlation function g(r) describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1-x2|;
      • For a Poisson process, g(r) = 1;
      • Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain D, the point is counted once. If only a proportion p(x,y) of the circle lies within D, the point is counted as 1/p points.
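The edge-correction weight can be illustrated numerically. The sketch below makes two assumptions that are not in the slides: the domain D is an axis-aligned rectangle (rather than the US border polygon), and "proportion of the circle" is read as the fraction of the circumference inside D, approximated by sampling angles. The name `ripley_weight` is hypothetical.

```python
import numpy as np


def ripley_weight(x, y, bounds, n_angles=3600):
    """Approximate the weight 1/p(x, y) for a rectangular domain.

    bounds = (xmin, ymin, xmax, ymax). The circle is centred on x and passes
    through y; p is the fraction of sampled circumference points inside bounds.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    radius = np.linalg.norm(y - x)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    cx = x[0] + radius * np.cos(angles)
    cy = x[1] + radius * np.sin(angles)
    xmin, ymin, xmax, ymax = bounds
    inside = (cx >= xmin) & (cx <= xmax) & (cy >= ymin) & (cy <= ymax)
    p = inside.mean()
    if p == 0.0:
        raise ValueError("no sampled point of the circle lies inside the domain")
    return 1.0 / p


domain = (0.0, 0.0, 10.0, 10.0)
print(ripley_weight((5.0, 5.0), (6.0, 5.0), domain))  # circle fully inside the domain
print(ripley_weight((0.0, 5.0), (1.0, 5.0), domain))  # centre on the left edge
```

A pair near the centre of the domain gets weight 1, while a pair whose circle is half outside the domain is counted roughly twice. For an irregular border the inside test would have to be replaced by a point-in-polygon check.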

  16. Computation of the Two-Point Correlation Function
      • Intersection with border gives more than one polygon
      • Geographical range at which the two-point correlation function can be approximated by a power law
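One simple way to check a power-law approximation over a chosen range is a straight-line fit on log-log axes. The sketch below uses ordinary least squares via `numpy.polyfit`; this is a swapped-in illustrative technique, not necessarily the fitting procedure used in the talk. The function name `fit_power_law` and the synthetic data (exponent 1.2, prefactor 5) are assumptions.

```python
import numpy as np


def fit_power_law(r, g, r_min, r_max):
    """Fit g(r) ~ c * r**(-gamma) on [r_min, r_max] by least squares in log-log space.

    Returns (gamma, c). Points with g <= 0 are ignored because their log is undefined.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = (r >= r_min) & (r <= r_max) & (g > 0)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(g[mask]), 1)
    return -slope, np.exp(intercept)


# Synthetic data that follows an exact power law, for illustration only.
r = np.logspace(0, 3, 50)
g = 5.0 * r ** -1.2
gamma, c = fit_power_law(r, g, r_min=10.0, r_max=500.0)
print(gamma, c)
```

Restricting the fit to [r_min, r_max] mirrors the idea of a limited geographical range over which the power law holds; the choice of that range strongly affects the fitted exponent.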

  17. Two-Point Correlation Function

  18. Speculation: knowledge diffusion?

  19. Speculation: Universality?

  20. To find out more
      • http://www.casa.ucl.ac.uk/
      • Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/ (members: UCL, Leeds, Southampton, Sussex)
      • rui.carvalho@ucl.ac.uk m.batty@ucl.ac.uk

  21. Plot of state R&D expenditure (NSF) vs population

  22. Poisson Point Process
      • We say that a spatial process is completely random if and only if:
        • The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ|A|, where λ is the density of points;
        • For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
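The two defining properties suggest a direct way to simulate a completely random pattern in a rectangle: draw the number of points from a Poisson distribution with mean λ|A|, then place them independently and uniformly. The sketch below assumes a rectangular region; the name `simulate_poisson_process` and the parameter values are illustrative.

```python
import numpy as np


def simulate_poisson_process(density, width, height, rng=None):
    """Simulate a homogeneous Poisson point process on [0, width] x [0, height].

    The point count is Poisson with mean density * width * height; given the
    count, points are independent and uniform over the rectangle.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(density * width * height)
    xs = rng.uniform(0.0, width, size=n)
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])


rng = np.random.default_rng(0)
pattern = simulate_poisson_process(density=2.0, width=10.0, height=5.0, rng=rng)
print(pattern.shape)
```

Patterns generated this way can serve as a null model: statistics measured on the real data can be compared with the same statistics on simulated completely random patterns of equal density.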
