Mapping Cyberspace to Realspace
Dipak Gupta (Political Science), Brian Spitzberg (Communication), Ming-Hsiang Tsou (Geography), Li An (Geography), Jean Mark Gawron (Linguistics)
San Diego State University
Funded by the NSF Cyber-Enabled Discovery and Innovation (CDI) program, award #1028177 (4 years, 2010-2014). http://mappingideas.sdsu.edu/
PI: Dr. Ming-Hsiang (Ming) Tsou (Geography), mtsou@mail.sdsu.edu
Co-PI: Dr. Dipak K. Gupta (Political Science), dgupta@mail.sdsu.edu
Co-PI: Dr. Jean Mark Gawron (Linguistics), gawron@mail.sdsu.edu
Co-PI: Dr. Brian Spitzberg (Communication), spitz@mail.sdsu.edu
Senior Personnel: Dr. Li An (Geography), lan@mail.sdsu.edu
Graduate Research Assistants: Ick Hoi (Rick) Kim, Sarah Wandersee, Sri Tulasi Peddola, Kellen Stephens, Jennifer Smith, Amit Nagesh, Vickie Mellos, and Ting-Hwan Lee
Realspace vs. cyberspace
• The spread of ideas in the age of the Internet is a double-edged sword: it can enhance our collective welfare, but it can also produce forces that destabilize the world.
• This project aims to understand how the impact of a single event or idea diffuses throughout the world over time and space.
Ideas and Violent Actions
The world has seen four “waves” of violent action, each energized by a core idea:
• 1880 – 1920: The Anarchist movement
• 1920 – 1960: Anti-colonial movements
• 1960 – 1990: New Left movements
• 1990 – ????: Religious fundamentalism
Steps toward mapping ideas
• Identify exemplars: potentially significant event episodes (e.g., Jihadi terrorism, hate group/militia activities, natural disasters, disease outbreaks).
• Develop a semantic map: identify words and phrases that characterize relevant sites. Computational linguistics becomes critical at this point.
• Collect web data on how these phrases spread over time and space. The data are converted to an Excel file listing the relevant web sites with their geolocation and time.
Steps toward mapping ideas
• Spatio-temporal analyses: statistical analysis and interpretation seek reasons for the particular trajectories along which an idea spreads (i.e., identify factors that account for diffusion “susceptibility” to and “immunity” from particular concepts).
• Pattern analysis: by plotting chronological geographic paths, we test the hypothesis that the spread of ideas is not random. That is, some places are more prone than others to host these sites (and to accept and spread an idea) over time.
GEOSPATIAL MAP VISUALIZATION Spatial Web Automatic Reasoning and Mapping System (SWARMS) flowchart
Web search engine & semantic databases
• Microsoft SQL Server with Web-based geolocating services.
• Access the Bing and Yahoo search engines (search for 1000 results).
Converting URLs to geolocations
• ‘WHOIS’ databases → host registrant street address → latitude/longitude
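The URL-to-location chain on this slide can be sketched as two successive lookups. This is only an illustration: the slides do not name the WHOIS or geocoding services used, so both lookup tables, the function names `host_of` and `geolocate`, and every value in the tables are hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for a WHOIS query and a geocoding service.
# The entries are made-up placeholders, not real project data.
WHOIS_REGISTRANT_ADDRESS = {
    "example.org": "123 Example Street, San Diego, CA",
}
GEOCODED_ADDRESS = {
    "123 Example Street, San Diego, CA": (32.7, -117.1),
}


def host_of(url):
    """Return the lowercase host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def geolocate(url):
    """Map URL -> host -> registrant address -> (lat, lon); None if unknown."""
    address = WHOIS_REGISTRANT_ADDRESS.get(host_of(url))
    if address is None:
        return None
    return GEOCODED_ADDRESS.get(address)
```

One consequence of this chain is that a page is placed at its registrant's address, which need not be where its server or its readers are located.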
Exemplar: “White Power” keyword search in Yahoo (Nov. 5, 2010)
Creating Information Landscapes
• Kernel point density was computed in ArcGIS using a 3 map unit threshold (radius) and a 0.5 map unit output scale. 1 map unit ≈ 50 miles.
• Search result rankings serve as the "popularity", which is the "population" parameter in the kernel density algorithm: Population = (1001 - rank#). A website ranked #1 is assigned 1000 (1001 - 1) as its population parameter.
• Compare two keywords, e.g., Jerry Sanders (San Diego Mayor) and Antonio Villaraigosa (L.A. Mayor).
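The rank weighting and kernel density step can be sketched in plain Python. This is a minimal sketch, not the project's ArcGIS workflow: the quartic kernel is an assumed stand-in for the GIS kernel, and the function names and the grid layout (origin, rows, columns) are illustrative choices. Only the radius of 3 map units, the 0.5 map unit cell size, and the weight 1001 - rank come from the slide.

```python
import math


def population(rank, n_results=1000):
    """Weight from the slide: population = 1001 - rank for 1000 results."""
    return (n_results + 1) - rank


def quartic_kernel(distance, radius):
    """Quartic kernel (an assumed stand-in); zero at and beyond the radius."""
    if distance >= radius:
        return 0.0
    u = distance / radius
    return (3.0 / (math.pi * radius ** 2)) * (1.0 - u * u) ** 2


def kernel_density(points, x_min, y_min, n_cols, n_rows, cell=0.5, radius=3.0):
    """Return a grid (list of rows) of rank-weighted kernel densities.

    points: iterable of (x, y, rank) tuples, with x and y in map units.
    Each cell value is evaluated at the cell center.
    """
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell
            total = 0.0
            for x, y, rank in points:
                d = math.hypot(x - cx, y - cy)
                total += population(rank) * quartic_kernel(d, radius)
            grid[row][col] = total
    return grid
```

With this weighting, the top-ranked page contributes 1000 times as much density as the page ranked 1000th, so the resulting surface is dominated by highly ranked results.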
Creating Information Landscapes
• RED: comparatively higher web page density for “Jerry”
• BLUE: comparatively higher web page density for “Antonio”
• Map Algebra (raster-based): Differential Value = (Keyword-A / Maximum-Kernel-Value-of-Keyword-A) - (Keyword-B / Maximum-Kernel-Value-of-Keyword-B)
• Red hotspots indicate that "Jerry Sanders" is more popular than "Antonio Villaraigosa", whereas blue areas indicate that "Antonio Villaraigosa" is more popular than "Jerry Sanders".
• The differential information landscape map illustrates geospatial fingerprints hidden in the text-based web search results, depending on the context of the selected keywords.
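The map algebra formula can be written out cell by cell. The sketch below assumes both density rasters are stored as equally sized lists of rows with non-negative values; the function names are illustrative and do not come from the project.

```python
def normalize_by_max(grid):
    """Divide every cell by the grid maximum (an all-zero grid stays zero)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[value / peak for value in row] for row in grid]


def differential_landscape(grid_a, grid_b):
    """Cell-by-cell A/max(A) - B/max(B).

    Positive cells (red) favor keyword A, negative cells (blue) favor B.
    """
    norm_a = normalize_by_max(grid_a)
    norm_b = normalize_by_max(grid_b)
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(norm_a, norm_b)
    ]
```

Dividing each raster by its own maximum puts both keywords on a 0 to 1 scale before subtracting, so the result always lies between -1 and 1 regardless of how many pages each keyword returned.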
Spatial Scale Dependency
The following kernel density thresholds were used to detect spatial fingerprints at different map scales:
• 6-8 map units for state-level spatial fingerprints.
• 2-3 map units for county-level spatial fingerprints.
• 0.5-1 map units for city-level spatial fingerprints.
• 0.1-0.2 map units for ZIP code-level spatial fingerprints.
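To give these thresholds a physical sense, they can be converted with the approximate factor stated earlier in the deck (1 map unit is roughly 50 miles). The table and helper below are an illustrative sketch; the names are not from the project, and the mile values are only as accurate as that rough conversion.

```python
MILES_PER_MAP_UNIT = 50  # approximate conversion stated earlier in the deck

# (low, high) kernel radius in map units for each scale, as listed above.
RADIUS_RANGE_BY_SCALE = {
    "state": (6.0, 8.0),
    "county": (2.0, 3.0),
    "city": (0.5, 1.0),
    "zipcode": (0.1, 0.2),
}


def radius_range_miles(scale):
    """Return the (low, high) kernel radius for a scale, in approximate miles."""
    low, high = RADIUS_RANGE_BY_SCALE[scale]
    return (low * MILES_PER_MAP_UNIT, high * MILES_PER_MAP_UNIT)
```

Under that conversion the state-level radius is roughly 300 to 400 miles, while the ZIP code-level radius is roughly 5 to 10 miles.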
Exemplar: Global web page density map for “Osama bin Laden” (English version).
Exemplar: Top 1000 hits for “Osama bin Laden” searched in different languages
• English: “Osama bin Laden”
• Chinese (S): 奥萨马本拉登
• Arabic: أسامة بن لادن
Exemplar: “Osama bin Laden” (Geronimo) minus Background Constant
• Note 1: Hotspots in San Francisco and New York.
• RED: high density of web pages related to “Osama bin Laden” (compared to the average web page density in the U.S.)
• BLUE: low density of web pages related to “Osama bin Laden” (compared to the average web page density in the U.S.)
Exemplar: “Ayman al-Zawahiri” (Al-Qaeda 2nd) minus Background Constant
• Note 1: Hotspots in New York & Washington DC.
• RED: high density of web pages related to Zawahiri (compared to the average web page density in the U.S.).
• BLUE: lower density of web pages related to Zawahiri (compared to the average web page density in the U.S.).
• Different pattern: only New York & D.C. show interest. Most other areas show little interest in this keyword (person).
Exemplar: “Burn Koran”
• Yahoo search (1.30.11): the kernel density of “burn Koran” keyword search results and the 1000 associated websites (red dots) with weighted ranks (radius: 3.0 map units, output grid: 0.5 map units).
• Standardize information landscapes:
• Compare two similar keyword maps.
• Standardized by the population density (U.S. maps).
Exemplar: “Burn Koran” (1.30.11)
The U.S. population density map was used to standardize the popularity density. After standardization, the red hot spots indicate that San Jose, Houston, and the middle of Kansas are areas where the "burn Koran" keywords are popular. The blue areas indicate negative values (less popular).
Why the hotspot in Kansas? It is near the City of Topeka: after the original event happened at a church in Gainesville, FL (green symbol), another church in Topeka, KS claimed that it would continue the “burn Koran” action.
Exemplar: “Burn Koran” Time Comparison
The “burn Koran” map from 1.30.11 was compared with the map from 4.3.11, immediately after the Florida Koran burning incident.
• Hot spots: Saint Louis, Pittsburgh, Philadelphia. NEW trends?
• RED: increased density of web pages on April 3, 2011 (compared to 1.30.11)
• BLUE: decreased density of web pages on April 3, 2011 (compared to 1.30.11)
Exemplar: “Faisal Shahzad” (Times Square bomber)
• Background Constant (300 random keywords)
• Note 1: Hot spots for “Shahzad”: New York & Chicago.
• Note 2: Why Chicago? (link to David Headley?)
Exemplar: “Faisal Shahzad” (Times Square bomber) Querying the link between Chicago & Faisal Shahzad…
Exemplar: “Faisal Shahzad” (Times Square bomber) GLOBAL VIEW: Keyword search on 5.6.11 for “Faisal Shahzad” – Background Constant (300 random keywords)
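The slides do not say how the Background Constant is computed from the 300 random keywords. One plausible reading, shown here purely as an assumption, is that it is the cell-by-cell mean of the density maps of those random keywords, which is then subtracted from the keyword's own map. The function names are illustrative.

```python
def mean_grid(grids):
    """Cell-by-cell mean of equally sized grids (each a list of rows)."""
    n = len(grids)
    return [
        [sum(cells) / n for cells in zip(*rows)]
        for rows in zip(*grids)
    ]


def subtract_background(keyword_grid, background_grids):
    """Keyword map minus the assumed background (mean of random-keyword maps).

    Positive cells show more pages than the background, negative cells fewer.
    """
    background = mean_grid(background_grids)
    return [
        [k - b for k, b in zip(k_row, b_row)]
        for k_row, b_row in zip(keyword_grid, background)
    ]
```

Under this reading, places that host many web pages for any keyword (large cities, hosting hubs) are cancelled out, so the remaining hot spots reflect interest specific to the keyword.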
Summary
Project website: http://mappingideas.sdsu.edu
This innovative, multidisciplinary project has wide application in many fields, from security studies to the spread of epidemics. It can also be used to track the marketing of a new product.