Before We Start • I am not here to persuade you about the usefulness or limitations of Neogeography or User Generated Content • I am here to share my views on issues relating to the topic of spatial data quality and neogeography • Disclaimer - In general, my observations derive from my familiarity with mapping, navigation and local search blog.telemapics.com
My Background • PhD in Geography, specializing in Cartography • Attended AutoCarto 1 in 1974 (and gave the keynote in 2008) • Associate Professor of mapping and geography at SUNY Albany (1972–1985) • Associate at Spad Systems • Chief Cartographer, Chief Technologist and VP of BizDev for Rand McNally (1986–1999) • CTO and EVP of Engineering for go2 Systems (YP over cell phones) • Now run a consulting business focused on geospatial, especially local search, mapping and navigation applications
Data Quality and Neogeography Dr. Mike Dobson President TeleMapics LLC mwd@telemapics.com
Spatial Data Quality? • Overall concern regarding the “fitness” of data for a particular use • Accuracy of position • Resolution • Accuracy of attribution • Logical consistency • Completeness • Including spatial coverage • Temporal relevance • Metadata
Spatial Data’s Emerging Popularity • World of spatial data is exploding • Accessibility to spatial data increasing • Availability of spatial data increasing • Today’s online environment provides • Easy-to-use tools for collecting spatial data • Easy-to-use tools for analyzing spatial data • Easy-to-use tools for presenting spatial data
Why Is This of Concern? • The quality of spatial data mediates the success of communicating spatial concepts • Could this explosive growth have an influence on the quality of spatial data?
Why Data Quality Is Key
No Integrity!
Neogeography • Neogeography • “New” geography using non-traditional tools • Neogeographers • Want to communicate/share their interests in geography and are willing to do something about it
NeoGeos • What roles do neogeographers play in the process of communicating spatial data? • Data collectors – database creators • Data analyzers • Data presenters • While all three roles impact or are influenced by “data quality”, today I will focus on neogeographers and data collection/database creation
Spatial Data Quality and Neogeography • In order to help you understand my persuasion on data quality and neogeography, I would like to explore User Generated Content • UGC is one of the primary means that neogeographers use to express their interest in Geography • On this journey we will loop outside of geography and then fall back in through mapping and other uses of spatial data.
User Generated Content? • Content that is produced by users of web sites and digital media • Contrasted with traditional media producers such as broadcasters, production companies, publishing companies and map database companies
So What’s Important About UGC? • Equality of opportunity to publish • Coupled with one of the most significant demographic trends in the last century: • “It’s about me” (e.g. use of YouTube, MySpace, Facebook) • “Especially in respect to the streets, roads and trails I travel, as well as the POIs I frequent and the spatial topics of interest to me”
Social Networking
How Did This Happen? • Technology that allows you to be “connected”, as well as to communicate and collaborate on your own terms • Internet • Cellular telephony • Development of comprehensive spatial databases • Pushing geospatial into the mainstream - Neogeography
How Did This Happen? • Networks provide for • Collective intelligence – the hive mentality or perhaps the Borg • Aggregated knowledge from decentralized sources (Wikipedia – Wikinomics) • Low cost collaboration
UGC Potential Benefits • Linus’s law • Given enough eyeballs, all bugs (spatial errors) are shallow • Contributors exhibit • Self selection • Focus • Self benefit • Numerousness • There should be more interested spatial data contributors than professional map editors • Spatial distribution • UGCers are more widely distributed than professional map editors.
Criticisms Of UGC • Some error situations are too complex to be understood in real time • Usability may be low • May require extensive error checking • User priorities may lead to unreliability • Prejudice in responses
Lake What Road?
Not Enough Contributors - Data Points?
User Priorities - Oooops
Prejudice in Response?
Prejudice in Response
UGC And Spatial Databases
Spatial Database Creation
What’s Being Optimized In The Previous Process? • Spatial data quality • Accuracy of position • Resolution • Accuracy of attribution • Logical consistency • Completeness • Including spatial coverage • Temporal relevance • Metadata
How Optimized? • Data Quality is an integral part of the process • Initially • Data collected according to specifications • Bad data re-collected or placed in the update queue • Ongoing • Every year significant spatial changes are accommodated. • Areas of high change are identified and updated. • Other changes are found by systematically working research teams through the entire coverage over time • The overall assignment is designed to maximize the time value of money, while increasing the integrity of the database.
Harmonization • It is this attempt to actively harmonize all data that distinguishes database building efforts. • Important Issues • Who directs crowdsourced data from an editorial perspective? • Who sets standards for crowdsourced data? • Who quality-controls crowdsourced data? • What external guidance exists in crowdsourced systems?
Three Categories of Spatial Data • Controlled data • OS, Navteq, TeleAtlas, INFOusa • Hybrid (a mix of controlled and uncontrolled data) • Google, Yahoo, MSN, TomTom • Crowdsourced (uncontrolled) • OSM, Flickr, etc.
Issue • It is possible to manage controlled data quality to meet specific requirements • It is possible to manage hybrid data quality to meet specific requirements • But can you manage crowdsourced data quality to meet specific requirements on a reliable basis? • Let’s look at database compilation for some insights
Compilation • Commercial • Training in compilation • Specialization • Staff size limited • Research limited • Sweat of the brow • But salaried sweat of the brow • Wiki • Self selection • Local experience • Staff size potentially unlimited • Research hours potentially unlimited • Avocation
Compare and Contrast • Commercial • What are my coverage goals? • What are my accuracy goals? • How much can I spend on updating? • What size of capable staff can I afford? • How well can I pay them? • How can I otherwise incent them to create the best database possible? • Wiki • How many people will contribute? • How many are capable? • Where are they located? • Does this match areas of weak coverage? • How long will it take to get good results over large coverages? • How to motivate these collaborators over long periods?
What Are The Potential Weaknesses of WIKI? • Common issues • Not enough data gatherers to validate the data • Or a method to redeploy them • Not enough coverage to meet the need (the distribution of the UGCers) • Or a method to redeploy them • Lack of standards • Lack of quality control • But all of these limitations can be accommodated
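One way to make the coverage weakness concrete is to count distinct contributors per area and flag the areas that fall short. The sketch below is illustrative only: the function name `weak_coverage`, the `(contributor, region)` input shape and the threshold of three contributors are assumptions, not part of any wiki mapping project's tooling.

```python
from collections import Counter


def weak_coverage(edits, regions, min_contributors=3):
    """List regions with fewer than `min_contributors` distinct contributors.

    edits:   iterable of (contributor_id, region) pairs (hypothetical shape).
    regions: every region the database is meant to cover.
    """
    # Collapse repeat edits so each contributor counts once per region.
    distinct = {(who, where) for who, where in edits}
    per_region = Counter(where for _, where in distinct)
    # Counter returns 0 for regions nobody has touched.
    return sorted(r for r in regions if per_region[r] < min_contributors)


edits = [("a", "north"), ("b", "north"), ("c", "north"),
         ("a", "north"), ("a", "south")]
print(weak_coverage(edits, ["north", "south", "east"]))
```

The flagged regions are the ones where contributors would need to be recruited or redeployed.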
Getting Around Some UGC Issues
Are Other Types of Spatial Databases Superior? • Even with the benefits of Moolah ($), major navigation databases are • Out of date • Inaccurate • Non-comprehensive • Of variable quality • Too expensive to maintain • Navteq database extension and update costs in 2007 were over $300,000,000
www.refnum.com/osm/gmaps/Haywards Heath
And That’s Why UGC and Neogeographers • Will become an integral part of building spatial databases • Hybrid data collection systems using UGC and controlled data are where geospatial is going • Let’s look
Old Information Sharing
New Information Sharing
What’s The New Process
Social Networking Tools Of Interest in Compilation
Spatial Data Collection • Some UGC will be active • User connects to an app and enters relevant spatial data for updating or extending a spatial database • Some UGC will be passive • Device tracks and reports (anonymously) user paths, builds database by merging path information over time • Passive is particularly useful in building navigation databases
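The passive approach can be sketched as follows. This is a simplified illustration, not how any navigation vendor actually builds its database: the function name `merge_traces`, the grid cell size and the two-trace threshold are all assumptions. It snaps anonymous GPS points to grid cells and keeps only the cells that several independent trips have crossed.

```python
from collections import defaultdict


def merge_traces(traces, cell_size=0.001, min_traces=2):
    """Return grid cells crossed by at least `min_traces` distinct traces.

    traces: iterable of lists of (lat, lon) points, one list per anonymous trip.
    cell_size: grid spacing in degrees (an assumed, illustrative value).
    """
    seen = defaultdict(set)  # cell -> indexes of the traces that touched it
    for trace_id, points in enumerate(traces):
        for lat, lon in points:
            # Snap the point to the nearest grid cell.
            cell = (round(lat / cell_size), round(lon / cell_size))
            seen[cell].add(trace_id)
    # A cell seen by only one trip may be GPS noise or a private driveway.
    return {cell for cell, ids in seen.items() if len(ids) >= min_traces}


trips = [
    [(10.2, 20.1), (11.1, 20.2)],
    [(10.1, 19.9), (14.0, 20.0)],
]
print(merge_traces(trips, cell_size=1.0))
```

Requiring agreement between independent trips is what lets the merged result improve over time: a single noisy trace never reaches the database on its own.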
Relative Cost
Relative Accuracy
Summing Up • Data Collection Systems • Closed – commercial compilation efforts, no UGC • Open – WIKI approaches, no proprietary data • Hybrid – where geospatial is going • Improves spatial data accuracy by combining the best of both approaches.
Raises These Questions • Will the winners be • Established commercial companies that capitalize on UGC to augment their data? • New competitors that commercialize UGC and augment these data to compete with established commercial systems?
PND Data Flow – A Winner
UGC Open Street Data Flow – No Medal
Commercializing UGC
Relative Benefits Of Types Of UGC By Device