Developing open source GIS: what are the challenges? Gilberto Câmara INPE – Brasil www.terralib.org Institute for Geoinformation – TU Wien – 16 June 2004
The Promise of Open Source • When an OSS project reaches a “critical size” we obtain many benefits • Robustness • “Given enough eyeballs, all bugs are shallow.” • Cooperation • “Somebody finds the problem and somebody else understands it” (Linus Torvalds) • Continuous Improvement • “Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging”
Naïve view of open source projects • Software • Product of an individual or small group (peer-pressure) • Based on a “kernel” with “plausible promise” • Development network • Large number of developers, single repository • Open source products • Viewed as complex, innovative systems (Linux) • Incentives to participate • Operate at an individual level (“self-esteem”) • Wild-west libertarian (“John Waynes of the modern era”)
Idealized model of OS software Networks of committed individuals
The Reality of Open Source • Previous existence of conceptual designs of similar products (the potential for reverse engineering) • Design is the hardest part of software (Fred Brooks) • Problem granularity (the potential for distributed development) • Effective peer-production requires high granularity
Potential for Reverse Engineering • Post-mature • A private company develops a software product. • The product becomes popular and becomes part of the “public commons”. • Others develop a public domain equivalent (e.g., OpenOffice) • Standards-led • Standards consolidate a technology • Allow compatible solutions to compete in the marketplace. • SQL database standard (e.g., MySQL and PostgreSQL). • POSIX standard (guidance to Linux) • OpenGIS specifications (e.g., deegree, MapServer, GeoServer)
Potential for Distributed Development • Parts of a software product • kernel and additional functions that use it (its periphery). • Operating systems (Linux) • well-defined kernel for process control • periphery consisting of programs such as device drivers, applications, compilers and network tools. • Database management systems • strong kernel of highly integrated functions (such as the parser, scheduler, and optimizer) • much smaller periphery.
Potential for Distributed Development • Each type of software product has its own periphery/kernel ratio • This ratio constrains the potential for distributed development • Kernel • Requires a tightly-organized and highly-skilled programming team. • Periphery • Open to more widespread programmers of various skills • Example • Out of more than 400 developers, the top 15 programmers of the Apache web server contribute 88% of added lines [Mockus, 2002].
Four Types of Open Source Software • High reverse engineering, high distribution potential • High reverse engineering, low distribution potential • Low reverse engineering, high distribution potential • Low reverse engineering, low distribution potential
Type 1 – High-High • High reverse engineering, high distribution potential: • Archetypical open source projects • The “Linux” model. • Developers • May have a separate job • Time allocated in agreement with their employer. • community-led projects.
Type 2 – High-Low • High reverse engineering, low distribution potential • Large number of projects • Databases, office automation tools, web services. • Large presence of private companies • products similar to market leaders. • reduced risk in reverse engineering. • main design decisions take place within the institution • Examples • MySQL and PostgreSQL DBMS, • GNOME from Ximian • corporation-led projects.
Type 3 – Low-High • Low reverse engineering, high distribution potential • Stable kernel, innovative periphery • usually there is no commercial counterpart • share a relatively simple software kernel • Origin • academic environments • Examples • GRASS GIS software and the R suite of statistical tools. • collaborative projects
Type 4 – Low-Low • Low reverse engineering, low distribution potential • Innovative kernel, small periphery • Small teams under a public R&D contract • addressing specific requirements • aiming to demonstrate novel scientific work. • High mortality rate • most of them are restricted to the lifetime of a research grant. • innovative products.
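[Figure: quadrant chart. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development. Quadrants: High-Low, High-High, Low-Low, Low-High. Projects plotted: MySQL, OpenOffice, Linux, PostgreSQL, perl, Apache, GRASS, Postgres, R, NCSA browser.]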
Challenges? [Figure: quadrant chart. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development. High-Low: corporate. High-High: community-led. Low-Low: innovative. Low-High: collaborative.]
Lessons from Open Source Projects • “It's fairly clear that one cannot code from the ground up in bazaar style. One can test, debug and improve in bazaar style, but it would be very hard to originate a project in bazaar mode. Linus didn't try it. Your nascent developer community needs to have something runnable and testable to play with” (Eric Raymond)
Moving from the Low-Low Quadrant • Software in the “Low-Low” quadrant • Unsustainable in the long run • Moving from an innovative to a collaborative project • Sharing innovation • Transforming a crude prototype into a modular, well designed system • How do you build innovation into a modular design?
Moving from the Low-Low Quadrant • “Perfection in design is achieved not when there is nothing more to add, but rather when there is nothing more to take away” (Saint-Exupéry) • How do you achieve perfection in information science? • Good scientific foundation • Usually, sound mathematical abstractions • What is the situation in GIS?
Do we have a solid foundation for GIS? • Relational databases • Data: relations (e.g., a table with columns id, name, year) • Algebra: relational algebra (selection, projection, cartesian product, union, difference) • Language: SQL query language (SELECT name FROM faculty WHERE year > 1960) • GIS • Data: spatio-temporal data types • Algebra: spatial algebra (operations on ST types?) • Language: GIS language
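The link between the relational algebra and the SQL query on this slide can be made concrete with a minimal sketch. It assumes a toy `faculty` relation whose rows are invented for illustration; the helper names `select` and `project` are hypothetical and do not come from any library.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical sample relation; the rows are invented for illustration only.
faculty: List[Row] = [
    {"id": 1, "name": "Ana", "year": 1955},
    {"id": 2, "name": "Bruno", "year": 1962},
    {"id": 3, "name": "Clara", "year": 1971},
]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Projection: keep only the named columns of every row.

    Duplicates are kept, as SQL does without DISTINCT.
    """
    return [{column: row[column] for column in columns} for row in rows]


# SELECT name FROM faculty WHERE year > 1960
result = project(select(faculty, lambda row: row["year"] > 1960), ["name"])
```

Each SQL clause corresponds to one algebra operator: WHERE to selection and the column list to projection. The open question on the slide is which operators would play the same role for spatio-temporal types.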
Challenges for geoinformation Source: Gassem Asrar (NASA)
The Road Ahead: Smart Sensors SMART DUST Autonomous sensing and communication in a cubic millimeter Source: Univ Berkeley, SmartDust project
Knowledge gap for spatial data source: John McDonald (MDA)
What’s the Current Status of Open Source GIS? • High-Low products • Standards-based • Spatial DBMS: MySQL, PostgreSQL • OpenGIS + Web: MapServer, deegree • Low-High products • Stable kernel, innovation at the periphery • GRASS and R • What about GIScience challenges? • spatio-temporal data models, geographical ontologies, spatial statistics and spatial econometrics, dynamic modelling and cellular automata, environmental modelling, neural networks for spatial data
TerraLib: Open source GIS library • Data management • All data (spatial + attributes) are stored in the database • Functions • Spatial statistics, Image Processing, Map Algebra • Innovation • Based on state-of-the-art techniques • Same timing as similar commercial products • Web-based co-operative development • http://www.terralib.org
Operational Vision of TerraLib [Figure: architecture diagram. Labels: Geographic Application; API for Spatial Operations; TerraLib; Spatial Operations; DBMS Access; DBMS back ends: Oracle Spatial, MySQL, PostgreSQL.] • TerraLib: MapObjects + ArcSDE + cell spaces + spatio-temporal models
TerraLib applications • Cadastral Mapping • Improving urban management of large Brazilian cities • Public Health • Spatial statistical tools for epidemiology and health services • Social Exclusion • Indicators of social exclusion in inner-city areas • Land-use change modelling • Spatio-temporal models of deforestation in Amazonia • Emergency action planning • Oil refineries and pipelines (Petrobras)
TerraLib Structure [Figure: layer diagram. Labels, top to bottom: Java Interface, COM Interface, OGIS Services, C++ Interface; Functions; kernel (Spatio-Temporal Data Structures, File and DBMS Access, Visualization Controls, I/O Drivers); DBMS, External Files.]
Events: near in space, near in time? [Figure: events plotted on axes x, y and time]
Dynamical Spatial Model • $f(I(t)) \xrightarrow{F} f(I(t+1)) \xrightarrow{F} f(I(t+2)) \rightarrow \cdots \rightarrow f(I(t_n))$ • “A dynamical spatial model is a mathematical representation of a real-world process when a location changes in response to external forces” (Burrough)
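The repeated application of F to the state at each time step can be sketched as a small grid simulation. This is an illustration under stated assumptions, not the model used in the talk: the transition rule `spread` (a cell changes if one of its four edge neighbours has changed) and the function names are hypothetical.

```python
from typing import Callable, List

Grid = List[List[int]]  # 1 = changed cell, 0 = unchanged cell


def spread(grid: Grid) -> Grid:
    """Hypothetical transition rule F: a cell becomes 1 if it is already 1
    or if any of its four edge neighbours is 1."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            active = grid[r][c] == 1 or any(
                0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1
                for nr, nc in neighbours
            )
            new[r][c] = 1 if active else 0
    return new


def simulate(initial: Grid, rule: Callable[[Grid], Grid], steps: int) -> List[Grid]:
    """Apply the rule repeatedly; return the states at t, t+1, ..., t+steps."""
    states = [initial]
    for _ in range(steps):
        states.append(rule(states[-1]))
    return states


start = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
history = simulate(start, spread, 2)
```

Passing the rule as a parameter keeps the time-stepping loop independent of the process being modelled, so a different F can be substituted without touching `simulate`.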
Spatial Simulation [Figure: simulated states S2 and S3 shown alongside reality: Bauru in 1988]
Regression with Spatial Data: Understanding Deforestation in Amazonia
Future Deforestation Scenarios [Figure: hot-spots map for new deforestation. Labeled regions: Terra do Meio, Pará State; South of Amazonas State.]
Modelling anisotropic space Spatial relations in Amazonia are not isotropic!
Designing for Extensibility • Algorithms • basic core of most successful GIS • a large number of them do not depend on a particular implementation of a data structure • they are based on a few fundamental semantic properties of the structure • properties can be, for example, the ability to get from one element of the data structure to the next, and to compare two elements of the data structure. • Spatial analysis algorithms • can be abstracted away from a particular data structure and described only in terms of their properties.
Generic GIS Programming • How to decouple algorithms from data structures? • Idea: Iterators (“intelligent pointers”) • Algorithms are not classes! • “Decide which algorithms you want; parametrize them so they work for a variety of suitable types and data structures” • [Figure: Algorithms, Iterators, Geometries]
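A minimal sketch of this decoupling follows. It is written in Python rather than in TerraLib's C++, and it does not reproduce the TerraLib API: `bounding_box`, `PointSet` and `CellGrid` are hypothetical names. The algorithm relies on a single property of its input, the ability to step from one element to the next.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_box(points: Iterable[Point]) -> Box:
    """Generic algorithm: it only needs to iterate over the points."""
    it = iter(points)
    try:
        x, y = next(it)
    except StopIteration:
        raise ValueError("bounding_box() needs at least one point") from None
    xmin = xmax = x
    ymin = ymax = y
    for x, y in it:
        xmin, xmax = min(xmin, x), max(xmax, x)
        ymin, ymax = min(ymin, y), max(ymax, y)
    return (xmin, ymin, xmax, ymax)


class PointSet:
    """Hypothetical vector structure: an explicit list of coordinates."""

    def __init__(self, coords: Iterable[Point]) -> None:
        self._coords: List[Point] = list(coords)

    def __iter__(self) -> Iterator[Point]:
        return iter(self._coords)


class CellGrid:
    """Hypothetical raster structure: iterating yields each cell centre."""

    def __init__(self, origin: Point, cell_size: float, rows: int, cols: int) -> None:
        self.origin, self.cell_size = origin, cell_size
        self.rows, self.cols = rows, cols

    def __iter__(self) -> Iterator[Point]:
        x0, y0 = self.origin
        for r in range(self.rows):
            for c in range(self.cols):
                yield (x0 + (c + 0.5) * self.cell_size,
                       y0 + (r + 0.5) * self.cell_size)


vector_box = bounding_box(PointSet([(2.0, 1.0), (-1.0, 4.0), (0.5, 0.5)]))
raster_box = bounding_box(CellGrid((0.0, 0.0), 2.0, 2, 3))
```

`bounding_box` is a free function, not a method of either structure, which is the sense of "algorithms are not classes": a new data structure gains the algorithm by providing an iterator, with no change to the algorithm itself.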
Scientific Challenges for Innovation in GIS • How can we design an algebra for ST types? • What are the spatial-temporal data types? • How do we design a language for spatial modelling? • Requires a characterization of measurements • Cognitively meaningful interfaces • Representation of Space • How do we represent anisotropic space? • Extensibility of Models and Algorithms • How do we design for extensibility?
Why am I here today at TU Wien? • Innovation in GISystems • Requires addressing challenges in GIScience • Cooperation with Prof. Andrew Frank • Generic GIS Programming • Semantics of Geographical Measurements • Spatio-Temporal Types and Algebras • Methods for Representation of Anisotropic Space
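Result of Sound Scientific Work [Figure: quadrant chart. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development. Quadrants: High-Low, High-High, Low-Low, Low-High. Projects plotted: MySQL, OpenOffice, Linux, PostgreSQL, perl, Apache, GRASS, Postgres, R, NCSA browser, TerraLib.]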
Conclusions • Open Source software model • The Linux example is not applicable to all situations • Moving from the individual level to the organization level • Geoinformation • Innovative open source GIS software has a large role • Sound research is needed to support innovation • Cooperation in GIScience is fundamental • The problem is enormous and requires combined R&D efforts • There are few R&D groups • Cooperation is the only way to ensure a future for GIScience