Introduction • Last day we looked at spatial data structures for both raster and vector models. • Today we will look at how this data is managed. • In the past it was organised into files; today we are more likely to use a database management system. • Burrough and McDonnell: 'to make data quickly available to a multitude of users whilst still maintaining its integrity; to protect the data against deletion and corruption; and to facilitate the addition, removal and updating of data as necessary.'
Files And Databases • Before we look at databases, we will look briefly at some of the more common file formats used in the past. • Different proprietary systems each had their own file formats, but over time some became de facto standards. However, they still required internal conversion to each application's local format. • Databases provided more flexibility. • Files are still used today – they are useful for data exchange, and may be more convenient for small single-user projects.
Common File Formats (1) • File formats you may come across (or hear about) include: • DIME (Dual Independent Map Encoding). Data structure developed by the US Bureau of the Census. The data were distributed as GBF (Geographic Base File) files. • TIGER (Topologically Integrated Geographic Encoding and Referencing). More advanced structure introduced by the US Bureau of the Census to replace DIME. Distributed as TIGER/Line files (and other formats). • DLG (Digital Line Graph). Used by the US Geological Survey. Includes topology. • DXF (Drawing Exchange Format). AutoCAD uses DWG files internally, but DXF is used for export. Uses layers, but does not include topology. Traditionally favoured by OSI.
Common File Formats (2) • Arc/Info Coverage. Includes both vector and attribute data (‘arcs’ and ‘info’). The vector data contain topological information. Each coverage is stored in a directory containing several indexed binary files to speed up access. • Shapefile. Introduced with ArcView 2.0 in the early 1990s. Each shapefile is actually a set of at least three files (often more). Shapefiles do not store any topology, but are more space- and time-efficient than coverages. They normally need to be joined to external attribute data.
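To make the "set of at least three files" point concrete, here is a minimal Python sketch based on the published ESRI Shapefile Technical Description: the mandatory members are the .shp (geometry), .shx (index) and .dbf (attributes) files, and every .shp begins with a fixed 100-byte header. The function names are hypothetical, and only a handful of the defined shape type codes are listed.

```python
import struct
from pathlib import Path

# The three mandatory members of a shapefile set; other files (.prj, .sbn, ...) are optional.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")

# A few of the shape type codes defined for the main file header.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def missing_components(base: Path) -> list[str]:
    """Return the mandatory extensions that have no matching file beside `base`."""
    return [ext for ext in MANDATORY_EXTENSIONS if not base.with_suffix(ext).exists()]


def read_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte main file header of a .shp file."""
    if len(data) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])            # big-endian, always 9994
    (length_words,) = struct.unpack(">i", data[24:28])       # file length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian from here on
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    if file_code != 9994 or version != 1000:
        raise ValueError("not a shapefile main file header")
    return {
        "length_bytes": length_words * 2,
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

Note the mixed byte order in the header (big-endian file code and length, little-endian version, shape type and bounding box): any reader has to handle both.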
Databases • Definition: ‘a unified computer-based collection of data, shared by authorised users, with the capability for controlled definition, access, retrieval, manipulation, and presentation of data’. • Interaction is indirect through a DBMS – e.g. using SQL (Structured Query Language) to retrieve information.
Database Considerations • Databases may have multiple users. They therefore need to: • Minimise redundancy • Maintain consistency • Guarantee security • Provide frequent backups • Manage concurrency • Need to decide on: • Distributed / centralised storage • Data independence – i.e. how far to separate the physical storage of the data from the way it is queried
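As an illustration of "maintain consistency", the sketch below uses Python's built-in sqlite3 module (a small single-user DBMS, chosen only because it needs no server) to group two changes into one transaction. The owner, parcel and audit tables are hypothetical. If any statement in the unit fails, the DBMS rolls the whole unit back, so the database is never left half-updated.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small hypothetical cadastre in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off unless asked
    conn.executescript("""
        CREATE TABLE owner  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE parcel (id INTEGER PRIMARY KEY,
                             owner_id INTEGER NOT NULL REFERENCES owner(id));
        CREATE TABLE audit  (parcel_id INTEGER, owner_id INTEGER);
        INSERT INTO owner  VALUES (1, 'Owner A'), (2, 'Owner B');
        INSERT INTO parcel VALUES (10, 1);
    """)
    return conn


def transfer_parcel(conn: sqlite3.Connection, parcel_id: int, new_owner_id: int) -> None:
    """Log and apply an ownership change as one atomic unit."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("INSERT INTO audit (parcel_id, owner_id) VALUES (?, ?)",
                     (parcel_id, new_owner_id))
        # Fails with IntegrityError if new_owner_id is not a known owner,
        # which also undoes the audit row written above.
        conn.execute("UPDATE parcel SET owner_id = ? WHERE id = ?",
                     (new_owner_id, parcel_id))
```

Calling `transfer_parcel(conn, 10, 99)` for a non-existent owner raises `sqlite3.IntegrityError`, and afterwards neither the parcel nor the audit table shows any trace of the failed attempt.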
Levels Of Abstraction • Three levels of abstraction: • Conceptual model. May be formalised using entity relationship modelling, semantic modelling, etc. • Logical (or internal) model. • Physical model. • Conceptual schemas are defined using a data definition language (DDL). The data can be manipulated using a data manipulation language (DML). • SQL provides both a DDL and a DML.
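A short sqlite3 sketch of the DDL/DML distinction: `CREATE TABLE` defines the schema, while `INSERT`, `UPDATE` and `SELECT` manipulate the data held in it. The site table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (schema) of the data.
conn.execute("""
    CREATE TABLE site (
        name     TEXT PRIMARY KEY,
        region   TEXT NOT NULL,
        visitors INTEGER
    )
""")

# DML: add, change and retrieve the data held in that schema.
conn.executemany(
    "INSERT INTO site (name, region, visitors) VALUES (?, ?, ?)",
    [("Site A", "North", 120), ("Site B", "South", 80), ("Site C", "North", 45)],
)
conn.execute("UPDATE site SET visitors = visitors + 5 WHERE name = 'Site C'")
rows = conn.execute(
    "SELECT region, SUM(visitors) FROM site GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 170), ('South', 80)]
```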
Older Logical Models • Originally the three most favoured models were: • Hierarchical • Network • Relational • Relational models became the norm (ca. 95 per cent) • Will look briefly at the other two types. • Hierarchical models support quick searches, but entail a lot of redundancy. • Network models have high maintenance overheads.
Relational Models • Data are organised into relations (i.e. tables). • Each row (or tuple) corresponds to a particular entity – e.g. a person, place or thing. • Each of the columns contains a field or attribute. • Flexibility is introduced because the tables can be joined using a common join field (a primary key in one table matched by a foreign key in another). • Relational databases are managed by an RDBMS. • SQL emerged as the almost universal query language.
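A minimal example of joining two relations on a common field, again using sqlite3 with hypothetical owner and parcel tables: `owner_id` is the primary key of owner and a foreign key in parcel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- owner_id is the primary key of owner ...
    CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT);
    -- ... and a foreign key in parcel, so the two tables can be joined on it.
    CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, area_ha REAL,
                         owner_id INTEGER REFERENCES owner(owner_id));
    INSERT INTO owner  VALUES (1, 'Owner A'), (2, 'Owner B');
    INSERT INTO parcel VALUES (101, 2.5, 1), (102, 4.0, 2), (103, 1.5, 1);
""")

# Join each parcel to its owner, then summarise per owner.
query = """
    SELECT o.name, COUNT(*) AS parcels, SUM(p.area_ha) AS total_ha
    FROM parcel AS p
    JOIN owner AS o ON o.owner_id = p.owner_id
    GROUP BY o.name
    ORDER BY o.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Owner A', 2, 4.0), ('Owner B', 1, 4.0)]
```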
Relational Database Properties • Tables in a relational database have various characteristics: • 1. The tuples in a relation are all distinct from one another; • 2. Each tuple has the same number of attributes (i.e. columns); • 3. The ordering of the tuples in a relation has no significance; • 4. The ordering of the columns in a relation has no significance (although the data values must be in the correct column); • 5. Each cell contains only one data value (as opposed to a set, array or list).
Newer Logical Models • Deductive databases. Use rules to deduce additional facts or relationships from the stored data. • Object-orientated databases (ODBMS). Similar to object-orientated programming languages. Entities contain attributes, which define their state, and methods for doing various things. • Object-relational databases (ORDBMS). Hybrid system. Basically relational, but cells can hold complex objects.
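To show what "use rules to deduce additional facts" means, here is a small Python sketch of naive forward chaining: two rules, written in a Datalog-like notation purely for illustration (not the syntax of any particular deductive DBMS), are applied repeatedly to stored `adjacent` facts until no new `connected` facts appear.

```python
def deduce_connected(adjacent: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Apply these rules until a fixpoint is reached:
        connected(X, Y) :- adjacent(X, Y).
        connected(X, Z) :- connected(X, Y), adjacent(Y, Z).
    """
    connected = set(adjacent)  # first rule: every stored fact is a connection
    while True:
        # Second rule: extend each known connection by one more adjacency.
        new = {(x, z) for (x, y) in connected for (y2, z) in adjacent if y == y2}
        new -= connected
        if not new:  # nothing new can be deduced
            return connected
        connected |= new


stored = {("A", "B"), ("B", "C"), ("C", "D")}
print(sorted(deduce_connected(stored)))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

Only three facts are stored; the other three are derived by the rules rather than held in the database.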
Data Management In GIS • A conventional RDBMS cannot be used directly with spatial data: • Polylines must record x,y coordinates in a fixed sequence, so they could not simply be stored as two columns (x and y) in a relation, because the order of the tuples would then matter (cf. rule 3 above). • Alternatively, treating all the points in a polyline as a single tuple would infringe rules 2 and 4, since lines have differing numbers of vertices and the column order would matter. • The diagram shown earlier is ‘faked’: • The polygon and line tables in the normalised version would normally have varying numbers of columns. The sequence in both tables is also important. • The same is true of the polygon table on the left.
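One common way to stay within these rules is to make the sequence an explicit attribute, so that each vertex becomes its own tuple in a hypothetical vertex table keyed on (line_id, seq). The sqlite3 sketch below shows the cost: a single polyline is spread over many rows and has to be reassembled with `ORDER BY` every time it is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (
        line_id INTEGER NOT NULL,
        seq     INTEGER NOT NULL,  -- explicit position of the vertex along its line
        x       REAL NOT NULL,
        y       REAL NOT NULL,
        PRIMARY KEY (line_id, seq)
    );
""")
# Rows are deliberately inserted out of order: the relation itself carries no ordering.
conn.executemany(
    "INSERT INTO vertex VALUES (?, ?, ?, ?)",
    [(1, 2, 5.0, 0.0), (1, 0, 0.0, 0.0), (1, 1, 2.0, 3.0),
     (2, 0, 1.0, 1.0), (2, 1, 4.0, 4.0)],
)


def polyline(conn: sqlite3.Connection, line_id: int) -> list[tuple[float, float]]:
    """Rebuild one polyline; its vertex order comes only from ORDER BY seq."""
    cur = conn.execute(
        "SELECT x, y FROM vertex WHERE line_id = ? ORDER BY seq", (line_id,)
    )
    return [(x, y) for x, y in cur]


print(polyline(conn, 1))  # [(0.0, 0.0), (2.0, 3.0), (5.0, 0.0)]
```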
Georelational Models • A lot of the data used in GIS is non-spatial attribute data. • This can be managed very efficiently by an RDBMS. • Many GIS traditionally used a hybrid georelational model – i.e. attribute data was managed using an RDBMS, but the locational data was managed using proprietary software.
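The hybrid arrangement can be sketched in a few lines of Python. Here a plain dictionary stands in for the proprietary geometry store (a deliberate simplification), the attributes live in an sqlite3 table, and the two halves are linked only by a shared feature identifier. All names and values are hypothetical.

```python
import sqlite3

# Stand-in for the proprietary geometry store: feature id -> polygon vertices.
geometry_store = {
    1: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    2: [(4.0, 0.0), (6.0, 0.0), (6.0, 3.0), (4.0, 3.0)],
}

# Attribute data held separately in an RDBMS, keyed by the same feature id.
attrs = sqlite3.connect(":memory:")
attrs.executescript("""
    CREATE TABLE landuse (feature_id INTEGER PRIMARY KEY, class TEXT);
    INSERT INTO landuse VALUES (1, 'forest'), (2, 'pasture');
""")


def polygon_area(ring: list[tuple[float, float]]) -> float:
    """Shoelace formula for a simple polygon given as an open list of vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def area_by_class() -> dict[str, float]:
    """Combine both stores through the shared identifier."""
    totals: dict[str, float] = {}
    for feature_id, cls in attrs.execute("SELECT feature_id, class FROM landuse"):
        totals[cls] = totals.get(cls, 0.0) + polygon_area(geometry_store[feature_id])
    return totals


print(area_by_class())  # {'forest': 12.0, 'pasture': 6.0}
```

The RDBMS never sees the coordinates, which is exactly the limitation the object-relational approach later removes.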
Georelational Structures • Burrough and McDonnell discuss four hybrid structures: • Arc/Node - RDBMS. This is the most widely used hybrid structure. Approach used by Arc/Info. • Object - RDBMS. The spatial features are treated as discrete spatial objects. Approach used by ArcView. • Compact Raster - RDBMS. Similar, except that the geographical features are defined in raster mode. Cells in the raster contain a feature identifier which is used to link to a relational database. The Database Workshop option in Idrisi uses this type of structure. • Quadtree - RDBMS. Similar, but uses quadtrees.
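The compact raster - RDBMS idea can be sketched as follows, assuming a tiny hypothetical raster whose cells hold feature identifiers (0 meaning no feature) and an sqlite3 table of soil attributes keyed by those identifiers. This is not how Idrisi implements it, only the linking principle: one raster answers questions about several attributes through the table.

```python
import sqlite3
from collections import Counter

# Raster layer: each cell holds a feature identifier (0 = no feature).
raster = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 0],
]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE soil (feature_id INTEGER PRIMARY KEY, soil_type TEXT, drainage TEXT);
    INSERT INTO soil VALUES (1, 'peat', 'poor'), (2, 'loam', 'good'), (3, 'clay', 'poor');
""")


def cells_by_attribute(column: str) -> dict[str, int]:
    """Count raster cells per value of one attribute by looking up each cell's id."""
    if column not in {"soil_type", "drainage"}:  # whitelist: column names cannot be bound as parameters
        raise ValueError(f"unknown attribute: {column}")
    lookup = dict(db.execute(f"SELECT feature_id, {column} FROM soil"))
    return dict(Counter(lookup[c] for row in raster for c in row if c in lookup))


print(cells_by_attribute("drainage"))  # {'poor': 7, 'good': 4}
```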
Field Data Models • Field data can be modelled in either raster mode or vector mode, but in each case there is only one attribute per layer. • Raster mode GIS facilitates very simple data structures – i.e. one layer per file.
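As an example of the one-layer-per-file idea, the sketch below reads the ESRI ASCII grid layout: a short keyword header (ncols, nrows, xllcorner, yllcorner, cellsize and an optional NODATA_value) followed by exactly one value per cell. The format is chosen for illustration and is not named in the text; the parser handles only this simple case.

```python
def read_ascii_grid(text: str):
    """Parse a single-layer ESRI ASCII grid into (header, rows of cell values)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header: dict[str, float] = {}
    # Header lines start with a keyword; data lines start with a number or '-'.
    while lines and lines[0].split()[0][0].isalpha():
        key, value = lines.pop(0).split()
        header[key.lower()] = float(value)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header.get("nodata_value")
    values = [float(tok) for ln in lines for tok in ln.split()]
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")
    # One attribute per cell; NODATA cells become None.
    grid = [[None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
            for r in range(nrows)]
    return header, grid


sample = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 25
NODATA_value -9999
10 12 -9999
11 13 15
"""
header, grid = read_ascii_grid(sample)
print(grid)  # [[10.0, 12.0, None], [11.0, 13.0, 15.0]]
```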
Object Relational Models • Georelational models do not permit the spatial (locational) data to be managed by the RDBMS. • Similar constraints apply to field data models. • The trend is towards ORDBMS. • Major enterprise DBMS such as IBM DB2 and Informix, Microsoft's SQL Server, and Oracle are ORDBMS. • They all provide spatial extensions, as does the open-source PostgreSQL. • ESRI introduced the geodatabase in ArcGIS 8.
The Geodatabase Model • ArcGIS can still handle traditional file-based data, such as coverages, shapefiles, grid data, etc., but an object-relational model (the geodatabase) was introduced in ArcGIS 8. • A geodatabase is a collection of datasets. • Three main types of dataset: • Feature classes • Rasters • Tables • Related feature classes may be grouped to form a feature dataset.
Benefits • A single geodatabase can contain different types of feature class, rasters, TINs and locators (e.g. addresses). • A personal geodatabase can be saved in a single file.
Feature Classes • In a georelational data model the spatial and attribute details are contained in separate files (e.g. shape and dbf files) with unique identifiers used to join the data. • In a geodatabase a feature class can store the spatial details and the attribute data in the same table. • Shapes are held in a field either as a binary large object (BLOB) or as one of the extended spatial types supported by an enterprise ORDBMS (e.g. Oracle Spatial).
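A minimal sketch of "spatial and attribute data in the same table": a hypothetical wells table in sqlite3 keeps each shape in a BLOB column beside its attributes. The geometry is encoded as OGC Well-Known Binary (WKB) purely for illustration; this is not necessarily the encoding ArcGIS uses inside a geodatabase.

```python
import sqlite3
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order 1, geometry type 1 (Point)."""
    return struct.pack("<BIdd", 1, 1, x, y)  # 1 + 4 + 8 + 8 = 21 bytes


def wkb_to_point(blob: bytes) -> tuple[float, float]:
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled in this sketch")
    return x, y


conn = sqlite3.connect(":memory:")
# Shape and attributes share one row: no separate file or join is needed.
conn.execute("CREATE TABLE wells (objectid INTEGER PRIMARY KEY, shape BLOB, "
             "depth_m REAL, owner TEXT)")
conn.executemany(
    "INSERT INTO wells (shape, depth_m, owner) VALUES (?, ?, ?)",
    [(point_to_wkb(10.0, 20.0), 35.0, "County"),
     (point_to_wkb(12.5, 18.0), 60.0, "Private")],
)


def deep_wells(min_depth: float):
    """Attribute query that returns decoded geometry from the same rows."""
    rows = conn.execute("SELECT objectid, shape, depth_m FROM wells "
                        "WHERE depth_m >= ? ORDER BY objectid", (min_depth,))
    return [(oid, wkb_to_point(shape), depth) for oid, shape, depth in rows]


print(deep_wells(50.0))  # [(2, (12.5, 18.0), 60.0)]
```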
Types Of Geodatabase • There are three types of geodatabase: • 1. Personal geodatabases. Single-user. Managed by the Microsoft Jet Engine. Restricted to a maximum size of 2 GB. No versioning or topology. Windows only. • 2. File geodatabases. Single-user. Stored in a file folder. Unlimited capacity. No versioning or topology. Any platform. • 3. ArcSDE geodatabases. Multiuser. Require ArcSDE to provide an interface with a standard enterprise DBMS. Handles versioning, topology etc. Any platform.
[Figure: layers from the user to ArcGIS (data are treated as objects), the DBMS (data treated as tables) and the physical storage of the data]