Principles of Information Systems Session 06 Systematisation and Construction
Systematisation and Construction Chapter 5
Overview • Learning objectives • Introduction • Repositories for data • Describing things and collections • Data modelling • Data structures • Data organisation for the real world • Systematics: another way of organising the world • Summary
Learning objectives • Explain why recorded information needs to store information about both things and types of things • Describe some different types of information repository • Describe how information repositories can be built from simple propositions • Describe how data modelling is used to design an information repository
Learning objectives • Explain how measurement and scaling affect information in systems • Explain some of the issues to do with modelling data about space and time • Describe the main features of several different data structures • Explain the principles of classification • Describe four types of classification structures
Introduction • Informatics is primarily concerned with collecting details about the world to use for a variety of practical purposes. • So far we have looked at: • How we identify and name things, • Which then become represented and recorded, • And how language, perception and memory help build maps of the world that order the details that will remain of interest over time.
Introduction • In this chapter we show how the recorded information itself can be organised for specific practical purposes, by lasting arrangements and structures that humans can understand and find useful. • To do this we must formalise what we know of both things in the world, and types of things in the world.
Things… and types of things • A thing: my dog Patch • Types of things: spaniels, beagles, pointers, terriers, setters …
Things and types of things • To organise things, we have ideas of data repositories made up of various data structures • To order types of things, we have principles of classification and systematisation • We need both of these in order to organise and record the world effectively.
Things and types of things • Information about things enables individual occurrences to be stored and used effectively • Information about types of things allows these occurrences to be categorised within larger schemes of understanding.
Recap Recorded information needs to be organised for specific practical purposes, by lasting arrangements and structures that humans can understand and find useful. To do this we need to consider both things, and types of things.
Repositories for data • We can identify three major types of data repository: • Databases • Spreadsheets • Knowledge repositories
Databases – the classic data repository • Databases are a modern incarnation of a system of recording details about the world that goes back as far as recorded history. • Databases have also been a major idea throughout the history of computing. • The word database refers to a particular set of facts, to the physical presence of those facts on a computer, and to their logical location in an organisation’s hierarchy. • The term database system refers to the database itself together with the software application used to manipulate it.
Entities and tables • Data is essentially a set of observations: when we create a database, we begin by working out what entities (that is, things) we are describing. • For example, we might have made some observations about Australian football teams that we wish to record… • We can store this information in the form of a table.
Field, record, table • We represent a set of observations about an entity in a table. • Each row is an individual record, corresponding to one club’s details. • The value in the column for a particular record is called a field or attribute.
Schema • Since each footy club has the same sort of attributes, the types in the record for each one will be congruent: • Column 2 is always a name (of a city); • Column 5 is always a URL (of a website) • Databases aim to have a consistent structure for similar objects with congruent types. • The specification describing this is called a schema. • Deciding on the correct schema for a database involves a process called data modelling
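The idea of a schema can be sketched in a few lines of Python. This is only an illustration, not how a real database system stores its schema: the column names, the `conforms` helper and the club records are all made up for the example, since the slides only say that one column holds a city name and another a website URL.

```python
# Hypothetical schema: column name -> expected Python type.
# Column names are assumptions made for this sketch.
CLUB_SCHEMA = {"club": str, "city": str, "founded": int, "website": str}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if the record has exactly the schema's fields,
    and every field value has the type the schema expects."""
    if record.keys() != schema.keys():
        return False
    return all(isinstance(record[name], expected)
               for name, expected in schema.items())


# Made-up records, for illustration only.
table = [
    {"club": "Example Club A", "city": "Melbourne",
     "founded": 1900, "website": "https://example.org/a"},
    {"club": "Example Club B", "city": "Adelaide",
     "founded": 1910, "website": "https://example.org/b"},
]

# This record is missing a field and has a wrongly typed value.
bad_record = {"club": "Example Club C", "city": "Perth", "founded": "unknown"}

print([conforms(r, CLUB_SCHEMA) for r in table])  # [True, True]
print(conforms(bad_record, CLUB_SCHEMA))          # False
```

Every record that passes the check has congruent types, which is what lets the same query run over all rows of the table.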
Multiple tables • Complex situations may need more than one entity to describe them (and hence more than one table to hold the data) • For example, we might also store information about matches:
Database queries • Once the data is in a suitable structure, queries can be asked of it • The result of the query is another set of records, with the required information.
Database queries “Who won on their home ground, and by how much?”
Who won on their home ground? • Teams that didn’t win on their home ground are not included in the result
Who won on their home ground? And by how much?
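The query "Who won on their home ground, and by how much?" can be sketched with Python's built-in `sqlite3` module. The original tables from the slides are not reproduced here, so the table layout, the team names, the grounds and the scores below are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table layout: one table per entity.
conn.executescript("""
CREATE TABLE clubs (name TEXT PRIMARY KEY, home_ground TEXT);
CREATE TABLE matches (
    home_team TEXT, away_team TEXT, ground TEXT,
    home_score INTEGER, away_score INTEGER
);
""")

# Made-up data, not taken from the slides.
conn.executemany("INSERT INTO clubs VALUES (?, ?)", [
    ("Reds", "North Oval"), ("Blues", "South Park"), ("Greens", "East Field"),
])
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", [
    ("Reds", "Blues", "North Oval", 90, 72),    # Reds win at home by 18
    ("Blues", "Greens", "South Park", 60, 85),  # Greens win, but away
    ("Greens", "Reds", "East Field", 101, 70),  # Greens win at home by 31
])

# Join each match to its winner, then keep only wins at the winner's ground.
rows = conn.execute("""
    SELECT c.name, ABS(m.home_score - m.away_score) AS margin
    FROM matches AS m
    JOIN clubs AS c
      ON c.name = CASE
           WHEN m.home_score > m.away_score THEN m.home_team
           WHEN m.away_score > m.home_score THEN m.away_team
         END
    WHERE c.home_ground = m.ground
    ORDER BY c.name
""").fetchall()

print(rows)  # [('Greens', 31), ('Reds', 18)]
```

The result is itself a set of records: one row per home win, with the winning margin as a computed column. The away win is left out, just as in the slides.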
Spreadsheets • Spreadsheets are arrays of values that are laid out in a grid on a computer screen • They are conceptualised as rows and columns of values • Spreadsheets can store and manipulate the data dynamically in various ways • If a change is made in one part of the spreadsheet, the entire array can be recalculated automatically • Spreadsheets are very flexible and have numerous applications in informatics
Total column is calculated automatically, using formulae • Labels, numbers and formulae can all be entered into the spreadsheet grid • Notice that the spreadsheet structure can be irregular, unlike a database table
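A toy model of automatic recalculation, assuming a spreadsheet is just a mapping from cell names to either plain values or formulae. The cell layout and the `evaluate` helper are invented for this sketch. Real spreadsheet applications track dependencies between cells, whereas this version simply recomputes a formula whenever its value is requested.

```python
def evaluate(sheet: dict, cell: str):
    """Return the value of a cell, computing it if the cell holds a formula.

    A formula is modelled as a function that receives a `get` callback
    for reading other cells."""
    content = sheet[cell]
    if callable(content):
        return content(lambda ref: evaluate(sheet, ref))
    return content


# Labels, numbers and formulae all live in the same grid (made-up data).
sheet = {
    "A1": "Item",  "B1": "Qty", "C1": "Price", "D1": "Total",
    "A2": "Pens",  "B2": 10,    "C2": 1.5,
    "D2": lambda get: get("B2") * get("C2"),
    "A3": "Paper", "B3": 2,     "C3": 4.0,
    "D3": lambda get: get("B3") * get("C3"),
    "D4": lambda get: get("D2") + get("D3"),  # grand total
}

before = evaluate(sheet, "D4")
sheet["B2"] = 20                  # change one input cell
after = evaluate(sheet, "D4")     # totals reflect the change

print(before, after)  # 23.0 38.0
```

Note that nothing forces every row to have the same shape: row 4 holds only a grand total, which is the kind of irregular structure a database table would not allow.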
Spreadsheets • The term spreadsheet refers to the details being stored, the physical file they are stored in, and also the software application • Sometimes the distinction is made between workbooks and worksheets (single sheets within a workbook) • Most spreadsheet applications have features such as charts, different types of built-in formulae and analysis tools, and templates for common tasks
Spreadsheet template for expense claims (above) and completed for a particular trip (below)
Knowledge repositories • A knowledge repository is used when information ‘about’ the data is stored, rather than the data itself • For example, manuals or procedures documents, emails, notes, multimedia, and other ‘unstructured’ information • In this situation the data must be catalogued and indexed in some way, and the catalogue itself stored in an easily searchable form
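One common way to make a catalogue "easily searchable" is an inverted index, which maps each word to the documents that contain it. The sketch below is one possible approach and not necessarily what any particular knowledge repository does. The document names and contents are made up.

```python
from collections import defaultdict

# Made-up unstructured documents.
documents = {
    "manual.txt": "How to reset the printer and replace the toner",
    "email-01.txt": "The printer on level two is out of toner again",
    "notes.txt": "Meeting notes: budget review moved to Friday",
}


def build_index(docs: dict) -> dict:
    """Build the catalogue: each lower-cased word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,:;!?")].add(doc_id)
    return index


def search(index: dict, *words: str) -> list:
    """Return the ids of documents that contain every one of the words."""
    hits = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*hits)) if hits else []


index = build_index(documents)
print(search(index, "printer", "toner"))  # ['email-01.txt', 'manual.txt']
print(search(index, "budget"))            # ['notes.txt']
```

The documents themselves stay unstructured. Only the catalogue has a regular structure that can be queried.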
Structured, unstructured and semi-structured data stores • Herb Simon introduced the idea of three different levels of structure into informatics • (we meet this again in Chapter 6) • We can use it to describe stores of information as: • structured • semi-structured (somewhere in between) • unstructured
• Structured data set – where there is a regular structure shared by all records in the data set, which can be expressed as a schema. A conventional database is an example. • Semi-structured data set – where there is an irregular structure among records in the data set. For example, bibliographic data. • Unstructured data set – where there is no regular structure among the records in the data set, such as in a collection of documents or emails.
Recap Databases, spreadsheets and knowledge repositories are three of the most common types of information repository.
Describing things and collections • When we build information repositories, we are describing collections of things that exist in the world • A collection is a meaningful grouping, and can be represented by a list: • Shopping lists • Lists of football players • Top ten music lists • FBI most wanted lists • Lists of terms, along with coding systems and formal descriptions, are the basis of any systematic organisation of data
Words and terms • To have an informatic system, we need consistent and verifiable descriptions of the world: • we need to be able to use words that have a consistent shared meaning • Any word that is agreed to have a consistent shared meaning is called a term • We can also use numbers, dates, colour and so on to describe the world, • but these must also meet the criteria for terms
A term must have: • A clearly-defined context of usage • This defines the term’s community of users, which in turn provides the framework for understanding that term • A clearly-defined range of permitted usage • This means that once a term has been applied to a concept, the community of users must adhere to that usage • A clearly-defined and unambiguous definition of meaning within that context.
Terms • A term has to stand for something that exists in the world, and also for an agreed meaning. • Thus, terms are parts of lists and also part of coding schemes and descriptions • [Figure: the term “Duck” shown both in a list of animals I saw yesterday and in a bird classification with the categories Birds, Flighted, Flightless, Aquatic, Land]
Combining words or terms into phrases and sentences • A phrase is a group of words that has a single meaning: • My left foot • Green tea • On the floor • Phrases combine into semantically complete clusters of words, or sentences. • Sentences convey ideas and observations in various ways, as statements, questions and orders
Sentences convey ideas and observations in three different ways • Statement: There is a sock on the floor. • Question: Who left the sock on the floor? • Order: Pick that sock up off the floor!
In informatics these are formalised as • PROPOSITION: There is a sock on the floor. • QUERY: Who left the sock on the floor? • COMMAND: Pick that sock up off the floor!
Sentences in informatic systems • Propositions are sentences that describe the world and can be proven true or false. • These are suitable for storing in informatic systems. • Queries are requests given to informatic systems for propositions that match certain criteria. • Commands are instructions given to informatic systems.
The way we build up sentences of meaning in ordinary language (left) corresponds directly to equivalents in informatics (right).
Information systems as lists of propositions • Lists of propositions make up information repositories: • The cat sat on the mat • The dog sat on the mat • Finding information from these repositories involves matching a query to the stored propositions: • Query: What sat on the mat? • Pattern: The ___ sat on the mat • Answer: cat, dog
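Matching a query against stored propositions can be sketched as filling in a blank. The `ask` function is a hypothetical helper written for this illustration, and the third proposition is made up to show a sentence that does not match.

```python
import re

propositions = [
    "The cat sat on the mat",
    "The dog sat on the mat",
    "The bird sat on the fence",  # made up: should not match the query
]


def ask(pattern: str, facts: list) -> list:
    """Match a pattern containing one blank ('___') against each stored
    proposition and return the word that fills the blank."""
    before, after = pattern.split("___")
    regex = re.compile(re.escape(before) + r"(\w+)" + re.escape(after))
    answers = []
    for fact in facts:
        match = regex.fullmatch(fact)
        if match:
            answers.append(match.group(1))
    return answers


print(ask("The ___ sat on the mat", propositions))  # ['cat', 'dog']
```

The query "What sat on the mat?" is turned into the same sentence shape as the stored propositions, and the repository returns whatever fits the gap.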
Recap Observations of things in the world can be formalised as statements, and these statements or propositions are what is collected into various types of information repository.
Data modelling • Much of informatics involves modelling. • Models are simplifications of a perceived reality where some things in the world get described, measured, represented, and put together. • A data model describes the structure that will contain the actual data in the repository.
Data modelling • Notice that although we are making a container before we have the data, we are already aware of the kinds of statements we want to store, and can plan the structure accordingly • We go from a description that is semi-structured or unstructured, pick out the entities and other things of interest, and prepare a structured schema that allows querying.
Steps in data modelling • Investigate the kinds of statements that will be recorded • Identify the terms the statements use • Identify what kinds of questions are going to be asked • Identify the extent and frequency of changes to the statements
Investigate the kinds of statements that are going to be recorded • Identify or elicit statements and propositions about the area of interest. • ‘A car is expensive’ and ‘A car is boxy’ are propositions about a car
Identify the terms the statements use • The things of interest have properties or attributes • In general a subject (S) will have several properties (P1, P2, …): • Car is red, boxy, expensive • Book is red, boxy, expensive • Frisbee is yellow, round, cheap • Jelly is red, shapeless, cheap • Here we see that all our statements have properties in common: • Thing has colour, shape, price
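Going from statements to a shared structure can be sketched as follows. The `Thing` record type and the `parse` helper are names invented for this example. The four statements are the ones from the slide.

```python
from typing import NamedTuple


class Thing(NamedTuple):
    """The common pattern: Thing has colour, shape, price."""
    name: str
    colour: str
    shape: str
    price: str


statements = [
    "Car is red, boxy, expensive",
    "Book is red, boxy, expensive",
    "Frisbee is yellow, round, cheap",
    "Jelly is red, shapeless, cheap",
]


def parse(statement: str) -> Thing:
    """Split 'S is P1, P2, P3' into a subject and its three properties."""
    subject, properties = statement.split(" is ", 1)
    colour, shape, price = properties.split(", ")
    return Thing(subject, colour, shape, price)


things = [parse(s) for s in statements]

print(Thing._fields)  # ('name', 'colour', 'shape', 'price')
print(things[2])
```

Because every statement fits the same pattern, the loose sentences become records with congruent fields, ready to be stored in a table.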
Identify the kinds of questions that will be asked • Matching the pattern of the required query against the stored data provides a check that the information can be retrieved: • Pattern: __ is red, __, __ • Retrieves everything except frisbee
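The pattern check can be tried out directly. In this sketch a blank in the pattern is represented by `ANY`. The `matches` and `query` helpers are hypothetical names, and the records are the four things from the earlier slide.

```python
ANY = None  # stands for a blank ("__") in the query pattern

# (name, colour, shape, price)
things = [
    ("Car", "red", "boxy", "expensive"),
    ("Book", "red", "boxy", "expensive"),
    ("Frisbee", "yellow", "round", "cheap"),
    ("Jelly", "red", "shapeless", "cheap"),
]


def matches(pattern: tuple, record: tuple) -> bool:
    """A record matches if every non-blank slot of the pattern equals
    the corresponding field of the record."""
    return all(p is ANY or p == value for p, value in zip(pattern, record))


def query(pattern: tuple, records: list) -> list:
    """Return the names of all records that match the pattern."""
    return [record[0] for record in records if matches(pattern, record)]


# "__ is red, __, __"
print(query((ANY, "red", ANY, ANY), things))  # ['Car', 'Book', 'Jelly']
```

If a question that users are expected to ask cannot be written as a pattern over the stored fields, that is a sign the model is missing an attribute.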
Identify the extent and frequency of changes to the statements • Some information in your database will change very rarely (states of Australia, capital cities) • Others will change more frequently (mailing addresses) • Your model needs to take into account whether changes to the data are expected, and what you want to do about it, e.g. • Overwrite the old information • Keep a log of old information • Store infrequently changing information separately, so it is easier to keep consistent • The decision depends on the particular system and what it is used for
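Two of the options above, overwriting and keeping a log, can be contrasted in a small sketch. The class names, the customer key and the addresses are all invented for illustration. A real system would normally rely on its database's own facilities for this.

```python
from datetime import date


class OverwriteStore:
    """Keeps only the latest value: the old information is lost."""

    def __init__(self):
        self.current = {}

    def update(self, key, value, when):
        self.current[key] = value


class LoggedStore:
    """Keeps the latest value and a log of what it replaced."""

    def __init__(self):
        self.current = {}
        self.history = []  # entries of (key, old value, date replaced)

    def update(self, key, value, when):
        if key in self.current:
            self.history.append((key, self.current[key], when))
        self.current[key] = value


plain, logged = OverwriteStore(), LoggedStore()
for store in (plain, logged):
    store.update("customer-1", "1 Example St", date(2020, 1, 1))
    store.update("customer-1", "2 Sample Rd", date(2021, 6, 1))

print(plain.current)   # only the new address survives
print(logged.history)  # the old address and the date it was replaced
```

Which design is appropriate depends, as the slide says, on the particular system: a mailing list may only need the current address, while an audit trail needs the history.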