Data, Information & Knowledge 2
Introduction • The previous presentation covered what data is* • This presentation covers where data comes from and the factors we need to take into account when gathering data for processing * Strictly it should be data “are”, but nobody talks like this!
Data Sources Data can be collected either: • DIRECTLY • Gathered from an original source or • INDIRECTLY • Gathered from another source or as a by-product of another operation • In the world of business these would be described as primary and secondary sources of data
Direct (Original) Data Sources • Sale of an item in a supermarket recorded at an EFTPOS terminal • Data from sensors e.g. a weather station • Data collected in a survey e.g. a questionnaire or an interview
Indirect Data Sources 1 • Data collected for one purpose and used for another • A credit card company collects data about your spending in order to bill you each month. However, a secondary use of this data is to build up a “profile” of your spending habits. This data can then be used to send you direct marketing about goods and services that may appeal to you. [Diagram: Credit Card Transaction, with Direct Use of Data (Customer Billing) and Indirect Use of Data (Direct Marketing)]
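The credit card example can be sketched in a few lines of Python. This is only an illustration: the customer names, categories and amounts are invented, and the function names `monthly_bill` and `spending_profile` are hypothetical. The point is that one set of records feeds both the direct use (billing) and the indirect use (profiling).

```python
from collections import defaultdict

# Hypothetical transaction records: (customer, category, amount).
# All values are invented for illustration.
transactions = [
    ("alice", "groceries", 42.50),
    ("alice", "travel", 120.00),
    ("bob", "groceries", 18.25),
    ("alice", "groceries", 30.00),
]


def monthly_bill(records, customer):
    """Direct use: total the customer's spending to produce a bill."""
    return sum(amount for name, _, amount in records if name == customer)


def spending_profile(records, customer):
    """Indirect use: summarise the same records by category."""
    profile = defaultdict(float)
    for name, category, amount in records:
        if name == customer:
            profile[category] += amount
    return dict(profile)


print(monthly_bill(transactions, "alice"))
print(spending_profile(transactions, "alice"))
```

Nothing new is collected for the profile; it is a by-product of data gathered for billing.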
Indirect Data Sources 2 • Purchased data/data passed on • There are a number of ways data can be acquired from third parties and then used for a different purpose • A good example is the electoral roll. Its main use is to record who is eligible to vote. However, marketing companies make extensive use of the roll to target customers.
Coding Data • Before being stored in a computer, information can be coded as data, e.g. • M or F • Mo, Tu, We, Th, Fr, Sa, Su • I, II, IIIM, IIIN, IV, V • S, M, L, XL, XXL • In the picture shown we can see the date code for the tyre. This represents the eighth week of 2006.
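A minimal Python sketch of coding and decoding, using the M/F codes from the slide. The tyre code itself is not reproduced in the text, so the decoder below assumes a four-digit week-then-year layout ("WWYY", e.g. "0806") and years from 2000 onwards. The names `GENDER_CODES`, `encode`, `decode` and `decode_tyre_date` are hypothetical.

```python
# Code table taken from the slide: full value -> stored code.
GENDER_CODES = {"male": "M", "female": "F"}


def encode(value, table):
    """Replace a full value with its short code."""
    return table[value]


def decode(code, table):
    """Look the code up in reverse to recover the full value."""
    for value, stored in table.items():
        if stored == code:
            return value
    raise ValueError(f"unknown code: {code!r}")


def decode_tyre_date(code):
    """Split an assumed 'WWYY' date code into (week, year).

    Assumption: two digits for the week followed by two digits
    for the year, with years counted from 2000.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected four digits, e.g. '0806'")
    week = int(code[:2])
    year = 2000 + int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError("week must be between 1 and 53")
    return week, year


print(encode("female", GENDER_CODES))
print(decode("M", GENDER_CODES))
print(decode_tyre_date("0806"))
```

Under that assumed layout, "0806" decodes to week 8 of 2006, matching the slide's description.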
Benefits of Coding • Less storage space is required • M and F require less storage space than male and female • Faster data input • See above • Validation is easier • With a limited number of codes it is easier to match them against rules to check they are entered correctly
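Two of these benefits can be shown with a short Python sketch. The size codes come from the earlier slide; the sample entries and the name `is_valid_size` are made up for illustration, and character counts are used as a rough stand-in for storage space.

```python
# The permitted codes, taken from the sizes listed earlier.
VALID_SIZES = {"S", "M", "L", "XL", "XXL"}


def is_valid_size(code):
    """Validation rule: the entry must be one of the known codes."""
    return code in VALID_SIZES


# Hypothetical data entry, including two mistakes.
entries = ["M", "XL", "XS", "l"]
rejected = [entry for entry in entries if not is_valid_size(entry)]
print(rejected)

# Rough storage comparison: characters needed for three records.
full_values = ["male", "female", "female"]
coded_values = ["M", "F", "F"]
chars_full = sum(len(value) for value in full_values)
chars_coded = sum(len(value) for value in coded_values)
print(chars_full, chars_coded)
```

With a short, fixed list of codes the validation rule is a single membership check, which is much harder to write for free text.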
Drawbacks of Coding • Precision of data can be lost (coarsened) • In the example all shades of blue are coded as “blue” • The user needs to know the codes used • How many of these top level domains do you know? • au, ch, de, ie, pk, fr, il, lk, es [Diagram: “Data in” and “Stored data”, with colour labels Pink, Blue, Black, Blue]
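Both drawbacks can be sketched in Python. The shade names are invented for illustration, since the slide only says that all shades of blue are stored as “blue”. The country names are the standard assignments for the listed top level domains. `COLOUR_CODES` and `TLD_COUNTRIES` are hypothetical names.

```python
# Hypothetical shades: several distinct inputs share one stored code.
COLOUR_CODES = {
    "navy": "blue",
    "sky blue": "blue",
    "royal blue": "blue",
    "black": "black",
}

data_in = ["navy", "sky blue", "black", "royal blue"]
stored = [COLOUR_CODES[shade] for shade in data_in]
print(stored)

# Coarsening is one-way: "blue" could have come from any of these.
possible_sources = sorted(s for s, c in COLOUR_CODES.items() if c == "blue")
print(possible_sources)

# A code is only useful to someone who has the lookup table.
TLD_COUNTRIES = {
    "au": "Australia", "ch": "Switzerland", "de": "Germany",
    "ie": "Ireland", "pk": "Pakistan", "fr": "France",
    "il": "Israel", "lk": "Sri Lanka", "es": "Spain",
}
print(TLD_COUNTRIES["ch"])
```

Once only the stored list is kept, there is no way to tell which shade each “blue” started out as.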
Coding Value Judgements • Coding value judgements can be a particular problem as they are subject to personal opinion • What do you think of this presentation? • Good? Average? Poor? • One person’s good may be another person’s poor!!! • Value judgements are very difficult to encode without some coarsening (loss of detail) • How would you improve the analysis? What are the time/cost implications?
Quality of the Data Source 1 • GIGO (Garbage In Garbage Out) • If data input is poor the resulting information output will be poor i.e. corrupt, inaccurate etc. • Can you think of any “real life” examples?
Quality of the Data Source 2 Examples of GIGO can include: • Unreliable questionnaires/surveys • e.g. inappropriate samples, badly worded questions etc. • Incorrectly calibrated instruments • e.g. an incorrectly calibrated balance will give incorrect measures of mass • Human error • e.g. transcription errors when entering data • Incomplete data sets • e.g. failing to account for “shrinkage” when measuring supermarket stock
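Two of these examples can be sketched in Python. All of the figures are invented: a balance that reads 5 g too high, and a transcription error where 102 is typed as 1020. The processing step (taking a mean) is correct in every case; only the input changes.

```python
from statistics import mean

# Hypothetical true masses in grams.
true_masses = [100.0, 102.0, 98.0]

# Incorrectly calibrated balance: every reading is 5 g too high.
CALIBRATION_ERROR = 5.0
balance_readings = [mass + CALIBRATION_ERROR for mass in true_masses]

# Human error: 102.0 is mistyped as 1020.0 when the data is entered.
mistyped_readings = [100.0, 1020.0, 98.0]

print(mean(true_masses))        # correct input
print(mean(balance_readings))   # systematic error carried into the output
print(mean(mistyped_readings))  # one bad entry distorts the result
```

The calculation cannot detect or repair either problem, which is why the quality of the source matters as much as the processing.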
Summary/Revision Topics • Data can arise from direct and indirect sources • Information can be coded as data • This has a number of benefits but can lead to coarsening • The source/accuracy of data has a major impact on the quality of information produced i.e. GIGO
Revision Tasks • Use your textbook/Internet sources to make your own notes on: • Sources of Data • Encoding Data • Quality of Data Sources • Try questions 18-24 on this worksheet http://www.teach-ict.com/as_a2/topics/data_info_know/data_worksheet.doc Diagram/example on slide 9 courtesy of teach-ict.com.