Chapter 8: Data Management

Chapter 8: Data Management • This chapter starts on page 366.

Chapter 8: Get Ready • Before starting Chapter 8, we need to review these concepts: • Displaying data • Box-and-whisker plots • Measures of central tendency • Interpolating and extrapolating

8.1: Scatter plots • Statistics Canada collects and organizes data to help Canadians better understand their country: its population, resources, economy, society and culture.

Scatter plots • A scatter plot is a graph of ordered pairs of numeric data. • A scatter plot is used to see relationships between 2 variables or quantities.

The line of best fit • The line of best fit is the line that passes through or near as many points as possible on a scatter plot.
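
A line of best fit can also be calculated rather than drawn by eye. The sketch below uses the least-squares method, which these slides do not cover, so treat it as one possible approach. The function name `least_squares_line` and the data (hours studied and test scores) are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 79]

slope, intercept = least_squares_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

A line drawn by inspection will usually be close to this one, but not identical.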

An outlier • An outlier is a data point that does not fit the pattern of the other data. • An outlier seems to be very different from most of the data in the scatter plot.

Interpolating values • Interpolating data values from a graph means to estimate values between two known pieces of data.

Extrapolating data • Extrapolating data values from a graph means to predict values beyond the collected data.
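
The difference between interpolating and extrapolating can be sketched in a few lines of code. This example assumes the trend is a straight line through two known points. The function names and the plant-height data are hypothetical.

```python
def estimate_on_line(x, point_a, point_b):
    """Estimate y at x using the straight line through two known points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)


def kind_of_estimate(x, known_xs):
    """Inside the range of known x-values is interpolation; outside is extrapolation."""
    if min(known_xs) <= x <= max(known_xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical data: plant height (cm) measured at week 2 and week 6.
week_2 = (2, 10)
week_6 = (6, 22)
known_weeks = [week_2[0], week_6[0]]

for week in (4, 8):
    height = estimate_on_line(week, week_2, week_6)
    print(week, height, kind_of_estimate(week, known_weeks))
```

Extrapolated values are less reliable than interpolated ones because nothing confirms that the trend continues beyond the collected data.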

The independent variable • In a relation, the independent variable is the variable that determines the value of the dependent variable. • For example, in a distance-time relation (speed = distance/time), time is the independent variable because the distance travelled depends on the time. • Usually, the independent variable is x.

The dependent variable • In a relation, the dependent variable is the variable whose value is determined by the independent variable. • For example, in a distance-time relation (speed = distance/time), distance is the dependent variable because its value depends on the time. • Usually, the dependent variable is y.

The types of data • There are 2 types of data: • Continuous data • Discrete data

Continuous data • Continuous data is a set of data where a variable can be any real number. • When the data points are joined together as a line, this represents continuous data. • Examples of continuous data are speed and temperature.

Discrete data • Discrete data is a set of data where a variable can take only separate, distinct values, usually whole numbers. • When the data points are not joined together as a line, this represents discrete data. • Examples of discrete data are the number of pages in a book or the number of students in a class.

Correlations #1 • To better understand and organize a data set, Statistics Canada creates scatter plots in order to determine a correlation between 2 variables.

Correlations #2 • A correlation is the measure of how closely the points on a scatter plot fit a line (i.e. the degree to which 2 quantities show a linear relationship).

Adjectives that describe correlations • The correlation between 2 variables can be: • Strong • Weak • Positive • Negative • Non-existent

Strong correlation • If most of the points are closely grouped around the line, then the correlation is strong.

Weak Correlation • If the points are spread out but show a general trend, then the correlation is weak.

Positive correlation • A positive correlation means that the relationship between the variables is positive. • As the independent variable increases, the dependent variable increases. • The slope of a line showing positive correlation is positive (the line goes up as you move left to right).

Negative correlation • A negative correlation means that the relationship between the variables is negative. • As the independent variable increases, the dependent variable decreases. • The slope of a line showing negative correlation is negative (the line goes down as you move left to right).

Non-existent correlation • If the points are spread out and show no general trend, then the correlation is non-existent.
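
The adjectives above can be linked to a number. One common measure, not named in these slides, is the Pearson correlation coefficient r, which runs from -1 to 1. In the sketch below, the cutoffs 0.8 and 0.3 used to separate "strong", "weak" and "non-existent" are assumptions chosen for illustration, not standard values, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, strong=0.8, weak=0.3):
    """Turn r into words. The cutoffs are illustrative assumptions."""
    if abs(r) < weak:
        return "non-existent"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical data: as x increases, y decreases along a perfect line.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]

r = pearson_r(xs, ys)
print(round(r, 2), describe_correlation(r))
```

The sign of r matches the sign of the slope of the line of best fit, and the closer r is to 1 or -1, the more tightly the points group around that line.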

A relationship • A relationship is a pattern between 2 sets of numbers.

The types of relationships • In Data Management, there are two types of mathematical relationships: • A linear relationship (it forms a straight line) • A non-linear relationship (it does not form a straight line)

8.2: Assess data and make predictions • To assess and analyze data, it is useful to display your data set as a scatter plot. • Then, draw the line of best fit for the data by inspection (by eye).

The goodness of fit of a line • After drawing the line of best fit, it is necessary to judge its goodness of fit. • A correlation grid is a guide to indicate the goodness of fit for a line.

A correlation grid

8.3: Display data • Here are the 6 types of data displays (Grade 9): • A scatter plot • A histogram • A circle graph • A stem-and-leaf plot • A box-and-whisker plot • A bar graph

A bar graph • A bar graph is a diagram that displays data visually with vertical or horizontal bars. • Bar graphs are used to compare categories.

A circle graph • A circle graph is a graph in which a circle representing the whole data is divided into sections. • Circle graphs are used to compare categories to each other and each category to the whole data set.

A stem-and-leaf plot • A stem-and-leaf plot is a way of organizing numerical data by representing part of each number as a stem and the other part as a leaf. • Stem-and-leaf plots organize data based on place value.
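
A stem-and-leaf plot is simple to build in code. This sketch assumes the data are non-negative whole numbers, with the tens as the stem and the ones digit as the leaf. The function name and the test marks are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (tens) with the ones digits as leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test marks.
marks = [67, 72, 75, 81, 68, 79, 90, 84, 72]

for stem, leaves in stem_and_leaf(marks).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each printed row shows one stem followed by its leaves in increasing order, so the longest row shows where the data is clustered.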

A histogram • A histogram is a bar graph with connected (touching) bars that shows data organized into intervals. • Histograms show how many data values fall within each interval.
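
Organizing data into intervals amounts to counting how many values fall in each one. The sketch below assumes equal-width intervals that include their lower bound but not their upper bound. The function name, the interval choices and the marks are hypothetical.

```python
def histogram_counts(values, start, width, bins):
    """Count the values in each interval [start + k*width, start + (k+1)*width)."""
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:  # values outside every interval are ignored
            counts[index] += 1
    return counts


# Hypothetical test marks, grouped into 60-69, 70-79, 80-89 and 90-99.
marks = [67, 72, 75, 81, 68, 79, 90, 84, 72]
counts = histogram_counts(marks, start=60, width=10, bins=4)

for k, count in enumerate(counts):
    low = 60 + 10 * k
    print(f"{low}-{low + 9}: {'#' * count}")
```

Each count is the height of one bar of the histogram.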

A box-and-whisker plot • A box-and-whisker plot is a diagram that shows the median and range of a numeric data set.

The use of box-and-whisker plots • A box-and-whisker plot shows how data is dispersed or spread around the median of a data set.

The vocabulary of box-and-whisker plots • The box of the graph represents the middle 50% of the data (the values between the lower and upper quartiles). • The least and greatest data values are called the minimum and maximum, or the lower extreme and the upper extreme. • The lower quartile is the median value of the lower half of the data. • The upper quartile is the median value of the upper half of the data.
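
The five values needed to draw a box-and-whisker plot can be computed directly from the definitions above. This sketch assumes that when the number of values is odd, the median itself is left out of both halves. Other conventions exist and give slightly different quartiles. The function name and the marks are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return {
        "minimum": data[0],
        "lower quartile": median(lower_half),
        "median": median(data),
        "upper quartile": median(upper_half),
        "maximum": data[-1],
    }


# Hypothetical test marks.
marks = [67, 72, 75, 81, 68, 79, 90, 84, 72]
summary = five_number_summary(marks)

for name, value in summary.items():
    print(name, value)
```

The box runs from the lower quartile to the upper quartile, and the whiskers extend to the minimum and the maximum.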

How to choose the most appropriate way to display your data set • The most appropriate choice of data display depends on the type of data and the information you wish to convey.

Hints for choosing the best way to display your data #1 • Line graphs and scatter plots can be used to analyze trends. • Histograms, box-and-whisker plots and stem-and-leaf plots can be used to analyze the range of data spread, check where data is clustered and find the measures of central tendency.

Hints for choosing the best way to display your data #2 • Bar graphs and circle graphs are used to compare categories.

Measures of central tendency • A measure of central tendency is a value that represents the centre of a set of data.

The types of measure of central tendency • There are 3 measures of central tendency: • The mean • The median • The mode

The mean • The mean is the sum of a set of values divided by the number of values in the set. • Advantage of the mean: Information is given about the sum of the values. • Disadvantage of the mean: Influenced by extreme data values.

The median • The median is the middle value when a set of data is arranged in order from least to greatest. • Advantage of the median: Not greatly influenced by extreme data values. • Disadvantage of the median: No information is given about the sum of the values.

The mode • The mode is the most common value in a set of data. • Advantage of the mode: Easy to locate in frequency tables, graphs, bar graphs or histograms. • Disadvantage of the mode: May change greatly with new data values.
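
The three measures, and the effect of an extreme value on each, can be checked with Python's standard `statistics` module. The data values here are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data set, then the same set with one extreme value added.
values = [4, 5, 5, 6, 7]
with_extreme = values + [45]

for label, data in (("original", values), ("with extreme value", with_extreme)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

Adding the single value 45 raises the mean from 5.4 to 12, while the median only moves from 5 to 5.5 and the mode stays at 5. This is why the median is often preferred when a data set contains extreme values.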
