PPAL 6200 Research Methods and Info Systems: Intro and Chapter 1
Class Outline • Intro to the Course • Discussion of Software and Technical Issues • Break • Describing Data “Distributions” with Graphs
Introduction to the Course • What can you expect to learn in this class • A Framework for Conducting and Evaluating Empirical Research • A Framework for Conducting and Evaluating Statistical Research • The challenges facing those who must deal with information systems as part of their jobs
What you should not expect to learn in this class • A professional capacity to conduct statistical work. At best you will be prepared to learn more about how to undertake statistical work (if you choose to do so) and to be a “knowledgeable consumer” of research prepared using statistics.
Some Key Concepts to Start Us Off (*source, unless noted: Moore 2009) • Data • Numbers with a context (xxiv). The context, including how the data are collected, can alter results. • Variable • An empirical property that can take on two or more values (Frankfort-Nachmias & Nachmias 1996:50). Don’t get suckered in by small and rapid changes; look at the big picture (xxvii) • Case • An individual, event or other thing for which we have data • Measurement • The assignment of numbers to objects, events or variables according to rules (ibid: 156-157)
Levels of Measurement • Nominal, Ordinal, Interval, Ratio • Validity • Are you measuring what you think you are measuring? • Reliability • Are you measuring it consistently (would repeated measurement give the same result)? • Spuriousness • Is there something else involved? Beware the lurking variable (xxvii) • Statistics • The science of learning from data (xxiv)
The Book Title Says It All… • This is a class in the “basic practice of statistics” with a little bit of practical advice thrown in regarding the management of information systems • Inside the front cover of the book is a wonderful set of flow-through figures that show how one can go about statistical thinking in a disciplined manner, along with three four-step plans to guide your work
Some software and technical issues • For this portion of the class we will quickly review my website, then leave PowerPoint to look at the electronic resources available there to assist you
Please Note: The secure website will look different for you, as I have access to page design resources you will not see • We will now leave PowerPoint to look at these resources
Describing Data Distributions with Graphs • As the introductory sections of the book noted, you really cannot go wrong by beginning your work with a visualization of the individual variables that comprise your data (and, on occasion, by plotting them against another variable such as time). • The distribution of a variable tells you what values it takes and how often it takes them
Ways we can Visualize and Explore Data • Exploratory analysis is not meant to allow us to reach any deep conclusions; it is meant to help us better understand the data set and the relationships within it • We want to look both for an overall pattern (consistencies) and for deviations from it (the most striking of which are called outliers) • Tables • Tables are effective tools for visualizing data, provided that we have neither too many variables nor too many cases • At a certain point we need to graphically depict our data to make it understandable as a snapshot
Which Graph? • The graphic depictions we employ are dependent on: • The type of data we have • Level of Measurement • Whether Stationary or Chronological
Some Common Graphs • Pie Chart (good for showing percentages when a nominal or ordinal variable has only a few categories)
Bar Charts are equally useful for nominal and ordinal variables but have the benefit of allowing more flexibility
Histograms • Histograms can be confusing as they look like Bar Graphs sometimes. In fact you can make them by carefully specifying a Bar Graph. However they are really quite different. • They are meant for use with Interval and Ratio data where there is a lot of variability among cases because there are so many possible values for the data
Therefore we have to “group the data” to a certain extent to allow us to represent it • What a histogram shows is the percentage of cases that have a score within the groups represented by the bars
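The grouping step can be sketched in a few lines of Python. This is only an illustration, not the software used in class: the weights are read off the Weight Data stemplot slide later in this deck, while the function name `group_into_bins`, the starting point of 100 and the bin width of 20 pounds are assumptions chosen for the example.

```python
# Weights in pounds, read off the Weight Data stemplot slide (53 cases).
weights = [
    100, 101, 106, 106, 110, 110, 119,
    120, 120, 123, 124, 125, 127, 128, 130, 130, 133, 135, 139,
    140, 148, 150, 150, 152, 155, 157,
    165, 165, 165, 170, 170, 170, 172, 175, 175,
    180, 180, 180, 180, 185, 185, 185, 186, 187, 192, 194, 195,
    203, 210, 212, 215, 220, 260,
]

def group_into_bins(values, start, width):
    """Count the values in each group [lower, lower + width).

    Hypothetical helper: assumes integer values that are all >= start.
    Empty groups are kept so that gaps in the data stay visible.
    """
    counts = {lower: 0 for lower in range(start, max(values) + 1, width)}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return counts

counts = group_into_bins(weights, start=100, width=20)

# Each printed row plays the role of one histogram bar.
for lower, n in counts.items():
    pct = 100 * n / len(weights)
    print(f"{lower}-{lower + 19}: {'#' * n} {n} cases ({pct:.1f}%)")
```

Each row corresponds to one bar: the group boundaries are our choice, and the bar height is the count (or percentage) of cases that fall inside the group.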
You will notice that this graph looks a bit different from the one in the book. • This is because the scaling that my software used is a bit different from that used by the person who did the examples in the book.
This brings up a good point • Be careful how you manipulate data, as you will see in the next section of the talk. These two graphs portray the same information, but one will give us a more interesting result.
Describing a Distribution • Once we get to developing histograms we can start to evaluate the shape of our data in a number of interesting ways (Shape, Centre, Spread) • What is the shape of the plot? Is it single peaked or multi-peaked? • Where is the peak? Is it at the centre or off-centre (skewed)? When the tail of a distribution heads off to one side unevenly we say it is skewed to that side (this is confusing) • What about outliers? Any unusually high or low scores?
As you can see below: Regrouping the data makes one figure more symmetrical than the other
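To see how regrouping alone can change the picture, here is a small Python sketch that counts the same cases under two different group widths. It is an illustration under assumptions, not the figures from the slide: the data are the weights from the stemplot slide later in this deck, and the helper names `bin_counts` and `show` and the widths of 20 and 40 pounds are made up for the example.

```python
# Weights in pounds, read off the Weight Data stemplot slide (53 cases).
weights = [
    100, 101, 106, 106, 110, 110, 119,
    120, 120, 123, 124, 125, 127, 128, 130, 130, 133, 135, 139,
    140, 148, 150, 150, 152, 155, 157,
    165, 165, 165, 170, 170, 170, 172, 175, 175,
    180, 180, 180, 180, 185, 185, 185, 186, 187, 192, 194, 195,
    203, 210, 212, 215, 220, 260,
]

def bin_counts(values, start, width):
    """Hypothetical helper: count integer values (all >= start) per group."""
    counts = {lower: 0 for lower in range(start, max(values) + 1, width)}
    for v in values:
        counts[start + ((v - start) // width) * width] += 1
    return counts

def show(values, start, width):
    """Print a sideways text histogram for one choice of group width."""
    print(f"Group width {width}:")
    for lower, n in bin_counts(values, start, width).items():
        print(f"  {lower:3d}-{lower + width - 1:3d} | {'*' * n}")

narrow = bin_counts(weights, 100, 20)  # many narrow groups
wide = bin_counts(weights, 100, 40)    # fewer, wider groups

show(weights, 100, 20)
show(weights, 100, 40)
```

The cases are identical in both runs; only the grouping differs, which is why the choice of scaling deserves care before reading anything into the shape.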
A stemplot is not so elegant • Granted it is not so elegant but it does allow us to figure out what is happening inside of those bars….
Stemplots (Stem-and-Leaf Plots) • For quantitative variables • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number) • Write the stems in a vertical column; draw a vertical line to the right of the stems • Write each leaf in the row to the right of its stem; order leaves if desired
Weight Data: Stemplot (Stem & Leaf Plot) • Key: 20|3 means 203 pounds • Stems = 10’s, Leaves = 1’s • For example, 192 goes on stem 19 as leaf 2, 152 on stem 15 as leaf 2, and 135 on stem 13 as leaf 5
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
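The stemplot steps (split each observation into stem and leaf, list the stems in a column, write the ordered leaves beside them) can be sketched in Python as follows. This is a minimal sketch, assuming whole-number weights with stems in tens and leaves in ones as in the key above; the function name `stemplot` is hypothetical.

```python
# Weights in pounds, read off the Weight Data stemplot slide (53 cases).
weights = [
    100, 101, 106, 106, 110, 110, 119,
    120, 120, 123, 124, 125, 127, 128, 130, 130, 133, 135, 139,
    140, 148, 150, 150, 152, 155, 157,
    165, 165, 165, 170, 170, 170, 172, 175, 175,
    180, 180, 180, 180, 185, 185, 185, 186, 187, 192, 194, 195,
    203, 210, 212, 215, 220, 260,
]

def stemplot(values):
    """Return one text row per stem (stem = tens and above, leaf = ones digit)."""
    leaves = {}
    for v in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 203 -> stem 20, leaf 3
        leaves.setdefault(stem, []).append(leaf)
    rows = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:2d} | {digits}".rstrip())
    return rows

rows = stemplot(weights)
print("\n".join(rows))
```

Keeping the empty stems (23 to 25) matters: they are what makes the 260-pound case stand out as a possible outlier.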
Extended Stem-and-Leaf Plots • If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems. In other words, you can have more than one stem with the same base number.
Extended Stem-and-Leaf Plots • Example: if all of the data values were between 150 and 179, then we may choose to use the following stems: 15, 15, 16, 16, 17, 17 • Leaves 0-4 would go on each upper stem (first “15”), and leaves 5-9 would go on each lower stem (second “15”).
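Splitting stems can be sketched the same way. The example below is an assumption-laden illustration: it uses only the weights between 150 and 179 from the Weight Data stemplot, and the function name `split_stemplot` is hypothetical.

```python
# The weights between 150 and 179 pounds from the Weight Data stemplot.
subset = [150, 150, 152, 155, 157, 165, 165, 165,
          170, 170, 170, 172, 175, 175]

def split_stemplot(values):
    """Return two rows per stem: leaves 0-4 on the upper row, 5-9 on the lower."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1          # 0 = upper stem, 1 = lower stem
        leaves.setdefault((stem, half), []).append(leaf)
    rows = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in (0, 1):
            digits = "".join(str(d) for d in leaves.get((stem, half), []))
            rows.append(f"{stem} | {digits}".rstrip())
    return rows

split_rows = split_stemplot(subset)
print("\n".join(split_rows))
# Prints:
# 15 | 002
# 15 | 57
# 16 |
# 16 | 555
# 17 | 0002
# 17 | 55
```

With three stems the plot would have only three rows; splitting gives six and shows more of the shape inside each stem.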
Thinking about these Graphs • When we look at these graphs we have to keep in mind the questions we started with • Shape • Centre (other than time-series) • Outliers