Week 1 September 1-5. Six Mini-Lectures QMM 510 Fall 2014 . Getting Started ML 1.1. Chapter 0. self-introductions (Moodle mini-biographies) course format, syllabus, projects grading, communication goals: short run vs long run.
Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014
Getting Started ML 1.1 Chapter 0 • self-introductions (Moodle mini-biographies) • course format, syllabus, projects • grading, communication • goals: short run vs long run. You can watch the instructor’s introductory welcome video for MBA students (posted on Moodle).
Getting Started Chapter 0 Textbook David P. Doane and Lori E. Seward, Applied Statistics in Business and Economics, 4th edition (McGraw-Hill, 2013), ISBN 0077931505. This is an omnibus ISBN that includes several components (textbook, Connect access, MegaStat download). All of these components are essential because this is an online course. The Oakland University campus book center (248-370-2404) has this package ISBN in stock (and can ship to you if necessary).
Getting Started Chapter 0 Online Resources Homework, testing, and grading will utilize McGraw-Hill's Connect Plus. The Online Learning Center (OLC) has downloadable data sets for exercises and examples, as well as Big Data Sets, PowerPoint slides, self-graded practice quizzes, and step-by-step guided examples. The instructor will post mini-lectures on Moodle.
Getting Started Chapter 0 Course Organization Unless otherwise indicated, online quizzes, exercises, and written projects are due by midnight on Monday of the week shown in the syllabus. Use e-mail (doane@oakland.edu) or call me (cell 248-766-7605). Note: The instructor is in the Pacific time zone (please use judgment when calling). Post questions on the Moodle forum.
Getting Started Chapter 0 Grading Students will complete several written projects (50% weight, graded by instructor) and several Connect assignments with online feedback (50% weight). Basically, you will submit one assignment (Connect or Project) per week except for weeks 9 and 13. Grades will be posted on Moodle.
Getting Started Chapter 0 Homework using Connect
C-1 Chapters 2-3 (Sep 8)
C-2 Chapter 4 (Sep 15)
C-3 Chapters 5-6 (Sep 29)
C-4 Chapter 7 (Oct 6)
C-5 Chapter 8 (Oct 20)
C-6 Chapters 9-10 (Nov 3)
C-7 Chapter 15 (Nov 10)
C-8 Chapter 12 (Nov 17)
Note: Connect assignments allow three attempts. Online feedback increases with each attempt. Assignments will be auto-submitted on the due date. Your score will be the average of all three attempts, so it pays to try hard on each attempt. You may complete them in advance (they are accessible anytime up to the due date). Be sure to save your work when you exit Connect.
Getting Started Chapter 0 Projects P-1 Describing a sample (Sep 22) P-2 Making forecasts (Oct 13) P-3 Regression modeling (Dec 3) Note: For each project, submit a concise (5-10 page) report (not a spreadsheet or PowerPoint) using Microsoft Word or equivalent that answers the questions posed along with your own comments and interpretations. Strive for effective writing (see textbook Appendix I). Creativity and initiative will be rewarded. In projects done with partners or teams, submit only one report.
Goals: Short Run / Long Run Chapter 0 Short Run Complete weekly assignments successfully Improve Excel and report-writing skills Balance this course against other responsibilities Enjoy learning and want to learn more Long Run Succeed in other MBA classes that use statistics Develop confidence and lose fear of quant methods Use resources to learn on your own (web, textbook)
Resources Available ML 1.2 Chapter 0 • textbook, e-book • OLC (http://www.mhhe.com/doane4e) • Connect (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014) • Moodle (https://moodle.oakland.edu/) • MegaStat (http://www.mhhe.com/megastat) • LearningStats (http://www.mhhe.com/doane4e)
Resources Available Chapter 0 Textbook, e-book • Basically, we will cover the first 14 chapters • Within chapters some topics get less weight • Focus on what you need for assignments Not covered in this class
Resources Available Chapter 0 Pre-paid registration code is required to use Connect Plus • Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014) E-book: In addition to textbook, you have an e-book Premium content: ScreenCam videos on Excel and MegaStat
Resources Available Chapter 0 • Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014) OLC (http://www.mhhe.com/doane4e) The OLC is available to anyone (without premium content) A pre-paid registration code is required to use Connect Plus and premium content Premium content: 5-minute tutorials on Excel and MegaStat
Resources Available Chapter 0 • Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014) A pre-paid registration code is required to use Connect Plus and premium content ScreenCam tutorials on Excel statistics by Professor Doane (4 videos, 5 min each), if you need them
Resources Available Chapter 0 No registration code required to use OLC OLC (http://www.mhhe.com/doane4e) Course: Big Data Sets, LearningStats, etc. Click on a chapter: Quizzes, PowerPoints for that chapter
Resources Available Chapter 0 MegaStat (http://www.mhhe.com/megastat) Click to download: Pre-paid with code (with ISBN 0077931505)
Resources Available Chapter 0 Add-Ins tab: Click on this tab to see MegaStat drop-down menu Drop-down menu: Adds statistical capability to Excel MegaStat (http://www.mhhe.com/megastat)
Resources Available Chapter 0 OLC (http://www.mhhe.com/doane4e) Appendix A F Tables (346.0K) Appendix I Business Reports (1011.0K) Unit 01 Overview of Statistics (5925.0K) Unit 02 Data Collection (815.0K) Unit 03 Data Presentation (9572.0K) Unit 04 Describing Data (3337.0K) Unit 05 Probability (478.0K) Unit 06 Discrete Distributions (550.0K) Unit 07 Continuous Distributions (1409.0K) Unit 08 Estimation (2103.0K) Unit 09 Hypothesis Tests I (1135.0K) Unit 10 Hypothesis Tests II (420.0K) Unit 11 ANOVA (192.0K) Unit 12 Simple Regression (2245.0K) Unit 13 Multiple Regression (2756.0K) Unit 14 Time Series I (1519.0K) Unit 15 Chi Square Tests (627.0K) Unit 16 Nonparametric Tests (1385.0K) Unit 17 Quality Management (1329.0K) Unit 18 Simulation (1460.0K) Files are zipped: Download one chapter at a time LearningStats is a supplement – nice but not part of the textbook (demos, spreadsheets, slides)
Challenges for MBAs ML 1.3 Chapter 1 1.1 What is Statistics? 1.2 Why Study Statistics? 1.3 Uses of Statistics 1.4 Statistical Challenges 1.5 Critical Thinking
What is Statistics? Chapter 1 Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. A statistic is a single measure (number) used to summarize a sample data set; for example, the average height of students in a university.
Big Data, Big Tools Chapter 1 • Data mining, neural tools, simulation, spreadsheet modeling, etc • Costly software • Specialized expertise required • Huge databases (millions of records, complex file structure, sparse or missing data, proprietary concerns, privacy issues)
Uses of Statistics Chapter 1 Descriptive statistics – the collection, organization, presentation, and summary of data. Inferential statistics – generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, making decisions.
Why Study Statistics Chapter 1 Statistical knowledge gives a company a competitive advantage against organizations that cannot understand their internal or external market data. Mastery of basic statistics gives an individual manager a competitive advantage as one works one’s way through the promotion process, or when one moves to a new employer.
The Ideal Data Analyst Chapter 1 • Is technically current (e.g., software-wise). • Communicates well. • Is proactive. • Has a broad outlook. • Is flexible. • Focuses on the main problem. • Meets deadlines. • Knows his/her limitations and is willing to ask for help. • Can deal with imperfect information. • Has professional integrity.
Business Ethics Chapter 1 • Treat customers in a fair and honest manner. • Comply with laws that prohibit discrimination. • Ensure that products and services meet safety regulations. • Stand behind warranties. • Advertise in a factual and informative manner. • Encourage employees to ask questions and voice concerns. • Accurately report information to management.
Upholding Ethical Standards Chapter 1 • Know and follow accepted procedures. • Maintain data integrity. • Carry out accurate calculations. • Report procedures faithfully. • Protect confidential information. • Cite sources. • Acknowledge sources of financial support.
Critical Thinking Chapter 1 Pitfall 1: Big Conclusions from a Small Sample Pitfall 2: Conclusions from Nonrandom Samples Pitfall 3: Conclusions from Rare Events Pitfall 4: Poor Survey Methods Pitfall 5: Assuming a Causal Link Pitfall 6: Generalization from Groups Pitfall 7: Unconscious Bias Pitfall 8: Significance versus Importance
Using Consultants Chapter 1 Hire consultants at the beginning of the project, when your team lacks certain skills or when an unbiased or informed view is needed.
Collecting Data ML 1.4 Chapter 2 Chapter Contents 2.1 Definitions 2.2 Level of Measurement 2.3 Sampling Concepts 2.4 Sampling Methods 2.5 Data Sources 2.6 Surveys
Definitions Chapter 2 • Observation: a single member of a collection of items that we want to study, such as a person, firm, or region. • Variable: a characteristic of the subject or individual, such as an employee’s income or an invoice amount. • Data Set: consists of all the values of all of the variables for all of the observations we have chosen to observe.
Time Series vs Cross-Sectional Data Chapter 2 • Time Series Data • Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days). • Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc. • We are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008).
Time Series vs Cross-Sectional Data Chapter 2 • Cross Sectional Data • Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances). • We are interested in: - variation among observations or - relationships. • We can combine the two data types to get pooled cross-sectional and time series data.
Data Types Chapter 2 Caution: Ambiguity is introduced when continuous data are rounded to whole numbers so they seem discrete (e.g., round your weight from 166.4 to 166). When the range is large, it is usually best to treat integers as continuous data. (Figure 2.1)
Level of Measurement Chapter 2
Level of Measurement Chapter 2 • Nominal Measurement • Nominal data merely identify a category. • Nominal data can be coded numerically (e.g., 1 = Apple, 2 = Toshiba, 3 = Dell, 4 = HP, 5 = Other). • Only mathematical operation allowed is counting (e.g., frequencies) or calculating percent in each category. • Ordinal Measurement • Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
Level of Measurement Chapter 2 • Ordinal Measurement • Distance between codes is not meaningful (e.g., distance between 1 and 2, or between 2 and 3, or between 3 and 4 lacks meaning). • Many useful statistical tests exist for ordinal data, especially in social science, marketing and human resource research. • Interval Measurement • Data can not only be ranked, but also have meaningful intervals between scale points (e.g., difference between 60F and 70F is same as difference between 20F and 30F).
Level of Measurement Chapter 2 • Interval Measurement • Intervals between numbers represent distances, so math operations can be performed (e.g., take the average). • Zero point of interval scales is arbitrary, so ratios are not meaningful (e.g., 60F is not twice as warm as 30F). • Ratio Measurement • Ratio data have all properties of nominal, ordinal, and interval data types and also a meaningful zero. • Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million). • Zero does not have to be observable; it is a reference point.
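The interval-versus-ratio distinction can be checked numerically. The following is a minimal Python sketch, not taken from the textbook: the function name fahrenheit_to_celsius is illustrative. It shows that the ratio of two temperatures changes when the scale changes (so the ratio carries no meaning), while the ratio of two profit figures is the same in any unit.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval data: the slide's example temperatures.
warm_f, cool_f = 60, 30
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

ratio_f = warm_f / cool_f   # 2.0 on the Fahrenheit scale
ratio_c = warm_c / cool_c   # a different (negative) value on the Celsius scale

# Ratio data: profit has a meaningful zero, so the ratio survives a change of units.
profit_ratio_millions = 20 / 10
profit_ratio_dollars = 20_000_000 / 10_000_000

print(ratio_f, ratio_c)
print(profit_ratio_millions, profit_ratio_dollars)
```

Because the zero point of the Fahrenheit scale is arbitrary, the "twice as warm" reading disappears as soon as the same temperatures are expressed in Celsius.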
Likert Scales Chapter 2 • A special case of interval data frequently used in survey research. • The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7). Responses are often coded as numbers (e.g., 1, 2, 3, 4, 5) but technically are ordinal measurements. • Researchers generally treat Likert scales as interval data (no true zero) so they can calculate the mean and standard deviation.
Level of Measurement Chapter 2 Use the following procedure to recognize data types:
Changing Data By Recoding Chapter 2 • In order to simplify data or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely). • For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140). • Or recode your income (a ratio measurement) as ordinal (low, medium, high) by specifying cutoff points. • The above recoded data are ordinal (ranking is preserved), but intervals are unequal and some information is lost.
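The blood-pressure recoding described above can be sketched in a few lines of Python. This is only an illustration: the function name recode_systolic and the sample readings are made up, and the cutoffs are the ones stated on the slide (under 130, 130 to 140, over 140).

```python
def recode_systolic(reading):
    """Recode a systolic reading (ratio data) into an ordinal category."""
    if reading < 130:
        return "normal"
    if reading <= 140:
        return "elevated"
    return "high"

# Hypothetical readings, chosen to hit each category and both cutoffs.
readings = [118, 130, 135, 140, 152]
categories = [recode_systolic(r) for r in readings]

print(categories)
```

The ranking is preserved (normal < elevated < high), but the original magnitudes cannot be recovered from the categories, which is why recoding works only in the downward direction.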
Sample or Census? Chapter 2 • A sample involves looking only at some items selected from the population. • A census is an examination of all items in a defined population. • Why sample instead of census? • Cost, time, budget constraints. • Accuracy may be better in a sample (training, etc). • For example, the United States Census cannot survey every person in the population (mobility, undocumented workers, budget constraints, incomplete responses, etc).
Sampling Concepts Chapter 2
Parameters and Statistics Chapter 2 • Statistics are computed from a sample of n items, chosen from a population of N items. • Statistics can be used as estimates of parameters found in the population. • Specific symbols are used to represent population parameters and sample statistics. Example: If you use the symbol s, the statistician assumes that you are referring to a sample standard deviation, whereas σ would denote a population standard deviation.
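The difference between s and σ shows up in the divisor: the sample formula divides by n - 1, the population formula by n. The sketch below uses Python's standard statistics module; the five data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data: five values with mean 6 and squared deviations summing to 10.
sample = [4, 8, 6, 5, 7]

x_bar = statistics.mean(sample)

# s: sample standard deviation (divisor n - 1), an estimate of sigma.
s = statistics.stdev(sample)

# sigma: what the standard deviation would be if these five values
# were the entire population (divisor n).
sigma = statistics.pstdev(sample)

print(x_bar, s, sigma)
```

For the same data, s is always at least as large as σ, because the smaller divisor compensates for estimating the mean from the sample itself.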
Parameters and Statistics Chapter 2 Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20 or equivalently if n/N ≤ .05).
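As a quick check of the rule of thumb, here is a minimal Python sketch. The function name is_effectively_infinite and the population and sample sizes are illustrative assumptions; the test N ≥ 20n is the rule as stated on the slide.

```python
def is_effectively_infinite(N, n):
    """Return True when the population (size N) is at least 20 times the sample (size n)."""
    return N >= 20 * n

# Hypothetical population and sample sizes.
print(is_effectively_infinite(1000, 50))   # N/n is exactly 20
print(is_effectively_infinite(1000, 51))   # N/n is just under 20
print(is_effectively_infinite(500, 20))    # N/n is 25
```

Comparing N with 20n using integers avoids any floating-point rounding at the boundary case N/n = 20.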
Sampling Methods Chapter 2 Random Sampling
Sampling Methods Chapter 2 Non-random Sampling
Sampling Methods Chapter 2 With or Without Replacement • If we allow duplicates when sampling, then we are sampling with replacement. • Duplicates are unlikely when the sample size n is much smaller than the population size N. • If we do not allow duplicates when sampling, then we are sampling without replacement.
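Both sampling schemes are available in Python's standard random module, which makes the distinction easy to try out. This is a sketch under assumed values: the population of ten numbered items, the sample size of 5, and the seed are all arbitrary choices for illustration.

```python
import random

rng = random.Random(510)          # arbitrary seed so the run is repeatable
population = list(range(1, 11))   # hypothetical population: items numbered 1..10

# With replacement: each draw is made from the full population, so duplicates can occur.
with_replacement = rng.choices(population, k=5)

# Without replacement: an item, once drawn, cannot be drawn again.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Here n = 5 is half of N = 10, so duplicates in the first sample are quite possible; with a much larger population they would become rare.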
Sampling Methods Chapter 2 Computer Methods Computer-based random number generators are pseudo-random generators because even the best algorithms eventually repeat themselves.
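The deterministic nature of a pseudo-random generator can be seen by seeding two generators identically. The sketch below uses Python's random module (whose default generator is the Mersenne Twister); the seed values are arbitrary. It does not demonstrate the eventual repetition itself, since the cycle is far too long to observe, only that the output is fully determined by the starting seed.

```python
import random

# Two generators started from the same (arbitrary) seed.
gen_a = random.Random(2014)
gen_b = random.Random(2014)

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]

# A generator started from a different seed.
gen_c = random.Random(2015)
seq_c = [gen_c.random() for _ in range(5)]

print(seq_a == seq_b)   # identical seeds give identical sequences
print(seq_a == seq_c)
```

In practice this is useful: recording the seed lets someone else reproduce exactly the same random sample.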
Sampling Methods Chapter 2 Row – Column Data Arrays • When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column. • For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4. • This way, each item has an equal chance of being selected.
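The row-and-column selection can be sketched as follows. The 4 x 3 array contents, the seed, and the variable names are hypothetical; the procedure (a random row between 1 and 4 and a random column between 1 and 3) is the one described on the slide.

```python
import random

rng = random.Random(1)  # arbitrary seed so the run is repeatable

# Hypothetical 4 x 3 array: 4 rows, 3 columns, 12 items in all.
array = [
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43],
]

row = rng.randint(1, 4)   # random row number, 1..4 inclusive
col = rng.randint(1, 3)   # random column number, 1..3 inclusive

# Convert the 1-based row and column numbers to Python's 0-based indexes.
item = array[row - 1][col - 1]

print(row, col, item)
```

Since the row and column are chosen independently and uniformly, each of the 12 cells has probability (1/4)(1/3) = 1/12 of being selected.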