Estimation and Uncertainty 12-706 / 19-702 Lecture 2
Announcements / Etc. • Today’s slides posted after class • No class on Friday, Monday (Labor Day) • HW 1 Handed Out • Another TA added - Aweewan • Office Hours: XXXXXX • In CEE alumni lounge (118) • 1 session in non-HW weeks, 2 when HW
Estimation in the Course • We will encounter estimation problems in sections on demand, cost and risks. • We will encounter estimation problems in several case studies. • Projects will likely have estimation problems. • Need to make quick, “back-of-the-envelope” estimates in many cases. • Don’t be afraid to do so!
Problem of Unknown Numbers • If we need a piece of data, we can: • Look it up in a reference source • Collect number through survey/investigation • Guess it ourselves • Get experts to help us guess it • Often only a ‘ballpark’, ‘back of the envelope’ or ‘order of magnitude’ estimate is needed • Situations when actual number is unavailable or where rough estimates are good enough • E.g. 100s, 1000s, … (10^2, 10^3, etc.) • Source: Mosteller handout
Notes about Reference Sources • Some obvious: Statistical Abstract of US • Always check sources and secondary sources of data • Usually found in footnotes – also tells you about assumptions/conditions for using • Sometimes the summarized data is wrong! • Look in multiple sources • Different answers imply something about the data and method – and uncertainty
Estimation gets no respect • The 2 extremes - and the respect thing • Aristotle: • “It is the mark of an instructed mind to rest satisfied with the degree of precision which the nature of the subject permits and not to seek an exactness where only an approximation of the truth is possible.” • Archbishop Ussher of Ireland, 1658 AD: • “God created the world in 4028 BC on the 9th of September at nine o’clock in the morning.” • We consider it somewhere in between
In the absence of “Real Data” • Are there similar or related values that we know or can guess? (proxies) • Mosteller: registered voters and population • Are there ‘rules of thumb’ in the area? • E.g. ‘Rule of 72’ for compound interest • r*t = 72 (r in % per year, t in years to double): investment at 6% doubles in 12 yrs • MEANS construction manual • Set up a ‘model’ to estimate the unknown • Linear, product, etc. functional forms • Divide and conquer
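As a quick check on the ‘Rule of 72’ bullet, the sketch below compares the rule of thumb with the exact doubling time under annual compounding. The function names are illustrative only, and annual compounding is an assumption not stated on the slide.

```python
import math


def doubling_time_rule_of_72(rate_percent: float) -> float:
    """Rule-of-thumb doubling time in years for a rate given in percent."""
    return 72.0 / rate_percent


def doubling_time_exact(rate_percent: float) -> float:
    """Exact doubling time in years, assuming annual compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100.0)


rule = doubling_time_rule_of_72(6)   # the slide's 6% example
exact = doubling_time_exact(6)
print(f"Rule of 72: {rule:.1f} years, exact: {exact:.1f} years")
```

At 6% the two answers agree to within a fraction of a year, which is the kind of accuracy a back-of-the-envelope estimate needs.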
Methods • Similarity – do we have data that can be made applicable to our problem? • Stratification – segment the population into subgroups, estimate each group • Triangulation – create models with different approaches and compare results • Convolution – use probability or weightings (see Selvidge’s table, Mosteller p. 181) • Note – example of a ‘secondary source’!!
Notes on Estimation • Move from abstract to concrete, identifying assumptions • Draw from experience and basic data sources • Use statistical techniques/surveys if needed • Be creative, BUT • Be logical and able to justify • Find answer, then learn from it. • Apply a reasonableness test
Attributes of Good Assumptions • Need to document assumptions in course • Write them out and cite your sources • Have some basis in known facts or experience • Write why you make the specific assumptions • Are unbiased towards the answer • Example: what is inflation rate next year? • Is past inflation a good predictor? • Can I find current inflation? • Should I assume change from current conditions? • We typically use history to guide us
How many TV sets in the US? • Can this be calculated? • Estimation approach #1: Survey/similarity • How many TV sets owned by class? • Scale up by number of people in the US • Should we consider the class a representative sample? Why not?
TV Sets in US – another way • Estimation approach #2 (segmenting): • Work from # households and # TVs per household - may survey for one input • Assume x households in US • Assume ownership segments z_1 … z_5 (i.e. what fraction of households owns 0, owns 1, …, owns 4) • Then estimated number of television sets in US = x*(4*z_5 + 3*z_4 + 2*z_3 + 1*z_2 + 0*z_1)
TV Sets in US – sample • Estimation approach # 2 (segmenting): • work from # households and # tvs per household - may survey for one input • Assume 50,000,000 households in US • Assume 19% have 4, 30% have 3, 35% 2, 15% 1, 1% 0 television sets • Then 50,000,000*(4*.19+3*.3+2*.35+.15) = 125.5 M television sets
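The segmenting arithmetic on this slide can be written out as a small script. This is only a sketch using the slide's own assumed inputs (50 million households and the stated ownership shares); the variable names are illustrative.

```python
# Assumed inputs, taken from the slide (they are guesses, not data).
households = 50_000_000
# Maps number of TV sets owned -> assumed share of households.
ownership_shares = {4: 0.19, 3: 0.30, 2: 0.35, 1: 0.15, 0: 0.01}

# The segments should cover every household exactly once.
assert abs(sum(ownership_shares.values()) - 1.0) < 1e-9

# Weighted average of sets per household, then scale up.
tvs_per_household = sum(n * share for n, share in ownership_shares.items())
total_tvs = households * tvs_per_household

print(f"TVs per household: {tvs_per_household:.2f}")
print(f"Estimated TV sets: {total_tvs / 1e6:.1f} M")
```

Keeping the assumptions at the top and the formula below them mirrors the spreadsheet layout the course asks for.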
TV Sets in US – still another way • Estimation approach #3 – published data • Source: Statistical Abstract of US • Gives many basic statistics such as population, areas, etc. • Done by accountants/economists - hard to find ‘mass of construction materials’ or ‘tons of lead production’. • How close are we?
How well did we do? • Most recent data = 2004 • But ‘recently’ increasing < 2% per year • Ours – 125.5M TVs, StatAb – 268M TVs • % error: (268M – 125.5M)/125.5M ~ 110% • What assumptions are crucial in determining our answer? Were we right? • What other data on this table validate our models? • See ‘SAMPLE ESTIMATION’ linked on web page to see how you are expected to answer these types of questions. • Also see “SAMPLE SPREADSHEET” for a suggested organization in Excel
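The percent-error comparison can be sketched as below. The slide divides by our estimate; dividing by the published value is a common alternative convention, shown here only for contrast and not taken from the slide.

```python
estimate = 125.5e6    # our segmenting estimate of TV sets
published = 268e6     # Statistical Abstract figure quoted on the slide

# Error relative to our estimate (the convention used on the slide).
error_vs_estimate = (published - estimate) / estimate
# Error relative to the published value (alternative convention).
error_vs_published = (published - estimate) / published

print(f"Relative to our estimate: {error_vs_estimate:.0%}")
print(f"Relative to published:    {error_vs_published:.0%}")
```

Either way, the estimate is off by roughly a factor of two, so it is worth stating which denominator is used when reporting an error.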
Notes on Sample Estimation Files • These files give the type and structure of documentation we expect when doing assumption-based analysis. There is a question like it on HW1 - make sure your answer looks like that. • The spreadsheet file suggests a framework for building assumptions into spreadsheets, i.e., placing them all at the top where you can see them. If needed, you can use the cell values as links in your equations. • Note the Excel plug-ins we will use later will want to see assumptions done like this.
Changing Assumptions • Statistical Abstract gave additional info: • Average TVs/HH = 2.4 (ours was 2.5) • Number of households: 100 million (ours 50) • Thus to redo our analysis, we should do a better job at estimating households
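To see how much of the miss comes from the household count alone, the sketch below keeps our 2.51 TVs per household and swaps in the 100 million households from the Statistical Abstract. The variable names are illustrative.

```python
our_tvs_per_household = 2.51   # from our ownership segments
revised_households = 100e6     # Statistical Abstract figure on the slide
published_tvs = 268e6          # Statistical Abstract figure on the earlier slide

# Change only the household assumption and recompute.
revised_estimate = revised_households * our_tvs_per_household
remaining_gap = (published_tvs - revised_estimate) / published_tvs

print(f"Revised estimate: {revised_estimate / 1e6:.0f} M TVs")
print(f"Remaining gap vs published: {remaining_gap:.0%}")
```

Changing that single assumption moves the estimate from about half the published figure to within several percent of it, which is why the household count is the assumption to improve first.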
Significant Figures • We estimated 125,500,000 TVs in US • How accurate is this - nearest 50,000, the nearest 500,000, the nearest 5,000,000 or the nearest 50,000,000? • Should only report estimates to your confidence - perhaps 1 or 2 “significant figures” could be reported here. • Figures are only carried along to document calculations or avoid rounding errors.
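One way to report an estimate at a chosen number of significant figures is sketched below. The helper name `round_to_sig_figs` is hypothetical, not part of any course tool.

```python
import math


def round_to_sig_figs(value: float, sig_figs: int) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Power of ten of the leading digit, e.g. 8 for 125,500,000.
    magnitude = math.floor(math.log10(abs(value)))
    factor = 10 ** (magnitude - sig_figs + 1)
    return round(value / factor) * factor


estimate = 125_500_000
two_sig = round_to_sig_figs(estimate, 2)
one_sig = round_to_sig_figs(estimate, 1)
print(f"{two_sig:,.0f} (2 s.f.), {one_sig:,.0f} (1 s.f.)")
```

Carry the full 125,500,000 through intermediate calculations and round only the number that is reported.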
Notes on References - Check and Double Check Sources (and dates) • Top 3 Google results for “US population”: • 281,421,906 (factmonster.com “2000”) • 302,510,402 (wikipedia.org, census, for July 2007) • 304,981,258 (census pop. clock - live) • Note on secondary sources... • Number of households • 114.3 million (2006, US census website) • US avg personal income: $38,611 • http://www.unm.edu/~bber/econ/us-pci.htm, but source is U.S. Dept. of Commerce, Bureau of Economic Analysis. Released March 26, 2008.
Avoiding Point Estimates • The tradeoff in this kind of work is getting away with a guess • And giving an informed-enough answer that doesn’t sound like a guess! • Really what we should be doing is making ranges of estimates • We will refer to these as lower bound, mean, and upper bound estimates • You might think of lower bound as “5th percentile” and upper as “95th percentile” • So they’re not true lower/upper bounds (which might be zero and infinity).
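A range of estimates can be sketched with a small simulation. Everything numeric here is a hypothetical assumption for illustration: the uniform input ranges (80 to 120 million households, 2 to 3 TVs per household), the sample count, and the random seed are not from the lecture.

```python
import random
import statistics

random.seed(0)          # fixed seed so the sketch is repeatable
NUM_SAMPLES = 10_000    # hypothetical sample count

samples = []
for _ in range(NUM_SAMPLES):
    # Hypothetical input ranges, drawn uniformly.
    households = random.uniform(80e6, 120e6)
    tvs_per_household = random.uniform(2.0, 3.0)
    samples.append(households * tvs_per_household)

# n=20 gives cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(samples, n=20)
lower, upper = cuts[0], cuts[-1]
mean = statistics.fmean(samples)

print(f"5th percentile:  {lower / 1e6:.0f} M")
print(f"mean:            {mean / 1e6:.0f} M")
print(f"95th percentile: {upper / 1e6:.0f} M")
```

The 5th and 95th percentiles sit inside the extreme products (160M and 360M under these assumed ranges), which is the sense in which they are not true lower and upper bounds.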
Uncertainty • Investment planning and benefit/cost analysis are fraught with uncertainties • forecasts of future are highly uncertain • applications often made to preliminary designs • data is often unavailable • Statistics has confidence intervals – we need them, too • We will talk in more detail about uncertainty in a few weeks.
Exercise #2: Estimate Annual Vehicle Miles Travelled (VMT) in the US • Estimate “How many miles per year are passenger automobiles driven in the US?” • Types of models • Similar to TVs: Guess number of cars, segment population into miles driven per year • Find fuel consumption data, guess at fuel economy ratio for passenger vehicles • Other ideas? Let’s try it on the board.
Estimate VMT in the US • Table 1084 of 2006 Stat. Abstract suggests 2003 VMT was 2.7 trillion miles (yes - twice as much as the 1972 figure implied in the Mosteller handout)! • About 200 million cars • About 12,000 miles per car • Note the Dept of Transportation separately specifies “passenger car VMT” as 1.7 trillion miles - does better job of separating trucks • About 16k VMT per household • http://www.bts.gov/publications/national_transportation_statistics/2006/index.html (Table 1-32)
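The simplest product model for VMT, using only the round numbers on this slide, can be sketched as follows. Variable names are illustrative.

```python
cars = 200e6                 # about 200 million cars (slide)
miles_per_car = 12_000       # about 12,000 miles per car per year (slide)
published_total_vmt = 2.7e12 # Stat. Abstract 2003 total VMT (slide)

# Product model: vehicles times average annual mileage.
estimated_vmt = cars * miles_per_car
ratio_to_published = estimated_vmt / published_total_vmt

print(f"Estimated VMT: {estimated_vmt / 1e12:.1f} trillion miles")
print(f"Ratio to published total: {ratio_to_published:.2f}")
```

The product lands within about 10% of the published total, well inside order-of-magnitude accuracy. The published total and the 1.7 trillion passenger-car figure count different vehicle classes, so the comparison depends on which one the question asks for.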
More clever: Cobblers in the US • Cobblers repair shoes • On average, assume 20 min/task • Thus 20 jobs / day ~ 5000/yr • How many jobs are needed overall for US? • I get shoes fixed once every 5 years • About 280M people in US • Thus 280M/5 = 56 M shoes fixed/year • 56M/5000 ~ 11,000 => 10^4 cobblers in US • Actual: Census dept says 5,120 in US
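The cobbler chain of assumptions can be laid out step by step. This is a sketch of the slide's reasoning; the 250 working days per year is inferred from the slide's 5000 jobs per year at 20 jobs per day and is not stated there.

```python
# Supply side: what one cobbler can do (slide assumptions).
jobs_per_day = 20
working_days_per_year = 250   # inferred: 5000 jobs/yr divided by 20 jobs/day
jobs_per_cobbler_per_year = jobs_per_day * working_days_per_year

# Demand side: how many repairs the population needs (slide assumptions).
population = 280e6
years_between_repairs = 5
repairs_per_year = population / years_between_repairs

cobblers = repairs_per_year / jobs_per_cobbler_per_year
census_count = 5_120          # figure quoted on the slide

print(f"Repairs per year: {repairs_per_year / 1e6:.0f} M")
print(f"Estimated cobblers: {cobblers:,.0f}")
print(f"Ratio to census count: {cobblers / census_count:.1f}")
```

The estimate is about twice the quoted census count, so it lands on the right order of magnitude even though every input is a guess.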
A Random Example • Select a random panel of data from the Statistical Abstract of the U.S. (1998) • Something not likely to have changed much • Can you formulate an ‘estimation question’? • Can you estimate the answer? • How close were you to the ‘actual answer’? • Let’s try this ourselves
Form Small Groups • Make groups of 3-4 • Pick one of the problems on the handout and work on it for 5-10 minutes • Finish for HW 1 (group)