Curious examples of statistical thinking. What is the probability that the sun will rise tomorrow morning? If that’s too hard to think about, consider a slightly bent coin that has been flipped K times and has come up heads every time. What is the probability that the next flip will also be heads? This can also be thought of as a business process that has been carried out K times with perfect success.
Laplace asked this question around 1800. Using an argument that today would be called Bayesian, he got this answer: after K successes in K trials, the probability that the next trial is also a success is (K + 1)/(K + 2). If Laplace claimed to have 10,000 years of human observation as his experience base, then K is roughly 3,652,500 days of sunrises, and the number is 3,652,501/3,652,502 — indistinguishable from 1 for any practical purpose.
This rule is called “Laplace’s Law of Succession.” If your firm has hired a new supplier, and if that supplier has been on time with her first 15 orders, this rule says that she’ll be on time with the next order with probability (15 + 1)/(15 + 2) = 16/17, and this is about 94%.
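A minimal sketch of the rule in Python (the supplier figure comes from the example above; the sunrise day count is an assumption using 365.25 days per year):

```python
# Laplace's Law of Succession: after K successes in K trials, the
# estimated probability that the next trial also succeeds is (K + 1) / (K + 2).

def law_of_succession(k: int) -> float:
    """Probability that the next trial succeeds, given k successes in k trials."""
    return (k + 1) / (k + 2)

# Supplier who has been on time with her first 15 orders:
print(law_of_succession(15))            # 16/17, about 0.941

# Sunrise example: roughly 10,000 years of daily observations
# (assuming 365.25 days per year).
k_sunrises = round(10_000 * 365.25)     # about 3,652,500 days
print(law_of_succession(k_sunrises))    # about 0.9999997
```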
We can get some interesting predictions as well from the Copernican Principle. Copernicus argued that there was nothing special about the location of the earth in the cosmos. Yes, he got in a lot of trouble for this.
This gets interesting when applied to events in time. Suppose that something has a beginning time B and an end time E (unknown). At time X, we observe that this thing has been going on for duration X – B and we’d like to make a conjecture about its end time E. The Copernican principle says that there is nothing special about time X within the lifetime interval (B, E).
The statistician will respond by giving a 95% confidence interval. This is an interval which, you think, will enclose the value of E. You’d bet 95-to-5 that E will be in the interval. For this situation, the interval is (X + (X − B)/39, X + 39(X − B)): beyond X, the thing should last at least 1/39 of its observed age, and at most 39 times its observed age.
This works best for cosmological or other large-scale events for which you have no other information. Still, you could try this for business events. Suppose that a firm was founded in 1997 and we’re looking at it in 2007, so it’s been around for 10 years. How long do you think it will last? The 95% confidence interval for its date of demise runs from 2007 + 10/39 to 2007 + 390, that is, about (2007.26, 2397).
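A small sketch of this calculation in Python (the function name and the 1997/2007 dates are just the example from the text; the 1/39 and 39 multipliers come from the 95% interval above):

```python
# Copernican-principle (Gott) 95% interval for the end time E of something
# that began at time B and is observed, still going, at time X.
# "Nothing special about X" means X falls in the middle 95% of (B, E), so
#   (X - B)/39  <  E - X  <  39 * (X - B).

def copernican_interval(begin: float, now: float, confidence: float = 0.95):
    """Return (low, high) bounds on the end time, per Gott's argument."""
    age = now - begin
    tail = (1.0 - confidence) / 2.0         # 0.025 for a 95% interval
    low = now + age * tail / (1.0 - tail)   # it should last at least this long
    high = now + age * (1.0 - tail) / tail  # and at most this long
    return low, high

# Firm founded in 1997, observed in 2007 (10 years old):
print(copernican_interval(1997, 2007))      # roughly (2007.26, 2397.0)
```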
Let’s think also about the leading-digit phenomenon. The leading digit of 246.22 is 2. The leading digit of 1.46 is 1. The leading digit of 0.0063 is 6. 0 is never a leading digit; only 1, 2, 3, 4, 5, 6, 7, 8, 9 can be leading digits. What are the probabilities associated with leading digits? Are they 1/9 each?
The observation which will lead us to the result is this: the probabilities P[ leading digit = k ] for k = 1, 2, …, 9 should be the same for all scales of measurement. (The “same” here refers to the scales of measurement, not to k — the probabilities may differ across digits, but not across units.) The leading digit probabilities for a set of measurements in inches should be exactly the same if the measurements are converted to centimeters… or feet or yards or cubits or furlongs or smoots or anything else.
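Here is a small simulation of that invariance claim (a sketch only: the synthetic data, spread log-uniformly over five orders of magnitude, and the 2.54 inches-to-centimeters factor are illustrative assumptions):

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First nonzero digit of a positive number."""
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

random.seed(0)
# Synthetic measurements spread log-uniformly over five orders of magnitude.
inches = [10 ** random.uniform(0, 5) for _ in range(100_000)]
cm = [2.54 * x for x in inches]          # the same data in different units

for label, data in [("inches", inches), ("cm", cm)]:
    counts = Counter(leading_digit(x) for x in data)
    freqs = {d: round(counts[d] / len(data), 3) for d in range(1, 10)}
    print(label, freqs)
# The two frequency tables come out essentially identical.
```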
The consequence of this observation is that the mantissas (fractional parts) of the base-10 logarithms must be uniformly distributed on [0, 1). Now restrict consideration just to numbers between 1 and 10. P[ leading digit = 1 ] = P[ 1 ≤ number < 2 ] = P[ 0 ≤ mantissa < log₁₀ 2 ] = P[ 0 ≤ mantissa < 0.3010 ] = 0.3010. In a similar style, P[ leading digit = 2 ] = P[ 2 ≤ number < 3 ] = P[ log₁₀ 2 ≤ mantissa < log₁₀ 3 ] = P[ 0.3010 ≤ mantissa < 0.4771 ] = 0.1761.
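Carrying the same computation through all nine digits gives the full set of leading-digit probabilities; a quick check in Python:

```python
import math

# P[ leading digit = d ] = log10(d + 1) - log10(d) = log10(1 + 1/d)
for d in range(1, 10):
    print(d, f"{math.log10(1 + 1 / d):.4f}")
# 1 0.3010
# 2 0.1761
# 3 0.1249
# ...
# 9 0.0458
```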
So about 30% of all leading digits will be “1” and about 18% will be “2.” This phenomenon is known as Benford’s law. It has been known (sort of) since the 1930s. There are claims that this has been used in auditing, as forgers seem not to know about Benford’s law. Side note: is this the last legitimate use of base-10 logarithms?