This is my summary of the key ideas of How to Lie with Statistics written by Darrell Huff.
The book has a lot of real-world examples from government, business, advertising, etc. I left out many of them because they were difficult to summarize without losing the gist.
The problem, as with anything based on sampling, is how to read it without learning too much that is not necessarily so.
A sampling study is no better than the sample it’s based on.
A representative sample is one from which every source of bias has been removed.
invisible sources of bias.
Does every member of the population have an equal chance to be in the sample?
A purely random sample is the only kind that can be examined with entire confidence through statistical theory.
A purely random sample is difficult and expensive to obtain. A more economical substitute is called stratified random sampling.
To get a stratified sample, you divide your universe into several groups in proportion to their known prevalence.
stratified sampling: How do you know that your information about the groups’ proportion is correct?
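Here is a minimal sketch of the idea, assuming a made-up universe of 700 renters and 300 owners. The function name `stratified_sample`, the group names, and the sizes are all my own illustration, not something from the book. Each group gets a quota in proportion to its share of the universe, and members are then drawn at random within the group.

```python
import random

def stratified_sample(groups, total, seed=0):
    """Draw from each group in proportion to its share of the universe."""
    rng = random.Random(seed)
    universe_size = sum(len(members) for members in groups.values())
    sample = {}
    for name, members in groups.items():
        # Quota proportional to the group's known prevalence.
        # Rounding means quotas may not always add up exactly to `total`.
        quota = round(total * len(members) / universe_size)
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical universe: 700 renters and 300 owners.
groups = {
    "renters": [f"renter-{i}" for i in range(700)],
    "owners": [f"owner-{i}" for i in range(300)],
}
sample = stratified_sample(groups, total=50)
quotas = {name: len(chosen) for name, chosen in sample.items()}
print(quotas)  # {'renters': 35, 'owners': 15}
```

The quotas are only as good as the 700/300 split fed into the function, which is exactly the weakness raised above.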
When you are told that something is an average you still don't know very much about it unless you can find out which of the common kinds of average it is: mean, median, or mode.
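To see how far apart the common kinds of average can be, here is a small sketch using Python's `statistics` module. The income figures are invented for illustration: six modest earners and one very large one.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes: mostly modest, with one very large earner.
incomes = [20_000, 20_000, 20_000, 25_000, 30_000, 35_000, 200_000]

average_mean = mean(incomes)      # arithmetic mean, pulled up by the top earner
average_median = median(incomes)  # middle value: half earn more, half earn less
average_mode = mode(incomes)      # most common value

print(average_mean, average_median, average_mode)  # 50000 25000 20000
```

All three numbers can honestly be called "the average income" of this group, which is why the unqualified word tells you so little.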
The United States Steel Corporation once said that its employees' average weekly earnings went up 107% from 1940 to 1948. So they did, but some of the punch goes out of the magnificent increase when you note that the 1940 figure includes a much larger number of partially employed people. If you work half-time one year and full-time the next, your earnings will double, but that doesn't indicate anything at all about your wage rate.
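The half-time versus full-time point is simple arithmetic. In this sketch the hourly rate and the weekly hours are made-up numbers; the only thing that matters is that the rate is identical in both years.

```python
# Hypothetical figures: same hourly rate, different hours worked.
hourly_rate = 1.50
hours_half_time = 20
hours_full_time = 40

earnings_before = hourly_rate * hours_half_time
earnings_after = hourly_rate * hours_full_time

# Weekly earnings double even though the wage rate did not move.
earnings_increase_pct = (earnings_after - earnings_before) / earnings_before * 100
rate_increase_pct = (hourly_rate - hourly_rate) / hourly_rate * 100

print(earnings_increase_pct, rate_increase_pct)  # 100.0 0.0
```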
Knowing nothing about a subject is frequently healthier than knowing what is not so, and a little learning may be a dangerous thing.
law of averages a useful description or prediction.
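A quick simulation shows why the law of averages only says something useful over many trials. This is a rough sketch with assumptions of my own: a fair coin, 200 repeated experiments, and a fixed seed so the run is repeatable. It compares how widely the share of heads swings with 10 tosses per experiment versus 1,000.

```python
import random

def heads_share(tosses, rng):
    """Fraction of heads in one experiment with a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(tosses)) / tosses

rng = random.Random(42)  # fixed seed, chosen arbitrarily

# Repeat each experiment 200 times and see how much the results vary.
small = [heads_share(10, rng) for _ in range(200)]
large = [heads_share(1000, rng) for _ in range(200)]

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)

print(f"10 tosses:   results span {spread_small:.2f}")
print(f"1000 tosses: results span {spread_large:.2f}")
```

With only 10 tosses, a result far from 50% heads is routine, so a single small experiment proves very little.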
range; unless you are trying to hide something of course 😉.
Sometimes the big ado is made about a difference that is mathematically real and demonstrable but so tiny as to have no importance. This is in defiance of the fine old saying that a difference is a difference only if it makes a difference.
What an IQ test purports to be is a sampling of the intellect. Like any other product of the sampling method, the IQ is a figure with a statistical error, which expresses the precision or reliability of that figure.
probable error and the
The only way to think about IQs and many other sampling results is in ranges: "Normal" is not 100, but the range of 90-110, and there would be some point in comparing a child in this range with a child in a lower or higher range. But comparisons between figures with small differences are meaningless.
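Thinking in ranges can be sketched like this. The two scores and the error of 3 points are hypothetical values picked for illustration, and `score_range` and `ranges_overlap` are names I made up. The idea is simply that two scores whose error ranges overlap should not be ranked against each other.

```python
def score_range(score, error):
    """Range a measured score represents, given its statistical error."""
    return (score - error, score + error)

def ranges_overlap(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical children and a hypothetical error of 3 points.
child_a = score_range(98, 3)    # (95, 101)
child_b = score_range(101, 3)   # (98, 104)
child_c = score_range(120, 3)   # (117, 123)

print(ranges_overlap(child_a, child_b))  # True: the 3-point gap is within the error
print(ranges_overlap(child_a, child_c))  # False: this difference is worth discussing
```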
[…] The figures are the same and so is the curve. It is the same graph. Nothing has been falsified — except the impression that it gives.
Look with suspicion at any version of a bar graph in which the bars change their widths as well as their lengths while representing a single factor or in which they picture three-dimensional objects the volumes of which are not easy to compare.
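The reason widths and volumes mislead is plain geometry: if a picture is scaled up in every direction to match the growth in height, its area or volume grows much faster than the number it stands for. A minimal sketch, assuming a quantity that doubled (the function name `apparent_growth` is my own):

```python
def apparent_growth(height_factor, dimensions):
    """How much bigger a figure looks when every side scales with its height."""
    return height_factor ** dimensions

true_growth = 2  # hypothetical: the quantity being charted doubled

as_bar = apparent_growth(true_growth, 1)           # only the length changes
as_flat_picture = apparent_growth(true_growth, 2)  # width grows too: area
as_solid_object = apparent_growth(true_growth, 3)  # drawn as a 3D object: volume

print(as_bar, as_flat_picture, as_solid_object)  # 2 4 8
```

A doubling drawn as a three-dimensional object can therefore look like an eightfold increase.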
A truncated bar chart has, and deserves, the same reputation as the truncated line graph in the last chapter.
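To put a number on what truncation does, this sketch compares how tall two bars are drawn relative to each other when the axis starts at zero and when it starts just below the smaller value. The values 100 and 110 and the axis start of 90 are hypothetical.

```python
def drawn_ratio(smaller, larger, axis_start=0.0):
    """Ratio of the drawn bar heights when the axis begins at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical values: a 10% difference.
honest = drawn_ratio(100, 110)                     # axis starts at zero
truncated = drawn_ratio(100, 110, axis_start=90)   # axis starts at 90

print(round(honest, 2), truncated)  # 1.1 2.0
```

The data is unchanged, but on the truncated chart the second bar is drawn twice as tall as the first.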
You can't prove that your nostrum cures colds, but you can publish a sworn laboratory report that half an ounce of the stuff killed 31,108 germs in a test tube in 11 seconds. While you are about it, make sure that the laboratory is reputable or has an impressive name. Reproduce the report in full. Photograph a doctor-type model in white clothes and put his picture alongside.
1. A 1% return on sales
2. A 15% return on investment
3. A ten-million-dollar profit
4. An increase in profits of 40% (compared with the 1935-39 average)
5. A decrease of 60% from last year.
rate is more useful here than the number of fatalities.
There are two clocks which keep perfect time.
When the first clock points to the hour, the second clock strikes.
Did the first clock cause the second to strike?
1. Correlation by chance. Given a small sample, you are likely to find some substantial correlation between any pair of characteristics or events that you can think of.
2. The relationship between variables is real but it’s not possible to be sure which of the variables is the cause and which the effect. In some of these instances cause and effect may change places from time to time or indeed both may be cause and effect at the same time. A correlation between income and ownership of stocks might be of that kind. The more money you make, the more stock you buy, and the more stock you buy, the more income you get; it is not accurate to say simply that one has produced the other.
3. Neither of the variables has any effect
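The first case, correlation by chance, is easy to simulate. This is a rough sketch under my own assumptions: both variables are independent random numbers, a correlation of 0.5 or more (in either direction) counts as "substantial", and the sample sizes of 5 and 100 are arbitrary. The Pearson coefficient is computed by hand so the block needs nothing beyond the standard library.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def share_of_strong_correlations(sample_size, trials, rng, threshold=0.5):
    """How often two unrelated variables still correlate strongly."""
    strong = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(sample_size)]
        ys = [rng.random() for _ in range(sample_size)]  # independent of xs
        if abs(pearson(xs, ys)) >= threshold:
            strong += 1
    return strong / trials

rng = random.Random(1)  # fixed seed, chosen arbitrarily
small_samples = share_of_strong_correlations(5, 2000, rng)
large_samples = share_of_strong_correlations(100, 2000, rng)

print(f"n=5:   {small_samples:.0%} of trials look strongly correlated")
print(f"n=100: {large_samples:.0%} of trials look strongly correlated")
```

With samples of five, a sizable share of completely unrelated pairs clears the bar; with samples of a hundred, almost none do.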
"Buy your Christmas presents now and save 100%" advises an advertisement. This sounds like an offer worthy of old Santa himself, but it turns out to be merely a confusion of base.
The reduction is only 50%.
The saving is 100% of the reduced or new price, it is true, but that isn't what the offer says.
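The confusion of base is easy to check with numbers. The prices below are hypothetical; any item marked down to half its price behaves the same way.

```python
# Hypothetical prices: an item marked down to half its old price.
old_price = 10.00
new_price = 5.00
saving = old_price - new_price

# The honest base is the old price the buyer would otherwise have paid.
saving_pct_of_old_price = saving / old_price * 100
# The advertisement quietly switches the base to the new price.
saving_pct_of_new_price = saving / new_price * 100

print(saving_pct_of_old_price, saving_pct_of_new_price)  # 50.0 100.0
```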
If your profits should climb from 3% on investment one year to 6% the next, you can make it sound quite modest by calling it a rise of 3 percentage points. With equal validity, you can describe it as a 100% increase 😉.
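Both descriptions come from the same two numbers, as this small sketch shows. The function name `describe_change` is mine; the 3% and 6% are the figures from the example above.

```python
def describe_change(old_pct, new_pct):
    """Return the change in percentage points and as a relative percentage."""
    points = new_pct - old_pct
    relative = (new_pct - old_pct) / old_pct * 100
    return points, relative

points, relative = describe_change(3, 6)
print(f"a rise of {points} percentage points")  # a rise of 3 percentage points
print(f"an increase of {relative:.0f}%")         # an increase of 100%
```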
When you are told how Johnny stands compared to his classmates in algebra, the figure may be a percentile. It means his rank in each 100 students. For example, in a class of 300, the top 3 will be at the 99 percentile, the next three at the 98, and so on. The odd thing about percentiles is that a student with a 99-percentile rating is probably quite a bit superior to one standing at 90, while those at the 40 and 60 percentiles may be of almost equal achievement. This comes from the habit that so many characteristics have of clustering about their own average, forming the "normal" bell curve.
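The clustering effect can be sketched with Python's `statistics.NormalDist`, assuming scores follow a bell curve. The mean of 100 and standard deviation of 15 are arbitrary choices of mine, not figures from the book. The sketch asks how many score points separate two percentile ranks.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed (hypothetical mean and spread).
scores = NormalDist(mu=100, sigma=15)

def score_at(percentile):
    """Score below which the given percentage of students fall."""
    return scores.inv_cdf(percentile / 100)

gap_near_top = score_at(99) - score_at(90)     # 9 percentile ranks apart
gap_near_middle = score_at(60) - score_at(40)  # 20 percentile ranks apart

print(f"90th to 99th percentile: {gap_near_top:.1f} points")     # 15.7 points
print(f"40th to 60th percentile: {gap_near_middle:.1f} points")  # 7.6 points
```

Under these assumptions, 9 percentile ranks near the top cover about twice as many score points as 20 ranks in the middle.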
A report of a great increase in deaths from cancer in the last quarter-century is misleading unless you know how much of it is a product of such extraneous factors as these:
- Cancer is often listed now where "causes unknown" was formerly used;
- Autopsies are more frequent, giving surer diagnoses;
- Reporting and compiling of medical statistics are more complete;
- and people more frequently reach the most susceptible ages now.
And if you are looking at total deaths rather than the death rate, don't neglect the fact that there are more people now than there used to be.
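The difference between total deaths and the death rate can be sketched with invented numbers. In this hypothetical, deaths rise by half over the period, but so does the population.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people."""
    return deaths * 100_000 / population

# Hypothetical figures for two points in time.
deaths_then, population_then = 1_000, 1_000_000
deaths_now, population_now = 1_500, 1_500_000

total_increase_pct = (deaths_now - deaths_then) / deaths_then * 100
rate_then = rate_per_100k(deaths_then, population_then)
rate_now = rate_per_100k(deaths_now, population_now)

print(total_increase_pct)   # 50.0
print(rate_then, rate_now)  # 100.0 100.0
```

A headline about "50% more deaths" would be accurate here, even though an individual's risk has not changed at all.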
- Who says so? Look for both conscious and unconscious bias and its possible manifestation.
- How do they know? Watch out for biased samples and reported correlations.
- What’s missing? Ask for the raw values when given percentages. Sometimes what is missing is the factor that caused a change to occur: This omission leaves the implication that some other, more desired, factor is responsible.
- Did somebody change the subject? When assaying a statistic, watch out for a switch somewhere between the raw figure and the conclusion, e.g., more reported cases of a disease are not always the same thing as more cases of the disease.
- Does it make sense? Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.