Posted June 3, 2001
People keep asking me about the normal distribution. Well, that's not quite true. In fact, nobody ever asks me about the normal distribution. But they probably should. Because it's fascinating stuff.
But first of all, what is a distribution? If you ask every person in Nelson how many phones they have, you can make a distribution graph with the number of phones on the horizontal axis and the number of people with that many phones on the vertical axis. There are lots of people with two phones, and lots of people with one phone. A few people have no phones at all, and there's probably at least one person who has 8 phones for some reason or other. Once you get past two phones, the number of people drops off gradually as the number of phones goes up, and this pattern will be immediately obvious from the distribution graph.
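To make the idea concrete, here is a minimal Python sketch of how the data behind a distribution graph gets tallied. The survey answers are made up for illustration (they are not real Nelson data), and `distribution` and `text_graph` are hypothetical helper names, not from any library. The "graph" is printed sideways as text, so the number of phones runs down the page and each bar shows how many people gave that answer.

```python
from collections import Counter

def distribution(answers):
    """Count how many people gave each answer.

    Returns (value, count) pairs sorted by value: the value goes on the
    horizontal axis of a distribution graph, the count on the vertical axis.
    """
    counts = Counter(answers)
    return sorted(counts.items())

def text_graph(pairs):
    """Render (value, count) pairs as a sideways bar chart, one row per value."""
    return "\n".join(f"{value:>2} | {'#' * count}" for value, count in pairs)

# Hypothetical survey answers, invented for this example.
phones = [2, 1, 2, 0, 1, 2, 3, 1, 2, 4, 1, 2, 0, 3, 8]
pairs = distribution(phones)
print(text_graph(pairs))
```

Values nobody reported (5, 6 and 7 phones here) simply don't get a row; a real graph would show them as empty bars.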
Obviously, the shape of your graph depends on what you're measuring. If you measure the volume setting on all of the home stereo systems in Nelson, you'll find a lot of people listening at a medium volume, a few people who like to have the stereo really loud, and a few people who like it really soft. So the distribution graph will have a big hump at medium volume, and it will drop off gradually on both sides.
And that's the gist of the normal distribution. It's a particular shape of distribution graph: it has a hump in the middle, and it drops off symmetrically on both sides. It turns out that if you have a large population and you measure something that has a "normal" target value, like "normal volume", but the process that produces each measurement only approximates that target instead of hitting it exactly, the result is usually very close to a normal distribution.
Let's say you visited all the people who listen to KCR on analog radios. Presumably they've all made some effort to set the tuning dial at exactly 93.5, by moving it back and forth around 93.5 until they find the clearest signal. Hopefully you would find a pretty clear trend, with lots of people being very close to the correct frequency. But you'd also get a few people on the fringes. And the further you moved from 93.5, the fewer people you'd get on your graph.
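Here is a rough Python simulation of the radio-tuning story, offered as a sketch under an assumed model rather than a description of real listeners: each hypothetical listener ends up at 93.5 plus 20 small, independent nudges, each drawn uniformly between -0.02 and +0.02 MHz. Adding up many small independent errors like this tends to produce a shape close to the normal distribution (this is the central limit theorem), so the printed histogram should show a hump at 93.5 that falls away on both sides, with roughly 68% of dial settings within one standard deviation of the mean, which is what a normal distribution predicts. The names `tune` and `survey` and the nudge sizes are all illustrative assumptions.

```python
import random
import statistics

TARGET = 93.5  # KCR's frequency, from the text

def tune(rng, nudges=20, size=0.02):
    """Assumed model of one listener: start at 93.5 and add many small,
    independent random errors, each uniform in [-size, size] MHz."""
    return TARGET + sum(rng.uniform(-size, size) for _ in range(nudges))

def survey(n_listeners, seed=1):
    """Simulate n_listeners dial settings; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [tune(rng) for _ in range(n_listeners)]

dials = survey(10_000)
mean = statistics.fmean(dials)
sd = statistics.pstdev(dials)
within_one_sd = sum(abs(d - mean) <= sd for d in dials) / len(dials)

# Crude histogram in 0.05 MHz buckets: tallest near 93.5, shrinking on both sides.
buckets = {}
for d in dials:
    key = round(d * 20) / 20
    buckets[key] = buckets.get(key, 0) + 1
for key in sorted(buckets):
    print(f"{key:.2f} {'#' * (buckets[key] // 100)}")
print(f"mean={mean:.3f} sd={sd:.3f} within 1 sd: {within_one_sd:.0%}")
```

Because each nudge is bounded, no simulated dial can land more than 0.4 MHz from 93.5, so this only approximates a true normal curve, whose tails go on forever.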
This is a really interesting part of the relation between math and nature. Nature, at least the way we usually measure it, is fundamentally imprecise (or unpredictable, if you prefer). But there is a pattern to the imprecision, so there's a mathematical language to describe that pattern. Once you establish that you're measuring a part of nature that is closely approximated by the normal distribution, you can make use of a whole bunch of math goodies to analyze that part of nature. And those math goodies are collectively referred to as statistics.
One of these goodies is the idea of random sampling. If people tune their radios according to a normal distribution, then a sufficiently large random sample of people will follow approximately the same normal distribution. If you want to know how close to 93.5 people are getting, you don't have to do a comprehensive survey; you can choose a few people randomly from the population, measure their tuning skills, and assume that they are representative of the population as a whole.
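A minimal Python sketch of random sampling, again with made-up numbers: it builds a hypothetical population of 50,000 listeners whose dial errors follow a normal distribution (the 0.05 MHz spread is an assumed value), then picks 400 of them at random and compares the sample's mean and standard deviation with those of the whole population. `tuning_population` and `estimate_from_sample` are illustrative names only.

```python
import random
import statistics

def tuning_population(size, seed=2):
    """Hypothetical population: each listener's dial setting is 93.5 plus a
    normally distributed error with an assumed spread of 0.05 MHz."""
    rng = random.Random(seed)
    return [rng.gauss(93.5, 0.05) for _ in range(size)]

def estimate_from_sample(population, sample_size, seed=3):
    """Pick sample_size people at random (without replacement) and return
    the sample's mean dial setting and standard deviation."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample), statistics.stdev(sample)

population = tuning_population(50_000)
true_mean = statistics.fmean(population)
true_sd = statistics.pstdev(population)
est_mean, est_sd = estimate_from_sample(population, 400)
print(f"whole population: mean={true_mean:.4f} sd={true_sd:.4f}")
print(f"random sample:    mean={est_mean:.4f} sd={est_sd:.4f}")
```

With a few hundred people chosen at random, the sample's mean and spread typically land within a few thousandths of a megahertz of the population's, without surveying the other tens of thousands.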
Here's an interesting question. Does the number of people you need for a good random sample depend on how many people are in the entire population? If you're trying to determine the tuning skills of Americans, do you need to include 10 times as many people in your random sample as you would if you were studying Canadians?
If you want the answer to that question, it might help to consider what would happen if you were measuring the tuning skills of all the people in a household of 5. How many people would you need to ask in order to have a good idea of the trends in the total population of 5?