Probability vs. Statistics

Sometimes people use the words statistics and probability when talking about the same things. Are these two words just different names for the same concept?
A scientific or mathematical dictionary gives these definitions:

Probability: 1. being probable 2. something that is probable 3. a ratio expressing the chances that a certain event will occur 4. a branch of mathematics studying chances of random events.

Statistics: 1. facts or data assembled and classified so as to present significant information 2. collection, calculation, description, manipulation, and interpretation of the mathematical attributes of large sets or populations 3. a branch of mathematics dealing with collection, analysis and interpretation of data.

Probability is a measure of chance. Specialists look at this meaning of probability in two different ways that are called Frequency View and Personal View (or Subjective View, as philosophers call it).

Example: To find the chances (probability) of getting 3 on a six-sided die, you roll the die 1,000,000 times. For 166,549 times, the roll is a 3.
You find the proportion of 3's by dividing:
166,549 / 1,000,000 = 0.166549

It is approximately 1/6, so you conclude that the probability of getting 3 on this particular die is 1/6.
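The experiment above can be imitated on a computer. The following is only a sketch: it assumes a fair simulated die (Python's pseudo-random generator stands in for a real one), and the function name estimate_probability and the fixed seed are choices made for this illustration, not part of the original example. The exact count of 3's will differ from 166,549, but the proportion should again come out close to 1/6.

```python
import random


def estimate_probability(face, n_rolls, seed=0):
    """Estimate the chance of rolling `face` as the proportion of rolls showing it."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
    return hits / n_rolls


# Roll the simulated die one million times and compare with 1/6.
proportion = estimate_probability(face=3, n_rolls=1_000_000)
print(f"proportion of 3's: {proportion:.6f}")
print(f"1/6 as a decimal:  {1 / 6:.6f}")
```

Trying a much smaller number of rolls, such as 60, shows why "many times" matters: the proportion then tends to wander noticeably further from 1/6.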

Example: To find the chances (probability) of getting 3 on a six-sided die, you sit down and think. You reason that all the sides of the die are the same, and that you can believe that the die does not have holes or heavy objects inserted into it. You conclude that each side of the die should have the same chance of landing face up, and therefore, that when you roll the die, you have one chance in six to get a 3. Your answer is that the probability of getting 3 is 1/6.

Definition (Frequency View): The probability of an event in an experiment is the proportion (or frequency) of that event when the exact same experiment is repeated many times.

Definition (Personal View): The probability of an event is what a person who studies it believes about the chances of the event. People who define probabilities this way use their knowledge about the world to make "the best possible guess."

The Frequency View is closer to statistics. An important aspect of the Frequency View definition is that you need to repeat the exact same experiment to find the probability. This is almost never possible where humans are concerned, for example in sports or medicine.

Examples: the following quotes, of the kind that appear in journals and on TV, contain errors.

Example 1:

Quote: "Our team won about 3/4 of the games in every season so far. I tell you, the probability of us winning the next game is 3 out of 4!"

Each game is different from other games. Maybe the opposing team will be much stronger than usual next time. Maybe the weather will be different. Maybe a key player will be sick. And so on. Also, the team may always win against a particular team (the one that is going to play tomorrow), which will affect the chances.

Conclusion: "Our team won about 3/4 of the games in every season so far. If nothing major changes, I believe we are going to win about 3/4 of the games in this season, too."

Example 2:

Quote: "One out of eight women in the USA develops breast cancer during her lifetime. Therefore, if you are female, the probability of you having this form of cancer is 1/8."

You are unique (just like everybody else). There is no way for a person to know her exact chances in anything that is connected with health. Studies show that body proportions, diet, weight, clothes preferences, number of pregnancies and breastfeeding all affect breast cancer rates in women. Even though "one out of eight" is the average for the USA, it does not tell much about each particular person.

Conclusion: "One out of eight women in the USA develops breast cancer during her lifetime. If we randomly select 1,000,000 women and look at their medical histories, we can expect about 125,000 (not exactly!) of them to develop breast cancer."

Example 3:

Quote: "On the average, drivers have accidents once every two years. Your last accident was 3 years ago, so you can expect an accident any time now."

Rates of accidents vary greatly with experience, car type, age and health of the driver, driving habits, and so on. The national average says next to nothing about your own chances of having an accident.

Conclusion: "On the average, drivers have accidents once every two years. If you randomly choose 1000 drivers, you can expect them all together to have had about 5000 accidents over the previous 10 years."
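The group-level numbers in the conclusions of Examples 2 and 3 come from simple multiplication: rate times group size (times number of years, where that applies). A small sketch of that arithmetic follows; the helper name expected_count is made up for this illustration, and the results are expected values for a large random group, not predictions about any one person.

```python
from fractions import Fraction


def expected_count(rate, group_size, periods=1):
    """Expected number of events: rate per member per period x group size x periods."""
    return rate * group_size * periods


# Example 2: one woman in eight, in a random group of 1,000,000 women.
cancer_cases = expected_count(Fraction(1, 8), 1_000_000)

# Example 3: one accident every two years per driver, 1000 drivers, 10 years.
accidents = expected_count(Fraction(1, 2), 1000, periods=10)

print(cancer_cases)  # 125000
print(accidents)     # 5000
```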

All these errors are of the same type. They take data about large numbers of people, and try to use it in personal cases. Collecting data about large numbers of people (or other objects), and using this data for studying other large groups of people belongs to statistics.

The only time such data can be used for probability, that is, for studying chances in individual cases, is when all the experiments are the same (or almost the same).
For example, you can use data (statistics) from rolling a six-sided die one million times (in exactly the same manner) to find the chances (probability) of rolling 5 on your next try. You cannot use data (statistics) from studying the driving records of a million people to find the chances (probability) of yourself having an accident today.
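One way to see why a group average cannot stand in for an individual's chances is to build a small made-up population. In the sketch below, every number is hypothetical and chosen only for illustration: 900 "careful" drivers and 100 "risky" drivers with very different yearly accident rates. The overall average still works out to one accident every two years, yet no driver in the group actually has that rate.

```python
from fractions import Fraction

# Hypothetical population (made-up numbers, for illustration only):
# group name -> (number of drivers, accidents per driver per year)
groups = {
    "careful": (900, Fraction(1, 5)),
    "risky": (100, Fraction(16, 5)),
}

total_drivers = sum(n for n, _ in groups.values())
total_accidents = sum(n * rate for n, rate in groups.values())
average_rate = total_accidents / total_drivers

print("average accidents per driver per year:", average_rate)  # 1/2
for name, (n, rate) in groups.items():
    print(f"{name}: {n} drivers, {rate} accidents per driver per year")
```

The average is a useful fact about the whole group, which is the business of statistics. It does not tell you which kind of driver you are.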

Statistics deals with data that may or may not be useful for finding probability. Data can also be useful by itself, without any connection to probability. For example, you need to know, at least approximately, how many voters live in a particular city in order to prepare for elections. You might want to know the proportion of people who get the flu during each year in order to compare several years and to try to find out what may cause increases in flu rates.
