Sampling Distributions


Just as the distribution of a numerical variable describes its long-run behavior, the sampling distribution of the sample mean x̄ describes the long-run behavior of x̄ when sample after sample is selected.

The sampling distribution of a sample proportion, p, provides information about the long-run behavior of the sample proportion that is necessary for making inferences about a population proportion.

The distribution that would be formed by considering the value of a sample statistic for every possible different sample of a given size from a population is called its sampling distribution.
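As a small illustration of this definition, the sketch below lists every possible sample of size n = 2, drawn without replacement, from a hypothetical population of five values. It computes the sample mean of each sample and tabulates how often each value occurs. The population values and variable names are assumptions made only for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical small population of N = 5 values (illustrative data only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible different sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Sample mean of each sample, kept as an exact fraction.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each possible value of x̄ and its probability.
counts = Counter(means)
sampling_distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for value, prob in sampling_distribution.items():
    print(f"x̄ = {float(value):4.1f}   P = {prob}")
```

Because every sample is equally likely under random sampling, each probability is simply the number of samples giving that mean divided by the total number of samples.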

Sampling Variability

Any quantity computed from values in a sample is called a statistic. The observed value of a statistic depends on the particular sample selected from the population; typically, it varies from sample to sample. This variability is called sampling variability.
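Sampling variability is easy to see by simulation. This sketch assumes a hypothetical, normally generated population of 1,000 values, draws five random samples of size 20, and computes the mean, median, and standard deviation of each. The population and sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population of 1,000 values (illustrative data only).
population = [random.gauss(50, 10) for _ in range(1000)]

# Draw five independent random samples of size 20 and compute several statistics.
results = []
for _ in range(5):
    sample = random.sample(population, 20)  # without replacement
    results.append((statistics.mean(sample),
                    statistics.median(sample),
                    statistics.stdev(sample)))

for mean, median, sd in results:
    print(f"mean = {mean:6.2f}  median = {median:6.2f}  sd = {sd:5.2f}")
```

Each row comes from the same population, yet the statistics differ from row to row; that difference is the sampling variability described above.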

Values of statistics such as the sample mean, the sample median, the sample standard deviation, and the sample proportion p of individuals possessing a particular property are primary sources of information about the corresponding population characteristics.

The sampling distribution of a statistic such as x̄ describes how the values of the statistic vary and how this variation relates to the values of various population characteristics. Finding a sampling distribution by listing every possible sample is practical only for a very small population; for realistic population and sample sizes, there are far too many possible samples to consider. Fortunately, general results describe the sampling distributions of some statistics, such as the sample mean and the sample proportion, without our actually having to look at all possible samples.
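To see why listing every sample breaks down, the number of possible samples can be counted with the binomial coefficient C(N, n), assuming samples are drawn without replacement and order is ignored. The population sizes N and sample sizes n below are arbitrary illustrative choices.

```python
from math import comb

# Number of different samples of size n that can be drawn without replacement
# from a population of N individuals, ignoring order: C(N, n) = N! / (n! (N - n)!)
counts = {}
for N, n in [(5, 2), (100, 10), (10_000, 50)]:
    counts[(N, n)] = comb(N, n)
    print(f"N = {N:>6}, n = {n:>2}: about {counts[(N, n)]:.3e} possible samples")
```

Even a modest population of 100 with samples of size 10 already yields more than 10^13 possible samples.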

Sampling Distribution of a Sample Mean

When the objective of a statistical investigation is to make an inference about the population mean μ, it is natural to consider the sample mean x̄ as an estimate of μ. Sampling variability causes x̄ to vary in value from one sample to another, and the behavior of x̄ is described by its sampling distribution. The sample size n and the characteristics of the population (its shape, mean value μ, and standard deviation σ) are important in determining properties of the sampling distribution of x̄.

For any n, the center of the sampling distribution of x̄ (the mean value of x̄) coincides with the mean of the population being sampled, and the spread of the distribution decreases as n increases; that is, the standard deviation of x̄ is smaller for large n than for small n.
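A simulation sketch makes this behavior concrete. It assumes a hypothetical population of 100,000 values drawn uniformly between 0 and 100, takes 2,000 random samples for each of several sample sizes, and reports the average and standard deviation of the resulting sample means. The helper name `simulate_means` is hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed for a reproducible illustration

# Hypothetical large population (illustrative data only).
population = [random.uniform(0, 100) for _ in range(100_000)]
mu = statistics.fmean(population)

def simulate_means(n, reps=2000):
    """Sample means of `reps` random samples of size n, drawn without replacement."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(reps)]

summary = {}
for n in (5, 25, 100):
    means = simulate_means(n)
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:>3}: mean of x̄ = {summary[n][0]:6.2f} (μ = {mu:.2f}), "
          f"sd of x̄ = {summary[n][1]:5.2f}")
```

The average of the simulated means stays close to μ for every n, while their standard deviation shrinks as n grows.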

Properties of the Sampling Distribution of x̄

Let x̄ denote the mean of the observations in a random sample of size n from a population having mean μ and standard deviation σ. Denote the mean value of the x̄ distribution by $\mu_{\bar{x}}$ and the standard deviation of the x̄ distribution by $\sigma_{\bar{x}}$. Then the following rules hold.

Rule 1. $\mu_{\bar{x}} = \mu$. This rule states that the sampling distribution of x̄ is always centered at the mean of the population sampled.

Rule 2. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample. It not only states that the spread of the sampling distribution of x̄ decreases as n increases, but also gives a precise relationship between the standard deviation of the x̄ distribution, the population standard deviation, and the sample size.

Rule 3. When the population distribution is normal, the sampling distribution of x̄ is also normal for any sample size n.

Rule 4. (Central Limit Theorem) When n is sufficiently large (usually 30 or more), the sampling distribution of x̄ is well approximated by a normal curve, even when the population distribution is not itself normal.
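Rules 1 and 2 ($\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$) can be checked exactly for a tiny population. The sketch below assumes sampling with replacement, which makes the draws independent and so behaves like an infinite population; the four population values are hypothetical. It compares the mean of x̄ with μ and the variance of x̄ with σ²/n (the square of Rule 2), using exact fractions so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical small population (illustrative data only).
population = [2, 4, 6, 8]
N = len(population)
n = 2  # sample size

# Population mean and population variance (divisor N), kept exact.
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

# Sampling WITH replacement: all N**n ordered samples are equally likely and
# draws are independent, mimicking an infinite population.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

mean_of_xbar = sum(means) / len(means)
var_of_xbar = sum((m - mean_of_xbar) ** 2 for m in means) / len(means)

print("mean of x̄ =", mean_of_xbar, "  μ =", mu)               # Rule 1
print("variance of x̄ =", var_of_xbar, "  σ²/n =", sigma2 / n)  # Rule 2, squared
```

Without replacement from a finite population, the variance comes out slightly smaller than σ²/n, which is why Rule 2 is only approximate in that case.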

Rules 3 and 4 specify circumstances under which the sampling distribution of x̄ is exactly normal (when the population is normal) or approximately normal (when the sample size is large).

The Central Limit Theorem of Rule 4 states that when n is sufficiently large, the distribution of x̄ is approximately normal, no matter what the population distribution looks like. This result has enabled statisticians to develop procedures for making inferences about a population mean μ using a large sample, even when the shape of the population distribution is unknown. Applying the Central Limit Theorem in a specific situation requires a rule of thumb for deciding whether n is indeed sufficiently large.
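The Central Limit Theorem can be illustrated by sampling from a clearly non-normal population. This sketch assumes an exponential population with mean 1, which is strongly right-skewed (skewness 2), and measures the skewness of 10,000 simulated sample means for several values of n. A normal distribution has skewness 0, and for this population the theoretical skewness of x̄ is 2/√n, so the values should shrink toward 0 as n grows. The helper names are hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed for a reproducible illustration

def skewness(values):
    """Moment coefficient of skewness: the average cubed z-score (0 if symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_mean(n):
    """Mean of n draws from an assumed exponential population with mean 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

reps = 10_000
skew_by_n = {}
for n in (1, 5, 30):
    means = [sample_mean(n) for _ in range(reps)]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>2}: skewness of x̄ ≈ {skew_by_n[n]:.2f}")
```

Skewness is only one aspect of shape, but its steady decline shows the sampling distribution of x̄ becoming more symmetric and closer to normal as n increases.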

Sampling Distribution of a Sample Proportion

The objective of many statistical investigations is to draw a conclusion about the proportion of individuals or objects in a population that possess a specified property, for example, coffee drinkers who regularly drink decaffeinated coffee. Traditionally, any individual or object that possesses the property of interest is labeled a success (S), and one that does not possess the property is termed a failure (F). The Greek letter π denotes the proportion of successes in the population. The value of π is a number between 0 and 1, and 100π is the percentage of successes in the population. If π = .75, 75% of the population members are successes, and if π = .01, the population contains only 1% successes and 99% failures.

The value of π is usually unknown to an investigator. When a random sample of size n is selected from this type of population, some of the individuals in the sample are successes, and the rest are failures. The statistic that provides a basis for making inferences about π is p, the sample proportion of successes:

$$p = \frac{\text{number of S's in the sample}}{n}$$
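Computing p is a matter of counting successes. This sketch assumes a hypothetical population with π = .75 and models each draw as independent (an infinite-population assumption), labeling each sampled individual "S" or "F" before computing p.

```python
import random

random.seed(4)  # fixed seed for a reproducible illustration

pi = 0.75  # assumed population success proportion (hypothetical value)
n = 40     # sample size

# Each draw is a success ("S") with probability pi, otherwise a failure ("F").
sample = ["S" if random.random() < pi else "F" for _ in range(n)]

# Sample proportion of successes: p = (number of S's in the sample) / n
successes = sample.count("S")
p = successes / n
print(f"{successes} successes out of {n}: p = {p:.3f}")
```

Rerunning with a different seed gives a different p, which is the sampling variability of p that its sampling distribution describes.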

General Properties of the Sampling Distribution of p

Let p be the proportion of successes in a random sample of size n from a population whose proportion of S's is π. Denote the mean value of p by $\mu_p$ and the standard deviation of p by $\sigma_p$. Then the following rules hold.

Rule 1. $\mu_p = \pi$.

Rule 2. $\sigma_p = \sqrt{\pi(1-\pi)/n}$. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample.

Rule 3. When n is large and π is not too near 0 or 1, the sampling distribution of p is approximately normal.

Thus, the sampling distribution of p is always centered at the value of the population success proportion π, and the extent to which the distribution spreads out about π decreases as the sample size n increases.
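A quick simulation can check that p centers on π and that its standard deviation matches $\sqrt{\pi(1-\pi)/n}$. The values π = .3, n = 50, and 10,000 repetitions are hypothetical choices, and draws are treated as independent (an infinite-population assumption); `sample_proportion` is a hypothetical helper.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed for a reproducible illustration

pi = 0.3       # assumed population success proportion (hypothetical)
n = 50         # sample size
reps = 10_000  # number of simulated samples

def sample_proportion(n, pi):
    """Proportion of successes in n independent draws (infinite-population model)."""
    return sum(random.random() < pi for _ in range(n)) / n

props = [sample_proportion(n, pi) for _ in range(reps)]

sigma_p = math.sqrt(pi * (1 - pi) / n)  # theoretical standard deviation of p
print(f"mean of p = {statistics.fmean(props):.4f}   (pi = {pi})")
print(f"sd of p   = {statistics.pstdev(props):.4f}   "
      f"(sqrt(pi(1 - pi)/n) = {sigma_p:.4f})")
```

Increasing n in this sketch shrinks the standard deviation of p in proportion to 1/√n, matching the formula.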

The farther the value of π is from .5, the larger n must be for a normal approximation to the sampling distribution of p to be accurate. A conservative rule of thumb is that if both $n\pi \ge 10$ and $n(1-\pi) \ge 10$, then a normal distribution provides a reasonable approximation to the sampling distribution of p.
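The rule of thumb translates directly into a check. The helper names below are hypothetical; π is passed as an exact fraction so that boundary cases such as nπ = 10 are not affected by floating-point rounding. The loop shows how the smallest acceptable n grows as π moves away from .5.

```python
import math
from fractions import Fraction

def normal_approx_ok(n, pi, threshold=10):
    """Conservative rule of thumb: both n*pi and n*(1 - pi) are at least `threshold`."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

def min_sample_size(pi, threshold=10):
    """Smallest n satisfying the rule of thumb for a given pi (0 < pi < 1)."""
    return math.ceil(threshold / min(pi, 1 - pi))

# pi given as exact fractions to avoid rounding problems at the boundary.
for pi in (Fraction(1, 2), Fraction(1, 5), Fraction(1, 20), Fraction(1, 100)):
    print(f"pi = {float(pi):.2f}: smallest n is {min_sample_size(pi)}")
```

Because the condition depends on min(π, 1 − π), values of π near 0 and near 1 are treated symmetrically.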

 
