

 * Distribution of Sample Proportion **

In the last section we introduced the idea of a sample proportion.

The ** sample proportion ** is the fraction of the sample which scores a success on the question being studied.

math . \qquad \text{We use } \hat{p} \text{ to represent the sample proportion} \\. \\ . \qquad \hat{p} \text{ is a sample statistic so it varies from sample to sample} \qquad. math

math . \qquad \hat{p} = \dfrac{\text{number of successes in sample}}{\text{sample size (n)}} \qquad. math

The usual process in statistics is to select one sample from the population and draw a conclusion about the population from the sample.

In this section, we will instead take a large number of samples of the same size from the same population (returning each sample to the population before taking the next).

The sample proportion then behaves like a binomial distribution.


 * Caution: **
 * When we studied binomial distributions, we used n = the number of trials
 * In this topic, we use n = sample size
 * The two meanings are related: in one sample we are effectively performing n trials of a binomial variable


 * Example 1 **

It is known that 12% of the students in a school of 1500 are left-handed.
 * the population proportion p = 0.12
 * the population size N = 1500

We will use a sample size of 20 students
 * n = 20

Let X be the random variable giving the number of left-handed students in a sample.

math . \qquad \hat{p} = \dfrac{X}{n} \qquad. math

We took 50 samples with n = 20 and produced the following frequency table (modelled using random numbers)


 * i.e. there were 5 samples with 0 left-handed students, 10 samples with 1 left-handed student, etc.
 * there were no samples with X > 6

We can find the mean and standard deviation of this set of data:

math . \qquad \textbf{Mean } \; \mu = \dfrac{\Sigma \hat{p} \times f}{\Sigma f} = \Sigma \hat{p} \times rf = 0.12 \qquad. math

math . \qquad \textbf {Variance } \; \sigma^2 = E \big( \hat{p}^2 \big) - \mu^2 = 0.0055 \qquad. math

math . \qquad \textbf{Standard Deviation } \; \sigma = \sqrt{0.0055} = 0.0742 \qquad. math

Despite having modelled this with random numbers, the mean sample proportion worked out to be exactly 0.12 which is the same as the population proportion.
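
The sampling experiment in Example 1 can be imitated with a short simulation. This is only a sketch, not the procedure used to produce the results above: it assumes a population with exactly 180 left-handed students out of 1500 (12%), draws each sample of 20 with Python's `random.sample`, and uses an arbitrary seed, so its mean and standard deviation will usually differ a little from 0.12 and 0.0742. The function name `simulate_sample_proportions` is made up for this illustration.

```python
import random
import statistics

def simulate_sample_proportions(pop_size, successes, n, num_samples, seed=None):
    """Draw num_samples samples of size n and return the list of sample proportions."""
    rng = random.Random(seed)
    # 1 marks a success (left-handed), 0 marks a failure.
    population = [1] * successes + [0] * (pop_size - successes)
    proportions = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)  # one sample; students are not repeated within it
        proportions.append(sum(sample) / n)
    return proportions

# Example 1 settings: N = 1500, 180 left-handed (p = 0.12), n = 20, 50 samples.
p_hats = simulate_sample_proportions(1500, 180, 20, 50, seed=1)
mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)  # SD of the 50 observed sample proportions
print(f"mean = {mean_p_hat:.4f}, sd = {sd_p_hat:.4f}")
```

Changing the seed, or the number of samples, shows how much the mean of the sample proportions moves around the population proportion.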


 * Expected Value and Standard Deviation of Sample Proportion **

Larger samples give better estimates of the population proportion, p.

If the sample is sufficiently large, then
 * the distribution of X, the number of successes, can be treated as a binomial variable
 * the distribution of the sample proportion can therefore also be treated as a binomial variable

math . \qquad \text{We know that the sample proportion } \; \hat{p} = \dfrac{x}{n} \qquad. math

math . \qquad \text{For a large sample, the random variable } \; \hat{P} = \dfrac{X}{n} \qquad. math

Therefore: math . \qquad \text{E} \big( \hat{P} \big) = \text{E} \Big( \dfrac{X}{n} \Big) \\. \\ . \qquad \quad = \dfrac{1}{n} \text{E} \big( X \big) \qquad. \\ . \\ . \qquad \quad = \dfrac{1}{n} \times np \\. \\ . \qquad \quad = p math

Also math . \qquad \text{Var} \big( \hat{P} \big) = \text{Var} \Big( \dfrac{X}{n} \Big) \\. \\ . \qquad \qquad = \Big( \dfrac{1}{n} \Big)^2 \text{Var} \big( X \big) \qquad. \\ . \\ . \qquad \qquad = \dfrac{1}{n^2} \times np(1-p) \\. \\ . \qquad \qquad = \dfrac{p(1-p)}{n} math

hence math . \qquad \text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} } \qquad. math
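
The two results above can be checked numerically. The sketch below assumes X is exactly binomial, sums over every possible value of X/n, and compares the outcome with p and with the SD formula; the values n = 20 and p = 0.12 are borrowed from Example 1, and the helper names are illustrative only.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def sample_proportion_moments(n, p):
    """Return (E, SD) of P-hat = X/n by summing over the binomial distribution."""
    values = [x / n for x in range(n + 1)]
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(v * pr for v, pr in zip(values, probs))
    variance = sum((v - mean) ** 2 * pr for v, pr in zip(values, probs))
    return mean, sqrt(variance)

n, p = 20, 0.12  # values taken from Example 1
mean, sd = sample_proportion_moments(n, p)
formula_sd = sqrt(p * (1 - p) / n)
print(f"E = {mean:.4f}, SD = {sd:.4f}, formula SD = {formula_sd:.4f}")
```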


 * Example 1b **

In the example above of left-handed students, where p = 0.12, n = 20 we get

math . \qquad \text{E} \big( \hat{P} \big) = 0.12 \\ .\\ . \qquad \text{SD} = \sqrt{ \dfrac{ 0.12(1 - 0.12) }{20} } = 0.0727 \qquad. math

Compare these values with the experimental results obtained from 50 samples math . \qquad \mu = 0.12 \qquad \qquad \sigma = 0.0742 \qquad. math


 * Large Samples **

The above theory works best when sufficiently large samples are taken.

One definition of a large sample is that it fits the following 3 rules:
 * np ≥ 10
 * n(1 – p) ≥ 10
 * 10n ≤ N


 * Example 1c **

Consider the example above where we took samples of 20 students from a school of 1500 to test for left-handedness (p = 0.12). Is this sample sufficiently large? If not, how large should the sample be?

__**Solution:**__

math . \qquad \bullet \quad np = 20 \times 0.12 = 2.4 \quad \textit {which is NOT } \geqslant 10 \\. \\ . \qquad \bullet \quad n(1- p) = 20(1 - 0.12) = 17.6 \quad \textit{ which is } \geqslant 10 \qquad. \\ . \\ . \qquad \bullet \quad 10n = 10 \times 20 = 200 \quad \textit{ which is } \leqslant 1500 math

Hence n = 20 was __not__ sufficiently large according to this set of rules.

To find how large to make the sample, we need n such that np ≥ 10

math . \qquad n \times 0.12 \geqslant 10 \qquad. \\ . \\ . \qquad n \geqslant 83.3 math

Rounding 83.3 __up__ to the next integer gives n = 84.

Check n = 84 against all 3 rules:

math . \qquad \bullet \quad np = 84 \times 0.12 = 10.08 \quad \rightarrow \quad 10.08 \geqslant 10 \qquad. \\ . \\ . \qquad \bullet \quad n(1 - p) = 84(1 - 0.12) = 73.92 \quad \rightarrow \quad 73.92 \geqslant 10 \qquad. \\ . \\ . \qquad \bullet \quad 10n = 10 \times 84 = 840 \quad \rightarrow \quad 840 \leqslant 1500 math

n = 84 meets all 3 rules, hence n = 84 is a sufficiently large sample.
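
The check in Example 1c can be written as a small function. This is a minimal sketch of the three rules as stated above; the names `is_large_sample` and `smallest_large_sample` are invented here, and the comparisons use ordinary floating-point arithmetic, so a borderline case where np is meant to equal exactly 10 may need more careful handling.

```python
def is_large_sample(n, p, N):
    """Check the three rules: np >= 10, n(1 - p) >= 10 and 10n <= N."""
    return n * p >= 10 and n * (1 - p) >= 10 and 10 * n <= N

def smallest_large_sample(p, N):
    """Return the smallest n meeting all three rules, or None if no n does."""
    for n in range(1, N // 10 + 1):  # 10n <= N limits n to at most N // 10
        if is_large_sample(n, p, N):
            return n
    return None

print(is_large_sample(20, 0.12, 1500))    # the sample size used in Example 1
print(smallest_large_sample(0.12, 1500))  # smallest n that passes all three rules
```

Searching upward from n = 1 avoids rounding by hand and also reports when no sample size can satisfy all three rules for a small population.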


 * Theoretical Distribution of Sample Proportion **

When we know the population size (N) and the population proportion (p) we can perform the following calculations.

The total number of ways a sample of n can be selected from a population of N is given by math . \qquad {}^{N}C_{n} \qquad. math

If the population proportion is p, then the number of successes in the population is Np and the number of fails in the population is N(1 – p).

If x is the number of successes in a sample of size n, there will be (n – x) fails.

Therefore, the total number of ways we can get
 * x successes out of Np possible
 * __and__ (n – x) fails out of N(1 – p) possible

is given by: math . \qquad {}^{Np}C_{x} \times {}^{N(1-p)}C_{n-x} \qquad. math
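
The counting argument above translates directly into code using Python's `math.comb`. The sketch below applies it to the numbers from Example 1 (N = 1500, Np = 180, n = 20) and, as an added comparison that is not part of the original working, prints the binomial probability next to it. The function name `ways_with_x_successes` is hypothetical.

```python
from math import comb

def ways_with_x_successes(N, successes_in_population, n, x):
    """Number of samples of size n containing exactly x successes."""
    failures_in_population = N - successes_in_population
    return comb(successes_in_population, x) * comb(failures_in_population, n - x)

# Example 1 numbers: N = 1500 students, Np = 180 left-handed, samples of n = 20.
N, successes, n = 1500, 180, 20
total_samples = comb(N, n)  # all possible samples of size n
for x in range(4):
    exact = ways_with_x_successes(N, successes, n, x) / total_samples
    binomial = comb(n, x) * 0.12**x * 0.88 ** (n - x)
    print(f"x = {x}: counting = {exact:.4f}, binomial = {binomial:.4f}")
```

Because the sample is small compared with the population, the two columns should be close, which is why X can be treated as a binomial variable here.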


 * Example 2 **

A large tub contains 20 pieces of fruit of which 6 are apples.

If we consider selecting an apple as a success then p = 0.3

Let X = number of apples in one sample.

If we take a number of random samples where n = 5,

a) construct a table of the possible number of samples for each value of X = x, together with the relative frequencies.

b) construct a table of the sampling distribution

c) calculate the Theoretical Expected Value and Standard Deviation for the sample proportion.

__**Solution**__

The total number of possible samples is math . \qquad {}^{20}C_{5} = 15504 \qquad. math

If n = 5, in each sample the number of apples we could get is 0, 1, 2, 3, 4 or 5.
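
The table for part (a) can be generated from the counting formula. The sketch below assumes each value of x is counted as (ways to choose x of the 6 apples) × (ways to choose 5 – x of the 14 other pieces of fruit); the column layout is my own choice.

```python
from math import comb

N, apples, n = 20, 6, 5  # Example 2: 20 pieces of fruit, 6 apples, samples of 5
total = comb(N, n)       # number of possible samples

print(" x  p_hat  samples  relative frequency")
rows = []
for x in range(n + 1):
    count = comb(apples, x) * comb(N - apples, n - x)  # samples with exactly x apples
    rows.append((x, x / n, count, count / total))
    print(f"{x:2d}  {x / n:5.1f}  {count:7d}  {count / total:.4f}")
```

The counts in the `samples` column add up to the total number of possible samples, which is a quick check on the table.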

The ** sampling distribution ** is the probability distribution for the sample proportion.

Notice that the relative frequency from the above table becomes the probability.

math . \qquad \text{E} \big( \hat{P} \big) = p = 0.3 \\. \\ . \qquad \text{SD} \big( \hat{P} \big) = \sqrt{ \dfrac{ p(1-p) }{n} } = \sqrt{ \dfrac{ 0.3 \times 0.7 }{5} } = 0.2049 \qquad. math
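
As a cross-check, the expected value and standard deviation can also be computed directly from the sampling distribution rather than from the formulas. This sketch assumes the probabilities come from the counting formula for the tub of 20 pieces of fruit. The mean should agree with p, but the standard deviation obtained this way is smaller than the formula value, which is consistent with this example not meeting the rule 10n ≤ N (here 10n = 50 while N = 20).

```python
from math import comb, sqrt

N, apples, n = 20, 6, 5
p = apples / N
total = comb(N, n)

# Sampling distribution of P-hat: value x/n with probability (ways to get x apples) / total.
distribution = {x / n: comb(apples, x) * comb(N - apples, n - x) / total
                for x in range(n + 1)}

mean = sum(value * prob for value, prob in distribution.items())
variance = sum((value - mean) ** 2 * prob for value, prob in distribution.items())
sd_from_table = sqrt(variance)
sd_from_formula = sqrt(p * (1 - p) / n)

print(f"E from table   = {mean:.4f}")
print(f"SD from table  = {sd_from_table:.4f}")
print(f"SD from formula = {sd_from_formula:.4f}")
```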

 * Approximation to Normal Distribution **

If we take enough large samples from a population, the distribution of the sample proportion will approximate a normal distribution. For example, the histogram on the right was produced using 1000 samples of size 100 using a random number generator.
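
A similar experiment can be run as a rough check on the normal shape. The text does not say which population proportion was used for the histogram, so the sketch below assumes p = 0.12 and an arbitrary seed; it treats each sample as 100 independent trials and reports how many of the 1000 sample proportions fall within 1 and 2 standard deviations of p.

```python
import random
from math import sqrt

def simulate_p_hats(p, n, num_samples, seed=None):
    """Return num_samples sample proportions, each from n independent trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

p, n = 0.12, 100  # p is an assumed value; n = 100 as in the text
p_hats = simulate_p_hats(p, n, 1000, seed=2)
sd = sqrt(p * (1 - p) / n)

within_1 = sum(abs(v - p) <= sd for v in p_hats) / len(p_hats)
within_2 = sum(abs(v - p) <= 2 * sd for v in p_hats) / len(p_hats)
print(f"within 1 SD: {within_1:.3f} (normal distribution: about 0.68)")
print(f"within 2 SD: {within_2:.3f} (normal distribution: about 0.95)")
```

The match is only approximate, because the sample proportion can take only the values 0, 0.01, 0.02, ... when n = 100.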

In the next section, Confidence Intervals, we will treat the sample proportion as approximately normally distributed.
