STAT 200 EXAMPLE FINAL EXAM

Answers

Note: This example exam has no questions on the matched pairs t-test or confidence interval, or on the one-sample test of a proportion. You should study these methods as well.

1. A dietitian wants to compare the average calories in two kinds of loaves of bread, plain white and whole wheat. A simple random sample of 20 loaves of each type of bread from a local bakery results in the following:

	Plain White	Whole Wheat
sample mean	1800	1789
sample standard deviation	10	20
sample size	20	20

Assuming that the two population standard deviations are not equal, conduct a hypothesis test to determine if the average calories in the two types of bread are different. State the hypotheses, carry out the test, give the degrees of freedom, use a table to bound the P-value, and state your conclusion in terms of this particular problem using a significance level of 5%. (10 points)

H₀: _mu₁ = mu₂__ H_a: _mu₁ not = mu₂__

Test Statistic: t = (1800 - 1789)/sqrt[(100/20) + (400/20)] = 2.20

d.f. = _19___ __.04___ < P-value < _.05__ (two-sided test)

Conclusion in terms of this particular experiment:

There is good evidence (p-value < .05) that the mean caloric content of these two breads differs. In particular, the sample data suggest that whole wheat bread has fewer calories on average than white bread. This is a two-sample t-test; see section 7.2 of the text. An alternative question could ask for a confidence interval for the difference in means for a given data set.
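
As a check on the arithmetic in question 1, here is a minimal Python sketch that recomputes the test statistic from the summary table. The variable names are illustrative, not from the text. It also computes the Welch-Satterthwaite degrees of freedom that statistical software typically reports, which is an addition here; the answer above uses the conservative choice, the smaller of n₁ - 1 and n₂ - 1.

```python
import math

# Summary statistics from the table in question 1.
mean_white, sd_white, n_white = 1800, 10, 20
mean_wheat, sd_wheat, n_wheat = 1789, 20, 20

# Estimated variance of each sample mean.
var_white = sd_white**2 / n_white   # 100/20
var_wheat = sd_wheat**2 / n_wheat   # 400/20

# Unpooled (unequal-variance) two-sample t statistic.
se_diff = math.sqrt(var_white + var_wheat)
t_stat = (mean_white - mean_wheat) / se_diff

# Conservative degrees of freedom, as used in the answer above.
df_conservative = min(n_white, n_wheat) - 1

# Welch-Satterthwaite approximation (what software usually reports).
df_welch = (var_white + var_wheat) ** 2 / (
    var_white**2 / (n_white - 1) + var_wheat**2 / (n_wheat - 1)
)

print(f"t = {t_stat:.2f}")
print(f"conservative df = {df_conservative}, Welch df = {df_welch:.1f}")
```

The Welch value is larger than 19, so a software P-value would be somewhat smaller than the table bound; the conclusion at the 5% level is the same either way.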

2. What are p-values and how are they used in hypothesis testing? (5 points)

A p-value is the probability, computed assuming the null hypothesis is true, of observing a value of the test statistic as extreme as or more extreme than the observed value. P-values are used to assess the evidence favoring the alternative hypothesis. In particular, if the p-value is small, say less than .05, then data like those observed would be unlikely to occur if the null hypothesis were true, so we reject the null hypothesis in favor of the alternative. See pages 457-458 of the text.

3. If we say a statistical inference procedure is robust, what does this mean? (5 points)

A statistical inference procedure is robust if the probability calculations required are insensitive to violations of the assumptions made. See page 516 in the text.

4. A recent newspaper article reported that 78% of people surveyed were opposed to televising trials. The article indicated that the study had a 3% margin of error. Assuming that this survey, like most such surveys reported, used a 95% confidence level, determine how many people were surveyed. Show your work. (6 points)

Answer _733___

m = z*sqrt[p hat ( 1- p hat)/ n] implies .03 = 1.96sqrt[.78(1-.78)/n] solve for n
or
n = (z*/m)² p hat (1 - p hat) = (1.96/.03)² (.78) (1 - .78) = 732.5 so use n = 733
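
The sample size formula above can be sketched in a few lines of Python. The function name and its parameters are hypothetical, chosen only for this illustration; the rounding up reflects the "so use n = 733" step.

```python
import math

def sample_size(margin, p_hat, z_star=1.96):
    """Return (exact n, n rounded up) for a given margin of error."""
    n_exact = (z_star / margin) ** 2 * p_hat * (1 - p_hat)
    # Round up: a fractional person cannot be surveyed, and rounding
    # down would give a margin slightly larger than requested.
    return n_exact, math.ceil(n_exact)

n_exact, n_needed = sample_size(margin=0.03, p_hat=0.78)
print(f"exact n = {n_exact:.1f}, use n = {n_needed}")
```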

5. Most of our hypothesis tests and confidence interval procedures require that randomization in sampling or experimentation has occurred. Why? (5 points)

Randomization reduces the risk of bias due to lurking variables and justifies the probability statements (confidence levels and p-values) that these procedures make. (See page 479 in the text.)

6. A manufacturer tests three different displays for selling a certain product by setting up a display in twelve different stores with similar overall sales so that each display type occurs in 4 stores. The number of units sold for one month is recorded for each of the 12 stores used. The manufacturer calculates the analysis of variance F test statistic and finds F = 4.63.

a. Give the degrees of freedom for this test and bracket the p-value for this test and interpret the result (in terms of this problem) using a 5% significance level. (6 points)

    Answer     d.f. are _2__ and _9__       _.025__< p-value < _.05__

    Conclusion: The average number of units sold differs significantly (p-value < .05) among the three display types. Another type of analysis of variance question would ask you to complete an ANOVA table from raw data. See Chapter 12.
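
The degrees of freedom in part a follow from the number of groups and the total number of observations. A small sketch, with a hypothetical helper name:

```python
def anova_df(group_sizes):
    """Degrees of freedom for the one-way ANOVA F statistic."""
    k = len(group_sizes)          # number of groups (display types)
    n_total = sum(group_sizes)    # total number of observations (stores)
    return k - 1, n_total - k     # (numerator df, denominator df)

# Three display types, four stores each.
df_between, df_within = anova_df([4, 4, 4])
print(df_between, df_within)
```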

b. What assumptions must hold for this test to be valid?   (5 points)

The observations must be independent, and the number of units sold under each display type must be normally distributed with the same variance for all three display types.

7. Explain the difference between the terms standard error and a sample standard deviation. In answering this question, distinguish between these two terms using a specific statistic, e.g., the sample mean.
(6 points)

A sample standard deviation measures the variability in a collection of individual observations. A standard error estimates the variability in possible values of a sample statistic. For example, the standard error of the sample mean is an estimate of how variable sample mean values are.
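
To make the distinction concrete, the sketch below reuses the plain white bread figures from question 1 (s = 10, n = 20). Using those numbers here is only an illustration; the variable names are not from the text.

```python
import math

s = 10    # sample standard deviation: spread of individual loaves
n = 20    # sample size

# Standard error of the sample mean: estimated spread of the sample
# mean across repeated samples of size n.
standard_error = s / math.sqrt(n)

print(f"sample standard deviation = {s}")
print(f"standard error of the mean = {standard_error:.3f}")
```

The standard error is smaller than the sample standard deviation because averaging 20 loaves smooths out loaf-to-loaf variation.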

8. Assuming that the population standard deviations are equal, a test of
H₀: µ₁ = µ₂ vs H_a: µ₁ > µ₂ with independent samples of sizes n₁ = 10 and n₂ = 18 is completed and the calculated test statistic is found to be t = 2.101. Give the degrees of freedom and bracket the p-value for this test. (6 points)

Answer d.f. = _26__ __.020__ < p-value < _.025__
See pages 550 - 551 in text. An alternative type of question here would ask for a confidence interval for the mean difference for a given data set under the condition that the population standard deviations are equal.

9. Researchers speculate that the proportion of left-handers has grown with time because there is less tendency to force children into right-handedness. A sample of 100 individuals of three different age classes is taken to test this hypothesis. Use the results below to answer the following questions:

	Age Group
Handedness	under 21	21 - 40	41 or above
Left	20	10	3
Right	80	90	97

a. How many left-handers would we expect to observe in our sample of 100 individuals under twenty-one years of age, if the proportion of left-handers is the same for all age groups? (5 points)

Answer __11___ (row total x column total) / grand total = (33x100)/300 = 11

b. A test of the hypothesis that the proportion of left-handedness is the same for all age groups is conducted and a test statistic of chi squared = 14.91 is found. Give the degrees of freedom and p-value for this test statistic and state your conclusion in terms of this problem using a 1% significance level. (6 points)

Answer d.f. = _2__ __.0005__< p-value < __.001__

Conclusion in terms of this problem:

The proportion of left-handers is significantly (p-value < .001) different among individuals of these three age groups. See Chapter 9.
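
The expected count in part a and the chi squared statistic in part b can both be reproduced from the table. This is a minimal sketch using plain Python lists; the layout and names are assumptions made for illustration.

```python
# Observed counts: rows are handedness, columns are age groups
# (under 21, 21 - 40, 41 or above).
observed = [
    [20, 10, 3],    # left
    [80, 90, 97],   # right
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total x column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi squared = sum over all cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# d.f. = (rows - 1) x (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"expected left-handers under 21: {expected[0][0]:.0f}")
print(f"chi squared = {chi_squared:.2f}, d.f. = {df}")
```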

c. Give a 95% confidence interval for the proportion of 21-40 year olds that are left handed. You need not carry out the calculations, simply put all the numbers in the correct places. (6 points)

p hat plus or minus z* sqrt[ p hat (1 - p hat) / n]
here p hat = 10/100 = .1, n = 100, z* = 1.96 so the confidence interval is
.1 plus or minus 1.96 sqrt [ .1 (1-.1)/100]
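
Although the question does not require the arithmetic, carrying it out is a useful check. A short sketch, with a hypothetical function name:

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 10 left-handers among the 100 people aged 21 - 40.
low, high = proportion_ci(10, 100)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```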

10. Suppose you have just completed a hypothesis test and rejected the null hypothesis. Which of the following statements is correct? Write the letter of the correct statement in the space provided.
(4 points)

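a. You may have committed a type I error.
b. You may have committed a type II error.
c. Both a and b are possible.
d. No error is possible.

Answer __a___ Rejecting the null hypothesis is an error only if the null hypothesis is actually true, which is a type I error. See section 6.4.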

11. Define the power of a test and describe at least two methods to increase the power of a test.

Power = probability of rejecting the null hypothesis if it is false. The power of a test can be increased by a) increasing the sample size, b) increasing the significance level, c) decreasing the population standard deviation, sigma, through improved experimental control or measurement or by working with a more restricted population or subpopulation, and d) considering an alternative value of the parameter, say mu, that is farther from the null hypothesis value. See section 6.4 in the text; especially pages 484 and 486.
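
The four effects listed above can be seen numerically. The sketch below assumes a one-sided z test with known sigma, which is a simplification chosen for illustration, and every numeric value in it is hypothetical rather than taken from the text.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of an upper-tail z test of H0: mu = mu0 when mu = mu_alt."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)               # rejection cutoff
    shift = (mu_alt - mu0) * n ** 0.5 / sigma   # standardized true effect
    return 1 - z.cdf(z_crit - shift)

# Hypothetical baseline: mu0 = 0, alternative mu = 1, sigma = 2, n = 10.
baseline = z_test_power(0, 1, 2, 10)
larger_n = z_test_power(0, 1, 2, 40)              # a) larger sample
larger_alpha = z_test_power(0, 1, 2, 10, 0.10)    # b) larger alpha
smaller_sigma = z_test_power(0, 1, 1, 10)         # c) smaller sigma
farther_alt = z_test_power(0, 2, 2, 10)           # d) farther alternative

for label, value in [("baseline", baseline), ("larger n", larger_n),
                     ("larger alpha", larger_alpha),
                     ("smaller sigma", smaller_sigma),
                     ("farther alternative", farther_alt)]:
    print(f"{label}: power = {value:.3f}")
```

Each of the four changes raises the power above the baseline value.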

12. Black spruce trees grow slowly in Alaska. Suppose they grow an average of 32 mm in height per year. An experiment is conducted using a sample of 17 black spruce trees under experimental conditions to determine if the rate of growth can be increased. The annual growth in millimeters of the experimental trees is summarized in the stem plot below:

0| 1
1| 9
2| 467
3| 34799
4| 0123468

a. Give a 5 number summary for the growth of the 17 experimental trees and identify any outliers. (5 points)

Min = 1, Q₁ = 26.5, Median = 39.0, Q₃ = 42.5, Max = 48
IQR = 42.5 - 26.5 = 16, so 1.5 x IQR = 24
Q₁ - 1.5 x IQR = 2.5, so 1 is a low outlier
Q₃ + 1.5 x IQR = 66.5, so there are no high outliers
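
The summary can be checked with a short Python sketch. It reads the 17 values off the stem plot and takes the quartiles as the medians of the lower and upper halves, excluding the overall median; that convention is assumed here because it reproduces the values above, and other software may use a slightly different quartile rule.

```python
from statistics import median

# Values read from the stem plot (stem = tens, leaf = ones).
growth = [1, 19, 24, 26, 27, 33, 34, 37, 39,
          39, 40, 41, 42, 43, 44, 46, 48]

data = sorted(growth)
n = len(data)
half = n // 2

# For odd n, leave the middle observation out of both halves.
lower_half = data[:half]
upper_half = data[half + 1:] if n % 2 else data[half:]

q1, med, q3 = median(lower_half), median(data), median(upper_half)
iqr = q3 - q1

# 1.5 x IQR rule for flagging suspected outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"Min = {data[0]}, Q1 = {q1}, Median = {med}, Q3 = {q3}, Max = {data[-1]}")
print(f"fences: {low_fence} and {high_fence}; outliers: {outliers}")
```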

b. Is it appropriate to use a one sample t procedure for testing H₀: µ = 32 here? Why or why not? (Hint: discuss sample size, normality, etc.) (5 points)

The sample distribution has an outlier and is clearly non-normal (skewed left). With this sample size, n = 17, and the outlier, the t-test would not be appropriate. See page 516 of text.

13. A breeder of horses wants to determine the relationship between x = the gestation period and
y = the length of life of a horse. The breeder collects this information for seven horses. The data are as follows:

Horse	1	2	3	4	5	6	7
y = Life length in years	24	25.5	20	21.5	22	23.5	21
x = Gestation period in days	416	279	298	307	356	403	265

For these data the fitted least squares regression line is y hat = 18.89 + .0109x, with estimated standard deviation s = 1.971, and the sum of squared deviations for x is 21752.

a. Is there a significant linear relationship between length of life and gestation period? Justify your answer and use a significance level of 5%. (10 points)

H₀: __Beta₁ = 0________ versus H_a: ___Beta₁ not = 0________

Calculate the value of your test statistic: Show your work.

SE_b1 = s/sqrt(sum of squared deviations of x) = 1.971/sqrt(21752) = .01336

t = b₁ / SE_b1 = .0109 / .01336 = 0.816

d.f. = _5__ _.40_ < p-value < _.50__ (two-sided hypothesis)

Conclusion in terms of this specific problem.

There is no significant (p-value > .40) linear relationship between gestation period and life length for this horse population.
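
As a check, the slope, intercept, s, and test statistic can be recomputed from the seven data pairs. This is a sketch in plain Python with illustrative variable names. Because it carries full precision rather than the rounded values .0109 and 1.971, the t statistic comes out near 0.81, slightly below the 0.816 shown above; the bracket for the p-value is unchanged.

```python
import math

gestation = [416, 279, 298, 307, 356, 403, 265]   # x, in days
life = [24, 25.5, 20, 21.5, 22, 23.5, 21]         # y, in years

n = len(gestation)
x_bar = sum(gestation) / n
y_bar = sum(life) / n

# Sum of squared deviations for x, and sum of cross products.
sxx = sum((x - x_bar) ** 2 for x in gestation)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gestation, life))

# Least squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Estimated standard deviation about the line, with n - 2 d.f.
residuals = [y - (b0 + b1 * x) for x, y in zip(gestation, life)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Standard error of the slope and the t statistic for H0: Beta1 = 0.
se_b1 = s / math.sqrt(sxx)
t_stat = b1 / se_b1

print(f"y hat = {b0:.2f} + {b1:.4f}x, s = {s:.3f}")
print(f"SE_b1 = {se_b1:.5f}, t = {t_stat:.3f}, d.f. = {n - 2}")
```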

b. Give the sample mean and sample standard deviation of y = life length. (5 points)

The sample mean is 22.5 and the sample standard deviation is 1.915
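
These two values can be confirmed with Python's standard statistics module, whose stdev function uses the n - 1 divisor appropriate for a sample standard deviation.

```python
from statistics import mean, stdev

life = [24, 25.5, 20, 21.5, 22, 23.5, 21]   # y = life length in years

y_bar = mean(life)
s_y = stdev(life)   # sample standard deviation (divides by n - 1)

print(f"sample mean = {y_bar}, sample standard deviation = {s_y:.3f}")
```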
