1. Once a distribution has been displayed by a stemplot or histogram, what important features should we attempt to describe? (8 points)
Answer: We should describe the following features:
- The overall pattern or shape of the distribution: is it symmetric, or skewed to the left or right? How many modes does it have?
- The center and spread: for a symmetric distribution, the mean and standard deviation; for a skewed distribution or one with outliers, the median and quartiles (or interquartile range).
- Deviations from the overall pattern, such as gaps or outliers.
2. A store takes a sample of 17 days of the year and records the number of boxes of a certain type of cereal sold on each day. Use the data given below to answer the following questions:
19, 19, 7, 16, 17, 18, 17, 19, 20, 17, 16, 18, 13, 26, 18, 17, 16
a. Complete the following 5-number summary for these data by finding the median, M. (6 points)
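Minimum = 7, Q1 = 16, M = 17, Q3 = 19, Maximum = 26
Using the 1.5*IQR rule, 7 and 26 are outliers: IQR = 19 - 16 = 3, so 1.5*IQR = 4.5 and the fences are 16 - 4.5 = 11.5 and 19 + 4.5 = 23.5. Only 7 and 26 fall outside the fences; they are labeled with an * on the boxplot.
Answer: You must first order the values.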
7, 13, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 26
The middle of these ordered values is 17, which is the median. The middle of the 8 values below the median is 16, which is Q1, and the middle of the 8 values above the median is 19, which is Q3.
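The ordering and "middle of each half" steps above can be checked with a short Python sketch. It assumes the textbook convention that, when n is odd, the median itself is left out of both halves; other software packages use different quartile rules and may give slightly different Q1 and Q3. The function names are hypothetical, chosen for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, M, Q3, Max, with Q1 and Q3 taken as the medians of the
    lower and upper halves (the middle value is excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median
    upper = xs[half + n % 2:]    # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers_15_iqr(data):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low or x > high]

cereal = [19, 19, 7, 16, 17, 18, 17, 19, 20, 17, 16, 18, 13, 26, 18, 17, 16]
print(five_number_summary(cereal))  # (7, 16.0, 17, 19.0, 26)
print(outliers_15_iqr(cereal))      # [7, 26]
```

The quartiles print as 16.0 and 19.0 because `statistics.median` averages the two middle values of an even-length half.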
b. Use your completed 5-number summary to construct a modified boxplot, label any outliers with an *. (6 points)
[Modified boxplot on an axis from 8 to 26: box from Q1 = 16 to Q3 = 19 with the median at 17, whiskers extending to 13 and 20, and the outliers 7 and 26 marked with *]
3. There are three basic principles of statistical design of experiments.
Name them. (6 points)
Answer: control, randomization, and replication.
4. Use the attached copy of TABLE A to determine the proportion of
observations from a standard normal distribution and shade in the area
under the curve that corresponds to the inequality below: (6 points)
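1.23 < Z < 2.64
Answer: Table A gives the area to the left of each value, P(Z < 2.64) = 0.9959 and P(Z < 1.23) = 0.8907, so P(1.23 < Z < 2.64) = 0.9959 - 0.8907 = 0.1052.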
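As a cross-check on the table lookup, the same subtraction of left-tail areas can be done in software. This is a sketch assuming Python 3.8 or later, where `statistics.NormalDist` is available; with no arguments it is the standard normal distribution.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between two z-values = (area to the left of 2.64) - (area to the left of 1.23)
p = Z.cdf(2.64) - Z.cdf(1.23)
print(round(p, 4))  # 0.1052
```

Software carries more decimal places than Table A, but the result agrees with the table answer to four places.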
Answer: Inferring conclusions about some wider population from data on selected individuals.
8. Scores on final exams for a certain large course are typically approximately normally distributed with a mean of 72 and a standard deviation of 12. The professor says that the top 10% will receive an A grade. What is the minimum score you must get on the final to receive an A? Drawing a picture may help. (8 points)
Having 10% of the scores above a value is the same as having 90% below it. About 90% of the standard normal distribution lies below z = 1.28, so 90% of the scores lie below 72 + 1.28*12 = 87.36. Thus, scores above 87.36 will receive an A.
Answer = 87.36
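The un-standardizing step x = mean + z*(standard deviation) can be sketched in Python as follows, assuming Python 3.8+ for `statistics.NormalDist`. The sketch computes both the table-based cutoff (z = 1.28) and the exact 90th percentile that software reports.

```python
from statistics import NormalDist

mu, sigma = 72, 12

# Table A approach: z = 1.28 has about 90% of the standard normal area below it
cutoff_table = mu + 1.28 * sigma

# Software approach: exact 90th percentile of N(72, 12)
cutoff_exact = NormalDist(mu, sigma).inv_cdf(0.90)

print(round(cutoff_table, 2), round(cutoff_exact, 2))  # 87.36 87.38
```

The small difference comes only from rounding z to two decimal places in Table A (the exact z is about 1.2816).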
9. The Electroencephalogram (EEG) is a device used to measure brain
waves. Neurologists have found that the peak EEG frequency in children
increases with age. You will study this by completing a simple linear regression
of y = peak EEG frequency measured in hertz on
x = age in years. The ages and EEG peaks of four children are given
here. Use this information to answer the questions below.
Child | 1 | 2 | 3 | 4 |
X = Age | 2 | 6 | 8 | 10 |
Y = EEG peak (Hz) | 5 | 6 | 6 | 7 |
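a. Find the mean and standard deviation of the four children's ages. (10 points)
Here r = 0.9562, Sx = 3.4157, Sy = 0.8165, x bar = 6.5, and y bar = 6.00.
Answer: mean = (2 + 6 + 8 + 10)/4 = 6.5, and s = sqrt[((2 - 6.5)^2 + (6 - 6.5)^2 + (8 - 6.5)^2 + (10 - 6.5)^2)/3] = sqrt(35/3) = 3.4157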
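The mean and sample standard deviation of the ages can be verified with Python's `statistics` module. This is a small sketch; note that `stdev` divides by n - 1, matching the sample standard deviation s used above (`pstdev` would divide by n instead).

```python
from statistics import mean, stdev

ages = [2, 6, 8, 10]      # X = age in years
eeg = [5, 6, 6, 7]        # Y = EEG peak in Hz

age_mean = mean(ages)     # 26 / 4
age_sd = stdev(ages)      # sqrt(35 / 3), divides by n - 1
eeg_sd = stdev(eeg)       # sqrt(2 / 3)

print(age_mean, round(age_sd, 4), round(eeg_sd, 4))  # 6.5 3.4157 0.8165
```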
b. Find the slope of the least squares regression line of y on x using this data. Show all your work. (8 points)
Answer: b = r*(Sy/Sx) = 0.9562*(0.8165/3.4157) = 0.2286
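The slope can also be computed directly from the raw data with the least-squares formula b = Sxy/Sxx, where Sxy and Sxx are the sums of cross-products and squared deviations about the means. The following Python sketch (plain arithmetic, no libraries beyond `math`) shows that this agrees with b = r*(Sy/Sx). The intercept a = y bar - b*(x bar) is included only to complete the fitted line.

```python
from math import sqrt

x = [2, 6, 8, 10]   # age in years
y = [5, 6, 6, 7]    # peak EEG frequency in Hz

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 8.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                          # 35.0
syy = sum((yi - y_bar) ** 2 for yi in y)                          # 2.0

b = sxy / sxx               # least-squares slope, 8/35
a = y_bar - b * x_bar       # intercept
r = sxy / sqrt(sxx * syy)   # correlation, 8/sqrt(70)

print(round(b, 4), round(a, 4), round(r, 4))  # 0.2286 4.5143 0.9562
```

Because Sx = sqrt(Sxx/(n - 1)) and Sy = sqrt(Syy/(n - 1)), the (n - 1) factors cancel in r*(Sy/Sx), leaving exactly Sxy/Sxx.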
10. Use the attached copy of Table B Random Digits to select a simple
random sample of size n = 2 from the letters A through Z (number the letters
01 through 26). Begin on line 120 of Table B and go across the 8 columns
of numbers on the left hand side of the page then continue to line 121
as necessary to take your sample. Underline the sample values on Table
B and record them below. Cross out any values in the table that you skip.
(6 points)
Answer: the two sample letters are D and P.
The first two two-digit random numbers between 01 and 26 (A-Z) on line 120 are 16 and 04. These correspond to the letters P and D.
11. Each person in a group of 300 students was identified as male or female and then asked whether he or she preferred taking courses in math, social science, or natural science. Use the table of results below to answer the following question:
Favorite Subject Area (counts)
Gender | Math | Social Science | Natural Science |
Male | | | |
Female | 35 | 72 | 71 |
Give (in percentages) the conditional distribution of Favorite Subject Area for female students. (10 points)
Answer: The total number of females is 35 + 72 + 71 = 178, so the conditional distribution of subject for females is:
Math 35/178 = 19.7%,
Social Science 72/178 = 40.4%,
Natural Science 71/178 = 39.9%
Note that these percentages add to 100% because they form a distribution.
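A conditional distribution is just each count divided by its row total. The sketch below applies this to the female row only, since those are the counts given in the answer; the dictionary layout is an illustrative choice, not part of the original problem.

```python
female_counts = {"Math": 35, "Social Science": 72, "Natural Science": 71}

total = sum(female_counts.values())  # 178 female students

# Each category as a percentage of the female total, rounded to one decimal
percentages = {subject: round(100 * count / total, 1)
               for subject, count in female_counts.items()}

print(total, percentages)
# 178 {'Math': 19.7, 'Social Science': 40.4, 'Natural Science': 39.9}
```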
y hat = -2.2 + 2.3x with r = 0.92
a. One child in the study had a flexibility score of 3 and a creativity score of 5. Give the residual for this observation. (6 points)
Answer: residual = y - y hat
Here the observed y is 5 for x = 3, and y hat = -2.2 + 2.3(3) = 4.7,
so the residual = 5 - 4.7 = 0.3.
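The residual calculation (observed minus predicted) can be sketched in Python using the fitted line y hat = -2.2 + 2.3x given in the problem. The function name `predicted_creativity` is hypothetical, chosen for illustration.

```python
def predicted_creativity(flexibility):
    """Fitted regression line from the problem: y hat = -2.2 + 2.3x."""
    return -2.2 + 2.3 * flexibility

x_obs, y_obs = 3, 5
residual = y_obs - predicted_creativity(x_obs)  # observed - predicted

print(round(residual, 2))  # 0.3
```

Rounding is applied only for display, because binary floating point stores 2.3 and 2.2 inexactly. A positive residual means the child's observed creativity score lies above the fitted line.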
b. If we had the data used to fit this regression, what else should we do to assess whether linear regression is appropriate? (6 points)
Answer: Construct several plots to assess the appropriateness of the regression fit. In particular, plot the residuals versus x, plot the residuals versus the time order of data collection, and plot the residuals versus x while labeling the values of any other variable, to check for lurking variables. (Other descriptions of plots are possible here.)