# Stats151 Question

1. Jan 14, 2007

1. The problem statement, all variables and given/known data

The following table gives the survival time in days of 72 guinea pigs after
they were injected with tubercle bacilli in a medical experiment. Make a
suitable graph and describe the shape, center and spread of the distribution
of survival times. Are there any outliers?

TABLE: Survival times (days) of guinea pigs in a medical experiment
43 45 53 56 56 57 58 66 67 73
74 79 80 80 81 81 81 82 83 83
84 88 89 91 91 92 92 97 99 99
100 100 101 102 102 102 103 104 107 108
109 113 114 118 121 123 126 128 137 138
139 144 145 147 156 162 174 178 179 184
191 198 211 214 243 249 329 380 403 511
522 598

2. Relevant equations

I know there are two ways to graph this data, either a stemplot or a histogram. I'd just like to know here how to set up the stemplot or histogram in order to graph this data. I'm still trying to figure it out since I'm a little new to this yet. Thanks a lot.

3. The attempt at a solution

I realize you split each number in a stemplot into its stem and leaf, but it is very cumbersome to list all the stems from 4 to 59, so is there some shorter way to do this, such as:

0 | 4455555
0 | 66777
0 | 888888888889999999
1 | 00000000000111
1 | 2222333
1 | 4445
1 | 6777
1 | 899
2 | 1144
3 | 28
4 | 0
5 | 129

I'm just taking a stab at it haha.

2. Jan 14, 2007

### sara_87

you could try putting the data in groups for the histogram
for example taking the frequency of data for the values between 40 and 60, then 60 and 80 etc.
after doing that you need to find the density to draw the histogram

(there may be other better ways, but that's what i would do)
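
A minimal sketch of that grouping step, in case it helps to see it spelled out. The function name `class_table` and the choice of classes [low, high) are just assumptions for illustration, and "density" here is taken to mean relative frequency divided by class width (some courses use frequency divided by width instead, which only changes the vertical scale).

```python
# Survival times (days) of the 72 guinea pigs, copied from the table above.
survival_days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73,
    74, 79, 80, 80, 81, 81, 81, 82, 83, 83,
    84, 88, 89, 91, 91, 92, 92, 97, 99, 99,
    100, 100, 101, 102, 102, 102, 103, 104, 107, 108,
    109, 113, 114, 118, 121, 123, 126, 128, 137, 138,
    139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
    191, 198, 211, 214, 243, 249, 329, 380, 403, 511,
    522, 598,
]


def class_table(data, edges):
    """Return (low, high, frequency, density) for each class [low, high)."""
    n = len(data)
    rows = []
    for low, high in zip(edges, edges[1:]):
        freq = sum(1 for x in data if low <= x < high)
        # Assumed definition: relative frequency per unit of class width.
        density = freq / (n * (high - low))
        rows.append((low, high, freq, density))
    return rows


# Classes of width 20 starting at 40, as suggested in the post above.
edges = list(range(40, 601, 20))
table = class_table(survival_days, edges)

for low, high, freq, density in table:
    print(f"{low:3d}-{high:3d}  freq={freq:2d}  density={density:.5f}  " + "*" * freq)
```

Because `edges` is just a list, the same function also works with unequal class widths; the bar heights are then the densities rather than the raw counts.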

3. Jan 14, 2007

Ahhh that's true as well. I never thought about that. Thanks a lot. :D

4. Jan 14, 2007

### sara_87

you're welcome, go through it and if you still need more assistance post and i'll try and help (if i'm still online and i know the answer)

5. Jan 14, 2007

Yeah I'm pretty sure I figured it out. I followed the steps for making a histogram, but used the class intervals from 0 to 100, 100 to 200, etc. etc. because 40 to 60, 60 to 80, etc would have taken forever. But the distribution is skewed to the right, the spread is from 43 to 598. And in this instance there are no outliers, at least not from what I can see, unless I'm wrong on that. I also said the center is between 102 and 103, thus it must be 102.5. Hopefully that's right haha.

6. Jan 15, 2007

### sara_87

(bit late but) yes it is skewed to the right, and the center is 102.5
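
For anyone who wants to check the center and the outlier question numerically, here is a rough sketch. It assumes the 1.5 × IQR rule for flagging outliers and takes the quartiles as the medians of the lower and upper halves of the sorted data; the thread doesn't say which rule the course uses, so treat both as assumptions.

```python
from statistics import median

# Survival times (days) of the 72 guinea pigs, copied from the table above.
survival_days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73,
    74, 79, 80, 80, 81, 81, 81, 82, 83, 83,
    84, 88, 89, 91, 91, 92, 92, 97, 99, 99,
    100, 100, 101, 102, 102, 102, 103, 104, 107, 108,
    109, 113, 114, 118, 121, 123, 126, 128, 137, 138,
    139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
    191, 198, 211, 214, 243, 249, 329, 380, 403, 511,
    522, 598,
]

data = sorted(survival_days)
n = len(data)
half = n // 2

# Quartiles as medians of the lower and upper halves (assumed convention).
q1 = median(data[:half])
med = median(data)
q3 = median(data[n - half:])
iqr = q3 - q1

# 1.5 x IQR fences (assumed outlier rule).
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
flagged = [x for x in data if x < low_fence or x > high_fence]

print("median:", med)
print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", low_fence, high_fence)
print("flagged:", flagged)
```

Under these assumptions the median comes out to 102.5, matching the posts above, but the upper fence is 255, so the six largest values (329 through 598) get flagged. By that rule "no outliers" is worth a second look.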

7. Jan 15, 2007

Hahaha alrighty thanks a lot sara. :D You've been a big help. Otherwise I would have still been screwing around with the stemplot of the distribution.
I'm sorry but I have another question here I'm working on that is bugging me. It asks:

“Conservationists have despaired over destruction of tropical rainforest by logging, clearing, and burning.” These words began a report on a statistical study of the effects of logging in Borneo. Researchers compared forest plots that had never been logged (Group 1) with similar plots nearby that had been logged 1 year earlier (Group 2) and 8 years earlier (Group 3). All plots were 0.1 hectare in area. Here are the counts of trees for plots in each group:

Group 1: 27 22 29 21 19 33 16 20 24 27 28 19

Group 2: 12 12 15 9 20 18 17 14 14 2 17 19

Group 3: 18 4 22 15 18 19 22 12 12

Give a complete comparison of the three distributions, using both graphs and numerical summaries. To what extent has logging affected the count of trees? The researchers used an analysis based on x̄ (the mean) and s. Explain why this is reasonably well justified.

Now I'm going to try working out this problem here and I'll let you know what I find, but would it be smart to organize the numbers from smallest to largest and then, using the 5-number summary, make a boxplot of the three groups to compare them? After that, finding the mean and standard deviation is a breeze. I'd just like to know if I'm on the right track here. Give me a few moments and I'll try taking a stab at it. :D

8. Jan 15, 2007

### sara_87

making a box plot is a good idea, if you notice that your data is skewed i would advise you to find the interquartile range and the median as these are less affected by skewness than the S.D. and the mean...i think you know what to do, all you needed is a bit of confidence (kind of like myself sometimes)!
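
A quick sketch of the numerical side of that comparison for the three groups of tree counts. The helper name `five_number_summary` is made up for illustration, and the quartiles are taken as medians of the lower and upper halves (leaving out the middle value when the count is odd), which is one common textbook convention and an assumption here.

```python
from statistics import mean, median, stdev

# Tree counts per 0.1 hectare plot, copied from the question above.
groups = {
    "Group 1": [27, 22, 29, 21, 19, 33, 16, 20, 24, 27, 28, 19],
    "Group 2": [12, 12, 15, 9, 20, 18, 17, 14, 14, 2, 17, 19],
    "Group 3": [18, 4, 22, 15, 18, 19, 22, 12, 12],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2  # the middle value is excluded when len is odd
    return (
        data[0],
        median(data[:half]),
        median(data),
        median(data[len(data) - half:]),
        data[-1],
    )


for name, counts in groups.items():
    lo, q1, med, q3, hi = five_number_summary(counts)
    print(
        f"{name}: min={lo} Q1={q1} median={med} Q3={q3} max={hi} "
        f"mean={mean(counts):.2f} s={stdev(counts):.2f}"
    )
```

The five numbers are exactly what a boxplot draws, so the three summaries can be sketched side by side on one axis. Comparing each group's mean with its median is a quick way to see whether skewness or a low value such as the 2 in Group 2 is pulling the mean around.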

also for the histogram earlier, you didn't have to make all the widths of the bars 20 and you can skip some out e.g. 40-70, then 70-100, 100-130, 130-160, 160-200, 200-250, 320-600 or whatever i'm sure you get the idea

also i should have mentioned this earlier but i only just noticed, your stem and leaf plot is not correct

for these sets of data:

43 45 53 56 56 57 58 66 67 73
74 79 80 80 81 81 81 82 83 83
84 88 89 91 91 92 92 97 99 99

the stem and leaf plot should be like this:

4 | 3 5
5 | 3 6 6 7 8
6 | 6 7
7 | ...etc
8 |
9 |

do you get the idea?
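
The layout above (tens digit as the stem, ones digit as the leaf) can be sketched in a few lines of code. The function name `stemplot` is hypothetical, and it only handles non-negative whole numbers like the ones in this table.

```python
from collections import defaultdict

# The first 30 survival times, as listed in the post above.
values = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73,
    74, 79, 80, 80, 81, 81, 81, 82, 83, 83,
    84, 88, 89, 91, 91, 92, 92, 97, 99, 99,
]


def stemplot(data):
    """Return stemplot lines with stem = value // 10 and leaf = value % 10."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves[stem])
        lines.append(row.rstrip())
    return lines


lines = stemplot(values)
print("\n".join(lines))
```

Run on the full table of 72 values, this prints every stem from 4 to 59, most of them empty, which is the length problem raised in the first post.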

9. Jan 15, 2007