# Trying to analyze some data

1. Apr 1, 2009

### serdayne

I have a program whose performance I am trying to analyze. My adviser recommended that I find the probability distribution for the data I have. However, I am not quite sure how to do this.

The data is something like:

Code (Text):

Time        Size
2.10 ms     2
2.30 ms     2
2.90 ms     3
3.10 ms     2
3.30 ms     4
4.10 ms     4
4.30 ms     4
5.30 ms     5
5.50 ms     6

..etc

He suggested I find the probability distribution between the times and the size. I am not really sure what he means by this.

What I've tried: I found the average and the standard deviation of the times. I then, in Excel, used the function:

Code (Text):
NORMDIST(Time[x], avg, std dev, true)
where Time[x] is one of the times in the list above. I do this for every time.

I then plot the distributions (on the Y) vs. the Times (on the X). With this, I get a plot that resembles the one I've attached to this post.
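For reference, the Excel steps above can be sketched in Python. With the last argument set to TRUE, NORMDIST returns the cumulative distribution function, i.e. the probability that a normally distributed time is less than or equal to the given value. This is only a sketch: it uses just the nine sample rows shown above, and it assumes the standard deviation is the sample standard deviation (as Excel's STDEV gives), which the post does not specify.

```python
from statistics import NormalDist, mean, stdev

# Times (ms) from the nine sample rows shown in the post; the real data set is longer.
times = [2.10, 2.30, 2.90, 3.10, 3.30, 4.10, 4.30, 5.30, 5.50]

avg = mean(times)
sd = stdev(times)  # assumption: sample standard deviation, like Excel's STDEV

# Normal distribution fitted to the times, ignoring Size entirely.
fitted = NormalDist(mu=avg, sigma=sd)

# Equivalent of NORMDIST(t, avg, sd, TRUE): cumulative probability P(T <= t).
cdf_values = [fitted.cdf(t) for t in times]

for t, p in zip(times, cdf_values):
    print(f"{t:.2f} ms  {p:.3f}")
```

Because every value comes from the same fitted curve, plotting these against the times always traces part of the S-shaped normal CDF, whatever the data look like.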

The question is: is the plot of distributions for each value vs. Time a meaningful plot?

Also, I do not have Size factored in. What plot would allow me to compare Time vs. Size?

Thank you.

2. Apr 1, 2009

### martix

Welcome to PF :P

My interpretation is that he wants you to tell him what is the chance that a given time belongs to a given size.

3. Apr 1, 2009

### serdayne

Thanks!

However, how can I show that? The above chart does not factor in Size at all. It is NORMDIST(TIME) vs. TIME.

Thanks.

4. Apr 1, 2009

### serdayne

Also, I should mention that, for me, the best way to interpret this data would be to figure out the average Time and measure that against the Size. That way, for every sample that is Size 2 I'd have an Average Time, for Size 3 an Average Time, etc.

I would plot the Average Time for Each Size vs. Each Size. That way I'd know how long, on average, each Size sample took.
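A minimal sketch of that average-per-size calculation, in Python, using only the nine sample rows from the first post (the variable names are made up for illustration):

```python
from collections import defaultdict
from statistics import mean

# (time in ms, size) pairs from the nine sample rows in the first post.
samples = [
    (2.10, 2), (2.30, 2), (2.90, 3), (3.10, 2), (3.30, 4),
    (4.10, 4), (4.30, 4), (5.30, 5), (5.50, 6),
]

# Collect the times that belong to each size.
times_by_size = defaultdict(list)
for time_ms, size in samples:
    times_by_size[size].append(time_ms)

# Average time for each size, ordered by size.
avg_time_by_size = {size: mean(ts) for size, ts in sorted(times_by_size.items())}

for size, avg in avg_time_by_size.items():
    count = len(times_by_size[size])
    print(f"Size {size}: {avg:.2f} ms average over {count} sample(s)")
```

Plotting the values of `avg_time_by_size` against its keys gives the Average Time vs. Size chart. Note that in this small excerpt sizes 3, 5 and 6 have only one sample each, so their "averages" are single measurements.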

I'm not sure where the probability comes into play here.

5. Apr 2, 2009

### martix

Yes indeed.
Since size is a discrete variable, and assuming the times for each size are normally distributed, you can plot the probability distribution of the times for each size separately. You can then combine these into a single average probability curve: on the Y axis you have the average probability, and on the X axis the offset of a time from the average time for its size, so the center of the X axis corresponds to the average time for a given size (a different average for each size).
It may not be the most accurate approach, but it does condense all the information you have into one 2-axis plot.
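One possible reading of "the chance that a given time belongs to a given size" is sketched below in Python. It fits a normal distribution to the times of each size and applies Bayes' rule, which is a swapped-in technique and not necessarily what the adviser had in mind. Two assumptions are made purely for illustration: all sizes share one pooled standard deviation (needed because some sizes in the nine-row excerpt have only one sample), and the prior for each size is its share of the samples.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean

# (time in ms, size) pairs from the nine sample rows in the first post.
samples = [
    (2.10, 2), (2.30, 2), (2.90, 3), (3.10, 2), (3.30, 4),
    (4.10, 4), (4.30, 4), (5.30, 5), (5.50, 6),
]

groups = defaultdict(list)
for time_ms, size in samples:
    groups[size].append(time_ms)

means = {size: mean(ts) for size, ts in groups.items()}

# Assumption: one pooled within-size standard deviation shared by all sizes.
squared_dev = sum((t - means[s]) ** 2 for s, ts in groups.items() for t in ts)
pooled_sd = sqrt(squared_dev / (len(samples) - len(groups)))

def size_probabilities(time_ms):
    """Return P(size | time) for each size under the assumptions above."""
    weights = {
        s: (len(ts) / len(samples)) * NormalDist(means[s], pooled_sd).pdf(time_ms)
        for s, ts in groups.items()
    }
    total = sum(weights.values())
    return {s: w / total for s, w in sorted(weights.items())}

probs = size_probabilities(2.5)
for size, p in probs.items():
    print(f"P(size {size} | 2.5 ms) = {p:.3f}")
```

With so few samples per size the numbers are only illustrative; on the full data set a separate standard deviation per size could replace the pooled one.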
