Simple statistical question. Let's say we have a set of values: 45 40 30 36 11 10 75 102 113 125 137 140 149. You can see that it has a clear tendency to increase (even though it decreases in the middle). But in the set 68 33 21 31 7 7 13 123 21 33 67 64 29 9 5 87 15 20 5 9 everything is mixed up. I am looking for a good method to determine whether a set is increasing or decreasing, and how fast. Thanks.
Sorry, I have posted this in the wrong thread. Moderators, please feel free to move it to statistics. Thank you.
The first thing that comes to mind is counting inversions: that is, pairs of indices (i, j) where i < j, but the i-th element is larger than the j-th element. To be honest, I wouldn't really say the first set has much of a tendency to increase -- it actually looks like two disconnected pieces: a small decreasing portion and a large increasing portion!
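A minimal sketch of the inversion count described above, assuming the data is a plain Python list. The function name `count_inversions` is made up for illustration, and the approach is the direct pairwise check, which takes O(n^2) time.

```python
from itertools import combinations


def count_inversions(values):
    """Count pairs (i, j) with i < j and values[i] > values[j]."""
    # combinations() yields pairs in their original order, so `a` always
    # comes before `b` in the list. Equal values are not counted.
    return sum(1 for a, b in combinations(values, 2) if a > b)


first_set = [45, 40, 30, 36, 11, 10, 75, 102, 113, 125, 137, 140, 149]
print(count_inversions(first_set))  # 14
```

All 14 inversions in the first set come from its first six values, which matches the "small decreasing portion, large increasing portion" reading.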
About the first set: if we look at it as a whole, it increases from 45 to 149. I want to know whether it is really increasing that way (and derive some sort of coefficient for that increase or decrease). One solution could be to take the first and last elements and compare them, but if the set were 15 20 30 36 11 10 75 102 123 145 157 199 80, we would see an overall increase and a fast decrease at the end (only 2 values vs 12), yet I would still say that the set is increasing.
The reason I mentioned inversions is that I've seen them used as a measure of "near sorting" -- that is, to tell how good of a job an approximate sorting algorithm has done. It sounds like your problem is close to that: you want to tell how close to sorted (i.e. increasing) your data is. I'm not sure if you want to ascribe any relevance to the actual magnitudes of the numbers, or if their ranking is all that matters.
Maybe you are looking for the correlation coefficient between the data and the indices of the data. Try mathworld.wolfram.com for correlation.
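A rough sketch of that suggestion, assuming "correlation coefficient" means the ordinary Pearson coefficient and that the indices are simply 0, 1, 2, ... The function name `index_trend` is hypothetical. The least-squares slope is an extra that the post does not mention; it is included only because it gives a "how fast" figure in data units per step.

```python
import math


def index_trend(values):
    """Return (Pearson r, least-squares slope) of values against their indices.

    Assumes at least two values that are not all equal; otherwise the
    division below fails.
    """
    n = len(values)
    mean_x = (n - 1) / 2          # mean of the indices 0 .. n-1
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    var_y = sum((y - mean_y) ** 2 for y in values)
    r = cov / math.sqrt(var_x * var_y)
    slope = cov / var_x           # average change per step along the fitted line
    return r, slope


first_set = [45, 40, 30, 36, 11, 10, 75, 102, 113, 125, 137, 140, 149]
r, slope = index_trend(first_set)
print(round(r, 3), round(slope, 2))
```

A value of r near +1 means steadily increasing, near -1 steadily decreasing, and near 0 no clear linear trend. Unlike inversions, this measure depends on the magnitudes of the numbers, not just their ranking.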
Thank you for your support. The thing that helps most here is counting inversions. That way I can definitely say whether the list is in increasing order or not, and calculate the quality of that increase: we know the maximum possible number of inversions (reached when the list consists of unique items sorted in decreasing order) and we have the number of inversions in our case, so we can derive a percentage.
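One way the percentage idea could look in code, as a sketch rather than a definitive formula. It assumes the maximum number of inversions for n items is n(n-1)/2 (a strictly decreasing list) and reports the share of pairs that are not inverted. The name `increasing_score` is invented for this example.

```python
def increasing_score(values):
    """Fraction of pairs that are NOT inverted: 1.0 = no inversions, 0.0 = all pairs inverted."""
    n = len(values)
    max_inversions = n * (n - 1) // 2   # every pair inverted
    if max_inversions == 0:
        return 1.0                      # 0 or 1 items: trivially sorted
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if values[i] > values[j]
    )
    return 1 - inversions / max_inversions


first_set = [45, 40, 30, 36, 11, 10, 75, 102, 113, 125, 137, 140, 149]
print(f"{increasing_score(first_set):.1%}")  # 82.1%
```

Note that repeated values are not counted as inversions here, so a list with ties can never reach a score of 0.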
Which method is better depends on what you are trying to do. What is the reason you want to find out if the ordered set is increasing or decreasing?
Counting inversions is a "qualitative" measure: it will not tell you how much each inversion sets back the sequence. Your original post sounded like you need something like the average difference ("difference" being the discrete version of the derivative).
Well, the average difference works out to simply being the difference of the first and last numbers, divided by roughly the number of numbers, because all the intermediate terms cancel. So, taken literally anyway, it's not all that useful. My first thought on tweaking this measure was to weight an inversion by how far apart the numbers were. But of course you're right, we can't say for sure what the best thing is until we know the application. (Of course we still might not be able to, but we'll have a better idea.) What I don't understand, though, is the claim that inversions won't tell you how far each one sets the sequence back. I mean, I get that the actual numbers don't matter, just their relative ranking. But the sequence 10 20 30 40 35 45 55 65 has only one inversion (40:35), whereas 10 20 30 40 25 35 45 55 has three inversions (30:25, 40:25, 40:35). So we do get a measure of how far "back" you get taken. By the way, Alteran, if you have a big data set, there's an O(n log n) algorithm for counting the number of inversions. I don't know it offhand, but it's somewhere in Knuth's Art of Computer Programming.
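For reference, a sketch of one well-known O(n log n) approach: count inversions while doing a merge sort. Whether this is the same algorithm the post has in mind from Knuth is an assumption, and the function name `count_inversions_fast` is made up for the example.

```python
def count_inversions_fast(values):
    """Count pairs (i, j) with i < j and values[i] > values[j] via merge sort."""

    def sort_and_count(seq):
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, left_count = sort_and_count(seq[:mid])
        right, right_count = sort_and_count(seq[mid:])
        merged = []
        count = left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left,
                # and each of those came before it in the original order.
                merged.append(right[j])
                j += 1
                count += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort_and_count(list(values))[1]


print(count_inversions_fast([10, 20, 30, 40, 35, 45, 55, 65]))  # 1
print(count_inversions_fast([10, 20, 30, 40, 25, 35, 45, 55]))  # 3
```

The two printed counts match the one-inversion and three-inversion examples worked out by hand above. The input list is copied, so the caller's data is left unsorted.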