Measuring how strongly a set increases or decreases

  1. Simple statistical question.

    Let's say that we have a set of values:
    45
    40
    30
    36
    11
    10
    75
    102
    113
    125
    137
    140
    149

    You can see that it has a clear tendency to increase (even though there is a decrease in the middle). But in this set:
    68
    33
    21
    31
    7
    7
    13
    123
    21
    33
    67
    64
    29
    9
    5
    87
    15
    20
    5
    9

    everything is mixed up. I am looking for a good method to determine whether a set is increasing or decreasing, and how fast.

    Thanks
     
  2. jcsd
  3. Sorry, I have posted this in the wrong forum. Moderators, please feel free to move it to Statistics.
    Thank you.
     
  4. Hurkyl

    Hurkyl 16,090
    Staff Emeritus
    Science Advisor
    Gold Member

    The first thing that comes to mind is counting inversions: that is, pairs of indices (i, j) where i < j, but the i-th element is larger than the j-th element.

    To be honest, I wouldn't really say the first set has much of a tendency to decrease -- it actually looks like two disconnected pieces: a small decreasing portion and a large increasing portion!
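
A minimal sketch of the inversion count described above, written in Python. The function name `count_inversions` is only illustrative; it checks every pair directly, so it takes time proportional to n^2 and is meant for small lists.

```python
from itertools import combinations

def count_inversions(values):
    """Count pairs (i, j) with i < j where values[i] > values[j]."""
    # combinations() yields pairs in their original left-to-right order,
    # so `a` always comes before `b` in the list.
    return sum(1 for a, b in combinations(values, 2) if a > b)

first_set = [45, 40, 30, 36, 11, 10, 75, 102, 113, 125, 137, 140, 149]
print(count_inversions(first_set))
```

A fully increasing list gives 0; the more pairs that are out of order, the larger the count.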
     
  5. About the first set: if we look at it as a whole, it increases from 45 to 149. I want to know whether it is really increasing that way (and derive some sort of coefficient for that increase/decrease). One solution could be to take the first element and the last and compare them, but if the set were:
    15
    20
    30
    36
    11
    10
    75
    102
    123
    145
    157
    199
    80

    we see an overall increase and a fast decrease at the end (only 2 values vs 12), but I would still say that the set is increasing.
     
  6. Hurkyl

    The reason I mentioned inversions is that I've seen them used as a measure of "near sorting" -- that is, to tell how good of a job an approximate sorting algorithm has done.

    It sounds like your problem is close to that: you want to tell how close to sorted (i.e. increasing) your data is. I'm not sure if you want to ascribe any relevance to the actual magnitudes of the numbers, or if their ranking is all that matters.
     
  7. 0rthodontist

    0rthodontist 1,253
    Science Advisor

    Maybe you are looking for the correlation coefficient between the data and the indices of the data. Try mathworld.wolfram.com for correlation.
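
One way to read this suggestion is the Pearson correlation between each value and its position in the list. The sketch below assumes that reading; the function name `index_correlation` is made up for illustration. A result near +1 means a steady increase, near -1 a steady decrease, and near 0 no clear linear trend.

```python
import math

def index_correlation(values):
    """Pearson correlation between positions 0..n-1 and the values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_i = (n - 1) / 2          # mean of the positions 0..n-1
    mean_v = sum(values) / n
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    var_v = sum((v - mean_v) ** 2 for v in values)
    if var_v == 0:
        raise ValueError("all values are equal; correlation is undefined")
    return cov / math.sqrt(var_i * var_v)

first_set = [45, 40, 30, 36, 11, 10, 75, 102, 113, 125, 137, 140, 149]
print(index_correlation(first_set))
```

Unlike a pure ranking measure, this uses the magnitudes of the numbers, so a single large outlier can move the result noticeably.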
     
  8. Thank you for your support. The best thing that helps here is counting inversions. That way I can definitely say whether the list is in increasing order or not, and calculate the quality of that increase: there is a maximum number of inversions that could occur (when the list consists of unique items and is sorted in descending order), and there is the number of inversions in our case, so we can derive a percentage.
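
The percentage described here can be written out as follows. The symbols are assumptions introduced for illustration: n is the number of items, I is the number of inversions, and p is the resulting fraction. With n distinct items the largest possible number of inversions is n(n-1)/2, reached when the list is in descending order.

```latex
p = \frac{I}{\binom{n}{2}} = \frac{2I}{n(n-1)}, \qquad 0 \le p \le 1
```

So p = 0 means fully increasing and p = 1 means fully decreasing. For the first set in this thread, n = 13 and I = 14, which gives p = 14/78, or about 0.18. When there are no ties, the rescaled value 1 - 2p coincides with Kendall's tau rank correlation between the values and their positions.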
     
  9. 0rthodontist

    Which method is better depends on what you are trying to do. What is the reason you want to find out if the ordered set is increasing or decreasing?
     
  10. EnumaElish

    EnumaElish 2,481
    Science Advisor
    Homework Helper

    Counting inversions is a "qualitative" measure; it will not tell you how much each inversion sets back the sequence. Your original post sounded like you need something like the average difference ("difference" being the discrete version of the derivative).
     
  11. Hurkyl

    Well, the average difference works out to simply being the difference of the first and last numbers, divided roughly by the number of numbers. So, it's not all that useful. :) (Taken literally anyways)
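
The reason is that the sum of consecutive differences telescopes. Writing the values as x_1, ..., x_n (notation assumed here for illustration):

```latex
\frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_{i+1} - x_i\right) = \frac{x_n - x_1}{n-1}
```

Every intermediate value cancels, so only the endpoints survive. For the first set in this thread that is (149 - 45)/12, about 8.67 per step, regardless of what happens in between.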

    My first thought on tweaking this measure was to weight an inversion by how far apart the numbers were.
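
A rough sketch of that tweak, assuming "how far apart" refers to the difference in value between the two numbers (it could equally mean their distance in position). The name `weighted_inversions` is hypothetical.

```python
from itertools import combinations

def weighted_inversions(values):
    """Sum of (earlier - later) over all pairs where the earlier value is larger."""
    # Each out-of-order pair contributes the size of its drop,
    # so big setbacks count for more than small ones.
    return sum(a - b for a, b in combinations(values, 2) if a > b)

print(weighted_inversions([10, 20, 30, 40, 35, 45, 55, 65]))
print(weighted_inversions([10, 20, 30, 40, 25, 35, 45, 55]))
```

A fully increasing list still scores 0, but the score now depends on the magnitudes as well as the ranking.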


    But of course you're right, we can't say for sure what the best thing is until we know the application. (Of course we still might not be able to, but we'll have a better idea)


    This I don't understand though (the claim that inversions will not tell you how much each one sets back the sequence). I mean, I get that the actual numbers don't matter, just their relative ranking. But the sequence

    10 20 30 40 35 45 55 65

    has only one inversion (40:35), whereas

    10 20 30 40 25 35 45 55

    has three inversions. (30:25, 40:25, 40:35)

    So we do get a measure of how far "back" you get taken.



    By the way Alteran, if you have a big data set, there's an O(n log n) algorithm for counting the number of inversions. I don't know it off hand, but it's somewhere in Knuth's Art of Computer Programming.
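
For reference, one common O(n log n) approach counts inversions while merge sorting. The sketch below shows that general technique; it is not claimed to be the specific algorithm given in Knuth, and the name `count_inversions_fast` is only illustrative.

```python
def count_inversions_fast(values):
    """Return the number of inversions using a merge sort, in O(n log n) time."""
    def sort_and_count(seq):
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, inv_left = sort_and_count(seq[:mid])
        right, inv_right = sort_and_count(seq[mid:])
        merged = []
        inversions = inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left,
                # and all of those came earlier in the original list.
                merged.append(right[j])
                inversions += len(left) - i
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(list(values))[1]

print(count_inversions_fast([10, 20, 30, 40, 25, 35, 45, 55]))
```

Equal values are not counted as inversions here, because the merge step takes from the left half on ties.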
     