Random selection, with skewed distribution

  1. Hi there,

    (to mod: not sure where to post this, please move if I've got it wrong)

    I have a grid of values with 41x161 nodes describing some parameter space. Each node has an associated value, λ, which represents the uncertainty of the parameter choice at that node.

    I want to make/find an algorithm that samples my grid randomly, because I want to explore the parameter space. However I want the algorithm to sample the 'best' space more often, i.e. the nodes with low λ. Therefore I am looking for an algorithm that is random, but also is more likely to select the best nodes.

    I have two ideas:
    1) Create an array with all nodes, but over represent the best nodes (i.e. put them into the array multiple times). The amount of times a node is put into the array is a function of its 'goodness'. Then randomly sample this array. Good nodes are more likely to be sample simply because they occur most often.

    2) Randomly sample a node and then randomly decide whether or not to use that node based on a 'gate keeper' that has a randomly assigned threshold tolerance. Nodes that are good are more likely to pass whether or not the gate keeper has a high or low tolerance, whereas bad nodes will only pass if the gate keeper is feeling generous.

    Are there any well established, good ways to do this. Does what I'm doing even make sense?
     
  2. jcsd
  3. Re: random algorithm, with

    Hey billiards.

    If you want to sample low nodes more frequently maybe you could use your sampler as the inverse of the PDF of your grid: this will sample the low nodes more frequently and the upper nodes less so (i.e. in terms of lambda).

    You can use a technique in the Markov-Chain Monte-Carlo family of techniques to simulate arbitrary distributions: so what you would do is use your distribution (i.e. use the inverse of PDF and then normalize it so it's valid even if this is a bin distribution and it's being adjusted) and then use something like Metropolis-Hastings to take say a regular uniform distribution pseudo-random generator and turn it into a custom pseudo-random generator for your PDF of lambda values.

    If you have any distribution that corresponds to your sampling distribution, then you can use the MCMC techniques to do that.
     
  4. Re: random algorithm, with

    Sounds good. But could you break that down a bit for me please. I'm not a mathematician, my maths background is not strong :)

    Thanks

    TO MOD: How can I change the title of my thread?
     
    Last edited: Aug 15, 2012
  5. Re: random algorithm, with

    The basic idea is that you have some distribution you want to sample from call it f(x) for the moment. This will reflect your sampling distribution which you will do your random sample from.

    You then use the techniques of MCMC to basically construct a pseudo-random generator for f(x).

    Here is a wiki link to one technique: Accept-Reject

    http://en.wikipedia.org/wiki/Rejection_sampling

    The algorithm is specified below and as long as you have a pseudo-random generator for the uniform distribution, and as long as your pdf meets the requirements, you can construct a procedure to simulate your f(x). The algorithm is only a couple of lines so it's not too bad.

    There are other methods like this one (Metropolis-Hastings):

    http://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm

    They have their own issues but the best way to find out is to do it empirically: you should implement the code, generate ten thousands samples (or as many as desired) and then plot the histogram.

    I'd take a look at the Accept-Reject method first and test it out. If you're using a bin distribution then you could say calculate 1/lambda(x,y) where lambda(x,y) is the discrete distribution and your maximum value for the corresponding M will be where this is maximized.
     
  6. Stephen Tashi

    Stephen Tashi 4,330
    Science Advisor
    2014 Award

    Re: random algorithm, with

    What do you mean by "uncertainty"?

    What does "sample" mean in this context? You haven't made it clear whether the parameter "space" is a grid of possible values for some single parameter or whether each node in the grid represents a different parameter. You haven't made it clear whether "sample" means to evaluate a deterministic function of the current set of parameters or whether it means to make a physical measurement or simulate a realization of a random variable.
     
  7. Re: random algorithm, with

    Would it help if I described the physical problem I am trying to solve? I am a seismologist, I am looking at shear wave splitting -- which is analagous to birefringence in light waves -- the phenemenon whereby a transversely polarised signal splits into two (fast and slow waves) as it passes through an anisotropic medium. Now what I have done is to measure the "splitting parameters" of the medium. These parameters are the angle through which the signal components on your recording device must be rotated such that the waveforms are similar to each component (fast wave on one component, slow wave on the other), and the total delay time between the two waves. The parameter space (my grid) has 181 nodes in one dimension, from -90 to 90 one at every degree, and 41 nodes in the other direction at every 0.1 seconds of delay time. For each node I use the parameters to try to "undo" the splitting on the signal, and then measure how well this works. I end up with a grid of values over the whole parameter space, I then perform an F-test to contour the grid, and normalise so that a value of 1 corresponds to the 95% confidence interval and larger numbers correspond to lesser confidence intervals. This grid is then a measure of the anisotropy (or at least birefringence) of the medium through which the ray has passed, whereby each point in the parameter space has some "uncertainty" value attached. This grid is then used as the input for the next problem, which is the problem that I am trying to work out:

    I need to use this measure of anisotropy as an input to another problem. I want to input the whole grid, but it is too computationally expensive, and so I have use a subset of the grid. I've tried various ways of generating subsets, but there are always drawbacks, so my latest idea is to take a random subset. The drawback of that is that I need to make sure that my "best" parameters are part of that subset. Which is why I need to build in some kind of skew to get my random samples to better represent the parameters with the least uncertainty.

    It is the predetermined "suitability" of a parameter to describe a physical reality. A value of 1 corresponds to the 95% confidence interval, a value less than 1 falls inside 95% confidence, a value of greater than one is less confident.


    Each node in the grid represents a different parameter by its co-ordinates. The value assigned to that node is the "uncertainty".

    Sample in this case means to pick a node. I want to pick nodes at random, but preferentially pick the nodes with the least uncertainty.
     
  8. Stephen Tashi

    Stephen Tashi 4,330
    Science Advisor
    2014 Award

    Re: random algorithm, with

    I gather this means that for each node, you use the 2 parameters that define the coordinates of the node to do something. Do you use them in a calculation? Does this calculation produce a single number? Do you compare this number to the results of an experiment? - or do you compare it to the results of deterministic model? - or a simulation?

    What do you mean by "contour" the grid. Are you fitting a curve to a function that is defined at each node of the grid?

    Based on experience in answering questions in this forum (and not particularly on your own post), many people use the term "confidence interval" without understanding its technical definition. In particular, many people mistake "confidence" as a synonym for "probability". So they think that saying "a 95% confidence interval for the mean is [1.38, 2.69]" means that there is a 95% chance the mean is in that interval. (i.e. they mistake "confidence interval" for a "credible interval").

    In statistics, "tests" are associated with significance levels, not with confidence intervals. Perhaps you are using the F statistic as an estimator of something rather than a test for something.

    If it is complicated to explain exactly what you are doing, perhaps you can provide a link to an article that does the same thing. You are asking a question that apparently involves probability and statistics, but (to me) you have not clearly stated a mathematical problem. You haven't explained why there is any probabiliIty involved. Is it from "measurement error" in a repeated experiment?

    Why does the "the grid" become a measure of something? To say the grid is a "parameter space" suggests that only one node on the grid gives the appropriate parameters for a given situation. Why would results at all the other nodes matter?
     
  9. Re: random algorithm, with

    Yes, I use the coordinates to process the signal. If the parameters successfully remove the splitting then the particle motion will be linear. I then quantify the linearity of the particle motion using the second eigenvalue of the covariance matrix of the two waveforms. If the second eigenvalue is small then the particle motion is linear, and the parameters have worked. I insert the value of the second eigenvalue into the grid.


    I don't know how this bit works in detail.

    This paper describes the method:
    http://advseismo.asu.edu/readings/1991_Journal of Geophysical Research_Silver.pdf

    Read the section on 'Measuring shear wave splitting' to understand the problem, 'Error estimation', and in the appendix 'Calculation of confidence region', will give you more details on the use of the F-test and how this is used to contour the measurements.

    It is important to note that I do not have aproblem with any of the above. The problem comes in the extension to the method that I am currently working on.

    The grid essentially contains the information for how well each parameter did in correcting for the shear wave splitting. Only one node on the grid is, I suppose, the correct and absolute answer. But in actual fact there will be errors and uncertainties in the fidelity of the 'best' parameter, associated with the width of the confidence interval around that node. Previously people have just used the 'best' parameter and carried on, not stopping to think that actually the parameters might be wrong. I want to use more information, to incorporate the uncertainty in the measurement into the next step.
     
  10. Re: random algorithm, with

    I noticed you mentioned this statement:

    This statement is a little confusing: a confidence interval refers to an interval whereby the integral or summation of probability values is equal to the associated probability.

    I'm guessing the problem you have is one of random sampling, but it is a little troubling to see the statement above because this is not how confidence intervals work: they are simply relating probabilities of a simple region to the interval that describes the region.

    The other thing I want to ask is how you want to weigh your probabilities at each node in the grid.

    The simplest transformation is the inverse of the probabilities and then normalizing to get a PDF to be used as a sampling distribution, but I have a feeling that you have extra conditions and subtleties that we are missing in the context of your application.

    With regards to sampling, it's best if you lay out all the conditions for the actual sampling distribution and also for the sampling procedure: your problem is obviously not some quick textbook problem and you can't afford to make major screwups.

    Please post your comments with regard to firstly what specific properties you want the sampling distribution to have (you said inverse of the weights, but should this inverse be transformed in a certain way? Does it depend on the position of the node? Does it depend on the relationship between nodes?) and also how you are going to sample? (Is it a truly independent random sample or do you other things that make it a non-independent process)?
     
  11. Re: random algorithm, with

    I'm thinking this might have got a little bogged down.

    Let me rephrase the problem.

    Imagine a string of numbers:

    50,2,5,1,7,35,124,34,13,67,234,......

    Now I want to randomly pick a subset of these numbers. Easy enough.

    The twist, I want my subset to be contain more of the lower numbers, and less of the higher numbers.
     
  12. Re: random algorithm, with

    I have an algorithm to do this now. What it does is it picks five (or six, or seven -- I haven't decided yet) random numbers, and then keeps the lowest one. Then repeats until I have my subset.

    What do you think?
     
  13. Re: random algorithm, with

    Can you post the algorithm so we can give some feedback?

    You can simply create the PDF to shift the probabilities where you want them. Order statistics and distributions deal with minimum and maximums if you are interested.
     
  14. Re: random algorithm, with

    OK right. So the sum of all the probabilities inside the 95% confidence interval add up to 95%?

    The values assigned to each node can be thought of as how well the parameter does the job, where low numbers actually mean that the parameter (node) does a better job than high numbers. The F-distibution is calculated from the number of degrees of freedom in the signal. And those values are normalised such that nodes lying on the 95% confidence interval have a value of 1.

    There are two end-member samplings:
    1) sample one and only one node, and make that your "best" node.
    2) sample absolutely every node and wait about a year for your calculations to run (no joke).

    What I want is something in between. To sample a subset of the nodes, but to make that subset focussed around the best nodes. I've tried just using nodes inside the 95% confidence region, but the drawback is that sometimes the 95% confidence region is huge, and sometimes it's tiny. So I could maybe take the 30 best nodes, but it is likely that those nodes would all cluster together, and in the case where I have a large 95% confidence region I want to explore the parameter space a bit more. That is why I like the random approach, because it seems to get around some of the drawbacks with with the other approaches I've tried, but has the major drawback that it won't focus on my "best" parameters -- that is, unless I can build in some function that skews the random sampling towards my "best" nodes.

    I want less samples of "bad" nodes and more samples of "good" nodes. Whether the distribution is linear, log-normal, or some other fancy thing, doesn't really bother me too much at this stage. As long as it has the property of more "good" and less "bad".

    I want the algorithm not only to find "good" nodes that are all clustered together, if there are good nodes elsewhere I want the algorithm to be able to find them.
     
  15. Re: random algorithm, with

    Sure. The code seems to be working and returning sensible results in tests that I have done today. I could post plots if you're interested.

    Code (Text):

    subroutine random_reject_sample(lam2slice,no_samps,nodes)

    implicit none

    ! Subroutine to randomly sample points in a lam2 slice but preferentially
    ! to keep those samples with the lowest lam2 values.

    ! IN
    ! lam2slice       surface of nodes to randomly sample from
    ! np1             number of points in dt dimension
    ! np2             number of points in phi dimension
    ! no_samps        number of nodes to randomly sample

    ! OUT
    ! nodes           randomly sampled nodes

    !remove these for testing
    ! use array_sizes ! use the array_sizes modules
    ! Add these for testing
    integer :: np1,np2

    integer,parameter :: seed = 987654321
    !real :: lam2slice(np1,np2),lambda2min
    real :: lam2slice(181,41),lambda2min ! line changed for testing
    real :: picknode(3),value
    integer :: no_samps,nodecount
    integer :: nodes(1500,2) ! 1500 = more than enough rows (really only need #no_samps, one for each pick) and 2 columns (x and y coords)
    integer :: randomsubsetsize = 5 ! number of nodes to select in a subset, the 'best' node in each subset is kept.
    integer :: dupe_check ! used to check if a node has already been picked, to avoid sampling the same node more than once.
    integer :: ii,jj,kk ! used for looping
    integer :: ifast,itlag,x1,y1


    np1=181
    np2=41


    dupe_check=0
     
    call srand(seed)
       
      ! Take the 'best node' to ensure it's sampled
      call zerror_min(lam2slice,np1,np2,ifast,itlag,lambda2min)
      nodecount=1
      nodes(nodecount,1)=ifast
      nodes(nodecount,2)=itlag
     
        ! populate the nodes list
        do ii=2,no_samps
         
          nodecount = ii
    !     print *,'picking node:',nodecount
         
          do ! loop until we have randomly selected a node
         
            if (dupe_check .eq. 1) exit
                   
              ! sample #randomsubsetsize nodes, and keep the node with the lowest value (unless we already have it).
             
              picknode(:)=99999.99999 ! Reset picknode, important, must be a big number
             
              do jj=1,randomsubsetsize
                x1=nint(real(np1-1) * rand() ) + 1 ! generates a random integer between 1 and np1
                y1=nint(real(np2-1) * rand() ) + 1 ! generates a random integer between 1 and np2
                value=lam2slice(x1,y1)
    !           print *,'randomsubset pick',jj,':',x1,y1,value
                ! if the value is the smallest yet encountered, then keep it
                if ( value .lt. picknode(3) ) then
                  picknode(1)=real(x1)
                  picknode(2)=real(y1)
                  picknode(3)=value
                endif
               
              enddo ! end picking (best of subset) random node
                               
              ! scan through the nodes list to check for duplication
              do kk = 1,nodecount-1
                dupe_check = 1
                if ( (picknode(1) .eq. nodes(ii,1)) .and. (picknode(2) .eq. nodes(ii,2)) ) then
                  dupe_check = 0
    !             print *, 'duplicate node picked, trying again'
                  exit
                endif
              enddo ! end checking for duplication
             
              ! testing  -- enable to get stats on the values of the picks
    !         print *, 'node picked', picknode
             
              ! if we reach here the node should be ok so we can add it to our node list
              nodes(nodecount,1)=nint(picknode(1))
              nodes(nodecount,2)=nint(picknode(2))
           
           enddo ! end randomly selecting a node
           dupe_check=0
                                       
        enddo ! end populating the nodes list

    return
    end
     
     
  16. micromass

    micromass 18,678
    Staff Emeritus
    Science Advisor
    Education Advisor

    Re: random algorithm, with

    Please report the original post and say what the new title should be. A mentor will then change it for you.
     
  17. Stephen Tashi

    Stephen Tashi 4,330
    Science Advisor
    2014 Award

    Re: random algorithm, with

    Returning to that important question, what you are doing is trying to produce a solution without stating any precise goal or problem. So what you are doing doesn't make sense yet.

    As I see it, your data is a set of N vectors. Each vector is of the form [itex] (\phi_i, \delta^2_i, \lambda_i) [/itex]. The values [itex] \phi, \delta^2 [/itex] are estimates of some physical parameters. The value [itex] \lambda_i [/itex] can be used (somehow!) to define an ellipse around the point [itex] (\phi_i, \delta^2_i) [/itex] that is a 95% "confidence interval". (The common misinterpretation of that interval would be that there a 0.95 probability of the true values of the parameters being in that ellipse. If you take a Bayesian point of view, you can make this interpretation valid. Otherwise, watch your step.)

    It isn't clear to me whether the collection of data that you have is intended to estimate one correct pair of values [itex] (\phi,\delta^2) [/itex] that applies over the whole earth or whether the analysis you are doing assumes that each estimate is estimating a different true pair of values at different locations on the earth.

    For some unstated reason, you want to take samples from you data and do something with them. I think want to take more samples of the data points with the smaller [itex] \lambda_i [/itex]. You might also want to sample points from inside the ellipse around each [itex] (\phi_i,\delta^2_i ) [/itex] in which case you woudl be sampling points not actually in the original set of estimates.

    How you should do this is not clear until you define precisely what you are trying to accomplish. If you are randomly sampling the data, it seems that you must be doing a probabilistic simulation of something. What are you simulating? Are you estimating the statistical variability of some function of the data? Perhaps you are merely trying to estimate one true value [itex] (\phi, \delta^2) [/itex] by combining all the estimates.
     
  18. Re: random algorithm, with

    OK, so the problem is to measure the anisotropy of a layer in the presence of surrounding anisotropic layers. To fully understand it one needs to (a) fully understand the one layer problem (see the SIlver and Chan paper I linked to before), and (b) understand ray geometry in seismology and how this can be manipulated to measure multiple layers of anisotropy.

    Now I don't expect anybody to follow my particular physical application. In fact I don't have a problem with that bit -- I've already solved it. Now I'm looking for a slightly better way to handle one of the intermediate steps. The problem I have is with the mathematics. What I was hoping for by coming here, was help with the maths. The maths problem I have stated is basically this:

    Given a set of numbers e.g. 41,41,32,33,24,23,22,21,12,14,24 etc.... etc...

    Come up with an algorithm that picks out a random subset of those numbers, where the subset is biased towards the low numbers.

    Now I have an algorithm, but maybe someone has a better one? Surely someone has solved this kind of problem in the past!?

    It is a measure of anisotropy along the ray path. The parameters only apply for the ray in question.

    The data are the seismic waveforms. The parameters ∅ and δt define the co-ordinate of a parameter space. A grid search is performed at nodes in parameter space -- how well the parameters "fix" the data is recorded at the node λ. The grid of all these number defines an "error surface" over the parameter space. The parameters describe the anisotropy of a single layer. Now, I want to use this error surface as the input to measure the anisotropy on rays that go through two anisotropic layer, and to find the anisotropy of the second layer. I can do this because of the peculiarities of the ray geometry. I use the parameters in the high confidence region, to correct the signal for the anisotropy of the one layer, so that I can resolve the anisotropy in the other layer. What I want to do is to try a suite of corrections for the first layer, and to see how this affects my measurement of the second layer. I could try every point in parameter space, but the calculations would take forever to run. I want to pick a subset of points in parameter space at random, but preferentially pick those points that are known to be more likely, i.e. those in the high confidence regions.

    This statement makes me suspect that you are not visualising the problem correctly. Any confidence region (not necessarily an ellipse) is defined entirely within the [itex] (\phi,\delta t) [/itex] space.

    The anisotropy of a layer.

    I am trying to propagate the uncertainty in the measurement of a single layer of anisotropy, into the measurement of a second layer, wherein making the estimate of the second layer I have used the measurement of the first layer as an input to 'correct' the signal.

    Kind of. By combining lots of measurements, in a weighted way, I account for the variation that occurs if the correction to the signal is altered in a way that is reasonably constrained by the data.
     
  19. Stephen Tashi

    Stephen Tashi 4,330
    Science Advisor
    2014 Award

    Re: random algorithm, with

    You haven't stated any criteria that can be used to judge whether one algorithm is better than another. Of course people have solved problems that this when they had well defined objectives. State what criteria makes one algorithm better than another and perhaps we can find the best method.



    That could be true, but you haven't stated any well posed mathematical problem.
     
  20. I've taken a look at the code and there is absolutely no reason why you can't just define an appropriate multi-dimensional PDF that corresponds to your desired properties and then just use an MCMC type algorithm to draw independent random samples from that distribution.
     
  21. Well there is one. I wouldn't know how to begin setting the thing up :(
     
Know someone interested in this topic? Share a link to this question via email, Google+, Twitter, or Facebook

Have something to add?