# Random selection, with skewed distribution

by billiards
Tags: distribution, random, selection, skewed
 P: 747 Hi there, (to mod: not sure where to post this, please move if I've got it wrong) I have a grid of values with 41x161 nodes describing some parameter space. Each node has an associated value, λ, which represents the uncertainty of the parameter choice at that node. I want to make/find an algorithm that samples my grid randomly, because I want to explore the parameter space. However I want the algorithm to sample the 'best' space more often, i.e. the nodes with low λ. Therefore I am looking for an algorithm that is random, but also is more likely to select the best nodes. I have two ideas: 1) Create an array with all nodes, but over represent the best nodes (i.e. put them into the array multiple times). The amount of times a node is put into the array is a function of its 'goodness'. Then randomly sample this array. Good nodes are more likely to be sample simply because they occur most often. 2) Randomly sample a node and then randomly decide whether or not to use that node based on a 'gate keeper' that has a randomly assigned threshold tolerance. Nodes that are good are more likely to pass whether or not the gate keeper has a high or low tolerance, whereas bad nodes will only pass if the gate keeper is feeling generous. Are there any well established, good ways to do this. Does what I'm doing even make sense?
 P: 4,572 Hey billiards. If you want to sample low nodes more frequently maybe you could use your sampler as the inverse of the PDF of your grid: this will sample the low nodes more frequently and the upper nodes less so (i.e. in terms of lambda). You can use a technique in the Markov-Chain Monte-Carlo family of techniques to simulate arbitrary distributions: so what you would do is use your distribution (i.e. use the inverse of PDF and then normalize it so it's valid even if this is a bin distribution and it's being adjusted) and then use something like Metropolis-Hastings to take say a regular uniform distribution pseudo-random generator and turn it into a custom pseudo-random generator for your PDF of lambda values. If you have any distribution that corresponds to your sampling distribution, then you can use the MCMC techniques to do that.
P: 747
 Quote by chiro Hey billiards. If you want to sample low nodes more frequently maybe you could use your sampler as the inverse of the PDF of your grid: this will sample the low nodes more frequently and the upper nodes less so (i.e. in terms of lambda). You can use a technique in the Markov-Chain Monte-Carlo family of techniques to simulate arbitrary distributions: so what you would do is use your distribution (i.e. use the inverse of PDF and then normalize it so it's valid even if this is a bin distribution and it's being adjusted) and then use something like Metropolis-Hastings to take say a regular uniform distribution pseudo-random generator and turn it into a custom pseudo-random generator for your PDF of lambda values. If you have any distribution that corresponds to your sampling distribution, then you can use the MCMC techniques to do that.
Sounds good. But could you break that down a bit for me please. I'm not a mathematician, my maths background is not strong :)

Thanks

TO MOD: How can I change the title of my thread?

P: 4,572
Random selection, with skewed distribution

 Quote by billiards Sounds good. But could you break that down a bit for me please. I'm not a mathematician, my maths background is not strong :) Thanks TO MOD: How can I change the title of my thread?
The basic idea is that you have some distribution you want to sample from call it f(x) for the moment. This will reflect your sampling distribution which you will do your random sample from.

You then use the techniques of MCMC to basically construct a pseudo-random generator for f(x).

Here is a wiki link to one technique: Accept-Reject

http://en.wikipedia.org/wiki/Rejection_sampling

The algorithm is specified below and as long as you have a pseudo-random generator for the uniform distribution, and as long as your pdf meets the requirements, you can construct a procedure to simulate your f(x). The algorithm is only a couple of lines so it's not too bad.

There are other methods like this one (Metropolis-Hastings):

http://en.wikipedia.org/wiki/Metropo...ings_algorithm

They have their own issues but the best way to find out is to do it empirically: you should implement the code, generate ten thousands samples (or as many as desired) and then plot the histogram.

I'd take a look at the Accept-Reject method first and test it out. If you're using a bin distribution then you could say calculate 1/lambda(x,y) where lambda(x,y) is the discrete distribution and your maximum value for the corresponding M will be where this is maximized.
P: 3,254
 Quote by billiards I have a grid of values with 41x161 nodes describing some parameter space. Each node has an associated value, λ, which represents the uncertainty of the parameter choice at that node.
What do you mean by "uncertainty"?

 I want to make/find an algorithm that samples my grid randomly, because I want to explore the parameter space. However I want the algorithm to sample the 'best' space more often, i.e. the nodes with low λ.
What does "sample" mean in this context? You haven't made it clear whether the parameter "space" is a grid of possible values for some single parameter or whether each node in the grid represents a different parameter. You haven't made it clear whether "sample" means to evaluate a deterministic function of the current set of parameters or whether it means to make a physical measurement or simulate a realization of a random variable.
P: 747
Would it help if I described the physical problem I am trying to solve? I am a seismologist, I am looking at shear wave splitting -- which is analagous to birefringence in light waves -- the phenemenon whereby a transversely polarised signal splits into two (fast and slow waves) as it passes through an anisotropic medium. Now what I have done is to measure the "splitting parameters" of the medium. These parameters are the angle through which the signal components on your recording device must be rotated such that the waveforms are similar to each component (fast wave on one component, slow wave on the other), and the total delay time between the two waves. The parameter space (my grid) has 181 nodes in one dimension, from -90 to 90 one at every degree, and 41 nodes in the other direction at every 0.1 seconds of delay time. For each node I use the parameters to try to "undo" the splitting on the signal, and then measure how well this works. I end up with a grid of values over the whole parameter space, I then perform an F-test to contour the grid, and normalise so that a value of 1 corresponds to the 95% confidence interval and larger numbers correspond to lesser confidence intervals. This grid is then a measure of the anisotropy (or at least birefringence) of the medium through which the ray has passed, whereby each point in the parameter space has some "uncertainty" value attached. This grid is then used as the input for the next problem, which is the problem that I am trying to work out:

I need to use this measure of anisotropy as an input to another problem. I want to input the whole grid, but it is too computationally expensive, and so I have use a subset of the grid. I've tried various ways of generating subsets, but there are always drawbacks, so my latest idea is to take a random subset. The drawback of that is that I need to make sure that my "best" parameters are part of that subset. Which is why I need to build in some kind of skew to get my random samples to better represent the parameters with the least uncertainty.

 Quote by Stephen Tashi What do you mean by "uncertainty"?
It is the predetermined "suitability" of a parameter to describe a physical reality. A value of 1 corresponds to the 95% confidence interval, a value less than 1 falls inside 95% confidence, a value of greater than one is less confident.

 Quote by Stephen Tashi What does "sample" mean in this context? You haven't made it clear whether the parameter "space" is a grid of possible values for some single parameter or whether each node in the grid represents a different parameter. You haven't made it clear whether "sample" means to evaluate a deterministic function of the current set of parameters or whether it means to make a physical measurement or simulate a realization of a random variable.
Each node in the grid represents a different parameter by its co-ordinates. The value assigned to that node is the "uncertainty".

Sample in this case means to pick a node. I want to pick nodes at random, but preferentially pick the nodes with the least uncertainty.
P: 3,254
 Quote by billiards For each node I use the parameters to try to "undo" the splitting on the signal, and then measure how well this works.
I gather this means that for each node, you use the 2 parameters that define the coordinates of the node to do something. Do you use them in a calculation? Does this calculation produce a single number? Do you compare this number to the results of an experiment? - or do you compare it to the results of deterministic model? - or a simulation?

 I end up with a grid of values over the whole parameter space, I then perform an F-test to contour the grid, and normalise so that a value of 1 corresponds to the 95% confidence interval and larger numbers correspond to lesser confidence intervals.
What do you mean by "contour" the grid. Are you fitting a curve to a function that is defined at each node of the grid?

Based on experience in answering questions in this forum (and not particularly on your own post), many people use the term "confidence interval" without understanding its technical definition. In particular, many people mistake "confidence" as a synonym for "probability". So they think that saying "a 95% confidence interval for the mean is [1.38, 2.69]" means that there is a 95% chance the mean is in that interval. (i.e. they mistake "confidence interval" for a "credible interval").

In statistics, "tests" are associated with significance levels, not with confidence intervals. Perhaps you are using the F statistic as an estimator of something rather than a test for something.

 This grid is then a measure of the anisotropy (or at least birefringence) of the medium through which the ray has passed, whereby each point in the parameter space has some "uncertainty" value attached.
If it is complicated to explain exactly what you are doing, perhaps you can provide a link to an article that does the same thing. You are asking a question that apparently involves probability and statistics, but (to me) you have not clearly stated a mathematical problem. You haven't explained why there is any probabiliIty involved. Is it from "measurement error" in a repeated experiment?

Why does the "the grid" become a measure of something? To say the grid is a "parameter space" suggests that only one node on the grid gives the appropriate parameters for a given situation. Why would results at all the other nodes matter?
P: 747
 Quote by Stephen Tashi I gather this means that for each node, you use the 2 parameters that define the coordinates of the node to do something. Do you use them in a calculation? Does this calculation produce a single number? Do you compare this number to the results of an experiment? - or do you compare it to the results of deterministic model? - or a simulation?
Yes, I use the coordinates to process the signal. If the parameters successfully remove the splitting then the particle motion will be linear. I then quantify the linearity of the particle motion using the second eigenvalue of the covariance matrix of the two waveforms. If the second eigenvalue is small then the particle motion is linear, and the parameters have worked. I insert the value of the second eigenvalue into the grid.

 Quote by Stephen Tashi What do you mean by "contour" the grid. Are you fitting a curve to a function that is defined at each node of the grid?
I don't know how this bit works in detail.

This paper describes the method:

Read the section on 'Measuring shear wave splitting' to understand the problem, 'Error estimation', and in the appendix 'Calculation of confidence region', will give you more details on the use of the F-test and how this is used to contour the measurements.

It is important to note that I do not have aproblem with any of the above. The problem comes in the extension to the method that I am currently working on.

 Quote by Stephen Tashi Why does the "the grid" become a measure of something? To say the grid is a "parameter space" suggests that only one node on the grid gives the appropriate parameters for a given situation. Why would results at all the other nodes matter?
The grid essentially contains the information for how well each parameter did in correcting for the shear wave splitting. Only one node on the grid is, I suppose, the correct and absolute answer. But in actual fact there will be errors and uncertainties in the fidelity of the 'best' parameter, associated with the width of the confidence interval around that node. Previously people have just used the 'best' parameter and carried on, not stopping to think that actually the parameters might be wrong. I want to use more information, to incorporate the uncertainty in the measurement into the next step.
P: 4,572
I noticed you mentioned this statement:

 It is the predetermined "suitability" of a parameter to describe a physical reality. A value of 1 corresponds to the 95% confidence interval, a value less than 1 falls inside 95% confidence, a value of greater than one is less confident.
This statement is a little confusing: a confidence interval refers to an interval whereby the integral or summation of probability values is equal to the associated probability.

I'm guessing the problem you have is one of random sampling, but it is a little troubling to see the statement above because this is not how confidence intervals work: they are simply relating probabilities of a simple region to the interval that describes the region.

The other thing I want to ask is how you want to weigh your probabilities at each node in the grid.

The simplest transformation is the inverse of the probabilities and then normalizing to get a PDF to be used as a sampling distribution, but I have a feeling that you have extra conditions and subtleties that we are missing in the context of your application.

With regards to sampling, it's best if you lay out all the conditions for the actual sampling distribution and also for the sampling procedure: your problem is obviously not some quick textbook problem and you can't afford to make major screwups.

Please post your comments with regard to firstly what specific properties you want the sampling distribution to have (you said inverse of the weights, but should this inverse be transformed in a certain way? Does it depend on the position of the node? Does it depend on the relationship between nodes?) and also how you are going to sample? (Is it a truly independent random sample or do you other things that make it a non-independent process)?
 P: 747 I'm thinking this might have got a little bogged down. Let me rephrase the problem. Imagine a string of numbers: 50,2,5,1,7,35,124,34,13,67,234,...... Now I want to randomly pick a subset of these numbers. Easy enough. The twist, I want my subset to be contain more of the lower numbers, and less of the higher numbers.
 P: 747 I have an algorithm to do this now. What it does is it picks five (or six, or seven -- I haven't decided yet) random numbers, and then keeps the lowest one. Then repeats until I have my subset. What do you think?
P: 4,572
 Quote by billiards I have an algorithm to do this now. What it does is it picks five (or six, or seven -- I haven't decided yet) random numbers, and then keeps the lowest one. Then repeats until I have my subset. What do you think?
Can you post the algorithm so we can give some feedback?

You can simply create the PDF to shift the probabilities where you want them. Order statistics and distributions deal with minimum and maximums if you are interested.
P: 747
 Quote by chiro a confidence interval refers to an interval whereby the integral or summation of probability values is equal to the associated probability.
OK right. So the sum of all the probabilities inside the 95% confidence interval add up to 95%?

 The other thing I want to ask is how you want to weigh your probabilities at each node in the grid.
The values assigned to each node can be thought of as how well the parameter does the job, where low numbers actually mean that the parameter (node) does a better job than high numbers. The F-distibution is calculated from the number of degrees of freedom in the signal. And those values are normalised such that nodes lying on the 95% confidence interval have a value of 1.

 With regards to sampling, it's best if you lay out all the conditions for the actual sampling distribution and also for the sampling procedure: your problem is obviously not some quick textbook problem and you can't afford to make major screwups.
There are two end-member samplings:
1) sample one and only one node, and make that your "best" node.
2) sample absolutely every node and wait about a year for your calculations to run (no joke).

What I want is something in between. To sample a subset of the nodes, but to make that subset focussed around the best nodes. I've tried just using nodes inside the 95% confidence region, but the drawback is that sometimes the 95% confidence region is huge, and sometimes it's tiny. So I could maybe take the 30 best nodes, but it is likely that those nodes would all cluster together, and in the case where I have a large 95% confidence region I want to explore the parameter space a bit more. That is why I like the random approach, because it seems to get around some of the drawbacks with with the other approaches I've tried, but has the major drawback that it won't focus on my "best" parameters -- that is, unless I can build in some function that skews the random sampling towards my "best" nodes.

 Please post your comments with regard to firstly what specific properties you want the sampling distribution to have (you said inverse of the weights, but should this inverse be transformed in a certain way? Does it depend on the position of the node? Does it depend on the relationship between nodes?) and also how you are going to sample? (Is it a truly independent random sample or do you other things that make it a non-independent process)?
I want less samples of "bad" nodes and more samples of "good" nodes. Whether the distribution is linear, log-normal, or some other fancy thing, doesn't really bother me too much at this stage. As long as it has the property of more "good" and less "bad".

I want the algorithm not only to find "good" nodes that are all clustered together, if there are good nodes elsewhere I want the algorithm to be able to find them.
P: 747
 Quote by chiro Can you post the algorithm so we can give some feedback? You can simply create the PDF to shift the probabilities where you want them. Order statistics and distributions deal with minimum and maximums if you are interested.
Sure. The code seems to be working and returning sensible results in tests that I have done today. I could post plots if you're interested.

subroutine random_reject_sample(lam2slice,no_samps,nodes)

implicit none

! Subroutine to randomly sample points in a lam2 slice but preferentially
! to keep those samples with the lowest lam2 values.

! IN
! lam2slice       surface of nodes to randomly sample from
! np1             number of points in dt dimension
! np2             number of points in phi dimension
! no_samps        number of nodes to randomly sample

! OUT
! nodes           randomly sampled nodes

!remove these for testing
! use array_sizes ! use the array_sizes modules
integer :: np1,np2

integer,parameter :: seed = 987654321
!real :: lam2slice(np1,np2),lambda2min
real :: lam2slice(181,41),lambda2min ! line changed for testing
real :: picknode(3),value
integer :: no_samps,nodecount
integer :: nodes(1500,2) ! 1500 = more than enough rows (really only need #no_samps, one for each pick) and 2 columns (x and y coords)
integer :: randomsubsetsize = 5 ! number of nodes to select in a subset, the 'best' node in each subset is kept.
integer :: dupe_check ! used to check if a node has already been picked, to avoid sampling the same node more than once.
integer :: ii,jj,kk ! used for looping
integer :: ifast,itlag,x1,y1

np1=181
np2=41

dupe_check=0

call srand(seed)

! Take the 'best node' to ensure it's sampled
call zerror_min(lam2slice,np1,np2,ifast,itlag,lambda2min)
nodecount=1
nodes(nodecount,1)=ifast
nodes(nodecount,2)=itlag

! populate the nodes list
do ii=2,no_samps

nodecount = ii
!	  print *,'picking node:',nodecount

do ! loop until we have randomly selected a node

if (dupe_check .eq. 1) exit

! sample #randomsubsetsize nodes, and keep the node with the lowest value (unless we already have it).

picknode(:)=99999.99999 ! Reset picknode, important, must be a big number

do jj=1,randomsubsetsize
x1=nint(real(np1-1) * rand() ) + 1 ! generates a random integer between 1 and np1
y1=nint(real(np2-1) * rand() ) + 1 ! generates a random integer between 1 and np2
value=lam2slice(x1,y1)
!			print *,'randomsubset pick',jj,':',x1,y1,value
! if the value is the smallest yet encountered, then keep it
if ( value .lt. picknode(3) ) then
picknode(1)=real(x1)
picknode(2)=real(y1)
picknode(3)=value
endif

enddo ! end picking (best of subset) random node

! scan through the nodes list to check for duplication
do kk = 1,nodecount-1
dupe_check = 1
if ( (picknode(1) .eq. nodes(ii,1)) .and. (picknode(2) .eq. nodes(ii,2)) ) then
dupe_check = 0
!			  print *, 'duplicate node picked, trying again'
exit
endif
enddo ! end checking for duplication

! testing  -- enable to get stats on the values of the picks
!		  print *, 'node picked', picknode

! if we reach here the node should be ok so we can add it to our node list
nodes(nodecount,1)=nint(picknode(1))
nodes(nodecount,2)=nint(picknode(2))

enddo ! end randomly selecting a node
dupe_check=0

enddo ! end populating the nodes list

return
end
Mentor
P: 18,061
 Quote by billiards TO MOD: How can I change the title of my thread?
Please report the original post and say what the new title should be. A mentor will then change it for you.
P: 3,254
 Quote by billiards Does what I'm doing even make sense?
Returning to that important question, what you are doing is trying to produce a solution without stating any precise goal or problem. So what you are doing doesn't make sense yet.

As I see it, your data is a set of N vectors. Each vector is of the form $(\phi_i, \delta^2_i, \lambda_i)$. The values $\phi, \delta^2$ are estimates of some physical parameters. The value $\lambda_i$ can be used (somehow!) to define an ellipse around the point $(\phi_i, \delta^2_i)$ that is a 95% "confidence interval". (The common misinterpretation of that interval would be that there a 0.95 probability of the true values of the parameters being in that ellipse. If you take a Bayesian point of view, you can make this interpretation valid. Otherwise, watch your step.)

It isn't clear to me whether the collection of data that you have is intended to estimate one correct pair of values $(\phi,\delta^2)$ that applies over the whole earth or whether the analysis you are doing assumes that each estimate is estimating a different true pair of values at different locations on the earth.

For some unstated reason, you want to take samples from you data and do something with them. I think want to take more samples of the data points with the smaller $\lambda_i$. You might also want to sample points from inside the ellipse around each $(\phi_i,\delta^2_i )$ in which case you woudl be sampling points not actually in the original set of estimates.

How you should do this is not clear until you define precisely what you are trying to accomplish. If you are randomly sampling the data, it seems that you must be doing a probabilistic simulation of something. What are you simulating? Are you estimating the statistical variability of some function of the data? Perhaps you are merely trying to estimate one true value $(\phi, \delta^2)$ by combining all the estimates.
P: 747
 Quote by Stephen Tashi Returning to that important question, what you are doing is trying to produce a solution without stating any precise goal or problem. So what you are doing doesn't make sense yet.
OK, so the problem is to measure the anisotropy of a layer in the presence of surrounding anisotropic layers. To fully understand it one needs to (a) fully understand the one layer problem (see the SIlver and Chan paper I linked to before), and (b) understand ray geometry in seismology and how this can be manipulated to measure multiple layers of anisotropy.

Now I don't expect anybody to follow my particular physical application. In fact I don't have a problem with that bit -- I've already solved it. Now I'm looking for a slightly better way to handle one of the intermediate steps. The problem I have is with the mathematics. What I was hoping for by coming here, was help with the maths. The maths problem I have stated is basically this:

Given a set of numbers e.g. 41,41,32,33,24,23,22,21,12,14,24 etc.... etc...

Come up with an algorithm that picks out a random subset of those numbers, where the subset is biased towards the low numbers.

Now I have an algorithm, but maybe someone has a better one? Surely someone has solved this kind of problem in the past!?

 It isn't clear to me whether the collection of data that you have is intended to estimate one correct pair of values $(\phi,\delta^2)$ that applies over the whole earth or whether the analysis you are doing assumes that each estimate is estimating a different true pair of values at different locations on the earth.
It is a measure of anisotropy along the ray path. The parameters only apply for the ray in question.

 For some unstated reason, you want to take samples from you data and do something with them. I think want to take more samples of the data points with the smaller $\lambda_i$.
The data are the seismic waveforms. The parameters ∅ and δt define the co-ordinate of a parameter space. A grid search is performed at nodes in parameter space -- how well the parameters "fix" the data is recorded at the node λ. The grid of all these number defines an "error surface" over the parameter space. The parameters describe the anisotropy of a single layer. Now, I want to use this error surface as the input to measure the anisotropy on rays that go through two anisotropic layer, and to find the anisotropy of the second layer. I can do this because of the peculiarities of the ray geometry. I use the parameters in the high confidence region, to correct the signal for the anisotropy of the one layer, so that I can resolve the anisotropy in the other layer. What I want to do is to try a suite of corrections for the first layer, and to see how this affects my measurement of the second layer. I could try every point in parameter space, but the calculations would take forever to run. I want to pick a subset of points in parameter space at random, but preferentially pick those points that are known to be more likely, i.e. those in the high confidence regions.

 You might also want to sample points from inside the ellipse around each $(\phi_i,\delta^2_i )$ in which case you woudl be sampling points not actually in the original set of estimates.
This statement makes me suspect that you are not visualising the problem correctly. Any confidence region (not necessarily an ellipse) is defined entirely within the $(\phi,\delta t)$ space.

 How you should do this is not clear until you define precisely what you are trying to accomplish. If you are randomly sampling the data, it seems that you must be doing a probabilistic simulation of something. What are you simulating?
The anisotropy of a layer.

 Are you estimating the statistical variability of some function of the data?
I am trying to propagate the uncertainty in the measurement of a single layer of anisotropy, into the measurement of a second layer, wherein making the estimate of the second layer I have used the measurement of the first layer as an input to 'correct' the signal.

 Perhaps you are merely trying to estimate one true value $(\phi, \delta^2)$ by combining all the estimates.
Kind of. By combining lots of measurements, in a weighted way, I account for the variation that occurs if the correction to the signal is altered in a way that is reasonably constrained by the data.