Approximation of a process by independent draws

Stephen Tashi · Jul 12, 2012

Another thread in this section (https://www.physicsforums.com/showthread.php?t=619945) raises some interesting questions. I can't figure what the original poster there is asking, so I think it best to put what I'm asking in a new thread.

A simple "field test" of a system that detects a target would be to position the target (such as an aircraft) at a great distance from the detector (such as a rader) and then move the target closer in a straight line. Repeating such a test and recording the distance at which detection first takes place gives a distribution of ranges. A typical analysis of the results is to fit a curve F(x) to the cumulative histogram of the ranges. I've often seen the curve F(x) mis-named as "The Probability Of First Detection Vs Range". It actually is "The Cumulative Distribution Of Range At Which First Detection Occurs".

Suppose we wish to write a stochastic simulation of the above test that depicts events as the test progresses, We divide the total time of the test up into equal intervals and these define the "steps" of our simulation. We wish to simulate the first detection by making a independent random draw from a benoulli random variable at each step until a detection occurs. (i.e. this variable represents "first detection occurs" or "first detection does not occur".) In general, the bernoulli random variables will have different distributions at different steps. We want the distribution of first detection ranges that we get from the simulation to match the curve F(x). How do we solve for the distributions of the appropriate bernoulli random variables?

In the codes of many simulations and in many technical reports, I have seen the following done:
At step N, find the distance x between the detector and the target and let the probability of a first detection at that step be F(x). ( I have also seen people use F'(x)). This is a nonsensical model. If you divide a given interval up into even finer intervals and apply it, you increase the probability of detection within that interval since you do more independent random draws.

So what ought to be done?

chiro · Jul 12, 2012

I don't know if I got this right, but if you have a collection of Bernoulli random variables each with different probabilities depending on what happens in past trials, then I think you can model this distribution of the collection of bernoulli variables in terms of a probability generating function.

The generating function will have to take into account how the probabilities of later trials are affected by prior ones and it will probably be complicated.

The only thing is that a PGF assumes independence between each distribution, but it seems that each distribution will depend only on the value of the realization of the last trial.

What you could do is use a variant of the PGF to basically 'entangle' realizations of past results to affect the probability density functions of the next random variable and then take this and decompose it into a PGF.

For example let's say X(1) = [0,1] with [1/2,1/2] probability and X(2) = [0,1] with [1,0] probability if X(1) = 1 (detected) and otherwise X(2) = [0,1] with [1/4,3/4] probability.

Then we can use X(2) = (1 - X(1))Y where Y is distribution [0,1] with [1/4,3/4].

Doing this kind of thing for successive variables allows an expansion and a decomposition which can be used to generate a PGF function.

There might be problems with this, so I'm interested on any feedback.

Stephen Tashi · Jul 12, 2012

chiro said:

I don't know if I got this right, but if you have a collection of Bernoulli random variables each with different probabilities depending on what happens in past trials, then I think you can model this distribution of the collection of bernoulli variables in terms of a probability generating function.

I don't understand how to relate the distribution of detection ranges F'(x) to a generating function.

I'll assume "trials" means "steps" of the simulation = trials of the benoulli random variables.

Suppose at the [itex]n[/itex]th step of the simulation, we are at elapsted time [itex]t_n[/itex] and distance [itex]x_n[/itex] and let [itex]p_n[/itex] be the (as yet unknown) probability that first detection occurring in that step given it has not occurred on previous steps.

Assume the target is moving straight toward the detector, not back and forth or side to side or standing still.

The (unconditional) probability [itex]D_n[/itex] that the first detection happens on step [itex]n[/itex] is
[itex]D_n = (1-p_1)(1-p_2)...(1-p_{n-1}) p_n[/itex]

So I'd think that [itex]D_n[/itex] must be approximately equal to [itex]F(x_n) - F(x_{n-1}) = \int_{x_{n-1}}^{x_n} F'(x) dx[/itex], which is the fraction of detections predicted to happen between ranges [itex]x_{n}[/itex] and [itex]x_{n-1}[/itex].

Can this line of reasoning lead to a generating function?

chiro · Jul 12, 2012

I think I've actually mis-interpreted the problem, but you've clarified it with the latex in the above post.

I think the thing that's important for this example is separating whether something exists at a particular point (or distance) vs the actual probability of detecting it. Both refer to different things.

So we first have a distribution that something is at a particular distance. This is probably best to represented using a markovian distribution where the transition only happens with neighbouring cells. If the object is assuming to be constantly moving we can incorporate direction into the markovian model.

Then along-side this you have a model for the actual detection. The detection itself is different because its not going to always detect something even if it actually exists. If the detection always takes place, then there is no need to have this distribution since P(Detection) = 1.

So in your model you have given a model of detection, but in terms of an object having some sort of motion, the distribution of detection should actually be more dynamic especially if you utilize a detection scheme that makes use of the distribution of distance as well as the results of past detections.

For example if you get a detection, then your detection distribution will optimize itself to consider positions that have a local relation to that detection. There's no point for example checking a point 10 km's away from a detection that was registered a moment before.

I have a few ideas of how you would design the detection scheme based on geometry and markovian methods that look kind of like a queue distribution in that you can only add or subtract a member of the queue, except the number of people or things relates to a geometric interpretation involving a kind of distance.

Stephen Tashi · Jul 13, 2012

To make a versatile model of target detection, you would have to represent cases where the observer target distance could change (or not change) in a variety of ways. But in the situation I'm considering, the motion of the target toward the detector is deterministic. I'm willing to assume that the target starts at the same initial distance on each test and moves toward the detector with the same distance-vs-time function on each test. (i.e. Assume the [itex](t_i, x_i)[/itex] data are known and the same in all replications of the test.

The only stochastic aspect of the test is the range at which the (first) detection happens. I'm assuming that the distribution of this range (over the population of tests) is known and has cumulative distribution [itex]F(x)[/itex].

The problem is to determine the [itex]p_j[/itex] that will produce a distribution of first detection ranges that is consistend with [itex]F(x)[/itex]. (An interesing theoretical question is whether discrete simulations using steps approach a continuous random process as the distance intervals between the steps approaches zero. And is there some continuous function [itex]p(x)[/itex] which is, in some sense, the limit of the [itex]p_j[/itex]? )

chiro · Jul 13, 2012

So in your model, do you assume then that the detection at each "cell" is independent to every other point, or is dependent on the realizations of results given by past cells?

It seems that the initial position and trajectory in your model is deterministic, but that the detection probability over some interval is given by some distribution f(x).

The caveat though is the nature of the detection device which I hinted at in an earlier post. You can have quite a variety of detection methods and algorithms based on the uncertainty properties present in the actual physical detection.

It's these uncertainties when quantified in the physical detection process including its error that gives rise to the use of specific protocols. Think of it as the analog of constructing error correcting codes for communication channels with very specific noise properties: the code itself in the detector represents the detection protocol.

So the first thing to do is to start out with some error properties of physical detection. You could break it up into cells, do fancy geometry and whatever, but the basic model is to say given some cell (or an interval corresponding to continuous-space), then what is the probability of detection in that cell in the physical sense if we just want to relay a signal and see if it for lack of terminology 'bounces-pack' (kind of like a ping argument in network transmission).

You then construct a detection protocol that will actually send out signals in a very specific way as to gaurantee some probability given constraints for the trajectory, geometry, and for the actual detection device just like you do for a communications system with ECC's.

In this context, it's not only the physical detection itself, but the protocol that determines the final distribution.

Stephen Tashi · Jul 13, 2012

chiro said:

So the first thing to do is to start out with some error properties of physical detection.

That's good advice for creating a useful model of detection, but in this particular thread I have a more theoretical outlook. Having retired, I am (happily) not thinking about any realistic models of detection for particular systems. I'm merely recalling all the horribly wrong analysis and simulation code that I have seen, which were done based on tests like I have described. I'm wondering how hard it would be to come up with simple model for a moment-by-moment simulation that gives results that are consistent with a given range distribution F(x).

You could break it up into cells, do fancy geometry and whatever, but the basic model is to say given some cell (or an interval corresponding to continuous-space), then what is the probability of detection in that cell in the physical sense if we just want to relay a signal and see if it for lack of terminology 'bounces-pack' (kind of like a ping argument in network transmission).

The natural "cells"in this problem are simply the intervals of distance between the target and the detector. There are simple "physical models of detection that give the [itex]p_i[/itex] in terms of physical parameters like the radar cross section of the target, the pulse rate of the radar, it's signal strength at the given range, etc. However, if the task is to duplicate F(x) then one must solve the problem of finding some physical parameters that produce results that are consistent with F(x). Is that the approach you have in mind? That doesn't really answer the mathematical question I'm asking, namely: Does F(x) (alone) determine a unique set of [itex]p_i[/itex] in the very limited test scenario that I described?

awkward · Jul 13, 2012

If you have an empirical CDF for the range at first detection, F(x), how about simply drawing a Uniform(0,1) pseudo-random number u, and then solve F(x) = u for x? Then x is your simulated range. You would probably need to use numerical methods to solve the equation since you don't have a nice formula for F.

Stephen Tashi · Jul 13, 2012

awkward said:

If you have an empirical CDF for the range at first detection, F(x), how about simply drawing a Uniform(0,1) pseudo-random number u, and then solve F(x) = u for x?

That is not relevant to the question being asked. The question is how to implement a simulation that proceeds in a series of steps and makes an independent random draw of a random variable on each step - hence the title of the thread.

haruspex · Jul 13, 2012

So the CDF is sort-of backwards: F(0) = 1, F(∞) = 0.
If there's a step width δx at range x, we want the probability of triggering there (and no sooner) to be -F'(x).δx (the area under the PDF for that step).
But (if we've modeled it correctly) there's a probability F(x) that we don't get that far in the simulation, so we have amplify the trigger probability to -F'(x).δx /(1-F(x)).

Stephen Tashi · Jul 14, 2012

haruspex said:

So the CDF is sort-of backwards: F(0) = 1, F(∞) = 0.

For a mathematical discussion, it might be simpler to let F(x) be a non-backwards CDF. You are correct that my remark about F(x) being mis-interpreted as a probability of detection vs range is only plausible for a backwards CDF - one that is near 1.0 for short ranges. We can discuss the problem using either convention.

If there's a step width δx at range x, we want the probability of triggering there (and no sooner) to be -F'(x).δx (the area under the PDF for that step).
But (if we've modeled it correctly) there's a probability F(x) that we don't get that far in the simulation, so we have amplify the trigger probability to -F'(x).δx /(1-F(x)).

As I understand that reasoning:
[itex]F(x) =[/itex] pr( first detection occurs at range [itex]\ge x[/itex]
pr( first detection occurs in the interval [itex][x, x + \delta x][/itex] given no first detection at any range [itex]\ge x[/itex] )
= pr( first detection occurs in that interval and no first detection at range [itex]\ge x[/itex] )/ pr(no previous first detection at range [itex]\ge x[/itex] )
= [itex]-F'(x) \delta x/ ( 1 - F(x))[/itex]

haruspex · Jul 14, 2012

Stephen Tashi said:

As I understand that reasoning:

Yes, but on second thoughts it would be more practical to write it as (F(x-δx)-F(x))/(1-F(x)).
This gives an exact total probability of 1 however large the δx.

Stephen Tashi · Jul 16, 2012

haruspex said:

Yes, but on second thoughts it would be more practical to write it as (F(x-δx)-F(x))/(1-F(x)).
This gives an exact total probability of 1 however large the δx.

I wonder if there is a way to give mathematical meaning to the phrase "A random process implemented by a random draw to determine success or failure at each distance until a succes occurs". (i.e. take the limit (in some sense) as [itex]\delta \rightarrow 0[/itex], but get something interesting instead of zero)

Thinking of the exponential distribution and thinking of the variable as time instead of distance, the associated Poisson distribution could be viewed a limiting distribution of binomials, which we can imagine to be making independent random draws as finer and finter intervals of time.

Suppose I think of F(x) as analagous to the "backwards" cumulative of an exponential

If we imagine F(x) approximated (better and better) as a sum of exponentials, can we define an associated process analagous to the Poission?

Or can I think of F(x) as being implemented as "a exponential with a continuously varying parameter [itex]\lambda(x)[/itex] ?

haruspex · Jul 16, 2012

Stephen Tashi said:

I wonder if there is a way to give mathematical meaning to the phrase "A random process implemented by a random draw to determine success or failure at each distance until a succes occurs". (i.e. take the limit (in some sense) as [itex]\delta \rightarrow 0[/itex], but get something interesting instead of zero)

You could certainly give it a physical implementation, so a mathematical one follows. Imagine operating a Geiger counter with a continuously variable source (or shielding, say), with the r.v. being the time to first click. Is that the sort of process you have in mind?

Stephen Tashi · Jul 17, 2012

haruspex said:

You could certainly give it a physical implementation, so a mathematical one follows. Imagine operating a Geiger counter with a continuously variable source (or shielding, say), with the r.v. being the time to first click. Is that the sort of process you have in mind?

Yes, that's a physical version of what I have in mind and the mathematical question is whether one can take a somewhat arbitrary stochastic process and view it that way.

Perhaps the mathematical theory of reliabllity already does this in terms of the "faiure rate" function (http://en.wikipedia.org/wiki/Failure_rate), but the quality control orientation of that material isn't very inspiring.

I have in mind some analogy to the Wiener process, I visualize the Wiener process as the limit of processes, each of which involve independent random draws at small intervals (usually intervals of time) and the limit is taken by making both the size of the intervals and the effect of the random draws smaller and smaller.

In target detection simulations, the time to detect a target under "fixed conditions" is often modeled by a an exponential distribution with a parameter [itex]\lambda[/itex] that is a function of the conditions. If we take that model as the a basis for simulating what happens when a target moves toward a detector then the conditions would be changing continuously ( since range is one of the important variables).

Approximation of a process by independent draws

Similar threads

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Graduate Probability puzzle

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Undergrad The countability paradox of computable numbers

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect