Finding the maxima and minima of a surface defined by datapoints

sheelbe999 · Aug 11, 2012

Here's the problem I'm trying to solve:

We have an N dimensional surface. We do not know the form of this surface; however, we have datapoints which are likely to be close to the surface, though not guaranteed to be on it, along with some outliers. What I want to do is determine where in N dimensional space the maxima and minima are, global and local. I want to only find the statistically significant maxima and minima.

Here's what I'm currently doing to solve this: I generate k nodes at random in N dimensional space. I then use the k-means algorithm to cluster the datapoints into k populations about these k nodes, and look at the average value in each cluster. This seems to work quite well for now; however, I still have the issue of determining whether a maximum/minimum is statistically significant (far enough away from the average value, and backed by enough data points for confidence). In addition, I would eventually like to come up with a smart way of discarding dimensions if they are uninformative in locating maxima/minima.
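As a rough illustration of the workflow described above (not the poster's actual code), the sketch below clusters points with a plain k-means and then scores each cluster by how far its mean value sits from the mean of all other points, in standard-error units. The function names, the synthetic data, the `min_size` threshold, and the choice to start from k random datapoints rather than k random positions in space are all assumptions made for the example.

```python
import math
import random
import statistics

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on coordinate tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random datapoints
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its members; leave empty clusters alone.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def cluster_z_scores(values, labels, k, min_size=10):
    """Welch-style z statistic: cluster mean versus the mean of all other points."""
    scores = {}
    for j in range(k):
        inside = [v for v, lab in zip(values, labels) if lab == j]
        outside = [v for v, lab in zip(values, labels) if lab != j]
        if len(inside) < min_size or len(outside) < 2:
            continue  # too few points to say anything with confidence
        se = math.sqrt(statistics.variance(inside) / len(inside)
                       + statistics.variance(outside) / len(outside))
        if se > 0:
            scores[j] = (statistics.mean(inside) - statistics.mean(outside)) / se
    return scores

# Synthetic data (hypothetical): one bump near (0.8, 0.8) plus Gaussian noise.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(400)]
values = [math.exp(-20 * ((x - 0.8) ** 2 + (y - 0.8) ** 2)) + rng.gauss(0, 0.1)
          for x, y in points]

labels = kmeans(points, k=8)
z = cluster_z_scores(values, labels, k=8)
best = max(z, key=z.get)
print("cluster with the highest z score:", best, "z =", round(z[best], 2))
```

Because k clusters are compared at once, a threshold on these z scores would need a multiple-comparison correction (for example Bonferroni) before calling any single cluster significant.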

I know this is a complex problem to solve; any help would be appreciated. I wouldn't mind if I were only able to find the global maxima/minima with statistical significance; that would still put me further along than I am now.

chiro · Aug 11, 2012

Hey sheelbe999.

Personally, I think the real crux of the problem is finding a definition for the surface in N dimensions that is not simply a deterministic description, but is a random variable.

Once you have this, you can do a hypothesis test of whether there is evidence that the derivative at a particular point is 0, and then take it from there.
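One simple way to make the "is the derivative zero here?" test concrete, assuming a locally linear model with independent noise (an assumption of this sketch, not something stated in the thread), is to fit a least-squares line to the points in a small window and look at the t statistic of the slope. The example is one-dimensional; in N dimensions the analogous test would be on the whole gradient vector, and a zero gradient alone does not distinguish an extremum from a saddle point.

```python
import math
import random

def slope_t_statistic(xs, ys):
    """OLS fit y = a + b*x; returns (b, t) where t = b / SE(b) tests H0: b = 0."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se_b

# Synthetic data (hypothetical): a noisy parabola with its maximum at x = 0.5.
rng = random.Random(0)
xs = [i / 999 for i in range(1000)]
ys = [-(x - 0.5) ** 2 + rng.gauss(0, 0.01) for x in xs]

def window(center, half_width=0.1):
    """Select the datapoints whose x lies within half_width of center."""
    pairs = [(x, y) for x, y in zip(xs, ys) if abs(x - center) <= half_width]
    return [p[0] for p in pairs], [p[1] for p in pairs]

b_flat, t_flat = slope_t_statistic(*window(0.5))    # window around the true maximum
b_steep, t_steep = slope_t_statistic(*window(0.1))  # window on the rising flank
print(f"near maximum: slope={b_flat:.3f}, t={t_flat:.2f}")
print(f"on the flank: slope={b_steep:.3f}, t={t_steep:.2f}")
```

A small |t| near the maximum means the data give no evidence against a zero slope there, which is weaker than evidence that the slope is zero; the window width is a tuning choice that trades bias against noise.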

If you want to deal with a deterministic surface, then you can use the point estimate of this random variable based surface description, but if you want to do a full blown hypothesis test then you will have to resort to a more general situation.

The thing is though, that things get complicated because when the surface becomes a random variable, you will in general lose differentiability properties. One way to reconcile this is to look at what happens with Brownian motion: in Brownian motion, things are continuous but they are not differentiable.

You can "enforce" that your stochastic model is smooth and differentiable, but it means that you will end up having a specific kind of Markov model that constrains the nature of surrounding points to be differentiable.

In other words, it means that you will have a lot of different surfaces that are defined based on whatever the realizations are. Once you have all these realizations, you can then calculate the derivatives of these realizations to get the minima and maxima of each dimension.
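The idea of looking at many realizations and locating the extremum in each can be imitated cheaply with a bootstrap. This swaps in a different technique from the Markov or Brownian models discussed here: each "realization" is just a resample of the data, and the surface estimate is a crude binned mean. The function names, bin count, and synthetic data below are assumptions for illustration, and the example is one-dimensional on [0, 1).

```python
import math
import random

def argmax_bin(xs, ys, n_bins):
    """Index of the equal-width bin on [0, 1) with the highest mean response."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins can never win the argmax.
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_bins), key=means.__getitem__)

def bootstrap_argmax(xs, ys, n_bins=10, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each bin holds the maximum."""
    rng = random.Random(seed)
    n = len(xs)
    hits = [0] * n_bins
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        hits[argmax_bin([xs[i] for i in idx], [ys[i] for i in idx], n_bins)] += 1
    return [h / n_boot for h in hits]

# Synthetic data (hypothetical): a single peak at x = 0.65 with noise SD 0.2.
rng = random.Random(2)
xs = [rng.random() for _ in range(500)]
ys = [math.exp(-50 * (x - 0.65) ** 2) + rng.gauss(0, 0.2) for x in xs]

freq = bootstrap_argmax(xs, ys)
top = max(range(len(freq)), key=freq.__getitem__)
print("bin most often holding the maximum:", top, "frequency:", freq[top])
```

If the maximum lands in the same bin in nearly every resample, its location is stable under sampling noise; a maximum that wanders between distant bins is a sign that the data cannot pin it down.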

It's not easy by any means, but the thing you have to decide is whether you want to enforce differentiability and smoothness of each realization; doing so means having complicated Markovian models for the distribution of surfaces. If you don't enforce this, then you can resort to using models of Brownian motion (or something similar depending on your assumptions) and then do hypothesis testing for the maximum and minimum values.

Either way, it's not pretty.

Stephen Tashi · Aug 12, 2012

sheelbe999 said:

What I want to do is determine where in N dimensional space the maxima and minima are, global and local.

Maxima and minima of what? An n-dimensional surface can be defined by setting a given function of n variables equal to a constant.

sheelbe999 · Aug 12, 2012

@stephen

An analogous problem would be z = f(x_1, x_2, ..., x_N): find the local and global minima and maxima of z. The issue is that I don't know the form of f.

@chiro

Thanks for your detailed response. I don't want to enforce differentiability; I think the surface is likely very discontinuous. How could I use Brownian motion and hypothesis testing to find minima and maxima?

Stephen Tashi · Aug 12, 2012

sheelbe999 said:

@stephen

An analogous problem would be z = f(x_1, x_2, ..., x_N): find the local and global minima and maxima of z. The issue is that I don't know the form of f.

Then z is a function of n variables, and I think the surface you have sits in (n+1)-dimensional space, not n-dimensional space.

I want to only find the statistically significant maxima and minima.

What definition are you using for the phrase "statistically significant"? It seems to me you are facing a problem of estimation, not a problem of hypothesis testing (which is the context in which the standard definition of "statistically significant" would have meaning).

ajkoer · Aug 12, 2012

To determine the statistical significance of points in space with random error, I would assume a shape, based on prior data examples, and then estimate the standard deviation from the deviations of the data about the assumed model's 'true' points. Note that this is one case where selecting a poor model for the true shape yields a larger standard deviation than warranted, which makes the analysis more conservative.
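A small sketch of that last point, under assumptions chosen purely for illustration: the data are generated from a straight line with noise of standard deviation 0.1, and the residual standard deviation is computed once under an adequate model (a fitted line) and once under a deliberately poor one (a constant).

```python
import math
import random

def residual_sd(ys, fitted, n_params):
    """Residual standard deviation, with n_params fitted parameters removed from the dof."""
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(rss / (len(ys) - n_params))

# Synthetic data (hypothetical): true shape is a line, noise SD is 0.1.
rng = random.Random(0)
xs = [rng.random() for _ in range(300)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

# Adequate model: least-squares line.
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
sd_line = residual_sd(ys, [intercept + slope * x for x in xs], n_params=2)

# Poor model: a constant equal to the overall mean.
sd_flat = residual_sd(ys, [my] * len(ys), n_params=1)

print(f"adequate model SD: {sd_line:.3f}, poor model SD: {sd_flat:.3f}")
```

The poor model absorbs the unexplained trend into its residuals, so its standard deviation is inflated and any test built on it is less likely to declare a point significant.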

One may also wish to search the regression literature on model specification error and its effect on the standard deviation and on hypothesis testing. Also, robust regression (meaning a 'model-free' approach) may provide some guidance on an acceptable level of conservatism (via any available measures of the SE of the coefficients).
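To give a flavour of what robust regression buys when outliers are present, here is a minimal sketch using the Theil-Sen estimator (the median of all pairwise slopes). Theil-Sen is chosen here only as a convenient example and is not named in the post; it still assumes a straight-line shape. The data are synthetic, with the last ten points shifted upward to act as outliers.

```python
import random
import statistics

def theil_sen(xs, ys):
    """Theil-Sen line: median of pairwise slopes, then median of the implied intercepts."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept

def ols_slope(xs, ys):
    """Ordinary least-squares slope, for comparison."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Synthetic data (hypothetical): y = 1 + 2x with small noise; last 10 points are outliers.
rng = random.Random(3)
xs = [i / 99 for i in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.02) for x in xs]
for i in range(90, 100):
    ys[i] += 5.0

ts_slope, ts_intercept = theil_sen(xs, ys)
ls_slope = ols_slope(xs, ys)
print(f"Theil-Sen slope: {ts_slope:.2f}, OLS slope: {ls_slope:.2f}")
```

The least-squares slope is pulled well away from 2 by the ten shifted points, while the median-based estimate stays close to it; the gap between the two fits is itself a useful warning that outliers are driving the result.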
