# Finding the maxima and minima of a surface defined by datapoints

1. Aug 11, 2012

### sheelbe999

Here's the problem I'm trying to solve:

We have an N-dimensional surface. We do not know the form of this surface; however, we have data points that are likely to be close to it (though not guaranteed to lie on it), along with some outliers. I want to determine where in N-dimensional space the maxima and minima are, both global and local, and I only want to find the statistically significant ones.

Here's what I'm currently doing: I generate k nodes at random in N-dimensional space, use the k-means algorithm to cluster the data points into k populations around these nodes, and then look at the average value in each cluster. This seems to work quite well for now, but the open issue is determining whether a maximum or minimum is statistically significant (far enough from the overall average, with enough data points for confidence). In addition, I would eventually like a smart way of discarding dimensions that are uninformative for locating maxima and minima.
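A minimal sketch of the clustering procedure just described, in plain Python so it has no dependencies. It assumes each observation is a coordinate tuple `x` with a separate value `z`, that clustering uses the coordinates only, and that the random initial nodes are drawn uniformly inside the bounding box of the data; the function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))

def kmeans_cluster_means(points, values, k, iterations=50, seed=0):
    """Cluster coordinates with plain k-means, then average z in each cluster.

    points: list of N-dimensional coordinate tuples; values: one z per point.
    Returns (centroids, labels, means); an empty cluster has mean None.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    # k random initial nodes, uniform inside the data's bounding box.
    centroids = [tuple(rng.uniform(lows[d], highs[d]) for d in range(dim))
                 for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest node.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each node to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # an empty node stays where it is
        if updated == centroids:
            break
        centroids = updated
    # Final assignment against the final node positions.
    labels = [nearest(p, centroids) for p in points]
    means = []
    for j in range(k):
        zs = [z for z, lab in zip(values, labels) if lab == j]
        means.append(sum(zs) / len(zs) if zs else None)
    return centroids, labels, means
```

In practice a library implementation such as scikit-learn's `KMeans` would replace the hand-written loop; the point here is only the cluster-then-average structure. An unlucky random start can leave a node with no members, which is why empty clusters report `None`.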

I know this is a complex problem; any help would be appreciated. Even finding only the global maximum and minimum with statistical significance would put me further along than I am now.
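One way to attach a significance level to the highest cluster mean is a permutation test. This is a swapped-in technique, not something proposed in the thread: shuffle the `z` values across points, recompute the largest cluster mean, and count how often chance alone matches the observed one. It assumes cluster labels from the k-means step described above (integers in `range(k)`) and exchangeable observations under the null; the function names are hypothetical.

```python
import random

def cluster_means(values, labels, k):
    """Mean z value in each of k clusters (None for an empty cluster)."""
    sums, counts = [0.0] * k, [0] * k
    for z, lab in zip(values, labels):
        sums[lab] += z
        counts[lab] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def max_mean_permutation_test(values, labels, k, n_perm=2000, seed=0):
    """Permutation p-value for 'the highest cluster mean is no larger than chance'.

    Shuffling z across points keeps cluster sizes fixed but breaks any link
    between location and z. Because the null statistic is the maximum over
    all clusters, the p-value accounts for picking the best of k clusters.
    """
    rng = random.Random(seed)
    observed = max(m for m in cluster_means(values, labels, k) if m is not None)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max = max(m for m in cluster_means(shuffled, labels, k) if m is not None)
        if null_max >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The minimum can be tested the same way by passing the negated values.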

2. Aug 11, 2012

### chiro

Hey sheelbe999.

Personally, I think the real crux of the problem is finding a definition of the surface in N dimensions that is not simply a deterministic description but a random variable.

Once you have this, you can do a hypothesis test of whether there is evidence that the derivative at a particular point is 0, and take it from there.
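A minimal one-dimensional sketch of the derivative test, assuming the random surface is summarised locally by a straight line plus independent Gaussian noise (an assumption, not part of the post): fit a least-squares slope in a window around `x0` and form the t statistic for a zero slope. The names `local_slope_t` and `half_width` are hypothetical.

```python
import math

def local_slope_t(xs, zs, x0, half_width):
    """Least-squares slope of z on x within |x - x0| <= half_width,
    its standard error, and the t statistic for H0: slope == 0."""
    window = [(x, z) for x, z in zip(xs, zs) if abs(x - x0) <= half_width]
    n = len(window)
    if n < 3:
        raise ValueError("need at least 3 points in the window")
    mx = sum(x for x, _ in window) / n
    mz = sum(z for _, z in window) / n
    sxx = sum((x - mx) ** 2 for x, _ in window)
    if sxx == 0:
        raise ValueError("x values in the window are all equal")
    sxz = sum((x - mx) * (z - mz) for x, z in window)
    slope = sxz / sxx
    intercept = mz - slope * mx
    # Residual variance with n - 2 degrees of freedom gives the slope's SE.
    rss = sum((z - intercept - slope * x) ** 2 for x, z in window)
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t = slope / se
    return slope, se, t
```

A |t| well below the t quantile with n - 2 degrees of freedom means the data show no evidence of a nonzero slope, which makes `x0` a candidate stationary point rather than proof of one; scanning `x0` and looking for where t changes sign is one way to use it.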

If you want to work with a deterministic surface, you can use the point estimate of this random-variable description of the surface; but if you want a full-blown hypothesis test, you will have to work with the more general random model.

Things get complicated, though, because when the surface becomes a random variable you will in general lose differentiability. One way to see this is to look at Brownian motion, whose sample paths are continuous but (almost surely) nowhere differentiable.
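To make the Brownian-motion remark concrete, assume a standard Brownian motion $B_t$. Over a step $h$ the increment is normal with variance $h$, so the increment itself shrinks as $h \to 0$ (continuity), while the difference quotient has variance $1/h$, which blows up:

```latex
B_{t+h} - B_t \sim \mathcal{N}(0,\, h)
\quad\Longrightarrow\quad
\operatorname{Var}\!\left[\frac{B_{t+h} - B_t}{h}\right]
  = \frac{h}{h^{2}} = \frac{1}{h}
  \;\xrightarrow[h \to 0]{}\; \infty
```

This is only a heuristic for why slopes of such a surface cannot be estimated; the full result that the paths are almost surely nowhere differentiable is a stronger theorem.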

You can "enforce" that your stochastic model is smooth and differentiable, but then you end up with a specific kind of Markov model that constrains nearby points so that each realization is differentiable.

In other words, you will have many different surfaces, one for each realization. Once you have these realizations, you can compute their derivatives to find the minima and maxima in each dimension.

It's not easy by any means. The thing you have to decide is whether to enforce differentiability and smoothness of each realization, which means using complicated Markovian models for the distribution of surfaces. If you don't, you can use models of Brownian motion (or something similar, depending on your assumptions) and then do hypothesis tests for the maximum and minimum values.
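One possible reading of the Brownian-motion suggestion, sketched as a Monte Carlo test along a single coordinate: under the null, treat the ordered observations as a discretised Brownian motion (a Gaussian random walk), simulate many such walks with the step size estimated from the data, and ask how often a walk rises as high as the data do. The ordering along one axis, the zero-drift null, and the name `random_walk_max_test` are all assumptions for illustration.

```python
import random
import statistics

def random_walk_max_test(zs, n_sim=5000, seed=0):
    """Monte Carlo p-value for the observed rise max(zs) - zs[0] under a null
    in which zs is a zero-drift Gaussian random walk whose step size is
    estimated from the observed increments.

    zs must be ordered along the coordinate being scanned (at least 3 values).
    For the minimum, call it on [-z for z in zs].
    """
    steps = [b - a for a, b in zip(zs, zs[1:])]
    sigma = statistics.stdev(steps)
    observed = max(zs) - zs[0]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Simulate one null walk of the same length and track its peak.
        level, peak = 0.0, 0.0
        for _ in steps:
            level += rng.gauss(0.0, sigma)
            peak = max(peak, level)
        if peak >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

Since a random walk wanders a long way on its own, this null is deliberately hard to beat: only a rise much steeper than the local noise will give a small p-value.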

Either way, it's not pretty.

3. Aug 12, 2012

### Stephen Tashi

Maxima and minima of what? A surface in n-dimensional space can be defined by setting a given function of n variables equal to a constant.

4. Aug 12, 2012

### sheelbe999

@stephen

An analogous problem would be z = f(x_1, x_2, ..., x_N): find the local and global minima and maxima of z. The issue is that I don't know the form of f.
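Without a formula for f, a common first pass (not one proposed in the thread) is to flag a sample as a candidate local maximum if its z exceeds that of each of its k nearest neighbours in x-space, and similarly for minima. This is a brute-force sketch with a hypothetical function name; with noisy data it will flag many spurious extrema, which is exactly where a significance check is needed.

```python
def knn_local_extrema(points, values, k=5):
    """Indices of samples whose z is strictly greater (or smaller) than the z
    of each of their k nearest neighbours in x-space.

    points: list of N-dimensional coordinate tuples; values: z = f(x) samples.
    Brute-force O(n^2) neighbour search, fine for a few thousand points.
    """
    maxima, minima = [], []
    for i, p in enumerate(points):
        # Sort the other samples by squared distance (ties broken by index).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        if all(values[i] > values[j] for j in neighbours):
            maxima.append(i)
        if all(values[i] < values[j] for j in neighbours):
            minima.append(i)
    return maxima, minima
```

Samples on the edge of the sampled region can be flagged simply because they have neighbours on one side only, so boundary hits deserve extra scrutiny.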

@chiro

Thanks for your detailed response. I don't want to enforce differentiability; I think the surface is likely very discontinuous. How could I use Brownian motion and hypothesis testing to find minima and maxima?

5. Aug 12, 2012

### Stephen Tashi

Then z is a function of n variables, and I think the surface you have lives in (n+1)-dimensional space, not n-dimensional space.

What definition are you using for the phrase "statistically significant"? It seems to me you are facing a problem of estimation, not a problem of hypothesis testing (which is the context in which the standard definition of "statistically significant" would have meaning).
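Framed as estimation, the question becomes "how precisely do we know the value in this region?" rather than a yes/no test. A minimal sketch of that framing, using a percentile bootstrap (a swapped-in technique, not from the thread) to put an interval on the mean z of one cluster or region; the function name is hypothetical and the observations are assumed independent.

```python
import random
import statistics

def bootstrap_mean_ci(zs, level=0.95, n_boot=4000, seed=0):
    """Sample mean of zs and a percentile-bootstrap confidence interval for it."""
    rng = random.Random(seed)
    n = len(zs)
    # Resample with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(zs, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lo = boot_means[int(alpha * n_boot)]
    hi = boot_means[int((1.0 - alpha) * n_boot) - 1]
    return statistics.fmean(zs), (lo, hi)
```

A region whose interval lies entirely above the overall average is then a well-supported maximum candidate, while a wide interval signals too few data points.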

6. Aug 12, 2012

### ajkoer

To determine the statistical significance of points in space with random error, I would assume a shape based on prior data examples and then estimate the standard deviation of the data about the assumed model's 'true' points. Note that this is one case where choosing a poor model for the true shape yields a larger standard deviation than warranted, which makes the analysis more conservative.
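A minimal sketch of this suggestion: given an assumed shape `model` (any callable returning the predicted 'true' z at x; hypothetical here), estimate the residual standard deviation, then express a candidate peak as a number of standard errors above a baseline. Treating residuals as independent with constant variance is an assumption, and the function names are illustrative.

```python
import math

def residual_sd(xs, zs, model, n_params=0):
    """Residual standard deviation of observations zs about an assumed model.

    model: callable x -> predicted 'true' z (the assumed shape).
    n_params: parameters fitted to these same data (0 if the shape was
    fixed from prior examples), subtracted from the degrees of freedom.
    """
    residuals = [z - model(x) for x, z in zip(xs, zs)]
    dof = len(residuals) - n_params
    return math.sqrt(sum(r * r for r in residuals) / dof)

def peak_z_score(peak_values, baseline, s):
    """(mean of the m values near a candidate peak - baseline) / (s / sqrt(m))."""
    m = len(peak_values)
    return (sum(peak_values) / m - baseline) / (s / math.sqrt(m))
```

As noted above, a poorly chosen `model` inflates the residuals and hence `s`, so the resulting scores err on the conservative side.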

One may also wish to search the regression literature on model specification error and its effect on the standard deviation and on hypothesis testing. Robust regression (methods designed to be less sensitive to outliers and to departures from model assumptions) may also provide guidance on an acceptable level of conservatism, via any available measures of the standard errors (SEs) of the coefficients.
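As one concrete example of a robust regression method, the Theil-Sen estimator takes the median of all pairwise slopes, so a single outlier of the kind mentioned in the original post barely moves the fit. It is chosen here for illustration only; the function name is hypothetical, and the quadratic number of pairs limits it to modest sample sizes.

```python
import statistics
from itertools import combinations

def theil_sen(xs, zs):
    """Theil-Sen line fit: slope is the median of all pairwise slopes,
    intercept is the median of z - slope * x."""
    slopes = [(z2 - z1) / (x2 - x1)
              for (x1, z1), (x2, z2) in combinations(zip(xs, zs), 2)
              if x2 != x1]  # skip pairs with equal x (undefined slope)
    slope = statistics.median(slopes)
    intercept = statistics.median([z - slope * x for x, z in zip(xs, zs)])
    return slope, intercept
```

Standard errors for the coefficients are not produced directly; resampling the data (for example with a bootstrap) is one way to obtain them.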

Last edited: Aug 12, 2012