Finding the maxima and minima of a surface defined by datapoints

Discussion Overview

The discussion revolves around finding the maxima and minima of an N-dimensional surface defined by a set of data points, where the form of the surface is unknown. Participants explore various methods for identifying statistically significant extrema, addressing both theoretical and practical challenges in the context of statistical modeling and hypothesis testing.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their approach of generating random nodes in N-dimensional space and using k-means clustering to identify clusters of data points, seeking statistically significant maxima and minima.
  • Another participant emphasizes the importance of defining the surface as a random variable rather than a deterministic function, suggesting hypothesis testing for derivatives to identify extrema.
  • A participant questions the definition of maxima and minima, proposing that an N-dimensional surface can be represented by a function of N variables set to a constant.
  • Concerns are raised about the potential discontinuity of the surface, with a participant expressing a preference against enforcing differentiability and inquiring about the application of Brownian motion in this context.
  • Discussion includes the need to clarify what is meant by "statistically significant" in the context of maxima and minima, with suggestions that the problem may be more about estimation than hypothesis testing.
  • One participant proposes assuming a shape based on prior data to estimate standard deviation and discusses the implications of model specification error on statistical significance.

Areas of Agreement / Disagreement

Participants express differing views on the nature of the surface and the appropriate methods for determining maxima and minima. There is no consensus on the best approach, and multiple competing perspectives remain throughout the discussion.

Contextual Notes

Participants highlight limitations related to the assumptions made about the surface, the potential for model specification error, and the challenges of applying hypothesis testing in this context. The discussion reflects a range of complexities associated with statistical modeling in high-dimensional spaces.

sheelbe999
Here's the problem I'm trying to solve:

We have an N dimensional surface. We do not know the form of this surface; however, we have datapoints that are likely to be close to the surface, though not guaranteed to lie on it, along with some outliers. What I want to do is determine where in N dimensional space the maxima and minima are, global and local. I want to only find the statistically significant maxima and minima.

Here's what I'm currently doing. I generate k nodes at random in N dimensional space, then use the k-means algorithm to cluster the datapoints into k populations about these k nodes, and then look at the average value in each cluster. This seems to work quite well for now; however, there remains the issue of determining whether a maximum/minimum is statistically significant (far enough away from the average value, and with enough data points for confidence). In addition, I would eventually like to come up with a smart way of discarding dimensions that are uninformative in locating maxima/minima.

I know this is a complex problem to solve; any help would be appreciated. I wouldn't mind if I were only able to find the global maxima/minima with statistical significance; that would still be further along than where I am now.
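
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (a single bump near (0.8, 0.8) plus Gaussian noise), the names `kmeans` and `cluster_scores` are hypothetical helpers, and the Welch-style t statistic comparing each cluster with the rest of the data is one possible way to quantify "far enough away from the average value and enough data points", not a method taken from the thread.

```python
import math
import random
import statistics

def nearest(p, centres):
    """Index of the centre closest to point p (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on coordinate tuples; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # k random data points as starting nodes
    for _ in range(iters):
        labels = [nearest(p, centres) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = tuple(statistics.fmean(c) for c in zip(*members))
    # Final assignment so the labels match the returned centres.
    return centres, [nearest(p, centres) for p in points]

def cluster_scores(z, labels, k):
    """Per cluster: (size, mean z, Welch-style t statistic vs. all other points)."""
    out = []
    for j in range(k):
        inside = [v for v, l in zip(z, labels) if l == j]
        outside = [v for v, l in zip(z, labels) if l != j]
        if len(inside) < 2 or len(outside) < 2:
            out.append((len(inside), None, None))  # too few points to score
            continue
        se = math.sqrt(statistics.variance(inside) / len(inside)
                       + statistics.variance(outside) / len(outside))
        diff = statistics.fmean(inside) - statistics.fmean(outside)
        out.append((len(inside), statistics.fmean(inside), diff / se))
    return out

# Synthetic example (assumption): N = 2, one bump near (0.8, 0.8), noisy values.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(600)]
z = [math.exp(-20 * ((x - 0.8) ** 2 + (y - 0.8) ** 2)) + rng.gauss(0, 0.1)
     for x, y in points]

k = 8
centres, labels = kmeans(points, k)
scores = cluster_scores(z, labels, k)
# Cluster with the largest positive t statistic: the candidate maximum.
best = max((s for s in scores if s[2] is not None), key=lambda s: s[2])
print("size, mean, t of the highest-scoring cluster:", best)
```

Because the statistic is computed for every cluster and the largest one is then picked, the usual t thresholds are too lenient; some correction for multiple comparisons would be needed before calling a cluster significant.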
 
Hey sheelbe999.

Personally, I think the real crux of the problem is finding a definition for the surface in N dimensions that is not simply a deterministic description, but is a random variable.

Once you have this you can then do a hypothesis test for whether the derivative at a particular point has evidence to be 0 and then take it from there.

If you want to deal with a deterministic surface, then you can use the point estimate of this random variable based surface description, but if you want to do a full blown hypothesis test then you will have to resort to a more general situation.

The thing is though, that things get complicated because when the surface becomes a random variable, you will in general lose differentiability properties. One way to reconcile this is to look at what happens with Brownian motion: in Brownian motion, things are continuous but they are not differentiable.

You can "enforce" that your stochastic model is smooth and differentiable, but it means that you will end up having a specific kind of Markov model that constrains the nature of surrounding points to be differentiable.

In other words, it means that you will have a lot of different surfaces that are defined based on whatever the realizations are. Once you have all these realizations, you can then calculate the derivatives of these realizations to get the minima and maxima of each dimension.

It's not easy by any means, but the thing you have to decide is whether you want to enforce differentiability and smoothness of each realization, which means having complicated Markovian models for the distribution of surfaces. If you don't, then you can resort to models of Brownian motion (or something similar, depending on your assumptions) and then do hypothesis testing for the maximum and minimum values.

Either way, it's not pretty.
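
The idea of treating the surface as a random object with many realizations, and locating the extremum of each realization, can be illustrated with a much simpler stand-in. The sketch below uses a bootstrap (resampling the datapoints with replacement) instead of a Markov or Brownian-motion model, so it is a swapped-in technique and not the model described above. It assumes one input dimension on [0, 1), a piecewise-constant surface estimate (bin means, so no differentiability is required), and synthetic data; `binned_means` and `bootstrap_argmax` are hypothetical names.

```python
import math
import random
import statistics
from collections import Counter

def binned_means(xs, zs, n_bins):
    """Piecewise-constant surface estimate on [0, 1): mean z per bin (None if empty)."""
    bins = [[] for _ in range(n_bins)]
    for x, v in zip(xs, zs):
        bins[min(int(x * n_bins), n_bins - 1)].append(v)
    return [statistics.fmean(b) if b else None for b in bins]

def bootstrap_argmax(xs, zs, n_bins=10, n_boot=500, seed=0):
    """Fraction of bootstrap realizations in which each bin holds the maximum."""
    rng = random.Random(seed)
    n = len(xs)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        means = binned_means([xs[i] for i in idx], [zs[i] for i in idx], n_bins)
        best = max((j for j, m in enumerate(means) if m is not None),
                   key=lambda j: means[j])
        counts[best] += 1
    return {j: c / n_boot for j, c in counts.items()}

# Synthetic example (assumption): a peak centred at x = 0.35, i.e. inside bin 3.
rng = random.Random(2)
xs = [rng.random() for _ in range(500)]
zs = [math.exp(-50 * (x - 0.35) ** 2) + rng.gauss(0, 0.2) for x in xs]

freq = bootstrap_argmax(xs, zs)
print(sorted(freq.items()))
```

If one bin is the maximum in nearly every realization, the location of the maximum is well determined by the data; if the maximum jumps between distant bins, it is not. The same counting works for minima by replacing `max` with `min`.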
 
sheelbe999 said:
What I want to do is determine where in N dimensional space the maxima and minima are, global and local.
Maxima and minima of what? An n-dimensional surface can be defined by setting a given function of n variables equal to a constant.
 
@stephen

An analogous problem would be: given z = f(x_1, x_2, ..., x_N), find the local and global minima and maxima of z. The issue is that I don't know the form of f.

@chiro

Thanks for your detailed response. I don't want to enforce differentiability; I think the surface is likely very discontinuous. How could I use Brownian motion and hypothesis testing to find minima and maxima?
 
sheelbe999 said:
@stephen

An analogous problem would be: given z = f(x_1, x_2, ..., x_N), find the local and global minima and maxima of z. The issue is that I don't know the form of f.

Then z is a function of n variables, and I think the surface you have is (n+1)-dimensional, not n-dimensional.

I want to only find the statistically significant maxima and minima.

What definition are you using for the phrase "statistically significant"? It seems to me you are facing a problem of estimation, not a problem of hypothesis testing (which is the context in which the standard definition of "statistically significant" would have meaning).
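
One way to give "statistically significant" a standard hypothesis-testing meaning here is a permutation test, sketched below. The null hypothesis (an assumption made for this illustration, not something stated in the thread) is that z is unrelated to location, so shuffling the z values across locations should produce a largest group mean as extreme as the observed one reasonably often. The grouping is a fixed set of bins on one input dimension, the data are synthetic with a deliberately discontinuous step, and `max_group_excess` and `permutation_p_value` are hypothetical names.

```python
import random
import statistics

def max_group_excess(labels, zs, n_groups):
    """Test statistic: largest group mean minus the overall mean."""
    groups = [[] for _ in range(n_groups)]
    for g, v in zip(labels, zs):
        groups[g].append(v)
    return max(statistics.fmean(g) for g in groups if g) - statistics.fmean(zs)

def permutation_p_value(labels, zs, n_groups, n_perm=1000, seed=0):
    """P-value for 'some group has an unusually high mean' under shuffled z values."""
    rng = random.Random(seed)
    observed = max_group_excess(labels, zs, n_groups)
    shuffled = list(zs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between location and value
        if max_group_excess(labels, shuffled, n_groups) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never exactly 0

# Synthetic example (assumption): a discontinuous step on 0.3 <= x < 0.4.
rng = random.Random(4)
xs = [rng.random() for _ in range(300)]
labels = [min(int(x * 10), 9) for x in xs]  # ten equal-width bins
z_step = [(1.0 if 0.3 <= x < 0.4 else 0.0) + rng.gauss(0, 0.5) for x in xs]
z_flat = [rng.gauss(0, 0.5) for _ in xs]    # no structure at all

p_step = permutation_p_value(labels, z_step, 10)
p_flat = permutation_p_value(labels, z_flat, 10)
print("p-value with a real step:", p_step)
print("p-value with pure noise:", p_flat)
```

Because the statistic is the maximum over all groups, the p-value already accounts for having searched several locations. It only says that some maximum stands out from noise; estimating where it is and how high it is remains an estimation problem, as noted above.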
 
To determine the statistical significance of points in space subject to random error, I would assume a shape, based on prior data examples, and then calculate an estimate of the standard deviation about the assumed model's 'true' points. Note that this is one case where selecting a poor model for the true shape results in a larger standard deviation than warranted, which makes the analysis more conservative.

One may also wish to search the regression literature on model specification error and its effect on the standard deviation and on hypothesis testing. Also, robust regression (meaning a 'model-free' approach) may provide some guidance on an acceptable level of conservatism (via any available measures of the SE of the coefficients).
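
The effect of the assumed shape on the estimated standard deviation can be seen in a small sketch. Everything here is an assumption for illustration: the data are synthetic (a parabola plus Gaussian noise with standard deviation 0.1), both candidate shapes are fixed in advance rather than fitted, and `residual_sd` is a hypothetical helper.

```python
import math
import random

def residual_sd(xs, zs, model, n_fitted=0):
    """Standard deviation of the data about an assumed shape z = model(x).

    n_fitted is the number of parameters estimated from these same data
    (0 when the shape is fixed in advance, e.g. taken from prior examples).
    """
    rss = sum((v - model(x)) ** 2 for x, v in zip(xs, zs))
    return math.sqrt(rss / (len(xs) - n_fitted))

def good_shape(x):
    """The shape that actually generated the synthetic data."""
    return 1 - 4 * (x - 0.5) ** 2

def poor_shape(x):
    """A misspecified shape: flat at the average height of the parabola on [0, 1]."""
    return 2 / 3

# Synthetic data (assumption): parabola plus noise with standard deviation 0.1.
rng = random.Random(3)
xs = [rng.random() for _ in range(200)]
zs = [good_shape(x) + rng.gauss(0, 0.1) for x in xs]

sd_good = residual_sd(xs, zs, good_shape)
sd_poor = residual_sd(xs, zs, poor_shape)
print("residual SD about the well-chosen shape:", sd_good)
print("residual SD about the poor shape:", sd_poor)
```

The poor shape absorbs the unmodelled curvature into the residuals, so its standard deviation is inflated, and any test that scales a difference by that value becomes harder to pass.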
 