Curse of dimensionality - what is it that grows?

schniefen · Dec 12, 2020

The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, given data points in the interval ##[0,1]##, the density of points in the interval compared to the density of points for the same points in the unit square doesn’t increase, since ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##. Then, what is it that “gets bigger”, gets more? Is sparseness a more relevant term than density?

Office_Shredder · Dec 12, 2020

Cardinality is not related volume measurements at all basically, and you should just not think about it. [0,1] and [0,10000000000] have the same cardinality, but the former has almost zero volume compared to the latter.

schniefen · Dec 12, 2020

Suppose one would like to estimate an integral of a 1D and 2D function in the unit interval and square respectively, using say Monte Carlo. Why are more points needed in the latter vs the former? It seems again the space has grown and the density of points decreased.

jedishrfu · Dec 12, 2020

The curse of dimensionality is often used in the context of machine learning where more attributes to be considered means more dimensions to evaluate and thus longer evaluation times. In ML it’s good to simplify.

fresh_42 · Dec 12, 2020

schniefen said:

Summary:: The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##.

The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases.

Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.

fresh_42 · Dec 12, 2020

schniefen said:

Suppose one would like to estimate an integral of a 1D and 2D function in the unit interval and square respectively, using say Monte Carlo. Why are more points needed in the latter vs the former? It seems again the space has grown and the density of points decreased.

I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize, that it is less AC which contributes to the paradox, but rather our poor understanding of points.

schniefen · Dec 12, 2020

fresh_42 said:

I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize, that it is less AC which contributes to the paradox, but rather our poor understanding of points.

What is AC? What’s up with the points?

With the volume outside a hypersphere inscribed in the unit hypercube converging to 1, I meant that the volume of the hypersphere goes to zero as ##n## increases, right?

Office_Shredder · Dec 12, 2020

fresh_42 said:

Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.

The unit sphere has radius 1, so to be fair the unit cube needs to have "radius" 1 also (and hence volume ##2^n##). A correct claim is the volume of a sphere inscribed in a hypercube divided by the volume of that hypercube goes to zero as n goes to infinity.

fresh_42 · Dec 12, 2020

schniefen said:

What is AC? What’s up with the points?

Banach-Tarski uses the Axiom of Choice as well as the concept of what a point is.

schniefen · Dec 12, 2020

Is it correct to say that the density of points decreases as the dimension increases, for a fixed number of points? It doesn’t sound right, but yet I’ve come across phrases like “lower data density...” in higher dimensional space.

Office_Shredder · Dec 12, 2020

I don't think you get that. If you set r=1, you have ##\pi^{n/2}\Gamma(n/2)##.

Taking log, and letting k=n/2, we get ##k \ln(\pi)- \ln(k!)##. Stirling's approximation says ##ln(k!) \approx k \ln(k)-k##. The ##k \ln(k)## term dominates the whole thing, so log of the volume goes to negative infinity as k goes to infinity, and hence the volume has to go to zero.Although now I realize there is a slight misstatements. Your formula is for the surface area of the hypersphere, the thing people normally talk about is the volume of a ball with radius 1 (though I don't think this changes the result, the ball is the thing that is geometrically a more tractable and important result).

The layman's description of what is happening is that as the dimension goes to infinity, all the volume of the ball is actually concentrated in points where all the coordinates look like ##1/\sqrt{n}##, so the volume of the ball becomes small relative to the cube it's inscribed in.

fresh_42 · Dec 12, 2020

Yes, I saw my mistakes. It was the volume of the sphere, not the ball, and I automatically associated the exponential function as I saw ##\frac{c^n}{n!}## without recognizing that there is no summation.

mfb · Dec 12, 2020

The volume ratio between unit sphere and surrounding cube is the probability that the sum of squares of uniform random numbers from 0 to 1 is below 1. For two dimensions, it's the chance that ##x^2+y^2<1## where x and y are both uniform random numbers in [0,1]. Obviously, as we add more and more terms that probability gets smaller and smaller. Every non-zero value of a third summand restricts the range x and y can have.
In other words, the average distance between your points increases, even though the distance in each dimension keeps following the same distribution. If you just have a fixed number of points then your coverage of the volume gets poorer and poorer the more dimensions you have.

Curse of dimensionality - what is it that grows?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Graduate Expected numbers of cards of a last color remaining

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight