Curse of dimensionality - what is it that grows?

  • Context: Graduate 
  • Thread starter Thread starter schniefen
  • Start date Start date
Click For Summary
SUMMARY

The discussion centers on the "curse of dimensionality," specifically how the volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. As dimensions grow, the density of points in a fixed interval does not increase, leading to a paradox where the space appears to expand while the volume of the hypersphere diminishes. The conversation highlights the implications of this phenomenon in machine learning, where increasing attributes results in longer evaluation times and necessitates simplification.

PREREQUISITES
  • Understanding of hyperspheres and hypercubes in n-dimensional space
  • Familiarity with the Axiom of Choice and its implications in mathematics
  • Knowledge of Monte Carlo methods for integral estimation
  • Basic concepts of probability and density in higher dimensions
NEXT STEPS
  • Explore the mathematical implications of the Axiom of Choice in set theory
  • Study Monte Carlo integration techniques in higher dimensions
  • Investigate the relationship between dimensionality and data density in machine learning
  • Learn about the geometric properties of hyperspheres and their volumes
USEFUL FOR

Mathematicians, data scientists, and machine learning practitioners interested in understanding the effects of dimensionality on data analysis and computational efficiency.

schniefen
Messages
177
Reaction score
4
TL;DR
The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##.
The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, given data points in the interval ##[0,1]##, the density of points in the interval compared to the density of points for the same points in the unit square doesn’t increase, since ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##. Then, what is it that “gets bigger”, gets more? Is sparseness a more relevant term than density?
 
Physics news on Phys.org
Cardinality is not related volume measurements at all basically, and you should just not think about it. [0,1] and [0,10000000000] have the same cardinality, but the former has almost zero volume compared to the latter.
 
  • Like
Likes   Reactions: mfb and schniefen
Suppose one would like to estimate an integral of a 1D and 2D function in the unit interval and square respectively, using say Monte Carlo. Why are more points needed in the latter vs the former? It seems again the space has grown and the density of points decreased.
 
The curse of dimensionality is often used in the context of machine learning where more attributes to be considered means more dimensions to evaluate and thus longer evaluation times. In ML it’s good to simplify.
 
schniefen said:
Summary:: The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##.

The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases.
Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.
 
schniefen said:
Suppose one would like to estimate an integral of a 1D and 2D function in the unit interval and square respectively, using say Monte Carlo. Why are more points needed in the latter vs the former? It seems again the space has grown and the density of points decreased.
I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize, that it is less AC which contributes to the paradox, but rather our poor understanding of points.
 
fresh_42 said:
I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize, that it is less AC which contributes to the paradox, but rather our poor understanding of points.
What is AC? What’s up with the points?

With the volume outside a hypersphere inscribed in the unit hypercube converging to 1, I meant that the volume of the hypersphere goes to zero as ##n## increases, right?
 
fresh_42 said:
Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.
The unit sphere has radius 1, so to be fair the unit cube needs to have "radius" 1 also (and hence volume ##2^n##). A correct claim is the volume of a sphere inscribed in a hypercube divided by the volume of that hypercube goes to zero as n goes to infinity.
 
schniefen said:
What is AC? What’s up with the points?
Banach-Tarski uses the Axiom of Choice as well as the concept of what a point is.
 
  • Like
Likes   Reactions: schniefen
  • #10
Is it correct to say that the density of points decreases as the dimension increases, for a fixed number of points? It doesn’t sound right, but yet I’ve come across phrases like “lower data density...” in higher dimensional space.
 
  • #11
I don't think you get that. If you set r=1, you have ##\pi^{n/2}\Gamma(n/2)##.

Taking log, and letting k=n/2, we get ##k \ln(\pi)- \ln(k!)##. Stirling's approximation says ##ln(k!) \approx k \ln(k)-k##. The ##k \ln(k)## term dominates the whole thing, so log of the volume goes to negative infinity as k goes to infinity, and hence the volume has to go to zero.Although now I realize there is a slight misstatements. Your formula is for the surface area of the hypersphere, the thing people normally talk about is the volume of a ball with radius 1 (though I don't think this changes the result, the ball is the thing that is geometrically a more tractable and important result).

The layman's description of what is happening is that as the dimension goes to infinity, all the volume of the ball is actually concentrated in points where all the coordinates look like ##1/\sqrt{n}##, so the volume of the ball becomes small relative to the cube it's inscribed in.
 
Last edited by a moderator:
  • #12
Yes, I saw my mistakes. It was the volume of the sphere, not the ball, and I automatically associated the exponential function as I saw ##\frac{c^n}{n!}## without recognizing that there is no summation. :frown:
 
  • #13
The volume ratio between unit sphere and surrounding cube is the probability that the sum of squares of uniform random numbers from 0 to 1 is below 1. For two dimensions, it's the chance that ##x^2+y^2<1## where x and y are both uniform random numbers in [0,1]. Obviously, as we add more and more terms that probability gets smaller and smaller. Every non-zero value of a third summand restricts the range x and y can have.
In other words, the average distance between your points increases, even though the distance in each dimension keeps following the same distribution. If you just have a fixed number of points then your coverage of the volume gets poorer and poorer the more dimensions you have.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 15 ·
Replies
15
Views
4K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 66 ·
3
Replies
66
Views
7K
  • · Replies 33 ·
2
Replies
33
Views
3K
  • · Replies 2 ·
Replies
2
Views
1K