Allocating a huge boolean array fails

Allocating a boolean array of size 3^27 on a Linux system results in a segmentation fault due to the limitations of stack memory and the requirement for contiguous memory allocation. The proposed solution is to use dynamic memory allocation with `new` in C++ or `malloc()` in C, which avoids stack overflow issues. Additionally, the size of the boolean data type is often one byte rather than one bit, leading to an unexpectedly large memory requirement of approximately 7 terabytes. For handling such large datasets, alternative strategies like using a bit array or a database are recommended, as well as considering parallel computing or cloud solutions for processing. Efficient memory management and algorithm design are crucial for tackling problems that require extensive resources.
jk22
Hello,

I'm trying to allocate a bool[3^27] array of bits on a Linux system with a 1.2 TB swap partition, but the system says Segmentation Fault. How could I do this? Thanks.
 
An array needs to be allocated as a contiguous area of memory/disk.
You may have enough memory/disk available in total, but not a contiguous region of that size.
 
However we do it, it should never result in a segmentation violation. If it can't be allocated, we'll get an error condition instead.
How are you allocating your array exactly?
And when do you get the segmentation fault exactly?
 
I think you're asking too much of your computer to allocate such a big array. I would instead look to see how it could be partitioned and perhaps stored in a database or a random access file.
 
I have written

Long long int size=pow3 (27);

Printf ("allocating");
Bool f [size];
Printf ("..ok");

Pow3(n); is a subroutine returning a long long int equals to 3^n.

It never arrives to print ok.

Would bool *f=new bool [size]; be more flexible even if the space is not contiguous ?
 
What is the problem you are trying to solve that you need such a large array?
 
jk22 said:
I have written

Long long int size=pow3 (27);

Printf ("allocating");
Bool f [size];
Printf ("..ok");

Pow3(n); is a subroutine returning a long long int equals to 3^n.

It never arrives to print ok.

Would bool *f=new bool [size]; be more flexible even if the space is not contiguous ?
That declaration puts the array in stack space, which is limited, causing a stack overflow. Instead I suggest using 'new', which puts it in dynamic memory. Then it will either succeed or throw a bad_alloc exception.
 
You left some important context out of your question, particularly what programming language you are trying to use.

However if you are trying to allocate large arrays in C or C++ then the following issues are relevant:
  • The size of the bool datatype depends on the compiler but it is most likely one byte, not one bit, so when you try to create an array of ##3^{27}## booleans you are probably asking to create an array nearly 7 terabytes in size rather than the ~900 gigabytes you might have been expecting. If you think storing one boolean per byte is wasteful and want to store them as bits then this requires some additional work (e.g. look up how to use "bit fields").
  • Like others have pointed out, arrays are stored as a contiguous block in memory. If you have e.g. 1000 megabytes of free memory then this is no guarantee that you will be able to allocate a 1000 megabyte array: it could be that there is a block of 400 megabytes free somewhere and another block of 600 megabytes free somewhere else. So if you need close to the total amount of memory you have available you may need to use some other data structure that doesn't store all data in one contiguous block.
  • When you use the bool array[SIZE] syntax inside a function it allocates the array on the stack. This is an area of memory reserved for local function variables. The amount of stack space is usually very limited (e.g. a few megabytes, though the compiler or operating system may offer a way to change this). There are some ways to allocate an array outside the stack which don't have this limitation:
    • Make the array a global variable (i.e., move the declaration outside the body of any function).
    • Declare the array static.
    • Use malloc() or similar or (in C++) the new keyword to allocate memory.

    These won't help with the limitations above, though.


A couple of additional points:
  • When you create an array using TYPE array[SIZE], the size should be a compile-time constant (a literal number or constant expression like 1024*1024 or a #defined constant or const variable). If you want to create an array whose size will be computed at runtime (like by your pow3() function) then you should use malloc() or new in C++.

    C (but not C++) supports variable-length arrays (arrays whose sizes don't need to be known until runtime) since the C99 standard and some C++ compilers (like the GNU compiler) support them as a nonstandard extension. But they're still stack-allocated (with the limitations that follow) and they're risky to use (and nonportable in C++) because there is no way to test if the allocation was successful. (malloc(), by contrast, returns a null pointer if it fails to allocate memory.)
  • jk22 said:
    Printf ("allocating");
    Bool f [size];
    Printf ("..ok");

    [...]It never arrives to print ok.
    In your case probably the array allocation failed, but in general not seeing it print "..ok" doesn't necessarily mean anything. The standard output stream is buffered in C and C++. This means calling printf() won't necessarily print anything immediately but instead save what you want to print into a buffer and wait until there is a certain amount of text in the buffer before actually printing it. You should call fflush(stdout) if you want to force printing to be done immediately, or print to standard error, which is unbuffered, instead of standard output. (Also, what are "Printf" and "Bool"? C and C++ like most languages are case sensitive. I am assuming you meant "printf" and "bool".)
 
jk22 said:
Hello,
I'm trying to allocate a bool[3^27] array of bits in a Linux system with 1,2TB swap partition but the system says Segmentation Fault. How could I do this ? Thanx.
Part of the skill involved in programming is developing algorithms that fit within the resources available to you. Even if you managed to allocate such an array, the first thing you would probably want to do is clear it to all false values. That would take tens of minutes if not hours.
I suggest you find a new line of attack.
 
  • #10
Also: swap (virtual memory) is really slow. It's hundreds or thousands of times slower than main memory. So even if you can get your code working be prepared for your computer to grind to a slow crawl, including any other applications you are running as the memory they're using is swapped out.
 
  • #11
If I read this thread correctly, and I assume that a boolean data object is 8 bits for your compiler:
##3^{27}## = 7 625 597 484 987, on the order of ##10^{13}## bytes, somewhat less than ten trillion bytes. Allowing for the 1024-per-kB conversion factor you are still in the 7 TB range. You do not have 7 TB of virtual memory.

[aside]This also looks like you are trying something with Graham's number.[/aside]

You have choices, assuming this is a rational thing to try in the first place:
Find a bignum library for your compiler and use it; there are several free libraries. Your compiler will thank you.
Find an alternate data representation: maybe use a bit array, which gets you down to circa 1 TB of virtual memory. Still a lot of disk.
 
  • #12
Thanks for your replies. I assumed a bool was 1 bit, hence about 7/8 of a terabyte.

In fact I fell upon this problem by wanting to check if any function f(x,y,z) can be decomposed as k(g(x,y),h(x,z)),l(y,z))

This decomposition seems okay; it's like all the projections onto 2D planes of a higher-dimensional space.*

I took x,y,z in {0,1} and it's OK, but {0,1,2} is already too huge. In fact the number of functions f to check has complexity v^(m^d), where f(x1,...,xd) is in {1,...,v} and each xi is in {1,...,m}. Hence this grows much faster than exponential or even factorial.

*Then I also tried higher numbers of variables in {0,1}, but it seems that dimension 7 is the highest we could represent in this way (which is out of the scope of any computer we could imagine). But I think there are infinitely many possibilities to decompose into 2-variable functions. (If somebody has references to this problem, thanks.)
 
  • #13
jk22 said:
In fact I fell upon this problem by wanting to check if any function f(x,y,z) can be decomposed in k(g(x,y),h(x,z)),l(y,z))

Is there some constraint on these functions? E.g., why don't you consider $$\begin{eqnarray}
k(x, z, y) &=& f(x, y, z) \,, \nonumber \\
g(x, y) &=& x \,, \nonumber \\
h(x, z) &=& z \,, \nonumber \\
l(y, z) &=& y
\end{eqnarray}$$ a solution to this problem?
 
  • #14
wle said:
If you want to create an array whose size will be computed at runtime (like by your pow3() function) then you should use malloc() or new in C++.

Or use the standard C++ 'vector' data type, which allocates memory as necessary, e.g. 'std::vector<bool> f(size)'.
 
  • #15
wle said:
Is there some constraint on these functions? E.g., why don't you consider $$\begin{eqnarray}
k(x, z, y) &=& f(x, y, z) \,, \nonumber \\
g(x, y) &=& x \,, \nonumber \\
h(x, z) &=& z \,, \nonumber \\
l(y, z) &=& y
\end{eqnarray}$$ a solution to this problem?

k shall be a function of 2 variables.
I wrote it wrong: it's m(k(g(x,y),h(x,y)),l(x,y)). Your solution is basically writing f(x,y,z)=k(g(x,y),z), which is not always possible, since in this way you can generate only 88 of the 256 functions on {0,1}.
Yes, thanks jtbell, I tried new, but I have to decompose [3^27]=[3^9]^3, which means a slight re-encoding in base 3: easy, but it takes time for such a huge number of calls.
 
  • #16
jtbell said:
Or use the standard C++ 'vector' data type, which allocates memory as necessary, e.g. 'std::vector<bool> f(size)'.
Fun fact, an std::vector<bool> is actually a specialization that is implemented as an array of bits.
As such it's not a 'true' STL container.
See for instance http://en.cppreference.com/w/cpp/container/vector_bool
 
  • #17
Whatever you are trying to do, setting up ##3^{27}## bools is almost certainly not what you want. That's saving the results of over 7 trillion calculations. If each calculation takes a microsecond, that's still about 3 months.
 
  • #18
I changed to something smaller, 2^27, but even that failed. In fact the bug came from the usage of bool. I did it with char instead, recomputed the bits, and it worked fine.
 
  • #19
Messages like #18 make me sad.

Message #8 was a great message in explaining what the problem is and how to fix it. But you are sticking to your guns and trying the same thing that didn't work last time.
 
  • #20
wle said:
arrays are stored as a contiguous block in memory.
Only the virtual address space for an array is contiguous; the physical allocation can be made up of scattered 4096-byte physical memory pages.

wle said:
C (but not C++) supports variable-length arrays since the C99 standard and some C++ compilers support them as a nonstandard extension.
Microsoft's Visual Studio compilers do not support variable-length arrays. They've somewhat kept up with C++ standards, but not C standards.
 
  • #21
You need to figure out what parts of the memory you need at any given time and only load that. You're trying to load the whole thing into virtual memory and let the OS figure out what you need in actual RAM. Any software that has to handle more than a gig or two of data manages a lot of its own swapping (when an app does it, it's usually called caching). What on Earth are you trying to calculate with all of that data? My guess is that if you require that much memory, you need to figure out how to use parallelization and send the work off to a server farm.

Other than things like physics simulations, the only thing I can think of that actually requires terabytes of data is ray tracing for things like movies. Neither can be done on a regular person's computer. Jobs like those are done on supercomputers, and things that take even more horsepower are farmed out to companies like Amazon.
 
  • #22
In fact writing a master code that will overflow your memory is not only discouraged practice but also wrong (it means that you are not working efficiently).
The only way to go about it is to either revisit the way you approach your problem and try to be smart, or figure out a way to use cloud or cluster computing to split your (impossible) master job into smaller mini-jobs that run on different machines in parallel.
 
  • #24
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$
 
  • #25
jk22 said:
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$
This is very different from what you first asked, about an array that contained ##3^{27}## bits.

Your notation suggests to me that h and g each have one of 9 pairs of input values -- (0, 0), (0, 1), (0, 2), (1, 0), ..., (2, 2), and produce 0, 1, or 2 as the output. m has the same 9 pairs as input values, and produces 0 or 1 as its output. Maybe you could use small lookup tables or small arrays?
 
  • #26
If we count, there are ##2^9## different m and ##3^9## different g and h.

Hence the loop is over ##2^9 \cdot 3^{18}## different function compositions.

Somehow I have to check if the obtained function f is different from the others, hence an array of ##2^{27}## bits or 16 MB, which is manageable.

In fact it's the loops that use much time, like a week.

I wonder if it's possible to count the functions without an exhaustive search on a computer.
 
  • #27
jk22 said:
If we count there are ##2^9## different m and ##3^9## different g and h.

Hence the loop is over ##2^9 3^{18}## different function composition.
Wouldn't that be ##2^9 3^9 = 6^9## different possibilities? That's just over 10,000,000 different possibilities.
 
  • #28
I think it's ##3^9## for g and ##3^9## for h, and then we have to multiply them all.
 
  • #29
jk22 said:
I think it's ##3^9## for g and ##3^9## for h then we have to multiply all.
You're right. I used m and either g or h instead of g and h. Anyway, that's just ##3^9 \cdot 3^9 = 9^9## or 387,420,489 different input pairs for m. Still a lot less than your original ##3^{27}##.

Since m produces either 0 or 1, I'm not sure you need to double the result above.

LaTeX tip: For plain old numbers, like ##3^9##, use a pair of ## characters rather than a pair of $$. The former are for inline LaTeX and the latter for standalone LaTeX. There's no good reason that I can see for ##3^9##, for example, to be formatted on its own line with extra lines before and after.
 
  • #30
jk22 said:
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$

Have you considered starting with a smaller problem?
Say with ##m:\{0,1\}^2 \rightarrow \{0,1\}## and ##h,g : \{0,1\}^2\rightarrow\{0,1\}##.
It means we have ##2^{2^3}=256## options for f and ##2^{2^2}=16## options for both g and h.
From there we can construct m and validate it, so that with brute force we need to verify ##256\cdot 16 \cdot 16 \cdot 8 = 2^{19} = 524288## values.
Still large but much more feasible.

Btw, I have the sneaking suspicion that it's not possible for all functions f, but I can't prove it yet.
It seems to me that since we reduce the domain of f from ##\{0,1,2\}^3## to the domain of m, which is ##\{0,1,2\}^2##, we cannot distinguish all possibilities.
The same should apply for the smaller domains.
 
