Can Malloc Use Mmapped Shared Memory for Efficient Parent-Child Data Sharing?

  • Thread starter: TylerH
  • Tags: Area Memory Set
AI Thread Summary
The discussion centers on using mmap'ed shared memory to enhance data sharing between parent and child processes in C++, aiming to avoid page faults. The original poster seeks a method to configure malloc to utilize this shared memory, noting that while GNU Linux allows malloc hooks, this approach lacks portability. Alternatives such as overloading the global new operator and using custom allocators are suggested as portable solutions. Concerns are raised about the inefficiency of Windows processes for this task, particularly regarding copy-on-write behavior during forking. The poster ultimately questions the feasibility of using threads instead of processes, highlighting the potential for a more efficient shared memory model.
TylerH
I want to set the area from which malloc gets its memory to an mmap'ed shared memory file, to share data between children and parents without causing page faults. It is safe because the data used by the children is guaranteed not to be touched by the parent, and the children are guaranteed not to malloc at all. Is there a standard way to set the memory area that malloc manages?
 
On GNU/Linux you can install glibc malloc hooks.
This is not portable to other operating systems.

In C++ you can:
- overload the global new operator,
- declare a custom new operator within a class
- specify an allocator for the various C++ containers
- use placement new syntax
All these methods are portable.
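As a rough sketch of the container-allocator option, assuming a POSIX system where mmap supports MAP_ANONYMOUS (so this particular backing store is not portable, even though the allocator interface is). SharedArena and ArenaAllocator are hypothetical names made up for this example, not part of any library. The arena is a simple bump allocator that never reuses memory, which is only reasonable for an append-mostly structure.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical fixed-size arena backed by one shared anonymous mapping.
class SharedArena {
public:
    explicit SharedArena(std::size_t bytes) : size_(bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        base_ = static_cast<char*>(p);
    }
    ~SharedArena() { munmap(base_, size_); }
    SharedArena(const SharedArena&) = delete;
    SharedArena& operator=(const SharedArena&) = delete;

    // Bump allocation: hand out the next aligned chunk, never reuse memory.
    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0);
        std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || bytes > size_ - start) throw std::bad_alloc();
        used_ = start + bytes;
        return base_ + start;
    }

    // True if p points into the mapping.
    bool contains(const void* p) const {
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        auto lo = reinterpret_cast<std::uintptr_t>(base_);
        return addr >= lo && addr < lo + size_;
    }

private:
    char* base_ = nullptr;
    std::size_t size_ = 0;
    std::size_t used_ = 0;
};

// Minimal C++11 allocator that forwards to a SharedArena.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    SharedArena* arena;

    explicit ArenaAllocator(SharedArena& a) : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}  // no-op: the arena is unmapped as a whole
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}
```

To use it, construct a SharedArena, build an ArenaAllocator<int> from it, and pass that to a std::vector<int, ArenaAllocator<int>>. Only the element buffer lands in the shared mapping; the vector object itself stays wherever it is declared. Because the arena never frees, each reallocation on growth abandons the old buffer, so calling reserve() up front is advisable.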
 
In Windows you can use a shared memory file. MSDN article:

msdn_named_shared_memory.aspx

I'm not sure whether the Windows debugger functions could be used unless Windows allows processes to attach to each other. If it does, you could use DebugActiveProcess(), ReadProcessMemory(), WriteProcessMemory(), and so on. You'd also probably need DuplicateHandle() for any mutexes or semaphores you want to use for synchronization. One way to use DuplicateHandle() is to pass the hex values of the main process ID and any handles to be shared on the "command line" given to CreateProcess().

msdn_debugger_functions.aspx

msdn_create_process.aspx
 
I looked at the glibc malloc hook functions, but I'd really like to avoid writing a full heap allocator. Doing that well is beyond my knowledge. I also tried redefining sbrk and brk, but that didn't work either. (malloc never called them while allocating memory.)

As for the Windows stuff, rcgldr, my application would be totally unfit for Windows because Win processes are so heavy and my program forks a lot. Windows, IIRC, doesn't do COW address space copies. I didn't give enough context for you to know this, though.
 
Without knowing more about your problem, this sounds like a terrible idea to me. And if your child processes are so simple that they don't need to allocate, why not use threads? [Note that even printf does a malloc.]
 
Threading in C++ is still pretty awful. The program makes a list of primes. The parent maintains the list and forks off a child for each number to be checked and the exit status of the child indicates the primality of the number.
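For concreteness, the fork-per-number scheme could look roughly like this on a POSIX system. This is only a sketch: the function name check_in_child, the use of trial division, and the convention that exit status 1 means prime are all assumptions for illustration, not details of the actual program.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Fork a child to test n. Returns 1 if n is prime, 0 if not,
// and -1 if fork or waitpid failed.
int check_in_child(unsigned n) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: plain trial division, no allocation, then leave
        // immediately without running atexit handlers.
        bool prime = n >= 2;
        for (unsigned d = 2; prime && d <= n / d; ++d)
            if (n % d == 0) prime = false;
        _exit(prime ? 1 : 0);
    }

    // Parent: the child's exit status carries the answer.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    int code = WEXITSTATUS(status);
    assert(code == 0 || code == 1);
    return code;
}
```

Calling check_in_child(n) for each candidate and appending n to the list whenever it returns 1 reproduces the parent loop described above, one process per number.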

The problem I'm having is that when the parent writes primes back to the list, pages get copied for no reason, so I was going to mmap an anonymous shared region of memory to use as the parent's heap and prevent the pages from being copied.
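A minimal sketch of such a region, assuming Linux or another POSIX system that supports MAP_ANONYMOUS. The helper names map_shared_ints and child_write_visible are hypothetical. It only demonstrates that pages mapped with MAP_SHARED stay shared across fork() instead of being copied on write; it does not make malloc use the region.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Map room for `count` ints in an anonymous region that remains shared
// between parent and child after fork(). Returns nullptr on failure.
int* map_shared_ints(std::size_t count) {
    assert(count > 0);
    void* p = mmap(nullptr, count * sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<int*>(p);
}

// Fork a child that stores `value` at shared[i], then report whether the
// parent observes that store. With a private mapping it would not.
bool child_write_visible(int* shared, std::size_t i, int value) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        shared[i] = value;  // lands in the shared page, no private copy is made
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return shared[i] == value;
}
```

The child writes here only because that makes the sharing observable from the parent. In the direction that matters for the prime list, where the parent writes and the children read, the same property holds: a store to a MAP_SHARED page does not trigger a copy-on-write duplication.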
 
TylerH said:
The program makes a list of primes. The parent maintains the list and forks off a child for each number to be checked and the exit status of the child indicates the primality of the number. The problem I'm having is that when the parent writes primes back to the list, pages get copied for no reason, so I was going to mmap an anonymous shared region of memory to use as the parent's heap and prevent the pages from being copied.
If this was done using threads, there would only be a single and common virtual address space (just one actual instance of the list of primes) for the parent and all child threads. Each child thread would only need to be spawned one time, and each child thread could pend on a mutex and/or semaphore for each number to be checked, and then post a status also using a mutex and/or semaphore. Why isn't this a viable solution for this program?
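A rough sketch of that threaded layout using the C++11 standard library. Here a mutex plus a condition variable stand in for the mutex/semaphore pair, and the names is_prime and primes_in_range, along with the trial-division test, are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Trial division: adequate for a sketch, not for large inputs.
bool is_prime(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; d <= n / d; ++d)
        if (n % d == 0) return false;
    return true;
}

// Check every number in [lo, hi] on `workers` threads that are spawned
// once, and return the primes found in ascending order.
std::vector<unsigned> primes_in_range(unsigned lo, unsigned hi, unsigned workers) {
    assert(workers > 0);
    std::mutex m;
    std::condition_variable cv;
    std::queue<unsigned> pending;   // numbers waiting to be checked
    bool done = false;              // set once no more numbers will arrive
    std::vector<unsigned> primes;   // the single shared list

    auto worker = [&] {
        for (;;) {
            unsigned n;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return done || !pending.empty(); });
                if (pending.empty()) return;  // done and fully drained
                n = pending.front();
                pending.pop();
            }
            bool prime = is_prime(n);         // computed outside the lock
            if (prime) {
                std::lock_guard<std::mutex> lock(m);
                primes.push_back(n);
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);

    if (lo <= hi) {
        for (unsigned n = lo;; ++n) {
            {
                std::lock_guard<std::mutex> lock(m);
                pending.push(n);
            }
            cv.notify_one();
            if (n == hi) break;  // avoids overflow when hi is the maximum value
        }
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();
    for (auto& t : pool) t.join();

    std::sort(primes.begin(), primes.end());  // workers finish in arbitrary order
    return primes;
}
```

For example, primes_in_range(1, 30, 4) checks the numbers 1 through 30 on four worker threads. There is exactly one list in one address space, so no pages are duplicated, at the cost of taking a lock around each queue and list access.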
 