How Does C Manage Memory Addresses and Types?

  • Thread starter: RedX
  • Tags: Memory

Discussion Overview

The discussion revolves around how the C programming language manages memory addresses and types, particularly in the context of dynamic memory allocation, memory segmentation, and the implications of these on different operating systems like Linux and Windows. Participants explore theoretical and practical aspects of memory management, including alignment, segmentation, and the structure of executable files.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant shares code to demonstrate memory address allocation in C and questions the significance of the address formats and the separation between different data types.
  • Another participant explains that on a 32-bit machine, addresses are padded with zeros and discusses how the heap and stack grow towards each other, suggesting that alignment may be based on hardware requirements.
  • A different participant mentions that memory allocation is managed by the operating system and not the standard library, and discusses the concept of virtual memory and its implications for application memory space.
  • Several participants express curiosity about the reasons for segmenting memory into different areas (code, data, stack) and question whether this is necessary for modern operating systems.
  • One participant raises concerns about the fragmentation that could occur if code and data are not properly segmented, referencing the ELF file format and its implications for memory protection.

Areas of Agreement / Disagreement

Participants express a range of views on memory management, with no clear consensus on the necessity or implications of memory segmentation across different operating systems. Some agree on the technical aspects of alignment and memory allocation, while others question the rationale behind segmentation.

Contextual Notes

There are unresolved questions regarding the specifics of memory management in different operating systems, including the handling of executable formats and the implications of paging versus segmentation. Participants also express uncertainty about the historical context of memory management techniques.

Who May Find This Useful

This discussion may be of interest to programmers, computer scientists, and students studying operating systems, memory management, and the C programming language.

RedX
I was playing around with C to try to discover how it assigns memory, so I wrote this:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int test1;
    int test2;
    char test3;
    int *ptr = malloc(3 * sizeof(int));
    printf("address of dynamic memory: %p\n", (void *)ptr);
    printf("address of a pointer: %p\n", (void *)&ptr);
    printf("address of an integer: %p\n", (void *)&test1);
    printf("address of another integer: %p\n", (void *)&test2);
    printf("address of a char: %p\n", (void *)&test3);
    free(ptr);
    return 0;
}

and it output:
address of dynamic memory: 0x84ea008
address of a pointer: 0xbfc5bf98
address of an integer: 0xbfc5bf94
address of another integer: 0xbfc5bf90
address of a char: 0xbfc5bf9f

All of the addresses are 8 digits long except the address of the dynamically allocated memory. I know you have to add a 0, but is it 0x084ea008 or 0x84ea0080? Is the dynamically allocated memory in the stack space? I heard that Linux stores the stack at 0xFFFFFFFF and that it grows downward, so why isn't the dynamically allocated memory near 0xFFFFFFFF?

Second, I notice that the separation between the address of the pointer and the char is 8 bytes, as 0xbfc5bf98 is smaller than 0xbfc5bf9f by 8 bytes. I even took out the pointer and just had a program with chars and ints, and noticed that the separation between a char and an int is 8 bytes. Does Linux separate data types by 8 bytes:

int1, int2, int3, int4, int5 --- 8 bytes --- char1, char2, char3, char4

Separation between chars is 1 byte, and separation between ints is 4 bytes, but I don't know where the 8 bytes between a char and an int comes from.

Is there a way I can see what address the code is stored at? The file type is ELF, and I assume the code's address is specified by the ELF structure, but every time I try to read an ELF file, Linux tells me it's not a text file, so I can't read it.

Does Linux use a 2-tier page table for memory management? I'm a little confused about having code segments, data segments, stack and shared segments if the memory system is a 2-tier page table rather than a segment/page combo. And things like the shared segment: do you only see something stored in that segment if you create multithreaded programs, using the functions in some of the system call libraries?
 
On a 32-bit machine your addresses will be of the form 0x12345678. If you get a 7-hex-digit number from printf(), you pad the top with 0's, so your first guess of 0x084ea008 is correct. You can make printf() add the 0's with extra formatting flags that I have blessedly forgotten... but the man page should reveal all.

Generally the heap, from which dynamically allocated memory comes, is at the low end of memory and the stack somewhere above. They grow towards each other: the heap goes up as you allocate memory and the stack goes down as you call functions. This maximizes the amount of memory you can use in a program.

Hardware may have requirements for the starting address of various data types. It looks like whatever you are running on (or the compiler you are using) wants everything "aligned" on 8-byte boundaries. This is the size of a double and fairly standard; sequentially declared values and multiple malloc()s will often be "padded" out to those boundaries for convenience. I'd bet that if you declared a series of chars on your stack (as you have one there now) they would be in sequential locations, but if you change types it might pad to the next alignment boundary.

You should be able to take the address of a function and print it just like a variable -- look up "function pointers" for the syntax. Also look at the man pages for the compiler and linker to see if there is an option for printing a memory map of your program. That should tell you where everything that is not dynamically allocated has been placed.

Sorry, but I don't know nuthin'bout Linux's page management...
 
On 32-bit machines the memory is not segmented, and without looking at the source, I would bet that memory allocation is done through the operating system and not through the standard library implementation.

In theory, under Win32 each application gets a virtual memory space of 4 GB.

You can build a DOS application and Win32 will give you a DOS segmented memory space, but it is still virtualized. This is not to be confused with a command-prompt application, which operates under Win32.

I am not sure how virtual memory is handled today, but in the old real and protected modes, pointers were simply a handle that the operating system and hardware understood. It was not always physical memory and could be cached, though you could force a page not to be cached.

Also you have to consider the architecture of the processors. They are designed to segment the system by levels, or rings. The lowest ring is for hardware, BIOS, and device drivers. The operating system is the only software that can run across rings. Applications run at the highest ring (ring 3 on x86). You will see the term privilege level; that is how the system protects itself from rogue applications.

In any event, all memory, even device addresses, is fabricated by the operating system. Years ago you could write directly to the screen just by writing to its memory address; well, those days are gone.

Read up on the architecture of the processor, that will clue you into how it works. I have been out of the device driver writing business for a long time, so my information is probably dated.
 
I always wondered why you separate the memory of a program into different segments, such as code space, data space, and stack space. Why does code have to be with code, and data with data, etc.?

I read that DOS does not separate a program into segments.

I was reading some information on Linux and the format of the ELF file, and its explanation was that read/write protection is done on 4K pages, so you can only specify protections on a contiguous 4K block. That means you must keep the code in one area: if you put code in two different areas, both of those areas have to be marked read-only, and you get fragmentation because you can't store data (which is writable) in either of them. Is this true of Windows .exe files, that a program is segmented into code space, data space, etc. (since all modern OSes use paging)? Ultimately, after compiling, it's all machine code, so I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?
 
RedX said:
I always wondered why you separate the memory of a program into different segments, such as code space, data space, and stack space. Why does code have to be with code, and data with data, etc.?

I read that DOS does not separate a program into segments.

I was reading some information on Linux and the format of the ELF file, and its explanation was that read/write protection is done on 4K pages, so you can only specify protections on a contiguous 4K block. That means you must keep the code in one area: if you put code in two different areas, both of those areas have to be marked read-only, and you get fragmentation because you can't store data (which is writable) in either of them. Is this true of Windows .exe files, that a program is segmented into code space, data space, etc. (since all modern OSes use paging)? Ultimately, after compiling, it's all machine code, so I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?

DOS is segmented. The reason for segmenting is to optimize the code for the processor. If you look at the instruction set for the Intel processors you will find that memory operations are quicker if you use indexed operations (strcpy and memcpy are two big ones). By segmenting the data groups and code groups you can optimize the instructions for the processor.

In the old days of the 8086 at 10 MHz, every opcode counted. And it still does.
 
RedX said:
I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?
(Disclaimer: it's been a while since I've actually known any of these things)

An executable doesn't just contain machine code. The executable file also contains things like data (ROM or otherwise), a table of pointers that should be initialized to point to shared library routines (e.g. things in *.DLL on Windows, or lib*.so on *NIX), some metadata identifying it as an executable and for what architecture, etc.
 
Hurkyl said:
(Disclaimer: it's been a while since I've actually known any of these things)

An executable doesn't just contain machine code. The executable file also contains things like data (ROM or otherwise), a table of pointers that should be initialized to point to shared library routines (e.g. things in *.DLL on Windows, or lib*.so on *NIX), some metadata identifying it as an executable and for what architecture, etc.

That's what puzzles me. I understand that for dynamic linking there will be something besides machine code. But if you statically link a program, then I thought it would be all machine code.

I thought the only thing that differed from OS to OS was memory management. So the reason a Windows program can't run on a Mac had to do with how each OS or hardware assigns read/write protection to pages: if you tried to run one program on the other machine, then although the x86 processor understands the instructions perfectly fine, the program would try to access memory that would be legal under the OS where it was compiled but illegal under the new OS.

As for the metadata, I didn't think of that either. Is it really critical for a program to have metadata? The OS has protection schemes, so nothing harmful can come from running an image file alone.
 
With software everything is (im)possible. One could certainly support any executable file format containing any kind of code that one liked. Virtual machines like Parallels on the Mac do exactly that. You could even load Intel machine code and treat it like Java byte code by translating each instruction into PowerPC code. It's just that standards like ELF make things easier and more efficient.

Each OS has slightly different ideas about its interface. A program written for the Windows GUI will not have the right "attitude" to run in the Mac GUI without some kind of intermediary that knows how to translate. You could probably slip a completely POSIX-compliant program through, but, as another thread here has shown, you can't do much of interest with just POSIX.

I think the reasoning behind separating code and data into different memory segments is to prevent the code from being corrupted by inadvertent, or malicious, overwriting. This precludes such fun activities as writing self-modifying code, but you can always do that with an interpreted language or by rewriting your own executable file and re-exec()ing.
 
