How Does C Manage Memory Addresses and Types?

  • Thread starter: RedX
  • Tags: Memory

Discussion Overview

The discussion revolves around how the C programming language manages memory addresses and types, particularly in the context of dynamic memory allocation, memory segmentation, and the implications of these on different operating systems like Linux and Windows. Participants explore theoretical and practical aspects of memory management, including alignment, segmentation, and the structure of executable files.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant shares code to demonstrate memory address allocation in C and questions the significance of the address formats and the separation between different data types.
  • Another participant explains that on a 32-bit machine, addresses are padded with zeros and discusses how the heap and stack grow towards each other, suggesting that alignment may be based on hardware requirements.
  • A different participant mentions that memory allocation is managed by the operating system and not the standard library, and discusses the concept of virtual memory and its implications for application memory space.
  • Several participants express curiosity about the reasons for segmenting memory into different areas (code, data, stack) and question whether this is necessary for modern operating systems.
  • One participant raises concerns about the fragmentation that could occur if code and data are not properly segmented, referencing the ELF file format and its implications for memory protection.

Areas of Agreement / Disagreement

Participants express a range of views on memory management, with no clear consensus on the necessity or implications of memory segmentation across different operating systems. Some agree on the technical aspects of alignment and memory allocation, while others question the rationale behind segmentation.

Contextual Notes

There are unresolved questions regarding the specifics of memory management in different operating systems, including the handling of executable formats and the implications of paging versus segmentation. Participants also express uncertainty about the historical context of memory management techniques.

Who May Find This Useful

This discussion may be of interest to programmers, computer scientists, and students studying operating systems, memory management, and the C programming language.

RedX
I was playing around with C to try to discover how it assigns memory, so I wrote this:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int test1;
    int test2;
    char test3;
    int *ptr = malloc(3 * sizeof(int));
    printf("address of dynamic memory: %p\n", (void *)ptr);
    printf("address of a pointer: %p\n", (void *)&ptr);
    printf("address of an integer: %p\n", (void *)&test1);
    printf("address of another integer: %p\n", (void *)&test2);
    printf("address of a char: %p\n", (void *)&test3);
    free(ptr);
    return 0;
}

and it output:
address of dynamic memory: 0x84ea008
address of a pointer: 0xbfc5bf98
address of an integer: 0xbfc5bf94
address of another integer: 0xbfc5bf90
address of a char: 0xbfc5bf9f

All of the addresses are 8 digits long except the address of the dynamically allocated memory. I know you have to add a 0, but is it 0x084ea008 or 0x84ea0080? Is the dynamically allocated memory in the stack space? I heard that Linux stores the stack at 0xFFFFFFFF and that it grows downward, so why isn't the dynamically allocated memory near 0xFFFFFFFF?

Second, I notice that the separation between the address of the pointer and the char is 8 bytes, as 0xbfc5bf98 is smaller than 0xbfc5bf9f by 8 bytes. I even took out the pointer and just had a program with chars and ints, and noticed that the separation between a char and an int is 8 bytes. Does Linux separate data types by 8 bytes:

int1, int2, int3, int4, int5 --- 8 bytes --- char1, char2, char3, char4

Separation between chars is 1 byte, and separation between ints is 4 bytes, but I don't know where the 8 bytes between a char and an int comes from.

Is there a way I can see what address the code is stored at? The file type is ELF, and I assume the code's address is specified by the ELF structure, but every time I try to read an ELF file, Linux tells me it's not a text file, so I can't read it.

Does Linux use a 2-tier page table for memory management? I'm a little confused about having code segments, data segments, stack and shared segments if the memory system is a 2-tier page table rather than a segment/page combo. And things like the shared segment: do you only see something stored in that segment if you create multithreaded programs, using the functions in some of the system call libraries?
 
On a 32-bit machine your addresses will be of the form 0x12345678. If you get a 7-hex-digit number from printf(), you pad the top with 0's, so your first guess of 0x084ea008 is correct. You can make printf() add the 0's with extra formatting flags that I have blessedly forgotten... but the man page should reveal all.

Generally the heap, from which dynamically allocated memory comes, is at the low end of memory and the stack somewhere above. They grow towards each other: the heap goes up as you allocate memory and the stack goes down as you call functions. This maximizes the amount of memory you can use in a program.

Hardware may have requirements for the starting address of various data types. It looks like whatever you are running on (or the compiler you are using) wants everything "aligned" on 8-byte boundaries. This is the size of a double and fairly standard; sequentially declared values and multiple malloc()s will often be "padded" out to those boundaries for convenience. I'd bet that if you declared a series of chars on your stack (as you have one there now) they would be in sequential locations, but if you change types it might pad to the next alignment boundary.

You should be able to take the address of a function and print it just like a variable -- look up "function pointers" for the syntax. Also look at the man pages for the compiler and linker to see if there is an option for printing a memory map of your program. That should tell you where everything that is not dynamically allocated has been placed.

Sorry, but I don't know nuthin'bout Linux's page management...
 
On 32-bit machines the memory is not segmented, and without looking at the source, I would bet that memory allocation is done through the operating system and not through the standard library implementation.

In theory, under Win32 each application gets a virtual memory space of 4 GB.

You can build a DOS application and Win32 will give you a DOS segmented memory space, but it is still virtualized. This is not to be confused with a command-prompt application, which operates under Win32.

I am not sure how virtual memory is handled today, but in the old real and protected modes, pointers were simply a handle that the operating system and hardware understood. It was not always physical memory and could be cached, though you could force a page not to be cached.

Also you have to consider the architecture of the processors. They are designed to segment the system by levels, or rings. The lowest ring is for hardware, BIOS, and device drivers. The operating system is the only software that can run across rings. Applications run at the highest ring (ring 3 on x86). You will see the term privilege level; that is how the system protects itself from rogue applications.

In any event, all memory, even device addresses, is fabricated by the operating system. Years ago you could write directly to the screen just by writing to its memory address; well, those days are gone.

Read up on the architecture of the processor, that will clue you into how it works. I have been out of the device driver writing business for a long time, so my information is probably dated.
 
I always wondered why you separate the memory of a program into different segments, such as code space, data space, and stack space. Why does code have to be with code, and data with data, etc.?

I read that DOS does not separate a program into segments.

I was reading some information on Linux and the format of the ELF file, and its explanation was that read/write protection is done on 4K pages, so you can only specify protections on a contiguous 4K block. That means you must keep the code in one area: if you put code in two different areas, both of those areas have to be marked read-only, and you get fragmentation because you can't store data (which is writable) in either of them. Is this true of Windows .exe files, that a program is segmented into code space, data space, etc. (since all modern OSes use paging)? Ultimately, after compiling, it's all machine code, so I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?
 
RedX said:
I always wondered why you separate the memory of a program into different segments, such as code space, data space, and stack space. Why does code have to be with code, and data with data, etc.?

I read that DOS does not separate a program into segments.

I was reading some information on Linux and the format of the ELF file, and its explanation was that read/write protection is done on 4K pages, so you can only specify protections on a contiguous 4K block. That means you must keep the code in one area: if you put code in two different areas, both of those areas have to be marked read-only, and you get fragmentation because you can't store data (which is writable) in either of them. Is this true of Windows .exe files, that a program is segmented into code space, data space, etc. (since all modern OSes use paging)? Ultimately, after compiling, it's all machine code, so I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?

DOS is segmented. The reason for segmenting is to optimize the code for the processor. If you look at the instruction set for the Intel processors you will find that memory operations are quicker if you use indexed operations (strcpy and memcpy are two big ones). By segmenting the data groups and code groups you can optimize the instructions for the processor.

In the old days of the 8086 at 10 MHz, every opcode counted. And it still does.
 
RedX said:
I don't quite understand why an OS can't read any executable format, but only a specified format. If you feed the processor machine code, shouldn't it run, no matter the exact virtual addresses of the code, the data, and the stack?
(Disclaimer: it's been a while since I've actually known any of these things)

An executable doesn't just contain machine code. The executable file also contains things like data (ROM or otherwise), a table of pointers that should be initialized to point to shared library routines (e.g. things in *.DLL on Windows, or lib*.so on *NIX), some metadata identifying it as an executable and for what architecture, etc.
 
Hurkyl said:
(Disclaimer: it's been a while since I've actually known any of these things)

An executable doesn't just contain machine code. The executable file also contains things like data (ROM or otherwise), a table of pointers that should be initialized to point to shared library routines (e.g. things in *.DLL on Windows, or lib*.so on *NIX), some metadata identifying it as an executable and for what architecture, etc.

That's what puzzles me. I understand that for dynamic linking there will be something besides machine code. But if you statically link a program, then I thought it would be all machine code.

I thought the only thing that differed from OS to OS was memory management. So the reason a Windows program can't run on a Mac had to do with how each OS or hardware assigns read/write protection to pages: if you tried to run one program on the other machine, then although the x86 processor understands the instructions perfectly fine, the program would try to access memory that would be legal under the OS where it was compiled but illegal under the new OS.

As for the metadata, I didn't think of that either. Is it really critical for a program to have metadata? The OS has protection schemes, so nothing harmful can come from running an image file alone.
 
With software everything is (im)possible. One could certainly support any executable file format containing any kind of code that one liked. Virtual machines like Parallels on the Mac do exactly that. You could even load Intel machine code and treat it like Java byte code by translating each instruction into PowerPC code. It's just that standards like ELF make things easier and more efficient.

Each OS has slightly different ideas about its interface. A program written for the Windows GUI will not have the right "attitude" to run in the Mac GUI without some kind of intermediary that knows how to translate. You could probably slip a completely POSIX-compliant program through, but, as another thread here has shown, you can't do much of interest with just POSIX.

I think the reasoning behind separating code and data into different memory segments is to prevent the code from being corrupted by inadvertent, or malicious, overwriting. This precludes such fun activities as writing self-modifying code, but you can always do that with an interpreted language or by rewriting your own executable file and re-exec()ing.
 
