C: Pointers, Strings & Compiler Warnings

In summary, a pointer in C stores the address of an object, and its declared type tells the compiler how many bytes to read or write when the pointer is dereferenced. Because different types occupy different amounts of storage, typed pointers also let the compiler catch mismatched assignments. An array is not a pointer, but in most expressions its name converts to a pointer to its first element, and a pointer to char that points at a null-terminated character sequence can be used as a string. A pointer must be given a valid address before it is dereferenced; otherwise the behavior is undefined, and a segmentation fault is a common result. When calling scanf, the format specifier must match the argument, and the argument must point to storage large enough for the input.
  • #1
RedX
In C, why do you have to define a pointer type? For a 32 bit computer, the address is 4 bytes long no matter what, so it shouldn't matter if the pointer is to a char or an int, yet I get a compiler warning when I use a char pointer to point to the address of an int.

Also, I don't quite understand how this program works:

main(){
char* ptr;
ptr="this is a test";
printf("%s",ptr);
}

This program prints out "this is a test", but how is a pointer to a char equal to a string? I thought pointers stored an address, and not values of data? It would make a little more sense if the pointer were dereferenced:

main(){
char* ptr;
*ptr="this is a test";
printf("%s",*ptr);
}

but that still doesn't make sense as I never specified an address for the pointer to point to, so where exactly does the computer store "this is a test"?
 
  • #2
In C, a string literal, "...", is stored in the program's data space. It can be used either for its content, as when initializing an array (char a[] = "..."), or for its address, as when initializing a pointer (char * ptr = "...").

Although all pointers are normally the same size, pointer types are used to prevent simple programming errors (wrong pointer type assignment).

Your second example wouldn't work. *ptr refers to the single character addressed by that pointer, so it can neither hold the address of a string nor be printed with %s.
 
  • #3
A string is an array of chars. The pointer stores the address of the array.

A void pointer can be used as a generic pointer, but you have to cast it to the proper type when you go to use it.
 
  • #4
RedX said:
In C, why do you have to define a pointer type? For a 32 bit computer, the address is 4 bytes long no matter what, so it shouldn't matter if the pointer is to a char or an int, yet I get a compiler warning when I use a char pointer to point to the address of an int.

The type of a pointer is important information when you access the pointer. If ptr is declared as a pointer of some type, then it holds an address. You can access what's at that address by this expression: *ptr. Without knowing the type of the pointer, you (and the compiler) wouldn't know whether to get just the byte at that address or the four bytes starting at that address (for an int) or the eight bytes starting at that address (for a double) or whatever.

Different data types take different amounts of storage, so declaring a pointer as being a pointer to some specific type makes it possible to access a block of memory of the right size for that type.
 
  • #5
If I declare an array like this:

char array[]="testing";

then there is automatically a pointer that I can use called array:

printf("%p", array);

which is equal to

printf("%p", &array[0]);

I didn't ask for the pointer, but somehow C has it built in that the name of an array acts as a pointer.

And so it seems that C also does it in reverse! I didn't ask for a string, but somehow C has it built in that a pointer to a char acts as a char array (a string):

char* ptr;
ptr="testing";
printf("%c", ptr[0]);

which will output t, the first letter in the string "testing"!

What confuses me is that this doesn't hold for data types other than char. For example:

int* ptr;
*ptr=5;
printf("%d", *ptr);

compiles, but outputs "segmentation fault". Same with:

int* ptr;
ptr=5;
printf("%d", *ptr);

So would it be correct to say that whenever you create a pointer to a char, you also define an array that has the same name as the pointer? As in the example I presented, char *ptr; defines an array called ptr[]?

Even this works:

char* ptr;
printf("%c", ptr[1]);

where I didn't even use ptr="testing";
 
Last edited:
  • #6
Actually, something weird is going on. If I type this code:

main(){
char* ptr;
printf("enter a sentence");
scanf("%s", ptr);
printf("you entered: %s", ptr);
}

then the compiler doesn't complain, but I get a segmentation fault. However, if I put in a 25 before the %s like this:

main(){
char* ptr;
printf("enter a sentence");
scanf("25%s", ptr);
printf("you entered: %s", ptr);
}

the compiler doesn't complain, I don't get a segmentation fault, but I get funny symbols that appear on my screen.

I have no idea what's going on. Maybe I don't understand scanf correctly. What does the 25 before the %s do?

Here is a program that works:

main(){
char arr[25];
printf("enter a sentence");
scanf("%s", arr);
printf("you entered: %s", arr);
}

but I don't know why this one works and the other one doesn't!
 
  • #7
RedX said:
If I declare an array like this:

char array[]="testing";

then there is automatically a pointer that I can use called array:

printf("%p", array);

which is equal to

printf("%p", &array[0]);

I didn't ask for the pointer, but somehow C has it built in that the name of an array acts as a pointer.
Yes. In most expressions, the name of an array evaluates to the address of the first element of the array.
RedX said:
And so it seems that C also does it in reverse! I didn't ask for a string, but somehow C has it built in that a pointer to a char acts as a char array (a string):

char* ptr;
ptr="testing";
printf("%c", ptr[0]);

which will output t, the first letter in the string "testing"!

What confuses me is that this doesn't hold for data types other than char. For example:

int* ptr;
*ptr=5;
printf("%d", *ptr);

compiles, but outputs "segmentation fault".
The reason the first example works and the second doesn't is that in the first example, the compiler allocates storage for the pointer variable, and allocates storage in the string table for the characters in the string, and then assigns the address of the first byte of the string to ptr. Notice that when the compiler allocates memory for a pointer it allocates only enough space to hold an address. It doesn't allocate storage for what the pointer will eventually point to.
In the second example, the compiler allocates storage for the pointer variable, but nothing ever assigns an address to it, so it stays uninitialized. Since there is a garbage address in ptr, the statement *ptr = 5 attempts to put the value 5 in whatever random location ptr points to.

Always, when you have a pointer variable, you need to make sure that the location pointed to is suitable for storing whatever you intend to store there.
RedX said:
Same with:

int* ptr;
ptr=5;
printf("%d", *ptr);

So would it be correct to say that whenever you create a pointer to a char, you also define an array that has the same name as the pointer? As in the example I presented, char *ptr; defines an array called ptr[]?
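Not quite. Declaring char *ptr creates only the pointer, not an array. What is true is that index notation works on any pointer: ptr[i] means *(ptr + i), so once ptr holds the address of a char array (or of a string literal), it can be used much like the array itself. The same holds for int* and every other pointer type; the only thing special about char is that a string literal gives you a ready-made array to point at.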
RedX said:
Even this works:

char* ptr;
printf("%c", ptr[1]);

where I didn't even use ptr="testing";

The problem with this example is that ptr is an uninitialized pointer. There is an address stored in ptr, but it's not one that you set. Whatever random character happens to be at location (ptr + 1) is what is displayed in the printf statement.
 
  • #8
RedX said:
Actually, something weird is going on. If I type this code:

main(){
char* ptr;
printf("enter a sentence");
scanf("%s", ptr);
printf("you entered: %s", ptr);
}

then the compiler doesn't complain, but I get a segmentation fault.
ptr is an uninitialized pointer variable, which is generally a bad thing. The call to scanf attempts to store whatever you type into the memory pointed to by ptr. This could overwrite something important or could be in a part of memory that your program does not have access to. Either way, this is a bad thing.
RedX said:
However, if I put in a 25 before the %s like this:

main(){
char* ptr;
printf("enter a sentence");
scanf("25%s", ptr);
printf("you entered: %s", ptr);
}

the compiler doesn't complain, I don't get a segmentation fault, but I get funny symbols that appear on my screen.
I have no idea what effect the 25 in the format control string has, but it's not anything useful.

You still have an uninitialized pointer variable - one that is pointing to a random byte in memory somewhere. scanf is storing a string of characters starting at that location, and that's not a good thing.
RedX said:
I have no idea what's going on. Maybe I don't understand scanf correctly. What does the 25 before the %s do?
I have no idea either.
RedX said:
Here is a program that works:

main(){
char arr[25];
printf("enter a sentence");
scanf("%s", arr);
printf("you entered: %s", arr);
}

but I don't know why this one works and the other one doesn't!

The difference between the two examples is that now you have allocated memory to hold the string.

Alternatively, you could do this:
Code:
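#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   char* ptr = malloc(25);   /* room for 24 characters plus the null terminator */
   if (ptr == NULL)
      return 1;
   printf("enter a word (no more than 24 characters): ");
   if (scanf("%24s", ptr) == 1)   /* %24s stops after 24 characters */
      printf("you entered: %s\n", ptr);
   free(ptr);
   return 0;
}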

malloc allocates memory (25 bytes in this case) on the heap, and returns the address of this memory. After the call to malloc, ptr contains the address of this memory.
 
  • #9
Thanks, that was very helpful.

I noticed that if I typed in:

main(){
int* ptr;
printf("%p",&ptr[0]);
printf("%p",&ptr[1]);
printf("%p",&ptr[2]);
}

it gives:

0x42aff4
0x42aff8
0x42affc

whereas if ptr were a char pointer, the addresses would differ by only one byte instead of four.

So integers are really big, as 4 bytes allow you to address 2^32=4 billion integers. But I also tried this with the ptr being "long int", and I still get 4 bytes in size instead of something like 8! My compiler is gcc, so maybe gcc doesn't respect "int" versus "long int"? Also, those hex addresses are 24 bits in size, and not 32. I don't know why this is. I'm using a virtual machine to run gcc and I only gave my virtual machine 800 MB of system memory, so maybe gcc limited my address size to 24 bits because of my small system memory? But I thought all memory was handled by the virtual OS, so that it would provide me with the illusion of 4 gigs of dedicated memory by swapping out hard disk? But then again maybe I didn't give my virtual machine enough hard disk space. I can try df in the command prompt, and it says my virtual hard drive has over 4 gigs of free space, so there shouldn't be any problems. Does anyone know why this is happening, that the hex values are 24 bits instead of 32?

I know that you should always initialize your pointers, but as it turns out, if you don't initialize a pointer and store a value by dereferencing your uninitialized pointer, then gcc allows this if the pointer is a char type, but will give you a segmentation fault if it is an int type. I guess as you say, something is different with strings, that there is a string table which gets mapped onto pointers, but there is no integer table that can get mapped onto pointers.
 
  • #10
But I also tried this with the ptr being "long int", and I still get 4 bytes in size instead of something like 8!

Try "long long".

Also, those hex addresses are 24 bits in size, and not 32.

They are 32 bits, but their highest 8 bits are zero.
 
  • #11
RedX said:
Thanks, that was very helpful.

I noticed that if I typed in:

main(){
int* ptr;
printf("%p",&ptr[0]);
printf("%p",&ptr[1]);
printf("%p",&ptr[2]);
}

it gives:

0x42aff4
0x42aff8
0x42affc

whereas if ptr were a char pointer, the addresses would differ by only one byte instead of four.

So integers are really big, as 4 bytes allow you to address 2^32=4 billion integers. But I also tried this with the ptr being "long int", and I still get 4 bytes in size instead of something like 8! My compiler is gcc, so maybe gcc doesn't respect "int" versus "long int"?
Back about 12 to 15 years ago, C compilers for PCs typically had the short int and int types being the same size - 2 bytes, and 4 bytes for the long int type. With the prevalence of 32-bit processors and OSes since then, the size of an int was changed to 4 bytes, so that it is now the same as that of the long int type.
RedX said:
Also, those hex addresses are 24 bits in size, and not 32. I don't know why this is. I'm using a virtual machine to run gcc and I only gave my virtual machine 800 MB of system memory, so maybe gcc limited my address size to 24 bits because of my small system memory? But I thought all memory was handled by the virtual OS, so that it would provide me with the illusion of 4 gigs of dedicated memory by swapping out hard disk? But then again maybe I didn't give my virtual machine enough hard disk space. I can try df in the command prompt, and it says my virtual hard drive has over 4 gigs of free space, so there shouldn't be any problems. Does anyone know why this is happening, that the hex values are 24 bits instead of 32?

I know that you should always initialize your pointers, but as it turns out, if you don't initialize a pointer and store a value by dereferencing your uninitialized pointer, then gcc allows this if the pointer is a char type, but will give you a segmentation fault if it is an int type. I guess as you say, something is different with strings, that there is a string table which gets mapped onto pointers, but there is no integer table that can get mapped onto pointers.
I don't know enough about the gcc compiler to know why it does what it does with uninitialized char pointer variables. In any case, it's a bad idea to store data at an address pointed to by an uninitialized pointer variable.
 
  • #12
On a side note, I've wondered why this is allowed:

char * charptr = "string";

and this is not allowed:

int * intptr = {1,2,3,4,5};

while these are allowed:

char chararray1[5] = "1234";
char chararray2[5] = {'1','2','3','4',0};
int intarray[5] = {1,2,3,4,5};

char *charptr1 = chararray1;
char *charptr2 = chararray2;
int *intptr = intarray;
 
  • #13
rcgldr said:
On a side note, I've wondered why this is allowed:

char * charptr = "string";

and this is not allowed:

int * intptr = {1,2,3,4,5};

while these are allowed:

char chararray1[5] = "1234";
char chararray2[5] = {'1','2','3','4',0};
int intarray[5] = {1,2,3,4,5};

char *charptr1 = chararray1;
char *charptr2 = chararray2;
int *intptr = intarray;

It's just part of C and C++ that character arrays are treated differently from arrays of other types, and that string literals are different from other kinds of literals, such as 37 or 25.3 or 0x30 and so on.

With regard to your first example,

char * charptr = "string"; // allowed

The string literal on the right evaluates to the address in memory where it is stored (i.e., where the first byte is stored), and the compiler takes care of storing s t r i n g and a null character in the string table. The pointer variable is initialized with the address of the start of this string literal.

A similar example to your second example is the following
char * charptr = {'s', 't', 'r', 'i', 'n', 'g', '\0'};

Just like your 2nd example, this isn't allowed either. Unlike the characters in the string literal "string", the characters in the braces aren't actually stored anywhere: a brace-enclosed list is initializer syntax, not an object in memory, so there is no address that can be used to initialize charptr. To get the address of something, it has to actually exist in memory somewhere.
 
  • #14
Mark44 said:
The reason the first example works and the second doesn't is that in the first example, the compiler allocates storage for the pointer variable, and allocates storage in the string table for the characters in the string, and then assigns the address of the first byte of the string to ptr. Notice that when the compiler allocates memory for a pointer it allocates only enough space to hold an address. It doesn't allocate storage for what the pointer will eventually point to.

Are you allowed to overwrite the string table?

Because this program produces a segmentation fault:

main(){
char* ptr;
ptr="this is a test";
scanf("%s",ptr);
}

If the line ptr="this is a test";
inserts the address of the string table where "this is a test" begins into ptr, then from what you say the only reason I can think of for why this doesn't work is that you're not allowed to rewrite a string table.
 
  • #15
Are you allowed to overwrite the string table?

The outcome is undefined. Some compilers/operating systems will allow it, some will put it in a write-protected memory region. Sometimes the compiler might even "save" memory by reusing strings: if you write

char* p1 = "this is test";
char* p2 = "this is test";

it is possible that p1 and p2 will point to the same place.

Short answer, don't do it. And try to declare pointers to the string table as "const".
 
  • #16
Thanks everyone. I tried googling examples of strings and C, but none of the examples I found provided your level of knowledge.

I understand it now.

I'll stay away from the string table, and just use:

char string[100]="...";

so that I can have strings stored in the normal data space instead of a string table. That way I can always change it regardless of compiler, and don't have to worry about initializing pointers.

Something like:

char* ptr="...";

is just asking for trouble.
 
  • #17
RedX,
You should learn this, and understand it. If you do not understand the concept of pointers, you should forget C.

You need to understand the data types, and understand the difference between initialization and assignment. They are two entirely different concepts in C and C++.

First, whenever you declare a pointer, initialize it. ALWAYS.

char *message = NULL;

That simple initialization makes mistakes show up: on most systems, using the pointer before it has been given a real address crashes right away instead of silently corrupting memory.

C will allow you to do anything, and sometimes it will even print the correct answer, because the stack happens to be clean. But you have to understand it.
 
  • #18
Would it be correct to say that C prevents you from writing self-modifying code by preventing you access to the addresses where code is stored, and that is the primary reason that not initializing pointers is dangerous? And why can't the language automatically set each pointer to NULL? Is this just a waste of the compiler's time?
 
  • #19
RedX said:
Would it be correct to say that C prevents you from writing self-modifying code by preventing you access to the addresses where code is stored, and that is the primary reason that not initializing pointers is dangerous? And why can't the language automatically set each pointer to NULL? Is this just a waste of the compiler's time?

First, some compilers have options that force automatic initialization, but forget that. C does not guarantee the initial value of a local variable. Even if your compiler does it for you, relying on it is poor practice.

Forget pointers for a moment. Let's stick to a simple example.

int a; // what is the value of a? who knows

int a = 0; // now you know what it is

Now let's forget strings. They just confuse people.

int *a = NULL; // a pointer, initialized to NULL. Try to do anything through it in this state and you will usually get an easy-to-understand error.

int b[5]; // now we have an array of integers, 5 of them. What are their values? Anything.
int b[5] = {1,2,3,4,5}; // now we have the same array, with initial values. Cleaner.
or

int b[] = {1,2,3,4,5}; // same as above.


so you can access b like this.

b[0] = 2;
b[1] = 3;
b[2] = 4;
b[3] = 5;
b[4] = 6;
b[5] = 7; // Oops, this is wrong. b has 5 elements, indexed 0 to 4, not 1 to 5 and not 0 to 5. Some compilers will catch this.


// What is the difference between an array and a pointer? In how you index them, nothing. In how the memory is provided, a lot. An array is a block of memory; a pointer only holds the address of one.

a = &b[0]; // give a the start of our array b

a[0] = 1;
a[1] = 2;
a[2] = 3;
a[3] = 4;
a[4] = 5;
a[5] = 6; // again, a big no-no, though the compiler will usually not catch it. It has no idea how big the block a points to is. It knows b, because b is declared space; a is just a pointer.

Also notice I did not have to write *a to assign. Why? Because indexing works the same way on a pointer as on an array: a[i] means *(a + i). The difference is in how the memory is allocated.
With a pointer, you assume all liability for where it points, and that the memory block is valid.

With an array declared inside a function, the compiler allocates the storage for you, typically on the stack (something to be careful with for large arrays).

Instead of a[0] = 1 you can do this:

*a = 1;
and then keep incrementing the pointer, but array notation is easier.

Or even
*(a+1)=2; // this is position a[1]

I have not touched memory allocation because it will confuse the situation.. you have to understand the above examples...




The beauty of C is that you can do anything, and the pitfall of C is that you can do anything. It is the reason that you learned Pascal first, then C.
 
  • #20
RedX said:
that is the primary reason that not initializing pointers is dangerous?
Once upon a time, writing through a random pointer could corrupt the operating system. These days, your only fear is giving a hook for hackers.

However, errors with pointers are notoriously difficult to debug -- initializing everything is one approach to trying to ameliorate that problem.

Example 1: You might not notice an error when debugging and seeing that your program wrote through a pointer with value 0xb854cca0, but it stands out if the value is 0x00000000!

Example 2: A particularly subtle bug is when you sometimes (or always!) fail to initialize a needed variable in your function. However, because of the way it's used, along with compiler, operating system, hardware behavior, that variable by chance picks up a correct or reasonable value. e.g. if a variable is supposed to be initialized to zero, you write a program specifically to try and debug it -- you'll never encounter the error if the operating system initializes stack to all 0's!


And why can't the language automatically set each pointer to NULL? Is this just a waste of the compiler's time?
It could. Java takes that approach. C rejects it, I presume for the sake of efficiency -- a very short function call might spend more time on irrelevant initialization than on useful work, and if it gets called a billion times...

A good compiler might be able to eliminate redundant initializations or delay initialization until it knows it to be useful. But there will be limits on how well this works, and it adds extra chances for the compiler to produce buggy code. Also, recall that C was standardized back when compilers were fairly dumb, relatively speaking.


Some compilers or other environments offer "debug" modes that do a lot of things to help you debug, though. e.g. one version of malloc on some systems will fill memory with the byte pattern 0xdeadbeef. So when some of your variables start showing up with that hex value, you know something is wrong. :smile:
 

1. What are pointers in C?

Pointers in C are variables that store the address of another variable. They are used to indirectly access and manipulate data in memory.

2. How do you declare and initialize a pointer in C?

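To declare a pointer in C, you put an asterisk (*) before the variable name. For example: int *ptr; This declares a pointer named "ptr" that can store the address of an integer variable. To initialize it, give it an address in the same statement, for example int *ptr = &x; where x is an existing int variable.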

3. What are strings in C?

Strings in C are sequences of characters terminated by a null character (\0). They are stored as arrays of characters and are commonly used to store and manipulate text.

4. How do you declare and initialize a string in C?

To declare and initialize a string in C, you use the array notation. For example: char str[10] = "Hello"; This declares a character array named "str" that can hold up to 9 characters plus the terminating null character, and initializes it with the value "Hello".

5. How do you fix compiler warnings in C?

To fix compiler warnings in C, you need to identify the cause of the warning and make the necessary changes in your code. Common ways to fix warnings include providing proper data types for variables, using correct syntax, and addressing uninitialized variables or unused variables.
