Pointers and strings

  1. Aug 17, 2010 #1
    In C, why do you have to define a pointer type? For a 32-bit computer, the address is 4 bytes long no matter what, so it shouldn't matter if the pointer is to a char or an int, yet I get a compiler warning when I use a char pointer to point to the address of an int.

    Also, I don't quite understand how this program works:

    char* ptr;
    ptr="this is a test";

    This program prints out "this is a test", but how is a pointer to a char equal to a string? I thought pointers stored an address, and not values of data? It would make a little more sense if the pointer were dereferenced:

    char* ptr;
    *ptr="this is a test";

    but that still doesn't make sense as I never specified an address for the pointer to point to, so where exactly does the computer store "this is a test"?
  3. Aug 17, 2010 #2


    Homework Helper

    In C, a string, "...", is allocated in the program's data space, but references to it can be the content, such as initializing an array (char a[] = "..."), or the address of the string, such as initializing a pointer (char * ptr = "...").

    Although all pointers are normally the same size, pointer types are used to prevent simple programming errors (wrong pointer type assignment).

    Your second example wouldn't work. *ptr refers to a single character addressed by that pointer.
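
    A minimal sketch of the two uses of a string literal described above, assuming a C99 compiler. The function names are made up for illustration and are not from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Array initialization: the characters are copied into the array itself,
   so the array holds its own modifiable copy. */
size_t array_copy_size(void)
{
    char a[] = "this is a test";   /* 14 characters plus the terminating '\0' */
    a[0] = 'T';                    /* fine: a is our own storage */
    return sizeof a;               /* size of the whole array: 15 */
}

/* Pointer initialization: only the address of the literal is stored. */
size_t pointer_target_length(void)
{
    const char *ptr = "this is a test";  /* ptr holds an address, not the text */
    return strlen(ptr);                  /* characters before the '\0': 14 */
}
```

    In both cases the literal's characters exist somewhere in the program image; the difference is whether the variable receives a copy of them or just their address.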
  4. Aug 17, 2010 #3


    Gold Member

    A string is an array of chars. The pointer stores the address of the array.

    A void pointer can be used as a generic pointer, but you have to cast it to the proper type when you go to use it.
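
    As a rough illustration of the generic-pointer idea, here is a small sketch with two hypothetical helpers. A void pointer cannot be dereferenced directly, so it is converted back to a typed pointer first. In C that conversion can be implicit on assignment; C++ requires an explicit cast.

```c
/* Hypothetical helper: the caller promises that p really points at an int. */
int first_as_int(const void *p)
{
    const int *ip = p;      /* void* converts to int* without a cast in C */
    return *ip;
}

/* Hypothetical helper: the caller promises that p really points at a char. */
char first_as_char(const void *p)
{
    const char *cp = p;
    return *cp;
}
```

    The compiler cannot check that promise, which is exactly the protection that typed pointers normally provide.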
  5. Aug 17, 2010 #4


    Staff: Mentor

    The type of a pointer is important information when you access the pointer. If ptr is declared as a pointer of some type, then it holds an address. You can access what's at that address by this expression: *ptr. Without knowing the type of the pointer, you (and the compiler) wouldn't know whether to get just the byte at that address or the four bytes starting at that address (for an int) or the eight bytes starting at that address (for a double) or whatever.

    Different data types take different amounts of storage, so declaring a pointer as being a pointer to some specific type makes it possible to access a block of memory of the right size for that type.
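
    The point about access size can be sketched as follows. The function names are illustrative only; the exact value of sizeof(int) depends on the platform (commonly 4).

```c
#include <stddef.h>

/* Distance in bytes between p and p + 1 for an int pointer. */
size_t int_stride(void)
{
    int v[2] = {0, 0};
    int *p = v;
    return (size_t)((char *)(p + 1) - (char *)p);   /* equals sizeof(int) */
}

/* The same measurement for a char pointer. */
size_t char_stride(void)
{
    char v[2] = {0, 0};
    char *p = v;
    return (size_t)((p + 1) - p);                   /* always 1 */
}

/* Reading the same memory through a different pointer type. */
unsigned char first_byte_of(const int *ip)
{
    const unsigned char *bp = (const unsigned char *)ip;
    return *bp;     /* one byte only; which one depends on endianness */
}
```

    So the pointer type decides both how many bytes *ptr reads and how far ptr + 1 moves, even though the stored address has the same size in each case.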
  6. Aug 18, 2010 #5
    If I declare an array like this:

    char array[]="testing";

    then there is automatically a pointer that I can use called array:

    printf("%p", array);

    which is equal to

    printf("%p", &array[0]);

    I didn't ask for the pointer, but somehow C has it built in that the name of an array acts as a pointer.

    And so it seems that C also does it in reverse! I didn't ask for a string, but somehow C has it built in that a pointer to a char acts as a char array (a string):

    char* ptr;
    printf("%c", ptr[0]);

    which will output t, the first letter in the string "testing"!

    What confuses me is that this doesn't hold for data types other than char. For example:

    int* ptr;
    printf("%d", *ptr);

    compiles, but outputs "segmentation fault". Same with:

    int* ptr;
    printf("%d", *ptr);

    So would it be correct to say that whenever you create a pointer to a char, you also define an array that has the same name as the pointer? As in the example I presented, char *ptr; defines an array called ptr[]?

    Even this works:

    char* ptr;
    printf("%c", ptr[1]);

    where I didn't even use ptr="testing";
    Last edited: Aug 18, 2010
  7. Aug 18, 2010 #6
    Actually, something weird is going on. If I type this code:

    char* ptr;
    printf("enter a sentence");
    scanf("%s", ptr);
    printf("you entered: %s", ptr);

    then the compiler doesn't complain, but I get a segmentation fault. However, if I put in a 25 before the %s like this:

    char* ptr;
    printf("enter a sentence");
    scanf("25%s", ptr);
    printf("you entered: %s", ptr);

    the compiler doesn't complain, I don't get a segmentation fault, but I get funny symbols that appear on my screen.

    I have no idea what's going on. Maybe I don't understand scanf correctly. What does the 25 before the %s do?

    Here is a program that works:

    char arr[25];
    printf("enter a sentence");
    scanf("%s", arr);
    printf("you entered: %s", arr);

    but I don't know why this one works and the other one doesn't!
  8. Aug 18, 2010 #7


    Staff: Mentor

    Yes. The name of an array evaluates to the address of the first byte in memory of the array.
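
    A small sketch of that conversion, with made-up function names. It also shows one place where the array name does not behave like a pointer: sizeof still sees the whole array.

```c
#include <stddef.h>

/* Returns 1 if the array name and &array[0] give the same address. */
int name_equals_first_element(void)
{
    char array[] = "testing";
    char *p = array;              /* array converts ("decays") to &array[0] */
    return p == &array[0];
}

/* The array itself is still not a pointer: sizeof reports the whole object. */
size_t array_size(void)
{
    char array[] = "testing";
    return sizeof array;          /* 8: seven characters plus '\0' */
}
```

    The reverse does not hold: declaring char* ptr; creates only the pointer, with no array behind it.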
  9. Aug 18, 2010 #8


    Staff: Mentor

    ptr is an uninitialized pointer variable, which is generally a bad thing. The call to scanf attempts to store whatever you type into the memory pointed to by ptr. This could overwrite something important or could be in a part of memory that your program does not have access to. Either way, this is a bad thing.
    I have no idea what the 25 in the format control string does, but it's not anything useful.

    You still have an uninitialized pointer variable - one that is pointing to a random byte in memory somewhere. scanf is storing a string of characters starting at that location, and that's not a good thing.
    I have no idea either.
    The difference between the two examples is that now you have allocated memory to hold the string.

    Alternatively, you could do this:
    Code (Text):

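    #include <stdio.h>
    #include <stdlib.h>
    int main(void) {
       char* ptr = malloc(25);            /* room for 24 characters plus the null */
       if (ptr == NULL) return 1;         /* malloc can fail */
       printf("enter a sentence (no more than 24 characters)");
       scanf("%24s", ptr);                /* the width keeps scanf inside the buffer */
       printf("you entered: %s", ptr);
       free(ptr);
    }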
    malloc allocates memory (25 bytes in this case) on the heap, and returns the address of this memory. After the call to malloc, ptr contains the address of this memory.
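
    On the question of the 25: in a scanf format, a width belongs between the % and the s, as in "%24s". Written as "25%s", the 25 is ordinary text that the input must match before any conversion happens. The sketch below uses sscanf so the behavior can be checked without typing input; the helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one word of at most 24 characters from input
   into buf, which must have room for at least 25 bytes.
   Returns the number of items converted (1 on success). */
int read_word(const char *input, char buf[25])
{
    /* "%24s": the width caps how many characters are stored. */
    return sscanf(input, "%24s", buf);
}

/* Same idea with a literal "25" in front, to show what it does. */
int read_word_after_25(const char *input, char buf[25])
{
    /* Converts a word only if the input starts with the text "25". */
    return sscanf(input, "25%24s", buf);
}
```

    If the input does not begin with 25, nothing is stored at all. That would be consistent with the earlier observation of no segmentation fault but garbage output: printf then prints whatever the uninitialized pointer happens to point at. Note also that %s stops at the first whitespace, so it reads a word rather than a sentence.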
  10. Aug 18, 2010 #9
    Thanks, that was very helpful.

    I noticed that if I typed in:

    int* ptr;

    it gives:


    whereas if ptr were a char pointer, the addresses would differ by only one byte instead of four.

    So integers are really big, as 4 bytes allow you to address 2^32=4 billion integers. But I also tried this with the ptr being "long int", and I still get 4 bytes in size instead of something like 8! My compiler is gcc, so maybe gcc doesn't respect "int" versus "long int"? Also, those hex addresses are 24 bits in size, and not 32. I don't know why this is. I'm using a virtual machine to run gcc and I only gave my virtual machine 800 MB of system memory, so maybe gcc limited my address size to 24 bits because of my small system memory? But I thought all memory was handled by the virtual OS, so that it would provide me with the illusion of 4 gigs of dedicated memory by swapping out to hard disk? But then again maybe I didn't give my virtual machine enough hard disk space. I can try df in the command prompt, and it says my virtual hard drive has over 4 gigs of free space, so there shouldn't be any problems. Does anyone know why this is happening, that the hex values are 24 bits instead of 32?

    I know that you should always initialize your pointers, but as it turns out, if you don't initialize a pointer and store a value by dereferencing your uninitialized pointer, then gcc allows this if the pointer is a char type, but will give you a segmentation fault if it is an int type. I guess as you say, something is different with strings, that there is a string table which gets mapped onto pointers, but there is no integer table that can get mapped onto pointers.
  11. Aug 18, 2010 #10
    Try "long long".

    They are 32 bits, but their highest 8 bits are zero.
  12. Aug 18, 2010 #11


    Staff: Mentor

    Back about 12 to 15 years ago, C compilers for PCs typically had the short int and int types being the same size - 2 bytes, and 4 bytes for the long int type. With the prevalence of 32-bit processors and OSes since then, the size of an int was changed to 4 bytes, so that it is now the same as that of the long int type.
    I don't know enough about the gcc compiler to know why it does what it does with uninitialized char pointer variables. In any case, it's a bad idea to store data at an address pointed to by an uninitialized pointer variable.
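
    The sizes can be checked directly rather than guessed. The sketch below uses illustrative function names; the C standard only fixes minimum widths (16 bits for int, 32 for long, 64 for long long in C99), so the actual numbers depend on the compiler and target.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of each type on the current platform. */
size_t bits_in_int(void)       { return sizeof(int) * CHAR_BIT; }
size_t bits_in_long(void)      { return sizeof(long) * CHAR_BIT; }
size_t bits_in_long_long(void) { return sizeof(long long) * CHAR_BIT; }
```

    On a typical 32-bit gcc target the first two would both be expected to report 32, which matches what was observed above. The results can be printed with printf("%zu\n", bits_in_long());.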
  13. Aug 19, 2010 #12


    Homework Helper

    On a side note, I've wondered why this is allowed:

    char * charptr = "string";

    and this is not allowed:

    int * intptr = {1,2,3,4,5};

    while these are allowed:

    char chararray1[5] = "1234";
    char chararray2[5] = {'1','2','3','4',0};
    int intarray[5] = {1,2,3,4,5};

    char *charptr1 = chararray1;
    char *charptr2 = chararray2;
    int *intptr = intarray;
  14. Aug 19, 2010 #13


    Staff: Mentor

    It's just part of C and C++ that character arrays are treated differently from arrays of other types, and string literals are different from other kinds of literals, such as 37 or 25.3 or 0x30 and so on.

    With regard to your first example,

    char * charptr = "string"; // allowed

    The string literal on the right evaluates to the address in memory where it is stored (i.e., where the first byte is stored), and the compiler takes care of storing s t r i n g and a null character in the string table. The pointer variable is initialized with the address of the start of this string literal.

    A similar example to your second example is the following
    char * charptr = {'s', 't', 'r', 'i', 'n', 'g', '\0'};

    Just like your 2nd example, this isn't allowed either. Unlike the characters in the string literal "string", I think that what is happening is that the characters aren't actually stored anywhere, so there is no address that can be used to initialize charptr. To get the address of something, it has to actually exist in memory somewhere.
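
    For what it's worth, C99 added a way to get an unnamed array that does exist in memory: the compound literal. This is not something the thread itself uses, so treat the following as a sketch with made-up function names, and note that it needs a C99 or later compiler.

```c
/* Sum of an int array reached only through a pointer. */
int sum_via_pointer(void)
{
    /* Compound literal: an unnamed int[5] with real storage,
       so its address can initialize the pointer. */
    int *intptr = (int[]){1, 2, 3, 4, 5};
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += intptr[i];
    return total;                 /* 15 */
}

/* The same trick for a char pointer. */
char second_char_via_pointer(void)
{
    char *charptr = (char[]){'s', 't', 'r', 'i', 'n', 'g', '\0'};
    return charptr[1];            /* 't' */
}
```

    Inside a function, such an array lives only until the enclosing block ends, unlike a string literal, which lasts for the whole program.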
  15. Aug 19, 2010 #14
    Are you allowed to overwrite the string table?

    Because this program produces a segmentation fault:

    char* ptr;
    ptr="this is a test";

    If the line ptr="this is a test";
    inserts the address of the string table where "this is a test" begins into ptr, then from what you say the only reason I can think of for why this doesn't work is that you're not allowed to rewrite a string table.
  16. Aug 20, 2010 #15
    The outcome is undefined. Some compilers/operating systems will allow it, some will put it in a write-protected memory region. Sometimes the compiler might even "save" memory by reusing strings: if you write

    char* p1 = "this is test";
    char* p2 = "this is test";

    it is possible that p1 and p2 will point to the same place.

    Short answer, don't do it. And try to declare pointers to the string table as "const".
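
    A short sketch of that advice, with hypothetical function names: point at literals through a const pointer, and copy into an array when the text needs to change.

```c
/* Pointing at a literal: with a const target, the compiler rejects
   accidental writes such as msg[0] = 'T'. */
const char *greeting(void)
{
    const char *msg = "this is a test";
    return msg;                   /* string literals last for the whole program */
}

/* Need to modify the text? Work on a copy held in an array. */
char capitalized_first(void)
{
    char copy[] = "this is a test";   /* writable copy of the characters */
    copy[0] = 'T';
    return copy[0];
}
```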
  17. Aug 20, 2010 #16
    Thanks everyone. I tried googling examples of strings and C, but none of the examples I found provided your level of knowledge.

    I understand it now.

    I'll stay away from the string table, and just use:

    char string[100]="...";

    so that I can have strings stored in the normal data space instead of a string table. That way I can always change it regardless of compiler, and don't have to worry about initializing pointers.

    Something like:

    char* ptr="...";

    is just asking for trouble.
  18. Aug 22, 2010 #17
    You should learn this, and understand it. If you do not understand the concept of pointers you should forget C.

    You need to understand the data types, and understand the difference between initialization and assignment. They are two entirely different concepts in C and C++.

    First, whenever you declare a pointer, initialize it. ALWAYS.

    char *message = NULL;

    Doing that simple initialization will force errors when you try to do stupid assignments.

    C will allow you to do anything, and sometimes it will print the correct answer, because the stack is clean. But you have to understand it.
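
    One practical benefit of the NULL habit is that a NULL pointer can be tested before use, whereas a garbage address cannot be told apart from a valid one. A minimal sketch, with a made-up helper name:

```c
#include <stddef.h>

/* Returns the first character, or '\0' if the pointer was never set. */
char first_char_or_nul(const char *message)
{
    if (message == NULL)      /* a NULL pointer can be detected */
        return '\0';
    return message[0];
}
```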
  19. Aug 22, 2010 #18
    Would it be correct to say that C prevents you from writing self-modifying code by preventing you access to the addresses where code is stored, and that is the primary reason that not initializing pointers is dangerous? And why can't the language automatically set each pointer to NULL? Is this just a waste of the compiler's time?
  20. Aug 22, 2010 #19
    First, some compilers have options that force automatic initialization, but forget that. C does not guarantee the initial state of any local (automatic) variable. Even if your compiler happens to initialize it for you, relying on that is poor practice.

    Forget pointers. Let's stick to a simple example.

    int a; // what is the value of a? who knows

    int a = 0; // now you know what it is

    Now let's forget strings. They just confuse people.

    int *a = NULL; // it is a pointer, initialized to NULL. Try to do anything in this state and you will get an easy-to-understand error.

    int b[5]; // now we have an array of integers, 5 of them. What are their values? Anything.
    int b[5] = {1,2,3,4,5}; // now we have the same array, with initial values. Cleaner.

    int b[] = { 1,2,3,4,5}; // same as above.

    so you can access b like this.

    b[0] = 2;
    b[1] = 3;
    b[2] = 4;
    b[3] = 5;
    b[4] = 6;
    b[5] = 7; // Oops, this is wrong. The size of b is 5 memory spots, 0-4, not 1-5 and not 0-5. Some compilers will catch this.

    // What is the difference between an array and a pointer? Nothing in how you index them, but a lot in implementation: an array is the block of memory itself, a pointer only holds an address.

    a = &b[0]; // give a the start of our array b

    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    a[3] = 4;
    a[4] = 5;
    a[5] = 6; // again, a big no-no, though the compiler will not catch it. It has no idea how big a is. It knows b, because b is declared space; a is not, it is just a pointer.

    Also notice I did not have to write *a to assign. Why? Because indexing works the same on a pointer as on an array; what differs is how the memory is allocated.
    With a pointer, you assume all liability for where it points, and that the memory block is valid.

    With an array declared inside a function, the compiler allocates it on the stack (you should avoid this).

    instead of this a[0] = 1 you can do this.

    *a = 1;
    and then keep incrementing the pointer, but array notation is easier.

    or even
    *(a+1)=2; // this is position a[1]..

    I have not touched memory allocation because it will confuse the situation.. you have to understand the above examples...

    The beauty of C is that you can do anything, and the pitfall of C is that you can do anything. It is the reason that you learned Pascal first, then C.
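
    The array and pointer examples above can be condensed into a small sketch. The function names are illustrative; the point is that a[i] and *(a+i) are the same access, and that a bare pointer carries no length, so the caller has to supply one.

```c
#include <stddef.h>

/* A pointer carries no length, so the caller passes it along. */
int sum(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);        /* identical to a[i] */
    return total;
}

int sum_of_b(void)
{
    int b[5] = {1, 2, 3, 4, 5};
    size_t count = sizeof b / sizeof b[0];   /* 5; only works on the array itself */
    return sum(b, count);                    /* b decays to &b[0] here */
}
```

    Keeping the count next to the pointer is what prevents the a[5] mistake shown earlier, since the compiler cannot check the bound for you.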
  21. Aug 23, 2010 #20


    Staff Emeritus
    Science Advisor
    Gold Member

    Once upon a time, writing through a random pointer could corrupt the operating system. These days, your only fear is giving a hook for hackers.

    However, errors with pointers are notoriously difficult to debug -- initializing everything is one approach to trying to ameliorate that problem.

    Example 1: You might not notice an error when debugging and seeing that your program wrote through a pointer with value 0xb854cca0, but it stands out if the value is 0x00000000!

    Example 2: A particularly subtle bug is when you sometimes (or always!) fail to initialize a needed variable in your function. However, because of the way it's used, along with compiler, operating system, hardware behavior, that variable by chance picks up a correct or reasonable value. e.g. if a variable is supposed to be initialized to zero, you write a program specifically to try and debug it -- you'll never encounter the error if the operating system initializes stack to all 0's!

    It could. Java takes that approach. C rejects it, I presume for the sake of efficiency -- a very short function call might spend more time in irrelevant initialization than actually doing useful work, and if it gets called a billion times....

    A good compiler might be able to eliminate redundant initializations or delay initialization until it knows it to be useful. But there will be limitations on how well it performs, and offers extra chances for the compiler to produce buggy code. Also, recall that C was standardized back when compilers were fairly dumb, relatively speaking.

    Some compilers or other environments offer "debug" modes that do a lot of things to help you debug, though. e.g. one version of malloc on some systems will fill memory with the byte pattern 0xdeadbeef. So when some of your variables start showing up with that hex value, you know something is wrong. :smile: