Efficient C++ String Manipulation: Tips and Tricks from Experts

Mark44 · Oct 12, 2020

jtbell said:

Of course, nobody would actually do it that way in practice, but it makes explicit the fact that str2 has length 4 whereas str1 has length 3.

Just to be clear, str2 has 4 bytes allocated, but its length, as determined by strlen(), would be 3.

jtbell · Oct 12, 2020

Yep, I should have said "size" (which includes the terminating null in a C-string) instead of "length" (which doesn't).

Some fun with size versus length:

C++:

#include <cstring>   // for strlen()
#include <iostream>
#include <iomanip>

using namespace std;

//--------------------------------------------------------------------------
// Display the contents of a char array as 2-digit hexadecimal numbers, etc.

void DisplayHex (char carray[], int size)
{
    cout << "as hex: ";
    for (int k = 0; k < size; ++k)
    {
        cout << hex << setfill('0') << setw(2) << (int)carray[k] << " ";
    }
    cout << dec << endl;   // switch back to decimal; hex stays in effect until changed
    cout << "as C-string: " << carray << endl;
    cout << "sizeof = " << size << endl;
    cout << "strlen = " << strlen(carray) << endl;
}

//--------------------------------------------------------------------------

int main ()
{
    char animal[] = {'c', 'a', 't', '\0'};
    cout << "\nChar array #1:  c a t \\0" << endl;
    DisplayHex (animal, sizeof(animal));

    char animals[] = {'c', 'a', 't', '\0', 'd', 'o', 'g'};
    cout << "\nChar array #2:  c a t \\0 d o g" << endl;
    DisplayHex (animals, sizeof(animals));

    cout << endl;

    return 0;
}

Output:

Code:

Char array #1:  c a t \0
as hex: 63 61 74 00 
as C-string: cat
sizeof = 4
strlen = 3

Char array #2:  c a t \0 d o g
as hex: 63 61 74 00 64 6f 67 
as C-string: cat
sizeof = 7
strlen = 3

Tom.G · Oct 12, 2020

yungman said:

I have to stress I can't thank you and a few others enough in helping me through this leaning process.

My oh my, how apropos!

yungman · Oct 13, 2020

Tom.G said:

My oh my, how apropos!

What do you mean? I have been thanking everyone that help me all the time. Read my past posts. I can't say enough thanks to Mark and Jarvis lately for all the help.

yungman · Oct 13, 2020

Hi
Other than working on an algebra problem for my granddaughter that I got stuck on! :))

I have been reviewing pointers again, as I have to use them in the exercises on class objects. I thought I understood them very well... until I really got down to looking, step by step, at the addresses and their contents in the debugger. I have learned a lot more. I have no question; I just want to say that pointers are still the hardest subject so far in C++ for me. I actually rewrote this part of my notes, using a working program and the Immediate Window with breakpoints to show, step by step, the addresses and contents and how the pointers work. I attached that page of my notes. I really got into it this time.

Thanks for all the help from all you guys.

Vanadium 50 · Oct 13, 2020

Tom.G said:

My oh my, how apropos!

yungman said:

What do you mean?

He means you said "leaning" and not "learning".

yungman said:

I have been thanking everyone that help me all the time. Read my past posts. I can't say enough thanks to Mark and Jarvis lately for all the help.

There have been 21 people who have helped you over the last bunch of threads. Deciding to thank two of them may be viewed as a snub by the other 19.

yungman · Oct 13, 2020

Vanadium 50 said:

He means you said "leaning" and not "learning".
There have been 21 people who have helped you over the last bunch of threads. Deciding to thank two of them may be viewed as a snub by the other 19.

Did you read that I said "lately"? I thank everyone who helps me.

yungman · Oct 13, 2020

I just want to tell this as it can be funny...not for me!

I spent the time since yesterday working on pointers because I have a program on class objects that gave me about 20 errors on the funniest things. A class in a header file is similar to a function, just external to the main file, so I worked on pointers in function calls to make sure I really understand this.

So I finally went back to the class object program... still 20+ errors. Sure, I saw some problems with the pointers, which became very obvious after working on them. I fixed all those, but still 20+ errors. I started to look around because I had faith in the pointer part. Guess what I found?! I forgot to put using namespace std; in the .h file! I put it in, and it compiled on the first try. Something is still not working, but this is a big step forward. Now it's just regular troubleshooting.

I am glad I spent last night and this morning working on pointers to pointers. Now it's crystal clear to me. I can spot the mistakes in the class program just like that!

That's why I keep making up programs to revisit the older topics from the earlier chapters. Now that I have reviewed pointers and dynamic memory allocation, next is files, with the seekp() and seekg() member functions. I feel this is the best way to learn for an old leaky brain. Just do it over and over.

Tom.G · Oct 13, 2020

Yes! As they say, "Practice makes perfect."

Don't try to rush thru everything; try for a deeper understanding before jumping forward.

Many of us will read thru most or all of the documentation before we try a new language, or even a new program. That way we get a general idea of how it all fits together and an impression of the thought processes needed to understand it.

It sounds like you are starting to get this realization. Follow thru on it and you may find it actually works for you! If not, you can always fall back on what does work.

Cheers,
Tom

yungman · Oct 13, 2020

Thank you for the encouragement. I needed that, particularly since I was tripped up by the algebra question from my little granddaughter. She's only a freshman in high school! We Chinese call this "the old cat got its whiskers burned."

Thanks

Jarvis323 · Oct 13, 2020

Just a heads up, it's not good to put using namespace... in a header file. If you put it in a source file, it's usually ok because it only affects the source file, but if you do it in a header file, it applies to every file that includes it, and every file that includes a file that includes it.

yungman · Oct 14, 2020

Jarvis323 said:

Just a heads up, it's not good to put using namespace... in a header file. If you put it in a source file, it's usually ok because it only affects the source file, but if you do it in a header file, it applies to every file that includes it, and every file that includes a file that includes it.

Thanks. Why is that so? I know I read it from Mark also, but the books keep using it. I know I need to type std::cout << ...; every time if I don't use using namespace std. I was wondering why my invItem.h needs it. I don't have cin or cout:

C++:

#ifndef invItem_H
#define invItem_H
#include <cstring>
#include <string>
using namespace std;
class invItem
{
private:
    string* description;
    double cost;
    int units;
public:
//Constructor. pointer desc points to C-string, c for cost, u for units.
    invItem(string** desc, double c, int u)
    {
        string St2 = **desc;//St2 = St1 in main.
        int length = St2.length();
        description = new string[length];//allocate new memory using pointer description.
        *description = St2;//copy the description to memory
        *desc = description;
        cost = c; units = u;
    }

Which part of this code needs using namespace std? Do I just put std:: in front of the ones that need it? I just never looked at this at all.

Thanks

Jarvis323 · Oct 14, 2020

Like I said, it's usually fine in a cpp file, but should be avoided in .h files.

The answers here explain it better than I can.
https://stackoverflow.com/questions/5849457/using-namespace-in-c-headers

std:: goes in front of string, cout, endl, ifstream, or any other name from the C++ standard library.

yungman · Oct 14, 2020

Jarvis323 said:

Like I said, it's usually fine in a cpp file, but should be avoided in .h files.

The answers here explain it better than I can.
https://stackoverflow.com/questions/5849457/using-namespace-in-c-headers

std:: goes in front of string, cout, endl, ifstream, or any other name from the C++ standard library.

Thank you, no wonder I got errors lining up on me! I only knew about cin and cout!

That I can do on the .h files from now on.

yungman · Oct 14, 2020

I changed my .h files to use std::, and it works, no big deal. I'll just use that from now on.

I have a general question. The book uses C-strings and char arrays a lot. I find using std::string a lot better. I don't have to worry about length, and copying is easy: just St1 = St2 will copy it over. Setting up dynamic memory and files is easier. I don't see any advantage in using C-strings. Not to mention, cout is easier, just cout << St1; and that's it! I have been changing all the programs in the book from C-strings to std::string for a while already. I hate dealing with the length when passing back and forth to functions and all that.

Tell me what am I missing.

Thanks

pbuk · Oct 14, 2020

yungman said:

Tell me what am I missing.

You are missing the point that was discussed about 500 posts ago in a different thread that the ease of use of the Standard Template Library (STL) classes comes with three penalties:

1) The compiler has to add the relevant parts of the STL code to your executable. If you are writing code for a microcontroller in a toaster or a fighter jet you may not have enough program memory to store it.
2) STL programs achieve their ease of use by liberal use of heap memory: for instance when you concatenate a single char to a 128 char long string it will leave the existing string on the heap and add a new string 129 chars long. If you are writing code for a microcontroller in a toaster or a fighter jet you may not have enough RAM to store it all, potentially leading to burnt toast or losing control of your plane.
3) From time to time you may need to deal with an ever-growing heap by deleting the no longer needed space and moving the current references into it - this is called 'garbage collection' (GC). The STL can do this for you, or you can do it yourself, but whichever way you do it you can't use the heap while it is happening. Toast that is overdone by 100 ms is not a problem, but it could be critical in, say, the stability controller of an aeroplane. Even with greater computing power, GC can be a problem in applications where timing is critical, e.g. Digital Signal Processing (DSP).

So only use the STL when you have enough resources to do so and timing is not critical. But of course if you have plenty of resources and timing is not critical then you may as well use a more modern language than C++ that is easier to program in and debug! Microcontrollers with multi-core processors and large memories are now on the market that allow you to do exactly that e.g. in CircuitPython, Lua or Node.js.

yungman · Oct 14, 2020

Thanks so much for the reply. I did not know that. I don't recall I ever ask this question. I don't even know enough to ask.

Yes, this is a VERY VERY important point, particularly since all the CPU-related designs were MPUs with very limited RAM. I have been saying all along that higher-level languages and all the fancy programming styles assume there is an unlimited amount of resources. I have also complained a lot about how slow the newer stuff is.

Thank you.

Jarvis323 · Oct 14, 2020

In C++ there is no garbage collection. But variable size STL containers may allocate more than they need at the moment so they don't need to reallocate so frequently. For example, if you add to a string and the new length is longer than the allocated memory, then it will need to allocate new memory. But it will always delete the old memory at that time. Also, when the string goes out of scope, the memory is freed. There will never be garbage piling up so to speak.

Mark44 · Oct 14, 2020

yungman said:

The book uses C-strings and char arrays a lot.

These are almost the same, with the only difference being that a C-string is a null-terminated array of type char, and a char array is just an array of type char.
The C standard library functions rely on a string being null-terminated so that copying strings works correctly, the length is calculated correctly, appending strings works correctly, and so on.

yungman said:

C++:

#ifndef invItem_H
#define invItem_H
#include <cstring>                                   // 1) Delete this line
#include <string>                                            
using namespace std;
class invItem
{
private:
    string* description;
    double cost;
    int units;
public:
//Constructor. pointer desc points to C-string, c for cost, u for units.                   // 2) comment incorrect
    invItem(string** desc, double c, int u)
    {
        string St2 = **desc;//St2 = St1 in main.
        int length = St2.length();
        description = new string[length];//allocate new memory using pointer description.
        *description = St2;//copy the description to memory
        *desc = description;
        cost = c; units = u;
    }

The line with "#include <cstring>" should be deleted. The code shown doesn't use any C-strings.
The comment for the constructor is wrong on two counts.
1) The desc parameter is a pointer to a pointer to a string object.
2) The object being pointed to is not a C-string (which would be a null-terminated char array).

yungman · Oct 14, 2020

Thanks guys, I am the student here, so is it not good to use strings over c-strings, or it's not that much difference? This is way over my head; I don't know anything about the internal mechanisms of these.

My issue with C-strings is that when I pass one into a function, and the function wants to change the content before returning to main, I have to keep track of the different length. If the returned C-string is longer, that won't work, as the length is already defined in main. I have to play around with it. For strings, it doesn't matter. If it is only more complicated at the compiling stage, that's OK. If it takes up much more memory during execution, that's not good.

Of course, I can always make the array longer, but then I have to be careful with the terminating character. I ran into a problem before where the program output a bunch of garbage because the array was longer than the C-string and it kept outputting garbage!

Thanks

Mark44 · Oct 14, 2020

yungman said:

Thanks guys, I am the student here, so is it not good to use strings over c-strings, or it's not that much difference?

There's a big difference between C-strings (i.e., null terminated character arrays) and the standard template library (STL) string template class. C-strings are so called because this is how strings were implemented in C, before C++ came on the scene.
A C-string consists of consecutive bytes in memory, and nothing else. A string object contains methods, such as length and size and a lot more, in addition to the characters that make up the string.

yungman said:

My issue with C-strings is that when I pass one into a function, and the function wants to change the content before returning to main, I have to keep track of the different length. If the returned C-string is longer, that won't work, as the length is already defined in main.

When you pass a C-string to a function, what's being passed is a pointer to the first character of the string. If your function is intended to modify a C-string parameter, you'll run into problems if the function tries to insert more characters than the C-string had allocated to it.
Here's a simple example of what I'm talking about.

C++:

#include <cstddef>        // For std::size_t
#include <cstring>        // For strcpy_s()
#include <iostream>
using std::cout;
using std::endl;

void modifyStr(char * str, std::size_t destSize);

int main()
{
    char Str[10] = "dogs";
    cout << "Before modification: " << Str << endl;
    modifyStr(Str, sizeof(Str));
    cout << "After modification: " << Str << endl;
}

// destSize is the capacity of the destination buffer, not the source length.
void modifyStr(char * str, std::size_t destSize)
{
    strcpy_s(str, destSize, "hot dogs");
}

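The copy will work as long as I don't try to copy a string that's too long to fit in the 10 bytes that are allocated for the Str array.
The strcpy_s() function is a bounds-checked alternative to the older standard library C function strcpy(). It is provided by Microsoft's C runtime and is specified in the optional Annex K of the C11 standard, so it is not available with every compiler.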

yungman · Oct 14, 2020

Thanks Mark

Yes, I understand that we only pass the address of the C-string to the function, so if I expect the function to change it, I have to declare the char array long enough to cover the extra length, like in your program.

My question is whether I can make using std::string not wasting more space than using c-string. I have not studied the std::string class in detail; I only saw the chapter on the std stuff. I don't have any idea about the memory usage and speed. I have been using std::string in my programs for a while because it's much easier to use. If there is a true disadvantage in RAM usage and running speed, I have to go back and practice using C-strings. I have to say I am not nearly as familiar with C-strings as with the real string class, as I decided a while back to concentrate on using std::string. So I need to determine whether I need to backtrack.

I am glad I asked this question; I never stopped to think about this. std::strings are just so much easier to use. On top of that, it's easy to copy, concatenate and do all the other things. A lot of C-string manipulation has to specify the length and all.

Thanks

Mark44 · Oct 14, 2020

yungman said:

My question is whether I can make using std::string not wasting more space than using c-string.

A std::string object will generally take up more space in memory, I believe, because in addition to the memory used to store the characters in the string, there is also memory set aside in reserve, in case you append some more characters to the string.

yungman · Oct 14, 2020

Mark44 said:

A std::string object will generally take up more space in memory, I believe, because in addition to the memory used to store the characters in the string, there is also memory set aside in reserve, in case you append some more characters to the string.

Ha ha, wrong answer! That's not I want to hear! I thought if I kept asking, I'd get a different answer!

Yeah, that's what I was afraid of. I am working on a new program and I just got part 1 working; I have already stopped and am changing from strings to C-strings, starting over right now. Well, it's better to know now than later. I am sure I will have a little learning curve, as I am more familiar with strings at this point.

Thanks for your time.

Mark44 · Oct 14, 2020

yungman said:

That's not I want to hear!

We're just talking about a few bytes, say 20 to 30, so it's not like a massive hit on memory.

yungman · Oct 14, 2020

Mark44 said:

We're just talking about a few bytes, say 20 to 30, so it's not like a massive hit on memory.

Ah, it's time for me to get more familiar with c-string anyway, what is an extra day or so.

Thanks

yungman · Oct 14, 2020

Come to think of it, suppose you want to input names: some Europeans have really long names. If you have to declare a char array to accommodate them, you have to set up a long array. It might end up wasting space for people with short names. std::string might not be bad in this case.

BTW, surprisingly, I already made part 1 of my program work with C-strings, without much trouble. I just don't need as many pointers, as the array name itself is an address. It does make passing parameters easier.

pbuk · Oct 14, 2020

Jarvis323 said:

In C++ there is no garbage collection.

Yes, good catch. What I was talking about was actually heap fragmentation, but the effect on a microcontroller is much the same.

yungman · Oct 14, 2020

pbuk said:

Yes, good catch. What I was talking about was actually heap fragmentation, but the effect on a microcontroller is much the same.

Ha ha, You ruin my day!

But thank you for saying that. It's better to find out now than later. I am changing my program to c-string right now.

Thanks

jtbell · Oct 14, 2020

A std::string is a more complicated object (internally) than a C-string, and takes up more memory than a C-string (which is after all just an array of chars). For example, a std::string keeps track of how many characters it contains, as an integer (typically a std::size_t) stored somewhere inside it.

You gain something in exchange for the extra space and internal complexity: the ability to do easily things that might need a lot of work on your part if you had used C-strings. Consider this:

C++:

string firstName, lastName;
cout << "Enter your first name: ";
cin >> firstName;
cout << "Enter your last name: ";
cin >> lastName;
string fullName = firstName + " " + lastName;

Suppose you use C-strings instead. You now have to tell the user how many characters (max) that he can enter for each name, and you should be ready to deal with a stubborn user who tries to enter too many characters. With std::string, no problem. Each string allocates enough memory to accommodate whatever the user enters.

Then, when you put the names together to get the full name, you have to calculate how many total characters there are, and dynamically allocate an array of the correct size using "new". std::string takes care of all this for you automatically.

So there's a tradeoff. If you need to work close to hardware level, with "small" processors and limited amounts of memory, then C-style ways of doing things may be better, at the cost of you having to spend more programming time to get things to work right in all circumstances. If you're working with a typical desktop computer, tablet, or even phone, you have oodles of memory and lots of computing power, and can easily afford to let it do more of your work for you.
