Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Critique my C++ code

  1. Aug 22, 2013 #1
    These are some functions I'm using to encode and decode text.

    bitstring returns a string of bits representing the ascii values of the characters in the string passed to the function. For example, passing in "Hi" changes it to "0100100001101001" since (int)'H'=72=01001000 and (int)'i'=105=01101001. The function compress_bitstring takes any sequence of 3 to 9 of the same bits and compresses them by putting the number followed by the bit. For example, "0100100001101001" is changed to "01001401101001". Then the other 2 functions are for undoing the functions I just described.



    Code (Text):

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <math.h>


    static const size_t charbits = 8;  // note that sizeof(char) is 1 by the C++ standard


    static void bitstring(std::string& str) {
        std::string str2(charbits*str.length(), '0');
        for (unsigned i = 0; i < str2.length(); i += charbits) {
            int this_ascii = (int)str[i/charbits];
            for (unsigned j = 0; j < charbits ; j++) { // NOTE: ASCII values are 0-127, 00000000-01111111
                if (this_ascii % 2) str2[i+charbits-j-1] = '1';
                this_ascii /= 2;
            }
        }
        str = str2;
    }


    static void undo_bitstring(std::string& str) {
        for (unsigned i = 0; i < str.length(); i += charbits) {
            int ascii_val = 0;
            for (unsigned j = 0; j < charbits; j++) {
                if (str[i+j] == '1') ascii_val += (int)exp2(charbits-j-1);
            }
            str[i/charbits] = (char)ascii_val;
        }
        str.erase(str.begin()+str.size()/charbits, str.end());
    }

    static void compress_bitstring(std::string& str) {
        std::stringstream ss;
        std::string cur_num;
        int count = 1;
        for (unsigned i = 0; i < str.size()-1; ++i) {
            if (str[i] == str[i+1]){
                ++count;
                cur_num = str[i];
                if (i == str.size()-2) {
                    ss << count << cur_num;
                } else {
                    if (count == 9) {
                        ss << count << cur_num;
                        count = 0;
                    }
                }
            } else if (count > 2) {
                ss << count << cur_num;
                count = 1;
                if (i == str.size()-2) {
                    ss << str[i+1];
                }
            } else if (i != str.size()-2) {
                for (int j = 0; j < count; ++j) {
                    ss << str[i];
                }
                count = 1;
            } else {
                ss << str[i] << str[i];
            }
        }
        str = ss.str();
    }

    static void decompress_bitstring(std::string& str) {
        unsigned i = 0;
        int count;
        std::stringstream ss;
        while (i < str.size()) {
            std::stringstream to_int;
            to_int << str[i];
            to_int >> count;
            if (count > 1){
                for (int j = 0; j < count; ++j) {
                    ss << str[i+1];
                }
                i = i + 2;
            } else {
                ss << str[i];
                ++i;
            }
        }
        str = ss.str();
    }
     
     
  2. jcsd
  3. Aug 26, 2013 #2
    Firstly, where are the tests? Comments? Both are important, and would even increase the odds on people taking the time to check over your code, since it makes it easier to read, and gives them a way to run it without writing their own test code.

    Why static functions? Do you really intend them to only be visible in this compilation unit? Make sure you know what static does to a function in C++.

    No putting things on one line that should really be on two or more. I still suggest using curly braces, even when not strictly necessary, but definitely don't mash things onto one line just to save space.

    Use #include <cmath>, not #include <math.h>. Though they both should work, using math.h *in C++* is deprecated. It may stop working, eventually. Same goes for a number of other header files with a C origin.

    You can use <climits>, also, which contains CHAR_BIT, which you should most definitely use rather than defining your own. Try not to assume its value is 8.

    Use static_cast and friends, and not C style casts. For a good explanation, see the top answer here: c++ - Should I really use static_cast every single time I want to convert between primitive types? - Stack Overflow

    In for loops, use pre-increment, not post. For one, it is simply idiomatic and semantically correct. In some cases, it can be faster, but I won't get into that. But consider the difference between 'x = ++y;' and 'x = y++'. Then think which is more appropriate, semantically, for the for loop increment. I found this, and the top answer seems ok: c++ - Avoid Postfix Increment Operator - Programmers Stack Exchange

    I'd normally suggest using bit shifting, masks, and so on rather than multiply/divide/exp2/etc when using factors of 2. It's faster, generally, and any programmer that doesn't understand what's going on is probably not well qualified. However, all that is still more complicated than necessary.

    I'd suggest you look very closely at bitset. I wrote my own versions of your bitstring and undo_bitstring functions using bitset, and it is far, far simpler and more easy to maintain. Do these functions *really* end up being the bottleneck in a program, such that you need to optimize them? Doubtful. Don't optimize unnecessarily. Don't be wasteful and inefficient, sure, but don't waste time and increase maintenance costs just to squeeze a few nanoseconds, either! In this case, use the already written bit handling code in bitset. Consider that over 100,000 iterations with the string 'Hi' to binary digits and back with bitset, on my system, took slightly over 100ms, whereas bit shifting and so on was about 55ms. Not worth the trouble in most cases.

    Not huge on the way you're modifying the parameter, either, though technically ok (not a const parameter). Since your str = str2; line will copy anyways, why not just return the result by value? Still, particularly depending on intended use, there's room for argument, here, I suppose.

    You might also document your functions. Look at doxygen. For example:

    Code (Text):

    /*! Convert string parameter to binary representation.
     *
     * \param str String of characters to convert to a string representation of bits.
     * \return String of parameter's binary representation.
     */
     
    A note about comments that can't hurt: They should be brief and to the point, explaining what you're doing in general or pointing out complicated parts. Let them guide the reader. They should aid any reader in comprehending the code quickly, and yet not space out the code unnecessarily or be annoying to maintain. You must *always* check comments related to code when you change the code, so overdoing it isn't great either. It's a tough line to walk, but no comments is almost always too few.

    Use size_t when appropriate. Why use unsigned (int) when it should be size_t?

    I've long thought that knowing C is sometimes a barrier to learning C++ properly. In a way, forget that the two languages are related, and don't just do things the way you might in C by default. It's so often not the right way to do it. Just a personal observation.

    Do check out features in the new C++11 standard, as well. There's some awesome bits in there. Ranged based for loops, initializer lists... Much goodness.

    Haven't really had much of a look at the compress/decompress functions, but you should have plenty to work with, I hope. If you post bitset based versions of the bitstring and undo_bitstring functions, I'll show you what I wrote and we can compare.

    I also wrote my own versions of a function to calculate the mode of a vector of integers (mentioned in the last thread you posted like this). Also a good exercise for you, perhaps.

    Comments on my comments are welcome, as usual, from anyone. Hope this helps you! Good luck.
     
  4. Aug 26, 2013 #3
    How did you do it?

    I think that making a binary tree class does the job the best. Any comments on the following code?

    Code (Text):

    using namespace std;

    class binArr {
       
    private:
        typedef struct node {
            node* next[2];
            int cnt;
        };
        node* root;
        node* newNode();
        void clearBranch(node*);
        int mostVal;    
        int mostCnt;

        public:
        binArr();
        ~binArr();
        void add(int);
        void addArray(int, int*);
        int getMost(){return mostVal;}
    };

    void binArr::addArray(int size, int* intArr) {
        for (int i = 0; i < size; i++) {
            add(intArr[i]);
        }
    }

    void binArr::add(int newVal) {
        node* thisPtr = root;
        int thisVal = newVal;
        while(!thisVal) {
            int nextBit = thisVal % 2;
            if (thisPtr->next[nextBit] == NULL) {
                thisPtr->next[nextBit] = newNode();
            }
            thisPtr = thisPtr->next[nextBit];
            thisVal /= 2;
        }
        thisPtr->cnt++;
        if (thisPtr->cnt > mostCnt) {
            mostCnt = thisPtr->cnt;
            mostVal = newVal;
        }
    }

    binArr::node* binArr::newNode() {
        node* newNode = new node;
        newNode->cnt = 0;
        newNode->next[0] = NULL;
        newNode->next[1] = NULL;
        return newNode;
    }


    void binArr::clearBranch(node* branch) {
        if (branch) {
            clearBranch(branch->next[0]);
            clearBranch(branch->next[1]);
            delete branch;
        }
    }

    binArr::binArr() {
        root = newNode();
        mostCnt = 0;
    }

    binArr::~binArr() {
        clearBranch (root);
    }

    int main() {
        int arr[] = {9, 3, 2, 11, 87, 4, 3, 9, 21, 11, 91, 11, 9, 2, 9};
        binArr B;
        B.addArray(sizeof(arr)/sizeof(int), arr);
        cout << "The mode of arr is " << B.getMost();
        return 0;
    }
     
     
  5. Aug 26, 2013 #4
    Should I make a separate .h file that has the machinery for running tests?
     
  6. Aug 27, 2013 #5
    In general, you can have a .h for your class/function declarations as usual, and perhaps a separate test program that uses them. There are many ways to do it. Though for our purposes, a main() in the same file that runs through some more comprehensive tests works, and makes our lives easier (no Makefiles and linking to worry about, for example). In real life, you'd maybe look at some kind of test framework.

    Some additional tests might reveal an issue with your (mode) algorithm. For example, add the number 43 to the end of the array and see what happens.

    Other than that, incorporate advice you've already received, I'd say. A lot of previous advice you've gotten applies to the mode code as well.
     
  7. Aug 27, 2013 #6
    It returned 43 as the mode. I can't figure out why this happened though .....
     
  8. Aug 27, 2013 #7
    Looking at your add() function, there's some issues there. It's not doing what you're expecting, so find out exactly what's happening.

    I usually develop with various std::cout calls to see what's going on while I develop. But you also need to really learn to use and love your debugger. Step through the code and see what happens. It should be apparent rather quickly what it's doing and why it always returns the last element in the input array.

    I've noticed so many professional developers that don't use the debugger, and that's a shame. I've never understood why that is, but don't be one of those people. Possibly some of the most important advice I give to neophite programmers, and sadly one of the most ignored.
     
  9. Aug 27, 2013 #8
    That's great advice. Thanks!
     
  10. Sep 22, 2013 #9
    Your latest thread reminded me of this one, and that I had written my own code to show you how I might approach these problems. I may as well share them so you have the chance to learn from them. You've already done the work, and enough time has passed that it's not likely to be of use to you in cheating on anything, so I presume the mods won't mind. Here are far simpler versions of your encoding/decoding functions:

    Code (Text):

    #include <string>
    #include <bitset>
    // For CHAR_BIT, i.e. number of bits per char. Usually 8 bits, but you
    // never know.
    #include <climits>

    /*! Convert string parameter to binary representation.
     *
     * \param str The character string to convert to a string of bits.
     * \return String of parameter's binary representation.
     */
    std::string bitstring(const std::string& str)
    {
        std::string result;
        result.reserve(str.size()*CHAR_BIT);

        for (size_t i = 0; i < str.size(); ++i)
        {
            result.append(std::bitset<CHAR_BIT>(str[i]).to_string());
        }

        return result;
    }

    /*! Convert string with binary representation back to a string of the
     *  characters it represents.
     *
     * \param str The character string of bits to convert back to a string if
     *     characters.
     * \return String with characters represented by the bits in the input string
     *     parameter.
     */
    std::string undo_bitstring(const std::string& str)
    {
        std::string result;
        size_t num_chars = str.size() / CHAR_BIT;
        result.reserve(num_chars);

        for (size_t i = 0; i < num_chars; ++i)
        {
            std::string sub_str = str.substr(i*CHAR_BIT, CHAR_BIT);
            char c = static_cast<char>(std::bitset<CHAR_BIT>(sub_str).to_ulong());
            result.append(1, c);
        }

        return result;
    }
     
    You could make faster versions, but the added complexity isn't worth the loss of clarity, unless it becomes a bottleneck. Note that I'm returning the resulting string by value. Return Value Optimization (RVO) will almost certainly eliminate the copying, anyways. Unless you find it to be a problem in a profiler, as a general rule, don't try and optimize prematurely. Don't be wantonly inefficient, either, mind you. For example, reserving space in those strings doesn't really make it harder to follow, and it's easy to see why that could help performance.

    If anyone has any constructive criticism of this code, please do share. I'm always looking for ways to improve.

    I also have a version of the mode finding function that is more efficient, flexible, shorter and more understandable. If you show me a working version of yours, I'll show you mine. :)
     
  11. Sep 22, 2013 #10
    Can you give me a hint about what you did?
     
  12. Sep 22, 2013 #11
    One hint: unordered_map. :)

    For bonus points, so to speak, make it a template that can operate on a vector of arbitrary element types. For example, maybe it can operate on a vector of strings. The logic is basically the same.

    I'll just add that I did use features from C++11. For g++, you can add the option '-std=c++11' to enable it.
     
  13. Sep 22, 2013 #12
    k I'll work on this
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook