
[C++] Any suggestions on how to clean up this procedure?

  1. Nov 23, 2014 #1
    Any ideas on how I can clean up the following?

    Code (Text):

    void HtmlParser::parseTag(std::string::const_iterator tagBegin, std::string::const_iterator tagEnd, node & thisNode)
    {
        /*
            tagBegin: iterator to the '<' character that starts the tag
              tagEnd: iterator to the '>' character that ends the tag
            thisNode: node element in whose fields the class and identifier info will be entered

            E.g. If the string between tagBegin and tagEnd is
                    "div class= 'class1 class2     class3' id = 'myId' onclick   =  'myFunction()'"
                then make thisNode.element_type equal to "div"; add "class1", "class2" and "class3" to thisNode.class_set;
                and make thisNode.iden equal to "myId"

            (1) Get the first sequence of characters inside the tag. If it is a proper element type
                expression, set it equal to thisNode's element_type; otherwise, throw an error.
            (2) For each expression after the element type that has the form of a valid attribute name-value pair:
                    (i)  if the attribute's name is "class", then add each of the classes to thisNode's
                         class_set field by getting each substring of the attribute's value that is
                         separated by whitespace
                    (ii) if the attribute's name is "id", then set thisNode's iden field equal to the
                         attribute's value
        */

        // (1)
        std::string str;
        while (++tagBegin != tagEnd && *tagBegin != ' ')
            str.push_back(*tagBegin);

        if (std::regex_match(str, _elementReg))
            thisNode.element_type = str;
        else
            throw "Could not process element type.";

        // (2)
        std::regex_iterator<std::string::const_iterator> regit (tagBegin, tagEnd, _attrReg);
        std::regex_iterator<std::string::const_iterator> regend {std::regex_iterator<std::string::const_iterator>()};
        for (; regit != regend; ++regit)
        {
            std::string attrName = (*regit)[1];
            std::transform(attrName.begin(), attrName.end(), attrName.begin(), ::tolower);

            // (2i)
            if (attrName == "class")
            {
                std::stringstream ss((*regit)[2]);
                std::string thisClass;
                while (std::getline(ss, thisClass, ' '))
                    thisNode.class_set.insert(thisClass);
            }
            // (2ii)
            else if (attrName == "id")
                thisNode.iden = (*regit)[2];
        }
    }


    For reference:

    Code (Text):

    const std::regex HtmlParser::_elementReg("[A-Za-z0-9\\-]");

    const std::regex HtmlParser::_attrReg("([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2");


    Code (Text):

    struct node
    {
        std::string element_type;
        std::set<std::string> class_set;
        std::string iden;
        std::set<node *> children;
    };
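
    Reading the posted regexes against the function suggests a few things worth checking before any cleanup. std::regex_match requires the whole string to match, so "[A-Za-z0-9\\-]" without a trailing + accepts only one-character element names. In _attrReg, capture group 2 is the quote character and group 3 is the attribute value, so (*regit)[2] would not yield the value. And std::getline with ' ' as the delimiter returns empty tokens for runs of spaces. The following is only a sketch under those observations, not the poster's code: Node and parseTagSketch are hypothetical names, and the function takes the text between '<' and '>' as a std::string rather than a pair of iterators.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the thread's `node` (children omitted).
struct Node {
    std::string element_type;
    std::set<std::string> class_set;
    std::string iden;
};

// Parses the text between '<' and '>' (exclusive), e.g. "div class='a b' id='x'".
inline Node parseTagSketch(const std::string& tag) {
    // Same patterns as in the post, except for the added '+' on the element name.
    static const std::regex elementReg("[A-Za-z0-9\\-]+");
    static const std::regex attrReg("([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2");

    Node n;

    // (1) The element type is the leading run of non-whitespace characters.
    auto nameEnd = std::find_if(tag.begin(), tag.end(),
                                [](unsigned char c) { return std::isspace(c) != 0; });
    const std::string name(tag.begin(), nameEnd);
    if (!std::regex_match(name, elementReg))
        throw std::runtime_error("Could not process element type.");
    n.element_type = name;

    // (2) Walk every name=value pair that follows the element type.
    for (std::sregex_iterator it(nameEnd, tag.end(), attrReg), end; it != end; ++it) {
        std::string attrName = (*it)[1].str();
        std::transform(attrName.begin(), attrName.end(), attrName.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        const std::string value = (*it)[3].str();  // group 2 is the quote, group 3 the value

        if (attrName == "class") {
            // operator>> skips any amount of whitespace, so no empty tokens appear.
            std::istringstream ss(value);
            std::string cls;
            while (ss >> cls)
                n.class_set.insert(cls);
        } else if (attrName == "id") {
            n.iden = value;
        }
    }
    return n;
}
```

    Throwing std::runtime_error instead of a string literal lets callers catch by const std::exception&, which is a design preference rather than a requirement.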

    Last edited: Nov 23, 2014
  3. Nov 29, 2014 #2
    Thanks for the post! This is an automated courtesy bump. Sorry you aren't generating responses at the moment. Do you have any further information, come to any new conclusions or is it possible to reword the post?
  4. Nov 30, 2014 #3
    <1> Inputs: tagBegin, tagEnd, thisNode.
    If one of them is null, what is the expected result?
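
    Since tagBegin and tagEnd are std::string iterators rather than raw pointers, and thisNode is a reference, none of them can literally be null. The closest degenerate case is probably an empty range: with tagBegin == tagEnd, the loop condition ++tagBegin != tagEnd steps past tagEnd before comparing. A minimal guard might look like the sketch below; looksLikeTag is a hypothetical helper, not something from the original post.

```cpp
#include <iterator>
#include <string>

// Returns true when tagBegin comes strictly before tagEnd and points at '<'.
// Assumes both iterators refer to the same string.
inline bool looksLikeTag(std::string::const_iterator tagBegin,
                         std::string::const_iterator tagEnd) {
    return std::distance(tagBegin, tagEnd) >= 1 && *tagBegin == '<';
}
```

    A caller could check this once before invoking parseTag and decide there whether to throw or skip the tag.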
    Code (Text):

    std::stringstream ss((*regit)[2]);
    std::string thisClass;
    while (std::getline(ss, thisClass, ' '))
    Introducing stringstream is unnecessary in this case. Limiting your use of functions from other libraries as much as possible is always good.
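
    As one possible reading of this suggestion, the class list can be split with std::string's own search members instead of a stream. The sketch below is an illustration under that assumption; splitClasses is a hypothetical helper name, and the whitespace set is a guess at what should count as a separator.

```cpp
#include <set>
#include <string>

// Inserts every whitespace-separated token of `value` into `out`.
inline void splitClasses(const std::string& value, std::set<std::string>& out) {
    const char* ws = " \t\r\n\f";
    std::string::size_type pos = value.find_first_not_of(ws);
    while (pos != std::string::npos) {
        const std::string::size_type end = value.find_first_of(ws, pos);
        out.insert(value.substr(pos, end - pos));  // end == npos takes the rest of the string
        pos = value.find_first_not_of(ws, end);    // skip the whole run of whitespace
    }
}
```

    Unlike std::getline with ' ' as the delimiter, this skips runs of spaces, so a value such as 'class1 class2     class3' produces no empty entries.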
  5. Dec 1, 2014 #4
    Really? Then I'd be always reinventing the wheel.
  6. Dec 3, 2014 #5
    I'd agree with limiting external libraries, as they often come with many wheels, axles, seats, ... and extras like electric windows, roofs, hi-fi, etc.
    If I just need a wheel, I like to reinvent it, as it is often easier to adapt it to an individual problem.