[C++] Any suggestions on how to clean up this procedure?

Click For Summary

Discussion Overview

The discussion revolves around improving the implementation of a C++ procedure for parsing HTML tags within a class. Participants are exploring ways to clean up the code, focusing on efficiency and the use of external libraries.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant questions the expected behavior of the function when inputs such as tagBegin, tagEnd, or thisNode are null.
  • Another participant suggests that using std::stringstream to parse class names is unnecessary and advocates for minimizing reliance on external libraries.
  • A later reply challenges the notion of avoiding external libraries, arguing that reinventing solutions can lead to unnecessary complexity.
  • Further, a participant agrees with the idea of limiting external libraries, suggesting that it can be beneficial to create tailored solutions for specific problems.

Areas of Agreement / Disagreement

Participants express differing views on the use of external libraries, with some advocating for their limitation while others defend their utility. The discussion remains unresolved regarding the best approach to clean up the procedure.

Contextual Notes

There are assumptions about the behavior of the function with null inputs that are not fully explored. The discussion also reflects varying opinions on the balance between using established libraries and creating custom solutions.

Jamin2112
Messages
973
Reaction score
12
Any ideas on how I can clean up the following?

Code:
void HtmlParser::parseTag(std::string::const_iterator tagBegin, std::string::const_iterator tagEnd, node & thisNode)

{

/*
tagBegin: pointer to the '<' character that starts the tag

  tagEnd: pointer to the '>' character that ends the tag

thisNode: node element in whose fields the class and identifier info will be entered
E.g. If the string between tagBegin and tagEnd is
        "div class= 'class1 class2     class3' id = 'myId' onlick   =  'myFunction()'"
    then make thisNode.element_type equal to "div"; add "class1", "class2" and "class3" to thisNode.class_set;

    and make thisNode.iden equal to "myId"
Procedure:
    (1) Get the first sequence of characters inside the tag. If it is a proper element type

        expression, set it equal to thisNode's element_type; otherwise, throw an error.
    (2) For each expression after the element type and of the form of valid attribute name-value

        pairs,
            (i)  if the attribute's name is "class", then add each of the classes to thisNode's 

                class_set field by getting each substring of the attribute's value that is 

                separated by whitespace
            (ii) if the attribute's name is "id", then set thisNode's iden field eqal to the 

                attribute's value
*/
    // (1)

    std::string str;

    while (++tagBegin != tagEnd && *tagBegin != ' ')

        str.push_back(*tagBegin);

   

    if (std::regex_match(str, _elementReg))

        thisNode.element_type = str;

    else

        throw"Could not process element type.";

   

    // (2)

    std::regex_iterator<std::string::const_iterator> regit (tagBegin, tagEnd, _attrReg);

    std::regex_iterator<std::string::const_iterator> regend {std::regex_iterator<std::string::const_iterator>()};

    for (; regit != regend; ++regit)

    {

        std::string attrName = (*regit)[1];

        std::transform(attrName.begin(), attrName.end(), attrName.begin(), ::tolower);

        // (2i)

        if (attrName == "class")

        {

            std::stringstream ss((*regit)[2]);

            std::string thisClass;

            while (std::getline(ss, thisClass, ' '))

                thisNode.class_set.insert(thisClass);

        }

        // (2ii)

        else if (attrName == "id")

            thisNode.iden = (*regit)[2];

    }

}
For reference:

Code:
const std::regex HtmlParser::_elementReg("[A-Za-z0-9\\-]");

const std::regex HtmlParser::_attrReg("([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2");
and

Code:
struct node

{

    std::string element_type;

    std::set<std::string> class_set;

    std::string iden;

    std::set<node *> children;

};
 
Last edited:
<1> Input: tagBegin, tagEnd, thisNode
if one of them is null
then what are the expected results ?
<2>
Code:
std::stringstream ss((*regit)[2]);
            std::string thisClass;
            while (std::getline(ss, thisClass, ' '))
                thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of methods in another libraries as much as possible is always good.
 
Medicol said:
<1> Input: tagBegin, tagEnd, thisNode
if one of them is null
then what are the expected results ?
<2>
Code:
std::stringstream ss((*regit)[2]);
            std::string thisClass;
            while (std::getline(ss, thisClass, ' '))
                thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of methods in another libraries as much as possible is always good.

Really? Then I'd be always reinventing the wheel.
 
I'd agree with limiting external libraries as they often come with many wheels, axes, seats, ... and stuff like electric windows, roofs, hifi, etc...
If I just need a wheel I like to reinvent it as it is often easier to adapt them to individual problems...