C/C++ [C++] Any suggestions on how to clean up this procedure?

Jamin2112
Any ideas on how I can clean up the following?

Code:
void HtmlParser::parseTag(std::string::const_iterator tagBegin, std::string::const_iterator tagEnd, node & thisNode)
{
/*
tagBegin: iterator to the '<' character that starts the tag
  tagEnd: iterator to the '>' character that ends the tag
thisNode: node element in whose fields the class and identifier info will be entered
E.g. If the string between tagBegin and tagEnd is
        "div class= 'class1 class2     class3' id = 'myId' onclick   =  'myFunction()'"
    then make thisNode.element_type equal to "div"; add "class1", "class2" and "class3" to thisNode.class_set;
    and make thisNode.iden equal to "myId"
Procedure:
    (1) Get the first sequence of characters inside the tag. If it is a proper element type
        expression, set it equal to thisNode's element_type; otherwise, throw an error.
    (2) For each expression after the element type and of the form of valid attribute name-value
        pairs,
            (i)  if the attribute's name is "class", then add each of the classes to thisNode's
                class_set field by getting each substring of the attribute's value that is
                separated by whitespace
            (ii) if the attribute's name is "id", then set thisNode's iden field equal to the
                attribute's value
*/
    // (1)
    std::string str;
    while (++tagBegin != tagEnd && *tagBegin != ' ')
        str.push_back(*tagBegin);

    if (std::regex_match(str, _elementReg))
        thisNode.element_type = str;
    else
        throw "Could not process element type.";

    // (2)
    std::regex_iterator<std::string::const_iterator> regit (tagBegin, tagEnd, _attrReg);
    std::regex_iterator<std::string::const_iterator> regend;
    for (; regit != regend; ++regit)
    {
        std::string attrName = (*regit)[1];
        std::transform(attrName.begin(), attrName.end(), attrName.begin(), ::tolower);
        // Group 2 of _attrReg is the opening quote; the attribute's value is group 3.
        // (2i)
        if (attrName == "class")
        {
            std::stringstream ss((*regit)[3].str());
            std::string thisClass;
            // operator>> skips runs of whitespace, so repeated spaces yield no empty class names
            while (ss >> thisClass)
                thisNode.class_set.insert(thisClass);
        }
        // (2ii)
        else if (attrName == "id")
            thisNode.iden = (*regit)[3].str();
    }
}
For reference:

Code:
const std::regex HtmlParser::_elementReg("[A-Za-z0-9\\-]+");
const std::regex HtmlParser::_attrReg("([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2");
and

Code:
struct node
{
    std::string element_type;
    std::set<std::string> class_set;
    std::string iden;
    std::set<node *> children;
};
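A quick way to check what the attribute pattern captures, as a rough sketch. The helper name collectAttributes is hypothetical and not part of the posted HtmlParser; it only assumes the same pattern as _attrReg. Group 1 is the attribute name, group 2 is the opening quote character, and group 3 is the text between the quotes.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper, not part of HtmlParser: collects name/value pairs from
// the text of a tag using the same pattern as _attrReg.
std::map<std::string, std::string> collectAttributes(const std::string& tagText)
{
    // Group 1: attribute name, group 2: opening quote, group 3: attribute value.
    static const std::regex attrReg(R"re(([A-Za-z0-9-]+)\s*=\s*(['"])(.*?)\2)re");

    std::map<std::string, std::string> attrs;
    for (std::sregex_iterator it(tagText.begin(), tagText.end(), attrReg), end; it != end; ++it)
        attrs[(*it)[1].str()] = (*it)[3].str();
    return attrs;
}
```

Called on the example string from the comment block, this should map "class" to the full class list and "id" to "myId", which makes it easy to confirm which group index holds the value.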
 
<1> Inputs: tagBegin, tagEnd, thisNode
If one of them is null, what is the expected result?
<2>
Code:
std::stringstream ss((*regit)[3].str());
std::string thisClass;
while (ss >> thisClass)
    thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of facilities from other libraries as much as possible is always good.
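For illustration, one way the class list could be split without a stringstream is sketched below. The function name insertClasses is hypothetical, and the sketch assumes class names are separated by any run of whitespace, as the comment block in the original post describes.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: inserts every whitespace-separated token of `value` into `out`.
void insertClasses(const std::string& value, std::set<std::string>& out)
{
    // Taking unsigned char avoids undefined behavior in std::isspace for negative char values.
    auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };

    auto it = value.begin();
    while (it != value.end())
    {
        // Skip the run of whitespace before the next token.
        it = std::find_if_not(it, value.end(), isSpace);
        if (it == value.end())
            break;
        // The token extends up to the next whitespace character (or the end).
        auto tokenEnd = std::find_if(it, value.end(), isSpace);
        out.emplace(it, tokenEnd);
        it = tokenEnd;
    }
}
```

Whether this is actually cleaner than the stream version is a matter of taste; it mainly avoids constructing a stream object per attribute.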
 
Medicol said:
<1> Inputs: tagBegin, tagEnd, thisNode
If one of them is null, what is the expected result?
<2>
Code:
std::stringstream ss((*regit)[3].str());
std::string thisClass;
while (ss >> thisClass)
    thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of facilities from other libraries as much as possible is always good.

Really? Then I'd always be reinventing the wheel.
 
I'd agree with limiting external libraries, as they often come with many wheels, axles, seats, ... and stuff like electric windows, roofs, hi-fi, etc.
If I just need a wheel, I like to reinvent it, as it is often easier to adapt it to the individual problem.
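As an illustration of such a tailored wheel, here is a minimal sketch of extracting the element name by hand while also rejecting malformed input. The function name elementName and the choice of std::invalid_argument are assumptions made for this example, not part of the posted code. Iterators and references cannot literally be null, so the checks cover an empty range and a range that does not start at '<'.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the element name of a tag, given an iterator to
// its '<' and an iterator to its '>' (the '>' itself is not read).
std::string elementName(std::string::const_iterator tagBegin,
                        std::string::const_iterator tagEnd)
{
    // Reject an empty range before dereferencing anything.
    if (tagBegin == tagEnd || *tagBegin != '<')
        throw std::invalid_argument("tag range must start at '<'");

    // Same character set as the element pattern: letters, digits and '-'.
    auto isNameChar = [](unsigned char c) { return std::isalnum(c) != 0 || c == '-'; };

    const auto nameBegin = tagBegin + 1;
    const auto nameEnd = std::find_if_not(nameBegin, tagEnd, isNameChar);
    if (nameBegin == nameEnd)
        throw std::invalid_argument("tag has no element name");

    return std::string(nameBegin, nameEnd);
}
```

Throwing a standard exception type instead of a string literal lets callers catch std::exception and read the message through what().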
 