[C++] Any suggestions on how to clean up this procedure?

In summary, the function HtmlParser::parseTag takes in three inputs - tagBegin, tagEnd, and thisNode - and performs the following procedures: 1. It extracts the first sequence of characters between tagBegin and tagEnd and sets it as the element_type of thisNode.2. It then goes through each attribute name-value pair and performs the following actions: a) If the attribute name is "class", it extracts each class separated by whitespace from the attribute value and adds it to thisNode's class_set field. b) If the attribute name is "id", it sets thisNode's iden field to the attribute value.If any of the input parameters are null, an error is thrown. The use
  • #1
Jamin2112
986
12
Any ideas on how I can clean up the following?

Code:
void HtmlParser::parseTag(std::string::const_iterator tagBegin, std::string::const_iterator tagEnd, node & thisNode)

{

/*
tagBegin: pointer to the '<' character that starts the tag

  tagEnd: pointer to the '>' character that ends the tag

thisNode: node element in whose fields the class and identifier info will be entered
E.g. If the string between tagBegin and tagEnd is
        "div class= 'class1 class2     class3' id = 'myId' onlick   =  'myFunction()'"
    then make thisNode.element_type equal to "div"; add "class1", "class2" and "class3" to thisNode.class_set;

    and make thisNode.iden equal to "myId"
Procedure:
    (1) Get the first sequence of characters inside the tag. If it is a proper element type

        expression, set it equal to thisNode's element_type; otherwise, throw an error.
    (2) For each expression after the element type and of the form of valid attribute name-value

        pairs,
            (i)  if the attribute's name is "class", then add each of the classes to thisNode's 

                class_set field by getting each substring of the attribute's value that is 

                separated by whitespace
            (ii) if the attribute's name is "id", then set thisNode's iden field eqal to the 

                attribute's value
*/
    // (1)

    std::string str;

    while (++tagBegin != tagEnd && *tagBegin != ' ')

        str.push_back(*tagBegin);

   

    if (std::regex_match(str, _elementReg))

        thisNode.element_type = str;

    else

        throw"Could not process element type.";

   

    // (2)

    std::regex_iterator<std::string::const_iterator> regit (tagBegin, tagEnd, _attrReg);

    std::regex_iterator<std::string::const_iterator> regend {std::regex_iterator<std::string::const_iterator>()};

    for (; regit != regend; ++regit)

    {

        std::string attrName = (*regit)[1];

        std::transform(attrName.begin(), attrName.end(), attrName.begin(), ::tolower);

        // (2i)

        if (attrName == "class")

        {

            std::stringstream ss((*regit)[2]);

            std::string thisClass;

            while (std::getline(ss, thisClass, ' '))

                thisNode.class_set.insert(thisClass);

        }

        // (2ii)

        else if (attrName == "id")

            thisNode.iden = (*regit)[2];

    }

}
For reference:

Code:
const std::regex HtmlParser::_elementReg("[A-Za-z0-9\\-]");

const std::regex HtmlParser::_attrReg("([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2");
and

Code:
struct node

{

    std::string element_type;

    std::set<std::string> class_set;

    std::string iden;

    std::set<node *> children;

};
 
Last edited:
  • #3
<1> Input: tagBegin, tagEnd, thisNode
if one of them is null
then what are the expected results ?
<2>
Code:
std::stringstream ss((*regit)[2]);
            std::string thisClass;
            while (std::getline(ss, thisClass, ' '))
                thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of methods in another libraries as much as possible is always good.
 
  • #4
Medicol said:
<1> Input: tagBegin, tagEnd, thisNode
if one of them is null
then what are the expected results ?
<2>
Code:
std::stringstream ss((*regit)[2]);
            std::string thisClass;
            while (std::getline(ss, thisClass, ' '))
                thisNode.class_set.insert(thisClass);
Introducing stringstream is unnecessary in this case. Limiting your use of methods in another libraries as much as possible is always good.

Really? Then I'd be always reinventing the wheel.
 
  • #5
I'd agree with limiting external libraries as they often come with many wheels, axes, seats, ... and stuff like electric windows, roofs, hifi, etc...
If I just need a wheel I like to reinvent it as it is often easier to adapt them to individual problems...
 
  • #5


There are a few suggestions that could be implemented to clean up this procedure:

1. Use more descriptive variable names: The variables "tagBegin" and "tagEnd" could be renamed to something like "tagStart" and "tagEnd" to make their purpose clearer. This will make the code easier to read and understand.

2. Use comments to explain each step: Adding comments throughout the code will help future users understand the purpose of each step and make it easier to modify or debug the code.

3. Use more specific error messages: Instead of just throwing a generic error message, provide more specific information about what went wrong. For example, if the element type is not valid, the error message could state which element type was expected.

4. Consider using more modern C++ features: The use of std::string and std::regex can be replaced with newer features such as std::string_view and std::smatch, which can improve performance and readability.

5. Break up the procedure into smaller functions: Instead of having one large function, consider breaking it up into smaller functions that each handle a specific task. This will make the code more modular and easier to maintain.

6. Use consistent indentation and formatting: Making sure the code is consistently indented and formatted will make it easier to read and understand, especially for larger procedures.

7. Consider using a different data structure: Instead of using a std::set to store the classes and children, a std::unordered_set or std::vector may be more efficient in terms of performance.

Overall, the procedure seems to be well-written and functional, but implementing these suggestions could make it easier to read, maintain, and modify in the future.
 

What is the purpose of cleaning up a procedure in C++?

The purpose of cleaning up a procedure in C++ is to improve the readability, maintainability, and efficiency of the code. This can also help to reduce errors and make the code more organized.

What are some common ways to clean up a procedure in C++?

Some common ways to clean up a procedure in C++ include removing unnecessary code, breaking up long functions into smaller ones, using proper naming conventions, adding comments, and optimizing algorithms.

How can I make my code more readable when cleaning up a procedure in C++?

To make your code more readable, you can use proper indentation, spacing, and formatting. You can also use meaningful variable names and comments to explain the purpose of each line of code.

What are some potential benefits of cleaning up a procedure in C++?

Cleaning up a procedure in C++ can lead to shorter and more efficient code, making it easier to debug and maintain. It can also improve the overall performance of the program and make it easier for others to understand and work with the code.

Is it important to regularly clean up procedures in C++?

Yes, it is important to regularly clean up procedures in C++ to ensure that the code remains organized, efficient, and easy to maintain. This can also help to prevent potential errors and make it easier to add new features or make changes in the future.

Back
Top