Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Making a program to interpret equations. Am I going about this wrong?

  1. Dec 21, 2013 #1
    I'm making a program that interprets and evaluates equations. I won't go into all the specifications, but basically it would take a statement like

    ~1^(2~&3) 0x3FFE 0x2FCE 0xFCC1

    and returns

    4926 (0x133E).

    As you may have guessed, the 1, 2 and 3 are where the parameters 0x3FFE, 0x2FCE and 0xFCC1 are substituted.

    I'm right now working on a function that determines whether the equation is valid in the first place. For instance, there can't be any unbalanced parentheses, any of the numbers cannot exceed the number of parameters, there cannot be any invalid characters like '#', a left-facing paranthesis must always be followed by an operator, etc.

    For that function function alone, I've already written 225 lines of code, probably 150 if you exclude comments. It's very tricky. And I'm only iterating through characters of the equation once and I've carefully eliminated redundancies.

    Or am I inevitably going about this wrong if I'm having to write so much?
     
  2. jcsd
  3. Dec 21, 2013 #2

    chiro

    User Avatar
    Science Advisor

    Hey Jamin2112.

    What kind of data structure have you come up with to evaluate expressions?

    If you have not come up with one, probably the simplest data structure to evaluate expressions like these would be a binary tree. You can evaluate an expression recursively at each node depending on the node type.

    Examples of node types for standard arithmetic include +,-,*,/, and % (MOD). If you have unary operators then you will need to treat those in special cases.
     
  4. Dec 21, 2013 #3

    AlephZero

    User Avatar
    Science Advisor
    Homework Helper

    Start by learning Lex and Yacc (or whateever they are called in your favorite version of Unix).

    A parser for the complete grammar of ANSI C is less than 750 lines.

    http://www.quut.com/c/ANSI-C-grammar-l.html
    http://www.quut.com/c/ANSI-C-grammar-y-2011.html

    Of course you also have to write the code that actually "does something", but for a calculator type of application that is trivial. Here's a "make a calculator " tutorial: http://epaperpress.com/lexandyacc/
     
  5. Dec 21, 2013 #4
    I have not gotten to the evaluation part yet. Should I be doing this in 1 fell swoop? That is, should I attempt to evaluate and check for errors along the way?
     
  6. Dec 22, 2013 #5

    AlephZero

    User Avatar
    Science Advisor
    Homework Helper

    The point of using tools like lex and yacc is that you don't have to do the error checking. You just write the grammar for the input language, and the tools do the work. Turning the grammar into an executable program (in C or C++) is a solved computer science problem, and that's what the tools do for you. With luck, you never need to look at the generated code - just compile and run it.

    Hand-crafting a lexical parser and input analyzer is mildly interesting to do once in your life (but as you are finding out, it can be tediously long and repetitive), but it's about as useful for real life computing as making a model of the Eiffel tower out of matchsticks and glue.
     
  7. Dec 22, 2013 #6

    chiro

    User Avatar
    Science Advisor

    You can do it as part of the evaluation.

    In fact you can use a parser like those mentioned above and use the structural aspects of the parsed sentence tree to evaluate the actual expression. You can decide whether you want to use the binary tree, yacc, or some other method and/or data structure.
     
  8. Dec 22, 2013 #7

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    That's not true. Find one single parser for C++ that uses lex and yacc. You can't. Lex and yacc cannot parse C++. The language is too complex; it's beyond the scope of a LALR parser such as lex and yacc. The compilers from gcc, clang, Microsoft: They are all hand-written recursive descent parsers.

    A recursive descent parser for a BNF such as the following is a standard exercise in many computer language classes.
    Code (Text):

    <full_expression> ::= <expression> <EOL>
    <expression> ::= <term> | <expression> <additive_op> <term>
    <term> ::= <factor> | <factor> <multiplicative_op> <term>
    <factor> ::= <number> | <lparen> <expression> <rparen>
    <number> ::= <digit> | <number> <digit>

    <EOL> ::= "\n"
    <additive_op > ::= "+" | "-"
    <multiplicative_op > ::= "*" | "/"
    <digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <lparen> ::= "("
    <rparen> ::= ")"
     
     
  9. Dec 22, 2013 #8
    Jamin2112, I suggest you see whether your local library has a copy of Programming: Principles and Practice Using C++ (Stroustrup, 2008). I had a look at it recently and feel that inexperienced programmers could benefit greatly from it, due to its emphasis on problem solving, industry-quality practices, and avoidance of toy examples.

    I suggest it particularly because a simple calculator application is the first major example in the book. While writing it, he gives a gentle introduction to tokenization and grammars.

    Although, as others have said, you're dealing with a solved problem, if you'd like to gain a token understanding of it*, I think that book is worth your time. (I also think it's a valuable book generally.)

    * Sorry.
     
  10. Dec 22, 2013 #9

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    ++thanks on that suggestion. This book is particularly apropos given that it covers a simple parser of the type that Jamin2112 is trying to learn to write.
     
  11. Dec 22, 2013 #10

    AlephZero

    User Avatar
    Science Advisor
    Homework Helper

    Point taken. I should have said "Hand crafting a parser and analyzer for an input language where they can be automatically generated ....."

    But if you are designing your own input language (rather than implementing a camel designed by somebody else), why design it so an LALR parser can't handle it? More language complexity or ambiguity = more user misunderstandings and errors, I think.
     
  12. Dec 22, 2013 #11

    AlephZero

    User Avatar
    Science Advisor
    Homework Helper

    The reason lex and yacc "cannot" (see below) parse C++ is because the syntax and semantics of C++ are horribly intertwined A well known example is
    Code (Text):

    int (x);
     
    This is legal but ambiguous, without the context of the rest of the program source code. It could be a declaration of an integer variable x (with a redundant set of parentheses), or an expression statement which casts the value of x to an integer (and probably has no useful purpose, but that is beside the point).

    You can make examples with much more complexity than the above by using features like templates.

    One way to resolve the ambiguity is to mix up the semantic analysis with the parsing. Another way is to recast the grammar into something that can be parsed be context-free methods like lex and yacc, and create a parse tree that captures the semantic ambiguity so that it can be resolved later on. See section 5 of http://www.computing.surrey.ac.uk/research/dsrg/fog/FogThesis.pdf for more.
     
  13. Dec 22, 2013 #12
    Good suggestions.

    I think that I'm going to start by making a function that evaluates expressions limited to integers, multiplication, addition and parentheses. That will give me practice writing elegant recursive algorithms for this type of problem. I'll post my solution here.
     
  14. Dec 22, 2013 #13

    AlephZero

    User Avatar
    Science Advisor
    Homework Helper

    I'm a bit confused here. AFAIK the main advantage of recursive descent parsers is that they are easy to hand-craft, because the translation of the grammar into code is so simple it's hard to make mistakes, and easy to debug them just by reading the code.

    The other advantage is that for a simple application like a calculator, you can combine the syntax analysis and semantic operations in one pass, and for a simple application you night not need any data structures at all, because the calling tree of the functions at run time is equivalent to the parse tree.

    But apart from the ability to mix up syntax and semantic operations in an arbitrary way, I wasn't away that recursive descent was a more powerful parsing algorithm than LALR. In fact I thought it was less powerful. unless you put something sneaky in your example grammar that I missed, that example is simple to parse in lex and yacc.
     
  15. Dec 23, 2013 #14
    Alright, guys. I mentioned in my last reply that I'm making a simple equation interpreter that does addition and multiplication, with parentheses allowed and with addition having a higher precedence than multiplication.

    So far I've made a function that checks whether the equation is valid. It's passed every test that I've given it. The code is below and I'd like you to scrutinize it and give me input, even if condescendingly. I'm trying to avoid repeated logic but at the same time be somewhat elegant.

    Code (Text):

    #include <iostream>
    #include <string>

    bool valid_equation(const std::string&);
    int str2Int(const std::string&);
    bool isdigit(char);

    int main (int argc, char* const argv[]) {
       
        std::cout << valid_equation("4*(3+4)");
       
        return 0;
    }



    bool valid_equation(const std::string& eq) {
        std::string::const_iterator it = eq.begin();
        // Equation cannot be an empty string:
        if (it == eq.end())
            return false;
        // Count for the number of right-facing and left-facing parantheses:
        int RFPcnt(0), LFPcnt(0);
        /* Iterate through the characters of the equation to check that it is valid based on
           which characters are lined up next to which. For example, if two plus signs are found
           next to each other, that makes the equation meaningless and the our function
           valid_equation immediately returns false;
        */
        while (it != eq.end()) {
            if (*it == '(') {
                /*
                   Rules for a right-facing paranthesis:
                   (1) Must be followed by a digit or another right-facing paranthesis
                   (2) Must be preceded by an operator or another right-facing paranthesis or nothing at all
                */
                if ((it + 1) == eq.end()) {
                    return false;
                } else {
                    if (!isdigit(*(it + 1)) && *(it + 1) != '(')
                        return false;
                }
                if (it != eq.begin()) {
                    if (*(it - 1) != '*' && *(it - 1) != '+' && *(it - 1) != '(')
                        return false;
                }
                ++it; // If we made it here, increment the iterator
                ++RFPcnt; // Increment the count of right-facing paranthesis as well
            } else if (*it == ')') {
                /*
                   Rules for a left-facing paranthesis:
                   (1) Cannot be the beginning of an equation
                   (2) Must be preceded by a digit or another left-facing paranthesis
                   (3) Must be followed by an operator or another left-facing paranthesis or nothing at all
                */
                if (it == eq.begin()) {
                    return false;
                } else {
                    if (!isdigit(*(it - 1)) && *(it - 1) != ')')
                        return false;
                }
                if ((it + 1) != eq.end()) {
                    if (*(it + 1) != '*' && *(it + 1) != '+' && *(it + 1) != ')')
                        return false;
                }
                ++it; // If we made it here, increment the iterator
                ++LFPcnt; // Increment the count of left-facing paranthesis as well
            } else if (*it == '*' || *it == '+') {
                /*
                   Rules for an operator:
                   (1) Cannot be the beginning or end of an equation
                   (2) Must be preceded by a digit or a left-facing paranthesis
                   (3) Must be followed by a digit or a right-facing paranthesis
                */
                if (it == eq.begin() || (it + 1) == eq.end()) {
                    return false;
                } else {
                    if (!isdigit(*(it - 1)) && *(it - 1) != ')')
                        return false;
                    if (!isdigit(*(it + 1)) && *(it + 1) != '(')
                        return false;
                }
                ++it;
            } else if (isdigit(*it)) {
                // Simply increment the iterator until it's no longer on a digit:
                while (isdigit(*++it));
            } else { // If we're here, *it is invalid character
                return false;
            }
        }
       
        if (RFPcnt != LFPcnt)  // Unbalanced paranthesis?
            return false;
       
        return true; // If we made it here, the equation is valid; return true
       
    }



    int str2Int(const std::string& str) {
        std::string::const_iterator it = str.begin();
        if (it == str.end()) {
            std::cout << "Invalid parameter: str2Int() only accepts non-empty strings." << std::endl;
            return -1;
        }
        int sum = 0;
        while (it != str.end()) {
            if (!isdigit(*it)) {
                std::cout << "Invalid paramter: str2Int only accepts strings of chars '0' through '9'" << std::endl;
                return -1;
            } else {
                sum *= 10;
                sum += *it++ - '0';
            }
        }
        return sum;
    }

    bool isdigit(char c) {
        return (c >= '0' && c <= '9');
    }

     
     
    Last edited: Dec 23, 2013
  16. Dec 23, 2013 #15
  17. Dec 23, 2013 #16
    I think you should do this in several steps. First you tokenize the input. i.e.
    You turn this
    ~1^(2~&3) 0x3FFE 0x2FCE 0xFCC1
    into a list of tokens like so
    ~, 1, ^, (, 2, ~, &, 3, ), 0x3FFE, 0x2FCE, 0xFCC1
    For that you could also use regular expressions.
    Then you can turn that into a tree, or alternatively into rpn (also know as postfix) notation.
    That can then be interpreted very easily.

    Of course it would be a lot easier to write this, if your program get's the expression in rpn form.
     
  18. Dec 23, 2013 #17
    My mini addition and multiplication calculator needs a little debugging, but I'm almost there.

    Code (Text):


    std::string evaluate_equation(std::string eq) {
        /* By the order of operations:
           (1) Evaluate the expression in the innermost set of paranthesis, if there is one;
           (2) Evaluate multiplication expressions left-to-right;
           (3) Evaluate addition expressions left-to-right.
           Use recursion.
         */
       
        // (1):
        size_t FRFP = eq.find_last_of('('); // position of first right-facing paranthesis
        if (FRFP != std::string::npos) {
            // Set NLFP to be the position of the next left-facing paranthesis after the FRFPL
            size_t NLFP = FRFP;
            while (eq[++NLFP] != ')');
            /* Replace the space between FRFP and NLFP inclusive with the evalulation of the expression
               between FRFP and NLFP exclusive:
            */
            std::string paranstr = eq.substr(FRFP + 1, NLFP - 1);
            eq.replace(FRFP, NLFP - FRFP + 1, evaluate_equation(paranstr));
            evaluate_equation(eq);
        }
        // (2):
        size_t firstast = eq.find_first_of('*'); // position of first asterisk
        if (firstast != std::string::npos) {
            // Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
            std::string num1, num2;
            size_t num1begin(firstast), num2end(firstast);
            while (--num1begin >= 0 && isdigit(eq[num1begin]))
                num1.insert(0, 1, eq[num1begin]);
            while (++num2end < eq.size() && isdigit(eq[num2end]))
                num2.push_back(eq[num2end]);
            // Replace the space of the multiplication equation num1*num2 with its evaluation:
            eq.replace(num1begin + 1, num2end - num1begin + 1, multStrs(num1, num2));
            evaluate_equation(eq);
        }
        // (3):
        size_t firstplus = eq.find_first_of('+');
        if (firstplus != std::string::npos) {
            // Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
            std::string num1, num2;
            size_t num1begin(firstplus), num2end(firstplus);
            while (--num1begin >= 0 && isdigit(eq[num1begin]))
                num1.insert(0, 1, eq[num1begin]);
            while (++num2end < eq.size() && isdigit(eq[num2end]))
                num2.push_back(eq[num2end]);
            // Replace the space of the multiplication equation num1*num2 with its evaluation:
            eq.replace(num1begin + 1, num2end - num1begin + 1, addStrs(num1, num2));
            evaluate_equation(eq);
           
        }
       
        // If we made it to this point, all that's left of the equation is a single number. Return it;
        return eq;
    }

     
     
  19. Dec 23, 2013 #18
    From the console:

    Code (Text):

    3*(6+1)
    21
    2+2
    4
    2+(2)
    4
    5+6*7+8
    47
    (((6)))
    ((6))))
     
    Yea, those last two didn't come out right. Still got work to do!
     
  20. Dec 24, 2013 #19
    You need to find the outermost pair of parantheses, not the innermost one.

    btw. here is an expression evaluator I wrote in Java a while ago.
    http://pastebin.com/4y5ELtng
     
  21. Dec 24, 2013 #20
    You can use recursion by first evaluating the innermost parentheses and then replacing the space taken up by that parenthetical expression with its evaluation, and then reevaluating the entire expression, etc. That way is good because you can easily add a feature that prints out the steps of the evaluation. I got my program to work. I'm still gonna check out yours, though. Mine is a very crude implementation.

    Code (Text):

    #include <iostream>
    #include <string>

    std::string evaluate_equation(std::string);
    bool valid_equation(const std::string&);
    int str2Int(const std::string&);
    std::string int2Str(unsigned int);
    bool isdigit(char);
    std::string addStrs(const std::string&, const std::string&);
    std::string multStrs(const std::string&, const std::string&);

    int main (int argc, char* const argv[]) {
       
        // Evaluate expressions at the console
        std::string eqtn;
        while (std::cin >> eqtn) {
            if (!valid_equation(eqtn)) {
                std::cout << "Equation invalid; not evaluated" << std::endl;
            } else {
                std::cout << evaluate_equation(eqtn) << std::endl;
            }
        }
       
        return 0;
    }

    std::string evaluate_equation(std::string eq) {
        /* By the order of operations:
           (1) Evaluate the expression in the innermost set of paranthesis, if there is one;
           (2) Evaluate multiplication expressions left-to-right;
           (3) Evaluate addition expressions left-to-right.
           Use recursion.
         */
       
        // (1):
        size_t LRFP = eq.find_last_of('('); // position of last right-facing paranthesis
        if (LRFP != std::string::npos) {
            // Set NLFP to be the position of the next left-facing paranthesis after the FRFPL
            size_t NLFP = LRFP;
            while (eq[++NLFP] != ')');
            /* Replace the space between FRFP and NLFP inclusive with the evalulation of the expression
               between FRFP and NLFP exclusive:
            */
            std::string paranstr = eq.substr(LRFP + 1, NLFP - LRFP - 1);
            eq.replace(LRFP, NLFP - LRFP + 1, evaluate_equation(paranstr));
            return evaluate_equation(eq);
        }
        // (2):
        size_t firstast = eq.find_first_of('*'); // position of first asterisk
        if (firstast != std::string::npos) {
            // Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
            std::string num1, num2;
            size_t num1begin(firstast), num2end(firstast);
            while (num1begin > 0 && isdigit(eq[num1begin - 1]))  
                num1.insert(0, 1, eq[--num1begin]);
            int eqsz = eq.size();
            while ((num2end + 1) < eqsz && isdigit(eq[num2end + 1]))
                num2.push_back(eq[++num2end]);
            // Replace the space of the multiplication equation num1*num2 with its evaluation:
            eq.replace(num1begin, num2end - num1begin + 1, multStrs(num1, num2));
            return evaluate_equation(eq);
        }
        // (3):
        size_t firstplus = eq.find_first_of('+');
        if (firstplus != std::string::npos) {
            // Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
            std::string num1, num2;
            size_t num1begin(firstplus), num2end(firstplus);
            while (num1begin > 0 && isdigit(eq[num1begin - 1]))  
                num1.insert(0, 1, eq[--num1begin]);
            int eqsz = eq.size();
            while ((num2end + 1) < eqsz && isdigit(eq[num2end + 1]))
                num2.push_back(eq[++num2end]);
            // Replace the space of the multiplication equation num1*num2 with its evaluation:
            eq.replace(num1begin, num2end - num1begin + 1, addStrs(num1, num2));
            return evaluate_equation(eq);
        }
        // If we made it to this point, all that's left of the equation is a single number. Return it;
        return eq;
    }

    bool valid_equation(const std::string& eq) {
        std::string::const_iterator it = eq.begin();
        // Equation cannot be an empty string:
        if (it == eq.end())
            return false;
        // Count for the number of right-facing and left-facing parantheses:
        int RFPcnt(0), LFPcnt(0);
        /* Iterate through the characters of the equation to check that it is valid based on
           which characters are lined up next to which. For example, if two plus signs are found
           next to each other, that makes the equation meaningless and the our function
           valid_equation() immediately returns false.
        */
        while (it != eq.end()) {
            if (*it == '(') {
                /*
                   Rules for a right-facing paranthesis:
                   (1) Must be followed by a digit or another right-facing paranthesis
                   (2) Must be preceded by an operator or another right-facing paranthesis or nothing at all
                */
                if ((it + 1) == eq.end()) {
                    return false;
                } else {
                    if (!isdigit(*(it + 1)) && *(it + 1) != '(')
                        return false;
                }
                if (it != eq.begin()) {
                    if (*(it - 1) != '*' && *(it - 1) != '+' && *(it - 1) != '(')
                        return false;
                }
                ++it; // If we made it here, increment the iterator
                ++RFPcnt; // Increment the count of right-facing paranthesis as well
            } else if (*it == ')') {
                /*
                   Rules for a left-facing paranthesis:
                   (1) Cannot be the beginning of an equation
                   (2) Must be preceded by a digit or another left-facing paranthesis
                   (3) Must be followed by an operator or another left-facing paranthesis or nothing at all
                */
                if (it == eq.begin()) {
                    return false;
                } else {
                    if (!isdigit(*(it - 1)) && *(it - 1) != ')')
                        return false;
                }
                if ((it + 1) != eq.end()) {
                    if (*(it + 1) != '*' && *(it + 1) != '+' && *(it + 1) != ')')
                        return false;
                }
                ++it; // If we made it here, increment the iterator
                ++LFPcnt; // Increment the count of left-facing paranthesis as well
            } else if (*it == '*' || *it == '+') {
                /*
                   Rules for an operator:
                   (1) Cannot be the beginning or end of an equation
                   (2) Must be preceded by a digit or a left-facing paranthesis
                   (3) Must be followed by a digit or a right-facing paranthesis
                */
                if (it == eq.begin() || (it + 1) == eq.end()) {
                    return false;
                } else {
                    if (!isdigit(*(it - 1)) && *(it - 1) != ')')
                        return false;
                    if (!isdigit(*(it + 1)) && *(it + 1) != '(')
                        return false;
                }
                ++it;
            } else if (isdigit(*it)) {
                // Simply increment the iterator until it's no longer on a digit:
                while (isdigit(*++it));
            } else { // If we're here, *it is invalid character
                return false;
            }
        }
       
        if (RFPcnt != LFPcnt)  // Unbalanced paranthesis?
            return false;
       
        return true; // If we made it here, the equation is valid; return true
       
    }
           
    // Given a decimal representation as a string s, str2Int(s) returns the corresponding decimal as an int
    int str2Int(const std::string& str) {
        std::string::const_iterator it = str.begin();
        if (it == str.end()) {
            std::cout << "Invalid parameter: str2Int() only accepts non-empty strings." << std::endl;
            return -1;
        }
        int sum = 0;
        while (it != str.end()) {
            if (!isdigit(*it)) {
                std::cout << "Invalid paramter: str2Int only accepts strings of chars '0' through '9'" << std::endl;
                return -1;
            } else {
                sum *= 10;
                sum += *it++ - '0';
            }
        }
        return sum;
    }

    // Given an int n, int2Str(n) return the string representation
    std::string int2Str(unsigned int n) {
        std::string retstring;
        if (n == 0) {
            retstring.push_back('0');
            return retstring;
        } else {
            while (n != 0) {
                retstring.insert(0, 1, char('0' + n % 10));
                n /= 10;
            }  
            return retstring;
        }
    }

    // Function that checks whether a given character is in one of '0', '1', ..., '9'
    bool isdigit(char c) {
        return (c >= '0' && c <= '9');
    }

    // Adds 2 string decimal representations and returns the sum as a string decimal representation
    std::string addStrs(const std::string& s1, const std::string& s2) {
        return (int2Str(str2Int(s1) + str2Int(s2)));
    }

    // Multiplies 2 string decimal representations and returns the sum as a string decimal representation
    std::string multStrs(const std::string& s1, const std::string& s2) {
        return (int2Str(str2Int(s1) * str2Int(s2)));
    }
     
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Making a program to interpret equations. Am I going about this wrong?
Loading...