1. Dec 21, 2013

### Jamin2112

I'm making a program that interprets and evaluates equations. I won't go into all the specifications, but basically it would take a statement like

~1^(2~&3) 0x3FFE 0x2FCE 0xFCC1

and returns

4926 (0x133E).

As you may have guessed, the 1, 2 and 3 are where the parameters 0x3FFE, 0x2FCE and 0xFCC1 are substituted.

I'm right now working on a function that determines whether the equation is valid in the first place. For instance, there can't be any unbalanced parentheses, any of the numbers cannot exceed the number of parameters, there cannot be any invalid characters like '#', a left-facing paranthesis must always be followed by an operator, etc.

For that function function alone, I've already written 225 lines of code, probably 150 if you exclude comments. It's very tricky. And I'm only iterating through characters of the equation once and I've carefully eliminated redundancies.

2. Dec 21, 2013

### chiro

Hey Jamin2112.

What kind of data structure have you come up with to evaluate expressions?

If you have not come up with one, probably the simplest data structure to evaluate expressions like these would be a binary tree. You can evaluate an expression recursively at each node depending on the node type.

Examples of node types for standard arithmetic include +,-,*,/, and % (MOD). If you have unary operators then you will need to treat those in special cases.

3. Dec 21, 2013

### AlephZero

Start by learning Lex and Yacc (or whateever they are called in your favorite version of Unix).

A parser for the complete grammar of ANSI C is less than 750 lines.

http://www.quut.com/c/ANSI-C-grammar-l.html
http://www.quut.com/c/ANSI-C-grammar-y-2011.html

Of course you also have to write the code that actually "does something", but for a calculator type of application that is trivial. Here's a "make a calculator " tutorial: http://epaperpress.com/lexandyacc/

4. Dec 21, 2013

### Jamin2112

I have not gotten to the evaluation part yet. Should I be doing this in 1 fell swoop? That is, should I attempt to evaluate and check for errors along the way?

5. Dec 22, 2013

### AlephZero

The point of using tools like lex and yacc is that you don't have to do the error checking. You just write the grammar for the input language, and the tools do the work. Turning the grammar into an executable program (in C or C++) is a solved computer science problem, and that's what the tools do for you. With luck, you never need to look at the generated code - just compile and run it.

Hand-crafting a lexical parser and input analyzer is mildly interesting to do once in your life (but as you are finding out, it can be tediously long and repetitive), but it's about as useful for real life computing as making a model of the Eiffel tower out of matchsticks and glue.

6. Dec 22, 2013

### chiro

You can do it as part of the evaluation.

In fact you can use a parser like those mentioned above and use the structural aspects of the parsed sentence tree to evaluate the actual expression. You can decide whether you want to use the binary tree, yacc, or some other method and/or data structure.

7. Dec 22, 2013

### D H

Staff Emeritus
That's not true. Find one single parser for C++ that uses lex and yacc. You can't. Lex and yacc cannot parse C++. The language is too complex; it's beyond the scope of a LALR parser such as lex and yacc. The compilers from gcc, clang, Microsoft: They are all hand-written recursive descent parsers.

A recursive descent parser for a BNF such as the following is a standard exercise in many computer language classes.
Code (Text):

<full_expression> ::= <expression> <EOL>
<expression> ::= <term> | <expression> <additive_op> <term>
<term> ::= <factor> | <factor> <multiplicative_op> <term>
<factor> ::= <number> | <lparen> <expression> <rparen>
<number> ::= <digit> | <number> <digit>

<EOL> ::= "\n"
<additive_op > ::= "+" | "-"
<multiplicative_op > ::= "*" | "/"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<lparen> ::= "("
<rparen> ::= ")"

8. Dec 22, 2013

### Integrand

Jamin2112, I suggest you see whether your local library has a copy of Programming: Principles and Practice Using C++ (Stroustrup, 2008). I had a look at it recently and feel that inexperienced programmers could benefit greatly from it, due to its emphasis on problem solving, industry-quality practices, and avoidance of toy examples.

I suggest it particularly because a simple calculator application is the first major example in the book. While writing it, he gives a gentle introduction to tokenization and grammars.

Although, as others have said, you're dealing with a solved problem, if you'd like to gain a token understanding of it*, I think that book is worth your time. (I also think it's a valuable book generally.)

* Sorry.

9. Dec 22, 2013

### D H

Staff Emeritus
++thanks on that suggestion. This book is particularly apropos given that it covers a simple parser of the type that Jamin2112 is trying to learn to write.

10. Dec 22, 2013

### AlephZero

Point taken. I should have said "Hand crafting a parser and analyzer for an input language where they can be automatically generated ....."

But if you are designing your own input language (rather than implementing a camel designed by somebody else), why design it so an LALR parser can't handle it? More language complexity or ambiguity = more user misunderstandings and errors, I think.

11. Dec 22, 2013

### AlephZero

The reason lex and yacc "cannot" (see below) parse C++ is because the syntax and semantics of C++ are horribly intertwined A well known example is
Code (Text):

int (x);

This is legal but ambiguous, without the context of the rest of the program source code. It could be a declaration of an integer variable x (with a redundant set of parentheses), or an expression statement which casts the value of x to an integer (and probably has no useful purpose, but that is beside the point).

You can make examples with much more complexity than the above by using features like templates.

One way to resolve the ambiguity is to mix up the semantic analysis with the parsing. Another way is to recast the grammar into something that can be parsed be context-free methods like lex and yacc, and create a parse tree that captures the semantic ambiguity so that it can be resolved later on. See section 5 of http://www.computing.surrey.ac.uk/research/dsrg/fog/FogThesis.pdf for more.

12. Dec 22, 2013

### Jamin2112

Good suggestions.

I think that I'm going to start by making a function that evaluates expressions limited to integers, multiplication, addition and parentheses. That will give me practice writing elegant recursive algorithms for this type of problem. I'll post my solution here.

13. Dec 22, 2013

### AlephZero

I'm a bit confused here. AFAIK the main advantage of recursive descent parsers is that they are easy to hand-craft, because the translation of the grammar into code is so simple it's hard to make mistakes, and easy to debug them just by reading the code.

The other advantage is that for a simple application like a calculator, you can combine the syntax analysis and semantic operations in one pass, and for a simple application you night not need any data structures at all, because the calling tree of the functions at run time is equivalent to the parse tree.

But apart from the ability to mix up syntax and semantic operations in an arbitrary way, I wasn't away that recursive descent was a more powerful parsing algorithm than LALR. In fact I thought it was less powerful. unless you put something sneaky in your example grammar that I missed, that example is simple to parse in lex and yacc.

14. Dec 23, 2013

### Jamin2112

Alright, guys. I mentioned in my last reply that I'm making a simple equation interpreter that does addition and multiplication, with parentheses allowed and with addition having a higher precedence than multiplication.

So far I've made a function that checks whether the equation is valid. It's passed every test that I've given it. The code is below and I'd like you to scrutinize it and give me input, even if condescendingly. I'm trying to avoid repeated logic but at the same time be somewhat elegant.

Code (Text):

#include <iostream>
#include <string>

bool valid_equation(const std::string&);
int str2Int(const std::string&);
bool isdigit(char);

int main (int argc, char* const argv[]) {

std::cout << valid_equation("4*(3+4)");

return 0;
}

bool valid_equation(const std::string& eq) {
std::string::const_iterator it = eq.begin();
// Equation cannot be an empty string:
if (it == eq.end())
return false;
// Count for the number of right-facing and left-facing parantheses:
int RFPcnt(0), LFPcnt(0);
/* Iterate through the characters of the equation to check that it is valid based on
which characters are lined up next to which. For example, if two plus signs are found
next to each other, that makes the equation meaningless and the our function
valid_equation immediately returns false;
*/
while (it != eq.end()) {
if (*it == '(') {
/*
Rules for a right-facing paranthesis:
(1) Must be followed by a digit or another right-facing paranthesis
(2) Must be preceded by an operator or another right-facing paranthesis or nothing at all
*/
if ((it + 1) == eq.end()) {
return false;
} else {
if (!isdigit(*(it + 1)) && *(it + 1) != '(')
return false;
}
if (it != eq.begin()) {
if (*(it - 1) != '*' && *(it - 1) != '+' && *(it - 1) != '(')
return false;
}
++it; // If we made it here, increment the iterator
++RFPcnt; // Increment the count of right-facing paranthesis as well
} else if (*it == ')') {
/*
Rules for a left-facing paranthesis:
(1) Cannot be the beginning of an equation
(2) Must be preceded by a digit or another left-facing paranthesis
(3) Must be followed by an operator or another left-facing paranthesis or nothing at all
*/
if (it == eq.begin()) {
return false;
} else {
if (!isdigit(*(it - 1)) && *(it - 1) != ')')
return false;
}
if ((it + 1) != eq.end()) {
if (*(it + 1) != '*' && *(it + 1) != '+' && *(it + 1) != ')')
return false;
}
++it; // If we made it here, increment the iterator
++LFPcnt; // Increment the count of left-facing paranthesis as well
} else if (*it == '*' || *it == '+') {
/*
Rules for an operator:
(1) Cannot be the beginning or end of an equation
(2) Must be preceded by a digit or a left-facing paranthesis
(3) Must be followed by a digit or a right-facing paranthesis
*/
if (it == eq.begin() || (it + 1) == eq.end()) {
return false;
} else {
if (!isdigit(*(it - 1)) && *(it - 1) != ')')
return false;
if (!isdigit(*(it + 1)) && *(it + 1) != '(')
return false;
}
++it;
} else if (isdigit(*it)) {
// Simply increment the iterator until it's no longer on a digit:
while (isdigit(*++it));
} else { // If we're here, *it is invalid character
return false;
}
}

if (RFPcnt != LFPcnt)  // Unbalanced paranthesis?
return false;

return true; // If we made it here, the equation is valid; return true

}

int str2Int(const std::string& str) {
std::string::const_iterator it = str.begin();
if (it == str.end()) {
std::cout << "Invalid parameter: str2Int() only accepts non-empty strings." << std::endl;
return -1;
}
int sum = 0;
while (it != str.end()) {
if (!isdigit(*it)) {
std::cout << "Invalid paramter: str2Int only accepts strings of chars '0' through '9'" << std::endl;
return -1;
} else {
sum *= 10;
sum += *it++ - '0';
}
}
return sum;
}

bool isdigit(char c) {
return (c >= '0' && c <= '9');
}

Last edited: Dec 23, 2013
15. Dec 23, 2013

### Jamin2112

16. Dec 23, 2013

### DrZoidberg

I think you should do this in several steps. First you tokenize the input. i.e.
You turn this
~1^(2~&3) 0x3FFE 0x2FCE 0xFCC1
into a list of tokens like so
~, 1, ^, (, 2, ~, &, 3, ), 0x3FFE, 0x2FCE, 0xFCC1
For that you could also use regular expressions.
Then you can turn that into a tree, or alternatively into rpn (also know as postfix) notation.
That can then be interpreted very easily.

Of course it would be a lot easier to write this, if your program get's the expression in rpn form.

17. Dec 23, 2013

### Jamin2112

My mini addition and multiplication calculator needs a little debugging, but I'm almost there.

Code (Text):

std::string evaluate_equation(std::string eq) {
/* By the order of operations:
(1) Evaluate the expression in the innermost set of paranthesis, if there is one;
(2) Evaluate multiplication expressions left-to-right;
Use recursion.
*/

// (1):
size_t FRFP = eq.find_last_of('('); // position of first right-facing paranthesis
if (FRFP != std::string::npos) {
// Set NLFP to be the position of the next left-facing paranthesis after the FRFPL
size_t NLFP = FRFP;
while (eq[++NLFP] != ')');
/* Replace the space between FRFP and NLFP inclusive with the evalulation of the expression
between FRFP and NLFP exclusive:
*/
std::string paranstr = eq.substr(FRFP + 1, NLFP - 1);
eq.replace(FRFP, NLFP - FRFP + 1, evaluate_equation(paranstr));
evaluate_equation(eq);
}
// (2):
size_t firstast = eq.find_first_of('*'); // position of first asterisk
if (firstast != std::string::npos) {
// Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
std::string num1, num2;
size_t num1begin(firstast), num2end(firstast);
while (--num1begin >= 0 && isdigit(eq[num1begin]))
num1.insert(0, 1, eq[num1begin]);
while (++num2end < eq.size() && isdigit(eq[num2end]))
num2.push_back(eq[num2end]);
// Replace the space of the multiplication equation num1*num2 with its evaluation:
eq.replace(num1begin + 1, num2end - num1begin + 1, multStrs(num1, num2));
evaluate_equation(eq);
}
// (3):
size_t firstplus = eq.find_first_of('+');
if (firstplus != std::string::npos) {
// Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
std::string num1, num2;
size_t num1begin(firstplus), num2end(firstplus);
while (--num1begin >= 0 && isdigit(eq[num1begin]))
num1.insert(0, 1, eq[num1begin]);
while (++num2end < eq.size() && isdigit(eq[num2end]))
num2.push_back(eq[num2end]);
// Replace the space of the multiplication equation num1*num2 with its evaluation:
eq.replace(num1begin + 1, num2end - num1begin + 1, addStrs(num1, num2));
evaluate_equation(eq);

}

// If we made it to this point, all that's left of the equation is a single number. Return it;
return eq;
}

18. Dec 23, 2013

### Jamin2112

From the console:

Code (Text):

3*(6+1)
21
2+2
4
2+(2)
4
5+6*7+8
47
(((6)))
((6))))

Yea, those last two didn't come out right. Still got work to do!

19. Dec 24, 2013

### DrZoidberg

You need to find the outermost pair of parantheses, not the innermost one.

btw. here is an expression evaluator I wrote in Java a while ago.
http://pastebin.com/4y5ELtng

20. Dec 24, 2013

### Jamin2112

You can use recursion by first evaluating the innermost parentheses and then replacing the space taken up by that parenthetical expression with its evaluation, and then reevaluating the entire expression, etc. That way is good because you can easily add a feature that prints out the steps of the evaluation. I got my program to work. I'm still gonna check out yours, though. Mine is a very crude implementation.

Code (Text):

#include <iostream>
#include <string>

std::string evaluate_equation(std::string);
bool valid_equation(const std::string&);
int str2Int(const std::string&);
std::string int2Str(unsigned int);
bool isdigit(char);
std::string multStrs(const std::string&, const std::string&);

int main (int argc, char* const argv[]) {

// Evaluate expressions at the console
std::string eqtn;
while (std::cin >> eqtn) {
if (!valid_equation(eqtn)) {
std::cout << "Equation invalid; not evaluated" << std::endl;
} else {
std::cout << evaluate_equation(eqtn) << std::endl;
}
}

return 0;
}

std::string evaluate_equation(std::string eq) {
/* By the order of operations:
(1) Evaluate the expression in the innermost set of paranthesis, if there is one;
(2) Evaluate multiplication expressions left-to-right;
Use recursion.
*/

// (1):
size_t LRFP = eq.find_last_of('('); // position of last right-facing paranthesis
if (LRFP != std::string::npos) {
// Set NLFP to be the position of the next left-facing paranthesis after the FRFPL
size_t NLFP = LRFP;
while (eq[++NLFP] != ')');
/* Replace the space between FRFP and NLFP inclusive with the evalulation of the expression
between FRFP and NLFP exclusive:
*/
std::string paranstr = eq.substr(LRFP + 1, NLFP - LRFP - 1);
eq.replace(LRFP, NLFP - LRFP + 1, evaluate_equation(paranstr));
return evaluate_equation(eq);
}
// (2):
size_t firstast = eq.find_first_of('*'); // position of first asterisk
if (firstast != std::string::npos) {
// Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
std::string num1, num2;
size_t num1begin(firstast), num2end(firstast);
while (num1begin > 0 && isdigit(eq[num1begin - 1]))
num1.insert(0, 1, eq[--num1begin]);
int eqsz = eq.size();
while ((num2end + 1) < eqsz && isdigit(eq[num2end + 1]))
num2.push_back(eq[++num2end]);
// Replace the space of the multiplication equation num1*num2 with its evaluation:
eq.replace(num1begin, num2end - num1begin + 1, multStrs(num1, num2));
return evaluate_equation(eq);
}
// (3):
size_t firstplus = eq.find_first_of('+');
if (firstplus != std::string::npos) {
// Set num1 and num2 equal to the respective numbers to the left and right of the asterisk:
std::string num1, num2;
size_t num1begin(firstplus), num2end(firstplus);
while (num1begin > 0 && isdigit(eq[num1begin - 1]))
num1.insert(0, 1, eq[--num1begin]);
int eqsz = eq.size();
while ((num2end + 1) < eqsz && isdigit(eq[num2end + 1]))
num2.push_back(eq[++num2end]);
// Replace the space of the multiplication equation num1*num2 with its evaluation:
eq.replace(num1begin, num2end - num1begin + 1, addStrs(num1, num2));
return evaluate_equation(eq);
}
// If we made it to this point, all that's left of the equation is a single number. Return it;
return eq;
}

bool valid_equation(const std::string& eq) {
std::string::const_iterator it = eq.begin();
// Equation cannot be an empty string:
if (it == eq.end())
return false;
// Count for the number of right-facing and left-facing parantheses:
int RFPcnt(0), LFPcnt(0);
/* Iterate through the characters of the equation to check that it is valid based on
which characters are lined up next to which. For example, if two plus signs are found
next to each other, that makes the equation meaningless and the our function
valid_equation() immediately returns false.
*/
while (it != eq.end()) {
if (*it == '(') {
/*
Rules for a right-facing paranthesis:
(1) Must be followed by a digit or another right-facing paranthesis
(2) Must be preceded by an operator or another right-facing paranthesis or nothing at all
*/
if ((it + 1) == eq.end()) {
return false;
} else {
if (!isdigit(*(it + 1)) && *(it + 1) != '(')
return false;
}
if (it != eq.begin()) {
if (*(it - 1) != '*' && *(it - 1) != '+' && *(it - 1) != '(')
return false;
}
++it; // If we made it here, increment the iterator
++RFPcnt; // Increment the count of right-facing paranthesis as well
} else if (*it == ')') {
/*
Rules for a left-facing paranthesis:
(1) Cannot be the beginning of an equation
(2) Must be preceded by a digit or another left-facing paranthesis
(3) Must be followed by an operator or another left-facing paranthesis or nothing at all
*/
if (it == eq.begin()) {
return false;
} else {
if (!isdigit(*(it - 1)) && *(it - 1) != ')')
return false;
}
if ((it + 1) != eq.end()) {
if (*(it + 1) != '*' && *(it + 1) != '+' && *(it + 1) != ')')
return false;
}
++it; // If we made it here, increment the iterator
++LFPcnt; // Increment the count of left-facing paranthesis as well
} else if (*it == '*' || *it == '+') {
/*
Rules for an operator:
(1) Cannot be the beginning or end of an equation
(2) Must be preceded by a digit or a left-facing paranthesis
(3) Must be followed by a digit or a right-facing paranthesis
*/
if (it == eq.begin() || (it + 1) == eq.end()) {
return false;
} else {
if (!isdigit(*(it - 1)) && *(it - 1) != ')')
return false;
if (!isdigit(*(it + 1)) && *(it + 1) != '(')
return false;
}
++it;
} else if (isdigit(*it)) {
// Simply increment the iterator until it's no longer on a digit:
while (isdigit(*++it));
} else { // If we're here, *it is invalid character
return false;
}
}

if (RFPcnt != LFPcnt)  // Unbalanced paranthesis?
return false;

return true; // If we made it here, the equation is valid; return true

}

// Given a decimal representation as a string s, str2Int(s) returns the corresponding decimal as an int
int str2Int(const std::string& str) {
std::string::const_iterator it = str.begin();
if (it == str.end()) {
std::cout << "Invalid parameter: str2Int() only accepts non-empty strings." << std::endl;
return -1;
}
int sum = 0;
while (it != str.end()) {
if (!isdigit(*it)) {
std::cout << "Invalid paramter: str2Int only accepts strings of chars '0' through '9'" << std::endl;
return -1;
} else {
sum *= 10;
sum += *it++ - '0';
}
}
return sum;
}

// Given an int n, int2Str(n) return the string representation
std::string int2Str(unsigned int n) {
std::string retstring;
if (n == 0) {
retstring.push_back('0');
return retstring;
} else {
while (n != 0) {
retstring.insert(0, 1, char('0' + n % 10));
n /= 10;
}
return retstring;
}
}

// Function that checks whether a given character is in one of '0', '1', ..., '9'
bool isdigit(char c) {
return (c >= '0' && c <= '9');
}

// Adds 2 string decimal representations and returns the sum as a string decimal representation
std::string addStrs(const std::string& s1, const std::string& s2) {
return (int2Str(str2Int(s1) + str2Int(s2)));
}

// Multiplies 2 string decimal representations and returns the sum as a string decimal representation
std::string multStrs(const std::string& s1, const std::string& s2) {
return (int2Str(str2Int(s1) * str2Int(s2)));
}