# Having trouble with algorithm to build a function tree

So I've been writing an algorithm to build a function tree and I believe I'm aaaaallllllmost there. The intent is to take a math function like

Code:
x^2+x+1*5
and build it into

Code:
            /        +         \
/ +  \               /  \
/ ^ \     x           1 *  5
x       2
My idea to parse the function is to think of it being like

Code:
[Exp_0][Op_1][Exp_1][Op_2][Exp_2].........[Op_n][Exp_n]
where [Exp_i] represents an expression (x, a number, or a function in parentheses) and [Op_i] represents an operator. In the above function

Code:
Exp_0 = 'x'
Op_1 = '^'
Exp_1 = 2
Op_2 = '+'
Exp_2 = 'x'
Op_3 = '+'
Exp_3 = 1
Op_4 = '*'
Exp_4 = 5
I know how to write the algorithm if the operators are in order of ascending or descending precedence. For instance, something like

x^2*2+1
is easy to build into a tree because as you grab the expressions from left to right you simply take the previous tree and make it be the left-hand side of a new tree and make the right-hand side of the tree be the current expression. The building of that tree (skipping a few steps) would look like

Code:
 / ^ \
x     2
,
Code:
    /      *    \
/ ^ \          2
x     2
,
Code:
                    /   +   \
/             1
/     *    \
/ ^ \          2
x     2
The algorithm is:

Code:
Get Exp_0

If there's no Op_1, return Exp_0;
Else,

Make a tree T like

/                  [Op_1]             \
[Exp_0]                                  [Exp_1]

for (i = 2; i <= n; ++i)
{
temp = T;
T = new Tree;
T.leftHandSide = temp;
T.oper = [Op_i];
T.rightHandSide = [Exp_i];
}
Hopefully that pseudo-code is legible to you.

The algorithm for making a tree when the precedence of the operators is reversed, e.g.

1+2*x^2
is slightly more complicated, but I know how to write. I can also write an algorithm that combines the 2 algorithms I mentioned.

The problem is that I can't think of a general algorithm that looks at the precedence of each operator found and then rebuilds the tree accordingly. It seems like it would be extremely complex, and I've gotten close to making it, but maybe I'm going about things completely wrong.

Can someone give me a hint to push me along?

Related Programming and Computer Science News on Phys.org
jim mcnamara
Mentor
Compiler developers face exactly the same problem - decomposing strings into atomic operations. Have a Google for "smallc.c" the name may vary slightly. It is a C interpreter written in C, functions the way the BASIC language originally did. It uses a descent parser - which is what you need. Somewhere on my shelves is a book on C that has the code. If I bump into it I will post the reference.

At any rate check out a descent parser.

jim mcnamara
Mentor
Found what you want: 'C: The complete reference Fourth Edition' Herbert Schildt, 2000, Obourne
You want Chapter 29.

I DO NOT fully reccommend the book as a C reference - the edition I have has more than several errors.

Found what you want: 'C: The complete reference Fourth Edition' Herbert Schildt, 2000, Obourne
You want Chapter 29.

I DO NOT fully reccommend the book as a C reference - the edition I have has more than several errors.
Awesome. I'll take a look at it.

Pseudocode for this would look like this.

Tokenise the input. I would use regex.
Code:
20 * (32 + 4) + 1 --> integer mulop bra integer addop integer ket addop integer
Now the grammar:
Code:
<expression> =
or  <term>

<term> =
<factor> <mulop> <factor>
or  <factor>

<factor> =
<integer>
or  <bra> <expression> <ket>
Now recursive descent parsing
Code:
def parse (tokens)
value = expression ()
return value

def expression ()
value = term ()
while 1
index += 1
value += term ()
else if tokens[index].type == SUBOP
index += 1
value -= term ()
else break
return value

def term ()
value = factor ()
while 1
if tokens[index].type == MULOP
index += 1
value *= factor ()
else if tokens[index].type == DIVOP
index += 1
value /= factor ()
else break
return value

def factor ()
if tokens[index].type == INTEGER
else if tokens[index].type == BRA
index += 1  # eat bracket
result = expression()
if tokens[index].type != KET
throw SyntaxError
index += 1  # eat bracket
return result

def main ()
tokens = lex ("  2 + 3 * (2+2)  ")
print (answer)