Having trouble with algorithm to build a function tree

AI Thread Summary
The discussion centers around developing an algorithm to construct a function tree from mathematical expressions, specifically focusing on parsing expressions like x^2+x+1*5. The proposed method involves breaking down the expression into components, where each expression and operator is identified in a sequence. The user describes a straightforward approach for building trees when operators are in a specific order of precedence, but faces challenges when operators have varying precedence levels. Suggestions include exploring descent parsers, which are commonly used in compiler design to decompose strings into operations. The conversation also touches on using regular expressions for tokenization and outlines a recursive descent parsing technique, providing pseudocode for parsing mathematical expressions. Recommendations for further reading include "C: The Complete Reference" and "Compilers" by Aho, which could offer insights into writing effective parsers.
Jamin2112
Messages
973
Reaction score
12
So I've been writing an algorithm to build a function tree and I believe I'm aaaaallllllmost there. The intent is to take a math function like

Code:
x^2+x+1*5

and build it into

Code:
            /        +         \
         / +  \               /  \
      / ^ \     x           1 *  5
    x       2

My idea to parse the function is to think of it being like

Code:
[Exp_0][Op_1][Exp_1][Op_2][Exp_2]...[Op_n][Exp_n]

where [Exp_i] represents an expression (x, a number, or a function in parentheses) and [Op_i] represents an operator. In the above function

Code:
Exp_0 = 'x'
Op_1 = '^'
Exp_1 = 2
Op_2 = '+'
Exp_2 = 'x'
Op_3 = '+'
Exp_3 = 1
Op_4 = '*'
Exp_4 = 5

I know how to write the algorithm if the operators are in order of ascending or descending precedence. For instance, something like

x^2*2+1

is easy to build into a tree because as you grab the expressions from left to right you simply take the previous tree and make it be the left-hand side of a new tree and make the right-hand side of the tree be the current expression. The building of that tree (skipping a few steps) would look like

Code:
 / ^ \
x     2
,
Code:
    /      *    \
 / ^ \          2
x     2
,
Code:
                    /   +   \
                 /             1
          /     *    \
       / ^ \          2
      x     2

The algorithm is:

Code:
Get Exp_0

If there's no Op_1, return Exp_0; 
Else, 

Make a tree T like

     /                  [Op_1]             \
[Exp_0]                                  [Exp_1]

for (i = 2; i <= n; ++i)
{
      temp = T;
      T = new Tree; 
      T.leftHandSide = temp;
      T.oper = [Op_i];
      T.rightHandSide = [Exp_i];
}

Hopefully that pseudo-code is legible to you.

The algorithm for making a tree when the precedence of the operators is reversed, e.g.

1+2*x^2

is slightly more complicated, but I know how to write. I can also write an algorithm that combines the 2 algorithms I mentioned.

The problem is that I can't think of a general algorithm that looks at the precedence of each operator found and then rebuilds the tree accordingly. It seems like it would be extremely complex, and I've gotten close to making it, but maybe I'm going about things completely wrong.

Can someone give me a hint to push me along?
 
Technology news on Phys.org
Compiler developers face exactly the same problem - decomposing strings into atomic operations. Have a Google for "smallc.c" the name may vary slightly. It is a C interpreter written in C, functions the way the BASIC language originally did. It uses a descent parser - which is what you need. Somewhere on my shelves is a book on C that has the code. If I bump into it I will post the reference.

At any rate check out a descent parser.
 
Found what you want: 'C: The complete reference Fourth Edition' Herbert Schildt, 2000, Obourne
You want Chapter 29.

I DO NOT fully recommend the book as a C reference - the edition I have has more than several errors.
 
jim mcnamara said:
Found what you want: 'C: The complete reference Fourth Edition' Herbert Schildt, 2000, Obourne
You want Chapter 29.

I DO NOT fully recommend the book as a C reference - the edition I have has more than several errors.

Awesome. I'll take a look at it.
 
Pseudocode for this would look like this.

Tokenise the input. I would use regex.
Code:
20 * (32 + 4) + 1 --> integer mulop bra integer addop integer ket addop integer
Now the grammar:
Code:
<expression> =
        <term> <addop> <term>
    or  <term>
 
<term> =
        <factor> <mulop> <factor>
    or  <factor>

<factor> =
        <integer>
    or  <bra> <expression> <ket>
Now recursive descent parsing
Code:
def parse (tokens)
    value = expression ()
    return value
 
def expression ()
    value = term ()
    while 1
        if tokens[index].type == ADDOP
            index += 1
            value += term ()
        else if tokens[index].type == SUBOP
            index += 1
            value -= term ()
        else break
    return value

def term ()
    value = factor ()
    while 1
        if tokens[index].type == MULOP
            index += 1
            value *= factor ()
        else if tokens[index].type == DIVOP
            index += 1
            value /= factor ()  
        else break
    return value

def factor ()
    if tokens[index].type == INTEGER
        return tokens[index++].value
    else if tokens[index].type == BRA
        index += 1  # eat bracket
        result = expression()
        if tokens[index].type != KET
            throw SyntaxError
        index += 1  # eat bracket
    return result
 
def main ()
    tokens = lex ("  2 + 3 * (2+2)  ")
    answer = parse (tokens)
    print (answer)

Read Compilers by Aho (the Dragon Book) or just google how to write a parser. I did this quick and there might be an error or two.

Edited. Fixed expression, term.
 
Last edited:
Dear Peeps I have posted a few questions about programing on this sectio of the PF forum. I want to ask you veterans how you folks learn program in assembly and about computer architecture for the x86 family. In addition to finish learning C, I am also reading the book From bits to Gates to C and Beyond. In the book, it uses the mini LC3 assembly language. I also have books on assembly programming and computer architecture. The few famous ones i have are Computer Organization and...
I had a Microsoft Technical interview this past Friday, the question I was asked was this : How do you find the middle value for a dataset that is too big to fit in RAM? I was not able to figure this out during the interview, but I have been look in this all weekend and I read something online that said it can be done at O(N) using something called the counting sort histogram algorithm ( I did not learn that in my advanced data structures and algorithms class). I have watched some youtube...

Similar threads

Back
Top