# How do Computer Algebra Systems handle the scope of variables?

TL;DR Summary
How do Computer Algebra Systems (CAS) handle the scope of variables.
Computer languages handle the scope of variables in a precise way so that if one symbol, such as "k" is used in different contexts, the program keeps these separate. When sophisticated human beings re-use symbols in writing mathematics, they can keep things straight, but I don't think they follow precise rules. So how do CAS programs handle the translation from human written expressions to computer structures?

For example, a human given the information:
##k = 2m + 1##
##V = k^2 + \sum_{k=1}^2 B^k##
would probably understand that the "##k##" in the first equation corresponds to the "##k##" of the "##k^2##" in the second equation, but not the "##k##" in "##\sum_{k=1}^2 B^k##". But could this information be entered in a CAS program as it stands? - or would one need to avoid the duplicate use of "##k##"?

Mentor
Summary:: How do Computer Algebra Systems (CAS) handle the scope of variables.

Computer languages handle the scope of variables in a precise way so that if one symbol, such as "k" is used in different contexts, the program keeps these separate. When sophisticated human beings re-use symbols in writing mathematics, they can keep things straight, but I don't think they follow precise rules. So how do CAS programs handle the translation from human written expressions to computer structures?

For example, a human given the information:
##k = 2m + 1##
##V = k^2 + \sum_{k=1}^2 B^k##
would probably understand that the "##k##" in the first equation corresponds to the "##k##" of the "##k^2##" in the second equation, but not the "##k##" in "##\sum_{k=1}^2 B^k##". But could this information be entered in a CAS program as it stands? - or would one need to avoid the duplicate use of "##k##"?
Your example doesn't make sense to me, and I don't see how it would make any sense to a CAS. It's not reasonable for a CAS to comprehend that the first instance of k in the second equation means something different from the index k in the summation.
I would expect a CAS to issue a warning or throw an exception.

The difference between the two different k tokens is lexical scope.
It will depend on how the CAS analyses and handles scope.
Will it overwrite the first mentioned k during the independent accumulation loop?

I would expect a CAS to issue a warning or throw an exception.

I would expect a CAS to understand that there is no difference between ##\sum_{k=1}^2 B^k## and ##\sum_{w=1}^2 B^w##.

Similarly, ##x + \int_{x=0}^{1} x^2 dx = x + \int_{y=0}^{1} y^2 dy##.

Similarly:
##\sum_{k=m}^{m+2} ( k^2 \sum_{j=1}^2( (j+k) \sum_{n=1}^2 jn))##
=
##\sum_{k=m}^{m+2} ( k^2 \sum_{j=1}^2( (j+k) \sum_{k=1}^2 jk))##

Mentor
I would expect a CAS to understand that there is no difference between
##\sum_{k=1}^2 B^k## and ##\sum_{w=1}^2 B^w##.

Similarly, ##x + \int_{x=0}^{1} x^2 dx = x + \int_{y=0}^{1} y^2 dy##.
I haven't used a CAS in a long time, but I wouldn't expect such software to understand the difference. As I said, what I would expect is that the CAS would report these ambiguities as errors, but I don't have any way to verify my suspicions.

It will depend entirely on the CAS. Name the product. Read the documentation. Carry out an experiment to test the particular CAS software you will use.

• glappkaeft and pbuk
Homework Helper
2022 Award
If it has the potential to confuse a human (for example, using the same symbol as a free variable and a dummy variable in the same expression) then
(1) it's bad style and you shouldn't do it, and
(2) it has the potential to confuse a CAS.

• anorlunda
My question isn't intended as "How do I effectively interface with a CAS?"

For the pupose of writing a CAS, what advice can be given about data structures and algorithms? What kind of data structure would one use to represent symbolic expressions and how would information about the scope of variables be included in this structure? For example, if the user wishes to replace the symbol "y" by "2 cos x", what data would represent the "context" or "scope" of this request? What algorithm would be used to perform the replacement only in the correct context?

For example, a simplistic approach would be to assign each occurence of each symbol an integer that represents a context. This doesn't capture the aspect of contexts that one context can be a sub-context of another. Is that a critical drawback?

Homework Helper
Gold Member
The answers to your question are matters of opinion, so here are mine.
For the pupose of writing a CAS, what advice can be given about data structures and algorithms?

They should be made as simple as possible, but not simpler.

What kind of data structure would one use to represent symbolic expressions and how would information about the scope of variables be included in this structure? For example, if the user wishes to replace the symbol "y" by "2 cos x", what data would represent the "context" or "scope" of this request? What algorithm would be used to perform the replacement only in the correct context?
These are design choices involving trade-offs. Making choices that are 'loose' can result in an application/DSL that is easy to learn and quick for the user to achieve solutions particularly for simpler problems whereas making choices that are strict may achieve more robust solutions.

For example, a simplistic approach would be to assign each occurence of each symbol an integer that represents a context. This doesn't capture the aspect of contexts that one context can be a sub-context of another. Is that a critical drawback?
If that were a consequence then yes IMHO that would be a critical drawback.

I think the question comes down to the explicit declaration of symbol name and type.
Your examples appear to implicitly declare symbol type by it's position in the syntax.
Are iterators always local integers? Might you need to declare and manipulate real and complex differently?
I would expect a CAS to define operators as a function of functions, in a tree. You will need to somehow differentiate between a symbol being passed down to a function, or to be local within that function. That will need to be part of your function declaration.

• Tom.G
I think the question comes down to the explicit declaration of symbol name and type.
Your examples appear to implicitly declare symbol type by it's position in the syntax.

To clarify: My thinking about a CAS, doesn't include the ambition of having a program that can parse latex and automatically create a data structure that gives variables the appropriate scope. So, for example, a user who wanted to generate an expression representing a sum might have to call a procedure Create_A_Sum and specify appropriated inputs. Yes, in general, the symbol used for the iterator would have have a scope limited to that expression. But within the expression the same symbol might reappear as a different variable in a sub-expression.

I would expect a CAS to define operators as a function of functions, in a tree. You will need to somehow differentiate between a symbol being passed down to a function, or to be local within that function. That will need to be part of your function declaration.

I think there is an important distinction between algebra as an abstract language representing mathematical objects versus algebra as the use of symbols by human beings. For example, a human being might represent a 2x2 matrix of real numbers by a single letter "A". Or it might be represented by an array of symbols "a,b,c,d". (Or "2.2, 7.5 6, 9") Or it might be represented by an array of indexed variables "a, a, a, a".

I agree that a tree seems the appropriate data structure to handle the variety of possible representations that human beings use. It isn't clear to me what information should be kept on the nodes. In particular, how would scope be represented?

Mentor
Yes, in general, the symbol used for the iterator would have have a scope limited to that expression. But within the expression the same symbol might reappear as a different variable in a sub-expression.
With regard to having an expression in which a single symbol represents two different variables, I'll just restate what @pasmith wrote earlier:
If it has the potential to confuse a human (for example, using the same symbol as a free variable and a dummy variable in the same expression) then
(1) it's bad style and you shouldn't do it, and
(2) it has the potential to confuse a CAS.

With regard to having an expression in which a single symbol represents two different variables, I'll just restate what @pasmith wrote earlier:

I agree that certain conventions make it simpler for human beings to understand symbolic expressions. In fact, the "scope" of variables is hardly taught in courses on mathematics. ( Students are usually introduced to the topic in symbolic logic or computer science, where the scope of variables is a crucial concept.)

However, a program involving a computer algebra system may automatically create expressions and (to me) whether they are convenient for humans to read is secondary to whether the CAS correctly handles the scope of variables. So data structures and algorithms for doing that are what interests me.

Mentor
However, a program involving a computer algebra system may automatically create expressions and (to me) whether they are convenient for humans to read is secondary to whether the CAS correctly handles the scope of variables.
It's inconceivable to me that a CAS would create any data structure in which a variable would have two or more meanings.

• It's inconceivable to me that a CAS would create any data structure in which a variable would have two or more meanings

Are you talking about a variable or a symbol? Symbols are often given more than one meaning according to the surrounding context. In expressions like:
##\sum_{k=1}^2 B^k + \sum_{k=2}^4 C^k## we have the same symbol "k" used to denote two different variables.

I agree that the concept of "a variable" should denote a unique thing. However, it is common to use the same symbol to denote different variables in different contexts.

This suggests that one way to handle the scope of variables is to distinguish between the identifier of a variable and the symbol associated with a variable. Perhaps a CAS variable could be a data structure that includes some unique identifier (like B134R33) as well as an associated symbol such as "k".

Mentor
For example, a human being might represent a 2x2 matrix of real numbers by a single letter "A". Or it might be represented by an array of symbols "a,b,c,d". (Or "2.2, 7.5 6, 9") Or it might be represented by an array of indexed variables "a, a, a, a".
A computer program could represent a 2x2 matrix of reals either as an array of symbols (e.g. [a, b, c, d]) or as an array of numbers, or in some other way, such as an array of pointers to memory locations. It could also associate the symbol A with the number in the matrix. As a side note, most programming languages (Matlab and Fortran excepted) start their array indexes at 0, not 1.
If A is defined as a two-dimensional array, then the elements would be identified by A, etc., not a.

Are you talking about a variable or a symbol? Symbols are often given more than one meaning according to the surrounding context.
I'm not sure that there is a difference between "variable" and "symbol" as far as programming languages are concerned.

In expressions like: ##\sum_{k=1}^2 B^k + \sum_{k=2}^4 C^k## we have the same symbol "k" used to denote two different variables.
I don't think these are two different variables. In the first summation, k takes on the values 1 and 2, and in the second summation, k takes on the values 2, 3, and 4.

This is the one that seems flaky to me:
Stephen Tashi said:
For example, a human given the information:

##k = 2m + 1##
##V = k^2 + \sum_{k=1}^2 B^k##
Here you have k being used for two entirely different things. It's this example that I find inconceivable that a CAS could make sense of.

A computer program could represent a 2x2 matrix of reals either as an array of symbols (e.g. [a, b, c, d]) or as an array of numbers, or in some other way, such as an array of pointers to memory locations. It could also associate the symbol A with the number in the matrix. As a side note, most programming languages (Matlab and Fortran excepted) start their array indexes at 0, not 1.
If A is defined as a two-dimensional array, then the elements would be identified by A, etc., not a.

I agree that computer programs can use different representations of a matrix.
Are you saying that a way that a human represents a matrix can be mapped to a corresponding way that a computer program represents it? I agree. But to imitate how human beings dynamically vary the representations they use, a CAS needs to be able to switch between representations. With that goal in mind, will it be simpler to implement representations as patterns of symbols versus using specific computer structures (like arrays of real or arrays of strings or arrays of integers etc.) ?

I'm not sure that there is a difference between "variable" and "symbol" as far as programming languages are concerned.

As I said, the same symbol can be used to represent different variables. Consider the symbol "k" when it is used in two different functions. It might be an integer in one function and a more complicated data structure in a different function.

I don't think these are two different variables. In the first summation, k takes on the values 1 and 2, and in the second summation, k takes on the values 2, 3, and 4.
Thinking in terms of computer languages, it is unclear whether different occurrences of "k" would denote the same variable. If both summations were implemented by the same function that used the same local variable "k" then one could argue that the symbol "k" represents a single variable (i.e. the contents of a single memory address) However if the two summations were implemented by two different functions then the local variable "k" in one of the functions wold be a different variable than the local variable "k" in the other function.
Here you have k being used for two entirely different things. It's this example that I find inconceivable that a CAS could make sense of.

I don't find it inconceivable. As another example, consider logical expressions like:

For each x, there exists a y such that { G(y,x) and there exists a w such that L(y,w) }.

This could also be expressed as

For each x, there exists a y such that { G(y,x) and there exists an x such that L(y,x) }.

Quantifiers such as "for each" and "there exists" (and the expressions that follow them) assign a scope to the symbols that are quantified. The concept of scope makes it unnecessary to use a different symbol for each variable.

Homework Helper
Gold Member
This conversation is not taking place on the right level. If you were writing a novel then an analagous inquiry to this:
I agree that a tree seems the appropriate data structure to handle the variety of possible representations that human beings use. It isn't clear to me what information should be kept on the nodes. In particular, how would scope be represented?
could be

"I agree that a number of pieces of paper of equal size bound together with glue seems the appropriate medium to publish a book. It isn't clear to me what information should be provided in order to create the narrative. In particular, how would the motivation of characters be represented?"

I don't think you are currently in a position to consider writing a CAS. Why are you trying to do this?

• Mark44
Thinking in terms of computer languages, it is unclear whether different occurrences of "k" would denote the same variable.
The operators, functions, and variables will be internally identifiable by virtue of their position in the tree, combined with inheritance from their original declarations.

The only problem is with specification of the input equation system to the CAS. It is a problem because you have not defined a sufficient syntax.

• pbuk
Homework Helper
Gold Member
Here you have k being used for two entirely different things. It's this example that I find inconceivable that a CAS could make sense of.
Well Wolfram Alpha makes a fair stab at it Mentor
Well Wolfram Alpha makes a fair stab at it That's very interesting that WA is able to make some sense out of the two equations.

As another example, given:
k=3, x = k^2 + sum[2^k, {k,1,2}]
Wolfram Alpha computes x = 15

Can Wolfram Alpha do algebraic substitutions? The Mathematica examples I read on the web show "/." used in denoting substitution, but I can't get Wolfram Alpha to recognize such syntax.

Mentor
Can Wolfram Alpha do algebraic substitutions?
Evidently so. In this example, WA calculates the following, resulting in a calculated value for x of ##a^2 + 2ab + b^2 + 6##.
##k = a +b##
##x = c^2 + \sum_{k = 1}^2 2^k##
(https://www.wolframalpha.com/input/?i=k=a+++b;x=k^2+++sum(2^k),+k=1+to+2)
Clearly, my earlier replies were wrong about the capabilities of current CAS software.

So how do CAS programs handle the translation from human written expressions to computer structures?

For the pupose of writing a CAS, what advice can be given about data structures and algorithms? What kind of data structure would one use to represent symbolic expressions and how would information about the scope of variables be included in this structure?
I don't believe that we are in a position to answer these questions in much detail, given that both WA and Mathematica are commercial products whose inner workings are proprietary.

• berkeman
Mentor
If youre trying to implement scoping features then I’d look at how compilers and interpreters do it.

In math notation, for summations and products, the dummy indexing variable is given.

For Einstein summation, you’d have to tease it out of the expression where a given dummy variable will be used as a subscript and as a superscript.

Implementation wise, you could consider a dictionary lookup for variable names to get the stack associated with it to get its current value. When scope is changed then you have to push or pop values onto those specific variables.

Another implementation could be a scope stack where each entry is a dictionary of variable names and their values updated according to usage. Changing scope would mean pushing or popping a single scope value.

Last edited:
• pbuk
If youre trying to implement scoping features then I’d look at how compilers and interpreters do it.

I understand copying those techniques for variables that have one type of value. However, in symbolic manipulations, the concept of a value of a variable can be dynamic and it may not have a value restricted to one category of standard data structures (e.g. integers, floats, characters, strings).

For example in symbolic manipulations we may have variable representing a 2x2 matrix of reals and it's "value" might be the string "A". This is sufficient for doing algebraic manipulations on expressions like ##A + A##, ##(A)(A)##, ##A(B+A^{-1})##. At some point in a calculation, a person may wish to introduce a more detailed representation of the matrix ##A##. For example, suppose we want ##A## to be ##\begin{pmatrix} r & 3.2 \\ s & t \end{pmatrix}##

Thinking of an expression as a tree structure the expression ##A+A## could be represented as a node "+" with two daughter nodes, each with the "value" "##A##". To introduce the more detailed representation of ##A## in that expression, I think we would make each of the daughter nodes with value ##A## a parent node with four of its own daughter nodes that have values ##r, 3.2, s, t##.

There is the question of how to keep track of the fact that the scope of the variable ##r## is defined by the scope of the variable ##A##.

For example, an expression like ##\sum_{A \in S} A ## implies that ##A## is a variable being used as an "iterator" over some set ##S##. So ##r## (for that particular variable ##A##) is variable that is an iterator over the same scope that ##A## has- or should I say "a similar" scope?

Mentor
What language are you implementing the CAS in?

In python, one could use lists and tuples to manage the scope of objects like A. There is no one way or known preferred way to tackle this.

Perhaps an abstract syntax tree could help where you decompose the math expression to create the tree and then walk the tree to evaluate it. As the A gets evaluated, the code recognizes the k value and other expressions and evaluates them too. There will likely be a lot of recursion going on during the evaluation as well as scope changes. Scope state could be a tulle with one component being a dictionary of variables and values.

It seems like very interesting problem. I have done some expression evaluation but never with a matrix with embedded expressions.