# Complexity for generating strings of length k from CNF grammar

• Comp Sci
CGandC
Homework Statement:
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##. Assume the size of the rule set is constant.
Relevant Equations:
Big-Oh notation, context-free languages
Here's the code I've written for generating all strings of length ##k## from a given CNF grammar ( Chomsky Normal Form grammar ). The code is a recursive variant of the CYK algorithm and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
    mem = dict()
    return generate_language_rec(rule_dict, start_var, k, mem)

def generate_language_rec(rule_dict, var, k, mem):
    if (var, k) in mem:
        return mem[(var, k)]

    s = set()
    if k == 0:
        mem[(var, k)] = s
        return s

    if k == 1:
        for x in rule_dict[var]:
            if len(x) == 1:
                s.update(x[0])
        mem[(var, k)] = s
        return s

    for var_rule in rule_dict[var]:
        if len(var_rule) == 2:
            X, Y = var_rule[0], var_rule[1]
            for j in range(1, k):
                s1 = generate_language_rec(rule_dict, X, j, mem)
                s2 = generate_language_rec(rule_dict, Y, k - j, mem)
                # concatenate the strings of s1 and s2 in both orders:
                # s1-strings followed by s2-strings, and vice versa
                s.update({str1 + str2 for str1 in s1 for str2 in s2})
                s.update({str2 + str1 for str1 in s1 for str2 in s2})
    mem[(var, k)] = s
    return s

The algorithm works fine; here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)
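For quick sanity checks at small ##k##, here is a condensed, self-contained version of the routine ( the name gen, the default-argument handling and the tiny checks below are mine, not part of the original code ):

```python
def gen(rule_dict, var, k, mem=None):
    # Condensed version of the generator above, for quick experiments.
    if mem is None:
        mem = {}
    if (var, k) in mem:
        return mem[(var, k)]
    s = set()
    if k == 1:
        # rules of length 1 are single terminal symbols
        s = {x for x in rule_dict[var] if len(x) == 1}
    elif k > 1:
        for rule in rule_dict[var]:
            if len(rule) == 2:  # binary rule var -> XY
                X, Y = rule
                for j in range(1, k):
                    s1 = gen(rule_dict, X, j, mem)
                    s2 = gen(rule_dict, Y, k - j, mem)
                    # both concatenation orders, as in the original code
                    s.update(a + b for a in s1 for b in s2)
                    s.update(b + a for a in s1 for b in s2)
    mem[(var, k)] = s
    return s

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
print(gen(rule_dict, "A", 1))  # {'a'} -- the only length-1 rule of A
print(gen(rule_dict, "S", 2))  # {'ab', 'ba'} -- A.B and B.C in both orders
```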

Next, I was asked to analyze both the time-complexity of the algorithm and its space-complexity ( space-complexity excluding the arguments of the function ), under the assumption that the rule set ( rule_dict ) of the grammar is of constant size. I was asked to explain whether the complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k##.

My answer was that both the time and space complexities are Polynomial ( though I couldn't justify the space-complexity ). Here's my explanation:

For Time-complexity:

For each recursive call with input ##k##, we pass over all rules whose derivation leads to two variables X and Y.
For variable X we check whether it can be used to produce a set of strings of each of the lengths ## 1,...,k-1 ##.
For variable Y we check whether it can be used to produce a set of strings of each of the lengths ## 1,...,k-1 ##.

Note, because of the memoization, each recursive call with input ##k## makes only ## O(2(k-1)) ## recursive calls.
Hence,
For variable X, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.
For variable Y, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.

Since we assumed the size of the set of rules to be constant then the total time-complexity of the algorithm is ## O(k^2) + O(k^2) = O(k^2) ##
so the time-complexity is polynomial.
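One way to check the count of recursive calls empirically ( as opposed to the full running time, which also includes building the string sets ) is to instrument a condensed version of the routine with a call counter; count_calls and the closed form in the comments are my own, not from the assignment:

```python
def count_calls(rule_dict, start_var, k):
    """Return how many times the recursive helper is entered
    ( memo hits included ) while generating strings of length k."""
    mem = {}
    calls = [0]

    def rec(var, n):
        calls[0] += 1
        if (var, n) in mem:
            return mem[(var, n)]
        s = set()
        if n == 1:
            s = {x for x in rule_dict[var] if len(x) == 1}
        elif n > 1:
            for rule in rule_dict[var]:
                if len(rule) == 2:  # binary rule var -> XY
                    X, Y = rule
                    for j in range(1, n):
                        s1 = rec(X, j)
                        s2 = rec(Y, n - j)
                        s.update(a + b for a in s1 for b in s2)
                        s.update(b + a for a in s1 for b in s2)
        mem[(var, n)] = s
        return s

    rec(start_var, k)
    return calls[0]

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
# Thanks to memoization each (var, n) pair is expanded only once, so for
# this grammar the count works out to 1 + 4(k-1) + 3(k-1)(k-2): quadratic.
print(count_calls(rule_dict, "S", 5))   # 53
print(count_calls(rule_dict, "S", 10))  # 253
```

Note that this only counts calls: the time spent inside a call, concatenating and storing strings, depends on how many strings the sets hold, so the measured wall-clock time can grow much faster than ##k^2##.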

For Space-complexity:

I couldn't figure it out. Maybe the space complexity is also ## O(k^2) ##? I couldn't explain it, but I think the answer lies in these lines:
Python:
s1 = generate_language_rec(rule_dict, X, j, mem)
s2 = generate_language_rec(rule_dict, Y, k - j, mem)

Questions:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!


Homework Helper
Gold Member
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?

CGandC
1.
I've checked the running time of the code experimentally for different inputs; the code used for the measurement is:

Python:
import time

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings = generate_language(rule_dict, "S", 10)
t_1 = time.perf_counter()
print(" DELTA ", t_1 - t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

It seems the running time roughly doubles each time ##k## grows by 1 ( i.e., it looks like ## y = 2^k ##, which would be exponential ) instead of polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the Space-complexity will be linear. Explanation:
The depth of the recursion tree is ## O(k) ##.
The function 'generate_language_rec' has a space-complexity of ## O(1) ## if we disregard the recursive calls and the sizes of its inputs.
We store the results of all recursive calls, for a constant number of variables ( since we assumed the size of the rule set to be constant ), in a memoization table.
For input ##k## and a rule deriving two variables X and Y, each of X and Y gives rise to recursive calls with inputs ## 1,...,k-1 ##, i.e. ## O(2(k-1)) \in O(k) ## calls, and each such call deposits one entry in the memoization table. So each variable contributes ## O(k) ## entries in total ( and, because of the memoization, this is also what bounds the recursion ). Hence, since the size of the rule set is constant, the total space complexity is ## O(k) \cdot O(1) \in O(k) ##.

( Maybe the explanation is a little hard to follow, but I could write it better if I spent more time on it. )

This is more reliable experimentally: in this case just print sum(map(len, mem.values())) at the end ( iterating mem directly gives its keys, not the stored sets ).
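One way to set that measurement up ( a self-contained sketch; the condensed generator and the variable names are mine — the stored sets live in the memo's values, and counting characters rather than whole strings reflects the actual space used ):

```python
def gen(rule_dict, var, k, mem):
    # condensed version of the generator, so the memo can be inspected afterwards
    if (var, k) in mem:
        return mem[(var, k)]
    s = set()
    if k == 1:
        s = {x for x in rule_dict[var] if len(x) == 1}
    elif k > 1:
        for rule in rule_dict[var]:
            if len(rule) == 2:  # binary rule var -> XY
                X, Y = rule
                for j in range(1, k):
                    s1 = gen(rule_dict, X, j, mem)
                    s2 = gen(rule_dict, Y, k - j, mem)
                    s.update(a + b for a in s1 for b in s2)
                    s.update(b + a for a in s1 for b in s2)
    mem[(var, k)] = s
    return s

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
mem = {}
gen(rule_dict, "S", 10, mem)
entries = len(mem)                                    # memoized (var, k) pairs
strings = sum(len(s) for s in mem.values())           # total strings stored
chars = sum(len(w) for s in mem.values() for w in s)  # total characters stored
print(entries, strings, chars)
```

The number of entries is at most ( number of variables ) times ##k##, so it grows linearly in ##k##; whether the total number of stored characters also does is exactly what the space analysis has to decide.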