# Complexity for generating strings of length k from CNF grammar

• Comp Sci
CGandC
Homework Statement:
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k ##. Assume the size of the rules set is constant.
Relevant Equations:
Big-Oh notation, context-free languages
Here's the following code I've written for generating strings of length k given a CNF grammar ( Chomsky Normal Form grammar ) ( The code is a recursive variant of the CYK algorithm ). The algorithm is recursive and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
mem = dict()
return generate_language_rec(rule_dict, start_var, k, mem)

def generate_language_rec(rule_dict, var, k, mem):
if (var, k) in mem:
return mem[(var, k)]

s = set()
if k == 0:
mem[(var, k)] = s
return s

if k == 1:
for x in rule_dict[var]:
if len(x) == 1:
s.update(x)
mem[(var, k)] = s
return s

for var_rule in rule_dict[var]:
if len(var_rule) == 2:
X, Y = var_rule, var_rule
for j in range(1, k):
s1 = generate_language_rec(rule_dict, X, j  , mem)
s2 = generate_language_rec(rule_dict, Y, k-j, mem)
s.update( { str1 + str2 for str1 in s1 for str2 in s2 }.union({ str2 + str1 for str1 in s1 for str2 in s2 } ) ) ## concatenating strings in s1 and s2 in both ways , i.e., concatenating strings from st1 with strings from st2 and then I concatenate strings from st2 with strings in st1
mem[(var, k)] = s
return s

The algorithm works fine, here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)

Next, I was asked to analyze both the Time-complexity of the algorithm and its Space-complexity ( Space-complexity excluding the arguments of the function ), under the assumption that the rules set ( rules_dict ) of the grammar is of constant length. I was asked to explain whether the complexities are Linear, Polynomial or Exponential ( find a tight upper-bound ) in ##k ##.

My answer was that the both the time and space complexities are Polynomial ( I coudn't explain for the space-complexity). Here's my explanation:

For Time-complexity:

For each recursive call with input ##k## , we pass over all rules whose derivation leads to two variables X and Y,
For variable X we check if it can be used to produce a set of strings of lengths ## 1,...,k-1 ##
For variable Y we check if it can be used to produce a set of strings of lengths ## 1,...,k-1 ##

Note, because of the Memoization, each recursive call with input ## k ## will have ## O(2(k-1))## recursive calls.
Hence,
For variable X, we'll have ## \sum_{i=0}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total
For variable Y, we'll have ## \sum_{i=0}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total

Since we assumed the size of the set of rules to be constant then the total time-complexity of the algorithm is ## O(k^2) + O(k^2) = O(k^2) ##
so the time-complexity is polynomial.

For Space-complexity:

I couldn't figure it out, maybe the space complexity is also ## O(k^2) ##? I couldn't explain it, but I think the answer lies in the lines:
Python:
                s1 = generate_language_rec(rule_dict, X, j  , mem)
s2 = generate_language_rec(rule_dict, Y, k-j, mem)

Questions:
What do you think about the time-complexity of the above algorithm? am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!

Last edited:

Homework Helper
Gold Member
What do you think about the time-complexity of the above algorithm? am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?

• CGandC
CGandC
1.
I've checked the running time of the code experimentally for different inputs, the code part of the analysis is:

Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings  = generate_language(rule_dict, "S", 10)
t_1 = time.perf_counter()
print(" DELTA ",t_1-t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

It seems the time-complexity is linear ( of the form y=2x ) Instead of polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the Space-complexity will be linear, explanation:
The depth of the recursion tree is ## O(k) ##.
The function 'generate_language_rec' has a space-complexity of ## O(1) ## if we disregard recursive calls and input sizes.
We store all recursive calls for a constant number of variables ( since we assumed the size of the set of rules to be constant ) in a memoization table.
For input ## k ##, for two variables X, Y, for each of them will be produced ## O(2(k-1)) \in O(k) ## recursive calls ( the calls will be with inputs ##1,...,k-1## for X, ## 1,...,k-1 ## for Y ), each such call will be put in a memoization table. So each variable causes ## O(k) ## calls in total ( which is what determines the depth of the recursion ) because of the memoization , hence, since the size of the set of rules is constant, the total space complexity is ## O(k) \cdot O(1) \in O(k) ##

( Maybe the explanation is a little hard to understand, but I could write it better if I sat more time on it )

This is more reliable experimentally: in this case just print sum(map(len, mem)) at the end.