Complexity for generating strings of length k from CNF grammar

SUMMARY

The discussion centers on analyzing the time and space complexities of a recursive, memoized algorithm for generating strings of length k from a Chomsky Normal Form (CNF) grammar. The algorithm, implemented in Python, is argued to have a time complexity of O(k²), based on counting the recursive calls made for each variable in the grammar. A follow-up timing experiment led the poster to suspect the growth might instead be linear, O(k), although measuring execution time is noted in the thread to be an unreliable way to estimate complexity. The space complexity is argued to be O(k), due to the depth of the recursion tree and the memoization of results for a constant number of grammar rules.

PREREQUISITES
  • Understanding of Chomsky Normal Form (CNF) grammar
  • Familiarity with recursive algorithms and memoization techniques
  • Basic knowledge of time and space complexity analysis
  • Proficiency in Python programming, particularly with functions and dictionaries
NEXT STEPS
  • Study the CYK algorithm for parsing CNF grammars
  • Learn about memoization techniques in depth
  • Explore time complexity analysis methods, including theoretical and empirical approaches
  • Investigate space complexity considerations in recursive algorithms
USEFUL FOR

Computer scientists, algorithm developers, and students studying formal languages and automata theory will benefit from this discussion, particularly those interested in string generation from grammars and complexity analysis.

CGandC
Homework Statement
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential (find a tight upper bound) in ##k##. Assume the size of the rules set is constant.
Relevant Equations
Big-Oh notation, context-free languages
Here's the code I've written for generating strings of length k from a given CNF grammar (Chomsky Normal Form grammar). The code is a recursive variant of the CYK algorithm and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
    mem = dict()
    return generate_language_rec(rule_dict, start_var, k, mem)


def generate_language_rec(rule_dict, var, k, mem):
    if (var, k) in mem:
        return mem[(var, k)]

    s = set()
    if k == 0:
        s.add("")
        mem[(var, k)] = s
        return s

    if k == 1:
        for x in rule_dict[var]:
            if len(x) == 1:
                s.update(x[0])
        mem[(var, k)] = s
        return s

    for var_rule in rule_dict[var]:
        if len(var_rule) == 2:
            X, Y = var_rule[0], var_rule[1]
            for j in range(1, k):
                s1 = generate_language_rec(rule_dict, X, j  , mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
                # concatenate the strings in s1 and s2 in both orders, i.e., every string
                # from s1 followed by every string from s2, and vice versa
                s.update({ str1 + str2 for str1 in s1 for str2 in s2 })
                s.update({ str2 + str1 for str1 in s1 for str2 in s2 })
    mem[(var, k)] = s
    return s

The algorithm works fine; here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)
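
As a quick sanity check (a sketch; these asserts are illustrative and not part of the original code), every returned string should have the requested length and consist only of the grammar's terminals:

Python:
# Sketch: basic sanity checks on the result of the example above.
assert all(len(s) == 5 for s in res)            # every string has the requested length
assert all(set(s) <= {"a", "b"} for s in res)   # only the terminals "a" and "b" appear
print(len(res), "distinct strings of length 5 were generated")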

Next, I was asked to analyze both the time-complexity of the algorithm and its space-complexity (space-complexity excluding the arguments of the function), under the assumption that the rules set (rule_dict) of the grammar is of constant size. I was asked to explain whether the complexities are Linear, Polynomial or Exponential (find a tight upper bound) in ##k##.

My answer was that both the time and space complexities are polynomial (I couldn't explain the space-complexity). Here's my explanation:

For Time-complexity:

For each recursive call with input ##k##, we pass over all rules whose right-hand side consists of two variables X and Y.
For variable X we check whether it can be used to produce a set of strings of each length ##1,...,k-1##.
For variable Y we check whether it can be used to produce a set of strings of each length ##1,...,k-1##.

Note that, because of the memoization, each recursive call with input ##k## gives rise to ##O(2(k-1))## recursive calls.
Hence,
For variable X, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.
For variable Y, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.

Since we assumed the size of the set of rules to be constant, the total time-complexity of the algorithm is ## O(k^2) + O(k^2) = O(k^2) ##, so the time-complexity is polynomial.
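
Written out as a single sum, the same counting estimate reads (this is just a restatement of the bound above, under the same constant-rules assumption):

$$ \sum_{i=1}^{k-1} O\big(2(i-1)\big) \;=\; O\!\left( \sum_{i=1}^{k-1} i \right) \;=\; O(k^2). $$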

For Space-complexity:

I couldn't figure it out. Maybe the space complexity is also ## O(k^2) ##? I couldn't explain it, but I think the answer lies in these lines:
Python:
                s1 = generate_language_rec(rule_dict, X, j  , mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
Questions:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!
 
CGandC said:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

CGandC said:
How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?
 
1. I've checked the running time of the code experimentally for different inputs. The code used for the measurement is:

Python:
import time

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings = generate_language(rule_dict, "S", 10)
t_1 = time.perf_counter()
print(" DELTA ", t_1 - t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

It seems the time-complexity is linear (of the form y = 2x) instead of polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the space-complexity will be linear. Explanation:
The depth of the recursion tree is ## O(k) ##.
The function 'generate_language_rec' has a space-complexity of ## O(1) ## if we disregard recursive calls and input sizes.
We store the results of all recursive calls, for a constant number of variables (since we assumed the size of the set of rules to be constant), in a memoization table.
For input ##k## and a rule with two variables X, Y, each of the two variables gives rise to ##O(2(k-1)) \in O(k)## recursive calls (the calls are made with inputs ##1,...,k-1## for X and ##1,...,k-1## for Y), and each such call stores its result in the memoization table. So, because of the memoization, each variable causes ##O(k)## calls in total (which is also what determines the depth of the recursion); hence, since the size of the set of rules is constant, the total space complexity is ##O(k) \cdot O(1) \in O(k)##.

(Maybe the explanation is a little hard to understand, but I could write it better if I spent more time on it.)
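
The final product in that argument can be written with its factors labelled (same assumptions as above, and not counting the size of the stored string sets themselves):

$$ \underbrace{O(k)}_{\text{memoized lengths per variable}} \cdot \underbrace{O(1)}_{\text{number of variables, since the rules set is constant}} = O(k). $$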
 
CGandC said:
I've checked the running time of the code experimentally for different inputs
For a number of reasons, measuring execution time is not a good way of experimentally estimating time complexity, particularly with small values of ##k##. Instead, set a counter inside the tightest loop and consider the theoretical time complexity of the slowest action in that loop.
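
For instance, here is a minimal sketch along those lines that counts recursive calls instead of wall-clock time (the wrapper counting_rec and the global call_counter are illustrative additions, not part of the original code; a counter placed inside the innermost j-loop would work similarly):

Python:
# Sketch: count recursive calls instead of timing them.
# Assumes the definitions of generate_language / generate_language_rec and rule_dict above.
call_counter = 0
_original_rec = generate_language_rec

def counting_rec(rule_dict, var, k, mem):
    global call_counter
    call_counter += 1
    return _original_rec(rule_dict, var, k, mem)

generate_language_rec = counting_rec  # recursive calls now go through the counter

for k in range(5, 11):
    call_counter = 0
    generate_language(rule_dict, "S", k)
    print(k, call_counter)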
CGandC said:
I think the Space-complexity will be linear
This is more reliable experimentally: in this case just print sum(map(len, mem)) at the end.
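
A minimal sketch of that measurement (assuming a small, hypothetical variant generate_language_with_mem that also returns the memo table, since mem is local in the original code):

Python:
# Sketch: expose the memo table so the amount of memoized data can be inspected after a run.
# generate_language_with_mem is a hypothetical helper, not part of the original code.
def generate_language_with_mem(rule_dict, start_var, k):
    mem = dict()
    result = generate_language_rec(rule_dict, start_var, k, mem)
    return result, mem

for k in range(5, 11):
    _, mem = generate_language_with_mem(rule_dict, "S", k)
    entries = len(mem)                                           # number of memoized (var, length) pairs
    stored_strings = sum(len(v) for v in mem.values())           # total strings kept in the table
    stored_chars = sum(len(s) for v in mem.values() for s in v)  # total characters kept in the table
    print(k, entries, stored_strings, stored_chars)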
 
