Complexity for generating strings of length k from CNF grammar

SUMMARY

The discussion centers on analyzing the time and space complexities of a recursive, memoized algorithm for generating strings of length k from a Chomsky Normal Form (CNF) grammar. The algorithm, implemented in Python, is argued to have a time complexity of O(k²), based on counting the recursive calls made for each variable in the grammar. A follow-up timing experiment led the poster to suspect the growth might instead be linear, O(k), although measuring execution time is noted in the thread to be an unreliable way to estimate complexity. The space complexity is argued to be O(k), due to the depth of the recursion tree and the memoization of results for a constant number of grammar rules.

PREREQUISITES
  • Understanding of Chomsky Normal Form (CNF) grammar
  • Familiarity with recursive algorithms and memoization techniques
  • Basic knowledge of time and space complexity analysis
  • Proficiency in Python programming, particularly with functions and dictionaries
NEXT STEPS
  • Study the CYK algorithm for parsing CNF grammars
  • Learn about memoization techniques in depth
  • Explore time complexity analysis methods, including theoretical and empirical approaches
  • Investigate space complexity considerations in recursive algorithms
USEFUL FOR

Computer scientists, algorithm developers, and students studying formal languages and automata theory will benefit from this discussion, particularly those interested in string generation from grammars and complexity analysis.

CGandC
Homework Statement
For the algorithm below, explain whether the time and space complexities are Linear, Polynomial or Exponential (find a tight upper bound) in ##k##. Assume the size of the rules set is constant.
Relevant Equations
Big-Oh notation, context-free languages
Here's the code I've written for generating strings of length k from a given CNF grammar (Chomsky Normal Form grammar). The code is a recursive variant of the CYK algorithm and uses memoization.

Python:
def generate_language(rule_dict, start_var, k):
    mem = dict()
    return generate_language_rec(rule_dict, start_var, k, mem)


def generate_language_rec(rule_dict, var, k, mem):
    if (var, k) in mem:
        return mem[(var, k)]

    s = set()
    if k == 0:
        s.add("")
        mem[(var, k)] = s
        return s

    if k == 1:
        for x in rule_dict[var]:
            if len(x) == 1:
                s.update(x[0])
        mem[(var, k)] = s
        return s

    for var_rule in rule_dict[var]:
        if len(var_rule) == 2:
            X, Y = var_rule[0], var_rule[1]
            for j in range(1, k):
                s1 = generate_language_rec(rule_dict, X, j  , mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
                # concatenate the strings in s1 and s2 in both orders, i.e., every string
                # from s1 followed by every string from s2, and vice versa
                s.update({ str1 + str2 for str1 in s1 for str2 in s2 })
                s.update({ str2 + str1 for str1 in s1 for str2 in s2 })
    mem[(var, k)] = s
    return s

The algorithm works fine; here's an input example:
Python:
rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
res = generate_language(rule_dict, "S", 5)
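
As a quick sanity check (a sketch; these asserts are illustrative and not part of the original code), every returned string should have the requested length and consist only of the grammar's terminals:

Python:
# Sketch: basic sanity checks on the result of the example above.
assert all(len(s) == 5 for s in res)            # every string has the requested length
assert all(set(s) <= {"a", "b"} for s in res)   # only the terminals "a" and "b" appear
print(len(res), "distinct strings of length 5 were generated")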

Next, I was asked to analyze both the time-complexity of the algorithm and its space-complexity (space-complexity excluding the arguments of the function), under the assumption that the rules set (rule_dict) of the grammar is of constant size. I was asked to explain whether the complexities are Linear, Polynomial or Exponential (find a tight upper bound) in ##k##.

My answer was that both the time and space complexities are polynomial (I couldn't explain the space-complexity). Here's my explanation:

For Time-complexity:

For each recursive call with input ##k##, we pass over all rules whose right-hand side consists of two variables X and Y.
For variable X we check whether it can be used to produce a set of strings of each length ##1,...,k-1##.
For variable Y we check whether it can be used to produce a set of strings of each length ##1,...,k-1##.

Note that, because of the memoization, each recursive call with input ##k## gives rise to ##O(2(k-1))## recursive calls.
Hence,
For variable X, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.
For variable Y, we'll have ## \sum_{i=1}^{k-1} O(2(i-1)) = O(k^2) ## recursive calls in total.

Since we assumed the size of the set of rules to be constant, the total time-complexity of the algorithm is ## O(k^2) + O(k^2) = O(k^2) ##, so the time-complexity is polynomial.
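
Written out as a single sum, the same counting estimate reads (this is just a restatement of the bound above, under the same constant-rules assumption):

$$ \sum_{i=1}^{k-1} O\big(2(i-1)\big) \;=\; O\!\left( \sum_{i=1}^{k-1} i \right) \;=\; O(k^2). $$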

For Space-complexity:

I couldn't figure it out. Maybe the space complexity is also ## O(k^2) ##? I couldn't explain it, but I think the answer lies in these lines:
Python:
                s1 = generate_language_rec(rule_dict, X, j  , mem)
                s2 = generate_language_rec(rule_dict, Y, k-j, mem)
Questions:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
How can I proceed to analyze the space-complexity? I feel lost.

Thanks in advance for any help!
 
CGandC said:
What do you think about the time-complexity of the above algorithm? Am I correct in my analysis that it is polynomial?
It looks credible, but I haven't checked that the analysis matches the code. Why don't you check it experimentally?

CGandC said:
How can I proceed to analyze the space-complexity? I feel lost.
Add up the size of all the memoized items?
 
1. I've checked the running time of the code experimentally for different inputs. The code used for the measurement is:

Python:
import time

rule_dict = {"S": {"AB", "BC"}, "A": {"BA", "a"}, "B": {"CC", "b"}, "C": {"AB", "a"}}
t_0 = time.perf_counter()
strings = generate_language(rule_dict, "S", 10)
t_1 = time.perf_counter()
print(" DELTA ", t_1 - t_0)
For k=5: DELTA 0.0005285000000001538
For k=6: DELTA 0.001191899999999968
For k=7: DELTA 0.0022890999999999884
For k=8: DELTA 0.004781599999999997
For k=9: DELTA 0.011498800000000031
For k=10: DELTA 0.022812100000000002

It seems the time-complexity is linear (of the form y = 2x) instead of polynomial. Maybe I was wrong in my analysis that the time-complexity is polynomial?

2. I think the space-complexity will be linear. Explanation:
The depth of the recursion tree is ## O(k) ##.
The function 'generate_language_rec' has a space-complexity of ## O(1) ## if we disregard recursive calls and input sizes.
We store the results of all recursive calls, for a constant number of variables (since we assumed the size of the set of rules to be constant), in a memoization table.
For input ##k## and a rule with two variables X, Y, each of the two variables gives rise to ##O(2(k-1)) \in O(k)## recursive calls (the calls are made with inputs ##1,...,k-1## for X and ##1,...,k-1## for Y), and each such call stores its result in the memoization table. So, because of the memoization, each variable causes ##O(k)## calls in total (which is also what determines the depth of the recursion); hence, since the size of the set of rules is constant, the total space complexity is ##O(k) \cdot O(1) \in O(k)##.

(Maybe the explanation is a little hard to understand, but I could write it better if I spent more time on it.)
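
The final product in that argument can be written with its factors labelled (same assumptions as above, and not counting the size of the stored string sets themselves):

$$ \underbrace{O(k)}_{\text{memoized lengths per variable}} \cdot \underbrace{O(1)}_{\text{number of variables, since the rules set is constant}} = O(k). $$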
 
CGandC said:
I've checked the running time of the code experimentally for different inputs
For a number of reasons, measuring execution time is not a good way of experimentally estimating time complexity, particularly with small values of ##k##. Instead, set a counter inside the tightest loop and consider the theoretical time complexity of the slowest action in that loop.
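
For instance, here is a minimal sketch along those lines that counts recursive calls instead of wall-clock time (the wrapper counting_rec and the global call_counter are illustrative additions, not part of the original code; a counter placed inside the innermost j-loop would work similarly):

Python:
# Sketch: count recursive calls instead of timing them.
# Assumes the definitions of generate_language / generate_language_rec and rule_dict above.
call_counter = 0
_original_rec = generate_language_rec

def counting_rec(rule_dict, var, k, mem):
    global call_counter
    call_counter += 1
    return _original_rec(rule_dict, var, k, mem)

generate_language_rec = counting_rec  # recursive calls now go through the counter

for k in range(5, 11):
    call_counter = 0
    generate_language(rule_dict, "S", k)
    print(k, call_counter)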
CGandC said:
I think the Space-complexity will be linear
This is more reliable experimentally: in this case just print sum(map(len, mem)) at the end.
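
A minimal sketch of that measurement (assuming a small, hypothetical variant generate_language_with_mem that also returns the memo table, since mem is local in the original code):

Python:
# Sketch: expose the memo table so the amount of memoized data can be inspected after a run.
# generate_language_with_mem is a hypothetical helper, not part of the original code.
def generate_language_with_mem(rule_dict, start_var, k):
    mem = dict()
    result = generate_language_rec(rule_dict, start_var, k, mem)
    return result, mem

for k in range(5, 11):
    _, mem = generate_language_with_mem(rule_dict, "S", k)
    entries = len(mem)                                           # number of memoized (var, length) pairs
    stored_strings = sum(len(v) for v in mem.values())           # total strings kept in the table
    stored_chars = sum(len(s) for v in mem.values() for s in v)  # total characters kept in the table
    print(k, entries, stored_strings, stored_chars)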
 
