Arithmetic representation of symbols according to certain rules

Adel Makram · Jun 28, 2022

Hi,
Suppose I am given a sequence of special symbols and I want to produce the next sequence of symbols according to certain rules that transform one symbol into one or more symbols.
They are 10 symbols in total, say; A, B, C, D, E, F, G, H, I, J
The rules are:
A transformed into B and it is written as A -> B. Here are the rules:
A -> B
B -> C
C -> D E
D -> F C
F -> G A
G -> C
E -> H I
H -> J
I -> D
J -> D

The order matter so, D E is not the same as E D.

When given an initial sequence of symbols of a particular length (say 1000), how can I produce the next n-sequence of symbols using Excel?
I thought to represent each symbol with numbers using modular arithmetics or even complex numbers but not sure how far this will lead me. Prime factorization may not help because using the rules one or more symbols can be a prime and composite number at the same time.

fresh_42 · Jun 28, 2022

Does D E mean D or E or D concatenated E?

Adel Makram · Jun 28, 2022

fresh_42 said:

Does D E mean D or E or D concatenated E?

It means C is transformed into D and followed by E in this order (not or). Similarly, F C, G A and H I.

fresh_42 · Jun 28, 2022

Adel Makram said:

It means D is followed by E absolutely. Similarly, F C, G A and H I.

I think it would be easier to write a little program instead of Excel. But be aware of the cycles you have in there.

mfb · Jun 29, 2022

I would use a programming language. What's the order of these replacements? The result will depend on that.

SSequence · Jun 29, 2022

This seems to be an example of a CFG (context free grammar), unless perhaps you wish to use your production rules in a more peculiar or different manner ... or your interpretation of rules is different somehow. I don't remember the definition of CFG off-hand but apparently it seems to be an example.

The problem of word recognition (for strings generated by CFGs) is known to be decidable. However, I think different algorithms would probably lead to somewhat different (theoretical and practical) efficiency. I studied[the basic idea only, not the efficiency analysis] one of the well-known algorithms many years ago, but I don't remember any of it now.

Some useful links:
Some examples: https://en.wikipedia.org/wiki/Context-free_grammar#Examples
Word recognition: https://en.wikipedia.org/wiki/Context-free_grammar#Parsing

pbuk · Jun 29, 2022

Adel Makram said:

When given an initial sequence of symbols of a particular length (say 1000), how can I produce the next n-sequence of symbols using Excel?

Dealing with the fact that the next "sequence" could have, say, 1234 symbols is very tricky to achieve in Excel, I recommend almost any other scripting language in which this would be trivial (Python, JavaScript, Perl...). You could do this using Visual Basic for Applications (VBA) which works with Excel but I really, really don't recommend that.

Note that the word "sequence" has a special meaning in this context, I would call the list of symbols a "sentence" or perhaps a "string".

Adel Makram said:

I thought to represent each symbol with numbers using modular arithmetics or even complex numbers but not sure how far this will lead me. Prime factorization may not help because using the rules one or more symbols can be a prime and composite number at the same time.

I cannot see how this would make it any easier, even if it were possible.

fresh_42 said:

But be aware of the cycles you have in there.

mfb said:

I would use a programming language. What's the order of these replacements? The result will depend on that.

I don't think either cycles or the order of replacements are a problem here, we only apply the rules once in transforming a sentence, so ABCDFGEHIJ -> BCDEFCGACHIJDD.

Adel Makram · Jun 29, 2022

SSequence said:

This seems to be an example of a CFG (context free grammar), unless perhaps you wish to use your production rules in a more peculiar or different manner ... or your interpretation of rules is different somehow. I don't remember the definition of CFG off-hand but apparently it seems to be an example.

The problem of word recognition (for strings generated by CFGs) is known to be decidable. However, I think different algorithms would probably lead to somewhat different (theoretical and practical) efficiency. I studied[the basic idea only, not the efficiency analysis] one of the well-known algorithms many years ago, but I don't remember any of it now.

Some useful links:
Some examples: https://en.wikipedia.org/wiki/Context-free_grammar#Examples
Word recognition: https://en.wikipedia.org/wiki/Context-free_grammar#Parsing

I need to study the statistical structure of these sequences but can not afford to do that when I only got a few symbols. That is why I am looking for a gadget to produce longer and longer sequences based on those rules. Once I am done, I will plot a Markov transitional matrix to see the likelihood of transition of one symbol to all other symbols and can then draw a conclusion about the statistical nature of these language-like symbols.

Also, one potential application is that this context-free language can be used in cryptography. Because it is easier to produce future generations from a given sequence but harder to undo the sequence to a preceding generation.

fresh_42 · Jun 29, 2022

pbuk said:

I don't think either cycles or the order of replacements are a problem here, we only apply the rules once in transforming a sentence, so ABCDFGEHIJ -> BCDEFCGACHIJDD.

If we apply the rules only once, then the entire thread is obsolete since it can be done by hand immediately. If we apply the rules more than once, then order and cycles become important.

jbriggs444 · Jun 29, 2022

pbuk said:

I don't think either cycles or the order of replacements are a problem here, we only apply the rules once in transforming a sentence, so ABCDFGEHIJ -> BCDEFCGACHIJDD.

Each rule in the grammar here chomps exactly one character from the input string and inserts a replacement substring which is (for this iteration) immune from further chomping.

We apply the rules as many times as possible for any given iteration, but each character can only be acted upon once during that iteration.

The obvious way to code this is:

Code:

Input is in an input buffer, IB
Allocate an empty output buffer, OB
foreach character c in IB {
    foreach rule r in the grammar {
        if rule r applies to character c {
            append the rule r translated value to OB
            break to next character
        }
    }
    append character c to OB
}

fresh_42 · Jun 29, 2022

jbriggs444 said:

Each rule in the grammar here chomps exactly one character from the input string and inserts a replacement substring which is (for this iteration) immune from further chomping.

We apply the rules as many times as possible for any given iteration, but each character can only be acted upon once during that iteration.

Oh! I thought we would concatenate the input string depending on the last character and iterate.

pbuk · Jun 29, 2022

fresh_42 said:

If we apply the rules only once, then the entire thread is obsolete since it can be done by hand immediately. If we apply the rules more than once, then order and cycles become important.

The rules are only applied once in each generation ## s_{k+1} = f(s_k) ##. Neither order nor cycles affect the definition of the function ## f(s) ##.

(Edit: not sure why I posted this late, it's less relevant now after #10 and #11.

pbuk · Jun 29, 2022

Adel Makram said:

I need to study the statistical structure of these sequences but can not afford to do that when I only got a few symbols. That is why I am looking for a gadget to produce longer and longer sequences based on those rules. Once I am done, I will plot a Markov transitional matrix to see the likelihood of transition of one symbol to all other symbols and can then draw a conclusion about the statistical nature of these language-like symbols.

If you want to create a transition matrix then you can do that directly from the rules. But if you want to create a "gadget" to create a sequence of longer and longer sentences then it should take no longer than it took @jbriggs to write the pseudocode in #10.

SSequence · Jun 29, 2022

@Adel Makram
Looking at OP again, it seems that my post#6 is a little incomplete. In a CFG there has to be a start symbol and there also has to be terminal symbols. I don't still fully understand the intention in OP, because it seems that start symbol hasn't been mentioned?

But in case a start symbol is assumed (say A or some other capital letter), then rephrasing it in terms of CFG might not be too difficult. For example, we can keep each of the original rules in OP. And, on top of that, we introduce/add corresponding transitions that convert each non-terminal (A to J) into a terminal (a to j). Essentially:
A--->a
B--->b
C--->c
...
...
J--->j
Of course if all symbols can be start symbols then that can be handled too. We add one more transition (with S as a starting non-terminal symbol):
S--->A | B | C | D | E | F | G | H | I | J

===================================But though, I am not fully certain what the intention in OP is.

jbriggs444 · Jun 29, 2022

Adel Makram said:

Because it is easier to produce future generations from a given sequence but harder to undo the sequence to a preceding generation.

Not just hard. Impossible. You have cipher that not only makes the ciphertext exponentially (in the number of rounds) longer than the plaintext, but which destroys information while doing so.

Adel Makram said:

I -> D
J -> D

After 1000 rounds, with an initial plaintext of "IJIJIJIJIJIJIJ", you are going to have a very long ciphertext with pretty much zero information.

Roughly speaking, that ciphertext will be ##10^{150}## characters in length and will be indistinguishable from the ciphertext for "JJJJJJJJJJJJJJJ".

[On average, your rules double the ciphertext size for half of your alphabet. So (with a quick handwave) we can say that the ciphertext size will approximately double every two generations. For 1000 generations, that's 500 doublings. Divide by ln(10)/ln(2) and you get a factor of ##10^{150}##]

If you decide to control your ciphertext size by rolling the output through a crypographically strong hash function, you might have something. However, the idea of losing some fraction of information for each round worries me. It smells like homeopathy -- after 1000 rounds, nothing may be left.

Edit 1: If the number of plausible plaintexts is low enough and your hash size is large enough then the information lost in each round may be well covered by the redundancy in the block. Rather than losing five percent of your information going from a 10 character alphabet to a 9 character alphabet in each round, the redundancy [which might be thought of as offering a possibility for error correction] means that you may be losing a much lower fraction. Possibly under the 0.1% per round that would make 1000 rounds somewhat viable. So I suspect that the "homeopathy problem" is not that bad.

Edit 2: If you had a cryptographically strong hash function, we could just use it. So you must be planning to use a fast and cryptographically weak hash function in a hope that multiple rounds and a substitution cipher between each round will jack up the strength.

Edit 3: If you have a hash size that is large enough to be useful then your "transition matrix" for cipher blocks may get too large to effectively enumerate. So perhaps you need to be looking at alternative means to estimate the collision rate for a composition of one round of your cipher with one round of your hash. Actually, that seems to be what you are aiming this entire thread at. The collision rate for a hypothetical ideal (meaning perfectly random) hash with a particular output size should be easy to estimate. The collision rate for your cipher when populated from random input of a size given by your hash seems to be the figure you want to derive.

Edit 4: I know next to nothing about cryptography. But I do know enough not to try to home brew my own ciphers or hashes. Dunning and Kruger can eat my shorts.

Adel Makram · Jun 29, 2022

pbuk said:

If you want to create a transition matrix then you can do that directly from the rules.

The rules describe the transformation from one symbol in a generation to one or more symbols in the next generation. These rules are deterministic and well known. What is not known, is the probability that a symbol in a generation follows another symbol in the same generation.
The symbols do not come in random, for example, a string of symbols comes like so; FCHIDECBFCHID. You notice that FCHID is repeated twice in this string suggesting that F is usually followed by C which is usually followed by H and so on. How about D? E follows D here in this short string but if I show you a longer version of the same string you will find that D is followed by F somewhere.

There must be some statistical structure in these strings that dictate the chance that a particular symbol is followed by other symbols.

What is interesting, is that this statistical structure is the nearly same for all generations. Using the 10 rules of transformation, the generation-n will still show the same statistical structure which could be described by Markov transitional matrix as the generation-1. I said "nearly", because this is what I assumed after calculating the Markov matrix by hand for 4 generations. That is why I am looking to produce a longer version of this string to confirm that.

Adel Makram · Jun 29, 2022

jbriggs444 said:

Not just hard. Impossible. You have cipher that not only makes the ciphertext exponentially (in the number of rounds) longer than the plaintext, but which destroys information while doing so

Not sure why impossible? Actually, I hope so as this would be a good method of cryptography, for me at least.

Just to confirm that the symbols do not come in random.
Suppose I give you generation-1 with this string; FCHIDECB, it is easy to get the generation-2 by applying the above rules and you will get; GADEJDFCHIDEC. But can you get me generation-0 from generation-1 with that ease? Let`s find out by scanning each string symbol by symbol. Generation-1 starts with F but nothing in the rule produces F on its own which allows us to scan FC together. We know that D uniquely produces FC, so D must be the first symbol of generation-0. The same logic applies for HI and we find that these are produced uniquely by E. So the first 2 symbols of generation-0 are DE. But when we scan the next symbol in generation-1; D we run into trouble because both I and J produce D from the rules; so which one? Moreover, we know that C produces DE and E is the next symbol after D. So should we scan DE and get C in generation-0 or scan D alone and get I? If I choose I, I will not find any symbol to produce E or even EC in generation-1. So the answer must be C. You see this is not an easy computation meaning it is much easier to produce generation-2 from generation-1 but not the other way around. And if I have to get generation-0 from generation-n where n is a larger number the process will be extremely difficult.

jbriggs444 · Jun 29, 2022

Adel Makram said:

Not sure why impossible? Actually, I hope so as this would be a good method of cryptography, for me at least.

Just to confirm that the symbols do not come in random.
Suppose I give you generation-1 with this string; FCHIDECB

If generation-1 is D, generation-0 can be either I or J. The reversion is not unique, so it is impossible to guarantee a correct reversion. The cipher is not a bijection. It destroys information. Maybe that fact is unimportant to your cryptosystem. Maybe it is even useful. But it is still a fact.

Adel Makram · Jun 29, 2022

jbriggs444 said:

If generation-1 is D, generation-0 can be either I or J. The reversion is not unique, so it is impossible to guarantee a correct reversion. The cipher is not a bijection. It destroys information. Maybe that fact is unimportant to your cryptosystem. Maybe it is even useful. But it is still a fact.

My method is as follows:
G0 (Generation-0) is shared publicly with Alice and Bob. Alice, applied the rules to G0 n-times to get Gn and share it publicly. Bob applied the rules to G0 m-times and get Gm and shared it. He also does so on Gn to get Gmn, likewise, Alice takes Gm from Bob and produces Gnm. Because the rule is transitive, Gnm=Gmn thus Alice and Bob come to the same final string of symbols. This happened without sharing n or m which are like private keys. If Eve is a hacker, she can not easily find Gmn as she is missing two pieces of information; n and m.
Eve can use the brute force technique to get m and n. She needs to run the calculation on G0 many times and check the result with Gn and/or Gm at each step. I think that checking the result is not easy also, but I am not familiar with this kind of complexity.

jbriggs444 · Jun 29, 2022

Adel Makram said:

My method is as follows:
G0 (Generation-0) is shared publicly with Alice and Bob. Alice, applied the rules to G0 n-times to get Gn and share it publicly. Bob applied the rules to G0 m-times and get Gm and shared it. He also does so on Gn to get Gmn, likewise, Alice takes Gm from Bob and produces Gnm. Because the rule is transitive, Gnm=Gmn thus Alice and Bob come to the same final string of symbols. This happened without sharing n or m which are like private keys. If Eve is a hacker, she can not easily find Gmn as she is missing two pieces of information; n and m.

Let me be sure that I have this straight.

G0 is shared publicly. Everyone knows it. Alice knows it. Bob knows it. Evil attacker E knows it.

Gn is shared publicly. Everyone knows it. Alice knows it. Bob knows it. Evil attacker E knows it.

n is not shared publicly. Alice knows it. No one else does. It is a secret upon which the security of the system relies.

Gm is shared publicly. Everyone knows it. Alice knows it. Bob knows it. Evil attacker E knows it.

m is not shared publicly. Bob knows it. No one else does. It is a secret upon which the security of the system relies.

Now things get fuzzy. We have this thing called "Gmn". Apparently this is Gn iterated forward another m times, thus yielding G(n+m). And we have this thing called "Gnm". Apparently this is Gm iterated forward another n times, thus yielding G(m+n).

Apparently, the idea is that evil attacker E who possesses G0, Gm and Gn will not know n or m and, thus, will not know how many times to iterate forward to produce G(m+n).

But we know that n and m must each be much less than 1000. So exhaustive search is easy. Evil attacker E can simply iterate G0 forward until finding a match for Gn (thus harvesting n) and until finding a match for Gm (thus harvesting m) and then possessing all of the keys to the puzzle.

Adel Makram · Jun 29, 2022

jbriggs444 said:

But we know that n and m must each be much less than 1000. So exhaustive search is easy. Evil attacker E can simply iterate G0 forward until finding a match for Gn (thus harvesting n) and until finding a match for Gm (thus harvesting m) and cracking the scheme.

n and m can be as large as possible not bounded by 1000.

jbriggs444 · Jun 29, 2022

Adel Makram said:

n and m can be as large as possible not bounded by 1000.

Wrong. Try calculating the size of G1000. There are not enough elementary particles in the observable universe to write it down.

Diffie Hellman evades this problem by using modular exponentation. Because it is modular, the size of the result is fixed. And because it is modular exponentiation, there are ways to use huge exponents in time proportional to the exponent's size rather than in time proportional to the exponent's value.

Adel Makram · Jun 29, 2022

jbriggs444 said:

Wrong. Try calculating the size of G1000.

Everyone can agree about the size of the string. If G0 is a string of 1-million symbols, Gn, Gm and Gnm will be definitely much larger but the system will only calculate the first million symbols and halts afterwards. So the question; what is the calculation complexity Evil will find to reach n or m?

jbriggs444 · Jun 29, 2022

Adel Makram said:

Everyone can agree about the size of the string. If G0 is a string of 1-million symbols, Gn, Gm and Gnm will be definitely much larger but the system will only calculate the first million symbols and halts afterwards. So the question; what is the calculation complexity Evil will find to reach n or m?

So now you are invoking the hash function which reduces the size of the intermediate results to keep things manageable. Fair enough. [The hash function that you've introduced is "truncate to one million symbols"].

Still, both Alice and Bob must run their iterations n and m times respectively. So it must be computationally feasible to transform a 1-million symbol string through that many generations. If Alice and Bob can do it, why can't the Evil Attacker? His only extra workload is the requirement to do a string compare after each generation. That's just a small (less than 2) constant multiplier on his required effort.

Or are you trying to say that you have a clever acceleration technique in mind that can, given G0, Gn and Gm and n, accelerate a calculation of G(mn) so that the result falls out after computational effort proportional to n and, given G0, Gn, Gm and m, accelerate similarly so that the computational effort is proportional to m.

Adel Makram · Jun 29, 2022

jbriggs444 said:

His only extra workload is the requirement to do a string compare after each generation. That's just a small (less than 2) constant multiplier on his required effort.

Can you explain why less than 2? So, is checking the result every time against Gn or Gm a polynomial complexity?

jbriggs444 · Jun 29, 2022

Adel Makram said:

Can you explain why less than 2? So, is checking the result every time against Gn or Gm a polynomial complexity?

Comparing a 1 million symbol string against another 1 million symbol string has complexity of order 1.
Rolling a 1 million symbol symbol string forward by one generation has complexity of order 1.
Doing either ##n## times or ##m## times has complexity of order ##n## or ##m## respectively.

All of these are polynomial in n+m.

The effort required to roll a 1 million symbol string forward by one generation is (in my estimation), significantly higher than the effort to compare one 1 million symbol string against another. Especially when the comparison is likely to be a failure to match which will be detectable after comparing only a small number of symbols.

With naive coding, it is perhaps 3.7 million character comparisons on average (checking each character against the grammar rules and aborting early on the first match for an average of 5.5 comparisons per input character and needing about 666,667 input characters to fill the output buffer) to roll forward by one generation.

The string comparison will likely fail within (generously) 10 symbols each (for Gm and Gn) for a total of 20 symbol comparisons needed. So the relative effort is about 180,000 to one in favor of comparisons being easy compared to generations.

Ohhh, wait a minute. If we double the string length every two generations on average and start with one character then after 40 generations the resulting string length will exceed 1 million. We truncate at 1 million. Every symbol in G0 other than the first has ceased to matter.

What does this mean?

It means that the cipher must be cyclic. Once you are in 40 generations, after another 40 generations, you are guaranteed to be back where you were 40 generations ago. It might not be exactly 40 generations. But for each possible first character in G0, the cycle length will be fixed. The first character in G0 is known to the attacker. So the cycle length is known to the attacker. So the total key space has less than 80 possibilities with every one of them known to the attacker.

This is a crippling lack of diffusion.

Adel Makram · Jun 30, 2022

jbriggs444 said:

Ohhh, wait a minute. If we double the string length every two generations on average and start with one character then after 40 generations the resulting string length will exceed 1 million. We truncate at 1 million. Every symbol in G0 other than the first has ceased to matter.

What does this mean?

It means that the cipher must be cyclic.

I do not see why the cipher should be cyclic? What if we start with a million symbols already and let them reproduce and then keep truncating the result to the first million at each generation? How to assert that the first symbol in G41, on average, will be exactly the first symbol of G0?

jbriggs444 · Jun 30, 2022

Adel Makram said:

I do not see why the cipher should be cyclic? What if we start with a million symbols already and let them reproduce and then keep truncating the result to the first million at each generation? How to assert that the first symbol in G41, on average, will be exactly the first symbol of G0?

I've implemented the cipher. It is cyclic. I used a 10 character string size as a test case. The eventual cycle length is 5. I equipped the code with a conservative (larger than needed) cycle limit.

Code:

C:\Users\John\Documents>cipher.pl
String length is 10
Generation limit is 25.6788735872676
Generation      Cipher Content
0       A
1       B
2       C
3       DE
4       FCHI
5       GADEJD
6       CBFCHIDFC
7       DECGADEJDF << Cycle starts
8       FCHIDECBFC
9       GADEJDFCHI
10      CBFCHIDFCG
11      DECGADEJDF << Cycle repeats.
12      FCHIDECBFC
13      GADEJDFCHI
14      CBFCHIDFCG
15      DECGADEJDF << Cycle repeats.
16      FCHIDECBFC
17      GADEJDFCHI
18      CBFCHIDFCG
19      DECGADEJDF
20      FCHIDECBFC
21      GADEJDFCHI
22      CBFCHIDFCG
23      DECGADEJDF
24      FCHIDECBFC
25      GADEJDFCHI
26      CBFCHIDFCG

Edit: Note that I fixed the estimation of the generation limit that had caused the original version to go through too many generations. This means that the output here properly matches the code in the next post. Posting the code was a major hassle due to a bug in the forum software.

jbriggs444 · Jun 30, 2022

Cipher code:

These forums do not like to see "substr" in a CODE section. So I had to mess with two lines in the following.

Code:

use strict;

# We want to implement a sort of cipher in the following manner.
#
# There is an alphabet of ten symbols, "A" through "J".
# Each "generation" of the cipher is a string of these symbols with a fixed maximum length.
# Each generation is generated from the previous by applying a replacement rule to each symbol in the string.
#
# The rules are:
#
# A => B
# B => C
# C => DE
# D => FC
# E => HI
# F => GA
# G => C
# H => J
# I => D
# J => D
#
my $start_string = "A";
my $string_length = 10;
my @generations = ();

# It is my contention that this cipher is cyclic and will repeat with a cycle based on the log of the string length
#
# The rationale is that each generation expands the cipher text length by about 50% When this is truncated,
# the result is that 1/3 of content of the previous generation's content is relevant. Iterate this back
# enough generations and you find that only the first character of the first generation is significant.
#
# We need 10 (or fewer) generations to assure that the first character has cycled through all ten possibilities.
# We need about the base 1.5 log of the string length to get the string out to the full string length.
# We need that much again so that the now filled out string is dependent only on the first character at the
# generation where it filled out.
# And we need another 10 (or fewer) generations so that a first character has cycled through all ten possibilities.
# This is likely to be an over-estimate.
#
my $generation_limit = 20 + log($string_length) / log(1.5);

print STDERR "String length is $string_length\n";
print STDERR "Generation limit is $generation_limit\n";

my $current_generation = 0;
push ( @generations, $start_string);

print "Generation\tCipher Content\n";
print $current_generation, "\t", $generations[$current_generation], "\n";

# Main loop -- over the set of generations up to the generation limit
while ( $current_generation < $generation_limit ) {

    my $IB = $generations[$current_generation];
    my $OB = "";

    while ( $IB ne "" ) {
        my $c = s u b s t r($IB,0,1); # Physics Forums does not like "s u b s t r" in a code section
        my $oc;
        if ( $c eq "A" ) {
            $oc = "B";
        } elsif ( $c eq "B" ) {
            $oc = "C";
        } elsif ( $c eq "C" ) {
            $oc = "DE";
        } elsif ( $c eq "D" ) {
            $oc = "FC";
        } elsif ( $c eq "E" ) {
            $oc = "HI";
        } elsif ( $c eq "F" ) {
            $oc = "GA";
        } elsif ( $c eq "G" ) {
            $oc = "C";
        } elsif ( $c eq "H" ) {
            $oc = "J";
        } elsif ( $c eq "I" ) {
            $oc = "D";
        } elsif ( $c eq "J" ) {
            $oc = "D";
        } else {
            print STDERR "Unexpected character in generation $current_generation: $oc\n";
            exit;
        };
   
        $OB = $OB . $oc;
        $IB =~ s/^.//;
    };

    $OB = s u b s t r($OB,0,$string_length);
       
    $current_generation++;
    $generations[$current_generation] = $OB;
       
    print $current_generation, "\t", $generations[$current_generation], "\n";

};

Adel Makram · Jun 30, 2022

jbriggs444 said:
I've implemented the cipher. It is cyclic. I used a 10 character string size as a test case. The eventual cycle length is 5. I equipped the code with a conservative (larger than needed) cycle limit.
Code:
C:\Users\John\Documents>cipher.pl
String length is 10
Generation limit is 25.6788735872676
Generation      Cipher Content
0       A
1       B
2       C
3       DE
4       FCHI
5       GADEJD
6       CBFCHIDFC
7       DECGADEJDF << Cycle starts
8       FCHIDECBFC
9       GADEJDFCHI
10      CBFCHIDFCG
11      DECGADEJDF << Cycle repeats.
12      FCHIDECBFC
13      GADEJDFCHI
14      CBFCHIDFCG
15      DECGADEJDF << Cycle repeats.
16      FCHIDECBFC
17      GADEJDFCHI
18      CBFCHIDFCG
19      DECGADEJDF
20      FCHIDECBFC
21      GADEJDFCHI
22      CBFCHIDFCG
23      DECGADEJDF
24      FCHIDECBFC
25      GADEJDFCHI
26      CBFCHIDFCG
Edit: Note that I fixed the estimation of the generation limit that had caused the original version to go through too many generations. This means that the output here properly matches the code in the next post. Posting the code was a major hassle due to a bug in the forum software.

This is amazing thank you very much.
These rules are the same local rules of Penrose tiling. No wondering the number 5 pops up.

jbriggs444 · Jun 30, 2022

Adel Makram said:

View attachment 303541

This is amazing thank you very much.
These rules are the same local rules of Penrose tiling. No wondering the number 5 pops up.

You are correct that the cycle size of five seems to be consistent. I had not expected that.

C:\Users\John\Documents>cipher.pl
String length is 20
Generation limit is 17.388384878619 << Tuned the estimate further downward.
Generation Cipher Content
0 A
1 B
2 C
3 DE
4 FCHI
5 GADEJD
6 CBFCHIDFC
7 DECGADEJDFCGADE
8 FCHIDECBFCHIDFCGADEC
9 GADEJDFCHIDECGADEJDF
10 CBFCHIDFCGADEJDFCHID
11 DECGADEJDFCGADECBFCH
12 FCHIDECBFCHIDFCGADEC
13 GADEJDFCHIDECGADEJDF
14 CBFCHIDFCGADEJDFCHID
15 DECGADEJDFCGADECBFCH
16 FCHIDECBFCHIDFCGADEC
17 GADEJDFCHIDECGADEJDF
18 CBFCHIDFCGADEJDFCHID

jbriggs444 · Jun 30, 2022

*BLUSH*. This is actually a cycle period of 4.

Danged fencepost errors.

Adel Makram · Jun 30, 2022

Now you found the period of the cycle is 4. Is it possible to run your program to find the minimum length of symbols in the same generation that repeats itself? I know that Penrose tiling is not periodic so there may be no period. But I just want to study the statistical transition from one symbol to another in the same generation. So, having that “wavelength” will allow me to draw exact figures of the Markov matrix.

jbriggs444 · Jun 30, 2022

Adel Makram said:

Now you found the period of the cycle is 4. Is it possible to run your program to find the minimum length of symbols in the same generation that repeats itself? I know that Penrose tiling is not periodic so there may be no period. But I just want to study the statistical transition from one symbol to another in the same generation. So, having that “wavelength” will allow me to draw exact figures of the Markov matrix.

Let me try to understand the request.

Instead of going vertically down the rows (generation after generation) looking for repeated rows, you want to sweep horizontally across a large string size looking for periodicity in the rows?

So basically, I need to iterate forward enough generations to reach the stable cycle, pick a row and then look for a cycle length in that row.

As is my usual habit, I will proceed a step at a time, working incrementally toward the goal. We can go for 1,000,000 length strings and iterate enough times to get to the eventual cycle. That much is a pretty easy tweak to the code.

If I can trust the resulting code, we get a one million character string at generation 31. I looked for periodicity in generation 32 and found none. That is, no leading sustring of length up to 500,000 was repeated after a period of between 1 and 500000.

The million character string in generation 32 was strongly patterned, but not periodic.

Adel Makram · Jun 30, 2022

jbriggs444 said:

Instead of going vertically down the rows (generation after generation) looking for repeated rows, you want to sweep horizontally across a large string size looking for periodicity in the rows?

Yes, this is correct.

Arithmetic representation of symbols according to certain rules

Attachments

1. What is arithmetic representation of symbols according to certain rules?

2. Why is arithmetic representation of symbols important?

3. What are some common arithmetic symbols and their meanings?

4. How do we follow the order of operations in arithmetic representation of symbols?

5. Can arithmetic representation of symbols be used in real-life situations?

Similar threads

Hot Threads

Recent Insights