Wondering how secure this encoding scheme is

ktoz · Jul 4, 2017

Hi

While working on a Javascript project, I found that I needed to use base64-like strings with alternate alphabets. I wrote a custom function to do what I needed and wrote up a little demo app to test it out. (see here: http://nowser.net/MapEncode/index.html)

An unexpected side effect is that you can choose literally any alphabet, phrase or character string that pops into your head as the encoding set, which seems to me to be a pretty secure way to encode text. You can choose a phrase, encode the text and transmit the result without the original phrase and although I'm nowhere even close to an expert, I don't see how someone could decode it without the original phrase.

Basically, I'm just curious what people with actual security experience have to say about it. Security isn't the main goal, just an interesting side effect.

thanks for any feedback

anorlunda · Jul 4, 2017

I hate to rain on your parade, but:

Security experts almost all say that amateur encryption schemes are almost always much less secure than their inventors believe.

The ways to evaluate it involve mathematical tests, not opinions from strangers on the Internet.By the way, if the phrase you choose is very long (like an entire book), it becomes a one-time-pad. In WWII that was thought to be unbreakable, but it too has problems. See the problems section in this article. https://en.wikipedia.org/wiki/One-time_pad

DrZoidberg · Jul 4, 2017

What you have their is effectively a caesar cipher. Because you encode one character at a time and the same character is encoded as the same output string.

ktoz · Jul 4, 2017

DrZoidberg said:

What you have their is effectively a caesar cipher. Because you encode one character at a time and the same character is encoded as the same output string.

I'm not sure it is a Caesar cipher, it seems more like a division problem. Here's the meat of the encoder:

JavaScript:

for (var i = 0; i < inRun.length; i++)
   {
       code   = inRun.charCodeAt(i) + carry;
               
       if (code > maxRem)
       {
           while (code > maxRem)
           {
               rem       = code % inBase;
               code   = (code - rem) / inBase;
               result += inMap[rem];
           }
       
           carry   = code << 8;
       }
       else
           carry   = code << 8;
   }

The reason I thought it might be somewhat secure is that much like if you were handed the result of a division problem, but not the numerator, or denominator, it would be very difficult to deduce them just from the answer. Or more concretely, determine the values of a, b given that a/b = 12345. There are an infinite number of As and Bs which satisfy that equation.

There aren't an infinite number of solutions in this instance, but the number of possible alphabets grows factorially. With just the characters A-M, there are 13! possible arrangements. If you could test 1000 alphabets per second, it would take 72 days to brute force them all. Using A-Z and processing 1 million alphabets per second, it would take about 13 trillion years to test them all.

You could get lucky and choose the right alphabet on the first try, but that's not really a system.

Anyway, security is not the reason I wrote the encoder, I just thought it might be an interesting artifact.

jedishrfu · Jul 4, 2017

One simple way to test your cipher is to take some clear text get some stats from it ie # of times a letter is used.

alphabet
a=2
l=1
p=1
h=1
b=1
e=1
t=1

Apply your algorithm and do the same to the encoded text.

rsturqwe
r=2
s=1
t=1
u=1
q=1
w=1
e=1

The stats should match up effectively giving you the translation from one letter to another.

If that's the case then you effectively have a substitution cipher. A caesar is a subset of a substitution cipher using simple letter shifts.

https://en.wikipedia.org/wiki/Caesar_cipher

https://en.wikipedia.org/wiki/Substitution_cipher

FactChecker · Jul 4, 2017

It's not a simple mapping of characters. Repeated a's do not map to a repeating character.
Using a charactor map of 'abcdefghijklmnopqrstuvwxyz', 'aaaaaaaaa' results in 'wpjdgwcfjqybvpjb'.
Using a charactor map of 'qwerty', 'aaaaaaaaa' results in 'etqrtwrqttwtwttrrqttwtwttrrqte'.

It's not clear to me what the algorithm does, what the 'character map' means, and how much it differs from well established encoding algorithms.

jack action · Jul 4, 2017

It is really not secure.

First, you must understand that brute force attacks do not use all possibilities, they begin with the most probable possibilities. Great expertise is available to what patterns humans are naturally attracted to. For example, if you tell someone that his password must contain at least one non-alphanumeric character, chances are that the password will contain one exclamation point (!) at the end of the password. No matter how smart you think you are in making a difficult pattern, chances are another human being thought of it as well (we mostly all think alike, especially when in the same culture).

And since computers are fast, then the only power of encryption lies into making the decryption process as slow as possible. It is usually done by re-encryptying the encrypted message many times. You don't do that.

Worst, you give out almost all characters of the password in the encoded string! (With long messages, it's usually all characters except the last one.) And since the hacker knows each character is used only once (with your normalize map), then that can reduce significantly the numbers of passwords to try.

An example would be with the password "banana". This password becomes "ban" once normalized. Worst of all, you will find that for any message, the last character is irrelevant. So if I encode the string "This is my password", I get the encoded string "bbababbbbabaababbabaabaabbaaabbbbbbabbabbabaabaabbaaabbbbbbabbabaabaababbaaaabbbbbbabbbbbbaaababbbbaabaabbaaabaabbaaabaaabaaabaaaabaabbabbaaabbbabbaaba". As a hacker (which I'm not), I know there are 3 characters involved: "a", "b" and another one which can be anything and is necessarily in the last position. So I try "abc", then "bac", Boum! String decoded.

Even worst, solving for one password, you can solved for many passwords with your coding scheme. That means that they are less possibilities to try. For example, if I use the password 123456 and encode the letter "a", it will give the encoded message "354". Although I don't have all the character of the code, it turns out that any code of the form "AB345C" where A, B and C are any character different from 3, 4, 5 and the other two characters will give the encoded string "354". So someone thinking he has a very clever password like "f~345D" would be decoded on the first try with the most popular password "123456". Even longer passwords with repeated character like "f~345Df5~D3" will give the same encoded string for the letter "a".

By studying it some more, I'm sure one can find that there are few sets of patterns to try first to be able to solve your coding scheme.

ktoz · Jul 4, 2017

jack action said:

It is really not secure.

Thanks for the comprehensive reply. I guess its a good thing this wasn't intended for security!

Several years ago, I wrote a base64 encoder/decoder, just to see how it was done. This time around, I needed a base26 and base52 encoder, so I thought why not just write one set of functions to handle any baseN encoding that might prove useful in the future. MapEncode is the result..

The reason it drops the last character was it needs a way to distinguish between runs of one byte and two byte characters, so it claims the last char as a toggle flag to tell it when to change from two byte decoding to one byte decoding.

Anyway, thanks again.

ktoz · Jul 4, 2017

FactChecker said:

It's not clear to me what the algorithm does, what the 'character map' means, and how much it differs from well established encoding algorithms.

The map is just the set of characters it uses to to represent values in the chosen base. Base 64 uses the characters 0-9, A-Z, a-z and +/ with '=' used as padding at the end of strings to make the coded text divisible by 4.

Because it uses a different algorithm, MapEncode doesn't require the pad characters at the end. It counts the number of characters in the normalized map and uses that as the "base" in baseN string encoding. So the map "0123456789" creates base10 encodings. Since "ABCDEFGHIJ" contains 10 characters, its also base10 it just uses different characters in place of the numbers. If you wanted to use base17 for some reason, you could just choose any 17 characters for your map. For base41, choose 41 characters for your map, etc...

stoomart · Jul 5, 2017

"Secure" is relative to your data and the adversaries who want it. For example: storing the wifi password in a text file on your desktop using obfuscation/encoding is fine to protect it from your kids.

newjerseyrunner · Jul 6, 2017

Your key has to change with each message or else, with enough intercepted messages, you can figure it out. This requires secure two way communication. You need at least two keys to do that, here is how it works at the most basic level.

In this analogy, you'll have user A and B, key A and B, and lock A and B. You want to somehow get client B a copy of key A without key A ever being exposed to the outside world. So user A locks key A in a box with lock A, then sends it to user B. User B then locks the package again with lock B and sends it back to user A. User A then unlocks lock A with his own key, then sends it back. User B then unlocks lock B and retrieves key A. Key A was never unsecured.

Modern behavior is way more complicated than that (it actually uses 4 keys,) but the idea is the same.

FactChecker · Jul 8, 2017

newjerseyrunner said:

Your key has to change with each message or else, with enough intercepted messages, you can figure it out. This requires secure two way communication. You need at least two keys to do that, here is how it works at the most basic level.

In this analogy, you'll have user A and B, key A and B, and lock A and B. You want to somehow get client B a copy of key A without key A ever being exposed to the outside world. So user A locks key A in a box with lock A, then sends it to user B. User B then locks the package again with lock B and sends it back to user A. User A then unlocks lock A with his own key, then sends it back. User B then unlocks lock B and retrieves key A. Key A was never unsecured.

Modern behavior is way more complicated than that (it actually uses 4 keys,) but the idea is the same.

This is a higher level issue of how to make a system where the keys are transmitted and used securely with others. It uses a combination of symmetric and asymmetric keys.
(see https://blog.vrypan.net/2013/08/28/public-key-cryptography-for-non-geeks/ and https://en.wikipedia.org/wiki/Public-key_cryptography )
Although it is important, it is not really the same issue as the OP was asking about.

Wondering how secure this encoding scheme is

Discussion

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

PHP My website presents the visitor with the choice of opting out of using cookies....

Python Applying Accelerated Raymarching to Reduce Rendering Time

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Wondering how secure this encoding scheme is

Discussion

Similar threads

What is the Encoding Scheme for this Code?