Understanding the binary transformation of strings and integers

Arman777 · Jul 24, 2021

For fun, I decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that XOR can be applied to each bit. This is where my problem starts. For instance, I want to encrypt this message:

Code:

I hiked 24 miles.

Now I need to turn this text into binary. There seem to be two different ways to handle '24':

a) Treat 2 and 4 as separate integers, which gives

Code:

>>> bin(2)
'0b10'
>>> bin(4)
'0b100'

or
[00000010, 00000100]

b) Treat 2 and 4 as characters, which gives

Code:

>>> bin(ord('2'))
'0b110010'
>>> bin(ord('4'))
'0b110100'

or
[00110010, 00110100]

Does this make a difference for XOR encryption (or in general)? Which is the correct approach?

Baluncore · Jul 24, 2021

Arman777 said:

Does this make a difference for XOR encryption (or in general)? Which is the correct approach?

'I hiked 24 miles.' is a string of characters, that is, a sequence of bytes. The encryption routine cannot know what format a numeric field should take, so it must encrypt the character string as it stands, regardless of any numbers it contains.
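As a minimal sketch of that advice, the whole message can be encoded to bytes and XORed with a repeating key. The helper name `xor_bytes`, the key `b"secret"`, and the choice of UTF-8 are assumptions made for illustration; none of them come from the thread.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


message = "I hiked 24 miles."
key = b"secret"  # hypothetical key, for illustration only

# Encode the entire string, digits included, as characters.
plaintext = message.encode("utf-8")
ciphertext = xor_bytes(plaintext, key)

# XOR is its own inverse, so the same call decrypts.
recovered = xor_bytes(ciphertext, key).decode("utf-8")
print(recovered)
```

Because the digits are treated like any other character, the function needs no knowledge of what the message contains.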

pasmith · Jul 24, 2021

Arman777 said:
For fun, I decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that XOR can be applied to each bit. This is where my problem starts. For instance, I want to encrypt this message:
Code:
I hiked 24 miles.
Now I need to turn this text into binary. There seem to be two different ways to handle '24':

a) Treat 2 and 4 as separate integers, which gives
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) Treat 2 and 4 as characters, which gives

Does this make a difference for XOR encryption (or in general)? Which is the correct approach?

Note that 0x04 is the ASCII "end of transmission" control character, which a UNIX terminal treats as end of input (Ctrl-D), and 0x02 is the "start of text" control character, which has no special meaning to UNIX. So a program that treats 0x04 as the end of its input could see your decrypted message as just b'I hiked \x02'. In fact all bytes in the range 0x00 to 0x1F are control characters, which may have a special meaning for whatever application is going to interpret your decrypted data, if not for the operating system itself. So '2' -> 0x02 etc. is a very bad idea.

@Baluncore's approach is the only viable approach, and ensures that your decrypted data will be byte for byte identical with the input.
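The difference between the two options can be seen by building both byte strings. This is only a sketch: `digits_as_values` is a hypothetical helper that implements option (a), written here purely to show what goes wrong.

```python
def digits_as_values(text: str) -> bytes:
    """Option (a): replace each ASCII digit by its numeric value (hypothetical)."""
    return bytes(int(ch) if ch in "0123456789" else ord(ch) for ch in text)


text = "I hiked 24 miles."

as_values = digits_as_values(text)  # option (a)
as_chars = text.encode("ascii")     # option (b)

print(as_values)  # b'I hiked \x02\x04 miles.'
print(as_chars)   # b'I hiked 24 miles.'

# Option (a) puts control characters (below 0x20) into the data.
control_bytes = [b for b in as_values if b < 0x20]
print(control_bytes)  # [2, 4]

# It is also ambiguous: a message that already contains the control
# characters 0x02 and 0x04 maps to exactly the same bytes.
other = "I hiked \x02\x04 miles."
print(digits_as_values(other) == as_values)  # True
```

With option (b), decoding the bytes gives back the original text exactly, so nothing has to be guessed after decryption.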

Arman777 · Jul 24, 2021

Then I will do that. Thanks for the answers.

Mark44 · Jul 24, 2021

What seems to be missing here is an understanding of the difference between numeric digit characters, such as '2' or '4', and the numbers 2 or 4.

The '2' and '4' characters are stored as ASCII values of 50 and 52 respectively. The numbers 2 and 4 are stored as their own values.
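A short Python check makes the distinction concrete. The helper `to_bits` is a name introduced here for illustration; it pads the binary form to 8 bits.

```python
def to_bits(value: int) -> str:
    """Format an integer in the range 0..255 as an 8-bit binary string."""
    return format(value, "08b")


# Digit characters: ord() gives the ASCII code of the character.
for ch in "24":
    print(repr(ch), ord(ch), to_bits(ord(ch)))
# '2' 50 00110010
# '4' 52 00110100

# Numbers: the value itself is what gets stored.
for n in (2, 4):
    print(n, to_bits(n))
# 2 00000010
# 4 00000100

# The two are related by the code of '0' (48).
print(ord("2") - ord("0"))  # 2
```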

Baluncore · Jul 24, 2021

In most cases of numbers embedded in text, the leading zeros can be dropped, but imagine trying to correctly restore data that had critical leading or trailing zeros in a part number.

An encryption-decryption algorithm should always reproduce the input exactly. The message should never be arbitrarily abbreviated or compressed.
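To illustrate the point about zeros, here is a small sketch using a made-up part number and measurement. Both strings are hypothetical examples, not values from the thread.

```python
part_number = "00420"  # hypothetical part number with leading zeros
measurement = "1.500"  # hypothetical reading with trailing zeros

# Interpreting the fields as numbers discards the zeros.
print(str(int(part_number)))    # 420
print(str(float(measurement)))  # 1.5

# Treating them as characters preserves every byte.
round_trip = part_number.encode("ascii").decode("ascii")
print(round_trip)  # 00420
```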
