Understanding the binary transformation of strings and integers

AI Thread Summary
The discussion centers on implementing a simple XOR encryption algorithm, focusing on the conversion of text messages into binary format for encryption. The main issue arises from how to convert the numeric characters '2' and '4' in the message "I hiked 24 miles." into binary. Two approaches are considered: treating the digits as separate integers or as string characters. The consensus is that the correct method is to treat '2' and '4' as string characters, which correspond to their ASCII values (50 and 52), rather than as numeric values. This distinction is crucial because it ensures that the encryption and decryption processes yield byte-for-byte identical results, preserving the integrity of the original message. Additionally, the discussion highlights the importance of avoiding control characters in the output, as they can have special meanings in certain systems, potentially corrupting the data. The emphasis is on maintaining the exact representation of the input during encryption and decryption to avoid data loss or misinterpretation.
Arman777
For fun, I have decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that the XOR operation can be applied bit by bit. This is where my problem starts. For instance, I want to encrypt this message:

Code:
I hiked 24 miles.

Now I need to turn this text into binary. It seems that there are two different ways to do it for '24':

a) Take 2 and 4 separately, as integers, which gives
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) Take 2 and 4 as strings, by doing something like this

Code:
>>> bin(ord('2'))
'0b110010'
>>> bin(ord('4'))
'0b110100'
or
[00110010, 00110100]

Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
 
Arman777 said:
Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
'I hiked 24 miles.' is a string of characters, or bytes. The encryption has no way of knowing what format a numeric field should use, so it must encrypt the character string as it stands, regardless of any numbers in it.
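
A minimal sketch of encrypting the character string as bytes, assuming a repeating-key XOR and UTF-8 encoding. The helper `xor_bytes` and the key value are hypothetical names chosen for illustration; nothing in the thread specifies them.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


message = "I hiked 24 miles."
key = b"secret"  # hypothetical key, for illustration only

# Encode the whole string; '2' becomes 0x32 and '4' becomes 0x34.
plaintext = message.encode("utf-8")
ciphertext = xor_bytes(plaintext, key)

# XOR with the same key again restores the original bytes.
recovered = xor_bytes(ciphertext, key).decode("utf-8")
assert recovered == message
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.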
 
Arman777 said:
a) Take 2 and 4 as separate and as integers, so this means
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]


Note that 0x04 is the ASCII "end of transmission" (EOT) character, which a UNIX terminal treats as end of input (Ctrl-D), and 0x02 is the "start of text" character, which has no special meaning to UNIX. So a program reading your decrypted message through such an interface could see only b'I hiked \x02'. In fact every byte in the range 0x00 to 0x1F, which includes 0x00 to 0x09, is a control character that may have a special meaning for whatever application is going to interpret your decrypted data, if not for the operating system itself. So '2' -> 0x02 etc. is a very bad idea.
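
To make the problem concrete, here is a small sketch comparing the two mappings for the example message. The variable names are illustrative only.

```python
message = "I hiked 24 miles."

# Option (a): replace each digit character by its numeric value.
as_numbers = bytes(int(c) if c in "0123456789" else ord(c) for c in message)

# Option (b): encode every character, digits included, as its ASCII code.
as_text = message.encode("ascii")

print(as_numbers)  # b'I hiked \x02\x04 miles.'
print(as_text)     # b'I hiked 24 miles.'

# Bytes below 0x20 are ASCII control characters.
control = [b for b in as_numbers if b < 0x20]
print(control)     # [2, 4]
```

Only option (b) keeps the data free of control characters that the input did not already contain.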

@Baluncore's approach is the only viable approach, and ensures that your decrypted data will be byte for byte identical with the input.
 
Then I will do that. Thanks for the answers.
 
What seems to be missing here is an understanding of the difference between numeric digit characters, such as '2' or '4', and the numerals 2 or 4.

The '2' and '4' characters are stored as ASCII values of 50 and 52 respectively. The numbers 2 and 4 are stored as their own values.
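
A short illustration of that difference in Python, using only built-ins:

```python
# Digit characters are stored as their ASCII codes.
digit_codes = [ord(c) for c in "24"]
print(digit_codes)  # [50, 52]

# The text "24" as bytes is therefore 0x32 0x34.
text_bytes = "24".encode("ascii")
print(text_bytes.hex())  # 3234

# The number 24 is stored as its own value, 0x18 in a single byte.
number_byte = (24).to_bytes(1, "big")
print(number_byte.hex())  # 18
```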
 
In most cases of numbers embedded in text, the leading zeros can be dropped, but imagine trying to correctly restore data that had critical leading or trailing zeros in a part number.

An encryption/decryption algorithm should always regenerate the input exactly. The message should never be arbitrarily abbreviated or compressed.
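
As a sketch of how a numeric interpretation can lose information, consider a made-up part number with leading zeros and a made-up value with a trailing zero. Both strings are hypothetical examples.

```python
original = "00420"  # hypothetical part number with leading zeros

# Round trip through an integer: the leading zeros are gone.
via_int = str(int(original))
print(via_int)  # 420

# Round trip through bytes: the text comes back unchanged.
via_bytes = original.encode("ascii").decode("ascii")
print(via_bytes)  # 00420

# A trailing zero is lost in the same way when parsed as a float.
via_float = str(float("1.50"))
print(via_float)  # 1.5
```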
 