Understanding the binary transformation of strings and integers


Discussion Overview

The discussion revolves around the binary transformation of strings and integers in the context of implementing an XOR encryption algorithm. Participants explore the implications of converting numeric characters versus numeric values into binary for encryption purposes.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant proposes two methods for converting the numeric characters '2' and '4' into binary: treating them as separate integers or as string representations using their ASCII values.
  • Another participant argues that encryption should focus on the character string itself, independent of how numeric fields are formatted.
  • A later reply emphasizes the importance of understanding the difference between numeric digit characters and their numeric values, noting that characters are stored as ASCII values.
  • Concerns are raised about the potential issues with control characters in the binary representation, suggesting that certain conversions could lead to unintended consequences in file interpretation.
  • One participant asserts that an encryption-decryption algorithm must regenerate the input exactly, without arbitrary changes to the data.

Areas of Agreement / Disagreement

Participants express differing views on the correct approach to converting numeric characters for encryption, with no consensus reached on a single method. Some emphasize the importance of character representation, while others focus on numeric values.

Contextual Notes

Participants highlight potential limitations in understanding the implications of using control characters in binary data and the necessity of preserving data integrity during encryption and decryption processes.

Arman777
For fun, I have decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that the XOR operation can be applied to each bit. This is where my problem starts. For instance, I want to encrypt this message:

Code:
I hiked 24 miles.

Now I need to turn this text into binary. It seems that there are two different ways to do it for '24':

a) Treat 2 and 4 as separate integers, which gives
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) Treat 2 and 4 as characters, by doing something like this

Code:
>>> bin(ord('2'))
'0b110010'
>>> bin(ord('4'))
'0b110100'
or
[00110010, 00110100]

Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
 
Arman777 said:
Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
'I hiked 24 miles.' is a string of characters or bytes. The encryption has no way of knowing what format a numeric field should use, so it must encrypt the character string as it stands, regardless of any numbers it contains.
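A minimal sketch of what encrypting the character string could look like in Python. The helper name `xor_bytes` and the key `b"secret"` are made up for illustration, and a repeating-key XOR like this is a toy, not a secure cipher.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


message = "I hiked 24 miles."
key = b"secret"  # hypothetical key, for illustration only

# Encode the whole string; '2' and '4' become the bytes 0x32 and 0x34.
plaintext = message.encode("utf-8")
ciphertext = xor_bytes(plaintext, key)

# XOR with the same key is its own inverse, so this restores the input.
recovered = xor_bytes(ciphertext, key).decode("utf-8")
print(recovered == message)
```

Because the digits are handled like any other character, the function never needs to know that the message contains a number.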
 
Arman777 said:
For fun, I have decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that the XOR operation can be applied to each bit. This is where my problem starts. For instance, I want to encrypt this message:

Code:
I hiked 24 miles.

Now I need to turn this text into binary. It seems that there are two different ways to do it for '24':

a) Treat 2 and 4 as separate integers, which gives
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) Treat 2 and 4 as characters, by doing something like this

Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?

Note that 0x04 is the "end of transmission" control character, which a UNIX terminal treats as the end-of-input signal, and 0x02 is the "start of text" control character, which has no special meaning to UNIX. So a program reading your decrypted message from a terminal could stop after b'I hiked \x02'. In fact every byte in the range 0x00 to 0x09, and indeed every byte below 0x20, is a control character that may have a special meaning for whatever application is going to interpret your decrypted data, if not for the operating system itself. So '2' -> 0x02 etc. is a very bad idea.

@Baluncore's approach is the only viable approach, and ensures that your decrypted data will be byte for byte identical with the input.
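To illustrate the control-character concern, here is a rough sketch of option (a). The function `digits_as_values` is hypothetical and assumes plain ASCII input; it exists only to show what that mapping would produce.

```python
def digits_as_values(text: str) -> bytes:
    """Hypothetical option (a): digits become their numeric value,
    every other character becomes its code point (ASCII input assumed)."""
    return bytes(int(ch) if ch in "0123456789" else ord(ch) for ch in text)


encoded = digits_as_values("I hiked 24 miles.")
print(encoded)  # the digits show up as the control bytes \x02 and \x04

# The mapping is also ambiguous: the character '2' and a genuine 0x02
# byte end up as the same value, so the original cannot be told apart.
collision = digits_as_values("2") == digits_as_values("\x02")
print(collision)
```

Since two different inputs give the same bytes, no decryption step could reliably restore the original text.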
 
Then I will do that. Thanks for the answers.
 
What seems to be missing here is an understanding of the difference between numeric digit characters, such as '2' or '4', and the numerals 2 or 4.

The '2' and '4' characters are stored as ASCII values of 50 and 52 respectively. The numbers 2 and 4 are stored as their own values.
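A short Python check of that distinction, using only built-ins. The variable names are just for illustration.

```python
digit_char = "2"

code_point = ord(digit_char)  # ASCII code of the character '2'
value = int(digit_char)       # the number the character stands for

char_bits = format(code_point, "08b")
value_bits = format(value, "08b")

print(code_point, char_bits)  # 50 00110010
print(value, value_bits)      # 2 00000010
```

The two bit patterns match options (b) and (a) from the original question respectively.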
 
In most cases of numbers embedded in text, the leading zeros can be dropped, but imagine trying to correctly restore data that had critical leading or trailing zeros in a part number.

An encryption-decryption algorithm should always regenerate the input exactly. The message should never be arbitrarily abbreviated or compressed.
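As a small illustration of the leading-zero problem, the sketch below compares a round trip through a number with a round trip through bytes. The part number `"00420"`, the single-byte key `0x5A`, and the helper `xor_with_key` are all invented for the example.

```python
def xor_with_key(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (toy cipher, illustration only)."""
    return bytes(b ^ key for b in data)


part_number = "00420"  # hypothetical part number with leading zeros

# Round trip through a numeric value: the leading zeros are lost.
via_number = str(int(part_number))

# Round trip through the raw bytes: the text comes back unchanged.
key = 0x5A
ciphertext = xor_with_key(part_number.encode("ascii"), key)
via_bytes = xor_with_key(ciphertext, key).decode("ascii")

print(via_number)  # 420
print(via_bytes)   # 00420
```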
 
