Understanding the binary transformation of strings and integers

SUMMARY

The discussion centers on the implementation of a simple XOR encryption algorithm, specifically focusing on the conversion of strings and integers into binary format. Two methods are presented for converting the numeric characters '2' and '4': treating them as separate integers or as ASCII string values. The consensus is that using ASCII values (ord function) is the correct approach for XOR encryption, ensuring that the decrypted output matches the original input byte-for-byte. This method avoids potential issues with control characters that could arise from using numeric values directly.

PREREQUISITES
  • Understanding of XOR encryption algorithms
  • Familiarity with binary representation of data
  • Knowledge of ASCII encoding and character representation
  • Basic programming skills in Python, particularly using the bin() and ord() functions
NEXT STEPS
  • Research the implementation of XOR encryption in Python
  • Learn about ASCII encoding and its implications in data encryption
  • Explore the handling of control characters in binary data
  • Study best practices for data integrity in encryption and decryption processes
USEFUL FOR

Software developers, cybersecurity professionals, and anyone interested in understanding data encryption techniques, particularly those involving XOR operations and binary data manipulation.

Arman777
For fun, I have decided to implement a simple XOR encryption algorithm. The first step is to convert the message into bytes so that the XOR operation can be applied to each bit. This is where my problem starts. For instance, I want to encrypt this message:

Code:
I hiked 24 miles.

Now I need to turn this text into binary. It seems that there are two different ways to do it for '24':

a) Take 2 and 4 separately, as integers, which gives
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) Take 2 and 4 as strings, by doing something like this:

Code:
>>> bin(ord('2'))
'0b110010'
>>> bin(ord('4'))
'0b110100'
or
[00110010, 00110100]

Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
 
Arman777 said:
Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?
'I hiked 24 miles.' is a string of characters, or bytes. The encryption routine cannot know what format a numeric field is meant to have, so it must encrypt the character string as it stands, regardless of any numbers in it.
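As a rough illustration of encrypting the character string as-is, here is a minimal sketch. The helper name xor_bytes and the key b"k3y" are made up for this example, and a repeating-key XOR is assumed since the thread does not say how the key is applied.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


message = "I hiked 24 miles."
key = b"k3y"  # hypothetical key, for illustration only

plaintext = message.encode("ascii")     # '2' -> 0x32, '4' -> 0x34
ciphertext = xor_bytes(plaintext, key)
recovered = xor_bytes(ciphertext, key)  # XOR with the same key undoes it

# The two digit characters sit at positions 8 and 9 of the message.
print([format(b, "08b") for b in plaintext[8:10]])  # ['00110010', '00110100']
print(recovered.decode("ascii"))                    # I hiked 24 miles.
```

The digits are never treated specially here: every character, digit or not, goes through the same encode, XOR, decode path.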
 
Arman777 said:
For fun, I have decided to implement a simple XOR encryption algorithm. The first step is to convert messages into bytes to perform XOR operation on each bit. The problem has started here. For instance, I want to encrypt this message.

Code:
I hiked 24 miles.

Now I need to turn this text into binary. It seems that there are two different ways to do it for '24';

a) Take 2 and 4 as separate and as integers, so this means
Code:
>>> bin(2)
'0b10'
>>> bin(4)
'0b100'
or
[00000010, 00000100]

b) take 2 and 4 as strings, so by doing something like this

Does this make a difference from the perspective of XOR encryption (or in general)? What is the correct approach?

Note that 0x04 is the "end of transmission" (EOT) character, which a UNIX terminal treats as end of input when it is typed as Ctrl-D, and 0x02 is the "start of text" (STX) character, which has no special meaning to UNIX. So any program that handles 0x04 this way would see your decrypted message as b'I hiked \x02'. In fact every byte in the range 0x00 to 0x1F, which includes the 0x00 to 0x09 that the digits would map to, is a control character that may have a special meaning for whatever application is going to interpret your decrypted data, if not for the operating system itself. So '2' -> 0x02 etc. is a very bad idea.

@Baluncore's approach is the only viable approach, and ensures that your decrypted data will be byte for byte identical with the input.
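To make the difference concrete, here is a small sketch comparing the two conversions. The function digits_as_numbers is a hypothetical stand-in for approach (a), written only for this comparison.

```python
def digits_as_numbers(text: str) -> bytes:
    """Approach (a): digit characters become their numeric value, the rest ASCII."""
    return bytes(int(ch) if ch in "0123456789" else ord(ch) for ch in text)


original = "I hiked 24 miles."

lossy = digits_as_numbers(original)  # digits turn into control characters
exact = original.encode("ascii")     # approach (b): plain ASCII bytes

print(lossy)  # b'I hiked \x02\x04 miles.'
print(exact)  # b'I hiked 24 miles.'

# Approach (a) is also ambiguous: the digit '2' and a real STX byte collide,
# so the receiver cannot tell which one the sender meant.
print(digits_as_numbers("2") == digits_as_numbers("\x02"))  # True
```

With approach (b) the byte string decodes straight back to the original text, which is what "byte for byte identical" requires.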
 
Then I will do that. Thanks for the answers.
 
What seems to be missing here is an understanding of the difference between numeric digit characters, such as '2' or '4', and the numerals 2 or 4.

The '2' and '4' characters are stored as ASCII values of 50 and 52 respectively. The numbers 2 and 4 are stored as their own values.
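In Python the distinction can be seen directly; this is just a quick check using the built-ins ord() and int(), with variable names chosen for the example.

```python
digit_char = "2"

code_point = ord(digit_char)     # ASCII code of the character '2'
numeric_value = int(digit_char)  # the number the character stands for

print(code_point, numeric_value)                            # 50 2
print(digit_char.encode("ascii"), bytes([numeric_value]))   # b'2' b'\x02'
```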
 
In most cases of numbers embedded in text, the leading zeros can be dropped, but imagine trying to correctly restore data that had critical leading or trailing zeros in a part number.

An encryption - decryption algorithm should always regenerate the input exactly. The message should never be arbitrarily abbreviated or compressed.
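A short sketch of the leading-zero problem, using a made-up part number "007-0420": passing the digits through integers drops the zeros, while passing the characters through bytes does not.

```python
part_number = "007-0420"  # hypothetical part number, for illustration only

# Treating each numeric field as an integer loses the leading zeros.
left, right = part_number.split("-")
numeric_round_trip = f"{int(left)}-{int(right)}"

# Treating the text as bytes preserves every character.
byte_round_trip = part_number.encode("ascii").decode("ascii")

print(numeric_round_trip)  # 7-420
print(byte_round_trip)     # 007-0420
```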
 
