C# GetString method from Encoding's Unicode class

  • Thread starter: Silicon Waffle
  • Tags: Class Method
The discussion centers on decoding a byte array using C#'s Encoding.Unicode class. A user demonstrates that a byte array containing characters like 'z', 'ý', and 'ó' results in the string "zýó" when decoded, despite confusion over byte size and encoding. Another participant clarifies that the byte array should be correctly initialized with explicit casting to byte and mentions that the original code does not compile. Additionally, it is suggested that if the byte array represents UTF32 data, the correct method to decode it would be using Encoding.UTF32 instead of Encoding.Unicode. The conversation emphasizes the importance of understanding byte representation and encoding formats in C#.
Silicon Waffle
I have a byte array (each character consumes 4 bytes) of size 64 for example,
Now I decode it using
Code:
byte[] bytes={'z','\0','\0','\0','ý','\0','\0','\0','ó','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0',...};
string s=Encoding.Unicode.GetString(bytes);

Amazingly, after the code is executed, s="zýó";

But if my bytes contain a string such as "劉三好" (the byte array would be all numbers representing these Chinese characters), then after I execute the above line of code I get this string too. How can that be?
 
Check out the Unicode tables:

http://en.wikipedia.org/wiki/Unicode

Unicode characters in the lowest plane need two bytes, so I'd expect two non-zero bytes followed by two zero bytes... Also, it looks like your string is in least-significant-byte-first order, i.e. little-endian order.
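To make the byte layout concrete, here is a small sketch that dumps the bytes .NET produces for the same character in UTF-16 (Encoding.Unicode) and in UTF-32 (Encoding.UTF32), both little endian. The class name ByteLayoutDemo and the helper Hex are made up for illustration; they are not from the thread.

```csharp
using System;
using System.Text;

public static class ByteLayoutDemo   // hypothetical name, for illustration only
{
    // Encodes the text and returns the bytes as a hex string such as "7A-00".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // 'z' is U+007A: one non-zero byte, then zero padding.
        Console.WriteLine(Hex(Encoding.Unicode, "z"));       // 7A-00
        Console.WriteLine(Hex(Encoding.UTF32, "z"));         // 7A-00-00-00

        // '\u5289' is the first character of the Chinese example (U+5289).
        Console.WriteLine(Hex(Encoding.Unicode, "\u5289"));  // 89-52
        Console.WriteLine(Hex(Encoding.UTF32, "\u5289"));    // 89-52-00-00
    }
}
```

The low byte comes first in every case, which is the little-endian order mentioned above. In UTF-32 the Chinese character shows up as two non-zero bytes followed by two zero bytes.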
 
Silicon Waffle said:
I have a byte array (each character consumes 4 bytes) of size 64 for example
A byte is an 8-bit unsigned integer (see https://msdn.microsoft.com/en-us/library/system.byte(v=vs.100).aspx). Each character you show in your byte array occupies one byte, not 4 bytes. Also, 4 bytes is 32 bits, not 64 bits.
 
Silicon Waffle said:
I have a byte array (each character consumes 4 bytes) of size 64 for example,
Now I decode it using
Code:
byte[] bytes={'z','\0','\0','\0','ý','\0','\0','\0','ó','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0',...};
string s=Encoding.Unicode.GetString(bytes);

Amazingly, after the code is executed, s="zýó";

But if my bytes contain a string such as "劉三好" (the byte array would be all numbers representing these Chinese characters), then after I execute the above line of code I get this string too. How can that be?
This code does not compile, because C# has no implicit conversion from char to byte. The correct way to initialize the byte array is with explicit casts:
Code:
byte[] bytes={(byte)'z', (byte)'\0',(byte)'\0',(byte)'\0',(byte)'ý',...
string s=Encoding.Unicode.GetString(bytes);
Such a call returns s="z\0ý\0......", not what you said.
I suppose the bytes come from a file. If the data is UTF-32 (not Unicode / UTF-16), you should use
Code:
string s=Encoding.UTF32.GetString(bytes);
Note that the resulting string ends with trailing \0 characters that probably need to be trimmed...
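Here is a minimal sketch of the difference, assuming the bytes really are little-endian UTF-32 padded with zeros. The class name Utf32DecodeDemo and the helper DecodeUtf32 are hypothetical, and the 16-byte array is a shortened stand-in for the 64-byte one in the question.

```csharp
using System;
using System.Text;

public static class Utf32DecodeDemo   // hypothetical name, for illustration only
{
    // Decodes zero-padded little-endian UTF-32 bytes and strips the trailing NULs.
    public static string DecodeUtf32(byte[] bytes)
    {
        return Encoding.UTF32.GetString(bytes).TrimEnd('\0');
    }

    public static void Main()
    {
        // 'z', 'ý', 'ó' as 4-byte little-endian units, then one unit of zero padding.
        byte[] bytes =
        {
            0x7A, 0, 0, 0,
            0xFD, 0, 0, 0,
            0xF3, 0, 0, 0,
            0, 0, 0, 0
        };

        // Decoded as UTF-16, every pair of bytes becomes a char, so NULs are interleaved.
        string asUtf16 = Encoding.Unicode.GetString(bytes);
        Console.WriteLine(asUtf16.Length);                           // 8
        Console.WriteLine(asUtf16 == "z\0\u00FD\0\u00F3\0\0\0");     // True

        // Decoded as UTF-32 and trimmed, only the three characters remain.
        Console.WriteLine(DecodeUtf32(bytes) == "z\u00FD\u00F3");    // True
    }
}
```

TrimEnd('\0') only removes padding at the end of the string. If NUL characters can appear in the middle of the data, cutting at the first '\0' with IndexOf may be a better fit.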
 