C# GetString method from Encoding's Unicode class

  1. Jun 11, 2015 #1
    I have a byte array of size 64, for example, in which each character takes up 4 bytes.
    I decode it using
    Code (Text):
    byte[] bytes={'z','\0','\0','\0','ý','\0','\0','\0','ó','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0',....};
    string s=Encoding.Unicode.GetString(bytes);
    Amazingly, after the code is executed, s="zýó";

    But if my bytes contain a string such as "劉三好" (the byte array would then hold the numbers representing these Chinese characters), I also get that string back after executing the line above. How does that work?
     
  2. jcsd
  3. Jun 11, 2015 #2

    jedishrfu

    Staff: Mentor

    Check out the Unicode tables:

    http://en.wikipedia.org/wiki/Unicode

    Unicode characters need two bytes each if they are in the lowest plane, so I'd expect two non-zero bytes followed by two zero bytes... Also, it looks like your string is stored least significant byte first, i.e. in little-endian order.
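    As a rough illustration of the byte layouts described above, the sketch below dumps the bytes that .NET's Encoding.Unicode (UTF-16, little-endian) and Encoding.UTF32 produce for the same text. The ByteLayoutDemo class and its Hex helper are hypothetical names introduced only for this example.

```csharp
using System;
using System.Text;

static class ByteLayoutDemo
{
    // Hypothetical helper: hex dump of the bytes an encoding produces for a string.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Run()
    {
        // UTF-16 LE: two bytes per character in the lowest plane, low byte first.
        Console.WriteLine(Hex(Encoding.Unicode, "z"));      // 7A-00
        // UTF-32 LE: always four bytes per character.
        Console.WriteLine(Hex(Encoding.UTF32, "z"));        // 7A-00-00-00
        // U+4E09 (the middle character of the example string) in UTF-16 LE.
        Console.WriteLine(Hex(Encoding.Unicode, "\u4E09")); // 09-4E
    }
}
```

    A Latin character such as 'z' has a zero high byte in UTF-16, while a Chinese character generally has two non-zero bytes.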
     
  4. Jun 11, 2015 #3

    Mark44

    Staff: Mentor

    A byte is an 8-bit unsigned integer (see https://msdn.microsoft.com/en-us/library/system.byte(v=vs.100).aspx). The characters you show in your byte array are one byte each, not 4 bytes. Also, 4 bytes isn't 64 bits, it's 32 bits.
     
  5. Jun 22, 2015 #4

    Boing3000

    Gold Member

    This code does not compile. The correct way to initialize a byte array is
    Code (Text):
    byte[] bytes={(byte)'z', (byte)'\0',(byte)'\0',(byte)'\0',(byte)'ý',....
    string s=Encoding.Unicode.GetString(bytes);
    Such a call will return s="z\0ý\0......................" and not what you said, because Encoding.Unicode reads the data two bytes at a time.
    I suppose the bytes come from a file. If the data is UTF-32 (not Unicode / UTF-16), you should use
    Code (Text):
    string s=Encoding.UTF32.GetString(bytes);
    Note that this string ends with trailing \0 characters that probably need to be trimmed...
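    A minimal sketch of that approach, assuming the buffer really holds little-endian UTF-32 padded with zero bytes (the thread does not confirm where the data comes from). Utf32BufferDemo and DecodePadded are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

static class Utf32BufferDemo
{
    // Decodes a zero-padded UTF-32 (little-endian) buffer and drops the trailing NUL padding.
    public static string DecodePadded(byte[] buffer)
    {
        return Encoding.UTF32.GetString(buffer).TrimEnd('\0');
    }

    public static void Run()
    {
        // 'z', 'ý', 'ó' stored as 4-byte little-endian values; the rest of the 64 bytes stay zero.
        byte[] bytes = new byte[64];
        bytes[0] = 0x7A;
        bytes[4] = 0xFD;
        bytes[8] = 0xF3;

        string asUtf16 = Encoding.Unicode.GetString(bytes); // "z\0ý\0ó\0..." with embedded NULs
        string asUtf32 = DecodePadded(bytes);               // "zýó"

        Console.WriteLine(asUtf16.Length); // 32
        Console.WriteLine(asUtf32.Length); // 3
    }
}
```

    TrimEnd('\0') only removes padding at the end, so a NUL embedded in the middle of the text would survive.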
     