Encrypting Chinese Characters

January 26, 2009

Question:
Why is the return value blank when I encrypt Chinese characters?
Here’s a snippet of my code:

  crypt.KeyLength := 256;
  crypt.SecretKey := Password;
  crypt.CryptAlgorithm := 'aes';
  crypt.EncodingMode := 'base64';
  OutPutStr := crypt.EncryptStringENC(StringToEncrypt);

Answer:

Strings in some programming languages, such as Visual Basic, C#, VB.NET, Delphi, FoxPro, etc., should be thought of as objects. The object contains a string (i.e. a sequence of characters that renders to a sequence of glyphs). The representation of the string within the object is private; the application shouldn’t care. For these languages it happens to be Unicode (UTF-16, which uses 2 bytes per character for most characters), so the string object is capable of containing characters in any written language. (Of course, just because the string may contain characters in any language doesn’t mean glyphs of any language can be rendered. This is a big problem in older programming languages such as VB6, Delphi, etc., where the visual controls cannot mix glyphs from different languages. They are not Unicode-capable controls, even though the string data type (i.e. object) holds characters represented internally in Unicode.)
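As a small illustration in Python (not one of the languages named above, and not the Chilkat API), a Python `str` likewise hides its internal layout: the bytes only become visible when you explicitly encode the string. The two Chinese characters below take 2 bytes each in UTF-16LE; characters outside the Basic Multilingual Plane would take 4.

```python
# Illustration only: a Python str is an abstract sequence of characters,
# much like the string objects described above. Encoding it to UTF-16LE
# exposes a 2-bytes-per-character layout for these two Chinese characters.
text = "中文"  # "Chinese"
utf16 = text.encode("utf-16-le")

print(len(text))    # 2 characters
print(len(utf16))   # 4 bytes
print(utf16.hex())  # 2d4e8765  (U+4E2D, U+6587, little-endian)
```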

OK, back to the main point…

The representation of the string (i.e. the encoding used to represent each character as a sequence of one or more bytes) within the string object is private; the application shouldn’t care. With encryption, however, it matters greatly. Encryption algorithms operate on bytes (and the same goes for hash algorithms). Therefore, when you encrypt Chinese characters, which bytes did you intend to encrypt? The 2-byte-per-char Unicode representation? The UTF-8 representation? The “big5” or “gb2312” representation? Each produces a different byte sequence, and therefore a different encrypted result.
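To make this concrete, here is a minimal Python sketch (an illustration, not the Chilkat API) that converts the same two-character string into bytes using each encoding mentioned above. Every encoding yields a different byte sequence, so an encryption algorithm fed these bytes would produce different ciphertext for each. Python's "utf-16-le" codec is used here as an assumed stand-in for the 2-byte-per-char "Unicode" representation.

```python
text = "中文"

# The same characters, converted to bytes with four different encodings.
# These byte arrays are what an encryption algorithm would actually see.
encodings = ["utf-8", "utf-16-le", "big5", "gb2312"]
byte_forms = {enc: text.encode(enc) for enc in encodings}

for enc, data in byte_forms.items():
    print(f"{enc:10} {len(data)} bytes  {data.hex()}")
```

The UTF-8 form is 6 bytes (3 per character), while the other three are 4 bytes each, yet all four byte sequences differ.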

The Crypt.Charset property controls the charset (character encoding) used when encrypting strings. The string passed to EncryptString* is first converted internally to a byte array using the specified character encoding, and then this byte array is encrypted. The default value of Crypt.Charset is “ANSI”. In most cases this is what you expect: a typical European accented character is represented as a single byte in the computer’s default charset. This doesn’t work for Chinese (or other Asian languages), or for any language that doesn’t match the computer’s locale. The internal conversion from Unicode to ANSI drops any character that has no representation in the computer’s ANSI code page.
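The dropping behavior can be simulated in Python. This sketch assumes Windows-1252 (cp1252) as a stand-in for a Western European “ANSI” code page and uses `errors="ignore"` to mimic characters being silently discarded; it is an analogy for the internal conversion described above, not Chilkat’s actual code.

```python
text = "中文 café"
chinese_only = "中文"

# Assumption: cp1252 stands in for a Western European "ANSI" code page,
# and errors="ignore" mimics unrepresentable characters being dropped.
ansi_mixed = text.encode("cp1252", errors="ignore")
ansi_chinese = chinese_only.encode("cp1252", errors="ignore")

print(ansi_mixed)    # b' caf\xe9'  (the Chinese characters are gone)
print(ansi_chinese)  # b''          (nothing of the original text survives)

# UTF-8 keeps every character, so the bytes convert back to the original text.
print(text.encode("utf-8").decode("utf-8") == text)  # True
```

The accented “é” survives as the single byte 0xE9, while the Chinese characters vanish entirely.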

The solution: set Crypt.Charset to the desired encoding. For Chinese it would be one of the following: “utf-8”, “Unicode”, “big5”, or “gb2312”.
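Whichever charset you choose, the side that decrypts presumably needs the same Charset setting to turn the decrypted bytes back into the original characters. The Python sketch below illustrates that point with the encryption step left out (the standard library has no AES); `plain_bytes` is a hypothetical name for the byte array that would be encrypted and later recovered by decryption.

```python
text = "中文"

# Encrypting side: the charset choice (here utf-8) determines the bytes.
plain_bytes = text.encode("utf-8")
# ... encrypt plain_bytes, transmit, decrypt back to plain_bytes ...

# Decrypting side: decoding with the same charset restores the text,
# while decoding with a different charset does not.
same = plain_bytes.decode("utf-8")
mismatched = plain_bytes.decode("gb2312", errors="replace")

print(same == text)        # True
print(mismatched == text)  # False
```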

admin
