Understanding EncryptStringENC and DecryptStringENC in Python and C/C++

Chilkat provides APIs that are identical across a variety of programming languages. One difficulty in doing this is handling strings, because different programming languages pass strings in different ways. In some programming languages, such as Python or C/C++, a “string” is simply a sequence of bytes terminated by a null. (I’m referring to “multibyte” strings, not Unicode (utf-16) strings. The term “multibyte” means any charset in which each letter or symbol is represented by one or more bytes, none of which is a null.) A Python or C/C++ application must therefore indicate how the bytes are to be interpreted. There are two choices: ANSI or utf-8. Each Chilkat class has a “Utf8” property that controls whether the bytes are interpreted as ANSI or utf-8.

Note: The Utf8 property only exists in programming languages where strings are passed as a sequence of bytes. For example, in .NET strings are objects and are always passed and returned as objects. If the ActiveX is used, strings are always passed as utf-16. In Python or C/C++, however, strings are simply sequences of bytes, and some additional mechanism must be used to indicate how the bytes are to be interpreted.
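To make the ambiguity concrete, here is a minimal Python 3 sketch that uses only the standard library (no Chilkat calls). It assumes that “ANSI” means Windows-1252, which is just one of many possible ANSI code pages, and the variable names are made up for illustration.

```python
# The same text has a different byte representation in each charset.
text = "caf\u00e9"  # "café"

ansi_bytes = text.encode("cp1252")  # assumption: ANSI == Windows-1252
utf8_bytes = text.encode("utf-8")

print(ansi_bytes.hex())  # 636166e9
print(utf8_bytes.hex())  # 636166c3a9

# If utf-8 bytes are interpreted as ANSI, nothing fails, but the
# last character silently turns into two wrong characters.
misread = utf8_bytes.decode("cp1252")
print(ascii(misread))  # 'caf\xc3\xa9'
```

This is the kind of mix-up the Utf8 property exists to prevent: the receiver of the bytes has to be told which interpretation the sender intended.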

To encrypt a string, we must specify the exact byte representation of the string that is to be encrypted. This is achieved via the Charset property. For example, it may be the ANSI byte representation that is to be encrypted, or the utf-16 byte representation, or utf-8, or anything else. The mechanism that specifies the byte representation to be encrypted (the Charset property) must be entirely separate from the mechanism used to unambiguously pass the string to the Chilkat method (the Utf8 property). These are two separate things. Therefore, string encryption/decryption happens in these steps:

Encrypting a String (EncryptStringENC)

1) Unambiguously pass the string to the EncryptStringENC method.
2) (Internal to the Chilkat method) Convert the string to the byte representation specified by the Charset property.
3) Encrypt the resulting bytes.
4) Encode the binary encrypted bytes according to the EncodingMode property (which can be base64, hex, etc.) and return this string.
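The four steps above can be sketched in plain Python 3. This is not Chilkat’s implementation: `encrypt_string_enc` and `toy_cipher` are hypothetical names, and the “cipher” is a trivial XOR stand-in (not secure) used only so that the example runs without any library.

```python
import base64


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder for a real cipher: XOR with a repeating key. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_string_enc(text: str, charset: str, encoding_mode: str, key: bytes) -> str:
    # Step 1: the string arrives unambiguously (a Python 3 str is already Unicode).
    # Step 2: convert it to the byte representation named by the charset.
    plain_bytes = text.encode(charset)
    # Step 3: encrypt those bytes (stand-in cipher, see above).
    cipher_bytes = toy_cipher(plain_bytes, key)
    # Step 4: encode the binary result as a printable string.
    if encoding_mode == "hex":
        return cipher_bytes.hex().upper()
    if encoding_mode == "base64":
        return base64.b64encode(cipher_bytes).decode("ascii")
    raise ValueError("unsupported encoding mode: " + encoding_mode)


# The same character encrypts to different output depending on the charset.
print(encrypt_string_enc("\u4eac", "utf-8", "hex", b"k"))      # 8FD1C7
print(encrypt_string_enc("\u4eac", "utf-16-le", "hex", b"k"))  # C725
```

The point of the sketch is that the charset choice changes the bytes that get encrypted, so it changes the encrypted output even when the key and the input string are identical.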

Decrypting a String (DecryptStringENC)

1) Pass the encoded string to the DecryptStringENC method. Note that all possible encodings (base64, hex, etc.) use only us-ascii chars. Multibyte charsets differ only in how they represent non-us-ascii chars; us-ascii chars are always represented by a single byte that is less than 0x80. Therefore, the Utf8 property can be either true or false, because us-ascii chars have the same byte representation in both utf-8 and ANSI.
2) (Internal to the Chilkat method) Decode the base64/hex/etc. to get the binary encrypted bytes.
3) Decrypt to get the string in the byte representation that was indicated by the Charset property when encrypting. (The Charset property must be set to this same value when decrypting.)
4) Unambiguously return the string. For languages such as Python or C/C++, this means examining the Utf8 property setting and performing whatever conversion is necessary (if any) to convert from the charset indicated by the Charset property to the ANSI or utf-8 encoding in which the string is returned. (For languages such as C#, Chilkat will convert as appropriate to return a string object to the .NET language.)
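The decryption steps can be sketched the same way. Again, this is only an illustration under stated assumptions: `decrypt_string_enc` and `toy_cipher` are hypothetical names, the XOR “cipher” is a non-secure placeholder, “ANSI” is assumed to be Windows-1252, and the `utf8` argument merely plays the role of the Utf8 property.

```python
import base64


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder for a real cipher: XOR with a repeating key. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_string_enc(encoded: str, charset: str, encoding_mode: str,
                       key: bytes, utf8: bool) -> bytes:
    # Step 2: decode the hex/base64 text to the binary encrypted bytes.
    if encoding_mode == "hex":
        cipher_bytes = bytes.fromhex(encoded)
    elif encoding_mode == "base64":
        cipher_bytes = base64.b64decode(encoded)
    else:
        raise ValueError("unsupported encoding mode: " + encoding_mode)
    # Step 3: decrypt (XOR is its own inverse), then interpret the bytes
    # using the same charset that was in effect when encrypting.
    text = toy_cipher(cipher_bytes, key).decode(charset)
    # Step 4: return the string as utf-8 or ANSI bytes, as the caller expects.
    return text.encode("utf-8" if utf8 else "cp1252")


# "826B" is the toy-encrypted utf-16-le form of U+00E9 with key b"k".
encoded = "826B"

# Step 1: the encoded text is pure us-ascii, so its bytes are the same
# whether the caller passes it as utf-8 or as ANSI.
print(encoded.encode("utf-8") == encoded.encode("cp1252"))  # True

print(decrypt_string_enc(encoded, "utf-16-le", "hex", b"k", utf8=True).hex())   # c3a9
print(decrypt_string_enc(encoded, "utf-16-le", "hex", b"k", utf8=False).hex())  # e9
```

Note that the charset used for encryption (utf-16-le here) and the encoding of the returned string (utf-8 or ANSI) are independent of each other, which is why two separate settings are needed.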

Chinese Character String Literals in VC++ 8, 9, 10, …

It is possible to use Chinese (or other non-ASCII) string literals within your C++ code, as long as you save your C++ source file using the utf-8 character encoding.

For example, open a .cpp source file and add this line:

	CkString str1;
	str1.appendU(L"京");

When you try to save the .cpp source file, you may get a message such as:

“Some Unicode characters in this file could not be saved in the current codepage. Do you want to resave this file as Unicode in order to maintain your data?”

Click on the “Save with Other Encoding” button on the dialog box that pops up, and then select “Unicode (UTF-8 with signature) – Codepage 65001” for the encoding.

You may use the Unicode string in a Chilkat object like this:

#include <tchar.h>
#include <CkString.h>
#include <CkEmail.h>

int _tmain(int argc, _TCHAR* argv[])
{
	CkString str1;
	str1.appendU(L"京");

	CkEmail email;

	// Tell the Chilkat object that the bytes pointed to by
	// "const char *" are utf-8:
	email.put_Utf8(true);

	// Set the email's subject:
	email.put_Subject(str1.getUtf8());

	// Set the email's body:
	email.put_Body(str1.getUtf8());

	// Let's use the big5 charset for the email...
	email.put_Charset("big5");

	email.SaveEml("out.eml");

	return 0;
}
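To see why the example sets both Utf8 and Charset, it may help to look at the bytes involved. The following Python 3 sketch uses only the standard library (it is not Chilkat code) and prints the byte representations of the same character in the three encodings the example touches: utf-16 for the wide literal (assuming a little-endian Windows build), utf-8 for the string passed to the email object, and big5 for the saved email.

```python
text = "\u4eac"  # the character used in the C++ example above

utf16_bytes = text.encode("utf-16-le")  # the L"..." wide literal on Windows
utf8_bytes = text.encode("utf-8")       # what is passed when Utf8 is true
big5_bytes = text.encode("big5")        # what the big5 email body contains

print(utf16_bytes.hex())  # ac4e
print(utf8_bytes.hex())   # e4baac
print(len(big5_bytes))    # 2

# Every representation decodes back to the same character.
print(big5_bytes.decode("big5") == text)  # True
```

The character is the same in all three cases; only the bytes differ, so each hand-off has to state which encoding is in use.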

Zip with Unicode Filenames (utf-8)

New examples demonstrating how to create a Zip archive using Unicode filenames:

ASP: Create Zip with utf-8 Filenames (Unicode filenames)
SQL Server: Create Zip with utf-8 Filenames (Unicode filenames)
C#: Create Zip with utf-8 Filenames (Unicode filenames)
C++: Create Zip with utf-8 Filenames (Unicode filenames)
MFC: Create Zip with utf-8 Filenames (Unicode filenames)
C: Create Zip with utf-8 Filenames (Unicode filenames)
Delphi: Create Zip with utf-8 Filenames (Unicode filenames)
Visual FoxPro: Create Zip with utf-8 Filenames (Unicode filenames)
Java: Create Zip with utf-8 Filenames (Unicode filenames)
Perl: Create Zip with utf-8 Filenames (Unicode filenames)
PHP: Create Zip with utf-8 Filenames (Unicode filenames)
Python: Create Zip with utf-8 Filenames (Unicode filenames)
Ruby: Create Zip with utf-8 Filenames (Unicode filenames)
VB.NET: Create Zip with utf-8 Filenames (Unicode filenames)
Visual Basic: Create Zip with utf-8 Filenames (Unicode filenames)
VBScript: Create Zip with utf-8 Filenames (Unicode filenames)