Unzipping zips with Unicode Filenames

Question:

I see your example code includes the ability to create a zip with Unicode filenames.
Does your product also support unzipping files with Unicode filenames, such as Chinese?

Answer:

Yes, it can unzip files with Unicode filenames, assuming the .zip was created correctly.  Your first test should be to unzip without changing any settings, then check whether the files are created with the correct filenames.  If not, set the zip.OemCodePage property to 65001 (utf-8) and retry.  If that doesn’t work, the filenames in the .zip were probably embedded using a specific charset, and you would need to set the OemCodePage property to match that charset.  See this: http://www.chilkatsoft.com/p/p_453.asp
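If you can get at the raw filename bytes stored in the .zip (how you obtain them is not covered here and is assumed), a quick check for well-formed utf-8 can hint at whether code page 65001 is worth trying. The following is only a sketch in plain C++. The helper name looksLikeUtf8 is hypothetical and is not part of the Chilkat API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Chilkat call): returns true if `s` is
// well-formed UTF-8. Rejects overlong forms, surrogates, and values
// above U+10FFFF. Pure ASCII input trivially returns true.
bool looksLikeUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;    // continuation bytes expected
        unsigned long cp = 0;     // decoded code point
        unsigned long minCp = 0;  // smallest value allowed for this length

        if (b0 < 0x80) { ++i; continue; }  // ASCII byte
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;  // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < minCp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate

        i += extra + 1;
    }
    return true;
}
```

A name that contains non-ASCII bytes and passes this check is very likely utf-8. A name that fails it was stored in some other charset, and OemCodePage would need to match that charset instead.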

Chinese Character String Literals in VC++ 8, 9, 10, …

It is possible to use Chinese string literals within your C++ code, as long as you save your C++ source file using the utf-8 character encoding.

For example, open a .cpp source file and add these lines:

	CkString str1;
	str1.appendU(L"京");

When you try to save the .cpp source file, you may get a message such as:

“Some Unicode characters in this file could not be saved in the current codepage. Do you want to resave this file as Unicode in order to maintain your data?”

Click on the “Save with Other Encoding” button on the dialog box that pops up, and then select “Unicode (UTF-8 with signature) – Codepage 65001” for the encoding.

You may use the Unicode string in a Chilkat object like this:

#include <tchar.h>
#include <CkString.h>
#include <CkEmail.h>

int _tmain(int argc, _TCHAR* argv[])
{
	CkString str1;
	str1.appendU(L"京");

	CkEmail email;

	// Tell the Chilkat object that the bytes pointed to by
	// "const char *" are utf-8:
	email.put_Utf8(true);

	// Set the email's subject:
	email.put_Subject(str1.getUtf8());

	// Set the email's body:
	email.put_Body(str1.getUtf8());

	// Let's use the big5 charset for the email...
	email.put_Charset("big5");

	// SaveEml returns false on failure.
	if (!email.SaveEml("out.eml"))
		return 1;

	return 0;
}
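
For reference, here is what the utf-8 bytes for 京 (U+4EAC) look like when worked out by hand. This is a sketch in plain C++ with a hypothetical helper named encodeUtf8. It is not part of the Chilkat API, and it assumes that getUtf8() returns standard utf-8, in which case the two should agree.

```cpp
#include <string>

// Hypothetical helper (not a Chilkat call): encodes one Unicode code
// point as UTF-8. Returns an empty string for values that are not valid
// Unicode scalar values (surrogates, or anything above U+10FFFF).
std::string encodeUtf8(unsigned long cp)
{
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate: invalid
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0x4EAC) yields the three bytes E4 BA AC. These are the bytes a "const char *" must point to once put_Utf8(true) has been set.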

Zip with Unicode Filenames (utf-8)

New examples demonstrating how to create a Zip archive using Unicode filenames:

ASP: Create Zip with utf-8 Filenames (Unicode filenames)
SQL Server: Create Zip with utf-8 Filenames (Unicode filenames)
C#: Create Zip with utf-8 Filenames (Unicode filenames)
C++: Create Zip with utf-8 Filenames (Unicode filenames)
MFC: Create Zip with utf-8 Filenames (Unicode filenames)
C: Create Zip with utf-8 Filenames (Unicode filenames)
Delphi: Create Zip with utf-8 Filenames (Unicode filenames)
Visual FoxPro: Create Zip with utf-8 Filenames (Unicode filenames)
Java: Create Zip with utf-8 Filenames (Unicode filenames)
Perl: Create Zip with utf-8 Filenames (Unicode filenames)
PHP: Create Zip with utf-8 Filenames (Unicode filenames)
Python: Create Zip with utf-8 Filenames (Unicode filenames)
Ruby: Create Zip with utf-8 Filenames (Unicode filenames)
VB.NET: Create Zip with utf-8 Filenames (Unicode filenames)
Visual Basic: Create Zip with utf-8 Filenames (Unicode filenames)
VBScript: Create Zip with utf-8 Filenames (Unicode filenames)

FTP Unicode Directory Listings

Question:
Filenames on the FTP server contain Unicode characters (Chinese, Japanese, Russian, etc.). How do I get the correct filenames in my (Chilkat FTP2) client?

Answer:

This depends heavily on the capabilities of the FTP server. Many servers cannot send Unicode directory listings.

If an FTP server supports Unicode directory listings, it will use utf-8 (a multibyte encoding of Unicode). An FTP server indicates UTF8 support in its response to the FEAT command:

C> FEAT
S> 211-Extensions supported
S> SIZE
S> MDTM
S> MLST size*;type*;perm*;create*;modify*;
S> LANG EN*
S> REST STREAM
S> UTF8
S> 211 end

Note: Many FTP servers don’t even support the FEAT command.

The Chilkat FTP2 component automatically sends a FEAT command after connecting. One reason for doing this is to auto-detect UTF8 support. If the server lists UTF8, Chilkat FTP2 will automatically receive directory listings in the utf-8 encoding. (In programming languages that use Unicode strings, such as C#, VB.NET, ASP, VB6, and Delphi, Chilkat automatically converts utf-8 to the language’s native Unicode string.)
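
To make the detection step concrete, here is a rough sketch of how a client could scan a FEAT reply for the UTF8 feature. It is plain C++ and is not Chilkat’s actual implementation, which is not shown here. The function name featListsUtf8 is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not a Chilkat call): returns true if a FEAT
// reply contains a feature line that is exactly "UTF8", ignoring case
// and surrounding whitespace. Status lines such as "211-Extensions
// supported" and "211 end" never match and are skipped naturally.
bool featListsUtf8(const std::string& featReply)
{
    std::istringstream in(featReply);
    std::string line;
    const char* ws = " \t\r\n";

    while (std::getline(in, line)) {
        // Trim whitespace; this also drops the '\r' left over from CRLF.
        const std::string::size_type first = line.find_first_not_of(ws);
        if (first == std::string::npos) continue;  // blank line
        const std::string::size_type last = line.find_last_not_of(ws);
        std::string feature = line.substr(first, last - first + 1);

        // Compare case-insensitively by upper-casing the feature name.
        std::transform(feature.begin(), feature.end(), feature.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::toupper(c));
                       });
        if (feature == "UTF8") return true;
    }
    return false;
}
```

Given the reply shown above, this returns true. For a server that rejects FEAT or omits UTF8, it returns false, which is the case where DirListingCharset may be needed.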

The FTP2 component’s AutoFeat property (true/false) controls whether the FEAT command is automatically sent. It defaults to true.

The DirListingCharset property can be set to override the auto-detection. You would only use DirListingCharset if the FTP server sends ANSI directory listings where the ANSI charset on the server is different from the ANSI charset on the client. For example, perhaps the FTP server is in Japan where the ANSI charset is Shift-JIS, but your FTP client is in France where the ANSI charset is iso-8859-1 (or Windows-1252). The DirListingCharset might also be used if the FTP server sends utf-8 listings but does not support the FEAT command, or does not list UTF8 in the FEAT response.
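
To see why the charset of the listing matters, consider iso-8859-1: every byte maps directly to the code point with the same value, so a byte such as 0xE9 (é) is not valid utf-8 by itself and must be re-encoded. The sketch below shows that one conversion in plain C++. The helper latin1ToUtf8 is hypothetical and only for illustration. It is not how Chilkat handles DirListingCharset internally, which is not described here.

```cpp
#include <string>

// Hypothetical helper (not a Chilkat call): converts iso-8859-1 bytes
// to UTF-8. Bytes below 0x80 are copied unchanged; each byte from 0x80
// to 0xFF becomes the two-byte UTF-8 form of the same code point.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);  // worst case: every byte doubles
    for (char ch : latin1) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;  // ASCII is identical in both encodings
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // 110000xx
            out += static_cast<char>(0x80 | (b & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}
```

Other charsets such as Shift-JIS need real mapping tables rather than bit arithmetic, which is why the client has to be told which charset the server is using.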

To see your FTP server’s FEAT response, examine the contents of the FTP2 component’s SessionLog property after connecting. (Be sure to enable session logging by setting the KeepSessionLog property to true before connecting.)