Understanding EncryptStringENC and DecryptStringENC in Python and C/C++

Chilkat provides APIs that are identical across a variety of programming languages. One difficulty in doing this is handling strings, because different programming languages pass strings in different ways. In some programming languages, such as Python or C/C++, a “string” is simply a sequence of bytes terminated by a null. (I’m referring to “multibyte” strings, not Unicode (utf-16) strings. The term “multibyte” means any charset in which each letter or symbol is represented by one or more bytes without using nulls.) A Python or C/C++ application must indicate how the bytes are to be interpreted. There are two choices: ANSI or utf-8. Each Chilkat class has a “Utf8” property that controls whether the bytes are interpreted as ANSI or utf-8.

Note: The Utf8 property only exists in programming languages where strings are passed as a sequence of bytes. For example, in .NET strings are objects and are always passed and returned as objects. If the ActiveX is used, strings are always passed as utf-16. In Python or C/C++, however, strings are simply sequences of bytes, and some additional mechanism must be used to indicate how the bytes are to be interpreted.

To encrypt a string, we must precisely specify the exact byte representation of the string we want to be encrypted. This is achieved via the Charset property. For example, maybe it is the ANSI byte representation that is to be encrypted. Or maybe it is the utf-16 byte representation. Or maybe utf-8, or anything else. The mechanism to specify the byte representation of the string to be encrypted must be entirely separate from the mechanism used to unambiguously pass the string to the Chilkat method. These are two separate things. Therefore, string encryption/decryption happens in these steps:

Encrypting a String (EncryptStringENC)

1) Unambiguously pass the string to the EncryptStringENC method.
2) (Internal to the Chilkat method) Convert the string to the byte representation specified by the Charset property.
3) Encrypt
4) Encode the binary encrypted bytes according to the EncodingMode property (which can be base64, hex, etc.) and return this string.

Decrypting a String (DecryptStringENC)

1) Pass the encoded string to DecryptStringENC method. Note that all possible encodings (base64, hex, etc.) use only us-ascii chars. In all multibyte charsets, it is only the non-us-ascii chars that are different. us-ascii chars are always represented by a single byte that is less than 0x80. Therefore, the Utf8 property can be either true or false because us-ascii chars have the same byte representation in both utf-8 and ANSI.
2) (Internal to the Chilkat method) Decode the base64/hex/etc. to get the binary encrypted bytes.
3) Decrypt to get the string in the byte representation as was indicated by the Charset property when encrypting. (The Charset property must be set to this same value when decrypting.)
4) Unambiguously return the string. For languages such as Python or C/C++, this means examining the Utf8 property setting and performing whatever conversion is necessary (if any) from the charset indicated by the Charset property, so that the string is returned in the ANSI or utf-8 encoding. (For languages such as C#, Chilkat will convert as appropriate to return a string object to the .NET language.)
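
The two step lists above can be sketched in Python. This is only an illustration under stated assumptions: toy_cipher is a hypothetical XOR stand-in for the real cipher (it is not AES and it is not secure), and the function and parameter names are invented here rather than taken from the Chilkat API. What the sketch tries to show is the order of the conversions: charset first, then encrypt, then encode, and the reverse when decrypting.

```python
import base64
import binascii


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Hypothetical placeholder for the real cipher: XOR with a repeating key.

    XOR is its own inverse, so the same function "encrypts" and "decrypts".
    It is NOT secure and only stands in for AES, Blowfish, etc.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_string_enc(text: str, key: bytes, charset: str, encoding_mode: str) -> str:
    raw = text.encode(charset)            # step 2: string -> bytes per Charset
    encrypted = toy_cipher(raw, key)      # step 3: encrypt the bytes
    if encoding_mode == "hex":            # step 4: encode per EncodingMode
        return binascii.hexlify(encrypted).decode("ascii").upper()
    return base64.b64encode(encrypted).decode("ascii")


def decrypt_string_enc(encoded: str, key: bytes, charset: str, encoding_mode: str) -> str:
    if encoding_mode == "hex":            # step 2: decode hex/base64 to bytes
        encrypted = binascii.unhexlify(encoded)
    else:
        encrypted = base64.b64decode(encoded)
    raw = toy_cipher(encrypted, key)      # step 3: decrypt the bytes
    return raw.decode(charset)            # step 4: bytes -> string per Charset


key = b"zxcv1234abcdQWER"
token = encrypt_string_enc("café", key, charset="utf-8", encoding_mode="base64")
print(decrypt_string_enc(token, key, charset="utf-8", encoding_mode="base64"))
```

Encrypting the same text with a different charset (for example "cp1252" instead of "utf-8") feeds different bytes to the cipher, which is why the Charset value must match on both sides.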

Posts about Matching Encryption Output for Different Systems

http://www.chilkatsoft.com/p/p_123.asp

http://www.chilkatsoft.com/p/p_506.asp

http://www.chilkatsoft.com/p/p_103.asp

http://www.chilkatsoft.com/p/p_459.asp

http://www.chilkatsoft.com/p/p_458.asp

http://www.chilkatsoft.com/p/p_457.asp

http://www.chilkatsoft.com/p/p_355.asp

http://www.chilkatsoft.com/p/p_160.asp

http://www.chilkatsoft.com/p/p_102.asp

http://www.chilkatsoft.com/p/php_aes.asp

Format of AES, Blowfish, Twofish, 3DES, etc. Symmetric Encrypted Data?

Question:

I know it isn’t listed in the documentation, but is there any method to test whether a file has been previously encrypted or not?  I would like to perform decryption on a file, but only if it is already encrypted.

Answer:

A symmetric encryption algorithm is simply a transformation of bytes such that the output has the properties of randomly generated byte data.  There is no file format, and each byte value from 0x00 to 0xFF is virtually equally probable.

There is no single test that can be performed to determine if a file is already encrypted.  There are two solutions:

  1. Create your own test based on the type of file being encrypted. For example, if XML files are encrypted, then test to see if “<?xml” is found at the beginning.
  2. Create your own simple “encrypted file format”. For example, it could be as easy as writing a 4-byte “marker” at the beginning of every file containing encrypted data. The marker would be a constant value, such as 0x01020304. Your application could read the first 4 bytes of a file, and if they equal 0x01, 0x02, 0x03, 0x04, it discards the marker and knows the remainder is the encrypted file data.
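
A minimal sketch of the marker idea in Python, assuming the 4-byte constant suggested above. The function names are hypothetical and the ciphertext is treated as opaque bytes produced elsewhere.

```python
# Constant 4-byte marker written in front of every encrypted file (assumed value).
MARKER = bytes([0x01, 0x02, 0x03, 0x04])


def wrap_encrypted(ciphertext: bytes) -> bytes:
    """Prefix already-encrypted bytes with the marker before writing to disk."""
    return MARKER + ciphertext


def unwrap_if_encrypted(file_bytes: bytes):
    """Return the ciphertext if the marker is present, otherwise None."""
    if file_bytes.startswith(MARKER):
        return file_bytes[len(MARKER):]
    return None
```

An unencrypted file could begin with the same 4 bytes by coincidence, so a longer or more distinctive marker lowers the chance of a false match.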

How to Compute a URL Signature for the Google Maps API

Examples for computing a URL signature for the Google Maps API:

ASP: URL Signing for Google Maps API
SQL Server: URL Signing for Google Maps API
C#: URL Signing for Google Maps API
C++: URL Signing for Google Maps API
Objective-C: URL Signing for Google Maps API
PowerShell: URL Signing for Google Maps API
MFC: URL Signing for Google Maps API
C: URL Signing for Google Maps API
Delphi: URL Signing for Google Maps API
Visual FoxPro: URL Signing for Google Maps API
Java: URL Signing for Google Maps API
Perl: URL Signing for Google Maps API
PHP: URL Signing for Google Maps API
Python: URL Signing for Google Maps API
Ruby: URL Signing for Google Maps API
VB.NET: URL Signing for Google Maps API
Visual Basic: URL Signing for Google Maps API
VBScript: URL Signing for Google Maps API

How to tell if Data is AES Encrypted Data?

Question:

I really like your component but there’s one problem with Crypt2. Is there any way to verify that a file is encrypted before decrypting it? I ask because if I try to decrypt an unencrypted file, that file will be corrupted.

Answer:

The output of any Chilkat Crypt2 method that does symmetric encryption (AES, Blowfish, Triple-DES, etc.) is simply encrypted data.  There is no file format.  Encrypted data will resemble random binary data where all byte values (from 0 to 0xFF) are used.   The typical way to determine if a file is already encrypted is to use knowledge about the file that was encrypted.  For example, if HTML was encrypted, one might first load the file and check to see if common HTML tags are present  — if so, then the HTML file is not encrypted.   You’ll need to devise your own ad-hoc method for determining whether data is encrypted or not.
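
As one example of such an ad-hoc check, the HTML case might be sketched in Python like this. The tag list and the 1024-byte window are arbitrary choices made for illustration, not a recommended standard.

```python
def looks_like_html(data: bytes) -> bool:
    """Heuristic: plain (unencrypted) HTML usually shows a common tag near the start."""
    head = data[:1024].lower()
    common_tags = (b"<!doctype html", b"<html", b"<head", b"<body")
    return any(tag in head for tag in common_tags)
```

If looks_like_html returns True the file is almost certainly not encrypted. A False result only means no tag was found, so it is a hint rather than proof that the data is encrypted.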

Getting Started with AES Decryption

This is a common question: You receive encrypted data and a key and want to decrypt. The person providing the encrypted data has provided little information, perhaps only that the encryption algorithm is AES. Where do you begin, and what additional information, if any, do you need?

Answer:

AES encryption comes in 3 key sizes: 128-bit, 192-bit, and 256-bit. Look at the key you received. Which of the following does it look like:

  1. zxcv1234abcdQWER
  2. 7A786376313233346162636451574552
  3. enhjdjEyMzRhYmNkUVdFUg==

The strings above are all the same key encoded differently.

#1 is a us-ascii string that is exactly 16 characters.  This is a clue that the person gave you a 128-bit key (16 bytes * 8 bits/byte = 128) and that the bytes used for the key are the ascii values of the characters in the string.

#2 is a hexadecimal representation of #1.  If you have a hexadecimal representation of the key, you’ll notice that only the characters 0-9 and A-F (or a-f) are used.  Each byte of the key is represented by 2 ascii bytes.  If your hex string is 32 characters, you have a 16-byte key (and therefore 128-bit encryption).

#3 is a base64 encoded representation of #1.  The tell-tale signs of Base64 are:  It often ends in “=” or “==”, its length is a multiple of 4 characters, and it uses characters not valid in a hex string.  A base64 string will be about 1/3rd longer than the binary bytes it represents.  Thus it is longer than our ascii representation, but shorter than the hex representation.  A 16-byte (128-bit) key encodes to 24 base64 characters, a 24-byte (192-bit) key to 32 characters, and a 32-byte (256-bit) key to 44 characters.
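
A quick way to convince yourself that the three strings are the same key is to decode each one and compare the bytes. The sketch below uses only Python’s standard library; the variable names are just for illustration.

```python
import base64
import binascii

as_ascii = "zxcv1234abcdQWER"
as_hex = "7A786376313233346162636451574552"
as_base64 = "enhjdjEyMzRhYmNkUVdFUg=="

key_from_ascii = as_ascii.encode("ascii")       # bytes are the ascii values of the chars
key_from_hex = binascii.unhexlify(as_hex)       # 2 hex digits per key byte
key_from_base64 = base64.b64decode(as_base64)   # 4 base64 chars per 3 key bytes

# All three decode to the same 16 bytes.
assert key_from_ascii == key_from_hex == key_from_base64

print(len(key_from_ascii) * 8)  # key length in bits: 128
```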

So… once you understand the key, you can set the KeyLength and secret key:

cryptObject.KeyLength = 128;

// If the key is represented as an ascii string:
cryptObject.SetEncodedKey(keyStr, "ascii");

// If the key is represented as a hexadecimal string:
cryptObject.SetEncodedKey(keyStr, "hex");

// If the key is represented as a base64 string:
cryptObject.SetEncodedKey(keyStr, "base64");

OK, the KeyLength and the secret key are specified. What’s left?
You need to know the following:

  • CBC or ECB mode?
  • If CBC mode, what is the initialization vector (IV)?
  • Padding scheme?
  • Format of your encrypted data?

CBC mode (which stands for cipher block chaining) is the more likely choice.  If so, you need an initialization vector.  For AES this will always be 16 bytes long, regardless of the key length.  If no IV is provided, it is probably assumed to be all NULL bytes, which is the default with the Chilkat component.

If you have the IV, then examine it just like you did for the key, and call SetEncodedIV just like you called SetEncodedKey, passing the correct encoding (“ascii”, “hex”, or “base64”) for the 2nd argument.

If ECB mode is used, then set the CipherMode property to “ecb”:

cryptObject.CipherMode = "ecb";

The PaddingScheme property may be initially left at the default value (which is the most commonly used).  My suggestion is to test with an amount of data that is more than 16 bytes.  The reason is that if everything is correct *except* the PaddingScheme, then your decrypted output will be correct except for the very last 16 bytes.  Once you know that all is correct except for the padding scheme, you can test with different PaddingScheme values.  If you only have a very short amount of data for testing, then it’s not possible to make this distinction.

Finally, look at the encrypted data itself.  Is it hex or base64?  If it is a “string” it must be one or the other.  You’ll want to set the EncodingMode property equal to the encoding of the encrypted data:

cryptObject.EncodingMode = "hex";

Assuming the decrypted result is a string, you’ll call DecryptStringENC.  The “ENC” in the function name indicates that the input is an encoded string and that the encoding is specified by the EncodingMode property.  It returns a string — your decrypted data.

string decryptedStr = cryptObject.DecryptStringENC(encryptedStr);

Encryption Progress Monitoring

Question:

Is there a way to encrypt a file with progress monitoring?  Huge files can take a while and it seems like the app is hanging.

Answer:

Yes, here is the sample VB6 code:

Public WithEvents myCrypt As ChilkatCrypt2

' ....

Private Sub myCrypt_PercentDone(ByVal pctDone As Long)

    ProgressBar1.Value = pctDone
    
End Sub

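Private Sub Command2_Click()

    Dim success As Long

    ' Assigning to the WithEvents variable connects the PercentDone handler above.
    Set myCrypt = New ChilkatCrypt2
    
    success = myCrypt.UnlockComponent("test")
    
    ' ...

    ' PercentDone is raised as the file encryption progresses.
    success = myCrypt.CkEncryptFile("c:/temp/big.txt", "c:/temp/bigEncrypted.dat")
        
End Sub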

Matching MySQL’s AES_ENCRYPT Functions

The following example programs demonstrate how to match MySQL’s AES_ENCRYPT function in different programming languages:

ASP: Match MySQL AES_ENCRYPT Function

SQL Server: Match MySQL AES_ENCRYPT Function

C#: Match MySQL AES_ENCRYPT Function

C++: Match MySQL AES_ENCRYPT Function

MFC: Match MySQL AES_ENCRYPT Function

C: Match MySQL AES_ENCRYPT Function

Delphi: Match MySQL AES_ENCRYPT Function

Visual FoxPro: Match MySQL AES_ENCRYPT Function

Java: Match MySQL AES_ENCRYPT Function

Perl: Match MySQL AES_ENCRYPT Function

PHP: Match MySQL AES_ENCRYPT Function

Python: Match MySQL AES_ENCRYPT Function

Ruby: Match MySQL AES_ENCRYPT Function

VB.NET: Match MySQL AES_ENCRYPT Function

Visual Basic: Match MySQL AES_ENCRYPT Function

VBScript: Match MySQL AES_ENCRYPT Function

Block Encryption Algorithms and Encoding

Question:
In our application we have the need to encrypt a 19 character simple string value.
When running through the sample test program I found that the resulting
encrypted string varied in size from at least 44 to 64 characters depending
on the encoding type parameter.

Is there a way to generate an encrypted value that will generate a string of
the same length as the input string? If that was the case we could just
replace the clear text with the cipher text.

If not, will the resulting encrypted value always be the same length so that
we can expand the database to house a new string value of constant size?

Answer:

All block encryption algorithms, such as AES, Triple-DES, Blowfish, etc. will produce encrypted output that is a multiple of the algorithm’s block size. For AES, the block size is 16 bytes. For Triple-DES and Blowfish, the block-size is 8 bytes. Regardless of the key length (128-bit, 256-bit, etc.) the block size is constant for the algorithm.

Therefore, if you encrypt 19 bytes using AES encryption, the result will be 32 bytes.

All encryption algorithms produce output that resembles random binary data. In other words, all byte values (0x00 – 0xFF) are equally likely in the output. If you want the encrypted output in printable string format, it must be encoded using an encoding such as Base64 or hexadecimal. Base64 is the more compact of the two: it uses 4 printable characters for every 3 binary bytes, whereas hex uses 2 characters for every byte. If the amount of binary data is not evenly divisible by 3, the final group is filled out with one or two “=” characters (these are the “=” characters you typically see at the end of Base64 encoded output).

Therefore, 19 bytes encrypted using AES results in 32 binary bytes. The 32 bytes represented as a printable string in Base64 is 44 characters. In summary: Encrypting 19 bytes with AES will always result in a 44-character Base64 string.
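
The arithmetic can be checked with a few lines of Python. The helper names are made up for this sketch, and padded_length assumes a padding scheme that always adds between 1 and block-size bytes.

```python
def padded_length(n: int, block_size: int = 16) -> int:
    """Ciphertext size when padding always adds 1..block_size bytes."""
    return (n // block_size + 1) * block_size


def base64_length(n: int) -> int:
    """Base64 emits 4 characters per 3 input bytes, rounding up to a full group."""
    return 4 * ((n + 2) // 3)


def hex_length(n: int) -> int:
    """Hex emits 2 characters per input byte."""
    return 2 * n


encrypted = padded_length(19)          # 32 bytes for AES (16-byte blocks)
print(encrypted, base64_length(encrypted), hex_length(encrypted))  # 32 44 64
```

The 44 and 64 match the range of output sizes mentioned in the question: 44 characters for base64 and 64 for hex.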

You may wonder: how is it possible to know the length of the original data when decrypting? The PaddingScheme property defaults to a value of 1, which specifies RFC 1423 padding, also known as PKCS7 padding. In this scheme, the last block of data is padded with bytes having a value equal to the number of padding bytes. If the original data is already a multiple of the block size, then a full extra block of padding is added. Decrypting software would examine the last byte of the decrypted output and discard this many bytes to arrive at the exact original data (in both content and length).
(This is exactly what the Chilkat decryption methods do when the PaddingScheme = 1.)
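
For illustration, the padding rule described above can be written out in Python as follows. This is a generic sketch of PKCS7-style padding, not Chilkat’s own code, and the function names are hypothetical.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append n bytes, each with value n, so the length is a multiple of block_size."""
    n = block_size - (len(data) % block_size)   # 1..block_size, never 0
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Read the last byte and discard that many padding bytes."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("data is not a whole number of blocks")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]
```

With 19 bytes of input, pkcs7_pad adds 13 bytes of value 13 to reach 32 bytes. With exactly 16 bytes of input it adds a full block of 16 bytes of value 16.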

Encrypting Chinese Characters

Question:
Why is the return value blank when encrypting Chinese characters?
Here’s a snippet of my code:

  crypt.KeyLength := 256;
  crypt.SecretKey := Password;
  crypt.CryptAlgorithm := 'aes';
  crypt.EncodingMode := 'base64';
  OutPutStr := crypt.EncryptStringENC(StringToEncrypt);

Answer:

Strings in some programming languages such as Visual Basic, C#, VB.NET, Delphi, FoxPro, etc. should be thought of as objects.  The object contains a string (i.e. a sequence of characters that renders to a sequence of glyphs).  The representation of the string within the object is private; the application shouldn’t care.  For these languages it happens to be Unicode (the 2-byte per char encoding), so the string object is capable of containing characters in any spoken language.  (Of course, just because the string may contain characters in any spoken language doesn’t mean glyphs of any language are renderable.  This is a big problem in older programming languages such as VB6, Delphi, etc., where the visual controls are not capable of mixing glyphs of any language, i.e. they are not Unicode-capable controls even though the string data type (i.e. object) holds characters represented internally in Unicode.)

OK, back to the main point…

The representation of the string (i.e. the encoding used to represent each character as a sequence of 1 or more bytes) within the string object is private, and normally the application shouldn’t care.   With encryption, however, it matters greatly.  Encryption algorithms operate on bytes.  (The same goes for hash algorithms.)   Therefore, when you encrypt Chinese characters, did you intend to encrypt 2-byte per char Unicode?  Did you intend to encrypt the utf-8 representation of the characters?  What about the “big5” or “gb2312” character encoding representations?  All would provide different results (of course).

The Crypt.Charset property controls the charset (character encoding) used for encrypting strings.  The string passed to EncryptString* is first converted (internally) to a byte array using the specified character encoding, and then this byte array is encrypted.  The default value for Crypt.Charset is “ANSI”.  In most cases, this is what you expect — you’re expecting a typical European accented character to be represented as a single byte in the default charset of the computer.  This doesn’t work with Chinese (or other Asian languages), or any language that doesn’t match the locale of the computer.  The internal conversion from Unicode to ANSI is dropping the characters where there is no 1-byte/char representation.

The solution:  Set Crypt.Charset equal to the encoding desired.  For Chinese it would be one of the following:  “utf-8”, “Unicode”, “big5”, “gb2312”.
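
To see why the charset matters, here is a small Python sketch that converts the same two Chinese characters to bytes in each of the encodings listed. It uses Python’s own codecs, not Chilkat. Two assumptions: "utf-16-le" stands in for what the post calls “Unicode”, and "cp1252" stands in for ANSI on a Western-locale Windows machine.

```python
def byte_representations(text: str, charsets) -> dict:
    """Map each charset name to the bytes that would be handed to the cipher."""
    return {charset: text.encode(charset) for charset in charsets}


text = "中文"
reps = byte_representations(text, ("utf-8", "utf-16-le", "big5", "gb2312"))
for charset, data in reps.items():
    print(charset, len(data), data.hex())

# A Western single-byte charset has no representation for these characters.
# errors="ignore" mimics a conversion that silently drops them.
ansi_bytes = text.encode("cp1252", errors="ignore")
print(len(ansi_bytes))  # 0: nothing is left to encrypt
```

Each charset yields a different byte sequence, so each would encrypt to a different result, and the empty cp1252 result corresponds to the blank output described in the question.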