Block Encryption Algorithms and Encoding
Question:
In our application we have the need to encrypt a 19 character simple string value.
When running through the sample test program I found that the resulting
encrypted string varied in size from at least 44 to 64 characters depending
on the encoding type parameter.
Is there a way to generate an encrypted value that will generate a string of
the same length as the input string? If that was the case we could just
replace the clear text with the cipher text.
If not, will the resulting encrypted value always be the same length so that
we can expand the database to house a new string value of constant size?
Answer:
Block encryption algorithms such as AES, Triple-DES, and Blowfish, when used in a padded block mode such as ECB or CBC, produce encrypted output whose length is a multiple of the algorithm’s block size. For AES, the block size is 16 bytes. For Triple-DES and Blowfish, the block size is 8 bytes. The block size is fixed for each algorithm, regardless of the key length (128-bit, 256-bit, etc.).
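Therefore, if you encrypt 19 bytes using AES, the data is extended to the next multiple of 16 and the result will be 32 bytes. The cipher text cannot be the same length as the 19-byte input, so it cannot simply replace the clear text in place.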
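The arithmetic can be sketched in a few lines of Python. This is only an illustration: `padded_length` is a hypothetical helper, not part of any library, and it assumes a padding scheme that always adds at least one byte, such as the one described later in this answer.

```python
def padded_length(plaintext_len: int, block_size: int) -> int:
    """Ciphertext length for a block cipher with always-present padding."""
    # Padding adds between 1 and block_size bytes, so an input that is
    # already a multiple of the block size grows by one full block.
    return (plaintext_len // block_size + 1) * block_size


print(padded_length(19, 16))  # AES: 32
print(padded_length(19, 8))   # Triple-DES / Blowfish: 24
print(padded_length(16, 16))  # exact multiple of the block size: 32
```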
All encryption algorithms produce output that resembles random binary data. In other words, all byte values (0x00 – 0xFF) are equally likely in the output. If you want the encrypted output in printable string format, it must therefore be encoded using an encoding such as Base64 or hexadecimal. Base64 is the more compact of the two: it uses 4 printable characters for every 3 bytes of binary data, whereas hexadecimal uses 2 characters for every byte. If the amount of binary data is not evenly divisible by 3, the final group of 4 characters is filled out with one or two “=” padding characters (these are the “=” characters you typically see at the end of Base64 encoded output).
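Therefore, 19 bytes encrypted using AES results in 32 binary bytes. The 32 bytes represented as a printable string in Base64 is 44 characters (11 groups of 4 characters). The same 32 bytes in hexadecimal is 64 characters, which matches the 44 to 64 range you saw when changing the encoding type. In summary: encrypting 19 bytes with AES will always result in a 44-character Base64 string, so the database column can be expanded to that constant size.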
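These lengths are easy to check with Python's standard library. The sketch below uses 32 zero bytes as a stand-in for real AES output, since only the length matters here; `base64_length` is a hypothetical helper added for illustration.

```python
import base64
import binascii

# Stand-in for 32 bytes of AES output; the content does not affect the length.
ciphertext = bytes(32)

b64_text = base64.b64encode(ciphertext).decode("ascii")
hex_text = binascii.hexlify(ciphertext).decode("ascii")


def base64_length(byte_count: int) -> int:
    """Length of padded Base64 output: 4 characters per group of up to 3 bytes."""
    return 4 * ((byte_count + 2) // 3)


print(len(b64_text))      # 44
print(len(hex_text))      # 64
print(base64_length(32))  # 44
```

The Base64 result ends in a single “=” because 32 bytes leaves 2 bytes over after ten full groups of 3.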
You may wonder: how is it possible to know the length of the original data when decrypting? The PaddingScheme property defaults to a value of 0, which specifies RFC 1423 padding, also known as PKCS7 padding. In this scheme, the last block of data is padded with bytes that each have a value equal to the number of padding bytes. If the original data is already a multiple of the block size, then a full extra block of padding is added. Decrypting software examines the last byte of the decrypted output and discards that many bytes to arrive at the exact original data (in both content and length).
(This is exactly what the Chilkat decryption methods do when the PaddingScheme = 0.)
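The padding and unpadding steps described above can be sketched as follows. This is a minimal illustration of the scheme in plain Python, not Chilkat's implementation; the function names `pkcs7_pad` and `pkcs7_unpad` and the sample 19-byte value are assumptions made for the example.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes, each with value N, to reach a multiple of block_size."""
    # N is in 1..block_size, so an exact multiple gains a full extra block.
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Read the last byte and strip that many padding bytes."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


plaintext = b"1234567890123456789"  # 19 bytes, hypothetical sample value
padded = pkcs7_pad(plaintext)

print(len(padded))                       # 32
print(padded[-1])                        # 13 padding bytes were added
print(pkcs7_unpad(padded) == plaintext)  # True
```

In a real system the padding is applied before encryption and removed after decryption, normally by the encryption library itself.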