Classical ciphers are used to encrypt plaintext messages written in a natural language in such a way that they are readable for sender or intended recipient only. Many classical ciphers can be broken by brute-force search through the key-space. One of the pertinent problems arising in automated cryptanalysis is the plaintext recognition. A computer should be able to decide which of many possible decrypts are meaningful. This can be accomplished by means of a text scoring function, based, e.g. on n-grams or other text statistics. A scoring function can also be used in conjunction with AI methods to speedup cryptanalysis.
In the process of automated cryptanalysis we decrypt the ciphertext with many possible keys to obtain candidate plaintexts. Most of the candidates are incorrect, having no meaning in a natural language. On the other hand, even the correct plaintext can be hard to recognize and with the wrong recognition routine can be missed altogether.
The basic type of algorithm suitable for automated cryptanalysis is a brute force attack. This attack is only feasible when key-space is searchable on computational resources available to an attacker. The average time needed to verify a candidate strongly influences the size of searchable key-space. Thus, the plaintext recognition is the most critical part of the algorithm from the performance point of view. On the other hand, only the most complex algorithms achieve really high accuracy of the plaintext recognition. Thus the complexity and accuracy of plaintext recognition algorithms must be carefully balanced.
A generic brute force algorithm with plaintext recognition can be described by the pseudo-code in Figure 1.
Algorithm integrates the three layers of plaintext recognition, namely negative test predicate, fast scoring function and precise scoring function, as a three-layer filter. The final scoring function is also used to sort the outputs. First filter should be very fast, with very low error probability. Fast score should be easy to compute, but it is not required to precisely identify the correct plaintext. Correct plaintext recognition is the role of precise scoring function. In the algorithm, the best score is the highest one. If the score is computed in the opposite meaning, the algorithm must be rewritten accordingly.
Key Terms in this Chapter
Candidate Text: The text that was obtained by application of decryption algorithm on ciphertext using some key k ? K. If k is the correct key (or the equivalent key to) K, then candidate text is a valid plaintext x, otherwise it is a text encrypted by concatenation of dk (eK(x)).
Ciphertext: The encrypted text, a string of letters from alphabet C of a given cryptosystem by a given key K ? K..
Brute-Force Attack: Exhaustive cryptanalytic technique that searches the whole key-space to find the correct key.
Plaintext: The unencrypted text, a string of letters from alphabet P of a given cryptosystem.
Key-Space: Set of all possible keys for a given ciphertext. Key-space can be limited to a subspace of the whole K by some prior knowledge.
Cryptanalysis: Is a process of trying to decrypt given ciphertext and/or find the key without, or with only partial knowledge of the key. It is also a research area studying techniques of cryptanalysis.
Classical Cipher: A classical cipher system is a five-tuple (P,C,K,E,D), where P, C, define plaintext and ciphertext alphabet, K is the set of possible keys, and for each K ? K, there exists an encryption algorithm eK ? E, and a corresponding decryption algorithm dK ? D such that dK (eK(x)) = x for every input x?P and K ? K..
Scoring Function: Scoring function is used to evaluate fitness of a candidate text for a key k ? K.. Ideal scoring function has global extreme in the correct plaintext, i.e. when k = K.
Plaintexts Filter: An algorithm, or predicate, used to determine, which texts are not valid plaintexts. Ideal plaintexts filter never produces answer INVALID for a correct plaintext.