Substitution cipher - Simple substitution
Substitution over a single letter, known as simple substitution, can be demonstrated by writing out the alphabet in some order to represent the substitution. This is termed a substitution alphabet. The cipher alphabet may be shifted or reversed (creating the Caesar and Atbash ciphers, respectively) or scrambled in a more complex fashion, in which case it is called a mixed alphabet or deranged alphabet. Traditionally, mixed alphabets are created by first writing out a keyword, removing any repeated letters from it, and then writing all the remaining letters of the alphabet in their usual order.
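The three ways of producing a cipher alphabet described above (shifting, reversing, and keyword mixing) can be sketched in a few lines. This is a minimal illustration assuming an uppercase A-Z alphabet; the function names are hypothetical and chosen only for this example.

```python
import string

ALPHABET = string.ascii_uppercase  # assumes the 26-letter A-Z alphabet


def caesar_alphabet(shift: int) -> str:
    """Plain alphabet rotated left by `shift` places (Caesar cipher)."""
    shift %= 26
    return ALPHABET[shift:] + ALPHABET[:shift]


def atbash_alphabet() -> str:
    """Plain alphabet written in reverse (Atbash cipher)."""
    return ALPHABET[::-1]


def keyword_alphabet(keyword: str) -> str:
    """Mixed alphabet: keyword with repeats removed, then the unused letters in order."""
    seen: list[str] = []
    for ch in keyword.upper() + ALPHABET:
        # Skip non-letters and any letter already placed.
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)
```

For example, `caesar_alphabet(3)` begins `DEFG...`, and `keyword_alphabet("zebras")` yields `ZEBRASCDFGHIJKLMNOPQTUVWXY`.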
Substitution cipher - Examples
Using this system, the keyword "zebras" gives us the following alphabets:
Plaintext alphabet:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Ciphertext alphabet: ZEBRASCDFGHIJKLMNOPQTUVWXY
A message of
flee at once. we are discovered!
enciphers to
SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!
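The encipherment above amounts to a letter-for-letter table lookup. A minimal sketch using Python's built-in `str.translate`, assuming the "zebras" cipher alphabet and the convention used here of lowercase plaintext and uppercase ciphertext (the names `encipher` and `decipher` are illustrative):

```python
import string

PLAIN = string.ascii_uppercase
CIPHER = "ZEBRASCDFGHIJKLMNOPQTUVWXY"  # mixed alphabet from the keyword "zebras"

# Map both cases of each plaintext letter to its uppercase ciphertext letter;
# spaces and punctuation are not in the table, so they pass through unchanged.
ENCIPHER = str.maketrans(PLAIN + PLAIN.lower(), CIPHER + CIPHER)
# The inverse table maps ciphertext letters back to lowercase plaintext.
DECIPHER = str.maketrans(CIPHER, PLAIN.lower())


def encipher(plaintext: str) -> str:
    return plaintext.translate(ENCIPHER)


def decipher(ciphertext: str) -> str:
    return ciphertext.translate(DECIPHER)
```

Running `encipher("flee at once. we are discovered!")` reproduces `SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!`, and `decipher` reverses it.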
Traditionally, the ciphertext is written out in blocks of fixed length, omitting punctuation and spaces; this is done to help avoid transmission errors and to disguise word boundaries from the plaintext. These blocks are called "groups", and sometimes a "group count" (i.e., the number of groups) is given as an additional check. Five-letter groups are traditional, dating from when messages used to be transmitted by telegraph:
SIAAZ QLKBA VAZOA RFPBL UAOAR
If the length of the message happens not to be divisible by five, it may be padded at the end with "nulls". These can be any characters that decrypt to obvious nonsense, so the receiver can easily spot them and discard them.
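Grouping, padding with nulls, and computing the group count can be combined into one small helper. This is a sketch under stated assumptions: the helper name `to_groups` is hypothetical, and the null letter `X` is an arbitrary single-letter choice (with the "zebras" alphabet it deciphers to a trailing run of "y", which the receiver would discard as nonsense).

```python
def to_groups(ciphertext: str, size: int = 5, null: str = "X") -> tuple[str, int]:
    """Return the ciphertext in fixed-size groups, plus the group count.

    Spaces and punctuation are dropped, and the final group is padded with
    the (assumed single-letter) `null` character if it would be short.
    """
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    shortfall = -len(letters) % size  # nulls needed to fill the last group
    letters.extend(null * shortfall)
    groups = ["".join(letters[i:i + size]) for i in range(0, len(letters), size)]
    return " ".join(groups), len(groups)
```

Applied to the example ciphertext, this returns `("SIAAZ QLKBA VAZOA RFPBL UAOAR", 5)`; the 25 letters divide evenly, so no nulls are added.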
The ciphertext alphabet is sometimes different from the plaintext alphabet; for example, in the pigpen cipher, the ciphertext consists of a set of symbols derived from a grid.
Such features make little difference to the security of a scheme, however: at the very least, any set of unusual symbols can be transcribed back into an A-Z alphabet and attacked as normal.
Substitution cipher - Security for simple substitution ciphers
A disadvantage of the keyword method of derangement is that the last letters of the alphabet (which are mostly low-frequency) tend to stay at the end. A stronger way of constructing a mixed alphabet is to perform a columnar transposition on the ordinary alphabet using the keyword, but this is not often done.
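Columnar transposition has several variants, and the text does not specify one. The sketch below assumes a common form: the ordinary alphabet is written in rows as wide as the keyword (repeated letters removed), and the columns are read off in alphabetical order of the keyword letter heading each one. The function name `transposed_alphabet` is hypothetical.

```python
import string


def transposed_alphabet(keyword: str) -> str:
    """Mixed alphabet from a columnar transposition of A-Z under `keyword`."""
    # Deduplicate the keyword while keeping first-occurrence order.
    key = "".join(dict.fromkeys(c for c in keyword.upper() if c in string.ascii_uppercase))
    width = len(key)
    # Column i of the grid holds every width-th letter starting at position i.
    columns = [string.ascii_uppercase[i::width] for i in range(width)]
    # Read the columns in alphabetical order of their keyword letters.
    order = sorted(range(width), key=lambda i: key[i])
    return "".join(columns[i] for i in order)
```

With the keyword "zebras" this gives `EKQWCIOUBHNTZDJPVFLRXAGMSY`: letters such as W, X and Z are spread through the result instead of remaining in order at its end, as they do in the keyword alphabet `ZEBRAS...TUVWXY`.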
Although the number of possible keys is very large (26! ≈ 2^88.4, or about 88 bits), this cipher is not very strong, being easily broken. Provided the message is of reasonable length (see below), the cryptanalyst can deduce the probable meaning of the most common symbols by analysing the frequency distribution of the ciphertext; this technique is called frequency analysis. This allows the formation of partial words, which can be tentatively filled in, progressively expanding the (partial) solution (see frequency analysis for a demonstration of this). In some cases, underlying words can also be determined from the pattern of their letters; for example, attract, osseous, and words with those two as the root are the only common English words with the pattern ABBCADB. Many people solve such ciphers for recreation, as with cryptogram puzzles in the newspaper.
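The three ideas in the paragraph above (the size of the key space, the ciphertext frequency distribution, and word letter patterns) can each be checked with a short helper. This is a minimal sketch using only the standard library; the helper names are hypothetical.

```python
import math
import string
from collections import Counter


def key_space() -> tuple[int, float]:
    """Number of possible substitution alphabets (26!) and its size in bits."""
    n = math.factorial(26)
    return n, math.log2(n)


def letter_frequencies(ciphertext: str) -> list[tuple[str, int]]:
    """Ciphertext letters ordered from most to least frequent."""
    letters = (c for c in ciphertext.upper() if c in string.ascii_uppercase)
    return Counter(letters).most_common()


def letter_pattern(word: str) -> str:
    """Relabel letters in order of first appearance, e.g. 'attract' -> 'ABBCADB'."""
    labels: dict[str, str] = {}
    for ch in word.upper():
        if ch not in labels:
            labels[ch] = string.ascii_uppercase[len(labels)]
    return "".join(labels[ch] for ch in word.upper())
```

On the example ciphertext `SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!`, `letter_frequencies` ranks `A` first with 7 occurrences, and under the "zebras" key `A` does stand for the common letter e. Because simple substitution preserves repeated letters, `letter_pattern` gives `ABBCADB` both for "attract" and for its encipherment `ZQQOZBQ`.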
According to the unicity distance of English, about 27.6 letters of ciphertext are needed before a mixed-alphabet simple substitution has, in theory, a unique solution. In practice, typically about 50 letters are needed, although some messages can be broken with fewer if unusual patterns are found. In other cases, the plaintext can be contrived to have a nearly flat frequency distribution, and much longer plaintexts will then be required.
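The 27.6-letter figure can be reproduced from the usual unicity-distance formula U = H(K)/D, where H(K) is the key entropy in bits and D is the redundancy of English in bits per letter. The value D ≈ 3.2 bits per letter is a commonly quoted estimate and is taken here as an assumption:

```latex
U = \frac{H(K)}{D} = \frac{\log_2 26!}{D}
  \approx \frac{88.4\ \text{bits}}{3.2\ \text{bits/letter}}
  \approx 27.6\ \text{letters}
```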
Adapted from the Wikipedia article "Simple substitution", under the GNU Free Documentation License. Please also see http://en.wikipedia.org/wiki