Substitution Cipher
The resources on this page were originally created by Dr. Aaron Bradley of Summit Middle School. I've done some reformatting to add clarity but that is about it. Enjoy!

?oPPxruSoBoa
This message may not make sense to you until you learn that it's enciphered according to the following substitutions:
H > ?
e > o
l > P
o > x
' ' > r (that is, a space becomes an 'r')
t > u
h > S
r > B
! > a
Applying the substitutions in reverse reveals the message.
This exploration builds a Python module to encipher plaintext (that is, the human readable message) and decipher ciphertext (that is, the unreadable form of the message) using a substitution cipher, like the one above. For example, if your friend sends you a message like the following,
42
QMNlFSMNoOsN;tNSOratOaAlONSySbO AzSOAO
MlFApSOaAlONSySbOFbMSzOAN;FaMNmONSrR
you will be able to fire up your Python interpreter and do the following:
>>> import cipher
>>> cipher.build(42)
>>> cipher.decipher('QMNlFSMNoOsN;tNSOratOaAlONSySbO AzSOAO MlFApSOaAlONSySbOFbMSzOAN;FaMNmONSrR'
During the process of building this module, you will learn
This message may not make sense to you until you learn that it's enciphered according to the following substitutions:
H > ?
e > o
l > P
o > x
' ' > r (that is, a space becomes an 'r')
t > u
h > S
r > B
! > a
Applying the substitutions in reverse reveals the message.
This exploration builds a Python module to encipher plaintext (that is, the human readable message) and decipher ciphertext (that is, the unreadable form of the message) using a substitution cipher, like the one above. For example, if your friend sends you a message like the following,
42
QMNlFSMNoOsN;tNSOratOaAlONSySbO AzSOAO
MlFApSOaAlONSySbOFbMSzOAN;FaMNmONSrR
you will be able to fire up your Python interpreter and do the following:
>>> import cipher
>>> cipher.build(42)
>>> cipher.decipher('QMNlFSMNoOsN;tNSOratOaAlONSySbO AzSOAO MlFApSOaAlONSySbOFbMSzOAN;FaMNmONSrR'
During the process of building this module, you will learn
 more about lists and strings,
 about maps (not the geographic kind),
 about using random number generation,
 and about structuring a module.
Copying and Modifying Lists
1. In previous lessons, you used lists to some extent. For example,
>>> x = [1, 2, 3]
>>> x[0]
1
>>> x[2]
3
>>> len(x)
3
Suppose that we assign another variable to refer to the same list:
>>> y = x
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]
2. What if we change an entry of one of them?
>>> x[0] = 42
>>> x
[42, 2, 3]
>>> y
[42, 2, 3]
What happened?!
In this case, variables x and y refer to the very same list. It's as if you and a friend shared one piece of paper with a list on it. When your friend erases an entry and replaces it with something else (e.g., preferring "pickles" to "cucumbers"), your list (the very same one) changes, too.
3. To copy a list requires making it clear that you really want a copy. Recall how to obtain sublists:
>>> x
[42, 2, 3]
>>> x[0:2]
[42, 2]
>>> x[1:3]
[2, 3]
4. Sublists are copies of part of the original list. To copy an entire list, do the following:
>>> y = x[0:len(x)]
Now there are two lists:
>>> x
[42, 2, 3]
>>> y
[42, 2, 3]
>>> x[1] = 1
>>> x
[42, 1, 3]
>>> y
[42, 2, 3]
>>> y[1] = 2012
>>> x
[42, 1, 3]
>>> y
[42, 2012, 3]
A shorthand for y = x[0:len(x)] is just y = x[:].
1. In previous lessons, you used lists to some extent. For example,
>>> x = [1, 2, 3]
>>> x[0]
1
>>> x[2]
3
>>> len(x)
3
Suppose that we assign another variable to refer to the same list:
>>> y = x
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]
2. What if we change an entry of one of them?
>>> x[0] = 42
>>> x
[42, 2, 3]
>>> y
[42, 2, 3]
What happened?!
In this case, variables x and y refer to the very same list. It's as if you and a friend shared one piece of paper with a list on it. When your friend erases an entry and replaces it with something else (e.g., preferring "pickles" to "cucumbers"), your list (the very same one) changes, too.
3. To copy a list requires making it clear that you really want a copy. Recall how to obtain sublists:
>>> x
[42, 2, 3]
>>> x[0:2]
[42, 2]
>>> x[1:3]
[2, 3]
4. Sublists are copies of part of the original list. To copy an entire list, do the following:
>>> y = x[0:len(x)]
Now there are two lists:
>>> x
[42, 2, 3]
>>> y
[42, 2, 3]
>>> x[1] = 1
>>> x
[42, 1, 3]
>>> y
[42, 2, 3]
>>> y[1] = 2012
>>> x
[42, 1, 3]
>>> y
[42, 2012, 3]
A shorthand for y = x[0:len(x)] is just y = x[:].
Lists of Strings
1. To obtain the "characters" (that is, individual letters or punctuation) of a string, do the following:
>>> list("Hello Summit!")
['H', 'e', 'l', 'l', 'o', ' ', 'S', 'u', 'm', 'm', 'i', 't', '!']
We'll use this technique shortly.
1. To obtain the "characters" (that is, individual letters or punctuation) of a string, do the following:
>>> list("Hello Summit!")
['H', 'e', 'l', 'l', 'o', ' ', 'S', 'u', 'm', 'm', 'i', 't', '!']
We'll use this technique shortly.
Maps
1. A map is a data structure that maps values of one type to values of another. For example,
>>> age = {}
>>> age["Brian"] = 12
>>> age["Amy"] = 13
>>> age["Ronald"] = 12
>>> age["Ronald"]
12
>>> age["Brian"]
12
>>> age
{'Amy': 13, 'Brian': 12, 'Ronald': 12}
2. In Python, maps are called "dictionaries," because they behave like dictionaries:
>>> animals = {}
>>> animals["duck"] = "A quacker."
>>> animals["cat"] = "A meower."
>>> animals["duck"]
'A quacker.'
3. For our application, maps provide a great way of building a table of substitutions. For example, to build (the beginning) of the substitution table in the introduction to this exploration, we could do the following:
>>> forward = {}
>>> forward["H"] = '?'
>>> forward["e"] = 'o'
>>> forward["l"] = 'P'
and so on.
4. To decode, we'd want a backward map as well:
>>> backward = {}
>>> backward["?"] = 'H'
>>> backward["o"] = 'e'
>>> backward["P"] = 'l'
and so on.
5. Then a slow way of deciphering '?oPPxruSoBoa' is to go symbolbysymbol:
>>> backward["?"]
'H'
>>> backward["o"]
'e'
and so on.
1. A map is a data structure that maps values of one type to values of another. For example,
>>> age = {}
>>> age["Brian"] = 12
>>> age["Amy"] = 13
>>> age["Ronald"] = 12
>>> age["Ronald"]
12
>>> age["Brian"]
12
>>> age
{'Amy': 13, 'Brian': 12, 'Ronald': 12}
2. In Python, maps are called "dictionaries," because they behave like dictionaries:
>>> animals = {}
>>> animals["duck"] = "A quacker."
>>> animals["cat"] = "A meower."
>>> animals["duck"]
'A quacker.'
3. For our application, maps provide a great way of building a table of substitutions. For example, to build (the beginning) of the substitution table in the introduction to this exploration, we could do the following:
>>> forward = {}
>>> forward["H"] = '?'
>>> forward["e"] = 'o'
>>> forward["l"] = 'P'
and so on.
4. To decode, we'd want a backward map as well:
>>> backward = {}
>>> backward["?"] = 'H'
>>> backward["o"] = 'e'
>>> backward["P"] = 'l'
and so on.
5. Then a slow way of deciphering '?oPPxruSoBoa' is to go symbolbysymbol:
>>> backward["?"]
'H'
>>> backward["o"]
'e'
and so on.
______________________________________________________________________________________________________________
Building a Substitution (and deSubstitution) Map with random
Okay, now we want to start keeping everything we do. Let's start a new module called "Cipher DRAFT". This isn't going to be out actual program; it will be our workspace to make sure the pieces work before putting everything together.
Building maps by hand isn't fun, so let's explore a method to instruct the computer to build one. First, let's build a list of letters and common punctuation:
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
Now let's build a second list that is a shuffling of symbols:
import random
shuffled = symbols[:]
random.shuffle(shuffled)
print (shuffled)
You should an output of letters: ['t', 'S', 'M', 'T', 'u', ... You will (probably) get a different outcome than I did for your shuffled list.
The first line imports the random module, which you can read about here, if you wish. The second line should be familiar from this exploration; if not, reread the section on "Copying and Modifying Lists." The third line uses the random module's shuffle function, which shuffles a list, much as you shuffle a deck of cards.
The idea is that the two lists symbols and shuffled represent a substitution cipher. For example, symbols[0]that is, the letter 'a'is replaced with shuffled[0]that is, the letter 'v'in messages. Let's build forward and backward maps to accomplish this substitution (and desubstitution for deciphering):
forward = {}
backward = {}
for i in range(len(symbols)):
forward[symbols[i]] = shuffled[i]
backward[shuffled[i]] = symbols[i]
#now let's check the code:
print ("forward = ")
print (forward)
print ("backward = ")
print (backward)
Hopefully, you get a bunch of letters and not an error message.
Here, forward is applied to each individual symbol of the message "Hello Summit!" The result is a list of the substituted letters, but we'd prefer a string. A python idiom for converting a list of strings into a single string is illustrated by the following:
x = ["H", "i"]
"".join(x)
Let's use this idiom:
print("".join([forward[x] for x in "Hello Summit!"]))
'Rhuuls ?KK,rW'
You will get a different outcome than I did. To decipher this ciphertext, we use the backward map. In my case, I type
print("".join([backward[x] for x in "Rhuuls ?KK,rW"]))
'Hello Summit!'
You should replace "Rhuuls ?KK,rW" with the ciphertext that your substitution produced.
Building a Substitution (and deSubstitution) Map with random
Okay, now we want to start keeping everything we do. Let's start a new module called "Cipher DRAFT". This isn't going to be out actual program; it will be our workspace to make sure the pieces work before putting everything together.
Building maps by hand isn't fun, so let's explore a method to instruct the computer to build one. First, let's build a list of letters and common punctuation:
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
Now let's build a second list that is a shuffling of symbols:
import random
shuffled = symbols[:]
random.shuffle(shuffled)
print (shuffled)
You should an output of letters: ['t', 'S', 'M', 'T', 'u', ... You will (probably) get a different outcome than I did for your shuffled list.
The first line imports the random module, which you can read about here, if you wish. The second line should be familiar from this exploration; if not, reread the section on "Copying and Modifying Lists." The third line uses the random module's shuffle function, which shuffles a list, much as you shuffle a deck of cards.
The idea is that the two lists symbols and shuffled represent a substitution cipher. For example, symbols[0]that is, the letter 'a'is replaced with shuffled[0]that is, the letter 'v'in messages. Let's build forward and backward maps to accomplish this substitution (and desubstitution for deciphering):
forward = {}
backward = {}
for i in range(len(symbols)):
forward[symbols[i]] = shuffled[i]
backward[shuffled[i]] = symbols[i]
#now let's check the code:
print ("forward = ")
print (forward)
print ("backward = ")
print (backward)
Hopefully, you get a bunch of letters and not an error message.
Here, forward is applied to each individual symbol of the message "Hello Summit!" The result is a list of the substituted letters, but we'd prefer a string. A python idiom for converting a list of strings into a single string is illustrated by the following:
x = ["H", "i"]
"".join(x)
Let's use this idiom:
print("".join([forward[x] for x in "Hello Summit!"]))
'Rhuuls ?KK,rW'
You will get a different outcome than I did. To decipher this ciphertext, we use the backward map. In my case, I type
print("".join([backward[x] for x in "Rhuuls ?KK,rW"]))
'Hello Summit!'
You should replace "Rhuuls ?KK,rW" with the ciphertext that your substitution produced.
Keys
Because of the use of shuffling to build our substitution map, it seems that a cipher text produced on one computer by one friend will not be able to be deciphered on another computer by another friend. The substitution that the receiver builds is unlikely to be the same as the one that the sender built.
Knowing the "key" to deciphering a text is a fundamental necessity in ciphers. In our case, we can control the shuffling by controlling the random module:
x = [1, 2, 3]
random.seed(42)
random.shuffle(x)
print (x)
=[2, 1, 3]
I bet that your final list is the same as mine. (Note: On platforms other than Windows 7, you may get a different result. For example, on a Linux installation at home, I get [3, 1, 2].) In fact, if you choose a random number between 0 and 1000 next, I'll bet you get the same thing as me:
Print(random.randint(0, 1000))
=25
True randomness is actually very difficult to achieve with computers, and so "pseudorandom functions" are used instead (which can vary by application or operating system). These functions can be controlled through the "seed" number. (Think of a plant growing from a seed: a random sequence "grows" from a seed.) Here, the seed number 42 put the random module into the same state on your computer as on my computer, so subsequent uses of it result in the same outcomes on our two computers.
The seed is the key that we have to give a friend along with the cipher text. Recall the second message in the introduction to this exploration:
42
QMNlFSMNoOsN;tNSOratOaAlONSySbO AzSOAO MlFApSOaAlONSySbOFbMSzOAN;FaMNmONSrR
The first line of the message is the "key": the number that we will use as the random seed when building the substitution map.
Putting it All Together: The cipher Module
We have now worked through all the concepts that we need to build the cipher module, so create a new Python module and save it as "cipher.py." Let's put the pieces together.
First we need to import the random module:
import random
Let's create our forward and backward maps:
forward = {}
backward = {}
Now let's define a function to build our substitution (and desubstitution) maps. A line of code that begins with a # symbol is a "comment," that is, a human (but not computer) readable message to someone reading the program. It has absolutely no effect on the program itself; rather, it is intended to explain to the reader what the program does. Notice how much easier it is for you to understand the following function with the comments than if the comments were missing:
# Builds the forward/backward substitution maps based on the given key.
def build(key):
# The key in our cipher system is used as the random seed.
random.seed(key)
# Build a list of letters and common punctuation.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
# Copy the list...
shuffled = symbols[:]
# ... and shuffle it.
random.shuffle(shuffled)
# Now build the forward/backward substitution maps.
for i in range(len(symbols)):
# For each symbol, map it to the corresponding symbol in shuffled...
forward[symbols[i]] = shuffled[i]
# ... and back.
backward[shuffled[i]] = symbols[i]
The version without comments does exactly the same thing, but it's harder to read:
def build(key):
random.seed(key)
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
shuffled = symbols[:]
random.shuffle(shuffled)
for i in range(len(symbols)):
forward[symbols[i]] = shuffled[i]
backward[shuffled[i]] = symbols[i]
Although programs are written to be run on computers, they must be readable by people, as large programs are written by teams, not by individuals.
Now we can implement the encipher and decipher functions:
# Enciphers the given plaintext message using the forward substitution map.
def encipher(message):
return "".join([forward[x] for x in message])
# Deciphers the given ciphertext message using the backward substitution map.
def decipher(message):
return "".join([backward[x] for x in message])
The full cipher module, saved as "cipher.py," is thus:
import random
forward = {}
backward = {}
# Builds the forward/backward substitution maps based on the given key.
def build(key):
# The key in our cipher system is used as the random seed.
random.seed(key)
# Build a list of letters and common punctuation.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
# Copy the list...
shuffled = symbols[:]
# ... and shuffle it.
random.shuffle(shuffled)
# Now build the forward/backward substitution maps.
for i in range(len(symbols)):
# For each symbol, map it to the corresponding symbol in shuffled...
forward[symbols[i]] = shuffled[i]
# ... and back.
backward[shuffled[i]] = symbols[i]
# Enciphers the given plaintext message using the forward substitution map.
def encipher(message):
return "".join([forward[x] for x in message])
# Deciphers the given ciphertext message using the backward substitution map.
def decipher(message):
return "".join([backward[x] for x in message])
Now I'll send you a message by using the cipher module at the interactive command line:
>>> import cipher
>>> cipher.build(2012)
>>> msg = cipher.encipher("Is the secret, that programming is fun, out?")
>>> msg
'daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX'
If I were to email the message to you, I might write
2012
daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX
Then you would decipher it (with Portable Python 3.2.1.1 on Windows 7) as follows:
>>> import cipher
>>> cipher.build(2012)
>>> cipher.decipher("daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX")
'Is the secret, that programming is fun, out?'
Of course, other people who have this module will also know to use 2012 as the key and thus be able to decipher the message. Therefore, a safer method is to email only the ciphertext and use a previously agreedupon key on both ends of the communication. Just be sure that nobody finds out what that key is!
We have now worked through all the concepts that we need to build the cipher module, so create a new Python module and save it as "cipher.py." Let's put the pieces together.
First we need to import the random module:
import random
Let's create our forward and backward maps:
forward = {}
backward = {}
Now let's define a function to build our substitution (and desubstitution) maps. A line of code that begins with a # symbol is a "comment," that is, a human (but not computer) readable message to someone reading the program. It has absolutely no effect on the program itself; rather, it is intended to explain to the reader what the program does. Notice how much easier it is for you to understand the following function with the comments than if the comments were missing:
# Builds the forward/backward substitution maps based on the given key.
def build(key):
# The key in our cipher system is used as the random seed.
random.seed(key)
# Build a list of letters and common punctuation.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
# Copy the list...
shuffled = symbols[:]
# ... and shuffle it.
random.shuffle(shuffled)
# Now build the forward/backward substitution maps.
for i in range(len(symbols)):
# For each symbol, map it to the corresponding symbol in shuffled...
forward[symbols[i]] = shuffled[i]
# ... and back.
backward[shuffled[i]] = symbols[i]
The version without comments does exactly the same thing, but it's harder to read:
def build(key):
random.seed(key)
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
shuffled = symbols[:]
random.shuffle(shuffled)
for i in range(len(symbols)):
forward[symbols[i]] = shuffled[i]
backward[shuffled[i]] = symbols[i]
Although programs are written to be run on computers, they must be readable by people, as large programs are written by teams, not by individuals.
Now we can implement the encipher and decipher functions:
# Enciphers the given plaintext message using the forward substitution map.
def encipher(message):
return "".join([forward[x] for x in message])
# Deciphers the given ciphertext message using the backward substitution map.
def decipher(message):
return "".join([backward[x] for x in message])
The full cipher module, saved as "cipher.py," is thus:
import random
forward = {}
backward = {}
# Builds the forward/backward substitution maps based on the given key.
def build(key):
# The key in our cipher system is used as the random seed.
random.seed(key)
# Build a list of letters and common punctuation.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ,.;?!:")
# Copy the list...
shuffled = symbols[:]
# ... and shuffle it.
random.shuffle(shuffled)
# Now build the forward/backward substitution maps.
for i in range(len(symbols)):
# For each symbol, map it to the corresponding symbol in shuffled...
forward[symbols[i]] = shuffled[i]
# ... and back.
backward[shuffled[i]] = symbols[i]
# Enciphers the given plaintext message using the forward substitution map.
def encipher(message):
return "".join([forward[x] for x in message])
# Deciphers the given ciphertext message using the backward substitution map.
def decipher(message):
return "".join([backward[x] for x in message])
Now I'll send you a message by using the cipher module at the interactive command line:
>>> import cipher
>>> cipher.build(2012)
>>> msg = cipher.encipher("Is the secret, that programming is fun, out?")
>>> msg
'daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX'
If I were to email the message to you, I might write
2012
daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX
Then you would decipher it (with Portable Python 3.2.1.1 on Windows 7) as follows:
>>> import cipher
>>> cipher.build(2012)
>>> cipher.decipher("daARpYAaYqNYRlARpBRASNK NBkk?s A?aAr:slAK:RX")
'Is the secret, that programming is fun, out?'
Of course, other people who have this module will also know to use 2012 as the key and thus be able to decipher the message. Therefore, a safer method is to email only the ciphertext and use a previously agreedupon key on both ends of the communication. Just be sure that nobody finds out what that key is!
Extensions
 Read about the Caesar cipher. Read about ASCII (google "ASCII") and Python's chr and ord functions. Play with the % operator, which returns the remainder of a division (e.g., 11 % 3 = 2). Then implement the Caesar cipher by shifting letters using their character codes.
 Explore and implement another cipher.
 Invent your own cipher and implement it.
 Singlesymbol substitution ciphers are easy to break with frequency analysis. Create a module that substitutes two or three (or more) symbols at a time, so that, for example, the 'a' and 'c' of 'ack' and 'ace' become different symbols based on their contexts.
 Write a program that does a frequency analysis of a given ciphertext and use it to break someone's cipher.