In a previous post we discussed the foundations of cryptography in terms of different types of uncertainty stemming from limited information or computation. We saw an example using the Caesar cipher that demonstrated how an adversary may have all the information necessary to reveal a secret if the relationship between the message space, the key space and the length of the ciphertext satisfies certain conditions:

Note how we were able to pick out the correct message because none of the other attempts gave meaningful results. This happens because the space of possible keys is so small that only one of them decrypts to a possible message. In technical terms, the key space and message space[2] are small enough compared to the length of the message that only one key will decrypt.

The situation can be coarsely classified into three cases:

  1. H(K | C) = H(K) – Perfect secrecy
  2. H(K | C) < H(K) – Information-theoretic security
  3. H(K | C) = 0 – Computational security

Information-theoretic security refers to the epistemic uncertainty of the previous entry. We can use a simple model to illustrate how the conditions governing information-theoretic security work, and when these conditions fail, reducing the system to computational security.

Elements of our model

The XOR function

Our toy model needs to specify how messages are encrypted from some plaintext language to ciphertexts, using a key that determines the exact transformation and can be used by the intended recipient to recover the secret. We can use the simplest language possible, a two-character alphabet whose messages are equivalent to binary sequences. Our encryption function will be the XOR function, which takes two binary sequences of equal length and produces a third one of the same length as its output. This choice fixes our key space to also be binary sequences. Here’s an example encryption:

1010011000 XOR 1100111010 = 0110100010

The XOR function takes as input a message (1010011000) and a key (1100111010), and produces a ciphertext (0110100010). Of course, the message in this case means nothing to us, but that makes no difference to the process: we can simply imagine that the 1010011000 above is some meaningful content like “WELL DONE YOU HAVE FOUND THE SECRET”. The important thing to note is that, just like in English, there is a subset of the combinations in the plaintext space (binary sequences) that make meaningful messages, whereas the rest do not. This brings us to the notion of language entropy, which precisely quantifies how big the meaningful subset of messages is relative to the entire plaintext space.
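The example encryption can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `xor_bits` is ours, and bit strings are used instead of integers so that the output keeps its leading zero and has the same length as the message.

```python
def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length bit strings, keeping leading zeros."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


message = "1010011000"
key = "1100111010"

ciphertext = xor_bits(message, key)
print(ciphertext)                 # 0110100010
print(xor_bits(ciphertext, key))  # 1010011000, the original message
```

Applying the same key twice returns the original message, which is why the recipient can decrypt with the very same function.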


The higher the language entropy, the bigger the blue region will be compared to the entire plaintext space. In the case of a binary language the entropy ranges from 0 to 1, since this quantity is measured in bits per character. So far our toy model has these elements:

  • Plaintext space: P = {0, 1}^n
  • Message space: M ⊂ P
  • Key space: K ⊆ {0, 1}^n
  • Ciphertext space: C = {0, 1}^n
  • Encryption function: XOR: P × K → C
  • Language entropy: H_L ∈ [0, 1]
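To get a feel for how the language entropy fixes the size of the message space, here is a small sketch. It assumes that the number of meaningful messages is roughly 2^(n·H) rounded to the nearest integer. That rounding rule is our assumption, chosen because it reproduces the values of M shown in the figures further down (3 for n = 2 and 28 for n = 6, both with H = 0.8). The function name is hypothetical.

```python
def meaningful_message_count(n: int, entropy: float) -> int:
    """Approximate size of the message space M for n binary characters.

    Assumes |M| is about 2^(n * entropy), rounded to the nearest integer.
    """
    return round(2 ** (n * entropy))


print(2 ** 2, meaningful_message_count(2, 0.8))  # 4 plaintexts, 3 messages
print(2 ** 6, meaningful_message_count(6, 0.8))  # 64 plaintexts, 28 messages
```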

The security properties of our system depend on three parameters related to the above:

  • n: the number of characters in the plaintext
  • |K|: the size of the key space
  • H_L: the language entropy
  • R_L = 1 – H_L: the language redundancy

The last entry, redundancy, is just a rewriting of the language entropy. The equation that describes the security in terms of these parameters is:

s_n ≥ |K| / 2^(n·R_L) − 1

This equation gives a lower bound for the expected number of spurious keys, represented by the term s_n. A spurious key, for a given ciphertext, is a key that decrypts said ciphertext into a meaningful message other than the one that was encrypted with the real key. In the example encryption quoted at the beginning of the post, trying all the keys to decrypt the ciphertext yielded only one meaningful plaintext: the ciphertext had no spurious keys. If, on the other hand, one of the keys, s, had decrypted the ciphertext into something like


then that key s would be a spurious key. An attacker trying all possible keys in reverse would get two possible messages and would not know which of them was the real one. The secret would remain somewhat protected. The existence and expected number of spurious keys determine which of the three coarse categories above a cryptosystem belongs to. Looking at the spurious key equation we can see the following trends:

  • s_n increases with the size of the key space, |K|
  • s_n decreases with the size of the plaintext, n
  • s_n decreases with language redundancy, R_L
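These trends can be checked numerically. The sketch below assumes the bound takes its standard form for a binary alphabet, s_n ≥ |K| / 2^(n·R_L) − 1. The function name is ours, and the result is clamped at zero because a negative lower bound on a count carries no information.

```python
def spurious_key_bound(key_count: int, n: int, redundancy: float) -> float:
    """Lower bound on the expected number of spurious keys.

    Assumes the binary-alphabet form |K| / 2^(n * R_L) - 1, clamped at 0.
    """
    return max(0.0, key_count / 2 ** (n * redundancy) - 1)


print(spurious_key_bound(16, 4, 0.5))  # 3.0
print(spurious_key_bound(64, 4, 0.5))  # larger key space: bound goes up
print(spurious_key_bound(16, 8, 0.5))  # longer plaintext: bound goes down
print(spurious_key_bound(16, 4, 0.9))  # more redundancy: bound goes down
```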

A visual representation


Encryption representation with n = 2, H = 0.8, M = 3, K = 1

The left part of the image is a visual representation of our toy model for a given set of parameter values. The vertical axis is the plaintext space and the horizontal axis is the ciphertext space. A point on the plot represents an encryption, in other words, a mapping from the plaintext space to the ciphertext space. We have used n = 2, giving a plaintext space of size 4. Of these four, three are meaningful messages (for a language entropy of 0.8). The 3 red dots on the plot therefore correspond to the encryptions of these 3 meaningful messages. Because K = 1, there is only one ciphertext per plaintext or, visually, only one point on any given horizontal line. Compare with:


n = 2, H = 0.8, M = 3, K = 3

Here we can see 9 points, corresponding to 3 messages x 3 keys. A horizontal line thus represents all the encryptions for a given message under different keys. What about the color difference? Here’s another set of parameters:


n = 6, H = 0.8, M = 28, K = 3

With a higher plaintext length the number of meaningful messages rises to 28, resulting in a total of 28 x 3 = 84 encryptions, which show up as red and blue points in this case. Can you spot the pattern that explains this?

It’s not easy to see, but the answer lies in understanding what vertical lines mean in the representation. Points lying on the same horizontal line are different encryptions of the same message. Points lying on the same vertical line are different messages that encrypt to the same ciphertext. As we saw before, this is exactly the situation where it is not possible for an adversary to determine the secret by trying all keys in reverse, as there is no way to tell which of the resulting messages is the original.

Blue points are ciphertexts for which there is more than one key that decrypts to a meaningful message, or equivalently, blue points are ciphertexts with one or more spurious keys.

s_n > 0 ⇒ blue point

s_n = 0 ⇒ red point
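The colouring rule can be sketched in code. The snippet below is not the code behind the plots, only an illustration under stated assumptions: meaningful messages and keys are drawn at random from the n-bit space, the number of messages is taken as 2^(n·H) rounded, and all names are hypothetical. Since XOR with two different keys always yields two different plaintexts, counting the distinct meaningful messages behind a ciphertext is the same as counting the keys that decrypt it meaningfully.

```python
import random
from collections import defaultdict


def classify_ciphertexts(n, entropy, key_count, seed=0):
    """Map each reachable ciphertext to 'red' or 'blue'.

    Messages and keys are sampled at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    space = range(2 ** n)
    messages = rng.sample(space, round(2 ** (n * entropy)))
    keys = rng.sample(space, key_count)

    # For every ciphertext, collect the meaningful messages that reach it.
    decryptions = defaultdict(set)
    for m in messages:
        for k in keys:
            decryptions[m ^ k].add(m)

    # More than one candidate message means at least one spurious key.
    return {c: "blue" if len(ms) > 1 else "red" for c, ms in decryptions.items()}


colors = classify_ciphertexts(n=6, entropy=0.8, key_count=3)
print(sum(v == "blue" for v in colors.values()), "blue ciphertexts")
print(sum(v == "red" for v in colors.values()), "red ciphertexts")
```

With a single key every ciphertext comes out red, and with all 2^n keys every ciphertext comes out blue, which matches the two extremes described in this post.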

Now we can see the visual equivalent of the properties of information-theoretic security we mentioned before.

  • s_n increases with the size of the key space, |K|

Fixed n, H, increasing values of K

Visually: the proportion of blue dots increases.

  • s_n decreases with the size of the plaintext, n

Fixed K, H, increasing values of n

Visually: the proportion of red dots increases.

  • s_n decreases with language redundancy, R_L

Fixed n, K, decreasing values of H (increasing values of R)

Visually: the proportion of red dots increases.

Visualizing the categories

Besides these trends, we also spoke about three broad categories into which cryptosystems fit:

  • H(K | C) = H(K) – Perfect secrecy

n = 8, H = 0.55, K = 256

Visually: the number of blue dots per column is equal to the number of horizontal lines. This means that the adversary gains no information from the ciphertext. Note also that 2^8 = 256, which is the value of K.

  • H(K | C) < H(K) – Information-theoretic security

n = 9, H = 0.55, K = 106

Visually: there are only blue dots. Every ciphertext is partially protected in the sense that the adversary does not have enough information to reveal the secret unequivocally.

  • H(K | C) = 0 – Computational security

n = 13, H = 0.53, K = 29

Visually: there are red dots. These ciphertexts have no information-theoretic protection and depend on computational security for their confidentiality.
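The three visual criteria can be turned into a small check on the toy model. This is a sketch that follows the descriptions above rather than a formal entropy computation: it counts the candidate messages behind each reachable ciphertext and names the category accordingly. The function name and the tiny example sets are ours.

```python
def classify_system(messages, keys):
    """Name the coarse category of a toy XOR system from its candidate counts."""
    candidates = {}
    for m in messages:
        for k in keys:
            candidates.setdefault(m ^ k, set()).add(m)
    counts = [len(ms) for ms in candidates.values()]

    # Every column holds every message: the ciphertext reveals nothing.
    if min(counts) == len(set(messages)):
        return "perfect secrecy"
    # Every column holds at least two messages: only blue dots.
    if min(counts) > 1:
        return "information-theoretic security"
    # Some column holds a single message: there are red dots.
    return "computational security"


messages = [0b00, 0b01, 0b10]  # 3 meaningful messages out of 4 plaintexts

print(classify_system(messages, [0b00, 0b01, 0b10, 0b11]))  # perfect secrecy
print(classify_system(messages, [0b00, 0b01, 0b10]))        # information-theoretic security
print(classify_system(messages, [0b11]))                    # computational security
```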

Try it yourself

In this post we have described some of the key ideas presented in Claude Shannon’s seminal Communication Theory of Secrecy Systems, published in 1949. We have also constructed a toy model that allowed us to visualize the security properties of cryptosystems and how they vary with their main parameters. You can try playing around with the model here. If you are a teacher and find it helpful please let us know!