Programming Lewis Carroll's *Memoria Technica*
Charles Dodgson (pen name Lewis Carroll) had difficulty remembering numbers, such as dates. He developed a cipher to help him remember numbers by embedding them in couplets or phrases. For example, the couplet “Brass trumpet and brazen bassoon, will speedily mark you a tune” encodes the specific gravity of brass (8.39) in the last four consonants: r k t n
(y is treated as a vowel). In this article, we describe the cipher, present online tools for encoding and decoding, discuss how we implemented the algorithms in TypeScript, and the cipher’s relevancy to steganography.
Memoria Technica
The cipher encodes plaintext’s of sequences of digits. Using the cipher, consonants (except ‘y’) are mapped to digits. Vowels, punctuation, and ‘y’ are ignored and thus can be freely added to the ciphertext The mapping follows the figure:
In a three-page pamphlet published by Carroll in 1878, he describes how the digits and their associated letters came to be:
- "b" and "c", the first two consonants in the Alphabet.
- "d" from "duo"; "w" from "two".
- "t" from "tres"; the other may wait awhile.
- "f" from four; "q" from "quatuor".
- "l" and "v", because "L" and "V" are the Roman symbols for "fifty" and "five".
- "s" and "x", from "six".
- "p" and "m", from "septem".
- "h" from "huit"; and "k" from the Greek "okto".
- "n" from "nine"; and "g" because it is so like "9".
- "z" and "r", from "zero".
The letter “j” is associated with 3 because it is left-over.
Decoding ciphertext involves extracting the consonants from a message and finding the mapped integer. Vowels, y, and punctuation are ignored.
Encoding an integer involves choosing one of two consonants for each digit and inserting vowels, y, and punctuation to create valid words.
Decoding may also involve “meta” information. In the pamphlet, Carroll provides example couplets for encoding the specific gravities of various metals. The decoder will need to know that only the last four consonants are relevant.
Example
Given the ciphertext found
, we can first filter to the consonants: fnd
. f is 4, n is 9, and d is 2, so the plaintext is 492
. (What was found in 492? Christopher Columbus discovered America in 1492 and Dodgson believed the leading 1 could be assumed.)
Given the plaintext 492
, we are interested in words that include the consonants f or q, n or g, and d or w, in that order, with vowels, ‘y’, or punctuation in-between. Using the second edition Scrabble dictionary, we find 15 single words that match these constraints:
fend | fiend | find | fined | foined |
fond | fondu | fondue | found | fugued |
fund | fundi | queened | quinoid | quoined |
Using our encoding tool, we found 74,151 possibilities from fend
to quey yuga yowie
.
Decoding and Encoding Tools
Note 1: These computations run on your computer.
Note 2: The JavaScript code for
Encode
uses import attributes which, as of December 2024, are not supported on Firefox, but are expected to be available soon.
Note 3: Negative limits are treated as no limits (i.e. Infinity). Negative offsets are treated as no offset (i.e. zero).
Implementation
Mapping Consonants to Digits
Our implementation of the mapping is a fairly straight-forward switch case with fall-through:
|
|
The original version (as suggested by the commented code) compared strings against strings (JavaScript lacks a character type). Profiling revealed that more than half the time was being spent in StringEqual calls. By switching the comparisons from string equality to integer equality, the new code takes approximately 15% of the time than the older version or a speed-up greater than 6x.
Decoding Ciphertext
The decoding process is essentially a functional collect
: map across all elements and keep those that are non-null.
|
|
Alternatively, TextEncoder
could be used to build a Uint8Array
of byte values and consonant_value
could be given a byte value as an argument. However, since a character in UTF-8 may span multiple bytes, we would still need to capture the complexity of Unicode encoding somewhere.
Encoding a (Specific) Plaintext Value
Using (Moby 2002), we previously developed a dictionary by mapping each word into a cipher value and then grouping words by a shared cipher value. Thus, generating a list of words that match a given plaintext digit string is a look-up:
|
|
Using a JavaScript Map
for the dictionary rather than a JavaScript Object
is a bit of a personal quirk. JavaScript’s Map
s are restricted to string keys, which in this case is not a restriction for our domain. We appreciate Map
’s clear separation between keys and properties and likely improved performance given its less flexibility compared to Object
s.
The exclamation point after the get(n)
call is a TypeScript type assertion informing the compiler that the return value is neither null nor undefined. The compiler is (not yet) smart enough to recognize that since has
returned true, get
cannot return undefined. At least, unless the dictionary is modified by another process.
Encoding a (General) Plaintext Value
While encode
returns single word ciphertexts that fully match a given plaintext, we will often need groups of partial matches for a given plaintext. encode_all
is a generator function that will yield one ciphertext at a time, where each ciphertext contains one or more words. As described in the ‘English Language Coverage/Branching Factor’ section below, most plaintexts will yield large numbers of potential ciphertexts (e.g. thousands for three digits). Since the consumer of encode_all
is unlikely to need all the outputs at the same time, we use a generator or iterable design to avoid loading all the outputs into memory at once.
(If encode_all
is a generator, why not encode
? In encode
’s case, the dictionary provides all outputs at once so we do not save any resources by yielding only a single entry from that dictionary at a time.)
|
|
Line 9 separates a string of digits into an array of digit characters. This function assumes digits will only contain the characters ‘0’ to ‘9’.
This function needs to check all ordered sub-sequences of the sequence, digits
. For example, if digits is ['1', '2', '3']
, then we will attempt to encode
: ['1', '2', '3'], ['1'] + ['2', '3], ['1', '2'] + ['3'],
and ['1'] + ['2'] + ['3']
. To do this, we incrementally add zero or more divisors that separate the array. A divisor splits an array from a given index (inclusive) to the right. Each split is referred to as a ‘slice’.
In lines 21 through 25, we encode
each slice (the slice.join('')
transforms an array of digits into a string of digits) and collect all possible words under an array of arrays termed phrases
.
If all positions within a phrase contain at least one word (lines 27 through 31), then we perform a cartesian join between each index. A cartesian join or product is similar to a nested for-loop and generates all possible tuples. The yield
gives the caller a single value (a space-separated list of words) while retaining state so the next value can be retrieved without re-computing earlier elements.
The code for subsets
, sliceIndices
, and cartesian
(among other functions) is in util.ts
in this project’s gitlab repository. Knuth’s The Art of Programming, Volume 4A is a useful resource for these kinds of algorithms.
Suitability for Steganography
Steganography is the art of hiding information in plain sight. While cryptography obscures the message so the attackers have difficulty reading it, attackers will know that the message is not meant to be read. In contrast, steganography may trick the attackers into believing the message is benign or not noticing the message at all. For example, invisible ink and microdots are forms of steganography. Similarly, a message may be hidden by distorting specific pixels of an image. Steganography and cryptography may be combined with the message first encrypted and then hidden.
While Dodgson did not present this technique as a cipher nor as a tool for steganography, but only as an aid to memory, we believe it has potential utility.
Let’s try to hide phi, 1.61803, as a classified ad. The ad will start with ‘For sale: ' and the item for sale will be an encoding of the number. We will assume the reader can place the decimal point appropriately (or we transmit that data via some other way). Using the encode tool above, 161803
yields some promising outputs within the first few hundred cases:
- “box chart”
- “boys chariot”
- “cosy chariot”
Would an attacker suspect ads like these?
- For sale: Box Chart 2016. Software library for R. $1 + postage.
- For sale: Boy’s Chariot. Clean, blue, ages 3-6. $100 OBO.
- For sale: Cosy Chariot. Padded seats, yellow. One seater.
We chose these three examples after scanning about 3,000 outputs. The dictionary used by the tool contains almost 118,000 words (including plural and forms of tenses).
English Language Coverage/Branching Factor
The branching factor (number of words in the dictionary that match a given sequence) is numerous for single and double digit sequences. (Note that leading zeros are allowed.) The first table below represents the number of words that perfectly match the single digit sequence while the second table represents two digit sequences.
Digit | Count |
---|---|
0 | 63 |
1 | 43 |
2 | 53 |
3 | 45 |
4 | 21 |
5 | 54 |
6 | 46 |
7 | 57 |
8 | 51 |
9 | 64 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
0x | 31 | 36 | 59 | 44 | 12 | 61 | 82 | 43 | 23 | 58 |
1x | 65 | 41 | 53 | 61 | 9 | 69 | 71 | 43 | 46 | 61 |
2x | 61 | 31 | 54 | 37 | 9 | 70 | 65 | 43 | 27 | 59 |
3x | 44 | 25 | 37 | 29 | 6 | 38 | 48 | 45 | 30 | 52 |
4x | 53 | 10 | 21 | 27 | 8 | 35 | 26 | 11 | 7 | 33 |
5x | 50 | 34 | 61 | 41 | 12 | 65 | 75 | 44 | 24 | 95 |
6x | 37 | 20 | 27 | 37 | 6 | 46 | 42 | 40 | 29 | 37 |
7x | 78 | 41 | 64 | 49 | 9 | 71 | 94 | 67 | 36 | 107 |
8x | 34 | 13 | 41 | 23 | 8 | 41 | 50 | 38 | 28 | 37 |
9x | 56 | 35 | 55 | 48 | 8 | 68 | 83 | 56 | 20 | 84 |
There are 78 three-digit sequences where do we not see any single word in the dictionary (although these three-digits could be separated into one and two-digit subsequences): 008, 014, 048, 084, 088, 128, 147, 148, 204, 224, 234, 248, 264, 294, 314, 324, 334, 342, 347, 348, 349, 354, 364, 394, 404, 411, 414, 421, 423, 424, 427, 428, 434, 437, 447, 464, 467, 471, 474, 481, 483, 484, 487, 494, 534, 541, 547, 564, 574, 581, 584, 624, 634, 644, 661, 747, 748, 774, 784, 814, 817, 821, 827, 834, 837, 841, 845, 847, 848, 849, 864, 874, 881, 914, 948, 974, 981, 984.
If the specific words do not matter, then any given plaintext can be encoded into a reasonable selection of words. If specific words do matter (i.e. the ciphertext is being hidden), then this branching factor may not be sufficient to cover all contexts. For instance, it may not be possible to consistently hide messages within recipes since the number of ingredient-related words is restrictive.
Weakness? Benford’s Law versus Letter Frequencies
A potential weakness of this approach as a steganographic technique is that the frequency of digits does not match the frequency of consonants. For instance, the digit 4 maps to ‘f’ and ‘q’. In studies of Benford’s Law, which measures the frequency of digits in real-world contexts, a 4 is predicted as the leading digit in 9.7% of all numbers (Berger 2015). However, ‘f’ is seen as the leading letter in 2% of words and ‘q’ is very rare. Thus, a study of letter frequencies in a passage could detect an anomaly based on the imbalance. However, unless the volume of messages is high or the plaintext is very long, we suspect this would fall into the noise.
References
(Abeles 2005) Abeles, Francine F. 2005. Lewis Carroll’s ciphers: The literary connections. Advances in Applied Mathematics, Volume 34, Issue 4. Pages 697-708. ISSN 0196-8858, https://doi.org/10.1016/j.aam.2004.06.006.
(Berger 2015) Berger, Arno, and Theodore Preston Hill. 2015. An Introduction to Benford’s Law. Princeton, New Jersey: Princeton University Press. First Chapter
(Gardner 1996) Gardner, Martin, and Lewis Carroll. 1996. The Universe in a Handkerchief : Lewis Carroll’s Mathematical Recreations, Games, Puzzles, and Word Plays. New York: Copernicus. Pages 31-36.
(Moby 2002) Ward, Grady. 2002. Moby Word II. https://www.gutenberg.org/ebooks/3201
Gitlab Repository. https://gitlab.com/jeffrey_starr/memoria_technica