Programming Lewis Carroll's Memoria Technica

2024-12-26

Charles Dodgson (pen name Lewis Carroll) had difficulty remembering numbers, such as dates. He developed a cipher to help him remember numbers by embedding them in couplets or phrases. For example, the couplet “Brass trumpet and brazen bassoon, will speedily mark you a tune” encodes the specific gravity of brass (8.39) in the last four consonants: r k t n (y is treated as a vowel). In this article, we describe the cipher, present online tools for encoding and decoding, discuss how we implemented the algorithms in TypeScript, and the cipher’s relevancy to steganography.

Memoria Technica

The cipher encodes plaintext’s of sequences of digits. Using the cipher, consonants (except ‘y’) are mapped to digits. Vowels, punctuation, and ‘y’ are ignored and thus can be freely added to the ciphertext The mapping follows the figure:

In a three-page pamphlet published by Carroll in 1878, he describes how the digits and their associated letters came to be:

"b" and "c", the first two consonants in the Alphabet.
"d" from "duo"; "w" from "two".
"t" from "tres"; the other may wait awhile.
"f" from four; "q" from "quatuor".
"l" and "v", because "L" and "V" are the Roman symbols for "fifty" and "five".
"s" and "x", from "six".
"p" and "m", from "septem".
"h" from "huit"; and "k" from the Greek "okto".
"n" from "nine"; and "g" because it is so like "9".
"z" and "r", from "zero".

The letter “j” is associated with 3 because it is left-over.

Decoding ciphertext involves extracting the consonants from a message and finding the mapped integer. Vowels, y, and punctuation are ignored.

Encoding an integer involves choosing one of two consonants for each digit and inserting vowels, y, and punctuation to create valid words.

Decoding may also involve “meta” information. In the pamphlet, Carroll provides example couplets for encoding the specific gravities of various metals. The decoder will need to know that only the last four consonants are relevant.

Example

Given the ciphertext found, we can first filter to the consonants: fnd. f is 4, n is 9, and d is 2, so the plaintext is 492. (What was found in 492? Christopher Columbus discovered America in 1492 and Dodgson believed the leading 1 could be assumed.)

Given the plaintext 492, we are interested in words that include the consonants f or q, n or g, and d or w, in that order, with vowels, ‘y’, or punctuation in-between. Using the second edition Scrabble dictionary, we find 15 single words that match these constraints:

fend	fiend	find	fined	foined
fond	fondu	fondue	found	fugued
fund	fundi	queened	quinoid	quoined

Using our encoding tool, we found 74,151 possibilities from fend to quey yuga yowie.

Decoding and Encoding Tools

Note 1: These computations run on your computer.

Note 2: The JavaScript code for Encode uses import attributes which, as of December 2024, are not supported on Firefox, but are expected to be available soon.

Note 3: Negative limits are treated as no limits (i.e. Infinity). Negative offsets are treated as no offset (i.e. zero).

Implementation

Mapping Consonants to Digits

Our implementation of the mapping is a fairly straight-forward switch case with fall-through:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33


/**
 * For a given letter, return the cipher digit based on Carroll's Memoria Technica scheme, or null if there
 * is no associated digit. This function will return null for vowels (including y), punctuation marks, and
 * any other non-consonants. Only single letter English consonants have values; case is ignored.
 *
 * @param chr a single character
 */
export function consonant_value(chr: string): number | null {
    const code16: number | undefined = chr.toLowerCase().codePointAt(0);
    switch (code16) {
        case 98 /*'b'*/:
        case 99 /*'c'*/: return 1;
        case 100 /*'d'*/:
        case 119 /*'w'*/: return 2;
        case 116 /*'t'*/:
        case 106 /*'j'*/: return 3;
        case 102 /*'f'*/:
        case 113 /*'q'*/: return 4;
        case 108 /*'l'*/:
        case 118 /*'v'*/: return 5;
        case 115 /*'s'*/:
        case 120 /*'x'*/: return 6;
        case 112 /*'p'*/:
        case 109 /*'m'*/: return 7;
        case 104 /*'h'*/:
        case 107 /*'k'*/: return 8;
        case 110 /*'n'*/:
        case 103 /*'g'*/: return 9;
        case 122 /*'z'*/:
        case 114 /*'r'*/: return 0;
        default: return null;
    }
}

The original version (as suggested by the commented code) compared strings against strings (JavaScript lacks a character type). Profiling revealed that more than half the time was being spent in StringEqual calls. By switching the comparisons from string equality to integer equality, the new code takes approximately 15% of the time than the older version or a speed-up greater than 6x.

Decoding Ciphertext

The decoding process is essentially a functional collect: map across all elements and keep those that are non-null.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


/**
 * Return an array of numbers based on decoding s. If there are no (valid) consonants within s, returns
 * an empty array.
 *
 * @param s word or phrase
 */
export function decode(s: string): number[] {
    let result: number[] = [];

    for(const chr of s) {
        const v: number | null = consonant_value(chr);
        if(v !== null) {
            result.push(v);
        }
    }

    return result;
}

Alternatively, TextEncoder could be used to build a Uint8Array of byte values and consonant_value could be given a byte value as an argument. However, since a character in UTF-8 may span multiple bytes, we would still need to capture the complexity of Unicode encoding somewhere.

Encoding a (Specific) Plaintext Value

Using (Moby 2002), we previously developed a dictionary by mapping each word into a cipher value and then grouping words by a shared cipher value. Thus, generating a list of words that match a given plaintext digit string is a look-up:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


/**
 * Return from the dictionary all words that have the plaintext value of n.
 *
 * @param n plaintext value
 * @param dictionary plaintext value to array of words
 */
export function encode(n: string, dictionary: Map<string, string[]>): string[] {
    if(dictionary.has(n)) {
        return dictionary.get(n)!;
    } else {
        return [];
    }
}

Using a JavaScript Map for the dictionary rather than a JavaScript Object is a bit of a personal quirk. JavaScript’s Maps are restricted to string keys, which in this case is not a restriction for our domain. We appreciate Map’s clear separation between keys and properties and likely improved performance given its less flexibility compared to Objects.

The exclamation point after the get(n) call is a TypeScript type assertion informing the compiler that the return value is neither null nor undefined. The compiler is (not yet) smart enough to recognize that since has returned true, get cannot return undefined. At least, unless the dictionary is modified by another process.

Encoding a (General) Plaintext Value

While encode returns single word ciphertexts that fully match a given plaintext, we will often need groups of partial matches for a given plaintext. encode_all is a generator function that will yield one ciphertext at a time, where each ciphertext contains one or more words. As described in the ‘English Language Coverage/Branching Factor’ section below, most plaintexts will yield large numbers of potential ciphertexts (e.g. thousands for three digits). Since the consumer of encode_all is unlikely to need all the outputs at the same time, we use a generator or iterable design to avoid loading all the outputs into memory at once.

(If encode_all is a generator, why not encode? In encode’s case, the dictionary provides all outputs at once so we do not save any resources by yielding only a single entry from that dictionary at a time.)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33


/**
 * Return all phrases (space delimited words from the dictionary) that encode the sequence of
 * digits in n.
 *
 * @param n sequence of digits (plaintext)
 * @param dictionary plaintext value to array of words
 */
export function* encode_all(n: string, dictionary: Map<string, string[]>): Generator<string> {
    const digits = n.split('');
    const nonEmpty = (seq: any[]) => seq.length > 0;

    let divisors: number[] = [];
    for(let i = 1; i < digits.length; i++) {
        divisors.push(i);
    }

    for(let divisor of subsets(divisors)) {
        divisor.unshift(0);  // include all digits by starting the slice at index 0
        let slices: string[][] = sliceIndices(digits, divisor);

        let phrases: string[][] = [];
        for(let slice of slices) {
            let words = encode(slice.join(''), dictionary)
            phrases.push(words);
        }

        if(phrases.every(nonEmpty)) {
            for(let wrappedPhrase of cartesian(phrases)) {
                yield wrappedPhrase.underlying.join(' ');
            }
        }
    }
}

Line 9 separates a string of digits into an array of digit characters. This function assumes digits will only contain the characters ‘0’ to ‘9’.

This function needs to check all ordered sub-sequences of the sequence, digits. For example, if digits is ['1', '2', '3'], then we will attempt to encode: ['1', '2', '3'], ['1'] + ['2', '3], ['1', '2'] + ['3'], and ['1'] + ['2'] + ['3']. To do this, we incrementally add zero or more divisors that separate the array. A divisor splits an array from a given index (inclusive) to the right. Each split is referred to as a ‘slice’.

In lines 21 through 25, we encode each slice (the slice.join('') transforms an array of digits into a string of digits) and collect all possible words under an array of arrays termed phrases.

If all positions within a phrase contain at least one word (lines 27 through 31), then we perform a cartesian join between each index. A cartesian join or product is similar to a nested for-loop and generates all possible tuples. The yield gives the caller a single value (a space-separated list of words) while retaining state so the next value can be retrieved without re-computing earlier elements.

The code for subsets, sliceIndices, and cartesian (among other functions) is in util.ts in this project’s gitlab repository. Knuth’s The Art of Programming, Volume 4A is a useful resource for these kinds of algorithms.

Suitability for Steganography

Steganography is the art of hiding information in plain sight. While cryptography obscures the message so the attackers have difficulty reading it, attackers will know that the message is not meant to be read. In contrast, steganography may trick the attackers into believing the message is benign or not noticing the message at all. For example, invisible ink and microdots are forms of steganography. Similarly, a message may be hidden by distorting specific pixels of an image. Steganography and cryptography may be combined with the message first encrypted and then hidden.

While Dodgson did not present this technique as a cipher nor as a tool for steganography, but only as an aid to memory, we believe it has potential utility.

Let’s try to hide phi, 1.61803, as a classified ad. The ad will start with ‘For sale: ' and the item for sale will be an encoding of the number. We will assume the reader can place the decimal point appropriately (or we transmit that data via some other way). Using the encode tool above, 161803 yields some promising outputs within the first few hundred cases:

“box chart”
“boys chariot”
“cosy chariot”

Would an attacker suspect ads like these?

For sale: Box Chart 2016. Software library for R. $1 + postage.
For sale: Boy’s Chariot. Clean, blue, ages 3-6. $100 OBO.
For sale: Cosy Chariot. Padded seats, yellow. One seater.

We chose these three examples after scanning about 3,000 outputs. The dictionary used by the tool contains almost 118,000 words (including plural and forms of tenses).

English Language Coverage/Branching Factor

The branching factor (number of words in the dictionary that match a given sequence) is numerous for single and double digit sequences. (Note that leading zeros are allowed.) The first table below represents the number of words that perfectly match the single digit sequence while the second table represents two digit sequences.

Digit	Count
0	63
1	43
2	53
3	45
4	21
5	54
6	46
7	57
8	51
9	64

	0	1	2	3	4	5	6	7	8	9
0x	31	36	59	44	12	61	82	43	23	58
1x	65	41	53	61	9	69	71	43	46	61
2x	61	31	54	37	9	70	65	43	27	59
3x	44	25	37	29	6	38	48	45	30	52
4x	53	10	21	27	8	35	26	11	7	33
5x	50	34	61	41	12	65	75	44	24	95
6x	37	20	27	37	6	46	42	40	29	37
7x	78	41	64	49	9	71	94	67	36	107
8x	34	13	41	23	8	41	50	38	28	37
9x	56	35	55	48	8	68	83	56	20	84

There are 78 three-digit sequences where do we not see any single word in the dictionary (although these three-digits could be separated into one and two-digit subsequences): 008, 014, 048, 084, 088, 128, 147, 148, 204, 224, 234, 248, 264, 294, 314, 324, 334, 342, 347, 348, 349, 354, 364, 394, 404, 411, 414, 421, 423, 424, 427, 428, 434, 437, 447, 464, 467, 471, 474, 481, 483, 484, 487, 494, 534, 541, 547, 564, 574, 581, 584, 624, 634, 644, 661, 747, 748, 774, 784, 814, 817, 821, 827, 834, 837, 841, 845, 847, 848, 849, 864, 874, 881, 914, 948, 974, 981, 984.

If the specific words do not matter, then any given plaintext can be encoded into a reasonable selection of words. If specific words do matter (i.e. the ciphertext is being hidden), then this branching factor may not be sufficient to cover all contexts. For instance, it may not be possible to consistently hide messages within recipes since the number of ingredient-related words is restrictive.

Weakness? Benford’s Law versus Letter Frequencies

A potential weakness of this approach as a steganographic technique is that the frequency of digits does not match the frequency of consonants. For instance, the digit 4 maps to ‘f’ and ‘q’. In studies of Benford’s Law, which measures the frequency of digits in real-world contexts, a 4 is predicted as the leading digit in 9.7% of all numbers (Berger 2015). However, ‘f’ is seen as the leading letter in 2% of words and ‘q’ is very rare. Thus, a study of letter frequencies in a passage could detect an anomaly based on the imbalance. However, unless the volume of messages is high or the plaintext is very long, we suspect this would fall into the noise.

References

(Abeles 2005) Abeles, Francine F. 2005. Lewis Carroll’s ciphers: The literary connections. Advances in Applied Mathematics, Volume 34, Issue 4. Pages 697-708. ISSN 0196-8858, https://doi.org/10.1016/j.aam.2004.06.006.

(Berger 2015) Berger, Arno, and Theodore Preston Hill. 2015. An Introduction to Benford’s Law. Princeton, New Jersey: Princeton University Press. First Chapter

(Gardner 1996) Gardner, Martin, and Lewis Carroll. 1996. The Universe in a Handkerchief : Lewis Carroll’s Mathematical Recreations, Games, Puzzles, and Word Plays. New York: Copernicus. Pages 31-36.

(Moby 2002) Ward, Grady. 2002. Moby Word II. https://www.gutenberg.org/ebooks/3201

Gitlab Repository. https://gitlab.com/jeffrey_starr/memoria_technica

	0	1	2	3	4	5	6	7	8	9
0x	31	36	59	44	12	61	82	43	23	58
1x	65	41	53	61	9	69	71	43	46	61
2x	61	31	54	37	9	70	65	43	27	59
3x	44	25	37	29	6	38	48	45	30	52
4x	53	10	21	27	8	35	26	11	7	33
5x	50	34	61	41	12	65	75	44	24	95
6x	37	20	27	37	6	46	42	40	29	37
7x	78	41	64	49	9	71	94	67	36	107
8x	34	13	41	23	8	41	50	38	28	37
9x	56	35	55	48	8	68	83	56	20	84

	0	1	2	3	4	5	6	7	8	9
0x	31	36	59	44	12	61	82	43	23	58
1x	65	41	53	61	9	69	71	43	46	61
2x	61	31	54	37	9	70	65	43	27	59
3x	44	25	37	29	6	38	48	45	30	52
4x	53	10	21	27	8	35	26	11	7	33
5x	50	34	61	41	12	65	75	44	24	95
6x	37	20	27	37	6	46	42	40	29	37
7x	78	41	64	49	9	71	94	67	36	107
8x	34	13	41	23	8	41	50	38	28	37
9x	56	35	55	48	8	68	83	56	20	84

Programming Lewis Carroll's *Memoria Technica*