Method of Differences: Additional Applications
In my post on the Method of Differences, I focused on using the technique to generate mathematical tables. As some readers noted, the technique also has interesting modern applications. This post will discuss two of them: sequences to functions, a method of analyzing sequences, and strength reduction, a compiler optimization technique.
Sequences to Functions
Rather than having a function that you want to evaluate at many positions, suppose you have a sequence of values and you want to know how they are generated. For instance, say you are given the sequence 1, 5, 12, 22, 35, 51, 70, 92, … If you cannot simply look the sequence up in the On-Line Encyclopedia of Integer Sequences, then The Encyclopedia of Integer Sequences recommends “Analysis of Differences” as the best method for analyzing a sequence “by hand” (section 2.5, page 10).
Since the Encyclopedia’s explanation is quite clear, rather than repeat it, I suggest reading section 2.5 to understand how to derive the answer of \(\frac{1}{2}(n + 1)(3n + 2)\).
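To give a flavor of the derivation (the arithmetic below is mine, not the Encyclopedia’s): tabulating the sequence and its successive differences gives

$$ \begin{array}{ccccccccccccccc} 1 & & 5 & & 12 & & 22 & & 35 & & 51 & & 70 & & 92 \\ & 4 & & 7 & & 10 & & 13 & & 16 & & 19 & & 22 & \\ & & 3 & & 3 & & 3 & & 3 & & 3 & & 3 & & \end{array} $$

The second differences are constant, so the sequence is generated by a quadratic, and Newton’s forward difference formula recovers it directly: \(a_n = 1 + 4\binom{n}{1} + 3\binom{n}{2} = \frac{1}{2}(3n^2 + 5n + 2) = \frac{1}{2}(n + 1)(3n + 2)\).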
Strength Reduction
Strength reduction is a code optimization that replaces expensive operations with less expensive ones. For example, an integer multiplication has historically taken roughly three times as long as an integer addition, although on modern processors the real-world difference may be negligible. The authors of the chapter “Reduction of Operator Strength” attribute the basis of the technique to Babbage and his use of it in the Difference Engine.
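A couple of classic single-expression examples (my illustrations, not the chapter’s):

/* Two textbook strength reductions; the commented form is what a
 * compiler might emit in place of the multiply or divide. */
unsigned times9(unsigned x) { return x * 9; } /* -> (x << 3) + x */
unsigned div8(unsigned x)   { return x / 8; } /* -> x >> 3, safe because x is unsigned */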
Similar to Bailey’s long list of application areas, Allen, Cocke, and Kennedy list scenarios where strength reduction can be applied:
- Multiplication of an induction variable by a region constant
- Multiplication of an induction variable by another induction variable
- Multiplication of an induction variable by itself
- Integer division of an induction variable by a region constant
- Integer modulo of an induction variable
- Exponentiation to an induction variable
- Integer addition (eliminating extraneous variables)
- General order n polynomials
- Trigonometric functions at equidistant points
- Continuous differentiable functions
The first three cases and the eighth case (which corresponds to the mathematical tables that motivated the Victorians) are reducible to pure additions, while the others require some other operators.
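As a minimal sketch of the third case (my own example, not from the chapter), the multiplication i * i can be reduced to additions because consecutive squares differ by consecutive odd numbers:

#include <stdio.h>

int main(void) {
    int sq = 0; /* tracks i * i without any multiplication */
    int d  = 1; /* (i + 1)^2 - i^2 = 2i + 1, the next odd number */
    for (int i = 0; i < 10; i++) {
        printf("%d squared = %d\n", i, sq);
        sq += d; /* advance to the next square */
        d  += 2; /* advance to the next odd number */
    }
    return 0;
}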
To demonstrate how it works, we will show a simple example leveraging memory lookups. Memory lookups within arrays are very common operations in scientific computing. Let’s assume our code is summing the last column in a 4x3 matrix. The matrix is represented in row-major order and contains 32-bit signed integers.
The matrix for our example:
$$ \begin{pmatrix} 2 & 10 & 18 \\ 4 & 12 & 20 \\ 6 & 14 & 22 \\ 8 & 16 & 24 \end{pmatrix} $$
In memory, assuming the matrix begins at address 1000, the layout will be:
Address | Value |
---|---|
1000 | 2 |
1004 | 10 |
1008 | 18 |
1012 | 4 |
1016 | 12 |
1020 | 20 |
1024 | 6 |
1028 | 14 |
1032 | 22 |
1036 | 8 |
1040 | 16 |
1044 | 24 |
Thus, the address (A) of a cell within the matrix is the starting address (M) plus the zero-indexed row (r) multiplied by the size of a value in bytes (s) times the number of columns (C) plus the zero-indexed column (c) multiplied by the size of the value:
$$ A = M + r \times s \times C + c \times s $$
So, the address at zero-indices (1, 2) is 1000 + 1 * 4 * 3 + 2 * 4, or 1020, which holds the value 20.
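The formula translates directly into code (a quick sketch using the variable names from the formula):

/* Byte address of the element at zero-indexed (r, c):
 * A = M + r*s*C + c*s */
unsigned addr(unsigned M, unsigned r, unsigned c, unsigned s, unsigned C) {
    return M + r * s * C + c * s;
}
/* addr(1000, 1, 2, 4, 3) evaluates to 1020, as above. */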
If we simplify by replacing known values with constants, we get:
$$ A = M + r \times s \times C + c \times s \\ A = 1000 + 12r + 4c $$
So, to look up a value, we are looking at two multiplications and two additions.
To sum the last column in the zero-indexed matrix, a C snippet could be (using i and j instead of r and c to be more idiomatic):
int sum = 0;
int j = 2;
for (int i = 0; i < 4; i++) {
    /* matrix[i][2]: the byte offset from the base address is 12*i + 4*j */
    sum += *(int *)((char *)matrix + 12*i + 4*j);
}
return sum;
The 4*j (here 4*2 = 8) would be pre-computed by most compilers, since the column is fixed in this example.
Using strength reduction, we can insert a temporary variable, t1, initialized to the address of the first element of the column. Then, since i is incremented by one each time and the “stride” is 12 bytes, we can eliminate the multiplication of i by 12 and simply increment t1 by 12 directly.
int sum = 0;
int j = 2;
char *t1 = (char *)matrix + 4*j; /* address of matrix[0][2] */
for (int i = 0; i < 4; i++) {
    sum += *(int *)t1; /* memory load from address t1 */
    t1 += 12;          /* advance one row (the stride) */
}
return sum;
Thus, while the original loop would be performing (at least) 4 multiplications and 12 additions, the strength-reduced code performs only 12 additions.
The FORTRAN I compiler, completed in 1957, included strength reduction to transform subscript calculations into index register increments and decrements. In F. E. Allen’s “A technological review of the FORTRAN I Compiler”, she states that this was one of the most impactful optimizations.
Scalar Evolution
If you look at the list of optimizations in the LLVM compiler, you will find Scalar Evolution. Scalar Evolution models how the value of a scalar changes as code executes, particularly within a loop. It is based on the mathematical model of Chains of Recurrences, a generalized form of forward differencing, and the theory has continued to evolve and improve.
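To connect this back to the example above (the annotation is mine; {start,+,step} is the notation LLVM uses to print add recurrences, which are Chains of Recurrences):

/* The column-summing loop, annotated with the add recurrences a
 * scalar evolution analysis would derive. {start,+,step} denotes a
 * value that begins at start and grows by step each iteration. */
int sum_last_column(const int matrix[4][3]) {
    int sum = 0;
    for (int i = 0; i < 4; i++) { /* i: {0,+,1} */
        sum += matrix[i][2];      /* address: {matrix+8,+,12} */
    }
    return sum;
}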
Some additional references:
- Chains of Recurrences - a method to expedite the evaluation of closed form functions
- How LLVM Optimizes Power Sums
- Scalar Evolutions
References and Further Reading
Donald Knuth. 1998. “Tabulating polynomial values”, section 4.6.4, The Art of Computer Programming, Volume 2, 3rd Edition: Seminumerical Algorithms.
John Cocke and Ken Kennedy. 1977. An algorithm for reduction of operator strength. Commun. ACM 20, 11 (Nov. 1977), 850–856. https://doi.org/10.1145/359863.359888
Steven Muchnick and Neil Jones. 1981. Program Flow Analysis: Theory and Applications. Prentice-Hall, Inc.
Includes
- Allen, Cocke, and Kennedy. “Reduction of Operator Strength.”
Steven S. Muchnick. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann.