Reduced and Refined Karatsuba - Simple Micro-optimization of Karatsuba Multiplication

Introduction

Karatsuba multiplication is a famous algorithm for efficient bignum multiplication. In most descriptions, it’s said to reduce 4 multiplications to 3 multiplications, 4 additions and 2 subtractions.

In fact, there’s an extremely simple and less-known micro-optimization known as Refined Karatsuba that only uses 3.5 additions. It can be useful in high-speed cryptography in general, and in embedded systems with limited resource. A reader may find it’s trivial, but it’s almost never documented in most textbooks or tutorials, thus often missed. This article attempts to raise its awareness.

For modular multiplication in a finite field, an extra technique called Reduced Refined Karatsuba further reduces the cost to 3 additions.

Standard Karatsuba

First, let’s have a quick review. In standard Karatsuba multiplication, we have two integers aa and bb. For simplicity, assume both integers have even numbers of digits, not odd.

a=7670b=2196a=7670b=2196

First, we “split” both integers at the middle into two parts of equal size: ahah, alal, bhbh, and blbl. Specifically, we rewrite the integer aa in the form of a=ah×10n+ala=ah×10n+al, and same for bb.

a=ah×10n+ala=76×102+70b=bh×10n+blb=21×102+96a=ah×10n+ala=76×102+70b=bh×10n+blb=21×102+96

Second, independently multiply the low and high parts of the result.

L=al×blL=70×96=6720H=ah×bhH=76×21=1596L=al×blL=70×96=6720H=ah×bhH=76×21=1596

Third, calculate the middle part. There are two equivalent ways to calculate this: addition and subtraction. We use the subtractive variant.

M=(alah)×(bhbl)M=(7076)×(2196)=(6)×(75)=450M=(alah)×(bhbl)M=(7076)×(2196)=(6)×(75)=450

The final result is:

H×102n+(M+H+L)×10n+L1596×104+8766×102+6720=16843320H×102n+(M+H+L)×10n+L1596×104+8766×102+6720=16843320

This is the Karatsuba Identity. Using the schoolbook algorithm, it requires 4 half word multiplications (16 digits in total), as the high and low parts of the results must all be multiplied with each other. The Karatsuba algorithm needs just 3 half-word multiplications (12 digit) at the expense of 4 additions and 2 subtractions. But we can do better.

Refined Karatsuba

Note that in our final formula,

H×102n+(M+H+L)×10n+LH×102n+(M+H+L)×10n+L

both HH and LL have 2n2n digits. And in H×102nH×102n, we add exactly 2n2n digits of zero at the end. Therefore, computing H×102n+LH×102n+L is simple - write HH on the left and LL on the right.

H×102n+L1596×104+6720=HL=15966720H×102n+L1596×104+6720=HL=15966720

Then, we can calculate (M+H+L)×10n(M+H+L)×10n in place, by adding it in the middle of our intermediate result. Let lili, hihi and mimi represents each digit of the number, then see the visualization below.

h4h3h2h1l4l3l2l115966720+h4h3h2h1+0159600+l4l3l2l1+0672000+m4m3m2m1+0045000h4h3h2h1l4l3l2l115966720+h4h3h2h1+0159600+l4l3l2l1+0672000+m4m3m2m1+0045000

It should be obvious by now: the standard Karatsuba formula has a hidden inefficiency: a duplicated addition. If HH and LL has 2n2n digit, the lowest nn digits in H and highest nn digit in LL are always added together, twice.

We can solve the problem by precomputing the result. For maximum efficiency, HH can be computed on-the-fly in the middle of the calculation of HH, with tight assembly code.

H=(h4,h3,h2,h1)+(0,0,l4,l3)=1596+67=1663H=(h4,h3,h2,h1)+(0,0,l4,l3)=1596+67=1663

Then the final summation simplifies to:

h4h3h2h1l2l1l2l116632020+h4h3h2h1+0166300+m4m3m2m1+0045000p8p7p6p5p4p3p2p116843320h4h3h2h1l2l1l2l116632020+h4h3h2h1+0166300+m4m3m2m1+0045000p8p7p6p5p4p3p2p116843320

This is the Refined Karatsuba Identity, it requires only 7 half-word (3.5 full-word) additions instead of 8. This micro-optimization was originally proposed by Daniel J. Bernstein.

Reduced Refined Karatsuba

In cryptography applications, arithmetic operations are modular (i.e. the numbers “wrap back” after reaching a maximum value) because they represent elements in a finite field. Once the value went off the scale, a modular reduction is needed to bring it back within the proper range. In elliptic curve cryptography, extremely efficiency special-purpose reduction algorithms can be used, because the field size of a curve is often carefully chosen.

For example, in base-10 arithmetic, if the modulus is n=10k1n=10k1, casting out 9 is such an efficient algorithm. We simply split the number into k1k1 parts and add them together - repeat if necessary. For example: 16843320 mod 9999=16843320 mod 9999= 1684+33201684+3320 =5004=5004.

Fast reduction algorithms in cryptography is outside the scope of the article, I only want to convince you that sometimes using an extra reduction step is reasonable, which brings us to the Reduced Refined Karatsuba algorithm. It reduces the number of additions from 3.5 to 3.