Karatsuba's algorithm is a way to multiply large numbers faster than O(n^2); it was the first such algorithm, invented by the Russian mathematician Anatoly Karatsuba in 1960. If b is the base in use and n is the smallest value such that b^n exceeds both arguments (i.e., the size of the arguments, measured in digits), let K be b^ceil(n/2). For example, if the input numbers are 200 digits long, n would be 200 and K would be b^100. In these terms, Karatsuba multiplication is based on splitting the input numbers in half (e.g., A = Ah K + Al, Ah and Al being the high and low halves) and writing the product as

  A B = [1] (Ah K + Al) (Bh K + Bl)
      = [2] Ah Bh K^2 + Al Bh K + Ah Bl K + Al Bl
      = [3] Ah Bh K^2 + (Al Bh + Ah Bl) K + Al Bl
      = [4] Ah Bh K^2 + (Ah Bh + Al Bh + Ah Bl + Al Bl - Ah Bh - Al Bl) K + Al Bl
      = [5] Ah Bh K^2 + ((Ah + Al) (Bh + Bl) - Ah Bh - Al Bl) K + Al Bl

`Baseline' multiplication refers to any of the various O(n^2) algorithms with low constant factors. The form we use is a digit-by-digit computation much like the usual grade-school method. But we can also think of it in terms of the above, as just using [2] and computing the four sub-products involved either directly or by recursing; this too leads to an O(n^2) algorithm. (It's most obviously O(4^(lg n)), but 4^(lg n) = n^2.) If we use line [5] instead, we see that, at the cost of a few more O(n) adds and subtracts, we can compute the full product while needing only three half-size sub-products. The resulting algorithm is O(3^(lg n)) = O(n^(lg 3)), about O(n^1.585). There is one minor issue: the (Ah+Al)(Bh+Bl) recursion potentially needs to be one digit wider than the Ah Bh and Al Bl products. In serious large-number arithmetic this isn't an issue, because it bloats the numbers by only one bit per level of recursion, and `digits' are typically something like 30 bits.
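As a sketch of identity [5] in action (illustrative Python, not this program's actual digit-buffer code; Python's unbounded integers stand in for the hand-rolled digit arithmetic):

```python
# Sketch of Karatsuba multiplication via identity [5].  Illustration
# only: Python's built-in big integers do the digit handling that a
# real bignum library does by hand.

def ndigits(x, base):
    """Smallest n such that base**n exceeds x (at least 1)."""
    n = 1
    while base ** n <= x:
        n += 1
    return n

def karatsuba(a, b, base=10):
    """Multiply non-negative a and b via
    Ah Bh K^2 + ((Ah+Al)(Bh+Bl) - Ah Bh - Al Bl) K + Al Bl."""
    if a < base or b < base:                 # single-digit operand: multiply directly
        return a * b
    n = max(ndigits(a, base), ndigits(b, base))
    K = base ** ((n + 1) // 2)               # K = base^ceil(n/2)
    ah, al = divmod(a, K)                    # A = Ah K + Al
    bh, bl = divmod(b, K)                    # B = Bh K + Bl
    hi = karatsuba(ah, bh, base)             # Ah Bh
    lo = karatsuba(al, bl, base)             # Al Bl
    mid = karatsuba(ah + al, bh + bl, base)  # (Ah+Al)(Bh+Bl): may be one digit wider
    return hi * K * K + (mid - hi - lo) * K + lo
```

The `mid' recursion is where the one-extra-digit widening shows up: ah+al and bh+bl can each overflow ceil(n/2) digits by one.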
(If the numbers are big enough for the recursion to approach 30 levels, you really want something even faster, like Schönhage-Strassen; this program doesn't handle numbers anything like that large.) But for us, digits are typically much smaller. To avoid digit-count bloat as the recursion proceeds, we rewrite [5] as

  [6] Ah Bh K^2 + (Ah Bh + Al Bl - (Ah - Al) (Bh - Bl)) K + Al Bl

which requires us to keep track of signs when recursing. However, that bookkeeping is easy to spread over the stack in the form of automatic variables in the recursive calls, rather than needing larger digit buffers.

As an example of Karatsuba, let's multiply 1620984393738118 by 3656235198324125 (b=10 and n=16). Baseline multiplication, in its recursive form, does this as

  1620984393738118 · 3656235198324125
    = (16209843·100000000 + 93738118) · (36562351·100000000 + 98324125)
    = (16209843·36562351)·10000000000000000
      + 93738118·36562351·100000000
      + 16209843·98324125·100000000
      + 93738118·98324125
    = (recurse for 16209843·36562351, giving 592669969420893)
      (recurse for 93738118·36562351, giving 3427285972395418)
      (recurse for 16209843·98324125, giving 1593818629362375)
      (recurse for 93738118·98324125, giving 9216718431496750)
      5926699694208930000000000000000
      + 342728597239541800000000
      + 159381862936237500000000
      + 9216718431496750
    = 5926700196319399392497731496750

This involves four half-size multiplications.
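The sign-tracking rewrite [6] can be sketched the same way (again illustrative Python, not this program's code; a local sign flag plays the role of the automatic variables mentioned above):

```python
# Sketch of the sign-tracking variant [6].  The differences Ah-Al and
# Bh-Bl may be negative; we recurse on their absolute values and carry
# the sign in a local variable, so no recursive operand is ever wider
# than ceil(n/2) digits.

def ndigits(x, base):
    """Smallest n such that base**n exceeds x (at least 1)."""
    n = 1
    while base ** n <= x:
        n += 1
    return n

def karatsuba_sub(a, b, base=10):
    """Multiply non-negative a and b via
    Ah Bh K^2 + (Ah Bh + Al Bl - (Ah-Al)(Bh-Bl)) K + Al Bl."""
    if a < base or b < base:
        return a * b
    n = max(ndigits(a, base), ndigits(b, base))
    K = base ** ((n + 1) // 2)
    ah, al = divmod(a, K)
    bh, bl = divmod(b, K)
    hi = karatsuba_sub(ah, bh, base)             # Ah Bh
    lo = karatsuba_sub(al, bl, base)             # Al Bl
    da, db = ah - al, bh - bl
    sign = 1 if (da >= 0) == (db >= 0) else -1   # sign of (Ah-Al)(Bh-Bl)
    mid = sign * karatsuba_sub(abs(da), abs(db), base)
    return hi * K * K + (hi + lo - mid) * K + lo
```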
The Karatsuba approach instead transforms this into

  1620984393738118 · 3656235198324125
    = (16209843·100000000 + 93738118) · (36562351·100000000 + 98324125)
    = (16209843·36562351)·10000000000000000
      + ( (16209843+93738118)·(36562351+98324125)
          - 16209843·36562351 - 93738118·98324125 ) · 100000000
      + 93738118·98324125
    = (16209843·36562351)·10000000000000000
      + ( 109947961·134886476 - 16209843·36562351 - 93738118·98324125 ) · 100000000
      + 93738118·98324125
    = (recurse for 16209843·36562351, giving 592669969420893)
      (recurse for 109947961·134886476, giving 14830493002675436)
      (recurse for 93738118·98324125, giving 9216718431496750)
      5926699694208930000000000000000
      + (14830493002675436 - 592669969420893 - 9216718431496750) · 100000000
      + 9216718431496750
    = 5926699694208930000000000000000
      + 502110460175779300000000
      + 9216718431496750
    = 5926700196319399392497731496750

where only three half-size multiplications are called for (though two of them have their results used twice each). In the "use signs instead of an extra digit" form, using [6] instead of [5], it's

  1620984393738118 · 3656235198324125
    = (16209843·100000000 + 93738118) · (36562351·100000000 + 98324125)
    = (16209843·36562351)·10000000000000000
      + ( 16209843·36562351 + 93738118·98324125
          - (16209843-93738118)·(36562351-98324125) ) · 100000000
      + 93738118·98324125
    = (16209843·36562351)·10000000000000000
      + ( 16209843·36562351 + 93738118·98324125
          - -77528275·-61761774 ) · 100000000
      + 93738118·98324125
    = (recurse for 16209843·36562351, giving 592669969420893)
      (recurse for 77528275·61761774, giving 4788283799159850)
      (recurse for 93738118·98324125, giving 9216718431496750)
      5926699694208930000000000000000
      + (592669969420893 + 9216718431496750 - 4788283799159850) · 100000000
      + 9216718431496750
    = 5926699694208930000000000000000
      + 502110460175779300000000
      + 9216718431496750
    = 5926700196319399392497731496750

It is possible to split A and B into more than two pieces and pull basically the same trick, leading to an even further reduction in
the exponent; this is Toom-Cook multiplication. However, which partial products are needed and how to combine them get correspondingly more complicated. While the larger splitting factors do give asymptotically faster algorithms, their overhead is high enough that the point at which they become faster in practice rapidly exceeds the sizes of numbers this program is intended for; it's not worth the bother of going Toom-Cook (and definitely not worth Schönhage-Strassen or Fürer's algorithm).
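Returning to the two-way split: the combination step of the [6]-form worked example above can be checked mechanically with ordinary integer arithmetic (Python shown purely for illustration):

```python
# Reproduce the [6]-form combination from the worked example above.
K = 10 ** 8
ah, al = 16209843, 93738118      # halves of 1620984393738118
bh, bl = 36562351, 98324125      # halves of 3656235198324125

hi = ah * bh                     # 592669969420893
lo = al * bl                     # 9216718431496750
mid = (ah - al) * (bh - bl)      # (-77528275)·(-61761774) = 4788283799159850

product = hi * K * K + (hi + lo - mid) * K + lo
assert product == 1620984393738118 * 3656235198324125
# product is 5926700196319399392497731496750
```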