Martin's Blog

Fast elliptic curve arithmetic in Java

Having an efficient implementation of finite field arithmetic is an important prerequisite in elliptic curve systems because curve operations are performed using arithmetic operations in the underlying field.

Three kinds of finite fields are typically used:

A field is a set of elements along with the usual operations +, -, ×, and /. If the set is finite the field is called a finite field.

Here, we are mostly concerned with prime fields. Given a prime p > 3 the elements in the field are the numbers 0 to p-1 and where the operations are performed modulo p. This means that all operations end by dividing the result with p and only using the remainder; this operation is also called reduction modulo p.

Prime field arithmetic in Java

In Java prime field arithmetic can be implemented using the BigInteger class. However, this class has a number of undesirable properties:

It would therefore be desirable to implement prime field arithmetic without using the BigInteger class.

Custom field arithmetic in Java

We assume that the implementation platform has a W-bit architecture where W is a multiple of 8. Workstations are commonly 64- or 32-bit architectures. Low-power or inexpensive components may have smaller W, for example, some embedded systems are 16-bit and smartcards may have W = 8. The bits of a W-bit word U are numbered from 0 to W −1, with the rightmost bit of U designated as bit 0.

The Java JVM supports the types byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit, and 64- bit signed two’s-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing Unicode characters. The JVM itself is a big-endian machine.

The base class could be implemented like this:

 1final class BN {
 2    int signum; // sign; -1, 0, or 1
 3    int[] digits; // digits; msb representation, radix 32
 4    static void add (BN r, BN a, BN b); // r = a + b
 5    static void addp (BN r, BN a, BN b, BN p); // r = a + b mod p
 6    static void sub (BN r, BN a, BN b); // r = a – b
 7    static void subp (BN r, BN a, BN b, BN p); // r = a – b mod p
 8    static void mul (BN r, BN a, BN b); // r = a * b
 9    static void invMul (BN r, BN a, BN b, BN p); // r = a * b^-1 mod p
10    static void pow (BN r, int exponent); // r = r^exponent
11    static void powp (BN r, int exponent, BN p); // r = r^exponent mod p
12    static void shl (BN r, int n); // r = r << n
13    static void shr (BN r, int n); // r = r >> n
14    static void negate (BN r); // r = -r
15}

Note that this class is mutable.

Using the 64 bit integer type, most operations are implemented easily:

1static void add (BN r, BN a, BN b) {
2    long sum = 0L;
3    int idx = a.digits.length;
4    while (--idx >= 0) {
5        sum = (a.digits[idx]&UINT) + (b.digits[idx]&UINT) + (sum>>>32);
6        r.digits[idx] = (int)sum;
7    }
8}

and

1public static void addp (BN r, BN a, BN b, BN p) {
2    add(r,a,b);
3    if (compareTo(r,p) == 1) // r > p
4        sub(r,r,p);
5}

The following sections dive into point multiplication and several other important aspects of elliptic curve arithmetic.

Point multiplication

The performance of any ECC scheme depends on scalar multiplication:

Q = kP = P + P + … + P

where a point P is added to itself k times.

This computation may be accelerated by:

Point representation

The choice of point representation affects the number of expensive field operations, particularly inversions. Common representations include:

Scalar representation

The representation of the scalar k should be chosen in such as a way that the number of add’s are minimized during scalar multiplication. The possibilities include:

In addition, various time–memory tradeoffs can improve performance by using precomputation:

Point representation

If inversion is expensive in the underlying field (which is typically the case), it is advantageous to use projective coordinates. In Java, BigInteger.modInverse is approximately 70 times more expensive than the next most costly operation.

The following table summarizes the relative costs of common coordinate systems:

Double Cost Addition Cost Mixed Cost
2A → A 1I, 2M, 2S A + A 1I, 2M, 1S J + A → J 8M, 3S
2P → P 7M, 3S P + P 12M, 2S J + C → J 11M, 3S
2J → J 4M, 4S J + J 12M, 4S C + A → C 8M, 3S
2C → C 5M, 4S C + C 11M, 3S

A = affine, P = standard projective, J = Jacobian projective, C = Chudnovsky, I = inversion, M = multiplication, S = squaring.

Modular inversion

The two most commonly used ways to compute the inverse of an integer a^-1 mod p are:

  1. Fermat’s little theorem: a^-1 mod p = a^{p-2} mod p.
  2. Euclid’s extended algorithm: ax + py = gcd(a,p) = 1. In this case x is the multiplicative inverse of p.

Experiments seem to suggest that 1. is the fastest for integers less than ~ 512 bit. And this is in fact the case for all NIST prime fields.

Implementation

The initial implementation will be based on the following:

We want to measure the raw performance of the implementation and compare it against OpenSSL without any precomputations. ECDH is a straightforward benchmark for this purpose, as it validates that the underlying algorithms are implemented efficiently. Additionally, we expect IBM’s JVM to outperform Sun’s in these tests.

Final remarks

References

  1. Guide to Elliptic Curve Cryptography, Hankerson, Menezes and Vanstone
  2. Ellipic Curve Cryptography, SECG, http://www.secg.org/download/aid-385/sec1_final.pdf
  3. Recommended Elliptic Curve Domain Parameters, SECG, http://www.secg.org/download/aid-386/sec2_final.pdf
  4. ANSI X9.62-1998
  5. ANSI X9.63-2001
  6. IEEE P1363-2000