Fast elliptic curve arithmetic in Java

22 Feb, 2006

Having an efficient implementation of finite field arithmetic is an important prerequisite in elliptic curve systems because curve operations are performed using arithmetic operations in the underlying field.

Three kinds of finite fields are typically used:

Prime fields
Binary fields
Optimal extension fields

A field is a set of elements along with the usual operations +, -, ×, and /. If the set is finite the field is called a finite field.

Here, we are mostly concerned with prime fields. Given a prime p > 3 the elements in the field are the numbers 0 to p-1 and where the operations are performed modulo p. This means that all operations end by dividing the result with p and only using the remainder; this operation is also called reduction modulo p.

Prime field arithmetic in Java

In Java prime field arithmetic can be implemented using the BigInteger class. However, this class has a number of undesirable properties:

All instances are immutable: new objects are always allocated to hold the result of an operation.
The class implements arbitrary precision arithmetic: when e.g. multiplying two 192 bit integers you may get a 384 integer. However, since we always must end an operation with modulo p this precision is not needed: we only need precision up to the size of p in bits.
It is not possible to perform certain fast reductions (ref Solinas) since we are not allowed access to the BigInteger class’ internal representation.

It would therefore be desirable to implement prime field arithmetic without using the BigInteger class.

Custom field arithmetic in Java

We assume that the implementation platform has a W-bit architecture where W is a multiple of 8. Workstations are commonly 64- or 32-bit architectures. Low-power or inexpensive components may have smaller W, for example, some embedded systems are 16-bit and smartcards may have W = 8. The bits of a W-bit word U are numbered from 0 to W −1, with the rightmost bit of U designated as bit 0.

The Java JVM supports the types byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit, and 64- bit signed two’s-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing Unicode characters. The JVM itself is a big-endian machine.

The base class could be implemented like this:

 1final class BN {
 2    int signum; // sign; -1, 0, or 1
 3    int[] digits; // digits; msb representation, radix 32
 4    static void add (BN r, BN a, BN b); // r = a + b
 5    static void addp (BN r, BN a, BN b, BN p); // r = a + b mod p
 6    static void sub (BN r, BN a, BN b); // r = a – b
 7    static void subp (BN r, BN a, BN b, BN p); // r = a – b mod p
 8    static void mul (BN r, BN a, BN b); // r = a * b
 9    static void invMul (BN r, BN a, BN b, BN p); // r = a * b^-1 mod p
10    static void pow (BN r, int exponent); // r = r^exponent
11    static void powp (BN r, int exponent, BN p); // r = r^exponent mod p
12    static void shl (BN r, int n); // r = r << n
13    static void shr (BN r, int n); // r = r >> n
14    static void negate (BN r); // r = -r
15}

Note that this class is mutable.

Using the 64 bit integer type, most operations are implemented easily:

1static void add (BN r, BN a, BN b) {
2    long sum = 0L;
3    int idx = a.digits.length;
4    while (--idx >= 0) {
5        sum = (a.digits[idx]&UINT) + (b.digits[idx]&UINT) + (sum>>>32);
6        r.digits[idx] = (int)sum;
7    }
8}

and

1public static void addp (BN r, BN a, BN b, BN p) {
2    add(r,a,b);
3    if (compareTo(r,p) == 1) // r > p
4        sub(r,r,p);
5}

The following sections dive into point multiplication and several other important aspects of elliptic curve arithmetic.

Point multiplication

The performance of any ECC scheme depends on scalar multiplication:

Q = kP = P + P + … + P

where a point P is added to itself k times.

This computation may be accelerated by:

Choosing an efficient point representation
Choosing an efficient scalar representation

Point representation

The choice of point representation affects the number of expensive field operations, particularly inversions. Common representations include:

Affine coordinates
Standard projective coordinates
Jacobian projective coordinates
Chudnovsky coordinates

Scalar representation

The representation of the scalar k should be chosen in such as a way that the number of add’s are minimized during scalar multiplication. The possibilities include:

Binary
Non-adjacent form (NAF)
- Binary
- Window

In addition, various time–memory tradeoffs can improve performance by using precomputation:

When the base point is fixed
When the scalar is fixed

Point representation

If inversion is expensive in the underlying field (which is typically the case), it is advantageous to use projective coordinates. In Java, BigInteger.modInverse is approximately 70 times more expensive than the next most costly operation.

The following table summarizes the relative costs of common coordinate systems:

Double	Cost	Addition	Cost	Mixed	Cost
2A → A	1I, 2M, 2S	A + A	1I, 2M, 1S	J + A → J	8M, 3S
2P → P	7M, 3S	P + P	12M, 2S	J + C → J	11M, 3S
2J → J	4M, 4S	J + J	12M, 4S	C + A → C	8M, 3S
2C → C	5M, 4S	C + C	11M, 3S

A = affine, P = standard projective, J = Jacobian projective, C = Chudnovsky, I = inversion, M = multiplication, S = squaring.

Modular inversion

The two most commonly used ways to compute the inverse of an integer a^-1 mod p are:

Fermat’s little theorem: a^-1 mod p = a^{p-2} mod p.
Euclid’s extended algorithm: ax + py = gcd(a,p) = 1. In this case x is the multiplicative inverse of p.

Experiments seem to suggest that 1. is the fastest for integers less than ~ 512 bit. And this is in fact the case for all NIST prime fields.

Implementation

The initial implementation will be based on the following:

Affine coordinates: plain group laws from [1], page 80 for adding points.
Mixed coordinates: example 3.20 from [1], page 88-89 for adding a point in projective Jacobian coordinates with a point in affine coordinates.
Projective Jacobian coordinates: example 3.20 from [1], page 88-89 for doubling.
NAF computation: algorithm 3.30 from [1], page 98 for computing the NAF representation of a scalar.
Multiply: algorithm 3.31 from [1], pages 98-99 for performing scalar multiplication using a NAF.

We want to measure the raw performance of the implementation and compare it against OpenSSL without any precomputations. ECDH is a straightforward benchmark for this purpose, as it validates that the underlying algorithms are implemented efficiently. Additionally, we expect IBM’s JVM to outperform Sun’s in these tests.

Final remarks

ECC scales better than existing schemes
- However, signature verification is more expensive than signature generation
  - And certainly more expensive than RSA
- Performance of basic ECC over a prime field at small key lengths is comparable to RSA
  - But gains at larger key lengths
- Performance is highly dedendent on the JVM
Perhaps more code than RSA
- Keys are decoupled from the crypto system leading to more complicated key handling
  - RSA keys carry this information with them, i.e. (n,e) specifies both key and crypto system
- Parameters should be verified before use
When to use ECC
- When the security level required makes RSA infeasible
- When the performance of signature / key generation matters

References

Guide to Elliptic Curve Cryptography, Hankerson, Menezes and Vanstone
Ellipic Curve Cryptography, SECG, http://www.secg.org/download/aid-385/sec1_final.pdf
Recommended Elliptic Curve Domain Parameters, SECG, http://www.secg.org/download/aid-386/sec2_final.pdf
ANSI X9.62-1998
ANSI X9.63-2001
IEEE P1363-2000