Fast elliptic curve arithmetic in Java
Having an efficient implementation of finite field arithmetic is an important prerequisite in elliptic curve systems because curve operations are performed using arithmetic operations in the underlying field.
Three kinds of finite fields are typically used:
- Prime fields
- Binary fields
- Optimal extension fields
A field is a set of elements along with the usual operations +, -, ×, and /. If the set is finite the field is called a finite field.
Here, we are mostly concerned with prime fields. Given a prime p > 3 the elements in the field are the numbers 0 to p-1 and where the operations are performed modulo p. This means that all operations end by dividing the result with p and only using the remainder; this operation is also called reduction modulo p.
Prime field arithmetic in Java
In Java prime field arithmetic can be implemented using the BigInteger class.
However, this class has a number of undesirable properties:
- All instances are immutable: new objects are always allocated to hold the result of an operation.
- The class implements arbitrary precision arithmetic: when e.g. multiplying two 192 bit integers you may get a 384 integer. However, since we always must end an operation with modulo p this precision is not needed: we only need precision up to the size of p in bits.
- It is not possible to perform certain fast reductions (ref Solinas) since we
are not allowed access to the
BigIntegerclass’ internal representation.
It would therefore be desirable to implement prime field arithmetic without
using the BigInteger class.
Custom field arithmetic in Java
We assume that the implementation platform has a W-bit architecture where W is a multiple of 8. Workstations are commonly 64- or 32-bit architectures. Low-power or inexpensive components may have smaller W, for example, some embedded systems are 16-bit and smartcards may have W = 8. The bits of a W-bit word U are numbered from 0 to W −1, with the rightmost bit of U designated as bit 0.
The Java JVM supports the types byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit, and 64- bit signed two’s-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing Unicode characters. The JVM itself is a big-endian machine.
The base class could be implemented like this:
1final class BN {
2 int signum; // sign; -1, 0, or 1
3 int[] digits; // digits; msb representation, radix 32
4 static void add (BN r, BN a, BN b); // r = a + b
5 static void addp (BN r, BN a, BN b, BN p); // r = a + b mod p
6 static void sub (BN r, BN a, BN b); // r = a – b
7 static void subp (BN r, BN a, BN b, BN p); // r = a – b mod p
8 static void mul (BN r, BN a, BN b); // r = a * b
9 static void invMul (BN r, BN a, BN b, BN p); // r = a * b^-1 mod p
10 static void pow (BN r, int exponent); // r = r^exponent
11 static void powp (BN r, int exponent, BN p); // r = r^exponent mod p
12 static void shl (BN r, int n); // r = r << n
13 static void shr (BN r, int n); // r = r >> n
14 static void negate (BN r); // r = -r
15}Note that this class is mutable.
Using the 64 bit integer type, most operations are implemented easily:
1static void add (BN r, BN a, BN b) {
2 long sum = 0L;
3 int idx = a.digits.length;
4 while (--idx >= 0) {
5 sum = (a.digits[idx]&UINT) + (b.digits[idx]&UINT) + (sum>>>32);
6 r.digits[idx] = (int)sum;
7 }
8}and
1public static void addp (BN r, BN a, BN b, BN p) {
2 add(r,a,b);
3 if (compareTo(r,p) == 1) // r > p
4 sub(r,r,p);
5}The following sections dive into point multiplication and several other important aspects of elliptic curve arithmetic.
Point multiplication
The performance of any ECC scheme depends on scalar multiplication:
Q = kP = P + P + … + P
where a point P is added to itself k times.
This computation may be accelerated by:
- Choosing an efficient point representation
- Choosing an efficient scalar representation
Point representation
The choice of point representation affects the number of expensive field operations, particularly inversions. Common representations include:
- Affine coordinates
- Standard projective coordinates
- Jacobian projective coordinates
- Chudnovsky coordinates
Scalar representation
The representation of the scalar k should be chosen in such as a way that the number of add’s are minimized during scalar multiplication. The possibilities include:
- Binary
- Non-adjacent form (NAF)
- Binary
- Window
In addition, various time–memory tradeoffs can improve performance by using precomputation:
- When the base point is fixed
- When the scalar is fixed
Point representation
If inversion is expensive in the underlying field (which is typically the case),
it is advantageous to use projective coordinates. In Java,
BigInteger.modInverse is approximately 70 times more expensive than the next
most costly operation.
The following table summarizes the relative costs of common coordinate systems:
| Double | Cost | Addition | Cost | Mixed | Cost |
|---|---|---|---|---|---|
| 2A → A | 1I, 2M, 2S | A + A | 1I, 2M, 1S | J + A → J | 8M, 3S |
| 2P → P | 7M, 3S | P + P | 12M, 2S | J + C → J | 11M, 3S |
| 2J → J | 4M, 4S | J + J | 12M, 4S | C + A → C | 8M, 3S |
| 2C → C | 5M, 4S | C + C | 11M, 3S |
A = affine, P = standard projective, J = Jacobian projective, C = Chudnovsky, I = inversion, M = multiplication, S = squaring.
Modular inversion
The two most commonly used ways to compute the inverse of an integer a^-1 mod p are:
- Fermat’s little theorem: a^-1 mod p = a^{p-2} mod p.
- Euclid’s extended algorithm: ax + py = gcd(a,p) = 1. In this case x is the multiplicative inverse of p.
Experiments seem to suggest that 1. is the fastest for integers less than ~ 512 bit. And this is in fact the case for all NIST prime fields.
Implementation
The initial implementation will be based on the following:
- Affine coordinates: plain group laws from [1], page 80 for adding points.
- Mixed coordinates: example 3.20 from [1], page 88-89 for adding a point in projective Jacobian coordinates with a point in affine coordinates.
- Projective Jacobian coordinates: example 3.20 from [1], page 88-89 for doubling.
- NAF computation: algorithm 3.30 from [1], page 98 for computing the NAF representation of a scalar.
- Multiply: algorithm 3.31 from [1], pages 98-99 for performing scalar multiplication using a NAF.
We want to measure the raw performance of the implementation and compare it against OpenSSL without any precomputations. ECDH is a straightforward benchmark for this purpose, as it validates that the underlying algorithms are implemented efficiently. Additionally, we expect IBM’s JVM to outperform Sun’s in these tests.
Final remarks
- ECC scales better than existing schemes
- However, signature verification is more expensive than signature generation
- And certainly more expensive than RSA
- Performance of basic ECC over a prime field at small key lengths is
comparable to RSA
- But gains at larger key lengths
- Performance is highly dedendent on the JVM
- However, signature verification is more expensive than signature generation
- Perhaps more code than RSA
- Keys are decoupled from the crypto system leading to more complicated key
handling
- RSA keys carry this information with them, i.e. (n,e) specifies both key and crypto system
- Parameters should be verified before use
- Keys are decoupled from the crypto system leading to more complicated key
handling
- When to use ECC
- When the security level required makes RSA infeasible
- When the performance of signature / key generation matters
References
- Guide to Elliptic Curve Cryptography, Hankerson, Menezes and Vanstone
- Ellipic Curve Cryptography, SECG, http://www.secg.org/download/aid-385/sec1_final.pdf
- Recommended Elliptic Curve Domain Parameters, SECG, http://www.secg.org/download/aid-386/sec2_final.pdf
- ANSI X9.62-1998
- ANSI X9.63-2001
- IEEE P1363-2000