Java Numeric Computing: Performance Discussion

haki · Sep 18, 2006

the purpose of this thread is to be a placeholder for discussion about the performance of Java in Numerical Computations.

Started as a discussion between CRGreathouse and me about weather or not Java can be used effectively in doing computations.

The plan we have:

Phase 1:

Solve a computational problem in

o)Java using BigInteger and test it.

o)in C++ using GMP and test it.

And then compare the results.

Phase 2:

TBD.

haki · Sep 18, 2006

The Java program for testing,

trying to calculate a contradiction for the Dysons hipothesis,that the reverse of a power of 2 number is never a power of five number,sounds like a easy and a good way to test Java and the performance of BigInteger

http://www.edge.org/q2005/q05_9.html

Code:

import java.math.BigInteger;

public class Dyson
{
    private static BigInteger five = new BigInteger("5");

    private static BigInteger two = new BigInteger("2");

    private static BigInteger starter = new BigInteger("2");

    public static void main(String[] args)
    {

	long start = System.currentTimeMillis();
	for (int i = 0; i < 5e4; i++)
	{
	    starter = starter.multiply(two);
	    if (isPowerOfFive(reverseBigInteger(starter)))
	    {
		System.out.println("HUREKA!");
		System.out.println("Found a power of five!");
		System.out.println(starter);
	    }

	}
	long end = System.currentTimeMillis();

	System.out.println("It took " + (end - start)
		+ "to test first 5e4 reverses");

    }

    public static boolean isPowerOfFive(BigInteger bi)
    {
	if (bi.mod(five).compareTo(BigInteger.ZERO) != 0)
	{
	    return false;
	}

	while (bi.compareTo(five) != 0)
	{
	    bi = bi.divide(five);
	    if (bi.mod(five).compareTo(BigInteger.ZERO) != 0)
	    {
		return false;
	    }

	}

	return true;
    }

    public static BigInteger reverseBigInteger(BigInteger integer)
    {

	return new BigInteger(reverseString(integer.toString()));

    }

    public static String reverseString(String toReverse)
    {

	char[] out = new char[toReverse.length()];

	for (int i = 0, j = out.length - 1; i < out.length; i++, j--)
	{
	    out[i] = toReverse.charAt(j);
	}

	return new String(out);
    }
}

chroot · Sep 18, 2006

I can already predict the answer: it'll be nearly a tie, but C++ will win by a small margin.

Java's BigInteger is based on a popular C++ library, bn if memory serves me right. Since compiled Java just-in-time compiled bytecode has only a tiny penalty over native machine code, and because bn is likely a little less optimized than GMP, I suspect the Java version of the code will fall with 5% of the performance of the C++ version.

- Warren

CRGreathouse · Sep 18, 2006

I ran that Java code on my machine, and it ran in 5,279,141 milliseconds -- about an hour and a half. How long does it take for you?

As a note on the algorithm, I'm writing a C# program designed for this task in particular. I'm not using a bignum, but a custom class that's efficient for this task in particular. (It ues shifts, uses an efficient check for overflow based on the fact that the numbers are increasing slowly, stores numbers in a format that is easy to convert to decimal, etc.) It took 0.69 seconds to check up to 50,000. Note that my C# code is incomplete, and techically only valid (at the moment) up to 2,000,000 (which takes a little over a quarter-hour).

CRGreathouse · Sep 18, 2006

I'm having trouble properly setting up GMP for C++, so as a stopgap here's a GP program modeled after your Java program. GP uses GMP as a back-end, and it's certainly not as fast as C++.

Code:

Dyson(mx=50000)={
	local(n);
	n = 1;
	for(i=1,mx,
		n = n << 1;
		if (isPowerOfFive(reverse(n)), print("Found something at i = "i));
	);
}

isPowerOfFive(n)={
	while(n%5==0,
		n = n \ 5;
	);
	n == 1
}

reverse(n)={
	local(a);
	a=0;
	while(n > 0,
		a = a * 10 + n%10;
		n = n \ 10;
	);
	a
}

Edit: I'm toying with the code to reduce the number of levels of recursion, since GP is bad at that. There's no timing code because that's built into PARI which I'm using as a front-end.

CRGreathouse · Sep 19, 2006

Unfortunately, PARI/GP is interpreted, which means it's not a good speed comparison here. It's actually quite slow with this code -- it has no way to manipulate strings (that I know about), so it's spending most of its time doing modular arithmetic to calculate the reverse of the number. It's not even going to be as fast as Java unless I can find a better way to reverse numbers quickly.

shmoe · Sep 19, 2006

I couldn't find a way to manipulate strings in pari/gp either. You can convert a string to a vector and do your manipulations, but this seems to be a horrible memory hog and I ran into problems when the exponents hit 10^5 or so (it was faster in the lower ranges though).

You can avoid having to reverse most of the numbers by just looking at the first few digits of your power of 2. The last digit of a power of 5 is "5", the last two are "25", the last three are "125" or "625", and so on. In general there are 2^(n-2) possibilities for the last n digits (this is assuming your power of 5 is larger than 10^n), so you can quickly eliminate the need to reverse a good chunk by checking the first few digits of your power of 2 with a small list. This modified algorithm would of course change what you are comparing between the languages.

CRGreathouse · Sep 19, 2006

shmoe said:

You can avoid having to reverse most of the numbers by just looking at the first few digits of your power of 2. The last digit of a power of 5 is "5", the last two are "25", the last three are "125" or "625", and so on. In general there are 2^(n-2) possibilities for the last n digits (this is assuming your power of 5 is larger than 10^n), so you can quickly eliminate the need to reverse a good chunk by checking the first few digits of your power of 2 with a small list. This modified algorithm would of course change what you are comparing between the languages.

This is one of the reasons that my C# program takes less than a second to do what the base program does in ~87 minutes.

shmoe · Sep 19, 2006

CRGreathouse said:

This is one of the reasons that my C# program takes less than a second to do what the base program does in ~87 minutes.

Ah, I had thought you were using the same algorithm in C# as the Java one above to compare the languages.

CRGreathouse · Sep 19, 2006

shmoe said:

Ah, I had thought you were using the same algorithm in C# as the Java one above to compare the languages.

The idea was to use the same algorithm in C++ to compare those, but I happened to also write one (in C#) that was designed for efficiency. It has no real bearing on the others.

PARI/GP just seems to be less efficient, more's the pity. That's often the case, sadly.

CRGreathouse · Sep 20, 2006

The C# version still isn't quite done, but it works correctly now up to 8 million. This takes 17.3 ks, or about 4 hours 49 minutes.

CRGreathouse · Sep 20, 2006

See this link for discussion of practical use of bignums in Java:
http://www.luschny.de/math/factorial/conclusions.html

nmtim · Sep 21, 2006

Couple questions:

1. Why evaluate every power of two? The only ones that will be candidates are ones that begin with the digit 5. That looks like only every 9th or 10th power of 2. (I'm too dumb to count.)

2. Big integers--bit of a niche market? Why not look at floating point performance?

3. What fraction of time is being spent on string manipulation?

4. What time are you measuring? Wall clock? User or CPU time? How quiet is the test system? What is the test system?

Cheers,
Tim

CRGreathouse · Sep 21, 2006

1. Because the first code posted tested every one, so anything else would have to do the same to be comparable.

2. Testing bignums is the purpose of this topic. Anything else would be irrelevant.

3. Surely over 99.9%. Compare my C# program's time, which is designed to minimize time spent in string manipulation.

4. System time. We're assuming the process is the only one using the CPU (or in my case, the only process using the core).

I don't have the specs handy at the moment, but I do have the specs for boh systems, as I recall. Mine has 1.66 GHz per core (I'm only using one core for the test) and effectively unlimited memory.

CRGreathouse · Sep 22, 2006

Well, further testing has shown that the algorithm in my C# program really is only good to 10,000,000. I found one point it can't decide on: 10,683,363.* This is the only point it has trouble on below 12 million -- I'm working on testing up to 16 million now.

Here's a table of timing information. The number on the left is the size to check up to, in millions; the number on the right is the number of seconds.

0.05 0.66
1 253.78
2 956.38
4 3591.09
8 17326.44
12 37802.53
16 67328 (est.)

As you can see the algorithm is clearly quadratic.

* 2^10683363 = 5218752935731...84919808, with a total of 3,216,013 digits. Its reverse is divisible by 5⁹ but not by 5¹⁰.

haki · Oct 9, 2006

hey there,

been busy with studies, it is very hard to pass when the math prof' thinks that "--5" is a valid arithmetic expression.

Anyway, I have done a small tweak to the BigInteger class, namely I got rid of the toString method to access the int[] directly, that is not the standard OOP practice - encapsulation(you should hide the fields and use setters and getters) but oddly the only way to get the value is either with the toString() method or with the toByteArray(), neither do the trick, I don't want to get the String representation nor do I fancy for a two's-complement representation of the BigInteger.

Here is the revised code;

Code:

public class Dyson {
    private static BigInteger five = new BigInteger("5");

    private static BigInteger two = new BigInteger("2");

    private static BigInteger starter = new BigInteger("2");

    public static void main(String[] args) {

	long start = System.currentTimeMillis();
	for (int i = 0; i < 5e4; i++) {
	    starter = starter.multiply(two);
	    if (isPowerOfFive(reverseBigInteger(starter))) {
		System.out.println("HUREKA!");
		System.out.println("Found a power of five!");
		System.out.println(starter);
	    }

	}
	long end = System.currentTimeMillis();

	System.out.println("It took " + (end - start) + "to test first 5e4 reverses");

    }

    public static boolean isPowerOfFive(BigInteger bi) {
	if (bi.mod(five).compareTo(BigInteger.ZERO) != 0) {
	    return false;
	}

	BigInteger temp = new BigInteger(bi.mag);
	while (temp.compareTo(five) != 0) {
	    temp = temp.divide(five);
	    if (temp.mod(five).compareTo(BigInteger.ZERO) != 0) {
		return false;
	    }

	}

	return true;
    }

    public static BigInteger reverseBigInteger(BigInteger integer) {

	return new BigInteger(reverseIntArray(integer.mag));

    }

    public static int[] reverseIntArray(int[] array) {

	int[] returnArray = new int[array.length];

	for (int i = 0; i < returnArray.length; i++) {
	    returnArray[i] = reverseNumber(array[returnArray.length - 1 - i]);
	}

	return returnArray;
    }

    public static int reverseNumber(int arg) {
	int i = 0;
	int a = 0;
	while (arg != 0) {
	    i = arg % 10;
	    arg = arg / 10;
	    a = (a * 10) + i;
	}
	return a;
    }
}

Note the BigInteger is modified, namely the mag field is public and the BigInteger(int[] mag) constructor is as well public.

Now it takes about 2,2 seconds(access to the array - no toString()) rather than hours to do the first 50k iterations.

Iterations(successive powers of 2), time taken
5e3, 62ms
5e4, 2343ms
5e5, 244281ms

My conclusion is that out-of-the-box BigInteger is not very good, namely it should if nothing else has method that modifies the instance itself rather then create a new object, e.g.

BigInteger two = new BigInteger("2");
BigInteger three = new BigInteger("3");

now if I wan't to do two * three I would prefer to do it this way

two.multiply(three); now the instance two will get modified! after the call the value of two should be 6 but unfortuately it doesn't work that way a new Objects gets created with the value of 6 and the two instance is unchanged, this sounds like a good OOP idea, but doing 5e5 Iteration that means that 5e5 objects are going to be created and garbage collected! That makes very little sense.

I was surprised to find out that there are some other "hidden" classes in the standard library namely MutableBigInteger, SignedMutableBigInteger.

If the BigInteger were implemented better and less OOP I think it should still be able to cut the mustard, but I admit that BigInteger is slow and inefficient.

But I think that array operations in Java should be of similar performace of that in C/C++ therefore I still think that one could write a fast and efficient BigInteger class.

Thank you for the link to http://www.luschny.de/math/factorial/conclusions.html I will take a look at the LargeInteger.java looks interesting. Porhaps if I have some time I will write my own BigInteger class just for practice, I never get tired of Java

haki · Oct 22, 2006

One interesting find

http://jscience.org/api/org/jscience/mathematics/numbers/LargeInteger.html

from JScience.org uses

http://javolution.org/

will see how this benchmark application will preform with the usage of JScience and Javolution, to come.

Hurkyl · Oct 22, 2006

I was surprised to find out that there are some other "hidden" classes in the standard library namely MutableBigInteger, SignedMutableBigInteger.

If they're hidden, they're probably not standard! You have no guarantee they will even exist, let alone behave the same, in other implementations of java, or in future upgrades.

But I think that array operations in Java should be of similar performace of that in C/C++ therefore I still think that one could write a fast and efficient BigInteger class.

I'm under the opposite impression -- Java insists on doing bounds checking on every array access, so it's necessarily slower. If you do little work per array access, the difference will stand out.

Anyways, you should test it out first -- write programs that do lots of array accesses (and very simple computations) to compare speed. Maybe modern compilers are allowed to, and can, optimize away the bounds checking.

Java Numeric Computing: Performance Discussion

1. What is Java Numeric Computing?

2. Why is performance important in Java Numeric Computing?

3. How can performance be improved in Java Numeric Computing?

4. Are there any limitations to Java Numeric Computing performance?

5. How does Java compare to other programming languages in terms of Numeric Computing performance?

Similar threads

Hot Threads

Recent Insights