TylerH said:
It's open. That's the reason I bought it over some existing GPU. GPUs are a black box; you can't do anything with them except go through the provided APIs, short of some serious reverse engineering.
To be clear, I'm not criticizing anybody who can get something done and deliver a product to market.
Now, I'm not sure I understand this "open." Their silicon comes with documentation and you write code to use it. Nvidia's comes with documentation and you write code to use it. Neither one of them lets you change the silicon just because you don't like that it uses floating point instead of integer. They both seem to be a bunch of processors in silicon with documentation, and you write code to drive the silicon.
TylerH said:
I think the point, rather than having a bunch of 1024 bit integer CPUs (that would be big and consume a lot of power).
The 1024-bit part is really a separate issue, and since you can't buy it, perhaps it doesn't matter. All I was trying to say is that if I could get the same integer performance that they provide as floating-point performance, this would have lots of applications beyond graphics and game programming. The 660 Ti delivers 2500 gigaflops. If there were an integer part that delivered 2500*10^9 big-integer add, sub, mul, div, and mod operations per second (is that 2500 gips? :), that would be really interesting. Even if, because the 1024 bits are 16 or 32 times wider, it ran 16 or 32 times slower and only gave 80*10^9 integer operations per second, with support for multiple precision, there are lots of problems where doing 80 billion big-integer calculations a second would be worth buying hardware for. Integer should be easier, too; you don't need to do all those convoluted things to exactly match the IEEE 754 floating-point standard. But this is mostly off topic because I can't buy these.
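To make "support for multiple precision" concrete, here is a minimal sketch (in Python, purely for illustration; the function names are my own, not any vendor's API) of how a 1024-bit add is built from fixed-width limbs with carry propagation. This limb-and-carry pattern is exactly the primitive a wide-integer part would need to accelerate:

```python
# Hypothetical sketch: a 1024-bit integer add built from 32-bit limbs.
# This is the software shape of the operation; dedicated hardware would
# do the whole carry chain in one instruction instead of a loop.

LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1
NUM_LIMBS = 1024 // LIMB_BITS  # 32 limbs per 1024-bit value

def to_limbs(x):
    """Split a non-negative integer into NUM_LIMBS little-endian 32-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(NUM_LIMBS)]

def from_limbs(limbs):
    """Reassemble little-endian limbs back into a Python integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_1024(a_limbs, b_limbs):
    """Add two 1024-bit values limb by limb, rippling the carry upward."""
    out, carry = [], 0
    for a, b in zip(a_limbs, b_limbs):
        s = a + b + carry
        out.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS
    return out, carry  # carry == 1 means the sum overflowed 1024 bits

# Check the sketch against Python's built-in big integers.
a = (1 << 1000) - 12345
b = (1 << 999) + 67890
sum_limbs, carry = add_1024(to_limbs(a), to_limbs(b))
assert from_limbs(sum_limbs) + (carry << 1024) == a + b
```

The point of the example is the carry chain: every limb depends on the carry out of the limb below it, which is why a part that is 32 limbs wide can plausibly run 16 to 32 times slower per operation, as estimated above.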
TylerH said:
is to have a processor that is relatively small and relatively slow but that scales. It's the on chip network that sets these chips apart from the regular SMP on your CPU. To communicate from one to another, you have to use RAM, which is much slower than their purported on chip communication speed.
Each core in the Nvidia part is relatively small, and really fast, and there are about 1300 of them in there. And it scales: buy two or four of the cards and chain them, just as this open-source project tells you to buy multiple cards and chain them when one card isn't enough. None of this has anything to do with using your CPU for the work; that is thousands of times slower. The Nvidia CUDA cores communicate within the chip and never take thousands of nanoseconds to go off chip just to send a message back on chip.
TylerH said:
I can't access the document either. It says the server can't find it.
Thank you. At least that says it isn't something I'm doing wrong.
If anyone has the time and knowledge to write up a side-by-side performance comparison, please do.
Thank you