# Parallel Computing Project on Kickstarter (computer for $99)

Check this out: http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone. It's a project that aims to create open source hardware for scalable parallel computing. If you pledge $99, you get one of the computers they are building.

In case anyone is wondering how Kickstarter works: if they don't reach the goal of $750,000, you don't get anything, but they don't charge the amount you pledged, either.

## Answers and Replies

Borek (Mentor):

For some reason this makes me think of the Raspberry Pi.

AlephZero (Science Advisor, Homework Helper):

Except the Raspberry Pi actually exists.

Good luck to them, but make sure you understand the risks involved before you get your credit card out. Remember they take your money when they reach the funding target, not when they have finished developing the product. http://www.bbc.co.uk/news/technology-20003916

The first question that comes to my mind is: how does this compare to buying an Nvidia 6xx graphics card, which has lots of parallel cores, does fast floating point, has program development tools available and seems to be roughly the same price? If someone did a side-by-side comparison of these two options, that would be interesting to study.

If some company announced a product with lots of equally fast 1024-bit integer CPUs, that would be even more interesting for some projects. But wide integer CPUs can't be found. The PlayStation Cell processor, the Nvidia cores, these only support floating point calculations. Attempting to emulate wide integer calculations with these slows down performance by an order of magnitude or more.

Can anyone else see the processor reference manual behind the link on the Kickstarter page? Every time I try I get "cannot be displayed."

> The first question that comes to my mind is: how does this compare to buying an Nvidia 6xx graphics card, which has lots of parallel cores, does fast floating point, has program development tools available and seems to be roughly the same price? If someone did a side-by-side comparison of these two options, that would be interesting to study.
>
> If some company announced a product with lots of equally fast 1024-bit integer CPUs, that would be even more interesting for some projects. But wide integer CPUs can't be found. The PlayStation Cell processor, the Nvidia cores, these only support floating point calculations. Attempting to emulate wide integer calculations with these slows down performance by an order of magnitude or more.
>
> Can anyone else see the processor reference manual behind the link on the Kickstarter page? Every time I try I get "cannot be displayed."

It's open. That's the reason I bought it over some existing GPU. GPUs are a black box; you can't do anything with them except through the APIs provided without some serious reverse engineering.

I think the point, rather than having a bunch of 1024-bit integer CPUs (that would be big and consume a lot of power), is to have a processor that is relatively small and relatively slow but that scales. It's the on-chip network that sets these chips apart from the regular SMP on your CPU. To communicate from one to another, you have to use RAM, which is much slower than their purported on-chip communication speed.

I can't access the document either. It says the server can't find it.

> It's open. That's the reason I bought it over some existing GPU. GPUs are a black box; you can't do anything with them except through the APIs provided without some serious reverse engineering.

To be clear, I'm not criticizing anybody who can get something done and deliver a product to market.

Now, I'm not sure I understand this "open." Their silicon comes with documentation and you write code to use it. Nvidia comes with documentation and you write code to use it. Neither one of them lets you change the silicon just because you don't like that it uses floating point instead of integer. They both seem to be a bunch of processors in silicon with documentation, and you write code to drive the silicon.

> I think the point, rather than having a bunch of 1024-bit integer CPUs (that would be big and consume a lot of power)

The 1024-bit part is really a separate issue, and since you can't buy it, perhaps it doesn't matter.
All I was trying to say was that if I could get the same integer performance that they provide as floating point performance, then this would have lots of other applications beyond graphics and game programming. The 660 Ti delivers 2500 gigaflops. If there were an integer part that delivered 2500*10^9 big integer add, sub, mul, div, mod operations per second (is that 2500 gips? :) this would be really interesting. Even if, because 1024 bits is 16 or 32 times wider, it went 16 or 32 times slower and only gave 80*10^9 integer operations per second, with support for multiple precision, there are lots of things where doing 80 billion big integer calculations a second would be worth buying. Integer should be easier; you don't need to do all those convoluted things to exactly match the IEEE 754 floating point math standard. But this is mostly off topic because I can't buy these.

> is to have a processor that is relatively small and relatively slow but that scales. It's the on-chip network that sets these chips apart from the regular SMP on your CPU. To communicate from one to another, you have to use RAM, which is much slower than their purported on-chip communication speed.

Each core in the Nvidia is relatively small (they have 1300 of them in there) and really fast. And it scales: buy two or four of the cards and chain them, just like this open source project tells you to buy multiple cards and chain them when one card isn't enough. None of this has anything to do with using your CPU to do any of this; that is thousands of times slower. The Nvidia CUDA cores communicate within the chip and never take thousands of nanoseconds to go off chip to send a message back on chip.

> I can't access the document either. It says the server can't find it.

Thank you. At least that says it isn't something I'm doing wrong.

If anyone has the time and knowledge to write up a side-by-side performance comparison, then please do so.
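
To make the emulation cost discussed above concrete, here is a minimal Python sketch of adding two 1024-bit integers held as 32 limbs of 32 bits each, the way hardware with only narrow arithmetic would have to do it. The 32-bit limb width and the helper names `to_limbs`, `from_limbs` and `add_wide` are assumptions made for this example; they are not taken from any Nvidia or Adapteva documentation.

```python
LIMB_BITS = 32                       # assumed native word size
LIMBS = 1024 // LIMB_BITS            # 32 limbs per 1024-bit integer
MASK = (1 << LIMB_BITS) - 1


def to_limbs(value):
    """Split a non-negative integer into LIMBS little-endian 32-bit limbs."""
    return [(value >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]


def from_limbs(limbs):
    """Reassemble little-endian limbs into a Python integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_wide(a, b):
    """Add two limb arrays modulo 2**1024, returning (result, narrow_adds)."""
    result, carry, narrow_adds = [], 0, 0
    for x, y in zip(a, b):
        total = x + y + carry        # one narrow add-with-carry per limb
        result.append(total & MASK)
        carry = total >> LIMB_BITS   # carry feeds the next limb
        narrow_adds += 1
    return result, narrow_adds


a = (1 << 1023) + 12345
b = (1 << 700) + 67890
limbs, steps = add_wide(to_limbs(a), to_limbs(b))
print(from_limbs(limbs) == a + b, steps)
```

In this sketch one 1024-bit addition costs 32 dependent narrow additions, which is roughly where a "32 times slower" estimate comes from (2500 / 32 is about 78, close to the 80 billion figure above). Schoolbook multiplication would be worse, needing on the order of 32 x 32 narrow multiplies.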
Thank you

One benefit of the openness of their SDK is that you can see and edit how the code is actually compiled. If I understand correctly, the OpenCL (or CUDA) implementations for proprietary GPUs compile the code sent to them with some proprietary compiler (actually, I think it might be a modified LLVM, but I don't know for sure). In the end, you have no idea how well the compiler is making use of the hardware's parallel features.

The cores in Nvidia's GPUs are actually slower, based on this: http://en.wikipedia.org/wiki/GeForce_600_Series. It lists most of the clock rates at about 800 MHz. They do have A LOT more of them, but I think the point is more to be open and still parallel.

Also, Nvidia's documentation isn't so good: http://en.wikipedia.org/wiki/Nvidia#Open-source_software_support. Nor is ATI's: http://en.wikipedia.org/wiki/Radeon#Linux. That is, if you don't want to sign an NDA. To me, open means you can play with it without violating some implicit agreement with the company or excessive difficulty, and with some documentation to help. This definition isn't met by any existing many-core processor.

I don't take what you're saying as criticism. (It wouldn't matter if I did; I have no connection to the company.) I just really like the idea of a (really) open parallel platform. :)

> Nvidia's documentation isn't so good. To me, open means you can play with it without violating some implicit agreement with the company or excessive difficulty, and with some documentation to help. This definition isn't met by any existing many-core processor.

I understand and appreciate that Nvidia's not releasing all the source code for their graphics drivers has p***ed off Linux people. But CUDA programming might be a different subject.
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html
http://docs.nvidia.com/cuda/kepler-compatibility-guide/index.html
http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html
http://docs.nvidia.com/cuda/cuda-dynamic-parallelism/index.html
http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
and on and on and on
http://docs.nvidia.com/cuda/index.html

None of that is for a graphics driver. All of it appears to be available without you accepting a legal contract covering your behavior or use.

Everything about the parallel computing project reminded me of a processor from twenty years ago, but repeated Google queries just couldn't turn it up. Then, standing in the shower, it hit me. This sounds almost exactly like the old Inmos Transputer. Simple, small, embedded memory, communications links between them, carpet circuit boards with them, etc. http://en.wikipedia.org/wiki/Transputer

The Parallella comes with a ready OS, "Ubuntu", that you can use right out of the box.

Someone posted this in another thread; here are his views: "Very cool! It's weird, I was JUST reading a book where Feynman was giving a lecture in the late 70s about parallel low-power processing in the future."

Borek (Mentor):

> The Parallella comes with

So far it doesn't come with anything.

Borek (Mentor):

That's a prototype, not something that you can buy and it will "come" to you.

Well, these guys have already proved it with a full-fledged working prototype, and all they need is some money to get the flow going. This idea is huge, and this company could well reshape the face of personal computing.

To bring some logic into the scenario, I have researched their processors and the fastest CPU currently available to the public.
For example, my phone has a quad-core ARM Cortex-A9 at 1.4 GHz; going by their logic, my phone alone runs at 5.6 GHz. To reach the same speed with their equivalent you would need 7 CPUs at 800 MHz (the speed they stated their processors run at). My phone cost $750 when I first purchased it and is now worth $500; their price would be $693 (not including screen, apps, OS, etc.).

So this makes me think: what is the point of using microprocessors to try to compete with standard desktop computers, servers and their alleged "supercomputers" when, just to compete with a phone, they would need to take up more space than a standard ATX motherboard? (7 CPUs work out to 60.452 cm x 37.338 cm.)

Okay, let's compare how many of these processors they would need to match the theoretical speed of an Intel E7-8870.
Specs:
2.4 GHz, 30 MB cache, 10 cores, 20 threads.
Theoretical speed of 48 GHz.
You can get servers with multiple physical CPUs, but for this I'll just stick to a single processor. It also supports 4 TB of RAM, but I won't go into that.

The size of one of these CPUs is 49.17 mm x 56.47 mm.

Now on to their equivalent. They would need 60 processors to reach the equivalent 48 GHz of speed. As for the dimensions, a single processor is 86.36 mm x 53.34 mm, so for 60 of their physical CPUs the amount of space would be 518.16 cm x 320.04 cm.

Now for the cache: the Intel E7-8870 has 30 MB of cache. With only 60 processors they would get only 1.92 MB of L1 cache; to get the equivalent in cache they would need close to 1000 processors, taking up 8.636 m x 53.34 m. Of course this would give a whopping 800 GHz, but then again it all comes down to the processes per second.
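
The board counts in this post are ceiling divisions, which can be sketched in Python as follows. The inputs are the figures quoted in the post (800 MHz and 32 KB of cache per board, 48 GHz and 30 MB for the E7-8870) plus the $99 pledge price from the Kickstarter page. Summing clock rates is the post's own simplification rather than a real performance model, and the constant and function names are made up for the example.

```python
import math

BOARD_CLOCK_MHZ = 800        # per-board clock as quoted in the post
BOARD_CACHE_KB = 32          # per-board cache as read in the post
BOARD_PRICE_USD = 99         # Kickstarter pledge price

XEON_SUMMED_CLOCK_MHZ = 20 * 2400   # 20 threads x 2.4 GHz = 48 GHz
XEON_CACHE_KB = 30 * 1000           # 30 MB, decimal units


def boards_needed(target, per_board):
    """Smallest whole number of boards whose total reaches the target."""
    return math.ceil(target / per_board)


by_clock = boards_needed(XEON_SUMMED_CLOCK_MHZ, BOARD_CLOCK_MHZ)
by_cache = boards_needed(XEON_CACHE_KB, BOARD_CACHE_KB)

# boards, total cache in KB, total price when matching the summed clock
print(by_clock, by_clock * BOARD_CACHE_KB, by_clock * BOARD_PRICE_USD)
# boards and total price when matching the cache size
print(by_cache, by_cache * BOARD_PRICE_USD)
```

Under these assumptions the summed-clock match takes 60 boards (1920 KB of cache in total, $5,940), while the cache match takes 938 boards, which the post rounds up to "close to 1000".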

I really dislike the way a lot of companies and people in general compare just the GHz of two CPUs to determine the speed of a computer. It's like saying the AMD processor that holds the world record for the fastest clocked CPU is the best in the world.
That's getting a bit off topic, but personally I wouldn't waste money on a P3 Tillamook equivalent processor to run a personal computer. On the other hand, I'm not actually looking at this negatively; I have just analysed the information. This technology has been around for years and years, and it would be nice to see it in the open source area. But for all those thinking of getting a "supercomputer": you would have to purchase 1000 of their microprocessors to even come close to a $4000 Intel E7-8870, meaning you would have to spend $99,000 to get the same speeds; if you only go by GHz, you would need 60 and would spend $5,940.

Regards, Logic.

> The Parallella comes with a ready OS, "Ubuntu", that you can use right out of the box.

Ubuntu (Unix coded/Linux) OS is free. http://www.ubuntu.com/

> To bring some logic into the scenario, I have researched their processors and the fastest CPU currently available to the public. For example, my phone has a quad-core ARM Cortex-A9 at 1.4 GHz; going by their logic, my phone alone runs at 5.6 GHz. To reach the same speed with their equivalent you would need 7 CPUs at 800 MHz (the speed they stated their processors run at). My phone cost $750 when I first purchased it and is now worth $500; their price would be $693 (not including screen, apps, OS, etc.).
>
> So this makes me think: what is the point of using microprocessors to try to compete with standard desktop computers, servers and their alleged "supercomputers" when, just to compete with a phone, they would need to take up more space than a standard ATX motherboard? (7 CPUs work out to 60.452 cm x 37.338 cm.)

Hi, the size of their proposed computer, 3.4 x 2.1 inches, already includes the Epiphany accelerator, which has 16 to 64 cores. So within 8.636 cm x 5.33 cm you get a performance of 16 x 700 MHz to 64 x 700 MHz, that is, 11.2 GHz to 44.8 GHz. So there is no need to stack them up like ATX boards as you suggest, and thus there is no problem with size. Even at this credit card size it will outperform a smartphone.
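
The summed-clock figures in this reply are simply core count times clock rate. A tiny Python sketch, using the 700 MHz per-core clock and the 16 and 64 core counts quoted in the reply (the function name is hypothetical, and a summed clock is not a measured throughput):

```python
CORE_CLOCK_MHZ = 700   # per-core Epiphany clock as quoted in the reply


def summed_clock_ghz(cores, clock_mhz=CORE_CLOCK_MHZ):
    """Add up the clock rates of all cores and express the sum in GHz."""
    return cores * clock_mhz / 1000


for cores in (16, 64):
    print(cores, "cores:", summed_clock_ghz(cores), "GHz")
```

This reproduces the 11.2 GHz and 44.8 GHz numbers; whether a sum of clock rates says anything about real performance is exactly what the rest of the thread argues about.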

> Intel E7-8870.
> Specs:
> 2.4 GHz, 30 MB cache, 10 cores, 20 threads.
> Theoretical speed of 48 GHz.
> You can get servers with multiple physical CPUs, but for this I'll just stick to a single processor. It also supports 4 TB of RAM, but I won't go into that.
>
> The size of one of these CPUs is 49.17 mm x 56.47 mm.
>
> Now on to their equivalent. They would need 60 processors to reach the equivalent 48 GHz of speed. As for the dimensions, a single processor is 86.36 mm x 53.34 mm, so for 60 of their physical CPUs the amount of space would be 518.16 cm x 320.04 cm.

Now, going by the same reasoning, because within the given size it already comes with the Epiphany accelerator chip, it even outperforms that Intel E7-8870. This is again within the size of a credit card and with the same power consumption of 3 W.

Even going by the added-GHz logic, you still do not require multiple boards. The specs themselves clearly say that the single credit-card-sized computer can provide 45 GHz performance.

For reference, here again are the stated specs:
Zynq-7010 Dual-core ARM A9 CPU
Epiphany Multicore Accelerator (16 or 64 cores)
1GB RAM
MicroSD Card
USB 2.0 (two)
Two general purpose expansion connectors
Ethernet 10/100/1000
HDMI connection
Ships with Ubuntu OS
Ships with free open source Epiphany development tools that include C compiler, multicore debugger, Eclipse IDE, OpenCL SDK/compiler, and run time libraries.
Dimensions are 3.4'' x 2.1''

For $99 by itself it will give you a maximum performance of 64 cores x 700 MHz, i.e. 45 GHz. So Parallella still wins. I understand that GHz is not the only measure of computational capability. As for RAM, it is not an issue. Raspberry Pi had the same problem, with their initial models having 128 MB and 64 MB of RAM. They have now upgraded the model to 512 MB of RAM for just a fraction of the cost. So Parallella still wins on speed, size and power consumption.

By the way, the single core speed is 700 MHz. So on a credit-card-sized computer you are getting 700 x 64 = 45 GHz max.

There was an error in my calculations: to get 30 MB of RAM from the 32 Kib RAM (I read it as KB) would mean 10,000 CPUs, not 1000, so you would need to spend close to $1 million to get the same performance as a $4000 CPU.

GHz is not everything, as stated in my original post. Just because you have a theoretical speed of 48 GHz does not mean you have a speed of 48 GHz. Theoretical speeds become more accurate when dealing with just physical cores but go well out of proportion when talking about threads. You could have a googolplex of threads, but if each of them is running at 0.000...(100 zeros)5 GHz you would get a theoretical speed of 5 GHz, yet when a computer only uses one thread to perform a task it would come to a standstill. Yes, this takes it to the extreme, but it's mainly to show that regardless of how many threads you get on a single physical CPU, it has limitations.

So, to redo the "theoretical" figure using the same specs as before but only physical cores:
10 x 2.4 GHz = 24 GHz theoretical speed
1 x 700 MHz = 700 MHz theoretical speed
This is a lot more accurate (not including cache). In that regard you would still need 30 physical CPUs to match one Intel E7-8870. In this regard, yes, this is a cheaper alternative, priced at $2970 vs. about $4000 for the Intel.

BUT bear in mind that it doesn't all come down to the frequency the CPUs are running at; the cache is a big part of processing power.

I still believe you would need 10,000 of those CPUs to come close to an Intel E7-8870 due to cache and processing power. Threads are only as good as the program is designed to use them.
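
The point that threads only help as far as the program can use them is commonly expressed with Amdahl's law. That law is not mentioned anywhere in the thread and is brought in here only as an illustration. A minimal Python sketch, where the parallel fractions are made-up example values rather than measurements of any real program:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Example fractions only: fully serial, half parallel, 95% parallel.
for fraction in (0.0, 0.5, 0.95):
    print(fraction, round(amdahl_speedup(fraction, 64), 2))
```

In this model a purely serial program gains nothing from 64 cores, and a program that is only half parallel can never run more than twice as fast, no matter how many cores or how large a summed GHz figure the board has.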