Python, scientific computing, and supercomputers


by stargazer3
Tags: parallel processing, python
stargazer3
#1
Nov18-12, 08:01 AM
P: 44
Hi there!

I'm currently having great fun using numpy/scipy in Python for astronomical data analysis. (I'd been using C for this before, but it takes too much time to implement simple things that are already in numpy/scipy.)

Recently I've been told that most people use C or Fortran for running their code on supercomputers (which I am planning to do in the future), and the reason given to me was: "high-level languages are poorly suited for this purpose". Are they? If so, why? Is it only because Python does not handle parallel processing efficiently enough, or is there another reason?

I was glad to jump to Python (it's like MATLAB and gcc in one package!), and the prospect of falling back to C or Fortran looks a little gloomy to me now :)

Cheers!
Borek
#2
Nov18-12, 09:22 AM
Admin
P: 22,705
Python is interpreted, which makes it inefficient. Using a supercomputer to run an inefficient program is putting things on their head: you are using super expensive hardware to speed up calculations that could be sped up (probably by orders of magnitude) just by switching to a different language.
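A rough illustration of the interpreter overhead described here (a micro-benchmark sketch; absolute numbers will vary by machine):

```python
import timeit

N = 1_000_000

# The same sum two ways: an explicit Python loop, where every
# iteration goes through the interpreter, versus the built-in
# sum(), whose loop runs in C inside the interpreter.
def loop_sum():
    total = 0
    for i in range(N):
        total += i
    return total

t_loop = timeit.timeit(loop_sum, number=5)
t_builtin = timeit.timeit(lambda: sum(range(N)), number=5)
print(f"explicit loop: {t_loop:.3f}s   built-in sum: {t_builtin:.3f}s")
```

On typical machines the explicit loop is several times slower, even though both compute the same result.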
stargazer3
#3
Nov18-12, 09:50 AM
P: 44
Oh, okay, I get it. So the interpreter is slower because it is not only executing the code, but also analysing it, right?
Also, is it possible to compile Python code? It may be a dumb question, but I really cannot figure it out.
Thanks!

Borek
#4
Nov18-12, 11:36 AM
Admin
P: 22,705

Quote Quote by stargazer3 View Post
So the interpreter is slower because it is not only executing, but also analysing the code, right?
Yes.

Also, is it possible to compile the python code?
Technically it should be possible, but I am not aware of any such compiler. That is, my understanding is that there are compilers that produce bytecode - a kind of intermediate language. Executing bytecode is faster than interpreting pure Python source, but definitely slower than if the program were translated directly into machine code.

But I can be wrong, I have never used python for any serious programming.
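For what it's worth, CPython does compile to bytecode internally, and the standard-library dis module makes that intermediate form visible:

```python
import dis

def add(a, b):
    return a + b

# CPython compiles every function to bytecode; dis prints the
# intermediate instructions the interpreter then executes one by one.
dis.dis(add)
```

The output lists low-level operations (loading arguments, an add, a return) rather than native machine code, which is why a bytecode interpreter sits between the source and the CPU.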
AlephZero
#5
Nov18-12, 04:47 PM
Engineering
Sci Advisor
HW Helper
Thanks
P: 6,383
The issue with "interpreted languages" is not just the fact that they are interpreted, but how much work each interpreted statement (or function call) can hand off to the parts of the system that are NOT written in Python. I'm not familiar with Python, but if the numpy/scipy libraries contain routines for operating on matrices, solving equations, doing Fourier transforms, etc., it's quite possible those routines could run very efficiently on a supercomputer.

The first thing to do is measure which parts of your code are taking the time, and then decide what options you have to do something about it. The closer you can get to your program spending 100% of its run time in library routines, the better.
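The measurement step above can be done with the standard-library cProfile module; the function names here are hypothetical stand-ins for a real analysis pipeline:

```python
import cProfile
import io
import pstats

# Hypothetical stand-ins: one hot function that dominates the
# runtime, and one cheap setup function.
def hot_loop():
    return sum(i * i for i in range(200_000))

def cheap_setup():
    return list(range(1_000))

def main():
    cheap_setup()
    hot_loop()

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()

# Report the functions with the largest cumulative time first;
# hot_loop should dominate the listing.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())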

One of Amdahl's "laws" of computing optimization applies here: if you could magically reduce 50% of your code's execution time to zero, overall your program would only run twice as fast. But if you could do the same with 99% of your code, it would run 100 times as fast.
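In formula form, Amdahl's law says that speeding up a fraction p of the runtime by a factor s gives an overall speedup of 1 / ((1 - p) + p/s); a quick sanity check of the numbers above:

```python
# Amdahl's law: if a fraction p of the runtime is sped up by a
# factor s, the overall speedup is 1 / ((1 - p) + p / s).
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# "Reducing to zero" corresponds to the limit of very large s,
# where the speedup approaches 1 / (1 - p).
print(amdahl_speedup(0.50, 1e12))  # ~2x
print(amdahl_speedup(0.99, 1e12))  # ~100x
```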

Don't forget the best way to magically reduce your code's execution time to "zero" is at the highest level, by selecting the best algorithms that solve the problem with the least amount of work. As a trivial example, the difference between sorting ##N## items in a time proportional to ##N^2## and ## N \log_2 N## doesn't matter much if ##N = 10##, but it is a lot more important if ##N = 10,000,000##.
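The sorting comparison can be checked with a quick timing run (insertion sort standing in for the O(N^2) algorithm; exact timings will vary by machine):

```python
import random
import timeit

def quadratic_sort(a):
    """Insertion sort: O(N^2) comparisons/swaps in the average case."""
    a = list(a)
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a

data = [random.random() for _ in range(2000)]

t_quad = timeit.timeit(lambda: quadratic_sort(data), number=3)
t_nlogn = timeit.timeit(lambda: sorted(data), number=3)  # O(N log N)
print(f"O(N^2): {t_quad:.3f}s   O(N log N): {t_nlogn:.4f}s")
```

Already at N = 2000 the gap is large, and it widens rapidly as N grows.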
rcgldr
#6
Nov18-12, 06:31 PM
HW Helper
P: 6,929
Quote Quote by stargazer3 View Post
I'm currently having great fun using numpy/scipy in python for astronomical data analysis. (I've been using C for this before, but it takes too much time to implement simple things that are in numpy/scipy already)
Doing a web search for Python-to-C/C++ translators gets a few hits, but I don't know if you could access the numpy/scipy libraries from translated and compiled code. I tried a web search for a Python compiler, but didn't get many hits.

Quote Quote by stargazer3 View Post
Recently I've been told that most of people are using C or Fortran for running their code on supercomputers (which I am planning to do in the future), and the reason was given to me as follows: "high-level languages are poorly suited for this purpose".
The issue isn't "high-level" languages as such, since modern Fortran implementations could be considered "high-level" compared to classic C. In the case of Fortran, extensions have been made to the language, some processor-specific, to take advantage of the parallel and/or vector-oriented processors used in supercomputers.
phyzguy
#7
Nov18-12, 07:38 PM
P: 2,071
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.
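The "try running some benchmarks" suggestion above can be sketched like this - a rough micro-benchmark comparing an interpreted loop over a numpy array with numpy's own C-implemented reduction (absolute timings will vary by machine):

```python
import timeit

import numpy as np

N = 1_000_000
x = np.arange(N, dtype=np.float64)

def python_loop():
    total = 0.0
    for v in x:          # each iteration pays interpreter overhead
        total += v
    return total

t_loop = timeit.timeit(python_loop, number=1)
t_numpy = timeit.timeit(lambda: x.sum(), number=1)  # loop runs in C
print(f"python loop: {t_loop:.3f}s   numpy sum: {t_numpy:.5f}s")
```

The numpy version is typically orders of magnitude faster, which is why keeping the number crunching inside numpy/scipy (or Cython) matters so much.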
stargazer3
#8
Nov23-12, 02:15 AM
P: 44
Thanks for all the responses - that's quite a lot of feedback!
coalquay404
#9
Nov24-12, 04:39 PM
P: 218
Quote Quote by phyzguy View Post
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.
Unless you give specific examples, discussions such as this are pointless. Python will be fine for lots of use cases; for others it will be utterly horrible. A case in point is the sort of code that one is likely to put on hardware that qualifies as a supercomputer. Python is *horrible* at concurrency thanks to its global interpreter lock. As a result, you're not going to see it being used for the sorts of things that the OP is talking about.
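The GIL effect described above can be seen in a small experiment on standard CPython builds: two CPU-bound threads run no faster than doing the work sequentially, because only one thread executes Python bytecode at a time (timings vary, so treat this as a sketch):

```python
import threading
import time

def cpu_bound(n):
    # Pure-Python arithmetic: holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

start = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=cpu_bound, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# With the GIL, the two-thread version is usually no faster than
# the sequential one for CPU-bound work.
print(f"sequential: {sequential:.2f}s   two threads: {threaded:.2f}s")
```

For real parallelism on CPU-bound Python code one typically reaches for the multiprocessing module (separate processes, separate GILs) or MPI bindings such as mpi4py, rather than threads.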

