Python, scientific computing, and supercomputers

AI Thread Summary
Python, particularly with libraries like numpy and scipy, is gaining traction in scientific computing, especially for tasks like astronomical data analysis, due to its ease of use compared to C or Fortran. However, concerns about Python's efficiency on supercomputers arise from its interpreted nature, which can hinder performance, particularly in parallel processing. While the core functionalities of numpy and scipy are implemented in C, making them efficient, Python's global interpreter lock limits its concurrency capabilities. Tools like Cython can help bridge the gap by compiling Python code into C, allowing for enhanced performance while maintaining development speed. Ultimately, the choice of language should depend on the specific use case and performance requirements.
stargazer3
Hi there!

I'm currently having great fun using numpy/scipy in python for astronomical data analysis. (I've been using C for this before, but it takes too much time to implement simple things that are in numpy/scipy already)

Recently I've been told that most people use C or Fortran to run their code on supercomputers (which I am planning to do in the future), and the reason I was given was this: "high-level languages are poorly suited for this purpose". Are they? If so, why? Is it only because Python does not handle parallel processing efficiently enough, or is there another reason?

I was glad to jump to Python (it's like MATLAB and gcc in one package!), and the prospect of falling back to C or Fortran looks a little gloomy to me now :)

Cheers!
 
Python is interpreted, which makes it inefficient. Using a supercomputer to run an inefficient program puts things on their head: you are using super expensive hardware to speed up calculations that could be sped up (probably by orders of magnitude) just by switching to a different language.
 
Oh, okay, I get it. So the interpreter is slower because it is not only executing, but also analysing the code, right?
Also, is it possible to compile Python code? Uhm, it may be a dumb question, but I really cannot figure it out.
Thanks!
 
stargazer3 said:
So the interpreter is slower because it is not only executing, but also analysing the code, right?

Yes.

Also, is it possible to compile Python code?

Technically it should be possible, but I am not aware of any compiler that produces native machine code. My understanding is that there are compilers that produce bytecode - a kind of intermediate language - which executes faster than pure interpreted Python, but definitely slower than a program translated directly into machine code.
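(For what it's worth, CPython itself compiles source to bytecode before interpreting it, and the standard dis module makes that visible. A minimal sketch:)

import dis

def double_plus_one(x):
    return x * 2 + 1

dis.dis(double_plus_one)  # prints the bytecode the interpreter executes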

But I could be wrong; I have never used Python for any serious programming.
 
The issue with "interpreted languages" is not just the fact that they are interpreted, but how much work each interpreted statement (or function call) hands off to the part of the system that is NOT written in Python. I'm not familiar with Python, but if the numpy/scipy libraries contain routines for operating on matrices, solving equations, doing Fourier transforms, etc., it's quite possible those routines could run very efficiently on a supercomputer.
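In numpy terms, a rough sketch of what I mean (untested, array size arbitrary): one library statement can replace an interpreted per-element loop:

import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

slow = [a[i] * b[i] for i in range(n)]  # one interpreter dispatch per element
fast = a * b                            # one statement; all the work is done in compiled C

On typical hardware the second form is commonly one to two orders of magnitude faster, even though both are written in Python.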

The first thing to do is measure which parts of your code are taking the time, and then decide what options you have to do something about it. The closer you can get to your program spending 100% of its run time in library routines, the better.
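In Python that measurement might look like this (a minimal sketch using the standard cProfile module; analyze_frame and the file name are hypothetical stand-ins for your own routine and data):

import cProfile
import pstats

# analyze_frame is a hypothetical stand-in for your own analysis routine
cProfile.run("analyze_frame('image0001.fits')", "profile.out")

stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)  # the ten costliest call sites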

One of Amdahl's "laws" of computing optimization applies here: if you could magically reduce 50% of your code's execution time to zero, overall your program would only run twice as fast. But if you could do the same with 99% of your code, it would run 100 times as fast.
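Written out: if a fraction ##f## of the total run time is sped up by a factor ##s##, the overall speedup is ##S = \frac{1}{(1-f) + f/s}##. Even with ##s \to \infty##, the 50% case (##f = 0.5##) caps out at ##S = 2##, while ##f = 0.99## caps out at ##S = 100##.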

Don't forget the best way to magically reduce your code's execution time to "zero" is at the highest level, by selecting the best algorithms that solve the problem with the least amount of work. As a trivial example, the difference between sorting ##N## items in a time proportional to ##N^2## and ## N \log_2 N## doesn't matter much if ##N = 10##, but it is a lot more important if ##N = 10,000,000##.
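A quick timing sketch of that difference (sizes are illustrative; exact numbers depend on your machine):

import random
import time

def insertion_sort(a):  # O(N^2) comparison sort
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

data = [random.random() for _ in range(20_000)]

t0 = time.perf_counter()
insertion_sort(data[:])   # quadratic: seconds at this size
t1 = time.perf_counter()
sorted(data)              # Timsort, O(N log N): milliseconds
t2 = time.perf_counter()
print(f"insertion sort: {t1 - t0:.2f} s, sorted(): {t2 - t1:.4f} s")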
 
stargazer3 said:
I'm currently having great fun using numpy/scipy in python for astronomical data analysis. (I've been using C for this before, but it takes too much time to implement simple things that are in numpy/scipy already)
Doing a web search for Python to C / C++ translators gets a few hits, but I don't know whether translated and compiled code could still access the numpy / scipy libraries. A web search for a Python compiler didn't turn up much either.

stargazer3 said:
Recently I've been told that most of people are using C or Fortran for running their code on supercomputers (which I am planning to do in the future), and the reason was given to me as follows: "high-level languages are poorly suited for this purpose".
The issue isn't "high-level" languages as such, since modern Fortran implementations could be considered "high-level" compared to classic C. In the case of Fortran, extensions have been made to the language, some of them processor-specific, in order to take advantage of the parallel and/or vector oriented processors used in supercomputers.
 
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.
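As a minimal sketch of step (3) (the function and types are illustrative, not from any real library): take a tight pure-Python loop, add C type declarations, and compile it:

# distances.pyx -- a hypothetical hot loop, annotated with C types
def pair_sum(double[:] x):
    cdef Py_ssize_t i, j, n = x.shape[0]
    cdef double total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(x[i] - x[j])
    return total

Build it in place with "cythonize -i distances.pyx" and import it like any ordinary module; the typed inner loop then runs at roughly C speed while the rest of the program stays plain Python.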
 
Thanks for all the responses, that's quite a feedback!
 
phyzguy said:
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.

Unless you give specific examples, discussions such as this are pointless. Python will be fine for lots of use cases; for others it will be utterly horrible. A case in point is the sort of code that one is likely to put on hardware that qualifies as a supercomputer. Python is *horrible* at concurrency thanks to its global interpreter lock. As a result, you're not going to see it being used for the sorts of things that the OP is talking about.
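To make the GIL point concrete (a minimal sketch; timings indicative only): two threads running a CPU-bound pure-Python loop take about as long as running it twice serially, because only one thread can execute Python bytecode at a time.

import threading
import time

def burn(n=10_000_000):
    # CPU-bound pure-Python loop; holds the GIL while it runs
    total = 0
    for i in range(n):
        total += i

t0 = time.perf_counter()
burn(); burn()
serial = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

print(f"serial: {serial:.2f} s, two threads: {threaded:.2f} s")  # roughly equal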
 