Python, scientific computing, and supercomputers

In summary: Python itself is interpreted, but numpy/scipy do their number crunching in compiled C, and the hot spots of a program can be compiled with tools such as Cython; the main obstacle to running Python on supercomputers is its handling of parallelism.
  • #1
stargazer3
Hi there!

I'm currently having great fun using numpy/scipy in Python for astronomical data analysis. (I'd been using C for this before, but it takes too much time to implement simple things that are already in numpy/scipy.)

Recently I've been told that most people use C or Fortran for running their code on supercomputers (which I am planning to do in the future), and the reason given to me was: "high-level languages are poorly suited for this purpose". Are they? If so, why? Is it only because Python does not handle parallel processing efficiently enough, or is there another reason?

I was glad to jump to Python (it's like MATLAB and gcc in one package!), and the prospect of falling back to C or Fortran looks a little bit gloomy to me now :)

Cheers!
 
  • #2
Python is interpreted, which makes it inefficient. Using a supercomputer to run an inefficient program is kind of standing things on their head: you are using super expensive hardware to speed up calculations that could be sped up (probably by orders of magnitude) just by switching to a different language.
 
  • #3
Oh, okay, I get it. So the interpreter is slower because it is not only executing, but also analysing the code, right?
Also, is it possible to compile Python code? It may be a dumb question, but I really can't figure it out.
Thanks!
 
  • #4
stargazer3 said:
So the interpreter is slower because it is not only executing, but also analysing the code, right?

Yes.

Also, is it possible to compile the python code?

Technically it should be possible, but I am not aware of any such compiler. That is, my understanding is that there are compilers that produce bytecode - a kind of intermediate language. Executing bytecode is faster than interpreting pure Python source, but definitely slower than if the program were translated directly into machine code.

But I could be wrong; I have never used Python for any serious programming.
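
(For what it's worth, the standard CPython interpreter does work this way: it compiles source to bytecode and then interprets that bytecode. The standard-library dis module will show the bytecode for a function; a minimal sketch, with the function name chosen just for illustration:)

Code:
import dis

def add_squares(a, b):
    # Trivial function, used only to show what CPython bytecode looks like.
    return a * a + b * b

# Print the bytecode that the interpreter actually executes for this function.
dis.dis(add_squares)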
 
  • #5
The issue with "interpreted languages" is not just the fact that they are interpreted, but how much work you can hand off with one language statement (or function call) from the interpreted part to the part of the system that is NOT written in Python. I'm not familiar with Python, but if the numpy/scipy libraries contain routines for operating on matrices, solving equations, doing Fourier transforms, etc., it's quite possible those routines could run very efficiently on a supercomputer.

The first thing to do is measure which parts of your code are taking the time, and then decide what options you have to do something about it. The closer you can get to your program spending 100% of its run time in library routines, the better.
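
In Python, the standard-library cProfile and pstats modules are one way to do that measurement; a minimal sketch, where analyze() is a hypothetical stand-in for your own analysis code:

Code:
import cProfile
import pstats

def analyze():
    # Hypothetical stand-in for your actual data-analysis pipeline.
    total = 0.0
    for i in range(1, 1000000):
        total += 1.0 / i
    return total

# Profile the call, save the statistics, and print the 10 most expensive functions.
cProfile.run("analyze()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)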

One of Amdahl's "laws" of computing optimization applies here: if you could magically reduce 50% of your code's execution time to zero, overall your program would only run twice as fast. But if you could do the same with 99% of your code, it would run 100 times as fast.
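
Put as a formula: if a fraction ##p## of the runtime is sped up by a factor ##s##, the overall speedup is ##1 / \left( (1 - p) + p/s \right)##. With ##p = 0.5## the limit as ##s \to \infty## is 2; with ##p = 0.99## it is 100, which is where the numbers above come from.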

Don't forget the best way to magically reduce your code's execution time to "zero" is at the highest level, by selecting the best algorithms that solve the problem with the least amount of work. As a trivial example, the difference between sorting ##N## items in a time proportional to ##N^2## and ## N \log_2 N## doesn't matter much if ##N = 10##, but it is a lot more important if ##N = 10,000,000##.
 
  • #6
stargazer3 said:
I'm currently having great fun using numpy/scipy in python for astronomical data analysis. (I've been using C for this before, but it takes too much time to implement simple things that are in numpy/scipy already)
Doing a web search for Python to C / C++ translators gets a few hits, but I don't know whether you could still access the numpy / scipy libraries from translated and compiled code. I also tried a web search for a Python compiler, but didn't get that many hits.

stargazer3 said:
Recently I've been told that most of people are using C or Fortran for running their code on supercomputers (which I am planning to do in the future), and the reason was given to me as follows: "high-level languages are poorly suited for this purpose".
The issue isn't the language being "high-level", since modern Fortran implementations could be considered "high-level" compared to classic C. In the case of Fortran, extensions have been made to the language, some of them processor-specific, in order to take advantage of the parallel and/or vector oriented processors used in supercomputers.
 
  • #7
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.
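
A quick benchmark along the lines of the first point; a minimal sketch comparing a pure-Python loop against the equivalent numpy call (the array size is chosen arbitrarily):

Code:
import timeit
import numpy as np

data = np.random.rand(1000000)

def python_sum(values):
    # Pure-Python loop: every iteration goes through the interpreter.
    total = 0.0
    for v in values:
        total += v
    return total

# np.sum runs the same loop in compiled C, bypassing the interpreter entirely.
print("pure Python loop:", timeit.timeit(lambda: python_sum(data), number=10))
print("numpy sum       :", timeit.timeit(lambda: np.sum(data), number=10))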
 
  • #8
Thanks for all the responses, that's quite a lot of feedback!
 
  • #9
phyzguy said:
Stargazer3 - Python is much more efficient than you have been led to believe in these responses. First of all, the number-crunching parts of numpy and scipy are already written in C, so they are nearly as efficient as native C. Try running some benchmarks. Second, there is a great addition to Python called Cython that compiles your Python code into C after you have made a few simple changes. I find the best approach is:

(1) Write the code in Python and get it working
(2) Figure out where your code is spending most of its time - usually in the innermost parts of the loops.
(3) "Cythonize" this part of your code and compile it with Cython

This gives you working code with nearly the same speed as C, but with much less development time.

Unless you give specific examples, discussions such as this are pointless. Python will be fine for lots of use cases; for others it will be utterly horrible. A case in point is the sort of code that one is likely to put on hardware that qualifies as a supercomputer. Python is *horrible* at concurrency thanks to its global interpreter lock. As a result, you're not going to see it being used for the sorts of things that the OP is talking about.
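
For context on the GIL point: CPython's global interpreter lock allows only one thread at a time to execute Python bytecode, so CPU-bound threads don't scale across cores. Within a single node, the usual workaround is the standard-library multiprocessing module, which uses separate processes instead of threads; a minimal sketch, with work() a hypothetical CPU-bound task:

Code:
from multiprocessing import Pool

def work(n):
    # Hypothetical CPU-bound task; each call runs in its own process,
    # so one process's GIL does not block the others.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(work, [10**6] * 8)
    print(results)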
 

1. What is Python and how is it used in scientific computing?

Python is a high-level, interpreted programming language that is widely used in scientific computing. It is a versatile language that is popular for its simplicity, readability, and flexibility. In scientific computing, Python is used to process, analyze, and visualize large and complex data sets, as well as to create simulations and models for scientific research.

2. What are the advantages of using Python in scientific computing?

There are several advantages to using Python in scientific computing. First, Python has a large and active community of users, which means there is a vast amount of resources and support available. Additionally, Python has a wide range of libraries and packages specifically designed for scientific computing, making it a powerful tool for data analysis and visualization. It is also easy to learn and use, making it accessible to both beginners and experts alike.

3. What is the role of supercomputers in scientific computing?

Supercomputers are high-performance computing systems that are used to solve complex and computationally demanding problems in various fields, including scientific computing. These machines have a significantly higher processing power and memory capacity than standard computers, which allows them to handle large and complex data sets and simulations. Supercomputers are essential in scientific computing as they can significantly speed up research and enable scientists to tackle more complex problems.

4. How does Python contribute to the development of supercomputers?

Python plays a significant role in how supercomputers are used. Many of the software tools and libraries used on supercomputers are written in Python or provide Python interfaces, and Python's versatility and ease of use make it a popular choice for programming applications on them. It also has efficient interfaces to other languages, making it easier to integrate with the compiled software components on the supercomputer.
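
As one concrete illustration of those language interfaces, the standard-library ctypes module can call functions in a compiled C library directly; a minimal sketch, assuming a Unix-like system where the C math library can be located under the name "m":

Code:
from ctypes import CDLL, c_double
from ctypes.util import find_library

# Load the system C math library (find_library returns None if it cannot be found).
libm = CDLL(find_library("m"))

# Declare the C signature of sqrt: double sqrt(double).
libm.sqrt.argtypes = [c_double]
libm.sqrt.restype = c_double

print(libm.sqrt(2.0))  # calls the compiled C function, not Python code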

5. Can Python be used for parallel computing on supercomputers?

Yes, Python can be used for parallel computing on supercomputers. Parallel computing involves breaking a problem into smaller tasks and running them simultaneously on multiple processors. Python has built-in libraries, such as multiprocessing (and threading for I/O-bound work), that let it use multiple processor cores on a single node. Additionally, there are external libraries, such as mpi4py, that give Python access to MPI, the message-passing standard commonly used for parallel computing across the nodes of a supercomputer.
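
As an illustration of the mpi4py approach, here is a minimal sketch of a program that would typically be launched with something like "mpirun -n 4 python example.py" (it assumes mpi4py and an MPI implementation are installed; the partial result is just a placeholder):

Code:
from mpi4py import MPI

comm = MPI.COMM_WORLD   # communicator containing all processes in the job
rank = comm.Get_rank()  # this process's index
size = comm.Get_size()  # total number of processes

# Each rank computes a partial result; a placeholder value here.
partial = rank + 1

# Combine the partial results onto rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print("Sum over", size, "ranks:", total)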
