Bad programming skills = biggest hurdle to astronomy research

  • Thread starter Simfish

Nabeshin

Science Advisor
2,204
16
Seems very conceivable to me.

If you look at the amount of data we're currently producing, it's, to use exactly the right word, astronomical. A lot of it hasn't even been sifted through in the right way to yield what are likely to be interesting science results. For example, there is a lot of interest right now in detecting extrasolar planets. But the data from this type of analysis, especially on the scale of hundreds of thousands of stars as with Kepler, can also be immensely useful for stellar physics in its own right. The catch is that sorting through it all is such a massive task (no doubt relegated to computers) that programming it is undoubtedly a major challenge.

And it's only likely to get worse, as next-generation telescopes like the LSST are going to produce even larger amounts of data: in the case of LSST, ~20 TB of observations per night. Figuring out how to store, reference, and analyze that is a programming task of large magnitude.
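To put that figure in perspective, here is a rough back-of-the-envelope sketch. Only the ~20 TB per night comes from the post above; the 300 observing nights per year and the ten-year survey length are assumptions picked purely for illustration, and 1 PB is taken as 1000 TB.

```python
# Back-of-the-envelope estimate of raw survey data volume.
# TB_PER_NIGHT is the figure quoted in the thread; the other two
# constants are assumptions chosen only for illustration.
TB_PER_NIGHT = 20
NIGHTS_PER_YEAR = 300   # assumed number of usable observing nights
SURVEY_YEARS = 10       # assumed survey duration


def total_volume_tb(tb_per_night, nights_per_year, years):
    """Return the total raw data volume in terabytes."""
    return tb_per_night * nights_per_year * years


per_year_tb = total_volume_tb(TB_PER_NIGHT, NIGHTS_PER_YEAR, 1)
total_tb = total_volume_tb(TB_PER_NIGHT, NIGHTS_PER_YEAR, SURVEY_YEARS)

# Prints: 6 PB per year, 60 PB in total
print(f"{per_year_tb / 1000:.0f} PB per year, {total_tb / 1000:.0f} PB in total")
```

Under those assumptions the raw data alone reaches petabytes per year, before any derived catalogs or reprocessing are counted.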
 

D H

Staff Emeritus
Science Advisor
Insights Author
15,326
680
What are your thoughts on this?
Dead on. From my experience, most scientists and engineers make for incredibly bad programmers. My opinion: The only reason computer science majors are not used to write our scientific/engineering programs is that most computer scientists fare even worse at doing science and engineering than we do at doing programming.

It isn't that hard to program well. We know what a good engineering design or a solid scientific theory looks like. We can learn what a well-constructed program looks like. It does take some training, however. It is a bit arrogant on our part to think that training is not required.
 
It is a bit arrogant on our part to think that training is not required.
It's also a bit arrogant to think that it's not hard to program well. :wink:

I'd also comment on the computer scientist's inability to do science but I won't push those buttons. Computer scientists aren't inherently better programmers than scientists of any other kind. Here too programming is vaguely taught and encouraged but is not an inherent part of the course.
 

turbo

Gold Member
3,027
44
It might be a good time to reflect on the interaction. Astronomers need to tell the programmers/analysts what they are attempting to tease out of the mountain of data, and they need to explain what they think the tell-tale signs in the data might look like (variations in total flux, variations in peak wavelength, and so on). Programmers need to come up with algorithms that can sift through the data efficiently, and they need to communicate with the astronomers when their output isn't clean or as expected, so that they can get more guidance and modify their search.
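As an illustration of how a "tell-tale sign" such as a variation in total flux might be turned into something a program can search for, here is a minimal sketch. The function name `find_flux_dips`, the threshold of 5, and the sample light curve are all hypothetical; the median-absolute-deviation test is one simple choice among many, not the method any particular survey uses.

```python
from statistics import median


def find_flux_dips(flux, threshold=5.0):
    """Return indices where the flux falls more than `threshold` robust
    standard deviations below the median of the light curve."""
    med = median(flux)
    # Median absolute deviation: a scatter estimate that a few
    # outlying points (such as the dips themselves) barely affect.
    mad = median([abs(f - med) for f in flux])
    sigma = 1.4826 * mad  # rescales the MAD to a Gaussian standard deviation
    if sigma == 0:
        return []  # perfectly flat light curve, nothing to flag
    return [i for i, f in enumerate(flux) if (med - f) / sigma > threshold]


# Made-up normalized fluxes with a two-sample dip of about 5%.
light_curve = [1.000, 1.002, 0.998, 1.001, 0.999,
               0.950, 0.951, 1.000, 1.003, 0.997]

print(find_flux_dips(light_curve))  # prints [5, 6]
```

The threshold is exactly the kind of parameter the astronomer and the programmer would need to agree on, since it trades missed events against false alarms.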

It's not rocket-science. Observational astronomy is not hands-on. Your "subjects" are far away, in physical space and in time. You have a suite of instruments to make observations, and you have (often) a mountain of data (often with a high noise:signal ratio) from which to glean some information that may or may not support your preconceptions. It is short-sighted to lay research hurdles on "bad programming", IMO. "Bad communication" is more likely.
 
If they already have the programs that "work", why not just give them to a programmer and have them make it work well? That way the program does exactly what is intended, and all the problems and slow running time can be removed. And who knows, it might allow for a much broader or more precise search. The difference can be night and day.
 
If they already have the programs that "work", why not just give them to a programmer and have them make it work well?
Because bad programs cannot be fixed, they must be written as if the original had never existed. The difference between a program that works and one that works well is vastly greater than that between a program that works and one that doesn't work at all.
 
Because bad programs cannot be fixed, they must be written as if the original had never existed. The difference between a program that works and one that works well is vastly greater than that between a program that works and one that doesn't work at all.
It sounds like part of the problem was that they couldn't explain to the programmers exactly what they wanted so programmers couldn't do what the already written programs could do.

This is why you give the programs to the programmers and have them see what the program actually does. They can then make a program that gives the same results, but works much better and has more features.
 

D H

What programmers?

The people who write the astronomical codes discussed in the article are predominantly astronomy grad students. An astronomy department would have to cut two of those grad students to hire one programmer, and for that paltry sum they just might be able to hire a freshout with a BS in IT who graduated well into the bottom half of the class.
 
This is why you give the programs to the programmers and have them see what the program actually does. They can then make a program that gives the same results, but works much better and has more features.
That sounds awesome in theory, but works out rather badly when you actually have to figure out somebody's scientific computing code full of all sorts of crazy math, almost no comments, and lots of hacks to keep the code from crashing. I spend a good chunk of time using and rewriting a labmate's code to make it robust enough for my purposes and it's like pulling teeth to get an explanation of the code that makes any sense to me.

I'll chime in that by no means is this limited to astronomy research. I'm in applied CS, the one field where you'd expect to see halfway decent code, and I still see all the same problems, because many people assume that they're writing the code as a one-off to do some number crunching and therefore don't think about maintainability at all. Actually, I think the biggest hindrance to good code is probably that very few researchers have the luxury of taking a week (or a few weeks) to properly write, test, document, and refactor their code.

The people who write the astronomical codes discussed in the article are predominantly astronomy grad students.
In theory code cleanup would be a great task to farm out to undergrads, but the math involved in the programming makes it totally unfeasible a lot of the time.

Figuring out how to store, reference, and analyze that is all a programming task of large magnitude.
One of the fun things about working with very large datasets is that I'm usually the only person in the room who cares about the space complexity as much as (if not more than) the time complexity of any of the algorithms used to do the number crunching.
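A small sketch of what caring about space complexity can look like in practice: computing a mean and variance in a single pass with constant memory, using Welford's online algorithm, instead of loading every value into a list first. The class name `RunningStats` and the sample values are hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass,
    using O(1) memory no matter how many values are seen."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 in the denominator)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# A generator stands in for a data stream too large to hold in memory.
stream = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])

stats = RunningStats()
for value in stream:
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

The same pattern applies when the values come from a file or a socket: only three numbers are kept, whatever the size of the input.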
 
If they already have the programs that "work", why not just give them to a programmer and have them make it work well?
Because if you can precisely explain exactly what equations need to be programmed, then you've already written the program.

It's easier to teach an astrophysicist how to program well than it is to teach a programmer astrophysics. While there are astrophysicists who are awful programmers, there are astrophysicists who can program extremely well.
 

Chronos

Science Advisor
Gold Member
11,398
731
I trust an astrophysicist's 'plodding' algorithms more than I would ever trust a programmer's ability to figure out what it is they are trying to calculate. Yes, the astrophysicist will not write programs as efficiently as an IT major, but they still work. I see, however, no reason not to run the program by an IT guy to ensure it is doing what they intend it to do.
 

Chronos

It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive. This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
 

Simfish

Gold Member
814
2
It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive. This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
Haha so true. But what about code for supercomputers (code that might take several days to process)? Or code that, say, requires 8 GB of RAM to process? (Seriously, I once had to run code that required 8 GB of RAM for certain parameters.)
 
Ideally the sciences should be re-structured as a business where there is an IT department that they can work with. The scientists then become the analysts and testers of the code leaving the actual programming to those who know exactly what they are doing.

In my 'field' (minor planets), the most 'efficient' software is generated by amateurs (amateur astronomers) who are expert programmers (they do programming for a living) working side by side with the professional astronomers in the field. Yes, the pros still have their own software, but the amateurs' software is generally all-encompassing and user-friendly, and it produces reliable results much more quickly for those who don't have to be experts in the field. The professionals' software came first, of course, but the amateurs took it and built it 'better' (better is of course relative to who the user is and the results we get from it).

Cheers

David
 

D H

It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive. This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
I have a number of problems with the above. I am having a very hard time parsing your first sentence. For one thing, you are using two words, effective and efficient, that are synonyms / near synonyms of one another. For another, that parenthetical remark is a bit hard to parse. I think you are saying "It is not hard to write effective code. What is hard is writing efficient code." If that is the correct interpretation, I take exception to it.

Moreover, people still do care about performance. Comparisons to what computers could do thirty years ago are a bit misleading. We are now doing things with computers that we simply could not do thirty years ago. A poorly designed, poorly implemented system means I cannot do some kinds of analyses (e.g., a statistically valid Monte Carlo simulation) that I could do were the system designed and implemented better. Instead I am limited to doing a poor man's Monte Carlo because of that poor design.

I'll define "effectiveness" as "doing the job, all of it, correctly" and efficiency as "minimizing use of some particular resource". With this definition, efficiency is but one part of effectiveness.

Performance (efficiency) can paradoxically be both under- and over-emphasized in scientific software. From my experience, scientists and engineers tend to underemphasize performance concerns during system design and overemphasize them late in the game (coding and maintenance). Properly worrying about performance during design can eliminate a lot of problems further on down the road. Where will the performance demons lie? Will the system be used in ways that require us to pay extra attention to resources or outfit the program so as to circumvent resource issues? First worrying about performance during the coding stage leads to systems implemented in a language such as Python or Matlab that is far too slow for the intended use, and to programmers who optimize the 99.9% of the code that consumes 0.1% of the CPU time but miss the boat on the 0.1% of the code that consumes 99.9% of the CPU time. First worrying about performance after the system is built leads to even worse nightmares.
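One way to find the small fraction of code that eats most of the CPU time is to measure rather than guess. The following is a minimal sketch using Python's standard `cProfile` and `pstats` modules; the functions `cheap_setup`, `hot_loop`, and `pipeline` are hypothetical stand-ins for a real analysis, with the quadratic loop deliberately made the expensive part.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    """Linear-time preparation step: cheap compared with the loop below."""
    return list(range(n))


def hot_loop(values):
    """Deliberately quadratic double loop: this is where the time goes."""
    total = 0.0
    for v in values:
        for w in values:
            total += v * w
    return total


def pipeline(n=300):
    values = cheap_setup(n)
    return hot_loop(values)


# Profile one run of the pipeline.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the printed table, `hot_loop` should account for nearly all of the run time, which is the cue to leave `cheap_setup` alone and spend any optimization effort on the loop.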

A big part of effectiveness is making a system that is understandable, testable, and maintainable. The most efficient code is often difficult to understand, very hard to test, and even harder to maintain. Efficiency goes against almost every other measure of software quality.
 
This isn't as either-or as it seems in this thread.

There are engineers who specialize in writing scientific code. They are fully capable of taking the differential equation (or whatever) and programming the discretized solution. The better ones can do it on any type of hardware. They are fully capable of debugging the physics on their own as long as the physicists supply the test problem and expected answer.
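A minimal sketch of what "a test problem and expected answer" can look like. The equation dy/dt = -k*y, the forward Euler scheme, and the function name `euler_decay` are chosen here purely for illustration; the point is that the analytic solution y(t) = y0 * exp(-k*t) gives the programmer something to check against, including the expected first-order convergence (halving the step should roughly halve the error).

```python
import math


def euler_decay(k, y0, t_end, steps):
    """Integrate dy/dt = -k*y from t = 0 to t_end with forward Euler."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)  # y_{n+1} = y_n + dt * f(y_n)
    return y


# Test problem supplied by the "physicist": k = 1, y(0) = 1, so y(1) = exp(-1).
exact = math.exp(-1.0)

err_coarse = abs(euler_decay(1.0, 1.0, 1.0, 100) - exact)
err_fine = abs(euler_decay(1.0, 1.0, 1.0, 200) - exact)

# Forward Euler is first order, so this ratio should be close to 2.
ratio = err_coarse / err_fine
print(err_coarse, err_fine, ratio)
```

A ratio far from 2 would point to a bug in the discretization rather than in the physics, which is the kind of debugging the post describes.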

The terabytes/day of data problem is totally different. For this you must use computer scientists. It's just not what engineers or physicists do.
 

D H

A few are saying it's either-or. Those are the ones who are saying that programming should be turned over to the IT department. My opinion: Yech.

What I've been saying is that scientists and engineers can learn to program well. It's just not something that most can pick up on their own. Some training is needed. Colleges require students of science and engineering to take a minimum of two or three calculus classes, and often quite a bit more math beyond that. Very few require students of science and engineering to take anything beyond an introductory computer programming class. A lot don't require *any* classes in computer science.
 
Very few require students of science and engineering to take anything beyond an introductory computer programming class. A lot don't require *any* classes in computer science.
And personally I think that's a good thing, since CS classes are often terrible for teaching application programming. What's not surprising is the number of astronomy Ph.D.'s that are terrible programmers; what is more surprising is the number of CS Ph.D.'s that are terrible programmers.

And it really shouldn't be surprising once you think about it. Just because you are a professor of English literature doesn't mean that you can write good short stories.
 
I trust an astrophysicist's 'plodding' algorithms more than I would ever trust a programmer's ability to figure out what it is they are trying to calculate. Yes, the astrophysicist will not write programs as efficiently as an IT major, but they still work.
And it's likely to be much faster. Working on high-performance computing is not part of the typical IT major's curriculum.
 
It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive.
It's actually quite hard and getting harder. The key to CPU programming is to keep everything on the L1 cache, which is quite limited and requires a lot of tricks. Then there is GPU and multi-core/multi-threaded programming which adds a different level of complexity.

This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
People do care in astrophysics and finance. A simulation can take two weeks, and a factor of 2 speedup makes the difference between a calculation that you can't do and one you can. In finance, which options you can sell is often limited by how much compute power you have.
 

Simfish

And personally I think that's a good thing, since CS classes are often terrible for teaching application programming. What's not surprising is the number of astronomy Ph.D.'s that are terrible programmers; what is more surprising is the number of CS Ph.D.'s that are terrible programmers.
What about applied math courses?

These courses, in particular:

AMATH 581 Scientific Computing (5)
Project-oriented computational approach to solving problems arising in the physical/engineering sciences, finance/economics, medical, social, and biological sciences. Problems requiring use of advanced MATLAB routines and toolboxes. Covers graphical techniques for data presentation and communication of scientific results.

AMATH 582 Computational Methods for Data Analysis (5)
Exploratory and objective data analysis methods applied to the physical, engineering, and biological sciences. Brief review of statistical methods and their computational implementation for studying time series analysis, spectral analysis, filtering methods, principal component analysis, orthogonal mode decomposition, and image processing and compression. Offered: W.

AMATH 583 High-Performance Scientific Computing (5)
Introduction to hardware, software, and programming for large-scale scientific computing. Overview of multicore, cluster, and supercomputer architectures; procedure and object oriented languages; parallel computing paradigms and languages; graphics and visualization of large data sets; validation and verification; and scientific software development. Offered: Sp.

AMATH 584 Applied Linear Algebra and Introductory Numerical Analysis (5)
Numerical methods for solving linear systems of equations, linear least squares problems, matrix eigenvalue problems, nonlinear systems of equations, interpolation, quadrature, and initial value ordinary differential equations. Offered: jointly with MATH 584; A.

AMATH 585 Numerical Analysis of Boundary Value Problems (5)
Numerical methods for steady-state differential equations. Two-point boundary value problems and elliptic equations. Iterative methods for sparse symmetric and non-symmetric linear systems: conjugate-gradients, preconditioners. Prerequisite: AMATH 581 or MATH 584 which may be taken concurrently. Offered: jointly with MATH 585; W.

AMATH 586 Numerical Analysis of Time Dependent Problems (5)
Numerical methods for time-dependent differential equations, including explicit and implicit methods for hyperbolic and parabolic equations. Stability, accuracy, and convergence theory. Spectral and pseudospectral methods. Prerequisite: AMATH 581 or AMATH 584. Offered: jointly with ATM S 581/MATH 586; Sp.
And what about these ones? If you know computer systems, could that make you better at CPU programming?

CSE 410 Computer Systems (3)
Structure and components of hardware and software systems. Machine organization, including central processor and input-output architectures; assembly language programming; operating systems, including process, storage, and file management. Intended for non-majors. No credit to students who have completed CSE 351, CSE 378, or CSE 451. Prerequisite: CSE 373.

CSE 417 Algorithms and Computational Complexity (3)
Design and analysis of algorithms and data structures. Efficient algorithms for manipulating graphs and strings. Fast Fourier Transform. Models of computation, including Turing machines. Time and space complexity. NP-complete problems and undecidable problems. Intended for non-majors. Prerequisite: CSE 373.

CSE 446 Machine Learning (3)
Methods for designing systems that learn from data and improve with experience. Supervised learning and predictive modeling: decision trees, rule induction, nearest neighbors, Bayesian methods, neural networks, support vector machines, and model ensembles. Unsupervised learning and clustering. Prerequisite: either CSE 326 or CSE 332; either STAT 390, STAT 391, or CSE 312.

CSE 415 Introduction to Artificial Intelligence (3) NW
Principles and programming techniques of artificial intelligence: LISP, symbol manipulation, knowledge representation, logical and probabilistic reasoning, learning, language understanding, vision, expert systems, and social issues. Intended for non-majors. Not open for credit to students who have completed CSE 473. Prerequisite: CSE 373.

CSE 373 Data Structures and Algorithms (3)
Fundamental algorithms and data structures for implementation. Techniques for solving problems by programming. Linked lists, stacks, queues, directed graphs. Trees: representations, traversals. Searching (hashing, binary search trees, multiway trees). Garbage collection, memory management. Internal and external sorting.
 
There are engineers who specialize in writing scientific code. They are fully capable of taking the differential equation (or whatever) and programming the discretized solution. The better ones can do it on any type of hardware. They are fully capable of debugging the physics on their own as long as the physicists supply the test problem and expected answer.
If you take a PDE and give it to someone that doesn't understand PDE's, the code won't work. Also this type of work is something that physics Ph.D.'s get hired to do.

The terabytes/day of data problem is totally different. For this you must use computer scientists. It's just not what engineers or physicists do.
Some do. Astrophysical CFD simulations can and do generate gigabytes of data per second, and if you work on one of those projects, you can very quickly get familiar with the nitty-gritty of data storage. People that work on geological systems routinely deal with multi-terabyte databases. And then there are the bioinformatics people. Once you've sequenced the human genome, storing that information is non-trivial.

Also if you take the attitude "that's not my job" you aren't going to last very long as a physics student. If you start generating multi-gigabyte/second data, and you don't know the CS to deal with that, then learn it.
 
What about applied math courses?
The big problem with those courses is that they generally don't give you experience in working on hundred-person project teams with millions of source lines of code. Coding is a form of writing, and you learn to write by writing.

Personally, I think a poetry course is pretty useful for writing good code, since some of the issues that you run into in writing elegant C++ are the same issues that you run into when you write English poetry.
 
