Bad programming skills = biggest hurdle to astronomy research

  1. Simfish

    Simfish 825
    Gold Member

  2. jcsd
  3. Nabeshin

    Nabeshin 2,202
    Science Advisor

    Seems very conceivable to me.

    If you look at the amount of data we're currently producing, it's, to use exactly the right word, astronomical. A lot of it hasn't even been sifted through in the right way to yield what are likely interesting science results. For example, there is a lot of interest right now in detecting extrasolar planets. But the data from this type of analysis, especially on the scale of hundreds of thousands of stars like Kepler, can be immensely useful for discovering stellar physics in its own right. The trouble is that sorting through it all is such a massive task (no doubt relegated to computers) that programming it is undoubtedly a major challenge.

    And it's only likely to get worse, as next generation telescopes like the LSST are going to produce even larger amounts of data. In the case of LSST, ~20TB per night of observations. Figuring out how to store, reference, and analyze that is all a programming task of large magnitude.
  4. D H

    Staff: Mentor

    Dead on. From my experience, most scientists and engineers make for incredibly bad programmers. My opinion: The only reason computer science majors are not used to write our scientific/engineering programs is that most computer scientists fare even worse at doing science and engineering than we do at doing programming.

    It isn't that hard to program well. We know what a good engineering design or a solid scientific theory looks like. We can learn what a well-constructed program looks like. It does take some training, however. It is a bit arrogant on our part to think that training is not required.
  5. It's also a bit arrogant to think that it's not hard to program well. :wink:

    I'd also comment on the computer scientist's inability to do science but I won't push those buttons. Computer scientists aren't inherently better programmers than scientists of any other kind. Here too programming is vaguely taught and encouraged but is not an inherent part of the course.
  6. turbo

    turbo 7,063
    Gold Member

    It might be a good time to reflect on the interaction. Astronomers need to tell the programmers/analysts what they are attempting to tease out of the mountain of data, and they need to explain what they think the tell-tale signs in the data might look like (variations in total flux, variations in peak wavelength, and so on). Programmers need to come up with algorithms that can sift through the data efficiently, and they need to communicate with the astronomers when their output isn't clean or as expected, so that they can get more guidance and modify their search.

    It's not rocket-science. Observational astronomy is not hands-on. Your "subjects" are far away, in physical space and in time. You have a suite of instruments to make observations, and you have (often) a mountain of data (often with a high noise:signal ratio) from which to glean some information that may or may not support your preconceptions. It is short-sighted to lay research hurdles on "bad programming", IMO. "Bad communication" is more likely.
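To make that hand-off concrete, here is a minimal sketch of what one agreed-upon "tell-tale sign" might look like once it is written down, in this case a dip in total flux. The function name `flag_flux_dips`, the 5-sigma default, and the sample light curve are all made up for illustration; a real search would presumably also need detrending and a noise model supplied by the astronomers.

```python
from statistics import median
from typing import List, Sequence


def flag_flux_dips(flux: Sequence[float], threshold: float = 5.0) -> List[int]:
    """Return indices where flux falls more than `threshold` robust sigmas
    below the median of the series."""
    center = median(flux)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(f - center) for f in flux)
    sigma = 1.4826 * mad  # scales MAD to a standard deviation for Gaussian noise
    if sigma == 0.0:
        return []  # perfectly flat series: nothing to flag
    return [i for i, f in enumerate(flux) if (center - f) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical normalized light curve with one obvious dip at index 6.
    sample = [1.00, 1.01, 0.99, 1.00, 1.02, 0.98, 0.90, 1.00, 1.01, 0.99]
    print(flag_flux_dips(sample))  # [6]
```

The `threshold` argument is exactly the kind of thing the two sides have to talk about: too low and the output "isn't clean", too high and the interesting events are missed.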
  7. If they already have programs that "work", why not just give them to a programmer and have them make them work well? That way the program does exactly what is intended, and the problems and slow running time can be removed. And who knows, it might allow for a much broader or more precise search. The differences can be night and day.
  8. Because bad programs cannot be fixed, they must be written as if the original had never existed. The difference between a program that works and one that works well is vastly greater than that between a program that works and one that doesn't work at all.
  9. It sounds like part of the problem was that they couldn't explain to the programmers exactly what they wanted so programmers couldn't do what the already written programs could do.

    This is why you give the programs to the programmers and have them see what the program actually does. They can then make a program that gives the same results, but works much better and has more features.
  10. D H

    Staff: Mentor

    What programmers?

    The people who write the astronomical codes discussed in the article are predominantly astronomy grad students. An astronomy department would have to cut two of those grad students to hire one programmer, and for that paltry sum they just might be able to hire a freshout with a BS in IT who graduated well into the bottom half of the class.
  11. That sounds awesome in theory, but works out rather badly when you actually have to figure out somebody's scientific computing code full of all sorts of crazy math, almost no comments, and lots of hacks to keep the code from crashing. I spend a good chunk of time using and rewriting a labmate's code to make it robust enough for my purposes and it's like pulling teeth to get an explanation of the code that makes any sense to me.

    I'll chime in that by no means is this limited to astronomy research. I'm in applied CS, the one field where you'd expect to see halfway decent code, and I still see all the same problems 'cause many people assume that they're writing the code as a one-off to do some number crunching and therefore don't think about maintainability at all. Actually, I think the biggest hindrance to good code is probably that very few researchers have the luxury of taking a week (or a few weeks) to properly write, test, document, and refactor their code.

    In theory code cleanup would be a great task to farm out to undergrads, but the math involved in the programming makes it totally unfeasible a lot of the time.

    One of the fun things about working with very large datasets is that I'm usually the only person in the room who cares about the space complexity as much as (if not more than) the time complexity of any of the algorithms used to do the number crunching.
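As an illustration of why space complexity matters on very large datasets, here is a minimal sketch of a single-pass mean and standard deviation using Welford's update, which keeps memory use constant however many values stream past. The generator `synthetic_flux` is a hypothetical stand-in for whatever actually reads the data; nothing here is taken from any poster's code.

```python
import math
from typing import Iterable, Iterator, Tuple


def streaming_mean_std(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample standard deviation) in a single pass.

    Uses Welford's update, so only three numbers are kept in memory
    regardless of how many values the iterable yields.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


def synthetic_flux(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a file reader: yields n values lazily."""
    for i in range(n):
        yield 1.0 + 0.001 * (i % 7)


if __name__ == "__main__":
    # The million values are never held in a list at the same time.
    print(streaming_mean_std(synthetic_flux(1_000_000)))
```

The naive alternative, loading everything into a list and calling a two-pass routine, has the same time complexity but needs memory proportional to the dataset, which is usually what runs out first.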
  12. Because if you can precisely explain exactly what equations need to be programmed, then you've already written the program.

    It's easier to teach an astrophysicist how to program well than it is to teach a programmer astrophysics. While there are astrophysicists who are awful programmers, there are astrophysicists who can program extremely well.
  13. Chronos

    Chronos 10,348
    Science Advisor
    Gold Member

    I trust an astrophysicist's 'plodding' algorithms more than I would ever trust a programmer's ability to figure out what it is they are trying to calculate. Yes, the astrophysicist will not write programs as efficiently as an IT major, but they still work. I see, however, no reason not to run the program by an IT guy to ensure it is doing what they intend it to do.
    Last edited: Jan 7, 2011
  14. D H

    Staff: Mentor

  15. Chronos

    Chronos 10,348
    Science Advisor
    Gold Member

    It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive. This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
  16. Simfish

    Simfish 825
    Gold Member

    Haha so true. But what about code for supercomputers? (Code that might take several days to process?) Or code that, say, requires 8 GB of RAM to process? (Seriously, I once had to run code that required 8 GB of RAM for certain parameters.)
  17. Ideally the sciences should be re-structured as a business where there is an IT department that they can work with. The scientists then become the analysts and testers of the code leaving the actual programming to those who know exactly what they are doing.

    In my 'field' (minor planets), the most 'efficient' software is generated by amateurs (amateur astronomers) who are expert programmers (they program for a living) working side by side with the professional astronomers in the field. Yes, the pros still have their own software, but the amateurs' software is generally all-encompassing and user-friendly, and it produces reliable results much more quickly for those who don't have to be experts in the field. The professionals' software came first, of course, but the amateurs took it and built it 'better' (better is of course relative to who the user is and the results we get from it).


  18. D H

    Staff: Mentor

    I have a number of problems with the above. I am having a very hard time parsing your first sentence. For one thing, you are using two words, effective and efficient, that are synonyms / near synonyms of one another. For another, that parenthetical remark is a bit hard to parse. I think you are saying "It is not hard to write effective code. What is hard is writing efficient code." If that is the correct interpretation, I take exception to it.

    Moreover, people still do care about performance. Comparisons to what computers could do thirty years ago are a bit misleading. We are now doing things with computers that we simply could not do thirty years ago. A poorly designed, poorly implemented system means I cannot do some kinds of analyses (e.g., a statistically valid Monte Carlo simulation) that I could do were the system designed and implemented better. Instead I am limited to doing a poor man's Monte Carlo because of that poor design.

    I'll define "effectiveness" as "doing the job, all of it, correctly" and efficiency as "minimizing use of some particular resource". With this definition, efficiency is but one part of effectiveness.

    Performance (efficiency) can paradoxically be both under- and over-emphasized in scientific software. From my experience, scientists and engineers tend to underemphasize performance concerns during system design and overemphasize them late in the game (coding and maintenance). Properly worrying about performance during design can eliminate a lot of problems further on down the road. Where will the performance demons lie? Will the system be used in ways that require us to pay extra attention to resources or outfit the program so as to circumvent resource issues? First worrying about performance during the coding stage leads to implementing the system in a language such as Python or Matlab that is far too slow for the intended use, and to programmers who optimize the 99.9% of the code that consumes 0.1% of the CPU time but miss the boat on the 0.1% of the code that consumes 99.9% of the CPU time. First worrying about performance after the system is built leads to even worse nightmares.

    A big part of effectiveness is making a system that is understandable, testable, and maintainable. The most efficient code is often difficult to understand, very hard to test, and even harder to maintain. Efficiency goes against almost every other measure of software quality.
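One way to avoid optimizing the wrong 99.9% of the code is to measure before touching anything. The following is a minimal sketch using Python's standard `cProfile` and `pstats` modules; the functions `cheap_bookkeeping`, `hot_loop`, and `analysis` are invented stand-ins, not anything from a real astronomy code.

```python
import cProfile
import io
import pstats


def cheap_bookkeeping(n: int) -> int:
    """Called often but costs almost nothing per call."""
    return n + 1


def hot_loop(n: int) -> float:
    """Hypothetical hot spot: a deliberately quadratic computation."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (i * j) % 7
    return total


def analysis(n: int = 300) -> float:
    """Toy 'program' mixing cheap bookkeeping with one expensive step."""
    steps = 0
    for _ in range(1000):
        steps = cheap_bookkeeping(steps)
    return hot_loop(n) + steps


def profile_report(func, *args, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(analysis))
```

In a report like this the time should concentrate in `hot_loop`, which is the only place where clever (and harder to maintain) code would pay for itself; the rest can stay readable.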
  19. This isn't as either-or as it is seeming in this thread.

    There are engineers who specialize in writing scientific code. They are fully capable of taking the differential equation (or whatever) and programming the discretized solution. The better ones can do it in any type of hardware. They are fully capable of debugging the physics on their own as long as the physicists supply the test problem and expected answer.

    The terabytes/day of data problem is totally different. For this you must use computer scientists. It's just not what engineers or physicists do.
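The "test problem and expected answer" workflow mentioned above can be sketched as follows. This is only an illustration: it assumes a classical fixed-step fourth-order Runge-Kutta scheme and uses exponential decay, dy/dt = -2y, as a made-up test problem because its exact solution is known. The names `rk4_integrate`, `decay`, and `error_at` are hypothetical.

```python
import math
from typing import Callable


def rk4_integrate(f: Callable[[float, float], float],
                  t0: float, y0: float, t_end: float, steps: int) -> float:
    """Integrate dy/dt = f(t, y) from t0 to t_end with fixed-step RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def decay(t: float, y: float) -> float:
    """Test problem: dy/dt = -2 y, with exact solution y0 * exp(-2 t)."""
    return -2.0 * y


def error_at(steps: int) -> float:
    """Absolute error at t = 1 for y(0) = 1, against the exact answer."""
    expected = math.exp(-2.0)
    return abs(rk4_integrate(decay, 0.0, 1.0, 1.0, steps) - expected)


if __name__ == "__main__":
    coarse, fine = error_at(20), error_at(40)
    # Halving the step of a 4th-order method should cut the error roughly 16-fold.
    print(coarse, fine, coarse / fine)
```

Checking the convergence rate as well as the final number is a cheap way for the person writing the solver to catch a physics or discretization bug without needing the physicist in the room.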
  20. D H

    Staff: Mentor

    A few are saying it's either-or. Those are the ones who are saying that programming should be turned over to the IT department. My opinion: Yech.

    What I've been saying is that scientists and engineers can learn to program well. It's just not something that most can pick up on their own. Some training is needed. Colleges require students of science and engineering to take a minimum of two or three calculus classes, and often quite a bit more math beyond that. Very few require students of science and engineering to take anything beyond an introductory computer programming class. A lot don't require *any* classes in computer science.
  21. And personally I think that's a good thing, since CS classes are often terrible for teaching application programming. What's not surprising is the number of astronomy Ph.D.'s who are terrible programmers. What is more surprising is the number of CS Ph.D.'s who are terrible programmers.

    And it really shouldn't be surprising once you think about it. Just because you are a professor of English literature doesn't mean that you can write good short stories.