Bad programming skills = biggest hurdle to astronomy research

by Simfish
Tags: inquilinekea
D H
#19
Jan16-11, 12:27 PM
Mentor
P: 15,166
A few are saying it's either-or. Those are the ones who are saying that programming should be turned over to the IT department. My opinion: Yech.

What I've been saying is that scientists and engineers can learn to program well. It's just not something that most can pick up on their own. Some training is needed. Colleges require students of science and engineering to take a minimum of two or three calculus classes, and often quite a bit more math beyond that. Very few require students of science and engineering to take anything beyond an introductory computer programming class. A lot don't require *any* classes in computer science.
twofish-quant
#20
Jan17-11, 10:05 PM
P: 6,863
Quote by D H:
Very few require students of science and engineering to take anything beyond an introductory computer programming class. A lot don't require *any* classes in computer science.
And personally I think that's a good thing, since CS classes are often terrible for teaching application programming. What's not surprising is the number of astronomy Ph.D.'s that are terrible programmers; what is more surprising is the number of CS Ph.D.'s that are terrible programmers.

And it really shouldn't be surprising once you think about it. Just because you are a professor of English literature doesn't mean that you can write good short stories.
twofish-quant
#21
Jan17-11, 10:07 PM
P: 6,863
Quote by Chronos:
I trust an astrophysicist's 'plodding' algorithms more than I would ever trust a programmer's ability to figure out what it is they are trying to calculate. Yes, the astrophysicist will not write programs as efficiently as an IT major, but they still work.
And it's likely to be much faster. Working on high-performance computing is not part of the typical IT major's curriculum.
twofish-quant
#22
Jan17-11, 10:12 PM
P: 6,863
Quote by Chronos:
It is not hard to write effective code, merely to write efficient code. This was an issue 30 years ago when memory was expensive.
It's actually quite hard and getting harder. The key to CPU programming is to keep everything in the L1 cache, which is quite limited and requires a lot of tricks. Then there are GPU and multi-core/multi-threaded programming, which add a different level of complexity.
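
One of the "tricks" alluded to above is loop tiling (cache blocking): working on small sub-blocks so that data already fetched into cache is reused before it is evicted. A minimal sketch in Python, purely to show the loop structure. The function names and the `tile` parameter are illustrative assumptions, and pure Python will not show the speedup that the same restructuring can give in C, C++, or Fortran.

```python
def matmul_naive(a, b):
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=32):
    """Same product, computed one tile-by-tile block at a time.

    Each (i0, k0, j0) block touches only a small part of A, B and C,
    so in a compiled language that working set can stay in cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Work inside one block; min() handles ragged edges.
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + tile, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Both functions return the same product; only the order in which memory is touched differs. In a compiled language the tile size would be tuned so that the blocks being worked on fit in cache.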

This is no longer true. You can now write horribly inefficient code and no one cares - aside from waiting for it to process.
People do care in astrophysics and finance. A simulation can take two weeks, and a factor of 2 speedup makes the difference between a calculation that you can't do and one you can. In finance, what options you can sell is often limited by how much compute power you have.
Simfish
#23
Jan17-11, 10:16 PM
PF Gold
P: 828
And personally I think that's a good thing, since CS classes are often terrible for teaching application programming. What's not surprising is the number of astronomy Ph.D.'s that are terrible programmers; what is more surprising is the number of CS Ph.D.'s that are terrible programmers.
What about applied math courses?

These courses, in particular:

AMATH 581 Scientific Computing (5)
Project-oriented computational approach to solving problems arising in the physical/engineering sciences, finance/economics, medical, social, and biological sciences. Problems requiring use of advanced MATLAB routines and toolboxes. Covers graphical techniques for data presentation and communication of scientific results.

AMATH 582 Computational Methods for Data Analysis (5)
Exploratory and objective data analysis methods applied to the physical, engineering, and biological sciences. Brief review of statistical methods and their computational implementation for studying time series analysis, spectral analysis, filtering methods, principal component analysis, orthogonal mode decomposition, and image processing and compression. Offered: W.

AMATH 583 High-Performance Scientific Computing (5)
Introduction to hardware, software, and programming for large-scale scientific computing. Overview of multicore, cluster, and supercomputer architectures; procedure and object oriented languages; parallel computing paradigms and languages; graphics and visualization of large data sets; validation and verification; and scientific software development. Offered: Sp.

AMATH 584 Applied Linear Algebra and Introductory Numerical Analysis (5)
Numerical methods for solving linear systems of equations, linear least squares problems, matrix eigenvalue problems, nonlinear systems of equations, interpolation, quadrature, and initial value ordinary differential equations. Offered: jointly with MATH 584; A.

AMATH 585 Numerical Analysis of Boundary Value Problems (5)
Numerical methods for steady-state differential equations. Two-point boundary value problems and elliptic equations. Iterative methods for sparse symmetric and non-symmetric linear systems: conjugate-gradients, preconditioners. Prerequisite: AMATH 581 or MATH 584 which may be taken concurrently. Offered: jointly with MATH 585; W.

AMATH 586 Numerical Analysis of Time Dependent Problems (5)
Numerical methods for time-dependent differential equations, including explicit and implicit methods for hyperbolic and parabolic equations. Stability, accuracy, and convergence theory. Spectral and pseudospectral methods. Prerequisite: AMATH 581 or AMATH 584. Offered: jointly with ATM S 581/MATH 586; Sp.
And what about these ones? If you know computer systems, could that make you better at CPU programming?

CSE 410 Computer Systems (3)
Structure and components of hardware and software systems. Machine organization, including central processor and input-output architectures; assembly language programming; operating systems, including process, storage, and file management. Intended for non-majors. No credit to students who have completed CSE 351, CSE 378, or CSE 451. Prerequisite: CSE 373.

CSE 417 Algorithms and Computational Complexity (3)
Design and analysis of algorithms and data structures. Efficient algorithms for manipulating graphs and strings. Fast Fourier Transform. Models of computation, including Turing machines. Time and space complexity. NP-complete problems and undecidable problems. Intended for non-majors. Prerequisite: CSE 373.

CSE 446 Machine Learning (3)
Methods for designing systems that learn from data and improve with experience. Supervised learning and predictive modeling: decision trees, rule induction, nearest neighbors, Bayesian methods, neural networks, support vector machines, and model ensembles. Unsupervised learning and clustering. Prerequisite: either CSE 326 or CSE 332; either STAT 390, STAT 391, or CSE 312.

CSE 415 Introduction to Artificial Intelligence (3) NW
Principles and programming techniques of artificial intelligence: LISP, symbol manipulation, knowledge representation, logical and probabilistic reasoning, learning, language understanding, vision, expert systems, and social issues. Intended for non-majors. Not open for credit to students who have completed CSE 473. Prerequisite: CSE 373.

CSE 373 Data Structures and Algorithms (3)
Fundamental algorithms and data structures for implementation. Techniques for solving problems by programming. Linked lists, stacks, queues, directed graphs. Trees: representations, traversals. Searching (hashing, binary search trees, multiway trees). Garbage collection, memory management. Internal and external sorting.
twofish-quant
#24
Jan17-11, 10:20 PM
P: 6,863
Quote by Antiphon:
There are engineers who specialize in writing scientific code. They are fully capable of taking the differential equation (or whatever) and programming the discretized solution. The better ones can do it in any type of hardware. They are fully capable of debugging the physics on their own as long as the physicists supply the test problem and expected answer.
If you take a PDE and give it to someone who doesn't understand PDEs, the code won't work. Also, this type of work is something that physics Ph.D.'s get hired to do.
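
A small illustration of why the discretization cannot be done blindly, assuming the 1D heat equation u_t = alpha * u_xx and the explicit forward-time, centered-space (FTCS) scheme. Both are chosen here for illustration and are not from the thread. The scheme is only stable when r = alpha * dt / dx^2 <= 1/2, and nothing in the code itself warns you when that condition is violated.

```python
def heat_ftcs(u0, r, steps):
    """Advance u_t = alpha * u_xx with the explicit FTCS scheme.

    r = alpha * dt / dx**2. The two endpoints are held fixed (Dirichlet).
    """
    u = list(u0)
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            nxt[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = nxt
    return u


# Initial condition: a single hot spot in the middle of a cold rod.
u0 = [0.0] * 21
u0[10] = 1.0

stable = heat_ftcs(u0, r=0.4, steps=200)    # r <= 0.5: the spike diffuses away
unstable = heat_ftcs(u0, r=0.6, steps=200)  # r > 0.5: grid-scale oscillations grow

print("max |u| with r = 0.4:", max(abs(v) for v in stable))
print("max |u| with r = 0.6:", max(abs(v) for v in unstable))
```

The two runs differ only in the value of `r`. The second one executes without any error and still returns numbers, which is the point: someone who does not know the stability condition has no way to tell from the program alone that the output is meaningless.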

The terabytes/day of data problem is totally different. For this you must use computer scientists. It's just not what engineers or physicists do.
Some do. Astrophysical CFD simulations can and do generate gigabytes of data per second, and if you work on one of those projects, you can get very quickly familiar with the nitty-gritty of data storage. People that work on geological systems routinely deal with multi-terabyte databases. And then there are the bioinformatics people. Once you've sequenced the human genome, storing that information is non-trivial.

Also if you take the attitude "that's not my job" you aren't going to last very long as a physics student. If you start generating multi-gigabyte/second data, and you don't know the CS to deal with that, then learn it.
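
For a sense of scale, a back-of-the-envelope sketch, assuming a sustained output of 1 GB/s in decimal units. The rate is an illustrative assumption, not a figure from any specific simulation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(rate_gb_per_s):
    """Total bytes written in one day at a sustained rate given in GB/s."""
    return rate_gb_per_s * 1e9 * SECONDS_PER_DAY


def terabytes_per_day(rate_gb_per_s):
    """Same quantity expressed in decimal terabytes (1 TB = 1e12 bytes)."""
    return bytes_per_day(rate_gb_per_s) / 1e12


print(terabytes_per_day(1.0))  # 86.4
```

So a code that really does sustain a gigabyte per second is already in the tens-of-terabytes-per-day regime that the quoted post assigns to computer scientists.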
twofish-quant
#25
Jan17-11, 10:23 PM
P: 6,863
Quote by Simfish:
What about applied math courses?
The big problem with those courses is that they generally don't give you experience in working on hundred-person project teams with millions of source lines of code. Coding is a form of writing, and you learn to write by writing.

Personally, I think a poetry course is pretty useful for writing good code, since some of the issues that you run into in writing elegant C++ are the same issues that you run into when you write English poetry.
chiro
#26
Jan17-11, 10:34 PM
P: 4,573
I'm going to side with two-fish here, especially with regard to working on the bigger projects with dozens and dozens of people.

Working on the bigger projects is where you get a lot of experience in a lot of things. Everything from large-scale project design to optimization to effective integration of multiple code bases (think libraries or amalgamation of smaller repositories) requires people to see the forest for the trees and to have the kind of depth that only comes with experience.

Also a lot of programming that is taught can be way too theoretical. If you're designing a GUI widget, you have to get your hands dirty and not be stuck in some analysis paralysis where you are overanalyzing the design, structure and so on.

Like twofish said with the writing, to get good at writing you have to write: you can only theorize so much before you have to physically do something to learn.
Chronos
#27
Jan18-11, 02:19 AM
Sci Advisor
PF Gold
P: 9,444
Waiting for data to process is not labor intensive. Waiting for physicists to gather good data is the labor-intensive part. Data gathering algorithms are extremely important in this process. Researchers are not blind to this issue, and grad students with excellent programming skills are not rare. This may have been an issue 30 years ago, but not now.
D H
#28
Jan18-11, 09:16 AM
Mentor
P: 15,166
Quote by twofish-quant:
And personally I think that's a good thing, since CS classes are often terrible for teaching application programming.
You essentially are talking about the difference between software engineering and computer science. And yes, computer science classes past the introductory CS classes are for the most part terrible about teaching software engineering concepts. The introductory CS classes teach some very basic concepts common to computer science, software engineering, scientific programming, and even IT. Some schools require their students to take at least an introductory CS class, some don't (my undergrad school still does).

What's not surprising is the number of astronomy Ph.D.'s that are terrible programmers; what is more surprising is the number of CS Ph.D.'s that are terrible programmers.
Not all that surprising. I've learned the hard way that resumes from CS and IT grads with no education in engineering or the sciences are best filed circularly.


Quote by twofish-quant:
The big problem with those courses is that they generally don't give you experience in working on hundred-person project teams with millions of source lines of code.
That problem does not pertain just to computer science classes. Failing to teach how to work collaboratively is in my opinion a shortcoming of a lot of science programs. Engineering curricula on the other hand offer lots of opportunities for undergrads to work in team projects, with a lot of the project lead activities performed by a grad student. Participation in such projects, and having some kind of lead role in particular, is something I look for in evaluating prospects.
twofish-quant
#29
Jan18-11, 09:16 PM
P: 6,863
Quote by Simfish:
What about applied math courses?
The problem is not with course content, but course format. The way that courses are structured just doesn't lend itself to teaching "real world" programming. Among the differences:

1) Real world problems are invariably team graded. Your "grade" depends a lot on how competent the person next to you is. Is it unfair that you get a bad "grade" because the person next to you is incompetent? It may be unfair, but it's real, and one skill is to figure out how to deal with that.

2) Real world problems tend to be vague and ill-defined. In classes you get a well-defined assignment. Much of the work in real world programming involves figuring out what you need to do. When you do get a set of marching orders, more often than not, those orders are either contradictory or flat out impossible, and dealing with that is part of real world programming.

3) Class assignments are short and throw-away. In the real world, you have to work with a pre-existing system, and you never are in a situation in which you have to start from scratch. Also sometimes you have to deal with something that is badly written.

This means that you tend to have emotional reactions to good/bad code. In a class they teach you rules, but there is no emotional connection to those rules. In real world software development, you see bad code, and you react with horror since you know that you'll be spending the next three weeks going through ten thousand lines of code and fixing things.

You can get a lot more experience if you work on an open source project. Also I wasn't kidding when I said that it helps you if you take a course on writing poetry. Poetry classes are usually set up so that you write something and then you go to a room where everyone else in the class tells you how you can improve it. That's usually the dynamics of code reviews.
D H
#30
Jan18-11, 09:52 PM
Mentor
P: 15,166
Quote by twofish-quant:
The problem is not with course content, but course format. The way that courses are structured just doesn't lend itself to teach "real world" programming.
You're looking at the wrong courses. The sciences really should take a look at engineering education. Cube sats, autonomous vehicles, robots, ... The students work as a part of a team on what is often a multi-year project. That might mean dropping a course on some advanced concept, but that's what grad school is for. Besides, the stereotype of a scientist being someone who works on his own with only a blackboard for company is for the most part fifty years out of date. Most scientists, like most engineers, work in large teams nowadays.

in the real world, you have to work with a pre-existing system, and you never are in a situation in which you have to start from scratch.
Have to start from scratch? Get to start from scratch is more like it. There is nothing like being able to start on something from scratch. No CMMI 3 stuff, very few constraints other than getting the job started. Some other saps have to deal with making that initial design real. Getting that kind of opportunity doesn't happen very often.
Also sometimes you have to deal with something that is badly written.
Or a poor design by the people who don't want to deal with all that CMMI 3 nonsense.
Deadstar
#31
Jan23-11, 08:13 AM
P: 106
This seems like a good thread to ask this question: what languages would you guys say are the 'best' for writing astrophysics-type simulations (i.e. 3+ body problem simulations)?

I began with Maple before I had any intention of programming these types of problems, and have moved on to Matlab, as Maple isn't the best for high-end computing. I have a feeling C++ would perhaps be a good shout, but since I have a course on Matlab this year I'll mainly be working with that. I also find that the more I work with languages like Matlab, the more I seem to understand C++ code, despite never having worked with it.

I read somewhere that you should look at programming less in terms of learning a language and more in terms of learning the basics and fundamentals of programming, such as OOP.
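
Whatever the language, the core of a small N-body code looks much the same. A minimal sketch in Python of a direct-summation integrator using the kick-drift-kick leapfrog scheme. The units (G = 1), the function names, and the initial conditions are assumptions for illustration; a production code would typically add softening, adaptive time steps, and a compiled inner loop.

```python
import math

G = 1.0  # gravitational constant in code units (an assumption for this sketch)


def accelerations(pos, mass):
    """Direct O(N^2) summation of pairwise Newtonian gravity."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3  # pull of j on i
                acc[j][k] -= G * mass[i] * d[k] * inv_r3  # pull of i on j
    return acc


def leapfrog(pos, vel, mass, dt, steps):
    """Kick-drift-kick leapfrog; returns new positions and velocities."""
    pos = [p[:] for p in pos]
    vel = [v[:] for v in vel]
    acc = accelerations(pos, mass)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # full drift
        acc = accelerations(pos, mass)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick
    return pos, vel


def total_energy(pos, vel, mass):
    """Kinetic plus pairwise potential energy, used as a sanity check."""
    ke = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    pe = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.sqrt(sum((pos[j][k] - pos[i][k]) ** 2 for k in range(3)))
            pe -= G * mass[i] * mass[j] / r
    return ke + pe


# Hypothetical test problem: one heavy body and two light ones on
# roughly circular orbits at radii 1 and 2.
mass = [1.0, 1e-3, 1e-3]
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -math.sqrt(0.5), 0.0]]

e0 = total_energy(pos, vel, mass)
pos1, vel1 = leapfrog(pos, vel, mass, dt=0.01, steps=1000)
e1 = total_energy(pos1, vel1, mass)
print("relative energy drift:", abs((e1 - e0) / e0))
```

Tracking the energy drift is a cheap way to check that the integrator and the time step are behaving. The same structure carries over almost line for line to Matlab, C++, or Fortran.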
jsiples
#32
Jan23-11, 06:41 PM
P: 2
This is a website I found a while back; it's a great comparison of programming languages and their speed.

http://shootout.alioth.debian.org/fa...g-language.php

I think you're right in choosing C++ as a language. As you can see, C is marginally faster but slightly more annoying to code in at times, so C++ is a happy medium. Speaking of happy mediums, I tend to fall into that category when it comes to this topic: my schooling (while not finished) was in Astro-Engineering, but I now hold a job as a Linux sysadmin and have done a rather extensive amount of programming. I think, as someone mentioned before, it's like we have to learn all over again. We've gotten so used to fast computers and not caring about efficiency, but when you are dealing with numbers this large, it is like going back 30 years in computing power. Unfortunately, I agree that it's a vicious cycle of astronomers not being programmers and vice versa: it is hard to program efficiently when you don't know what you're programming, and it takes years of practice to know how to program efficiently.

while(astronomer == programmer)
{
    printf(&efficient_program);
}

EDIT: Coming from a background in a variety of languages, I agree with what you said about learning the basics of programming over the language. You'll quickly learn that most languages are very similar and that you can adapt to them quickly.
Simfish
#33
Jan23-11, 07:52 PM
PF Gold
P: 828
Wow, very interesting link! Why is FORTRAN Intel so slow? My professors always told me to use either C/C++ or FORTRAN since they were the fastest.

FORTRAN is nice for MPI/OpenMP integration too (if you want to parallelize things). I'm sure C also has MPI/OpenMP integration, but I'm not sure if it's as fluid.
D H
#34
Jan23-11, 08:24 PM
Mentor
P: 15,166
Quote by jsiples:
This is a website I found a while back; it's a great comparison of programming languages and their speed.
Eh. The question raised by Simfish immediately arises on seeing stuff like that:
Quote by Simfish:
Wow, very interesting link! Why is FORTRAN Intel so slow? My professors always told me to use either C/C++ or FORTRAN since they were the fastest.
The answer is (at least) twofold:
  1. They are combining metrics from multiple programs, some of which are not Fortran's forte. Who cares how well Fortran performs in handling strings?
  2. The programs appear to be not so well-written. Some of the so-called benchmarks are so poorly written that they don't even compile. This is a failure of the benchmarking, not of the language.
The tests in which Fortran fares very poorly are in part a reflection of the limitations of Fortran (yes, Fortran isn't so good at handling strings), but also appear to be in part due to giving the programming assignment to someone not well-versed in Fortran.

Note well: I am not a Fortran advocate. Far from it; I gladly abandoned the language a couple of decades ago. One doesn't have to be a Fortran advocate to say that those benchmarks are more than a bit suspect. That said, Fortran does not always look so bad at debian.org. In the n-body problem Fortran is the winner: http://shootout.alioth.debian.org/u3...php?test=nbody.
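
To see how a combined score can hide a per-task win, here is a small sketch. The timings are made-up numbers and the geometric mean is only one plausible way to aggregate; neither is taken from the shootout site.

```python
import math

# Hypothetical timings in seconds (made-up numbers, NOT from the shootout site).
timings = {
    "nbody":          {"lang_a": 10.0, "lang_b": 12.0},
    "spectral_norm":  {"lang_a": 8.0,  "lang_b": 8.0},
    "string_reverse": {"lang_a": 40.0, "lang_b": 5.0},
}


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_score(timings, lang, baseline):
    """Geometric mean of time(lang) / time(baseline); > 1 means slower overall."""
    return geometric_mean(t[lang] / t[baseline] for t in timings.values())


def per_task_winner(timings):
    """Fastest language for each task (ties go to the first one listed)."""
    return {task: min(t, key=t.get) for task, t in timings.items()}


print("combined score of lang_a vs lang_b:",
      relative_score(timings, "lang_a", "lang_b"))
print("winner per task:", per_task_winner(timings))
```

With these invented numbers, `lang_a` looks much slower in the combined score purely because of the string task, even though it wins the n-body task. If the workload is numerical simulation, the per-task numbers are the ones worth reading.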

