Documentation of (only) source code

  • Thread starter elias001
AI Thread Summary
The discussion centers on the effectiveness of using pseudocode as a form of documentation in source code. While some argue that including pseudocode can clarify the intent of the code, others emphasize that well-structured, readable code should minimize the need for extensive comments. The balance between too many comments and insufficient documentation is highlighted, with a preference for comments that explain the rationale behind complex code rather than restating obvious functionality. Additionally, the importance of documenting inputs, outputs, and variable purposes is noted as essential for future maintainability. Ultimately, good documentation practices are seen as a skill that enhances code clarity and usability.
elias001
I have a question for everyone who writes code for a living and knows that someone else will be reading their code long after they have left their place of work, or even long after they have left this earth.

Whatever language syntax you put down for your colleagues and for posterity, are you allowed to do the following: before you write your first line of code, you insert as a comment, in pseudocode, whatever it is you are about to write. I mean include the entire pseudocode.

I know there are books titled Code Craft: The Practice of Writing Excellent Code, The Art of Readable Code, and Code Reading: The Open Source Perspective.

One of the things about programming has to do with inserting inline comments. But doesn't inserting the entire pseudocode solve this documentation problem, by making sure that whoever reads your code will know for sure what it does?
 
Some projects require a change log in the first comment so you document what you changed.

Other projects don’t as the source mgmt tool does a better job.

I usually put my name but sometimes I don't.

---

One issue that pops up is when you've changed groups, and someone later comes for help with the code you wrote, and now it becomes an albatross.

Another issue is when a programmer really hates the latest version of the code and verbally blames the original author to the team or mgmt, resulting in you getting a ding in your performance review.

Projects can be dog-eat-dog during the dev cycle, and often the test team gets the brunt of it, either for testing too much or not enough.

Too much could make a developer or the whole project miss a deadline, and too little could cause issues when the code is deployed and problem reports from customers are building up.

I know, as a test lead, I've lived through this trauma.
 
@jedishrfu I am sure that in any programming course in college there is always this thing about documenting or commenting your code. I once asked on one of the Stack Exchange forums why it is necessary to document your code, since shouldn't coders just be able to read code the way mathematicians can read math proofs? Suffice it to say I poked a hornet's nest.

Anyway, I was also told there used to be a webpage showing all the bad things that can happen if you document your code poorly or not at all. So since the point of documentation is to let others know what your code is all about, then isn't pseudocode the best way to go about it?
 
I don't recall ever having that need in any high-level language, but sometimes, when a reference implementation or an algorithmic breakdown exists as part of the requirements, it may make good sense to refer to the relevant parts of that for sections of the code. The closest to pseudo-code I recall getting is writing up state invariants or assumptions, but when done right these can also be made executable as asserts or similar, and would thus strictly speaking not be comments. In general, my actual comments tend to refer to the intent and reasons for the following section of code that are not directly observable from the code itself or its immediate environment, or that do not follow the general patterns and code style the reader is supposed to know for the system/application/component/sub-component in question. Easy, straightforward code should preferably have no comments. APIs, on the other hand, should be well documented.
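As a rough illustration of invariants and assumptions made executable, here is a minimal Python sketch. The function `binary_search` and its names are hypothetical and not taken from any poster's code; the point is only that the precondition and the loop invariant are checked by `assert` rather than described in a comment.

```python
def binary_search(sorted_values: list[int], target: int) -> int:
    """Return the index of target in sorted_values, or -1 if it is absent."""
    # Assumption made executable: the input must already be sorted.
    assert all(a <= b for a, b in zip(sorted_values, sorted_values[1:])), \
        "sorted_values must be in ascending order"

    low, high = 0, len(sorted_values)
    while low < high:
        # Invariant made executable: the search window stays inside the list.
        assert 0 <= low < high <= len(sorted_values)
        mid = (low + high) // 2
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid

    found = low < len(sorted_values) and sorted_values[low] == target
    return low if found else -1
```

Unlike a comment, a failing assert cannot silently drift out of step with the code it describes.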

Let me also add that getting the amount and level of comments just right is a tricky balance. I would much rather read well-structured undocumented code than complicated code that has been equipped with a tonne of comments, with the implied risk of them adding confusion and contradictions. Over the years I have seen all sorts of wrong comments left behind in code (i.e. comments that imply a certain meaning that is not present in the code), making it much harder to convince yourself what the code actually does. Humans read the comment and skip understanding difficult code, while the compiler does the opposite. Guess who gets the actual behavior right.

Also, the need for comments ties into how well-tested a piece of code is. With the proper code structure and sufficient unit tests to capture behavior, there should be little need to provide detailed comments in most code.
 
@Filip Larsen if I just include the pseudocode in the source as documentation, that cannot be grounds for termination of employment?
 
elias001 said:
Whatever language syntax you put down for your colleagues and for posterity, are you allowed to do the following: before you write your first line of code, you insert as a comment, in pseudocode, whatever it is you are about to write. I mean include the entire pseudocode.
I learned decades ago that I sometimes have difficulty getting back into following my own code after a year or two, so I ALWAYS start by writing high-level comments / pseudo code for what it is I'm going to do, and then I may give it more detail after writing the code. It has sped up my later revisions many times.
 
@phinds ah kk. thank you for clarifying.
 
Good question. Pseudocode can be as obscure as the real code, so it is not the right way to describe good documentation. There are tools like Doxygen, which are a great help in generating documentation as long as you format your comments appropriately. There are a great many examples of good documentation style.

Good documentation is a learned skill. You are doing well if you are already trying to learn how to do it.

PS. I used to do a lot of conversions from MATLAB scripts to other languages. I liked to keep each MATLAB line as a comment before the code that implemented it in the other language. The MATLAB functions are well documented by MathWorks, but some of the I/O variable names still needed to be defined.
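The conversion habit described above might look roughly like the sketch below. The MATLAB lines are a made-up three-line script, not anything from the poster's work, and the Python names (`summarize`, `samples`, and so on) are assumptions chosen for illustration. Each original line is kept as a comment directly above its translation, and the docstring defines the I/O variables that the MATLAB names left unexplained.

```python
from itertools import accumulate
from statistics import fmean


def summarize(samples: list[float]) -> tuple[float, list[float], float]:
    """Translate a small (hypothetical) MATLAB script line by line.

    samples       -- input signal, the MATLAB variable x
    average       -- arithmetic mean of the signal, the MATLAB variable m
    running_total -- cumulative sum of the signal, the MATLAB variable c
    peak          -- largest absolute sample value, the MATLAB variable p
    """
    # MATLAB: m = mean(x);
    average = fmean(samples)
    # MATLAB: c = cumsum(x);
    running_total = list(accumulate(samples))
    # MATLAB: p = max(abs(x));
    peak = max(abs(value) for value in samples)
    return average, running_total, peak
```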
 
elias001 said:
if I just include the pseudocode in the source as documentation, that cannot be grounds for termination of employment?
I am not sure if you say this because you think pseudo-code somehow magically can elucidate details that code in a modern language cannot.

If your code implements an algorithm with a well-documented pseudo-code description, then it can be perfectly fine and even preferable to refer to that for background or overview (in formal terms that would correspond to a requirement specification reference, as mentioned earlier), but it doesn't excuse you from writing good, human-understandable code in the first place.

If I review code where the author has written subpar code interspersed with pseudo-code and comments that are essential for reading the actual code, then I would definitely ask for more readable actual code, and later in the review process likely also ask for removal of any comments that simply duplicate the obvious semantics of the now readable code.
 
  • #10
An aside:
I always considered it to be a red flag if a programmer said that he wrote "self-documenting code" that didn't need comments. "Self-documenting code" might be a good first step, but there needs to be more.
 
  • #11
My experience has been that the algorithm is usually not the main question when looking at code. I'm agreeing with @Filip Larsen that the pseudocode is really no clearer than the actual code. I think the most important things to document are:
(1) What are the inputs and what datatype are they.
(2) Same for the outputs.
(3) For variables, unless the name makes it obvious what they are, document what each variable is.
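A minimal sketch of what documenting those three things could look like in Python follows. The function `moving_average` is a hypothetical example, not code from the thread; the docstring covers inputs and outputs with their data types, and the one variable whose name is not self-explanatory gets its own comment.

```python
def moving_average(readings: list[float], window: int) -> list[float]:
    """Smooth a series of readings with a simple moving average.

    Inputs:
        readings: list of float, samples in the order they were taken.
        window:   int, number of consecutive samples per average (>= 1).

    Output:
        list of float with len(readings) - window + 1 entries; empty if
        there are fewer readings than the window size.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    averages = []
    # window_sum holds the sum of the samples currently inside the window.
    window_sum = sum(readings[:window])
    for end in range(window, len(readings) + 1):
        averages.append(window_sum / window)
        if end < len(readings):
            # Slide the window: add the new sample, drop the oldest one.
            window_sum += readings[end] - readings[end - window]
    return averages
```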
 
  • #12
@FactChecker and @Filip Larsen well, I have an idea: for every 1000 lines of code in whatever you are writing, you will force whoever is reading it to watch a one-hour-long video with Elmo explaining how your code works, and then Gonzo will tell you in German, in the patronizing style of Nietzsche, why the person reading your code should never complain about anything you have written.
 
  • #13
There are a lot of coding guidelines and principles floating around, some probably contradicting each other or of varying relevance depending on the environment as a whole, but my preference with commenting mirrors that of convention over configuration (where comments take the role of configuration). Using a term from the aviation industry, I could similarly say that I believe code should as far as possible follow the dark cockpit principle, where any "required comments" take on the role of a flashing red light.
 
  • #14
At IBM, code attribution was important. At the completion of a major project, copyright statements would be added as comments and sometimes as embedded strings.

Also, code scans were done to identify non-IBM code. In one case, the programmer had left an attribution to say what magazine or book he found the algorithm in. Lawyers needed to check if it was okay to use.

In my code, variable comments provided a brief description and units of measure. That is something that is often overlooked, and it leads to future problems if the new programmer doesn't follow the rules.
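A small Python sketch of variable comments that carry a description plus units of measure is shown below. The names (`GRAVITY_M_PER_S2`, `fall_time_s`, `drop_height_m`) are invented for illustration; putting the unit in the name as well as in the comment is one possible convention, not something the post prescribes.

```python
import math

# Standard gravitational acceleration, in metres per second squared.
GRAVITY_M_PER_S2 = 9.81


def fall_time_s(drop_height_m: float) -> float:
    """Return the free-fall time in seconds for a drop given in metres.

    Air resistance is ignored; drop_height_m must not be negative.
    """
    if drop_height_m < 0:
        raise ValueError("drop_height_m must not be negative")
    # t = sqrt(2 * h / g): metres divided by m/s^2 gives s^2, so the
    # square root is in seconds.
    return math.sqrt(2.0 * drop_height_m / GRAVITY_M_PER_S2)
```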

Windows SDK used Simonyi naming conventions because it was hard before IDE tools to know the datatype of a C++ variable.

Integers and numbers
  • i → integer (loop index, counter)
  • n → number (generic integer, count)
  • l → long integer
  • s → short integer
Booleans and flags
  • f → flag (true/false condition)
Characters and strings
  • ch → single character
  • sz → zero-terminated string (“string, zero-terminated”)
Pointers and handles
  • p → pointer (e.g., pfoo)
  • lp → long pointer (far pointer in early MS C compilers)
  • h → handle (opaque reference to a resource)
Floating point
  • x → x-coordinate (Simonyi often used as semantic example)
  • y → y-coordinate
  • r → real number (floating point)

And Windows extended this core set with many more prefixes.
 
  • #15
I've found that when I'm documenting my code, one of the main people I'm writing the documentation for is me. So that when I come back years later and ask myself, "What the hell was I doing?" I get some sort of answer.
 
  • #16
I only write very low-level and compact code, so I leave myself notes addressed like "We need to ...", because I will have forgotten which hump-backed bridges have land mines under them, and I am preparing the way for the next programmer. Warnings are good. It is too easy to forget why, if something is changed, the code will not work as required.

"We" is also me negotiating, reasoning, and proving to myself, that I understand what I am writing. "Here be Sea Monsters". As an example, 320 lines of program and comments, compiles to 37 machine instructions, which makes the kernel of an RTOS on a microcontroller.
 
  • #17
phyzguy said:
I've found that when I'm documenting my code, one of the main people I'm writing the documentation for is me. So that when I come back years later and ask myself, "What the hell was I doing?" I get some sort of answer.
I have had occasions where I was looking at someone's old code, trying to figure it out so that I could modify it, drawing all sorts of conclusions about the SOB who wrote it, and then seeing my name as the author.
 
  • #18
A similar thing happens to me here at PF. After a couple of years, I get a reply notification to a watched thread, and so read back up the thread. Knowing nothing about the subject, while being interested in everything, I come across a three-paragraph summary as post #2. It is fascinating. When I get to the top of the post, I find my caricature, with a goose leading me by the nose.
 
  • #19
elias001 said:
I once asked on one of the Stack Exchange forums why it is necessary to document your code, since shouldn't coders just be able to read code the way mathematicians can read math proofs? Suffice it to say I poked a hornet's nest.
The short answer is "no." I don't know what your coding experience is, but unless one is writing relatively small programs, it can be difficult for someone else to understand what code does merely by reading through it.

elias001 said:
So since the point of documentation is to let others know what your code is all about, then isn't pseudocode the best way to go about it?
No. Better is a brief, high-level explanation in human-readable language of what a section of code (typically a function or routine) does. In addition, the points below made by @phyzguy are important.

phyzguy said:
My experience has been that the algorithm is usually not the main question when looking at code. I'm agreeing with @Filip Larsen that the pseudocode is really no clearer than the actual code. I think the most important things to document are:
(1) What are the inputs and what datatype are they.
(2) Same for the outputs.
(3) For variables, unless the name makes it obvious what they are, document what each variable is.
For function inputs (parameters), the range of valid inputs is also useful and helpful information. Same goes for variables.

jedishrfu said:
Windows SDK used Simonyi naming conventions
Usually called Hungarian notation, as Charles Simonyi was Hungarian. To the best of my knowledge, Hungarian notation fell out of favor or at least, was not the panacea that people at the time thought it was. Steve McConnell in "Code Complete," lists several disadvantages of Hungarian naming convention. One is that it encouraged lazy, uninformative variable names, such as hwnd -- a handle for a window, but the name does not convey any information about that window. Another is that the name combines data meaning with data representation. If the type of a variable changes from integer to boolean, say, the programmer would need to rename the variable everywhere it appears.
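The renaming problem can be sketched in a few lines of Python. All names here (`nRetry`, `retry_enabled`, `should_retry`) are invented for illustration; the alternative shown, a meaning-bearing name plus a type annotation, is one common modern approach and is not taken from "Code Complete".

```python
# Hungarian-style: the prefix records the representation (n = number).
# If this later becomes a boolean, every use of nRetry must be renamed.
nRetry = 1

# Alternative: the name states the meaning and the annotation states the
# type, so a change of representation touches only the annotation.
retry_enabled: bool = True


def should_retry(retry_enabled: bool, attempts_made: int, max_attempts: int) -> bool:
    """Return True if retries are switched on and attempts remain."""
    return retry_enabled and attempts_made < max_attempts
```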
 
  • #20
Thinking about the usefulness of mixing pseudo-code with actual code somehow takes me back to university in the 80's, where we had Dines Bjørner (main author of the math-like Meta-IV specification language) try to convince us budding software engineers that formal specification methods, like VDM, would be the way of the future. Yet despite having done software ever since in segments that this method was meant to improve, I have never seen one line of Meta-IV since.

I vaguely recall us students feeling that even if we had a fair grasp of the Meta-IV domain and a fair grasp of the language domain, we would still struggle to map a Meta-IV specification to a "verified" implementation just about as much as if the spec had been written in concise natural language. For me, now, I think the concept of proper unit tests (such as they are used in test-driven development) much better allows specification and verification of code at the lower level, simply because (as the main difference) it is an executable specification, something both pseudo-code and languages like Meta-IV explicitly are not. (Note, I am not saying here that pseudo-code for an implementation is equivalent to or can be replaced by unit tests, just that, as mentioned earlier, even if a given piece of pseudo-code makes perfect and unambiguous sense, it may still not by itself help code reviewers conclusively decide whether or not a particular complicated piece of code associated with the pseudo-code works as intended.)
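To make "executable specification" concrete, here is a minimal Python sketch. The function `clamp` and the test names are hypothetical; the tests are written as plain functions with `assert`, in the style a runner such as pytest would collect, and each test name states one requirement that a comment or pseudo-code line could only describe.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Return value limited to the closed interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_value_inside_range_is_unchanged():
    assert clamp(5, 0, 10) == 5


def test_value_below_range_is_raised_to_low():
    assert clamp(-3, 0, 10) == 0


def test_value_above_range_is_lowered_to_high():
    assert clamp(42, 0, 10) == 10


def test_inverted_range_is_rejected():
    try:
        clamp(1, 10, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for low > high")
```

If the implementation changes and a requirement is broken, the corresponding test fails, which is exactly what a stale comment cannot do.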
 
  • #21
Filip Larsen said:
the concept of proper unit tests (such as they are used in test-driven development) much better allows specification and verification of code at the lower level, simply because (as the main difference) it is an executable specification, something both pseudo-code and languages like Meta-IV explicitly are not.
I once had the job of integrating and testing code inputs from several engineers into flight critical code. I remember one time when a person's code DID NOT EVEN COMPILE. I think that was the day my blood pressure problems started. ;-)
 
  • #22
Filip Larsen said:
For me, now, I think the concept of proper unit tests (such as they are used in test-driven development) much better allows specification and verification of code at the lower level, simply because (as the main difference) it is an executable specification, something both pseudo-code and languages like Meta-IV explicitly are not. (Note, I am not saying here that pseudo-code for an implementation is equivalent to or can be replaced by unit tests, just that, as mentioned earlier, even if a given piece of pseudo-code makes perfect and unambiguous sense, it may still not by itself help code reviewers conclusively decide whether or not a particular complicated piece of code associated with the pseudo-code works as intended.)
I strongly agree with the importance of unit tests. But there is more to it than that. I have had cases where the code writer said, "It passed the test suite, so it's good," and never tested the code on the actual application. It's important to run unit tests AND test the code as it is intended to be used.
 
  • #23
@jedishrfu and @FactChecker can I ask you two a few more questions? Is pseudocode considered human-readable code? I get the feeling that, for source code commenting and documentation, pseudocode sits in between actual language syntax and English or whatever passes for human communicable languages.

Also, does source code documentation apply to writing in assembly or any kind of hardware description language? Are the conventions for proper documentation of assembly or any HDL the same as for software programming?

Also, is the term concurrent programming the same as parallel programming, regardless of whether it is for hardware or software?
 
  • #24
FactChecker said:
I once had the job of integrating and testing code inputs from several engineers into flight critical code. I remember one time when a person's code DID NOT EVEN COMPILE. I think that was the day my blood pressure problems started. ;-)
We had a rule that THOU SHALT NOT COMMIT ANYTHING THAT BREAKS THE BUILD.

The punishment was to be handed a scraper and have a go at the boss' office glass door to remove the obsolete self-adhesive privacy film coating.
 
  • #25
elias001 said:
@jedishrfu and @FactChecker can I ask you two a few more questions? Is pseudocode considered human-readable code?
IMO, pseudocode does not necessarily serve the purpose of good documentation.
I don't know if there is an official, enforced, definition. I think of it as being human readable, without considering a computer language syntax, for the purpose of being able to write code. With that interpretation, something like this might be acceptable pseudocode:
Sort the array xxx into array zzz using the order given by function aaa
But in terms of documentation, you might want to know what the contents of xxx, zzz, and aaa are. This could be in a utility function where the specific contents are unknown, or part of code where the arrays have very specific contents.
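As a sketch of the difference, the pseudocode line above translates to a single Python statement, and the documentation is what says what xxx, zzz, and aaa actually contain. The reading of aaa as a key function (rather than a two-argument comparison) and the sample-record layout are assumptions made only for this example.

```python
def sort_by_order(xxx, aaa):
    """Sort the array xxx into a new array zzz using the order given by aaa.

    xxx -- list of (sample_name, mass_kg) tuples, in arbitrary order.
    aaa -- key function: takes one element of xxx and returns the value
           to sort by (here, an assumption: the mass in kilograms).
    zzz -- returned list with the same elements as xxx, ascending by
           aaa(element); xxx itself is left unmodified.
    """
    zzz = sorted(xxx, key=aaa)
    return zzz


samples = [("iron", 7.9), ("cork", 0.2), ("water", 1.0)]
by_mass = sort_by_order(samples, lambda sample: sample[1])
```

The one-line body is no clearer than the pseudocode; the docstring is what tells a maintainer what the data is.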
 
  • #26
Another point against pseudo code is updates.

Often programmers fix a defect and comment about what was wrong and how they fixed it, but they don't go back and adjust the pseudo code; they just don't.
 
  • #27
elias001 said:
Also, is the term concurrent programming the same as parallel programming, regardless of whether it is for hardware or software?
The two terms are close to being used interchangeably, but I would say there is a slight tendency for concurrent programming to cover the theory and practical mechanisms that allow things to happen in parallel (e.g. allowing safe access to a shared resource, coordinating concurrent tasks/processes), whereas parallel programming is more about the theory and practical mechanisms for ensuring work actually executes in parallel (e.g. algorithmic parallelization, multiprocessing, scheduling). In that sense parallel programming probably has more ties with actual hardware.
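A minimal Python sketch of the "concurrent" side of that distinction, safe access to a shared resource, is given below. The function `count_with_lock` is hypothetical. It shows only coordination between threads through a lock; it says nothing about whether the threads actually execute at the same time, which is the "parallel" concern and depends on the runtime and hardware.

```python
import threading


def count_with_lock(num_threads: int, increments_per_thread: int) -> int:
    """Increment one shared counter from several threads and return it."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments_per_thread):
            # The lock makes the read-modify-write of total atomic with
            # respect to the other threads.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total
```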
 
  • #28
jedishrfu said:
Another point against pseudo code is updates.

Often programmers fix a defect and comment about what was wrong and how they fixed it, but they don't go back and adjust the pseudo code; they just don't.
I was going to bring that up. Glad someone did.

Now you've got lines and lines of pseudocode that do not match what the code is doing. Better no comments than wrong comments.

You've got to find a balance. Most devs find that balance in just a couple of lines of comments explaining what the code is doing, not so much how it's doing it.
 
  • #29
jedishrfu said:
Another point against pseudo code is updates.

Often programmers fix a defect and comment about what was wrong and how they fixed it, but they don't go back and adjust the pseudo code; they just don't.
This isn't a point just against pseudocode; it's a general point against comments in the code that don't get updated when a bug is found and fixed.
 
  • #30
Mark44 said:
This isn't a point just against pseudocode; it's a general point against comments in the code that don't get updated when a bug is found and fixed.
  1. If the comments are sufficiently succinct and well thought out, they will need updating less, if ever. (A function will generally not change so much that a one or two line properly-written comment will need to be changed.)
  2. It is faster and easier, and therefore (at least somewhat) more likely that short comments will get updated as opposed to having to re-write a block of pseudocode.
The art of commenting is in finding that balance of explaining the code just enough, but no more.
 
  • #31
Wait, pseudocode is considered a legitimate form of comment. It is taught and discussed in every book on algorithms, data structures and discrete mathematics. After more comments, I am getting the impression that between English-language comments and pseudocode, pseudocode is not preferred.
 
  • #32
Things discussed in your courses don't always reflect the reality of professional programmers and the stress they are under to meet schedules, deadlines, and fix test team defects...
 
  • #33
DaveC426913 said:
  1. If the comments are sufficiently succinct and well thought out, they will need updating less, if ever. (A function will generally not change so much that a one or two line properly-written comment will need to be changed.)
  2. It is faster and easier, and therefore (at least somewhat) more likely that short comments will get updated as opposed to having to re-write a block of pseudocode.
Regarding #1, the function's basic purpose likely won't change much, but there is more that should be stated in comments than a brief description of what the function's purpose is. For example, it's very possible that certain parameter values can cause unwanted behavior, and this should be documented, including which values are valid or not allowed. Also, the possible return values should be commented.

Regarding #2, I agree, with the stipulation that comments should not be too short, for the reasons stated above.
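A sketch of comments that go beyond the function's purpose to state valid parameter values and possible return values might look like this in Python. The function `find_index` is a hypothetical example; the choice to reject negative start values is an assumption made to illustrate documenting a value that would otherwise cause unwanted behavior.

```python
def find_index(items: list[str], wanted: str, start: int = 0) -> int:
    """Return the index of the first occurrence of wanted at or after start.

    Parameters:
        items:  list to search; may be empty.
        wanted: value to look for.
        start:  first index to examine. Valid values: 0 <= start <= len(items).
                Negative values are NOT allowed, because Python would
                silently treat them as offsets from the end of the list.

    Returns:
        The index of the first match (start <= index < len(items)), or
        -1 if wanted does not occur in items[start:].

    Raises:
        ValueError: if start is outside the valid range.
    """
    if not 0 <= start <= len(items):
        raise ValueError(f"start must be in 0..{len(items)}, got {start}")
    for index in range(start, len(items)):
        if items[index] == wanted:
            return index
    return -1
```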
elias001 said:
Wait, pseudocode is considered a legitimate form of comment. It is taught and discussed in every book on algorithms, data structures and discrete mathematics.
But none of these are applicable to real-world applications programming or systems programming. IMO, the proper place for pseudocode is as a starting point for implementing the final code.
 
  • #34
elias001 said:
Wait, pseudocode is considered a legitimate form of comment. It is taught and discussed in every book on algorithms, data structures and discrete mathematics. After more comments, I am getting the impression that between English-language comments and pseudocode, pseudocode is not preferred.
For implementation comments that consist only or mostly of pseudo-code to make sense, I would think that 1) the original pseudo-code algorithm is documented in full at the top of the module (or in the spec), and 2) each pseudo-code comment used later refers exactly to a line of this code, either by explicit line number or as an exact copy of the pseudo-code line that the following actual code corresponds to. So in this case the pseudo-code comments act as a kind of index to what part of the algorithm the code corresponds to.
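The indexing scheme described here could look roughly like the Python sketch below, using insertion sort as a stand-in algorithm. The step numbers and the pseudo-code wording are made up for this example rather than copied from any textbook; the point is that the pseudo-code appears once, in full, and the later comments only point back into it.

```python
# Pseudo-code for the algorithm (reference for the step numbers below):
#   1  for i = 1 to length(A) - 1
#   2      key = A[i]
#   3      j = i - 1
#   4      while j >= 0 and A[j] > key
#   5          A[j + 1] = A[j]
#   6          j = j - 1
#   7      A[j + 1] = key


def insertion_sort(values: list[int]) -> None:
    """Sort values in place in ascending order."""
    for i in range(1, len(values)):          # step 1
        key = values[i]                      # step 2
        j = i - 1                            # step 3
        while j >= 0 and values[j] > key:    # step 4
            values[j + 1] = values[j]        # step 5
            j -= 1                           # step 6
        values[j + 1] = key                  # step 7
```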

What I would expect not to work well in general code is explaining every one or two lines of actual code with a pseudo-code "summary", mainly because pseudo-code doesn't really convey intent very well, so in effect you would explain only the desired effect of the code in a slightly more abstract (but possibly ambiguous) notation.

This doesn't mean that a nice pseudo-code comment here and there cannot be helpful in combination with "normal" comments describing assumptions and intent, but I would be very skeptical if only pseudo-code is used. To repeat my earlier position: if the code is so unreadable that someone feels a pseudo-code comment helps, then make the code more readable and easy to understand instead; if the code is already readable with straightforward semantics, then why have a pseudo-code comment at all (unless, as mentioned, it is just a reference to where in an algorithm we are)?
 
  • #35
All of this depends on the software development environment that you are working in. If strict configuration management principles are followed, any code change will have associated change requests, change documentation, change reviews, etc. Changing the code without changing the comments to match would not be acceptable.
 
  • #36
elias001 said:
Wait, pseudocode is considered a legitimate form of comment.
And so it is.

But now we're discussing the finer points of its efficacy and pitfalls.

Legitimate doesn't necessarily mean 'best', or 'best in all circumstances' or 'practical in-the-wild'.
 
  • #37
elias001 said:
Wait, pseudocode is considered a legitimate form of comment. It is taught and discussed in every book on algorithms, data structures and discrete mathematics. After more comments, I am getting the impression that between English-language comments and pseudocode, pseudocode is not preferred.
IMO, there is no official definition of where "pseudocode" ends and good comments begin. Books and academic discussions about algorithms may never go farther than the pseudocode, but that does not necessarily make it adequate for commenting final code.
I think of pseudocode as a good first step in writing code by describing the algorithm that code will use. And it might as well be left as comments. But I have not seen it described as adequate comments for the final result. Some might be, but don't count on it.
 
  • #38
@FactChecker The reason I concentrate so much on pseudocode also has to do with the Introduction to Algorithms text by Thomas Cormen et al. When CS was going from popular to super popular in the 2000s, I heard that if one wants to work in tech at those big tech companies in Silicon Valley, one can expect questions literally from that textbook, or questions referencing that textbook, in the technical portion of the job interview. In that book, pseudocode is used so that the algorithms' descriptions are programming-language independent.

Like I said in previous posts, commenting within source code was mentioned as a requirement in programming courses, but it was never really shown how it is properly done or whether there is a proper format that needs to be adhered to. Basically, students are left to figure it out for themselves. So I thought, well, there is a well-defined convention for how pseudocode is written. Can't students just use that instead?
 
  • #39
elias001 said:
@FactChecker The reason I concentrate so much on pseudocode also has to do with the Introduction to Algorithms text by Thomas Cormen et al. When CS was going from popular to super popular in the 2000s, I heard that if one wants to work in tech at those big tech companies in Silicon Valley, one can expect questions literally from that textbook, or questions referencing that textbook, in the technical portion of the job interview.
I am not familiar with that book. There might be a more modern one now. I wouldn't expect exact questions from any text, but I haven't been job-hunting since Biblical times. ;-)
elias001 said:
In that book, pseudocode is used so that the algorithms' descriptions are programming-language independent.
That sounds OK. It describes the algorithm in a language-agnostic way. But the details of implementation in a particular language and/or application are likely to require more detail that should be in the comments.
elias001 said:
Like I said in previous posts, commenting within source code was mentioned as a requirement in programming courses, but it was never really shown how it is properly done or whether there is a proper format that needs to be adhered to. Basically, students are left to figure it out for themselves. So I thought, well, there is a well-defined convention for how pseudocode is written. Can't students just use that instead?
I wouldn't expect any uniform requirements in the general public. If you work for a military contractor, there may be military standards. In any case, you should expect any large multi-programmer project to have some (possibly unique) project standards for satisfactory documentation.
 
  • #40
@FactChecker @Filip Larsen @jedishrfu the Software Engineering Stack Exchange post I previously referred to is here. With more than a few of the initial responses there, I felt like I had tossed a small bush fire into a yellow jacket hornets' nest. Then it just hit me: wait, why don't I use pseudocode, since everyone who learned programming in college has come across pseudocode? Also, the algorithms book I referred to is Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson and Ronald L. Rivest.
 